Exercise 3

Exercise 3: Profiling queries (Preparations)


Before you begin this exercise, make sure that you have loaded the flights_2019_1k data in the previous exercise.

This is what you will see when you click the database icon database icon.

AfterExercise2

If your database does not have the same nodes and relationships, you can execute these Cypher statements to reset your graph to what it must be before you start this exercise:

MATCH (n) DETACH DELETE n;
MERGE (a1:Airport {code: 'LAS'})
MERGE (a2:Airport {code:'LAX'})
MERGE (a3:Airport {code:'ABQ'})
MERGE (a1)-[:CONNECTED_TO {airline:'WN',flightNumber:'82',date:'2019-1-3',departure:'1715',arrival:'1820'}]->(a2)
MERGE (a1)-[:CONNECTED_TO {airline:'WN',flightNumber:'500',date:'2019-1-3',departure:'1445',arrival:'1710'}]->(a3);
LOAD CSV WITH HEADERS FROM 'https://r.neo4j.com/flights_2019_1k' AS row
MERGE (origin:Airport {code: row.Origin})
MERGE (destination:Airport {code: row.Dest})
MERGE (origin)-[connection:CONNECTED_TO {
  airline: row.UniqueCarrier,
  flightNumber: row.FlightNum,
  date: toInteger(row.Year) + '-' + toInteger(row.Month) + '-' + toInteger(row.DayofMonth)}]->(destination)
ON CREATE SET connection.departure = toInteger(row.CRSDepTime), connection.arrival = toInteger(row.CRSArrTime)

Note: You must use the "file:///" syntax for loading the data from the import folder.

Exercise 3: Profile queries (Overview)


As part of our modeling workflow, we must analyze queries for our domain and understand their performance with a set of data. We currently have data for 1000 connections between airports (flights) and we want to see the cost our queries against the existing data. Later in these exercises, you will add more data to the graph, but for now we will analyze queries for the existing data.

Note: The PROFILE numbers you see in this browser guide may differ slightly from what you see on your system, because of differences in versions of Neo4j.

In this exercise you will profile and understand the behavior of a couple of queries:

  • Exercise 3.1: What is the cost of finding all flights that land in Las Vegas (LAS)?

  • Exercise 3.2: What is the cost of finding all flights for airline 'WN' with flight number '1016'?

Go to the next page to start this exercise.

Exercise 3.1: What is the cost of finding all flights that land in Las Vegas (LAS)? (Instructions)


Write and execute the Cypher code to profile the query to retrieve all flights that land in Las Vegas (LAS)?

Exercise 3.1: What is the cost of finding all flights that land in Las Vegas (LAS)? (Solution)


Write and execute the Cypher code to profile the query to retrieve all flights that land in Las Vegas (LAS)?

PROFILE
MATCH (origin:Airport)-
[c:CONNECTED_TO]->(destination:Airport)
WHERE destination.code = 'LAS'
RETURN origin, destination, c

The result returned will be:

Ex3_Profile1

Contract and expand each section of the profile shown. One thing that we could do to reduce the number of db hits is to create an index on Airport.code and the current query is reading all Airport node labels to find the airport for LAS. This query does return 13 nodes and 268 relationships so getting the number of db hits lower than 223 (285 - 62) is probably all we can do with this model.

Exercise 3.2: What is the cost of finding all flights for airline 'WN' with flight number '1016'? (Instructions)


Write and execute the Cypher code to profile the query to retrieve all flights for airline 'WN' with flight number '1016'?

Exercise 3.2: What is the cost of finding all flights for airline 'WN' with flight number '1016'? (Solution)


Write and execute the Cypher code to profile the query to retrieve all flights for airline 'WN' with flight number '1016'?

PROFILE
MATCH  (origin:Airport)-[connection:CONNECTED_TO]->(destination:Airport)
WHERE connection.airline = 'WN' AND connection.flightNumber = '1016'
RETURN origin.code, destination.code, connection.date, connection.departure, connection.arrival

The result returned will be:

Ex3_Profile2

Again, creating an index on Airport.code will reduce the number of db hits by 62. The query has to read all data for all CONNECTED_TO relationships to find the correct flights. The number of db hits of 5379 is too high. What we are going to do is change (refactor) our current model so that we do not have to visit all CONNECTED_TO relationships to satisfy the query. We will do this by creating a Flight node from a relationship.

Exercise 3: Profiling queries (Summary)


In this exercise, you profiled the performance of a couple of queries and found that we need to at a minimum:

  • Add an index to the Airport.code property.

  • Model a flight as a Flight node.