Flight as a first class citizen

Flight as a first class citizen


In this section we’re going to profile the query from the end of the last section and see what’s going on under the covers.

We’ll then refactor the model to introduce Flight as a first class citizen in our domain to make it easier to write queries centered around flights.

Before we profile our query a quick explanation:

slides

Profiling flight WN 1016


We can profile our query by prefixing it with the PROFILE keyword:

PROFILE
MATCH  (origin:Airport)-[connection:CONNECTED_TO]->(destination:Airport)
WHERE connection.airline = "WN" AND connection.flightNumber = "1016"
RETURN origin.code, destination.code, connection.date, connection.departure, connection.arrival

What we get back is an execution plan which describes the Cypher operators used to execute this query. You can read more about these in the developer manual

In this one the query starts with a NodeByLabelScan on the :Airport label, which means that we first scanned all the airports. Next we followed the FLIGHT relationship to origin airports, and we can see from the estimated rows count that we followed 1,000 of these.

That’s a lot of flights!

Scanning all the flights


In fact we actually looked at every single flight, which we can confirm by executing the following query:

MATCH ()-[:CONNECTED_TO]->()
RETURN COUNT(*)

So it’s clear that our model isn’t optimal - we’re doing far too much work just to find the destinations and origins of one flight.

It’s time to refactor the model.

Introducing flight as a first class citizen


Having convinced ourselves that our current model is quite inefficient, we’re going to refactor the model.

Let’s quick look at how we’re going to evolve the model:

slides

Ensuring flight uniqueness


When we refactor the model we want to make sure we only create each flight once.

Neo4j allows us to create unique constraints to ensure uniqueness across a label/property pair, but at the moment we can only create constraints on single properties. We want to ensure uniqueness across several properties so we’ll combine those together into a single dummy property.

The combination of airline, flight number, and date makes a flight unique. As we saw in the previous section, however, some flights can have multiple legs so we’ll need to consider departure and arrival airports as well. We’ll create a flightId with this format: {airline}{flightNumber}{year}-{month}-{day}_{origin}_{destination}

Run the following query to create a unique constraint on the Flight/id label/property pair:

CREATE CONSTRAINT ON (f:Flight)
ASSERT f.id IS UNIQUE

Ok, now let the refactoring begin.

Creating flights


We’re now ready to create Flight nodes. Remember this is the model we’re going to end up at:

flight first class

Creating flights


Run the following query to create Flight nodes for every CONNECTED_TO relationship:

MATCH (origin:Airport)-[connection:CONNECTED_TO]->(destination:Airport)
MERGE (newFlight:Flight { id: connection.airline + connection.flightNumber + "_" + connection.date +  "_" + origin.code + "_" + destination.code }   )
ON CREATE SET newFlight.date = connection.date,
              newFlight.airline = connection.airline,
              newFlight.number = connection.flightNumber,
              newFlight.departure = connection.departure,
              newFlight.arrival = connection.arrival
MERGE (origin)<-[:ORIGIN]-(newFlight)
MERGE (newFlight)-[:DESTINATION]->(destination)

This query:

  • finds all (origin, connection, destination) paths

  • creates a Flight node if one doesn’t already exist

  • creates an ORIGIN relationship to the origin airport and a DESTINATION relationship to the destination airport

Now we can try our earlier query again.

Find all the flights for flight number WN 1016


First let’s create an index on (Flight, number) so that we can quickly find the appropriate flights.

CREATE INDEX ON :Flight(number)

Now we’re ready to find those flights!

MATCH (origin)<-[:ORIGIN]-(flight:Flight)-[:DESTINATION]->(destination)
WHERE flight.airline = "WN" AND flight.number = "1016"
RETURN origin, destination, flight

Before we delete the CONNECTED_TO relationship we should profile the two versions of the query to see whether our refactoring has improved things.

Exercise: Profiling the two flight models


Run the following query to profile the new version of the query to see if we’ve improved things:

PROFILE
MATCH (origin)<-[:ORIGIN]-(flight:Flight)-[:DESTINATION]->(destination)
WHERE flight.airline = "WN" AND flight.number = "1016"
RETURN origin, destination, flight

For reference, you can re-run the profile on our other query as well:

PROFILE
MATCH (origin:Airport)-[flight:CONNECTED_TO]->(destination:Airport)
WHERE flight.airline = "WN" AND flight.flightNumber = "1016"
RETURN origin, destination, flight

What do you notice?

Answer: Profiling the two flight models


It looks like a good refactoring - we can see by comparing the db hits that our new query does much less work (6 db hits vs. 1,062 db hits).

This is because we no longer have to scan every connection looking for ones which have an airline of WN and a number of 1016. Instead we’re able to use the :Flight(number) index to find the flights we’re interested in and only determine the origins and destinations for those flights.

Deleting the CONNECTED_TO relationship


Since the CONNECTED_TO relationship between airports doesn’t seem to be much use anymore let’s delete it.

Run the following query:

MATCH ()-[connection:CONNECTED_TO]->()
DELETE connection

Next Step


In the next section we’re going to write some more queries against the dataset, but this time with a different user in mind.