Flight booking

Flight booking


In this section we’re going to write some queries from the perspective of a frequent traveller trying to find flights to book.

slides

Flight booking


We’ve got a new requirement to satisfy:

As a frequent traveller
I want to find flights from <origin> to <destination> on <date>
So that I can book my business flight

Before we write queries to satisfy this requirement, let’s import some more data.

Import more flights


We initially loaded 1,000 flights. That was a fun initial dataset to play with, but now that we’ve got a better model we’re happy with a bit more data.

Just before importing, let’s create an index for our Airport nodes:

CREATE INDEX ON :Airport(code)

flights_10k.csv contains 10,000 flights. We can run the following query to import those flights:

LOAD CSV WITH HEADERS FROM "file:///flights_10k.csv" AS row
MERGE (origin:Airport {code: row.Origin})
MERGE (destination:Airport {code: row.Dest})
MERGE (newFlight:Flight { id: row.UniqueCarrier + row.FlightNum + "_" + row.Year + "-" + row.Month + "-" + row.DayofMonth + "_" + row.Origin + "_" + row.Dest }   )
ON CREATE SET newFlight.date = toInteger(row.Year) + "-" + toInteger(row.Month) + "-" + toInteger(row.DayofMonth),
              newFlight.airline = row.UniqueCarrier,
              newFlight.number = row.FlightNum,
              newFlight.departure = toInteger(row.CRSDepTime),
              newFlight.arrival = toInteger(row.CRSArrTime)
MERGE (newFlight)-[:ORIGIN]->(origin)
MERGE (newFlight)-[:DESTINATION]->(destination)

Now it’s time to write a query to find available flights between two airports on a specific date.

Finding flights to book


Let’s find all the flights going from Los Angeles (LAS) to Chicago Midway International (MDW) on the 3rd January. Run the following query:

MATCH path = (origin:Airport {code: "LAS"})<-[:ORIGIN]-(flight:Flight)-[:DESTINATION]->(destination:Airport {code: "MDW"})
WHERE flight.date = "2008-1-3"
RETURN path

This returns quite quickly but try prefixing it with PROFILE. What do you notice?

Profiling the finding flights to book query


The query starts by using an index to find MDW but then has to traverse all incoming DESTINATION relationships and check the date property on the :Flight nodes on the other side. The more flights an airport has the more we’ll have to scan through, and since we’re only working with 10,000 flights we should probably find a better way to model our data before importing any more rows.

Can you think of a way that we can change our model to avoid doing all these property lookups?

Bundling relationships


One way that we can tweak our model to be more aligned with our queries is by bundling flights by day.

slides

Exercise: Introducing Airport Day


We want to introduce :AirportDay nodes so that we don’t have to scan through all the flights going from an airport when we’re only interested in a subset of them.

Try and write a query to evolve our current model to include this new concept.

airport day

Note We’ll have one :AirportDay node for each airport so keep that in mind when writing the refactoring query.

Click through for the answers


If you really want to see them…​

Answer: Introducing Airport Day


Before we create anything let’s put a unique constraint on :AirportDay so we don’t create any duplicates:

CREATE CONSTRAINT ON (airportDay:AirportDay)
ASSERT airportDay.id IS UNIQUE

We’ll use the combination of origin and the flight date as our unique key for an :AirportDay:

MATCH (origin:Airport)<-[:ORIGIN]-(flight:Flight)-[:DESTINATION]->(destination:Airport)
MERGE (originAirportDay:AirportDay {id: origin.code + "_" + flight.date})
SET originAirportDay.date = flight.date
MERGE (destinationAirportDay:AirportDay {id: destination.code + "_" + flight.date})
SET destinationAirportDay.date = flight.date
MERGE (origin)-[:HAS_DAY]->(originAirportDay)
MERGE (flight)-[:ORIGIN]->(originAirportDay)

MERGE (flight)-[:DESTINATION]->(destinationAirportDay)
MERGE (destination)-[:HAS_DAY]->(destinationAirportDay)

Find flights to book


Now let’s try finding those flights between Los Angeles and Chicago Midway International again. To recap, this was our original query:

MATCH path = (origin:Airport {code: "LAS"})<-[:ORIGIN]-(flight:Flight)-[:DESTINATION]->(destination:Airport {code: "MDW"})
WHERE flight.date = "2008-1-3"
RETURN path

This is the equivalent query which makes use of :AirportDay:

MATCH (origin:Airport {code: "LAS"})-[:HAS_DAY]->(:AirportDay {date: "2008-1-3"})<-[:ORIGIN]-(flight:Flight),
      (flight)-[:DESTINATION]->(:AirportDay {date: "2008-1-3"})<-[:HAS_DAY]-(destination:Airport {code: "MDW"})
RETURN *

Try profiling the queries. What do you notice?

Modeling guidelines


Before we move onto the next guide we’re going to go over some modeling guidelines which you can apply when working with graphs.

slides

Next Step


In the next stage we’re going to look at specific relationship types.