Tutorial: Import Relational Data Into Neo4j
This guide walks through the process of exporting data from a relational database (PostgreSQL) and importing it into a graph database (Neo4j). You will learn how to move data from the relational system to the graph by translating the schema and using import tools. This example uses a specific data set, but the principles in this guide can be applied and reused with any data domain.
You should have a basic understanding of the property graph model and know how to model data as a graph. If you haven’t already, download and install Neo4j Desktop, so you can follow along with the examples.
Beginner
About the Data Domain
In this guide, we will be using the NorthWind dataset, an often-used SQL dataset. This data depicts a product sale system - storing and tracking customers, products, customer orders, warehouse stock, shipping, suppliers, and even employees and their sales territories. Although the NorthWind dataset is often used to demonstrate SQL and relational databases, the data can also be structured as a graph.
An entity-relationship diagram (ERD) of the Northwind dataset is shown below.

First, this is a rather large and detailed model. We can scale this down a bit for our example and choose the entities that are most critical for our graph - in other words, those that might benefit most from seeing the connections. For our use case, we really want to optimize the relationships around orders: which products were involved (with the categories and suppliers for those products), which employees worked on them, and those employees' managers.
Using these business requirements, we can narrow our model down to these essential entities.

Developing a Graph Model
The first thing we will need to do to get data from a relational database into a graph is to translate the relational data model to a graph data model. Determining how we want to structure tables and rows as nodes and relationships may vary depending on what is most important to your business needs.
For more information on adapting your graph model to different scenarios, check out our modeling designs guide.
When deriving a graph model from a relational model, we should keep a couple of general guidelines in mind.
- A row is a node
- A table name is a label name
- A join or foreign key is a relationship
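As a rough illustration of these guidelines, a table row can be mapped to a node whose label derives from the table name and whose properties come from the row's columns. This is a plain-Python sketch with hypothetical names, not Neo4j code; the real translation happens later via Cypher's `LOAD CSV`:

```python
# Hypothetical sketch: a "node" modeled as a dict with a label (from the
# table name) and properties (from the row's columns).

def row_to_node(table_name, row):
    """Map one relational row to a property-graph node (hypothetical helper)."""
    label = table_name.rstrip("s").capitalize()  # naive singularization: "orders" -> "Order"
    return {"label": label, "properties": dict(row)}

node = row_to_node("orders", {"OrderID": "10248", "ShipName": "Vins et alcools Chevalier"})
print(node["label"])                   # Order
print(node["properties"]["ShipName"])  # Vins et alcools Chevalier
```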
With these principles in mind, we can map our relational model to a graph with the following steps:
- Each row in the `Orders` table becomes a node in our graph with `Order` as the label.
- Each row in the `Products` table becomes a node with `Product` as the label.
- Each row in the `Suppliers` table becomes a node with `Supplier` as the label.
- Each row in the `Categories` table becomes a node with `Category` as the label.
- Each row in the `Employees` table becomes a node with `Employee` as the label.
- The join between `Suppliers` and `Products` becomes a relationship named `SUPPLIES` (where a supplier supplies a product).
- The join between `Products` and `Categories` becomes a relationship named `PART_OF` (where a product is part of a category).
- The join between `Employees` and `Orders` becomes a relationship named `SOLD` (where an employee sold an order).
- The join between `Employees` and itself (a unary relationship) becomes a relationship named `REPORTS_TO` (where an employee has a manager).
- The join through the `Order Details` join table between `Orders` and `Products` becomes a relationship named `CONTAINS` with the properties `unitPrice`, `quantity`, and `discount` (where an order contains a product).
If we draw our translation out on the whiteboard, we have this graph data model.

Now, we can, of course, decide that we want to include the rest of the entities from our relational model, but for now, we will keep to this smaller graph model.
How does the Graph Model Differ from the Relational Model?
- There are no nulls. Non-existing values (properties) are simply not present.
- It describes the relationships in more detail. For example, we know that an employee SOLD an order, rather than having a foreign key relationship between the Orders and Employees tables. We could also choose to add more metadata about that relationship, should we wish.
- Either model can be more normalized. For example, addresses have been denormalized into several of the tables, but could have lived in a separate table. In a future version of our graph model, we might also choose to separate addresses from the `Order` (or `Supplier` or `Employee`) entities and create separate `Address` nodes.
Exporting Relational Tables to CSV
Thankfully, this step has already been done for us with the Northwind data. However, if you are working with another data domain, you will need to take the data from the relational tables and put it in another format for loading into the graph. A common format that many systems can handle is a flat file of comma-separated values (CSV), so let's see how to export relational tables from a PostgreSQL database to CSV files that we can use to create our graph.
The PostgreSQL `COPY` command lets us execute a SQL query and write the result to a CSV file. We can assemble a short .sql script of these `COPY` commands, as shown below.
COPY (SELECT * FROM customers) TO '/tmp/customers.csv' WITH CSV header;
COPY (SELECT * FROM suppliers) TO '/tmp/suppliers.csv' WITH CSV header;
COPY (SELECT * FROM products) TO '/tmp/products.csv' WITH CSV header;
COPY (SELECT * FROM employees) TO '/tmp/employees.csv' WITH CSV header;
COPY (SELECT * FROM categories) TO '/tmp/categories.csv' WITH CSV header;
COPY (SELECT * FROM orders
LEFT OUTER JOIN order_details ON order_details.OrderID = orders.OrderID) TO '/tmp/orders.csv' WITH CSV header;
Then, we can run that script against our northwind database with the command `psql -d northwind < export_csv.sql`, and it will create the individual CSV files listed in our script.
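Before importing, it can help to sanity-check an exported file's header row and first record to catch delimiter or quoting problems early. A small Python sketch, using an in-memory sample in place of a real exported file:

```python
import csv
import io

# Stand-in for an exported file; with a real export you would use
# open('/tmp/products.csv') instead of this in-memory sample.
sample = io.StringIO(
    "ProductID,ProductName,UnitPrice\n"
    "1,Chai,18.00\n"
)

reader = csv.DictReader(sample)
first = next(reader)
print(reader.fieldnames)     # ['ProductID', 'ProductName', 'UnitPrice']
print(first["ProductName"])  # Chai
```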
Importing the Data using Cypher
After we have exported our data from PostgreSQL, we will use Cypher’s {cyphermanual}/clauses/load-csv/[LOAD CSV^] command to transform the contents of the CSV file into a graph structure.
First, we will likely want to place our CSV files in an easily accessed directory.
With Neo4j Desktop, we can place them in the local database import directory (detailed instructions are found in our desktop import guide).
This way, we can use the `file:///` prefix in our Cypher statements to locate the files.
We can also place the files in another local or remote directory (HTTPS, HTTP, and FTP are supported) and specify the full path in our Cypher statements.
Since we are using Neo4j Desktop in this example, we will use the import folder for the database, and the paths for our CSV files can start with the `file:///` prefix.
Now that we have our files where we can access them easily, we can use Cypher's `LOAD CSV` command to read each file and add Cypher statements after it to transform the row/column data into the graph.
The entire Cypher script is available on GitHub for you to copy and run, but we will step through each section below to explain what each piece of the script is doing.
// Create orders
LOAD CSV WITH HEADERS FROM 'file:///orders.csv' AS row
MERGE (order:Order {orderID: row.OrderID})
ON CREATE SET order.shipName = row.ShipName;
// Create products
LOAD CSV WITH HEADERS FROM 'file:///products.csv' AS row
MERGE (product:Product {productID: row.ProductID})
ON CREATE SET product.productName = row.ProductName, product.unitPrice = toFloat(row.UnitPrice);
// Create suppliers
LOAD CSV WITH HEADERS FROM 'file:///suppliers.csv' AS row
MERGE (supplier:Supplier {supplierID: row.SupplierID})
ON CREATE SET supplier.companyName = row.CompanyName;
// Create employees
LOAD CSV WITH HEADERS FROM 'file:///employees.csv' AS row
MERGE (e:Employee {employeeID:row.EmployeeID})
ON CREATE SET e.firstName = row.FirstName, e.lastName = row.LastName, e.title = row.Title;
// Create categories
LOAD CSV WITH HEADERS FROM 'file:///categories.csv' AS row
MERGE (c:Category {categoryID: row.CategoryID})
ON CREATE SET c.categoryName = row.CategoryName, c.description = row.Description;
You might notice that we have not imported all of the columns from our CSV files.
With our statements, we can choose which properties are needed on a node, which can be left out, and which might need to be imported to another node type or relationship.
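The `toFloat()` calls above are needed because `LOAD CSV` delivers every field as a string, so numeric values must be converted explicitly. The same behavior is easy to demonstrate outside Neo4j with Python's csv module:

```python
import csv
import io

# CSV readers hand back text, regardless of how the value looks.
rows = list(csv.DictReader(io.StringIO("ProductID,UnitPrice\n1,18.00\n")))
price = rows[0]["UnitPrice"]
print(type(price).__name__)  # str: numeric-looking fields still arrive as text
print(float(price) * 2)      # 36.0: convert explicitly, as toFloat() does in Cypher
```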
You might also notice that we used the {cyphermanual}/clauses/merge/[MERGE^] keyword, instead of {cyphermanual}/clauses/create/[CREATE^].
Though we feel pretty confident there are no duplicates in our CSV files, we can use `MERGE` as good practice for ensuring unique entities in our database.
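Conceptually, `MERGE` with `ON CREATE SET` behaves like an upsert keyed on the pattern's properties: find a match or create one, and apply the extra properties only on creation. A rough dictionary-based analogy in Python (this sketches the semantics, not how Neo4j is implemented):

```python
# Rough analogy for MERGE ... ON CREATE SET, keyed on a unique id.
nodes = {}  # orderID -> properties

def merge_order(order_id, on_create_props):
    if order_id not in nodes:                # no match found -> create the node
        nodes[order_id] = {"orderID": order_id, **on_create_props}
    return nodes[order_id]                   # a matching node is left untouched

merge_order("10248", {"shipName": "Vins et alcools Chevalier"})
merge_order("10248", {"shipName": "DUPLICATE ROW"})  # second row is a no-op
print(len(nodes))                  # 1
print(nodes["10248"]["shipName"])  # Vins et alcools Chevalier
```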
For very large commercial or enterprise datasets, you may run into out-of-memory errors, especially on smaller machines.
To avoid these situations, you can prefix the statement with the `USING PERIODIC COMMIT` query hint, which commits the import in batches.
After the nodes are created, we need to create the relationships between them. Importing the relationships will mean looking up the nodes we just created and adding a relationship between those existing entities. To ensure the lookup of nodes is optimized, we will want to create indexes for any node properties we want to use in the lookups (often the id or another unique value).
We also want to create a constraint (also creates an index with it) that will disallow orders with the same id from getting created, preventing duplicates.
Finally, as the indexes are created after the nodes are inserted, their population happens asynchronously, so we use `schema await` (a shell command) to block until they are populated.
CREATE INDEX product_id FOR (p:Product) ON (p.productID);
CREATE INDEX product_name FOR (p:Product) ON (p.productName);
CREATE INDEX supplier_id FOR (s:Supplier) ON (s.supplierID);
CREATE INDEX employee_id FOR (e:Employee) ON (e.employeeID);
CREATE INDEX category_id FOR (c:Category) ON (c.categoryID);
CREATE CONSTRAINT order_id ON (o:Order) ASSERT o.orderID IS UNIQUE;
schema await
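Why these indexes matter: every MATCH during the relationship import is a lookup by property value, and without an index that lookup must scan all nodes with the label. The difference is essentially scanning a list versus probing a hash map, sketched here in Python purely for illustration:

```python
# Contrast an unindexed lookup (linear scan) with an indexed lookup (hash map),
# mirroring what MATCH does with and without an index during relationship import.
products = [{"productID": str(i), "productName": "p%d" % i} for i in range(10000)]

def find_unindexed(pid):
    for p in products:          # scans every Product node per MATCH
        if p["productID"] == pid:
            return p

index = {p["productID"]: p for p in products}   # built once, like CREATE INDEX

def find_indexed(pid):
    return index.get(pid)       # constant-time probe per MATCH

print(find_unindexed("9999") is find_indexed("9999"))  # True
```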
With the initial nodes and indexes in place, we can now create the relationships for orders to products and orders to employees.
// Create relationships between orders and products
LOAD CSV WITH HEADERS FROM 'file:///orders.csv' AS row
MATCH (order:Order {orderID: row.OrderID})
MATCH (product:Product {productID: row.ProductID})
MERGE (order)-[op:CONTAINS]->(product)
ON CREATE SET op.unitPrice = toFloat(row.UnitPrice), op.quantity = toFloat(row.Quantity);
// Create relationships between orders and employees
LOAD CSV WITH HEADERS FROM "file:///orders.csv" AS row
MATCH (order:Order {orderID: row.OrderID})
MATCH (employee:Employee {employeeID: row.EmployeeID})
MERGE (employee)-[:SOLD]->(order);
Next, create relationships between products, suppliers, and categories:
// Create relationships between products and suppliers
LOAD CSV WITH HEADERS FROM "file:///products.csv" AS row
MATCH (product:Product {productID: row.ProductID})
MATCH (supplier:Supplier {supplierID: row.SupplierID})
MERGE (supplier)-[:SUPPLIES]->(product);
// Create relationships between products and categories
LOAD CSV WITH HEADERS FROM "file:///products.csv" AS row
MATCH (product:Product {productID: row.ProductID})
MATCH (category:Category {categoryID: row.CategoryID})
MERGE (product)-[:PART_OF]->(category);
Lastly, we will create the 'REPORTS_TO' relationship between employees to represent the reporting structure:
// Create relationships between employees (reporting hierarchy)
LOAD CSV WITH HEADERS FROM "file:///employees.csv" AS row
MATCH (employee:Employee {employeeID: row.EmployeeID})
MATCH (manager:Employee {employeeID: row.ReportsTo})
MERGE (employee)-[:REPORTS_TO]->(manager);
You can also run the whole script at once using `bin/neo4j-shell -path northwind.db -file import_csv.cypher`.
We can now query the resulting graph to find out what it can tell us about our newly-imported data.
Querying the Graph
We might start with a couple of general queries to verify that our data matches the model we designed earlier in the guide. Here are some example queries:
//find a sample of employees who sold orders with their ordered products
MATCH (e:Employee)-[rel:SOLD]->(o:Order)-[rel2:CONTAINS]->(p:Product)
RETURN e, rel, o, rel2, p LIMIT 25;
//find the supplier and category for a specific product
MATCH (s:Supplier)-[r1:SUPPLIES]->(p:Product {productName: 'Chocolade'})-[r2:PART_OF]->(c:Category)
RETURN s, r1, p, r2, c;
Once we are comfortable that the data aligns with our data model and everything looks correct, we can start querying to gather information and insight for business decisions. One question we might be interested in: which employees sold orders containing 'Chocolade', and which other products did those employees sell most often?
MATCH (choc:Product {productName:'Chocolade'})<-[:CONTAINS]-(:Order)<-[:SOLD]-(employee),
(employee)-[:SOLD]->(o2)-[:CONTAINS]->(other:Product)
RETURN employee.employeeID as employee, other.productName as otherProduct, count(distinct o2) as count
ORDER BY count DESC
LIMIT 5;
Looks like employee No. 4 was busy, though employee No. 1 also did well!
| employee | otherProduct | count |
|---|---|---|
| 4 | Gnocchi di nonna Alice | 14 |
| 4 | Pâté chinois | 12 |
| 1 | Flotemysost | 12 |
| 3 | Gumbär Gummibärchen | 12 |
| 1 | Pavlova | 11 |
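The count(distinct o2) aggregation in the query above can be mimicked over plain tuples, counting distinct orders per employee/product pair. A Python sketch with toy data (the real counts, of course, come from the graph):

```python
from collections import defaultdict

# Toy (employee, order, product) facts standing in for the SOLD/CONTAINS pattern.
sales = [
    (4, "o1", "Gnocchi di nonna Alice"),
    (4, "o2", "Gnocchi di nonna Alice"),
    (1, "o3", "Pavlova"),
]

# count(distinct order) per (employee, product), as in the RETURN clause
orders_by_pair = defaultdict(set)
for employee, order, product in sales:
    orders_by_pair[(employee, product)].add(order)

counts = {pair: len(orders) for pair, orders in orders_by_pair.items()}
print(counts[(4, "Gnocchi di nonna Alice")])  # 2
```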
We might also like to answer the following question: who reports to whom in the employee hierarchy?
MATCH (e:Employee)<-[:REPORTS_TO]-(sub)
RETURN e.employeeID AS manager, sub.employeeID AS employee;
| manager | employee |
|---|---|
| 2 | 3 |
| 2 | 4 |
| 2 | 5 |
| 2 | 1 |
| 2 | 8 |
| 5 | 9 |
| 5 | 7 |
| 5 | 6 |
Notice that employee No. 5 has people reporting to them but also reports to employee No. 2.
Let’s investigate that a bit more:
MATCH path = (e:Employee)<-[:REPORTS_TO*]-(sub)
WITH e, sub, [person in NODES(path) | person.employeeID][1..-1] AS path
RETURN e.employeeID AS manager, path as middleManager, sub.employeeID AS employee
ORDER BY size(path);
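The [1..-1] slice in this query trims the first and last nodes from each path, leaving only the intermediate managers. Cypher's list slicing here behaves like Python's:

```python
# Cypher: [person IN NODES(path) | person.employeeID][1..-1]
# keeps only the middle of the reporting chain by dropping both endpoints.
chain = [2, 5, 9]        # manager -> middle manager -> employee
print(chain[1:-1])       # [5], analogous to Cypher's [1..-1]
print([2, 3][1:-1])      # []: a direct report has no middle managers
```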
| manager | middleManager | employee |
|---|---|---|
| 2 | [] | 3 |
| 2 | [] | 4 |
| 2 | [] | 5 |
| 2 | [] | 1 |
| 2 | [] | 8 |
| 5 | [] | 9 |
| 5 | [] | 7 |
| 5 | [] | 6 |
| 2 | [5] | 9 |
| 2 | [5] | 7 |
| 2 | [5] | 6 |
We can take this one step further and count the total orders each employee is responsible for, including the orders sold by everyone below them in the reporting chain:
MATCH (e:Employee)
OPTIONAL MATCH (e)<-[:REPORTS_TO*0..]-(sub)-[:SOLD]->(order)
RETURN e.employeeID as employee, [x IN COLLECT(DISTINCT sub.employeeID) WHERE x <> e.employeeID] AS reportsTo, COUNT(distinct order) AS totalOrders
ORDER BY totalOrders DESC;
| employee | reportsTo | totalOrders |
|---|---|---|
| 2 | [8,1,5,6,7,9,4,3] | 830 |
| 5 | [6,7,9] | 224 |
| 4 | [] | 156 |
| 3 | [] | 127 |
| 1 | [] | 123 |
| 8 | [] | 104 |
| 7 | [] | 72 |
| 6 | [] | 67 |
| 9 | [] | 43 |
What’s Next?
If you followed along with each step through this tutorial, then you might want to explore the data set with more queries and try to answer additional questions you came up with for the data. You may also want to apply these same principles to your own or another data set for analysis.
If you used this as a process flow to apply to a different data set, or you would like to do that next, feel free to start at the top and work through this guide again with another domain. The steps and processes still apply (though, of course, the data model, queries, and business questions will need to be adjusted).
If you have data that needs more cleansing and manipulation than what is covered in this guide, the APOC library may be able to help. It contains hundreds of procedures and functions for handling large amounts of data, translating values, cleaning messy data sources, and more!
If you are interested in doing a one-time initial dump of relational data to Neo4j, then the Neo4j ETL Tool might be what you are looking for. The application is designed with a point-and-click user interface with the goal of fast, simple relational-to-graph loads that help new and existing users gain faster value from seeing their data as a graph without Cypher, import procedures, or other code.
Resources
- Northwind SQL, CSV and Cypher data files, also as a zip file
- {cyphermanual}/clauses/load-csv/[LOAD CSV^]: Cypher's command for importing CSV files
- APOC library: Neo4j's utility library
- Neo4j ETL Tool: Loading relational data without code