Importing the Buzzfeed TrumpWorld Dataset into Neo4j
I came across this tweet by Sanchez Castro (@SCHZCAS) which I’m more than happy to support.
Hey @neo4j please do this once again!!! #panamapapers #neo4j
— sanchezcastro (@SCHZCAS) January 15, 2017
https://t.co/B8pjxNKCyA
As part of the Buzzfeed article Help Us Map TrumpWorld, the four investigative journalists, John Templon, Alex Campbell, Anthony Cormier, and Jeremy Singer-Vine asked the public to help them map and analyze the data that they investigated, confirmed and published:
Now we are asking the public to use our data to find connections we may have missed, and to give us context we don’t currently understand. We hope you will help us — and the public — learn more about TrumpWorld and how this unprecedented array of businesses might affect public policy.

Setup with Local Database
If you want to work with this data locally, please download and install Neo4j 3.1.
When using Neo4j Desktop, you can find configuration and directories on the Options
pane which you open with the button of the same name.
If you want to run this guide inside of your Neo4j-Browser, you can edit the database configuration from the Options pane and add the following setting:
browser.remote_content_hostname_whitelist=http://portal.graphgist.org
Then restart the database and you’re ready to roll, by running this command:
:play http://portal.graphgist.org/graph_gists/trumpworld-graph/graph_guide
Looking at the Data
The Buzzfeed article linked int he tweet points to a Google Spreadsheet with collected and verified/researched data of relationships of Trump organizations to other organizations, in total we have 770 organizations (as of today) with 611 relationships.
The data is fortunately available as a public google doc, so we can use our old trick of loading the CSV-Download URL with LOAD CSV
into Neo4j.
Let’s have a look at the data first, here as a sample:
WITH
'https://docs.google.com/spreadsheets/u/1/d/1Z5Vo5pbvxKJ5XpfALZXvCzW26Cl4we3OaN73K9Ae5Ss/export?format=csv&gid=634968401' AS url
LOAD CSV WITH HEADERS FROM url AS row
RETURN row.`Organization A`,row.`Organization B`,row.Connection, row.`Source(s)`
LIMIT 5
╒════════════════════════════════════╤═══════════════════════════════╤════════════════╤══════════════════════════════════════════════════════════════════════════════════════╕ │"row.`Organization A`" │"row.`Organization B`" │"row.Connection"│"row.`Source(s)`" │ ╞════════════════════════════════════╪═══════════════════════════════╪════════════════╪══════════════════════════════════════════════════════════════════════════════════════╡ │"4 SHADOW TREE LANE MEMBER CORP." │"4 SHADOW TREE LANE LLC" │"Ownership" │"https://www.documentcloud.org/documents/2838696-Trump-2016-Financial-Disclosure.html"│ ├────────────────────────────────────┼───────────────────────────────┼────────────────┼──────────────────────────────────────────────────────────────────────────────────────┤ │"40 WALL DEVELOPMENT ASSOCIATES LLC"│"40 WALL STREET LLC" │"Ownership" │"https://www.documentcloud.org/documents/2838696-Trump-2016-Financial-Disclosure.html"│ ├────────────────────────────────────┼───────────────────────────────┼────────────────┼──────────────────────────────────────────────────────────────────────────────────────┤ │"40 WALL STREET LLC" │"40 WALL STREET COMMERCIAL LLC"│"Ownership" │"https://www.documentcloud.org/documents/2838696-Trump-2016-Financial-Disclosure.html"│ ├────────────────────────────────────┼───────────────────────────────┼────────────────┼──────────────────────────────────────────────────────────────────────────────────────┤ │"40 WALL STREET MEMBER CORP." │"40 WALL STREET LLC" │"Ownership" │"https://www.documentcloud.org/documents/2838696-Trump-2016-Financial-Disclosure.html"│ ├────────────────────────────────────┼───────────────────────────────┼────────────────┼──────────────────────────────────────────────────────────────────────────────────────┤ │"401 MEZZ VENTURE LLC" │"401 NORTH WABASH VENTURE LLC" │"Ownership" │"https://www.documentcloud.org/documents/2838696-Trump-2016-Financial-Disclosure.html"│ └────────────────────────────────────┴───────────────────────────────┴────────────────┴──────────────────────────────────────────────────────────────────────────────────────┘
The most frequently mentioned organizations found by this query:
WITH
'https://docs.google.com/spreadsheets/u/1/d/1Z5Vo5pbvxKJ5XpfALZXvCzW26Cl4we3OaN73K9Ae5Ss/export?format=csv&gid=634968401' AS url
LOAD CSV FROM url AS row
WITH row
UNWIND row[0..1] AS org
RETURN org, count(*)
ORDER BY count(*) DESC LIMIT 10
We see the usual suspect on top.
╒═══════════════════════════════════════════╤══════════╕ │"org" │"count(*)"│ ╞═══════════════════════════════════════════╪══════════╡ │"THRIVE CAPITAL" │79 │ ├───────────────────────────────────────────┼──────────┤ │"DJT HOLDINGS LLC" │32 │ ├───────────────────────────────────────────┼──────────┤ │"TRUMP ORGANIZATION LLC" │14 │ ├───────────────────────────────────────────┼──────────┤ │"KUSHNER COMPANIES" │13 │ ├───────────────────────────────────────────┼──────────┤ │"TTTT VENTURE LLC" │7 │ ├───────────────────────────────────────────┼──────────┤ │"MNC GROUP" │5 │ ├───────────────────────────────────────────┼──────────┤ │"DEUTSCHE BANK" │5 │ ├───────────────────────────────────────────┼──────────┤ │"GOLDMAN SACHS" │4 │ ├───────────────────────────────────────────┼──────────┤ │"TRUMP INTERNATIONAL HOTELS MANAGEMENT LLC"│3 │ ├───────────────────────────────────────────┼──────────┤ │"THE DONALD J. TRUMP REVOCABLE TRUST" │3 │ └───────────────────────────────────────────┴──────────┘
What kind of relationships exist in the data:
WITH
'https://docs.google.com/spreadsheets/u/1/d/1Z5Vo5pbvxKJ5XpfALZXvCzW26Cl4we3OaN73K9Ae5Ss/export?format=csv&gid=634968401' AS url
LOAD CSV WITH HEADERS FROM url AS row
RETURN row.Connection AS type, count(*)
ORDER BY count(*) DESC
╒════════════════════════════════════════════════════════════════════════════════════════════════════╤══════════╕ │"type" │"count(*)"│ ╞════════════════════════════════════════════════════════════════════════════════════════════════════╪══════════╡ │"Ownership" │309 │ ├────────────────────────────────────────────────────────────────────────────────────────────────────┼──────────┤ │"Investor" │90 │ ├────────────────────────────────────────────────────────────────────────────────────────────────────┼──────────┤ │"Owns collateralized debt" │54 │ ├────────────────────────────────────────────────────────────────────────────────────────────────────┼──────────┤ │"Subsidiary" │41 │ ├────────────────────────────────────────────────────────────────────────────────────────────────────┼──────────┤ │"NKA/FKA" │28 │ ├────────────────────────────────────────────────────────────────────────────────────────────────────┼──────────┤ │"DBA" │16 │ .... │"Anbang would become one of the equity partners in the [666 Fifth Avenue] redevelopment if an agreem│1 │ │ent is finalized, per New York Times" │ │ ├────────────────────────────────────────────────────────────────────────────────────────────────────┼──────────┤ │"an investment firm involved in Trump Organization projects abroad, per New York Times" │1 │ ├────────────────────────────────────────────────────────────────────────────────────────────────────┼──────────┤ │"Kushner-affiliated corporate entity, per New York Magazine" │1 │ │"Lobbied for" │1 │ ├────────────────────────────────────────────────────────────────────────────────────────────────────┼──────────┤ │"Partners on Trump Hotel Rio de Janeiro" │1 │ └────────────────────────────────────────────────────────────────────────────────────────────────────┴──────────┘
While some of them like Ownership
, or Investor
are straightforward others (Anbang would become one of the equity partners in the [666 Fifth Avenue] redevelopment if an agreement is finalized, per New York Times
) are very specific, and probably not a good choice for relationship-type to query on.
So we have two options, one could be to use a generic relationship and put all the Connection
information into a property, or alternatively we do some cleanup/unification and have a richer set of relationships.
Simple, Direct Data Import
We look at the simpler variant here, to quickly get results. THe unification approach is demonstrated here, for those of you that want to have a bit more interesting graph model.
After creating the two needed constraints, we can directly import the data with Neo4j’s LOAD CSV
command.
CREATE CONSTRAINT ON (o:Organization) ASSERT o.name IS UNIQUE;
CREATE CONSTRAINT ON (p:Person) ASSERT p.name IS UNIQUE;
Connect organizations with other organizations from the 1st tab.
WITH 'https://docs.google.com/spreadsheets/u/1/d/1Z5Vo5pbvxKJ5XpfALZXvCzW26Cl4we3OaN73K9Ae5Ss/export?format=csv&gid=634968401' AS url
LOAD CSV WITH HEADERS FROM url AS row
MERGE (o1:Organization {name:row.`Organization A`})
MERGE (o2:Organization {name:row.`Organization B`})
CREATE (o1)-[con:CONNECTED_TO]->(o2)
SET con.connection=row.Connection, con.source=row.`Source(s)`
Connect people with organizations from the 2nd tab.
WITH 'https://docs.google.com/spreadsheets/u/1/d/1Z5Vo5pbvxKJ5XpfALZXvCzW26Cl4we3OaN73K9Ae5Ss/export?format=csv&gid=1368567920' AS url
LOAD CSV WITH HEADERS FROM url AS row
MERGE (p:Person {name:row.Person})
MERGE (o:Organization {name:row.Organization})
CREATE (p)-[con:INVOLVED_WITH]->(o)
SET con.connection=row.Connection, con.source=row.`Source(s)`
Connect people with other people from the 3rd tab.
WITH 'https://docs.google.com/spreadsheets/u/1/d/1Z5Vo5pbvxKJ5XpfALZXvCzW26Cl4we3OaN73K9Ae5Ss/export?format=csv&gid=905294723' AS url
LOAD CSV WITH HEADERS FROM url AS row
MERGE (p1:Person {name:row.`Person A`})
MERGE (p2:Person {name:row.`Person B`})
CREATE (p2)-[con:RELATED_TO]->(p1)
SET con.connection=row.Connection, con.source=row.`Source(s)`
Now we have all data of the Buzzfeed spreadsheet imported and can start asking some interesting questions.
This is what TrumpWorld looks like.

Data Enrichment
We can enrich this graph, eg. by labeling organizations:
MATCH (o:Organization)
WHERE o.name CONTAINS "BANK" SET o:Bank
We find 17 banks (there are certainly more).
MATCH (o:Organization)
WHERE o.name CONTAINS "HOTEL" SET o:Hotel
We find 34 hotels (there are certainly more).
MATCH (o:Organization)
WHERE any(term in ["TRUMP","DT","DJT"] WHERE o.name CONTAINS (term + " "))
SET o:Trump
We find 405 "Trump" organizations (there are certainly more).
There should be more labeling, but we leave that for later.
Banks in the Graph
Now we can start running some queries:
E.g. what relationships do banks have to which other organizations in our dataset:
MATCH (n:Bank)--(o) RETURN *
Which YIELDs this interesting graph:

Connections between Organizations
Looking at orgnaizations, Trump Tower Commercial LLC and 40 Wall Street LLC are some of Trump’s most prized possessions. Let’s have a look between and around them.
MATCH (o1:Organization {name:"TRUMP TOWER COMMERCIAL LLC"}) MATCH (o2:Organization {name:"40 WALL STREET LLC"}) MATCH path = (o1)-[*..3]-(o2) RETURN path

Kushner Network
We can now look for instance at the 2nd degree network of "Jared Kushner":
MATCH network = (:Person {name:"JARED KUSHNER"})-[*..2]-()
RETURN network

Most Connected People
MATCH (p:Person)-[r]-()
RETURN p.name, type(r) AS type, count(*) AS degree
ORDER BY degree DESC
LIMIT 10
╒══════════════════╤═══════════════╤════════╕ │"p.name" │"type" │"degree"│ ╞══════════════════╪═══════════════╪════════╡ │"DONALD J. TRUMP" │"INVOLVED_WITH"│550 │ ├──────────────────┼───────────────┼────────┤ │"WILBUR ROSS" │"INVOLVED_WITH"│127 │ ├──────────────────┼───────────────┼────────┤ │"DONALD J. TRUMP" │"RELATED_TO" │54 │ ├──────────────────┼───────────────┼────────┤ │"BETSY DEVOS" │"INVOLVED_WITH"│45 │ ├──────────────────┼───────────────┼────────┤ │"ELAINE CHAO" │"INVOLVED_WITH"│36 │ ├──────────────────┼───────────────┼────────┤ │"JOSHUA KUSHNER" │"INVOLVED_WITH"│23 │ ├──────────────────┼───────────────┼────────┤ │"REX TILLERSON" │"INVOLVED_WITH"│20 │ ├──────────────────┼───────────────┼────────┤ │"JARED KUSHNER" │"INVOLVED_WITH"│19 │ ├──────────────────┼───────────────┼────────┤ │"DONALD TRUMP JR."│"INVOLVED_WITH"│15 │ ├──────────────────┼───────────────┼────────┤ │"BEN CARSON" │"INVOLVED_WITH"│14 │ └──────────────────┴───────────────┴────────┘
Connection from Putin to Trump
As expected via Rex Tillerson, but interestingly the connection between Tillerson and Trump as Nominee for Secretary of State seems to be missing from the data.
MATCH (vp:Person {name:"VLADIMIR PUTIN"}),(dt:Person {name:"DONALD J. TRUMP"})
MATCH path = allShortestPaths( (vp)-[*]-(dt) )
RETURN path

So let’s check the data.
Nominees of Trumps Cabinet
And really Tillerson is missing, there are only 19 in the list.
MATCH (p:Person)-[con:RELATED_TO]->()
WHERE con.connection CONTAINS "Nominee"
RETURN p.name, con.connection
ORDER BY p.name ASC
╒═══════════════════╤══════════════════════════════════════════════════════════════════════════╕ │"p.name" │"con.connection" │ ╞═══════════════════╪══════════════════════════════════════════════════════════════════════════╡ │"ANDY PUZDER" │"Nominee for Secretary of Labor" │ ├───────────────────┼──────────────────────────────────────────────────────────────────────────┤ │"BEN CARSON" │"Nominee for Secretary of HUD" │ ├───────────────────┼──────────────────────────────────────────────────────────────────────────┤ │"BETSY DEVOS" │"Nominee for Secretary of Education" │ ├───────────────────┼──────────────────────────────────────────────────────────────────────────┤ │"DAN COATS" │"Nominee for Director of National Intelligence" │ ├───────────────────┼──────────────────────────────────────────────────────────────────────────┤ │"ELAINE CHAO" │"Nominee for Secretary of Transportation" │ ├───────────────────┼──────────────────────────────────────────────────────────────────────────┤ │"JAMES MATTIS" │"Nominee for Secretary of Defense" │ ├───────────────────┼──────────────────────────────────────────────────────────────────────────┤ │"JEFF SESSIONS" │"Nominee for Attorney General" │ ├───────────────────┼──────────────────────────────────────────────────────────────────────────┤ │"JOHN F. KELLY" │"Nominee for Secretary of Department of Homeland Security" │ ├───────────────────┼──────────────────────────────────────────────────────────────────────────┤ │"LINDA MCMAHON" │"Nominee for Administrator of the Small Business Administration" │ ├───────────────────┼──────────────────────────────────────────────────────────────────────────┤ │"MICHAEL POMPEO" │"Nominee for Director of CIA" │ ├───────────────────┼──────────────────────────────────────────────────────────────────────────┤ │"MICK MULVANEY" │"Nominee for Director of Office of Management and Budget" │ ├───────────────────┼──────────────────────────────────────────────────────────────────────────┤ │"NIKKI HALEY" │"Nominee for Ambassador to the United Nations" │ ├───────────────────┼──────────────────────────────────────────────────────────────────────────┤ │"RICK PERRY" │"Nominee for Secretary of Energy" │ ├───────────────────┼──────────────────────────────────────────────────────────────────────────┤ │"ROBERT LIGHTHIZER"│"Nominee for U.S. Trade Representative" │ ├───────────────────┼──────────────────────────────────────────────────────────────────────────┤ │"RYAN ZINKE" │"Nominee for Secretary of Interior" │ ├───────────────────┼──────────────────────────────────────────────────────────────────────────┤ │"SCOTT PRUITT" │"Nominee for Administrator of the Environmental Protection Administration"│ ├───────────────────┼──────────────────────────────────────────────────────────────────────────┤ │"STEVEN MNUCHIN" │"Nominee for Secretary of Treasury" │ ├───────────────────┼──────────────────────────────────────────────────────────────────────────┤ │"TOM PRICE" │"Nominee for Secretary of Health and Human Services" │ ├───────────────────┼──────────────────────────────────────────────────────────────────────────┤ │"WILBUR ROSS" │"Nominee for Secretary of Commerce" │ └───────────────────┴──────────────────────────────────────────────────────────────────────────┘
Our friends from Linkurious used part of my work for a blog post demonstrating how to visualize this data with their tool.