Twitter election graph


Powered by Neo4j

This demo guides the user through exploring the "Twitter Election Graph". The dataset consists of over 5 million tweets about the US election, posted by over 1.7 million Twitter users during July 2016. The data was collected from the Twitter streaming API and includes tweets from the election candidates as well as tweets using hashtags related to the 2016 US presidential election.

Data model


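Neo4j uses the property graph data model, which consists of nodes (the entities or objects in the graph) and relationships that connect them; both nodes and relationships can store properties as key-value pairs. The query below samples the graph and returns the data model of our Twitter election graph. It calls a procedure from the APOC library, so the APOC plugin must be installed: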

Sample the graph and return the data model
CALL apoc.meta.graphSample(2000)

To run this query (and each query that follows), click the statement to copy it into the query editor.
Then click the triangular run button or press Ctrl+Enter to execute it and see the results.

As you can see from the results of this query, the data model consists of:

Nodes

  • Tweet - a status update posted to Twitter

  • User - a Twitter user, uniquely identified by a screen_name property

  • Source - the application used by the user to post a tweet (Twitter website, iOS app, etc.)

  • Link - a URL embedded in a tweet

  • Hashtag - a word or phrase preceded by a hash (#) and used to identify messages on a specific topic

Relationships

  • (:User)-[:POSTS]->(:Tweet)

  • (:Tweet)-[:MENTIONS]->(:User)

  • (:Tweet)-[:TAGS]->(:Hashtag)

  • (:Tweet)-[:CONTAINS]->(:Link)

  • (:Tweet)-[:QUOTES|RETWEETS|REPLY_TO]->(:Tweet)
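To check this model against the data itself, a query along the following lines counts nodes per label and relationships per type. It is a sketch rather than part of the original guide: it scans the entire graph, so it may be slow on the full dataset, and labels(n)[0] assumes every node carries exactly one label.

```cypher
// Count nodes per label and relationships per type.
// Scans the whole graph, so this may take a while on the full dataset.
// labels(n)[0] assumes each node has exactly one label.
MATCH (n)
RETURN 'node' AS kind, labels(n)[0] AS name, count(*) AS total
UNION ALL
MATCH ()-[r]->()
RETURN 'relationship' AS kind, type(r) AS name, count(*) AS total
```

UNION ALL requires both parts to return the same column names, which is why both use kind, name and total.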

Cypher Introduction


Graph Patterns

Neo4j’s query language, Cypher, is centered around graph patterns, which represent entities (nodes) with parentheses, for example (:User), and connections (relationships) with arrows, for example -[:POSTS]->.

:User is the label of the node and :POSTS is the type of the relationship.

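Here is an example pattern: (u:User)-[:POSTS]->(:Tweet)-[:TAGS]->(ht:Hashtag). The names u and ht are variables that later clauses can use to refer to the matched nodes. The MATCH clause finds every occurrence of such a pattern in the graph.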
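A minimal sketch of that pattern inside a complete query: u and ht are bound to the matched nodes and can be returned, while the anonymous (:Tweet) node only constrains the match. The screen_name and name properties are the ones used by the other queries in this guide.

```cypher
// u and ht are variables bound to matched nodes; (:Tweet) is anonymous
// and only constrains the match.
MATCH (u:User)-[:POSTS]->(:Tweet)-[:TAGS]->(ht:Hashtag)
RETURN u.screen_name AS user, ht.name AS hashtag
LIMIT 5
```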

Other Clauses

The following clauses may follow a MATCH clause. They operate on the properties of the nodes and relationships that matched the pattern.

filter

WHERE u.screen_name CONTAINS 'realDonaldTrump'

aggregate

WITH ht.name AS hashtag, count(*) AS frequency

return

RETURN hashtag, frequency

order

ORDER BY frequency DESC

limit

LIMIT 20;

Most common hashtags used by Donald Trump
MATCH (u:User)-[:POSTS]->(:Tweet)-[:TAGS]->(ht:Hashtag)
WHERE u.screen_name CONTAINS "realDonaldTrump"
WITH ht.name AS hashtag, count(*) AS frequency
RETURN hashtag, frequency
ORDER BY frequency DESC LIMIT 20;
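CONTAINS performs a substring match, so the query above also counts any account whose screen name merely includes realDonaldTrump. If only the candidate's own account is wanted, an exact match on the property is stricter, and an equality lookup like this can typically use an index on :User(screen_name) if one exists (the guide does not say whether it does). A sketch of that variant, which also folds the WITH step into RETURN; this is equivalent here because nothing else happens between them:

```cypher
// Exact match on the account instead of a substring match
MATCH (u:User {screen_name: 'realDonaldTrump'})-[:POSTS]->(:Tweet)-[:TAGS]->(ht:Hashtag)
RETURN ht.name AS hashtag, count(*) AS frequency
ORDER BY frequency DESC
LIMIT 20;
```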

Query for graph results


The Neo4j Browser is a workbench for executing queries against Neo4j and visualizing the results. When a query returns nodes, relationships, or paths, the Browser can display them as an interactive graph. Let’s examine some results in graph form:

// Show a few users, their tweets, and the nodes those tweets point to
MATCH p=(:User)-[:POSTS]->(t:Tweet)-->()
RETURN p LIMIT 10
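The pattern (t:Tweet)-->() follows any outgoing relationship from a tweet, so the paths may end at mentioned users, links, or other tweets as well as hashtags. To restrict the visualization to hashtags only, the last hop can name the relationship type and label, as in this sketch:

```cypher
// Only follow TAGS relationships, so every path ends at a Hashtag
MATCH p=(:User)-[:POSTS]->(:Tweet)-[:TAGS]->(:Hashtag)
RETURN p LIMIT 10
```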

Query for tabular results


We can also return results in tabular format. Let’s look at the most common hashtags used by verified Twitter users (in this dataset, likely to be politicians and official accounts):

// Most common hashtags for verified Twitter users
MATCH (u:User)-[:POSTS]->(:Tweet)-[:TAGS]->(h:Hashtag) WHERE u.verified = true
RETURN h.name, count(*) AS num ORDER BY num DESC

Note that this query traverses millions of paths and aggregates over the results of that traversal to find the most common hashtags.
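To see what such a traversal costs, a query can be prefixed with PROFILE, which runs it and shows the execution plan annotated with the rows and database hits of each operator (EXPLAIN shows the plan without running the query). A sketch applied to the query above, with a LIMIT added only to keep the result table short:

```cypher
// PROFILE runs the query and reports rows and db hits per operator.
// The LIMIT only shortens the output: all paths are still aggregated
// before the results can be sorted.
PROFILE
MATCH (u:User)-[:POSTS]->(:Tweet)-[:TAGS]->(h:Hashtag)
WHERE u.verified = true
RETURN h.name, count(*) AS num
ORDER BY num DESC
LIMIT 20
```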

Top mentions of a user


Who is mentioning Donald Trump the most in this dataset?

MATCH
  (u:User)-[:POSTS]->(t:Tweet)-[:MENTIONS]->(m:User {screen_name:'realDonaldTrump'})
WHERE
  u.screen_name <> 'realDonaldTrump'
RETURN
  u.screen_name AS screen_name, COUNT(u.screen_name) AS count
ORDER BY
  count
DESC LIMIT 10
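Reversing the direction of the search answers the opposite question: which users does realDonaldTrump mention most often? A sketch in the same style, using only labels, relationship types and properties that appear in this guide:

```cypher
// Opposite direction: whom does realDonaldTrump mention most?
MATCH
  (:User {screen_name:'realDonaldTrump'})-[:POSTS]->(:Tweet)-[:MENTIONS]->(m:User)
WHERE
  m.screen_name <> 'realDonaldTrump'
RETURN
  m.screen_name AS screen_name, count(*) AS count
ORDER BY
  count
DESC LIMIT 10
```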

Top mentions of a user - verified only


Let’s filter the mentions to only include those from verified users.

MATCH
  (u:User)-[:POSTS]->(t:Tweet)-[:MENTIONS]->(m:User {screen_name:'realDonaldTrump'})
WHERE
  u.screen_name <> 'realDonaldTrump' AND u.verified = true
RETURN
  u.screen_name AS screen_name, COUNT(u.screen_name) AS count
ORDER BY
  count
DESC LIMIT 10
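To put the verified-only numbers in context, the same mentions can be grouped by the verified property itself. A sketch; the guide does not say whether every user has this property, so any users without it would show up in a separate null row:

```cypher
// Split mentions of realDonaldTrump by whether the mentioning user is verified.
// Users without a verified property would be grouped under null.
MATCH
  (u:User)-[:POSTS]->(:Tweet)-[:MENTIONS]->(:User {screen_name:'realDonaldTrump'})
WHERE
  u.screen_name <> 'realDonaldTrump'
RETURN
  u.verified AS verified, count(*) AS mentions
```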

Text search


Cypher’s string operators, such as CONTAINS, let us add simple text search to our queries. Let’s examine some tweets tagged with hashtags about Brexit, the United Kingdom’s June 2016 vote to leave the European Union:

MATCH (h:Hashtag) WHERE h.name CONTAINS "brexit"
MATCH (h)<-[r:TAGS]-(t)-[a]-(o)
RETURN * LIMIT 25
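CONTAINS is case-sensitive, so "brexit" only matches hashtag names stored in lower case. The guide does not say whether hashtag names are normalized; if they are not, lower-casing the property first also catches variants such as "Brexit". A sketch that additionally counts the tweets for each matching hashtag. Wrapping the property in a function generally prevents Neo4j from using an index for that predicate.

```cypher
// Case-insensitive match: lower-case the name before testing it.
MATCH (h:Hashtag) WHERE toLower(h.name) CONTAINS 'brexit'
MATCH (h)<-[:TAGS]-(t:Tweet)
RETURN h.name AS hashtag, count(t) AS tweets
ORDER BY tweets DESC
LIMIT 25
```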

Links from interesting retweets


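Many tweets contain URLs linking to articles and other media. By following relationships from a user’s tweets to the tweets they retweeted, and from there to the links those tweets contain, we can find potentially interesting content. Here we look for links in popular tweets retweeted by Bernie Sanders, where popularity is measured by the favorites count of the retweeted tweet.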

MATCH (:User {screen_name: 'BernieSanders'})-[:POSTS]->
  (t:Tweet)-[:RETWEETS]->(rt:Tweet)-[:CONTAINS]->(link:Link)
OPTIONAL MATCH (rt)-[:TAGS]->(ht:Hashtag)
RETURN t.text AS tweet, coalesce(link.expanded_url, link.url) AS url, collect(ht.name) AS hashtags, rt.favorites AS favorites
ORDER BY favorites DESC LIMIT 10
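The same traversal can be aggregated by website rather than by tweet. In this sketch, split(url, '/')[2] takes the host part of a URL shaped like https://host/path; URLs in any other shape would yield a wrong or null host. It assumes that RETWEETS points from the retweet to the original tweet, consistent with the (tweet)<-[:RETWEETS]-(retweet) pattern used later in this guide.

```cypher
// Count links by host for tweets retweeted by BernieSanders.
// Assumes RETWEETS points from the retweet to the original tweet.
// split(url, '/')[2] is the host of a URL shaped like "https://host/path".
MATCH (:User {screen_name: 'BernieSanders'})-[:POSTS]->
  (:Tweet)-[:RETWEETS]->(:Tweet)-[:CONTAINS]->(link:Link)
WITH split(coalesce(link.expanded_url, link.url), '/')[2] AS host
RETURN host, count(*) AS links
ORDER BY links DESC
LIMIT 10
```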

Users tweeting with common tags


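Which users post tweets sharing the most hashtags with President Barack Obama? The query below counts, for each user, the distinct hashtags they have in common with Obama, ignoring tweets that simply retweet his own tweets.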

MATCH (me:User {screen_name:'BarackObama'})-[:POSTS]->(tweet:Tweet)-[:TAGS]->(ht)
OPTIONAL MATCH (tweet)<-[:RETWEETS]-(retweet)
WITH me,ht, collect(distinct retweet) as retweets
MATCH (ht)<-[:TAGS]-(tweet2:Tweet)<-[:POSTS]-(sugg:User)
WHERE sugg <> me and NOT(tweet2 IN retweets)
WITH sugg, count(distinct(ht)) as common
RETURN sugg.screen_name as friend, common
ORDER BY common DESC
LIMIT 20
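A variation of this query can also list which hashtags are shared, making each suggestion easier to interpret. This sketch drops the retweet filter for brevity, so some users may get higher counts than in the query above. Collecting Obama's distinct hashtags first and then unwinding them means each hashtag is matched once, instead of once per Obama tweet that used it.

```cypher
// Suggest users by shared hashtags and list the shared hashtags.
// Unlike the query above, retweets of Obama's tweets are NOT excluded here.
MATCH (me:User {screen_name: 'BarackObama'})-[:POSTS]->(:Tweet)-[:TAGS]->(ht:Hashtag)
WITH me, collect(DISTINCT ht) AS tags
UNWIND tags AS tag
MATCH (tag)<-[:TAGS]-(:Tweet)<-[:POSTS]-(sugg:User)
WHERE sugg <> me
WITH sugg, collect(DISTINCT tag.name) AS shared
RETURN sugg.screen_name AS friend, size(shared) AS common, shared
ORDER BY common DESC
LIMIT 20
```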
