Congressional Graph Algorithms

Network Analysis With Neo4j Graph Algorithms


[Figure: lg data model]

  • Centralities
  • Clustering
  • Triadic closures and inferred relationships
  • Global graph algorithms with APOC

Bill Cosponsorships


Let’s start by using bill cosponsorships for our analysis. A cosponsorship occurs any time two legislators sponsor the same bill.

Cosponsors for a single Bill
MATCH (b:Bill) WITH b LIMIT 1
MATCH (l:Legislator)<-[:SPONSORED_BY]-(b)
RETURN *

Bill cosponsorship is a specific example of the concept of inferred relationships or triadic closures. In this case we are inferring a relationship between two legislators who sponsor the same bill.

[Figure: bill cosponsors]

Persisting Inferred Relationships

Persist all COSPONSORED relationships
// Persist inferred COSPONSORED relationships
MATCH (l1:Legislator)<-[:SPONSORED_BY]-(b:Bill)-[:SPONSORED_BY]->(l2:Legislator)
WHERE id(l1) < id(l2)
WITH l1, l2, COUNT(*) AS weight
// MERGE instead of CREATE so that re-running the query does not duplicate relationships
MERGE (l1)-[r:COSPONSORED]->(l2)
SET r.weight = weight
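
To make the weight calculation concrete, here is a minimal Python sketch of the same idea outside the database. The bill identifiers and legislator names are hypothetical toy data, not values from the graph; the point is only to show how shared bills turn into a weighted pair.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: bill id -> legislators who sponsored it.
bill_sponsors = {
    "bill-1": ["Adams", "Baker", "Clark"],
    "bill-2": ["Adams", "Baker"],
    "bill-3": ["Baker", "Clark"],
}


def cosponsorship_weights(bill_sponsors):
    """Count, for every unordered pair of legislators, the bills they share."""
    weights = Counter()
    for sponsors in bill_sponsors.values():
        # sorted() + combinations() yields each pair once in a fixed order,
        # which plays the role of the id(l1) < id(l2) filter in the query.
        for l1, l2 in combinations(sorted(set(sponsors)), 2):
            weights[(l1, l2)] += 1
    return weights


weights = cosponsorship_weights(bill_sponsors)
for (l1, l2), weight in sorted(weights.items()):
    print(f"({l1})-[:COSPONSORED {{weight: {weight}}}]->({l2})")
```

Each pair appears once, so the inferred relationship is effectively undirected even though it is stored with a direction.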

Centrality measures


In graph theory and network analysis, indicators of centrality identify the most important vertices within a graph. Applications include identifying the most influential person(s) in a social network, key infrastructure nodes in the Internet or urban networks, and super-spreaders of disease.

Degree Centrality

Degree centrality is the number of relationships connected to a specific node. In the context of this network, it is the number of COSPONSORED relationships.

MATCH (l:Legislator)
// COSPONSORED was stored with an arbitrary direction, so count relationships in both directions
RETURN l.firstName + " " + l.lastName AS legislator,
       size((l)-[:COSPONSORED]-()) AS degree
ORDER BY degree DESC LIMIT 25

Weighted Degree Centrality

MATCH (l:Legislator)-[r:COSPONSORED]-()
RETURN l.firstName + " " + l.lastName AS legislator, sum(r.weight) AS weightedDegree ORDER BY weightedDegree DESC LIMIT 25
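
The difference between the two degree measures can be sketched in a few lines of Python. The edge list below is hypothetical toy data; the function counts relationships for plain degree and sums the weight property for weighted degree, ignoring direction in both cases.

```python
from collections import defaultdict

# Hypothetical COSPONSORED relationships: (legislator, legislator, weight).
cosponsored = [
    ("Adams", "Baker", 2),
    ("Adams", "Clark", 1),
    ("Baker", "Clark", 2),
    ("Clark", "Davis", 5),
]


def degree_centralities(edges):
    """Return (degree, weighted degree) per node, ignoring edge direction."""
    degree = defaultdict(int)
    weighted = defaultdict(int)
    for a, b, weight in edges:
        for node in (a, b):
            degree[node] += 1          # one more relationship touches this node
            weighted[node] += weight   # add that relationship's weight
    return dict(degree), dict(weighted)


degree, weighted_degree = degree_centralities(cosponsored)
for name in sorted(degree, key=weighted_degree.get, reverse=True):
    print(name, degree[name], weighted_degree[name])
```

In this toy data Davis has the lowest degree but the second-highest weighted degree, which is the kind of difference the weighted query surfaces.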

APOC Procedures


[Figure: APOC procedures]

User Defined Procedures

User-defined procedures are written in Java, deployed to the database, and callable from Cypher.

APOC library

A library of procedures for many common Neo4j tasks, including graph algorithms, data import, refactoring, indexing, system monitoring, …

Betweenness Centrality


The betweenness centrality of a node is the number of shortest paths between pairs of other nodes in the network that pass through that node.

Betweenness centrality is an important metric because it can be used to identify “brokers of information” in the network, or nodes that connect disparate clusters.

[Figure: betweenness centrality. The red nodes have a high betweenness centrality and are connectors of clusters.]

MATCH (l:Legislator)
WITH collect(l) AS legislators
CALL apoc.algo.betweenness(['COSPONSORED'], legislators, 'OUTGOING') YIELD node, score
SET node.betweenness = score
RETURN node.firstName + " " + node.lastName AS legislator, score ORDER BY score DESC LIMIT 25
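
For intuition about what the procedure measures, the following is a minimal Python sketch of Brandes' algorithm on an unweighted, undirected toy graph. It is not the APOC implementation, and the node names are hypothetical. When several shortest paths tie, each one contributes a fraction, which is the usual refinement of the "number of shortest paths" definition.

```python
from collections import deque

# Hypothetical toy graph: two triangles joined through the bridge node D.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"),
         ("D", "E"), ("E", "F"), ("E", "G"), ("F", "G")]
adjacency = {}
for a, b in edges:
    adjacency.setdefault(a, []).append(b)
    adjacency.setdefault(b, []).append(a)


def betweenness(adjacency):
    """Brandes' algorithm for an unweighted, undirected graph."""
    score = {v: 0.0 for v in adjacency}
    for source in adjacency:
        # Breadth-first search that counts shortest paths from source.
        order = []
        predecessors = {v: [] for v in adjacency}
        paths = {v: 0 for v in adjacency}
        paths[source] = 1
        distance = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if w not in distance:
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:
                    paths[w] += paths[v]
                    predecessors[w].append(v)
        # Walk back from the farthest nodes, accumulating dependencies.
        dependency = {v: 0.0 for v in adjacency}
        for w in reversed(order):
            for v in predecessors[w]:
                dependency[v] += paths[v] / paths[w] * (1 + dependency[w])
            if w != source:
                score[w] += dependency[w]
    # Every unordered pair was visited from both endpoints, so halve the totals.
    return {v: s / 2 for v, s in score.items()}


scores = betweenness(adjacency)
for node in sorted(scores, key=scores.get, reverse=True):
    print(node, scores[node])
```

The bridge node D scores highest because every shortest path between the two triangles runs through it, which matches the "connector of clusters" reading above.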

Closeness Centrality


Closeness centrality is the inverse of the average distance from a node to all other nodes in the network.

Nodes with high closeness centrality are often highly connected within clusters in the graph, but not necessarily highly connected outside of the cluster.

[Figure: closeness centrality. Nodes with high closeness centrality are connected to many other nodes in a network.]

MATCH (l:Legislator)
WITH collect(l) AS legislators
CALL apoc.algo.closeness(['COSPONSORED'], legislators, 'OUTGOING') YIELD node, score
RETURN node.firstName + " " + node.lastName AS legislator, score ORDER BY score DESC LIMIT 25
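
The "inverse of the average distance" definition can be illustrated with a short Python sketch using breadth-first search. The path graph is hypothetical toy data, and the normalisation used here (reachable nodes divided by total distance) is one common convention; the APOC procedure may scale its scores differently.

```python
from collections import deque

# Hypothetical toy graph: a simple path A - B - C - D - E.
adjacency = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C", "E"],
    "E": ["D"],
}


def closeness(adjacency):
    """Closeness as (reachable nodes) / (sum of shortest-path distances)."""
    scores = {}
    for source in adjacency:
        # Breadth-first search gives hop distances in an unweighted graph.
        distance = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for w in adjacency[v]:
                if w not in distance:
                    distance[w] = distance[v] + 1
                    queue.append(w)
        total = sum(distance.values())
        reached = len(distance) - 1  # exclude the source itself
        scores[source] = reached / total if total else 0.0
    return scores


scores = closeness(adjacency)
for node in sorted(scores, key=scores.get, reverse=True):
    print(node, round(scores[node], 3))
```

The middle node C has the smallest average distance to everyone else and therefore the highest score, while the endpoints A and E score lowest.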

PageRank


[Figure: PageRank. The size of each node is proportional to the size and number of nodes with an outgoing relationship to it.]

MATCH (l:Legislator) WITH collect(l) AS ls
CALL apoc.algo.pageRank(ls) YIELD node, score
RETURN node.firstName + " " + node.lastName AS legislator, score ORDER BY score DESC LIMIT 10
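
As a rough illustration of what PageRank computes, here is a minimal power-iteration sketch in Python. The graph, the damping factor of 0.85, and the iteration count are assumptions chosen for the example and are not taken from the APOC implementation.

```python
# Hypothetical toy graph: node -> nodes it has an outgoing relationship to.
out_links = {
    "Adams": ["Clark"],
    "Baker": ["Clark"],
    "Clark": ["Adams", "Baker"],
    "Davis": ["Clark"],
}


def pagerank(out_links, damping=0.85, iterations=20):
    """Plain power iteration; every link target must also be a key."""
    nodes = list(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Every node starts each round with the "random jump" share.
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        for v, targets in out_links.items():
            if targets:
                # A node splits its damped rank evenly among its targets.
                share = damping * rank[v] / len(targets)
                for w in targets:
                    new_rank[w] += share
            else:
                # A node without outgoing links spreads its rank over all nodes.
                share = damping * rank[v] / n
                for w in nodes:
                    new_rank[w] += share
        rank = new_rank
    return rank


ranks = pagerank(out_links)
for node in sorted(ranks, key=ranks.get, reverse=True):
    print(node, round(ranks[node], 3))
```

Clark ranks highest because three nodes point to it, and Adams and Baker inherit part of that score because Clark points back to them.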

PageRank - Inferred Relationships


Inferred Relationships and Political Influence

The main sponsor of a bill can be said to have demonstrated political influence over cosponsors. We can find these inferred INFLUENCED relationships in the graph with this query:

Find inferred INFLUENCED relationships
MATCH (b:Bill)-[r:SPONSORED_BY]->(sponsor:Legislator)
WHERE r.cosponsor = False
MATCH (b)-[s:SPONSORED_BY]->(cosponsor:Legislator)
WHERE s.cosponsor = True
RETURN id(sponsor) AS source, id(cosponsor) AS target, count(*) AS weight ORDER BY weight DESC LIMIT 25
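
The shape of the result (source, target, weight) can be sketched in Python as follows. The bills and names are hypothetical toy data; each bill is reduced to its main sponsor and a list of cosponsors, mirroring the cosponsor flag on SPONSORED_BY.

```python
from collections import Counter

# Hypothetical toy data: bill id -> (main sponsor, cosponsors).
bills = {
    "bill-1": ("Adams", ["Baker", "Clark"]),
    "bill-2": ("Adams", ["Baker"]),
    "bill-3": ("Clark", ["Adams"]),
}


def influence_edges(bills):
    """Return ((sponsor, cosponsor), weight) pairs, heaviest first."""
    weights = Counter()
    for sponsor, cosponsors in bills.values():
        for cosponsor in cosponsors:
            # One directed INFLUENCED edge per bill the cosponsor signed onto.
            weights[(sponsor, cosponsor)] += 1
    return weights.most_common()


edges = influence_edges(bills)
for (source, target), weight in edges:
    print(source, "->", target, weight)
```

Unlike COSPONSORED, these edges are directed: they run from the main sponsor to each cosponsor, so the direction carries meaning when they are fed into PageRank.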

PageRank On Inferred Relationships

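We can run PageRank on inferred relationships without persisting those relationships to the graph. Because the call below sets write:true, the resulting scores are written back to the Legislator nodes; the final query in this section reads them from the pagerank property.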

Run PageRank on inferred INFLUENCED relationships (that are not persisted in the graph)
CALL apoc.algo.pageRankWithCypher({
  iterations: 20,
  write: true,
  node_cypher: 'MATCH (l:Legislator) RETURN id(l) AS id',
  rel_cypher: 'MATCH (b:Bill)-[r:SPONSORED_BY]->(sponsor:Legislator)
               WHERE r.cosponsor = False
               MATCH (b)-[s:SPONSORED_BY]->(cosponsor:Legislator)
               WHERE s.cosponsor = True
               RETURN id(sponsor) AS source, id(cosponsor) AS target, count(*) AS weight ORDER BY weight DESC'
})

Most influential Senators on a given topic


MATCH (b:Body {type: "Senate"})<-[:ELECTED_TO]-(l:Legislator)<-[:SPONSORED_BY]-(:Bill)-[d:DEALS_WITH]->(s:Subject)
WHERE s.title CONTAINS "Technology"
RETURN l, COUNT(*) AS num ORDER BY l.pagerank DESC LIMIT 10

MATCH (l:Rep)