Louvain Modularity

Louvain Modularity (Preparations)


The database you start with must contain all of the data you loaded in the setup for this course.

This is what you will see when you click the database icon database icon.

DataLoaded

If you do not see this in your Neo4j Browser, you will need to perform the setup and Load Data steps.

Louvain Modularity (Graph Catalog)


The projected graphs roads and dach-region must be created and stored in the GDS graph catalog.

This is what you will see when you execute the following query:

CALL gds.graph.list()
YIELD graphName, nodeCount, relationshipCount
LoadedDatabase

 

If you do not see this in your Neo4j Browser, you will need to perform the Graph Catalog steps.

Louvain Modularity (Overview)


The Louvain Modularity algorithm finds clusters by comparing community density as it assigns nodes to different groups. You can think of this as a “what if” analysis to try out various grouping with the goal of eventually reaching a global optimum.

It also reveals a hierarchy of communities at different scales, which is useful for understanding the structure of a network at different levels of granularity.

In this exercise, you will perform the following with the European Roads dataset:

  • Part 1: Run the Louvain Modularity algorithm to add a community property to each node.

  • Part 2: Inspect the results of the algorithm.

  • Part 3: Run the Louvain Modularity algorithm to discover the hierarchical structure of communities.

  • Part 4: Inspect the hierarchical results of the algorithm.

  • Part 5: View the communities on different hierarchical levels.

  • Part 6: Examine the communities at each level for a single place.

  • Part 7: Inspect hierarchical structure of a single community on the final level.

Go to the next page to start this exercise.

Part 1: Run the Louvain Modularity algorithm to add a community property to each node. (Instructions)


You will start by executing the weighted Louvain Modularity algorithm and storing the results back to Neo4j. Similarly to LPA, Louvain Modularity algorithm also deems that a higher relationship weight value represents a stronger connection, so you will use the inverse_distance relationship property.

Write Cypher code to perform the weighted Louvain Modularity algorithm on the European road network using these guidelines:

  • The algorithm must use the projected graph roads, which is stored in the graph catalog.

  • The algorithm will write a property named community_louvain to each node with the computed value.

  • The relationship weight property name is inverse_distance.

  • YIELD the following results: modularity, ranLevels, communityCount.

Hint: You will call gds.louvain.write.

Part 1: Run the Louvain Modularity algorithm to add a community property to each node. (Solution)


Write Cypher code to perform the weighted Louvain Modularity algorithm on the European road network using these guidelines:

  • The algorithm must use the projected graph roads, which is stored in the graph catalog.

  • The algorithm will write a property named community_louvain to each node with the computed value.

  • The relationship weight property name is inverse_distance.

  • YIELD the following results: modularity, ranLevels, communityCount, communityDistribution.

Hint: You will call gds.louvain.write.

Here is the solution code:

CALL gds.louvain.write('roads',{
    writeProperty:'community_louvain',
    relationshipWeightProperty: 'inverse_distance'})
YIELD modularity, ranLevels, communityCount, communityDistribution

The results returned will look like this:

EX5.1

 

The algorithm found three hierarchical levels of communities with a total of 24 groups on the last level. The weighted variant of the Louvain Modularity algorithm found fewer communities than both the weighted and unweighted variants of LPA. The size of communities seems evenly distributed between 2 and 75, with an average value of 37.25. Remember, those communities with a size of 2 are probably the disconnected components we discovered with the Weakly Connected Components algorithm. Note how the Louvain Modularity community size distribution is more evenly distributed than the unweighted LPA results. In the unweighted LPA results, the largest community contains 219 members, and the second-largest group has only 63 members. However, this is not a golden rule that will always hold, so it is recommended to try out both the LPA and Louvain Modularity algorithm to see what works best for your use case.

The communityDistribution value consists of a map containing min, max, mean as well as p50, p75, p90, p95, p99 and p999 percentile values of community size for the last level. For example, the p99 value of 75 represents that 99% of communities have the community size of 75 or less and 1% of communities have the community size greater than or equal to 75.

Part 2: Inspect the results of the algorithm. (Instructions)


Next, you will retrieve the members of the largest 10 communities uncovered by the Louvain Modularity algorithm.

Write a query to return all community_louvain values in the graph containing Place nodes. For each distinct community value, return the list of places and the community size.

  • Order the results by community size descending.

  • Limit it to the top ten results.

Part 2: Inspect the results of the algorithm. (Solution)


Write a query to return all community_louvain values in the graph containing Place nodes. For each distinct community value, return the list of places and the community size.

  • Order the results by community size descending.

  • Limit it to the top ten results.

Here is the solution code:

MATCH (place:Place)
RETURN place.community_louvain AS communityId,
       count(*) AS communitySize,
       collect(place.name) AS places
ORDER BY communitySize DESC
LIMIT 10

The results returned will look like this:

EXLM.2

 

The largest community has 75 members. It contains places from Belgium, Netherlands, France, and Germany.

Part 3: Run the Louvain Modularity algorithm to discover a hierarchy of communities within the graph. (Instructions)


When you ran the Louvain Modularity algorithm, it reported that it found three hierarchical levels of communities. By default, the intermediate communities are not stored or returned in the results. With the includeIntermediateCommunities parameter, you can specify that the results include the intermediate communities. To investigate the communities at each hierarchical level, you will first write back the intermediate communities results to Neo4j.

Write Cypher code to perform the Louvain Modularity algorithm on the European road network using these guidelines:

  • The algorithm must use the projected graph roads, which is stored in the graph catalog.

  • Specify that intermediate communities are to be returned in the results.

  • The algorithm will write a property named communities_louvain to each node with the computed value for intermediate communities.

  • The relationship weight property name is inverse_distance.

  • YIELD the following results: modularity, ranLevels, communityCount, communityDistribution.

Hint: You will call gds.louvain.write.

Part 3: Run the Louvain Modularity algorithm to discover the hierarchal structure of communities. (Solution)


Write Cypher code to perform the Louvain Modularity algorithm on the European road network using these guidelines:

  • The algorithm must use the projected graph roads, which is stored in the graph catalog.

  • Specify that intermediate communities are to be returned in the results.

  • The algorithm will write a property named communities_louvain to each node with the computed value for intermediate communities.

  • The relationship weight property name is inverse_distance.

  • YIELD the following results: modularity, ranLevels, communityCount, communityDistribution.

Hint: You will call gds.louvain.write.

Here is the solution code:

CALL gds.louvain.write('roads',{
    writeProperty:'communities_louvain',
    relationshipWeightProperty: 'inverse_distance',
    includeIntermediateCommunities: true})
YIELD modularity, ranLevels, communityCount

The results returned will look like this:

EXLM.1

The community distribution statistics in the results are identical to the one we got in Part 1, as the community distribution is only calculated for the last hierarchical level.

Part 4: Inspect the hierarchical results of the algorithm. (Instructions)


You will now inspect the intermediate communities and their structure. You will begin by inspecting the Place nodes with a shared community throughout all the levels.

Write a query to return all communities_louvain values in the graph containing Place nodes. For each distinct community value, return the list of places and the community size.

Part 4: Inspect the hierarchical results of the algorithm. (Solution)


Write a query to return all communities_louvain values in the graph containing Place nodes. For each distinct community value, return the list of places and the community size.

Here is the solution code:

MATCH (place:Place)
RETURN place.communities_louvain AS communities,
       count(*) AS communitiesSize,
       collect(place.name) AS places
ORDER BY communitiesSize DESC
LIMIT 10

The results returned will look like this:

EX5.4

 

Antwerpen, Gent, Bruxelles, and six other places share the same community through all the hierarchical levels. On the first level, they have been assigned to the community with an id 145, but later switched to community 192 on level two and final third level. As we group the results by all the hierarchical levels, the communities will only include members that share the same community on all three levels. For example, Paris and Dijon share the community 198 on the third level, but they are not grouped together in the results because they do not share the same communities on the first and second levels.

Part 5: View the communities on different hierarchical levels. (Instructions/Solution)


You can then query the graph to find which communities form at each hierarchical level. You will inspect the communities of the final level to verify they are identical to the Part 2 results.

Execute this code:

MATCH (place:Place)
RETURN place.communities_louvain[-1] AS community,
       count(*) as communitiesSize,
       collect(place.name) AS places
ORDER BY communitiesSize DESC
LIMIT 10

The results returned will look like this:

EXLM.2

 

The results are identical to before when we ran the Louvain Modularity algorithm in Part 2.

 

If you want to inspect the community structure of the first hierarchical level, you can execute this code:

MATCH (place:Place)
RETURN place.communities_louvain[0] AS community,
       count(*) as communitiesSize,
       collect(place.name) AS places
ORDER BY communitiesSize DESC
LIMIT 10
EXLM.5

 

Communities on the first hierarchical level are the most fine-grained (smallest) by definition. The largest community on the first level contains only 9 members. Try to inspect the community structure of the second level, you only need to change the array index.

Part 6: Examine the communities at each level for a single place. (Instructions/Solution)


It may be easier to see how the algorithm progresses through levels if we look at all the intermediate communities for a single place.

You can examine the communities on each hiearchical level for London by running the following query:

UNWIND range(0,2) as level
MATCH (home:Place {name: "London"})
MATCH (place:Place) WHERE place.communities_louvain[level] = home.communities_louvain[level]
RETURN level,
       place.communities_louvain[level] AS community,
       count(*) as communitiesSize,
       collect(place.name) AS places

The results returned will look like this:

EXLM.6

 

The first level, the community is relatively small and contains only London, Colchester, and Harwich. On the next level, it already contains 22 members and grows to 50 members on the third and final level.

Try looking up the hierarchical community’s progression for another place, e.g. Berlin, Paris, Amsterdam.

Part 7: Inspect hierarchical structure of a single community on the final level (Instructions/Solution)


To better understand why the Louvain Modularity algorithm is called a hierarchical community detection algorithm, you will examine a single community’s hierarchical structure. You will inspect which communities of the first and second levels are merged into a single community on the final level.

You can examine the hiearchical structure of a single community by running the following query:

MATCH (p:Place)
WHERE p.communities_louvain[-1] = 154
RETURN collect(distinct p.communities_louvain[0]) as first_level_communities,
       collect(distinct p.communities_louvain[1]) as second_level_communities,
       collect(distinct p.communities_louvain[-1]) as third_level_communities

The results returned will look like this:

EXLM.6

You can observe that a single community of the final level consists of three communities found on the second level. Those three communities are broken down into 15 different communities on the first level. This is how the concept of community hierarchy in Louvain Modularity algorithm is derived.

Louvain Modularity: Taking it further


  1. Try using the stream version of the algorithm.

  2. Try different configuration values.

Louvain Modularity (Summary)


In this exercise, you gained some experience writing Cypher to implement the Louvain Modularity algorithm using the European Roads dataset. The Louvain Modularity algorithm finds clusters by comparing community density as it assigns nodes to different groups.