Label Propagation

Label Propagation (Preparations)


The database you start with must contain all of the data you loaded in the setup for this course.

This is what you will see when you click the database icon database icon.

DataLoaded

If you do not see this in your Neo4j Browser, you will need to perform the setup and Load Data steps.

Label Propagation (Graph Catalog)


The projected graphs roads and dach-region must be created and stored in the GDS graph catalog.

This is what you will see when you execute the following query:

CALL gds.graph.list()
YIELD graphName, nodeCount, relationshipCount
LoadedDatabase

 

If you do not see this in your Neo4j Browser, you will need to perform the Graph Catalog steps.

Label Propagation (Overview)


In the Weakly Connected components exercise, you used the weakly connected components algorithm to write component information to each Place node. This property was named wcc_component. In this exercise, you will gain some experience with writing Cypher to implement the Label Propagation algorithm using the European Roads dataset. This algorithm enables you to determine discreet sets of nodes that form communities or clusters based upon how they are connected, as well as the optional weight of the connections.

In this exercise, you will:

  • Part 1: Run the Label Propagation to determine the number of communities and their sizes.

  • Part 2: Perform a basic statistical analysis of the relationship weight properties.

  • Part 3: Run the weighted Label Propagation to determine the number of communities and their sizes.

  • Part 4: Add a community property to each node using Label Propagation.

  • Part 5: Inspect the results of the algorithm.

Go to the next page to start this exercise.

Part 1: Run the Label Propagation to determine the number of communities and their sizes (Instructions)


You can get a quick overview and distribution of the community structure by using the stats mode of the LPA algorithm.

Write Cypher code to perform the Label Propagation algorithm on the European road network using these guidelines:

  • The algorithm must use the projected graph roads, which is stored in the graph catalog.

  • Use the stats mode of the LPA algorithm.

  • YIELD the following results: communityCount, communityDistribution.

Hint: You will use gds.labelPropagation.stats.

Part 1: Run the Label Propagation to determine the number of communities and their sizes (Solution)


Write Cypher code to perform the Label Propagation algorithm on the European road network using these guidelines:

  • The algorithm must use the projected graph roads, which is stored in the graph catalog.

  • Use the stats mode of the LPA algorithm.

  • YIELD the following results: communityCount, communityDistribution.

Hint: You will use gds.labelPropagation.stats.

CALL gds.labelPropagation.stats('roads')
YIELD communityCount, communityDistribution
RETURN communityCount, communityDistribution

The results returned will look like this:

EXLPA.1

 

The LPA algorithm found a total of 32 communities in the graph. The largest community has 219 members, while most of the other communities have less than 50 members. You can observe that even though the nodes are in the same connected component, they are not necessarily in the same community. Most likely, each of the five smaller connected components forms its community, but the largest component is broken down into 27 communities.

Part 2: Perform a basic statistical analysis of the relationship weight properties (Instructions/Solution)


Next, you will inspect how adding the relationship weight as an input influence the LPA algorithm results. But first, you will examine the relationship weight statistics and value distribution.

Write Cypher code to perform a basic statistical analysis of the relationship weight properties using these guidelines:

  • Analyze both distance and inverse_distance relationship properties.

Hint: You will use apoc.agg.statistics.

UNWIND ['distance','inverse_distance'] AS property
MATCH (:Place)-[r:EROAD]->(:Place)
RETURN property, apoc.agg.statistics(r[property], [0.5,0.75,0.9,0.95,0.99])

The results returned will look like this:

EXLPA.2

 

Values of distance property are in kilometers. More than half of the connections are shorter than 100km. The longest road connecting Moskva to Rostov-na-Donu is 1066km long. On the other hand, inverse_distance values are mostly between 0.3 and 0.74.

Part 3: Run the weighted Label Propagation to determine the number of communities and their sizes (Instructions)


This time, you will run the stats mode of the LPA algorithm and add the relationship weight property as an input. The LPA algorithm deems that a higher relationship weight value represents a stronger connection. For this reason, you will use the inverse_distance property as the relationship weight input.

Write Cypher code to perform the weighted Label Propagation algorithm on the European road network using these guidelines:

  • The algorithm must use the projected graph roads, which is stored in the graph catalog.

  • Use the stats mode of the LPA algorithm.

  • YIELD the following results: communityCount, communityDistribution.

  • The relationship weight property name is inverse_distance.

Hint: You will use gds.labelPropagation.stats.

Part 3: Run the weighted Label Propagation to determine the number of communities and their sizes (Solution)


Write Cypher code to perform the weighted Label Propagation algorithm on the European road network using these guidelines:

  • The algorithm must use the projected graph roads, which is stored in the graph catalog.

  • Use the stats mode of the LPA algorithm.

  • YIELD the following results: communityCount, communityDistribution.

  • The relationship weight property name is inverse_distance.

Hint: You will use gds.labelPropagation.stats.

CALL gds.labelPropagation.stats('roads',
   {relationshipWeightProperty:'inverse_distance'})
YIELD communityCount, communityDistribution
RETURN communityCount, communityDistribution

The results returned will look like this:

EXLPA.3

 

The weighted variant of the algorithm found a total of 261 communities, which is almost 10 times more than the unweighted variant. You can observe that the weighted LPA algorithm found more granular communities as the largest community contains only 13 members. Depending on your use case, you might use the configuration option that works best for you.

Part 4: Add a community property to each node using Label Propagation. (Instructions)


You will now inspect the members of communities of the weighted LPA results. First, you need to write back the results to Neo4j.

Write Cypher code to perform the Label Propagation algorithm on the European road network and store the results back to nodes using these guidelines:

  • The algorithm must use the projected graph roads, which is stored in the graph catalog.

  • Use the write mode of the LPA algorithm.

  • The algorithm will perform a maximum of 10 iterations.

  • The algorithm will write a property named community_lpa to each node with the computed value.

  • The weight property name is inverse_distance.

  • YIELD the following results: nodePropertiesWritten, communityCount, ranIterations, didConverge.

Hint: You will call gds.labelPropagation.write.

Part 4: Add a community property to each node using Label Propagation. (Solution)


Write Cypher code to perform the Label Propagation algorithm on the European road network and store the results back to nodes using these guidelines:

  • The algorithm must use the projected graph roads, which is stored in the graph catalog.

  • Use the write mode of the LPA algorithm.

  • The algorithm will perform a maximum of 10 iterations.

  • The algorithm will write a property named community_lpa to each node with the computed value.

  • The weight property name is inverse_distance.

  • YIELD the following results: nodePropertiesWritten, communityCount, ranIterations, didConverge.

Hint: You will call gds.labelPropagation.write.

Here is the solution code:

CALL gds.labelPropagation.write('roads',{
    maxIterations: 10,
    writeProperty: "community_lpa",
    relationshipWeightProperty: "inverse_distance" })
YIELD nodePropertiesWritten, communityCount, ranIterations, didConverge
RETURN nodePropertiesWritten, communityCount, ranIterations, didConverge

The results returned will look like this:

EXLPA.4

Part 5: Verify results of the algorithm. (Instructions)


As a part of the analysis, you will inspect the largest ten communities and retrieve its members.

Write a query to return all community_lpa values of the Place nodes. For each community id, return the size of the community, and the list of Place names.

  • Order the results by component size descending.

  • Limit it to the top ten results.

Part 5: Verify results of the algorithm. (Solution)


Write a query to return all community_lpa values of the Place nodes. For each community id, return the size of the community, and the list of Place names.

  • Order the results by component size descending.

  • Limit it to the top ten results.

Here is the solution code:

MATCH (node:Place)
RETURN node.community_lpa AS communityId,
       count(*) AS communitySize,
       collect(node.name) AS places
ORDER BY communitySize DESC
LIMIT 10

The results returned will look like this:

EXLPA.5

 

The largest community has 13 members and contains places mostly located in Spain like Madrid, Bilbao, and San Sebastian. It also includes Bordeaux and Toulouse, which are two cities located in France. If you inspect the world map, you can observe that Bordeaux and Toulouse are near the border with Spain, which makes sense, given that the LPA algorithm assigned them the same community as some Spanish cities. The generated value for communityId may be different for your graph, but the community sizes and members will match.

Label Propagation: Taking it further


  1. Try using the stream version of the algorithm.

  2. Try different configuration values, for example number of iterations.

  3. Try using the seedProperty parameter.

Label Propagation (Summary)


In this exercise, you gained some experience with writing Cypher to implement the Label Propagation algorithm using the European Roads dataset. This algorithm enables you to determine discreet sets of nodes that form clusters based upon how they are connected, as well as the weight of the connections.