I have constructed a Neo4j graph based on MovieLens100k with the following content:
Nodes:
(:User),(:Movie),(:Genre)Relations:
(:User)-[:RATED]->(:Movie),(:Movie)-[:HAS_GENRE]->(:Genre)
I want to compute PathSim, which is a similarity measure for equal-type nodes following a meta-path which links the two entities. Suppose the entities are (:User) and that the meta-path to follow is the following:
path1=(:User)-[:RATED]->(:Movie)<-[:RATED]-(:User)
Suppose to have computed such quantities:
N_ij = Number of path of type path1 starting from (:User) with id=i and ending in (:User) with id=j
Then PathSim for user with id 1 and id 2 is:
PathSim = 2N_12/(N_11+N_22)
To compute such quantity in Neo4j my Cypher query (1) is:
"""
MATCH path=(u1:User)-[:RATED]->(:Movie)<-[:RATED]-(u2:User)
WITH u1, u2, COUNT(path) as ct
RETURN u1.id as HeadUser, u2.id as TailUser, CASE WHEN ct is null THEN 0 ELSE ct END AS Overlap
"""
Let us focus on users 1 and 2, this correctly computes all different paths between user 1 and 2, but it does not compute all different paths between user 1 and 1. In fact, since the meta-path is symmetric, N_11 should be equal to counting the half-paths (u1:User)-[:RATED]->(:Movie) starting from (:User) with id=1, i.e. query (2):
"""
MATCH path=(u1:User {id: 1})-[:RATED]->(:Movie)
WITH COUNT(path) as ct
RETURN CASE WHEN ct is null THEN 0 ELSE ct END AS Overlap
"""
By taking into account "trivial" paths consisting of going from (:User) to (:Movie) and coming back from the same exact half-path.
However, as I understand, there is no such row in query (1) since Cypher only counts different half-paths given a symmetric path.
Is there a way to account for "trivial" paths by sort of modifying query (1)?