0

I have constructed a Neo4j graph based on MovieLens100k with the following content:

  1. Nodes: (:User), (:Movie), (:Genre)

  2. Relations: (:User)-[:RATED]->(:Movie), (:Movie)-[:HAS_GENRE]->(:Genre)

I want to compute PathSim, which is a similarity measure for equal-type nodes following a meta-path which links the two entities. Suppose the entities are (:User) and that the meta-path to follow is the following:

path1=(:User)-[:RATED]->(:Movie)<-[:RATED]-(:User)

Suppose to have computed such quantities:

N_ij = Number of path of type path1 starting from (:User) with id=i and ending in (:User) with id=j

Then PathSim for user with id 1 and id 2 is:

PathSim = 2N_12/(N_11+N_22)

To compute such quantity in Neo4j my Cypher query (1) is:

"""
MATCH path=(u1:User)-[:RATED]->(:Movie)<-[:RATED]-(u2:User)
WITH u1, u2, COUNT(path) as ct
RETURN u1.id as HeadUser, u2.id as TailUser, CASE WHEN ct is null THEN 0 ELSE ct END AS Overlap
"""

Let us focus on users 1 and 2, this correctly computes all different paths between user 1 and 2, but it does not compute all different paths between user 1 and 1. In fact, since the meta-path is symmetric, N_11 should be equal to counting the half-paths (u1:User)-[:RATED]->(:Movie) starting from (:User) with id=1, i.e. query (2):

"""
MATCH path=(u1:User {id: 1})-[:RATED]->(:Movie)
WITH COUNT(path) as ct
RETURN CASE WHEN ct is null THEN 0 ELSE ct END AS Overlap
"""

By taking into account "trivial" paths consisting of going from (:User) to (:Movie) and coming back from the same exact half-path.

However, as I understand, there is no such row in query (1) since Cypher only counts different half-paths given a symmetric path.

Is there a way to account for "trivial" paths by sort of modifying query (1)?

1 Answer 1

1

Neo4j will filter out any MATCH result row that repeats the same relationship. That is why (1) did not do what you expected.

Assuming that:

  • N_11 is the number of (:User {id: "1"})-[:RATED]->(:Movie) paths, and
  • User node id properties have string values (to allow my query to use them as map keys),

this query may do what you want:

MATCH (u1:User)-[:RATED]->(:Movie)<-[:RATED]-(u2:User)
WITH u1, u2, COUNT(*) as ct
WITH
  COLLECT(DISTINCT u1) as u1s,
  COLLECT(DISTINCT u2) AS u2s,
  COLLECT({headUser: u1.id, tailUser: u2.id, ct: ct}) AS pairs
UNWIND apoc.coll.intersection(u1s, u2s) AS user
WITH user, COUNT { (user)-[:RATED]->(:Movie) } AS userCt, pairs
WITH COLLECT(user.id) AS userIds, COLLECT(userCt) AS userCts, pairs
WITH apoc.map.fromLists(userIds, userCts) AS userCtMap, pairs
UNWIND pairs as pair
RETURN
  pair.headUser AS headUser,
  pair.tailUser AS tailUser,
  (2.0 * pair.ct)/(userCtMap[pair.headUser] + userCtMap[pair.tailUser]) AS pathSim

This query calculates all distinct users found by the MATCH clause and maps each user's id to its number of outgoing -[:RATED]->(:Movie) relationships. It then uses that mapping to calculate the denominator of your pathSim formula.

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.