Is there a way to use Lucene to work with graph data?

Example

One user has a relationship with many Lucene documents (Document Connections). One user also has a relationship with other users (User Connections, i.e. the graph).

If a user searches the index, they get back the documents that they have a relationship with. This is simple and straightforward.

What would be a way to get back the documents that the user's User Connections have a relationship with?

One approach is to index each document with a user_id field listing all the users that have a relationship with it. However, the search then has to pass in every User Connection of the searching user, so the query size is unpredictable. Think of users that have thousands of User Connections. This will not scale.
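To make the scaling concern concrete, here is a minimal sketch of how such a filter clause grows with the number of connections. The function name `build_connection_filter` and the ids are made up for illustration; only the `field:(a OR b)` form is standard Lucene query syntax. As far as I know, Lucene's boolean queries also have a configurable clause limit (1024 by default), which a user with thousands of connections would run into.

```python
def build_connection_filter(connection_ids, field="user_id"):
    """Return a Lucene query-syntax clause matching any of the given user ids.

    Hypothetical helper: one OR-ed term per connection, so the clause
    grows linearly with the size of the user's network.
    """
    if not connection_ids:
        raise ValueError("at least one connection id is required")
    return f"{field}:(" + " OR ".join(str(c) for c in connection_ids) + ")"


# A user with three connections produces a short clause...
small = build_connection_filter(["u1", "u2", "u3"])
# ...but a user with 5000 connections produces 5000 OR-ed terms.
large = build_connection_filter([f"u{i}" for i in range(5000)])
large_clause_count = large.count(" OR ") + 1

print(small)
print(large_clause_count)
```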

It seems that a graph DB storing the User Connections and the User Documents could easily give us the set of documents to search against. But what is an effective way to communicate that set to Lucene, so that it searches only those documents for the given query? Any results returned would then be guaranteed to have a relationship with at least one of the User Connections.
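The idea can be sketched as a toy model in plain Python. This is not Lucene code: `ToyIndex`, `user_connections`, `user_documents` and the sample data are all invented for illustration. The graph side resolves the candidate document ids, and the index side intersects its text hits with that set.

```python
from collections import defaultdict


class ToyIndex:
    """A tiny inverted index standing in for Lucene (illustration only)."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of doc ids

    def add(self, doc_id, text):
        for term in text.lower().split():
            self.postings[term].add(doc_id)

    def search(self, query, allowed_doc_ids=None):
        """Return ids of docs containing every query term, optionally
        restricted to the ids supplied by the graph side."""
        terms = query.lower().split()
        if not terms:
            return set()
        hits = set.intersection(*(self.postings.get(t, set()) for t in terms))
        if allowed_doc_ids is not None:
            hits = hits & set(allowed_doc_ids)
        return hits


# Stand-in for the graph DB: who is connected to whom, and who owns what.
user_connections = {"alice": {"bob", "carol"}}
user_documents = {"bob": {1, 2}, "carol": {3}, "dave": {4}}


def documents_of_connections(user):
    """Union of the document ids related to each of the user's connections."""
    doc_ids = set()
    for other in user_connections.get(user, set()):
        doc_ids |= user_documents.get(other, set())
    return doc_ids


index = ToyIndex()
index.add(1, "lucene graph search")
index.add(2, "cooking pasta")
index.add(3, "graph databases")
index.add(4, "graph theory notes")

allowed = documents_of_connections("alice")
restricted = sorted(index.search("graph", allowed))
unrestricted = sorted(index.search("graph"))
print(restricted, unrestricted)
```

Document 4 matches the text query but is dropped from the restricted result because its owner is not one of alice's connections. The open question remains how to hand a large `allowed` set to a real Lucene index cheaply.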

3 Answers

I don't believe there is currently any graph technology that sits on top of Solr or Lucene.

You would probably be best off looking at one of these two camps:

  • Neo4j with Spring Data (free for a single instance)

OR

  • TinkerPop Blueprints (possibly with Rexster if you are not using Java/Scala) on one of these technologies:
      • Titan on Cassandra with Hadoop (multi-master, no single point of failure)
      • OrientDB
      • Neo4j

These are all graph databases. TinkerPop Blueprints is a standard that lets you abstract away the specific implementation. Spring Data currently supports only Neo4j for graph technologies.

Neo4j costs money if you cluster (the free license covers a single instance only).

You can read a discussion of graphs with Solr/Lucene here: http://lucene.472066.n3.nabble.com/indexing-directed-graph-td2949556.html

Note that Neo4j supports full-text search.

8 Comments

That's great that Neo4j supports full-text search. Is the clustering in Neo4j for fault tolerance/scalability, or for sharding?
In my reading on Neo4j, I have come to understand that it ships with Lucene built in. So Lucene can be used to find a node, and then Neo4j can be used to traverse the relationships from that node to other nodes. github.com/andreasronge/neo4j/wiki/Neo4j%3A%3ACore-Lucene
Scalability. Neo4j uses a master-slave paradigm for clustering, so it's not as fault tolerant as, for example, Titan on Cassandra. OrientDB also uses a multi-master paradigm, if I remember correctly, so it's a good choice as well.
Thanks for the info on Lucene - I didn't know that (or perhaps forgot) :)
Also, you may be interested in Faunus if you're doing any sort of statistical analysis on graph data. It's TinkerPop compliant.

Graph traversal queries have been supported since Solr 6.0. If you don't already have Solr installed, it's probably still better to use a graph database instead, but now at least you have a choice. I found this, though documentation is still sparse:

https://solr.pl/en/2016/04/18/solr-6-0-and-graph-traversal-support/

Apache Jena may be relevant here, since it has some graph capabilities (SPARQL, RDF) and makes use of Lucene for text indexing.

See Apache Jena Fuseki and Jena Text.
