How to optimize graph traversals in ArangoDB?

Question

I originally intended to ask: "Is ArangoDB a true graph database?"

But that question would sound rather offensive.

You folks at triAGENS did a really great job creating a "multi-paradigm" database. As a user of PostgreSQL, PostGIS, MongoDB and Neo4j/Titan, I really appreciate seeing an "all-in-one" solution :)

But the question remains. Creating a graph in ArangoDB requires two separate collections, one for edges and one for vertices. As far as I understand, this already means that vertices and their related edges are not "physically" neighbors.

Moreover, even after creating the appropriate indexes, I am facing serious performance issues with this kind of query in Gremlin:

g.v('an_id').out('likes').in('likes').count()

This returns a result after about 3 seconds (perceived time).

I assumed I had misunderstood how Gremlin and Blueprints/ArangoDB work, so I rewrote the same query in AQL:

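LET lst = (
    // First hop: vertices reached over outbound "likes" edges from the start vertex
    FOR e1 IN NEIGHBORS(vertices, edges, "an_id", "outbound", [ { "$label": "likes" } ])
        // Second hop: inbound "likes" edges into each vertex reached by the first hop
        FOR e2 IN NEIGHBORS(vertices, edges, e1.edge._to, "inbound", [ { "$label": "likes" } ])
            RETURN 1
)
RETURN LENGTH(lst)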

This takes a time of the same order of magnitude.
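To make clear what both queries above are meant to compute, here is a minimal Python sketch of the same two-hop count over an in-memory edge list. It is not ArangoDB or Gremlin code: `build_edge_index`, `count_two_hop` and the toy `EDGES` data are hypothetical names made up for this illustration. It assumes that each hop is answered by a lookup in an index keyed on the edge's endpoint and label.

```python
from collections import defaultdict

# Toy data (made up): (from, to, label) triples.
EDGES = [
    ("a", "m1", "likes"),
    ("b", "m1", "likes"),
    ("a", "m2", "likes"),
    ("c", "m2", "likes"),
    ("c", "m3", "likes"),
    ("a", "m3", "rates"),  # different label, ignored by the "likes" traversal
]


def build_edge_index(edges):
    """Index edges by (source, label) and by (target, label)."""
    out_index = defaultdict(list)  # (from, label) -> list of targets
    in_index = defaultdict(list)   # (to, label)   -> list of sources
    for src, dst, label in edges:
        out_index[(src, label)].append(dst)
        in_index[(dst, label)].append(src)
    return out_index, in_index


def count_two_hop(out_index, in_index, start, label):
    """Count paths start -[label]-> x <-[label]- y, one per path.

    Paths that lead back to the start vertex are counted too, which
    matches out(label).in(label).count() semantics.
    """
    total = 0
    for middle in out_index.get((start, label), ()):
        total += len(in_index.get((middle, label), ()))
    return total


if __name__ == "__main__":
    out_index, in_index = build_edge_index(EDGES)
    print(count_two_hop(out_index, in_index, "a", "likes"))  # 4
```

With such an index, the cost is proportional to the number of edges touched by the two hops rather than to the size of the edge collection, so where vertices and edges are physically stored matters less than whether each hop is an index lookup.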

When I run the same query on a Titan or Neo4j database (with the very same data), it returns almost immediately (perceived time: < 200 ms).

So it seems to me that ArangoDB's graph features are a "smart graph layer" on top of a "traditional document database", and that ArangoDB is not a "native" graph database.

To confirm this impression, I transformed the data, loaded it into PostgreSQL and ran the equivalent query (with a multi-table JOIN, as you might expect). The execution times were similar to ArangoDB's.

Did I do something wrong in the AQL query?

Is there a way to optimize the database to get better traversal times?

In PostgreSQL, conceptually, I would mix edges and nodes and use the CLUSTER command to physically order the data. Can something similar be done in ArangoDB? (I assume it would be hard, since it would involve "interlacing" edges and nodes, but that is just an intuition.)

mchacki · Accepted Answer · 2014-01-10 10:54:28Z


I am a core developer of ArangoDB. Could you give me a bit more information on the dimensions of the data you are using?

Number of vertices
Number of edges

Then we can create our own setup with the same dimensions and optimize it.



14 Comments

Raphaël Braud Over a year ago

Thanks for your comment. Here is the information:

arangosh [_system]> db.edges.count()
185426
arangosh [_system]> db.vertices.count()
78797

mchacki Over a year ago

Hi, I tried a query similar to yours using the IMDB data set: db.vertices.count() = 63027, db.edges.count() = 225060, so the dimensions are quite similar. (The count returns up to 3000, depending on the starting node.) In my measurements I get request times below 0.3 s. If I do not load the collections beforehand it takes about 3 s, but in production collections are always loaded. Only the default indices are set. Could you try our dataset on your machine and tell us whether you get the same results?

mchacki Over a year ago

Link to the dataset: dropbox.com/s/fec6bii624c2lfy/imdbdata.tar.gz
In your query, replace "likes" with "ACTS_IN" and use the starting node "858" for Bruce Willis. To import the data, create a document collection "imdb_vertices" and an edge collection "imdb_edges"; you can then use arangoimp to load the data into ArangoDB.

mchacki Over a year ago

Apart from the default indexing for graphs, we do not yet offer other graph-specific indices, but we plan to add them in the future. For example, a vertex-centric index is on our roadmap: it would store an index of paths of length n for each vertex, where the maximum value of n is configurable. This will give a large performance boost for traversals. If you need something specific or have other ideas for indices, please let us know so that we can add them to the database.
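One way to picture the idea of a per-vertex index of paths up to a configurable length n is the rough Python sketch below. This is only an assumption about how such an index could work, not ArangoDB's actual design or API; `build_path_index` and the toy edge list are hypothetical, and for simplicity the sketch follows outbound edges only.

```python
from collections import defaultdict


def build_path_index(edges, max_len):
    """Precompute, for every vertex, the endpoints of its outbound paths.

    Returns {vertex: {length: [endpoint, ...]}} for length in 1..max_len.
    An endpoint appears once per distinct path that reaches it.
    """
    adjacency = defaultdict(list)
    vertices = set()
    for src, dst in edges:
        adjacency[src].append(dst)
        vertices.update((src, dst))

    index = {}
    for vertex in vertices:
        frontier = [vertex]
        per_length = {}
        for length in range(1, max_len + 1):
            # Extend every path in the current frontier by one edge.
            frontier = [nxt for cur in frontier for nxt in adjacency.get(cur, ())]
            per_length[length] = frontier
        index[vertex] = per_length
    return index


if __name__ == "__main__":
    # Toy data (made up): (from, to) pairs.
    toy_edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
    index = build_path_index(toy_edges, max_len=2)
    # A two-hop traversal from "a" becomes a single dictionary lookup.
    print(len(index["a"][2]))  # 2
```

In a sketch like this, the query-time saving is paid for at write time and in memory: the index grows with n and has to be updated whenever an edge is added or removed.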

mchacki Over a year ago

Indeed, distributed graphs (and traversals) are on our roadmap for this year. We have to finish "general" sharding first, though.
