Scalable Python Shortest Path on Large DAG with Multiple Walkers

Ask Question

Asked 1 year, 8 months ago

Modified 1 year, 8 months ago

Viewed 55 times

I have a large (~10 million edges and some ~100k nodes) Directed Acyclic Graph (DAG) and a list of walkers (around ~30k), and each of these walker has an origin and destination node which get connected in the DAG with some additional computation to determine edges (usually ~50 edges per walker). For each walker, I need then to efficiently compute the shortest path, possibly in a scalable parallel manner.

I am fairly fluent only in Python, so I tried a first approach using Dask for parallelization. This worked well enough for the computation of the additional edges; but when doing the actual shortest path computation, it is crucial that I use graph-tool's fast C++ implementation of the algorithm optimized for DAGs.

Here unfortunately it gets a bit unclear to me how to efficiently handle sharing the "base" large DAG among Dask Workers, and then do the node/edges adding and shortest path computing function mapped to each walker.

Also, I am currently working with a single Linux VM with 8 threads and I think enough memory to hold all the computations for each thread, but I can use multiple VMs and I would like to enjoy the possible speed up.

Edit:

What I would do as a for loop in Python is:

import graph_tool as gt

# instantiate the graph_tool graph
g = gt.Graph()

# add edges with weight
g.ep['weight'] = g.new_edge_property('double')
g.add_edge_list(edges[['src','dst','weight']].values,eprops=[g.ep['weight']])

for walker in walkers:
    origin_node       = g.add_vertex()
    destination_node  = g.add_vertex() 
    origin_edges      = function_to_compute_origin_edges(walker)
    destination_edges = function_to_compute_destination_edges(walker)
    g.add_edge_list([origin_edges,destination_edges])

    vertex_list,edge_list = gt.topology.shortest_path(g,origin_node,destination_node,weights=g.ep['weight'],dag=True)

    g.remove_vertices([origin_node,destination_node])

Since every computation is independent of the others, I would like to parallelize this instead of using a for loop.

edited Mar 25, 2024 at 1:47

asked Mar 24, 2024 at 4:23

Alessandro Zanoli

113 bronze badges

What specific "shortest path computation"? There are many different algorithms for that. The most common one is A* which simply often scales really poorly (multiple threads does not really help much). There are variants but they may not find the best path. Even the Dijkstra's algorithm often does not scale so well. The reason is that nodes needs to be protected with atomics/mutexes and this kind of protection introduce a huge overhead and causes threads to stall when they work on the same node. 8 threads is not so much (especially with 4 cores), but 100k node either...

Jérôme Richard
– Jérôme Richard

2024-03-24 12:01:16 +00:00
Commented Mar 24, 2024 at 12:01
do you have a minimal reproducible example to show?

darren
– darren

2024-03-24 16:06:01 +00:00
Commented Mar 24, 2024 at 16:06
@JérômeRichard I did not really look into the C++ part of graph-tool yet, but I think internally, when setting the flag debug=True, it generates a topological sorting of the DAG to achieve shortest path computation in O(|V+E|) time. I tried implementing it in pure Python but the performance is much worse so I would like to either use graph-tool's libraries or find another fast way of coding it @D.L. I wrote down what I want to do in a Python for loop that I want to parallelize

Alessandro Zanoli
– Alessandro Zanoli

2024-03-25 01:50:15 +00:00
Commented Mar 25, 2024 at 1:50

Add a comment |

0 Your Answer

Sign up or log in

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.

Collectives™ on Stack Overflow

Scalable Python Shortest Path on Large DAG with Multiple Walkers

0

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

0

Know someone who can answer? Share a link to this question via email, Twitter, or Facebook.

Your Answer

Sign up or log in

Post as a guest