I have a large (~10 million edges and some ~100k nodes) Directed Acyclic Graph (DAG) and a list of walkers (around ~30k), and each of these walker has an origin and destination node which get connected in the DAG with some additional computation to determine edges (usually ~50 edges per walker). For each walker, I need then to efficiently compute the shortest path, possibly in a scalable parallel manner.
I am fairly fluent only in Python, so I tried a first approach using Dask for parallelization. This worked well enough for the computation of the additional edges; but when doing the actual shortest path computation, it is crucial that I use graph-tool's fast C++ implementation of the algorithm optimized for DAGs.
Here unfortunately it gets a bit unclear to me how to efficiently handle sharing the "base" large DAG among Dask Workers, and then do the node/edges adding and shortest path computing function mapped to each walker.
Also, I am currently working with a single Linux VM with 8 threads and I think enough memory to hold all the computations for each thread, but I can use multiple VMs and I would like to enjoy the possible speed up.
Edit:
What I would do as a for loop in Python is:
import graph_tool as gt
# instantiate the graph_tool graph
g = gt.Graph()
# add edges with weight
g.ep['weight'] = g.new_edge_property('double')
g.add_edge_list(edges[['src','dst','weight']].values,eprops=[g.ep['weight']])
for walker in walkers:
origin_node = g.add_vertex()
destination_node = g.add_vertex()
origin_edges = function_to_compute_origin_edges(walker)
destination_edges = function_to_compute_destination_edges(walker)
g.add_edge_list([origin_edges,destination_edges])
vertex_list,edge_list = gt.topology.shortest_path(g,origin_node,destination_node,weights=g.ep['weight'],dag=True)
g.remove_vertices([origin_node,destination_node])
Since every computation is independent of the others, I would like to parallelize this instead of using a for loop.
A*which simply often scales really poorly (multiple threads does not really help much). There are variants but they may not find the best path. Even the Dijkstra's algorithm often does not scale so well. The reason is that nodes needs to be protected with atomics/mutexes and this kind of protection introduce a huge overhead and causes threads to stall when they work on the same node. 8 threads is not so much (especially with 4 cores), but 100k node either...debug=True, it generates a topological sorting of the DAG to achieve shortest path computation in O(|V+E|) time. I tried implementing it in pure Python but the performance is much worse so I would like to either use graph-tool's libraries or find another fast way of coding it @D.L. I wrote down what I want to do in a Python for loop that I want to parallelize