I have a large network with 17,765 nodes and 74,876 edges, and I'm using igraph for most of my analysis. I got stuck on finding the number of shortest paths for different pairs of nodes (around 1 million pairs). I don't need the paths themselves, only how many exist for each pair. To do so, I'm iterating over the node pairs in parallel and calling all_shortest_paths() for each one. It works for subsets of a few thousand node pairs, but it is extremely slow, and I don't know how to optimize it. The code is below:
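library(igraph)
library(doParallel)
library(foreach)

# Assumption: n.cores was not defined in the original snippet
n.cores <- parallel::detectCores() - 1

count_paths <- function(g, start, end) {
  # create the cluster
  my.cluster <- parallel::makeCluster(n.cores, type = "PSOCK")
  # my.cluster is local to this function, so stop it here on exit
  on.exit(parallel::stopCluster(my.cluster))
  doParallel::registerDoParallel(my.cluster)
  # one all_shortest_paths() call per (start, end) pair
  foreach(i = seq_along(start), .combine = "c") %dopar% {
    length(igraph::all_shortest_paths(g,
                                      from = start[i],
                                      to = end[i],
                                      mode = "all")[["res"]])
  }
}

counts <- count_paths(graph_directed, names(v_start), names(v_end))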
I chose mode = "all" in all_shortest_paths() because I'm treating my graph as undirected.
Thanks in advance for your help :)
You can speed this up by reducing the number of calls to all_shortest_paths(): include as many end vertices in the to parameter as possible. There is a non-negligible overhead to each call, so making fewer calls should give you a very noticeable performance boost. There is currently no direct way to get only the number of paths without the paths themselves. The igraph_get_all_shortest_paths function in the igraph C library can return only the path counts, but internally it always computes the paths.
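A minimal sketch of that batching idea, under some assumptions: the helper name count_paths_grouped is hypothetical, start and end are parallel vectors of vertex ids or names, and the path list is read from either the vpaths or the res element, since the element name depends on the igraph version installed. It makes one all_shortest_paths() call per distinct start vertex and then counts the returned paths by their last vertex.

```r
library(igraph)

# Hypothetical helper: one all_shortest_paths() call per distinct start
# vertex, with all of that vertex's end vertices passed in `to`.
count_paths_grouped <- function(g, start, end) {
  counts <- integer(length(start))
  # split() groups the pair indices by their start vertex
  for (idx in split(seq_along(start), start)) {
    s <- start[idx[1]]
    targets <- unique(end[idx])
    sp <- all_shortest_paths(g, from = s, to = targets, mode = "all")
    # Assumption: newer igraph versions call the path list "vpaths",
    # older ones call it "res"
    paths <- if (!is.null(sp$vpaths)) sp$vpaths else sp$res
    # The last vertex id of each returned path is the end vertex it reaches
    last_vertex <- vapply(paths,
                          function(p) as.integer(p)[length(p)],
                          integer(1))
    # Count the paths arriving at each target (0 if unreachable)
    target_ids <- as.integer(V(g)[targets])
    per_target <- tabulate(match(last_vertex, target_ids),
                           nbins = length(targets))
    counts[idx] <- per_target[match(end[idx], targets)]
  }
  counts
}

# Usage on a small ring graph with three (start, end) pairs
g <- make_ring(6)
count_paths_grouped(g, start = c(1, 1, 2), end = c(4, 2, 3))
```

With about a million pairs but at most 17,765 distinct start vertices, this cuts the number of calls considerably. The loop over start vertices is independent per group, so it could still be distributed with foreach if needed.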