After an earlier discussion and help from F. Privé, I made some changes, and the following code now does what it is expected to do.

library(purrr)
library(parallel)

p_list = list( "P1" = list( c("MAKM1","MMERMTD","FTRWDSE" )) , 
                  "P2" = list( c("MFFGGDSF1","DFRMDFMMGRSDFG","DSDMFFF")),
                  "P3" = list( c("MDERTDF1","DFRGRSDFMMG","DMMMFFFS")),
                  "P4" = list( c("MERTSDMDF1","SDFRGSSMRSDFG","DFFFM")))


chars <- set_names(c("M", "S", "M"), c("class.1", "class.35", "class.4"))

get_0_and_all_combn <- function(x) {
  map(seq_along(x), function(i) combn(as.list(x), i, simplify = FALSE)) %>%
    unlist(recursive = FALSE) %>% 
    c(0L, .)
}


get_pos_combn <- function(x, chars) {
  x.spl <- strsplit(x, "")[[1]] 

  isUni1 = grep("class.1", names(chars))
  isFirst = grepl("1",x)

  map2(.x=chars, .y=seq_along(chars), .f=function( chr, index ) {

    if( length(isUni1) != 0 ){
      if( index == isUni1 & isFirst == TRUE )
        1 %>% get_0_and_all_combn()
      else{
        which(x.spl == chr) %>%
          get_0_and_all_combn()
      }
    }else{
      which(x.spl == chr) %>%
        get_0_and_all_combn()
    }

  }) %>%
    expand.grid()
}


get_pos_combn_with_infos <- function(seq, chars, p_name) {
  cbind.data.frame(p_name, seq, get_pos_combn(seq, chars))
}

combine_all <- function(p_list, chars){

  i = 1
  fp <- as.data.frame(matrix(ncol = 5))
  colnames(fp) = c("p_name" ,"seq" , names(chars) )

  for(p in p_list){

    p_name = names(p_list)[i]

    for(d in 1:length(p[[1]])){

      seq = p[[1]][d]

      f = get_pos_combn_with_infos(seq, chars, p_name)
      # unlist the list wherever exist in the dataframe and collapse
      # its values with the ":" symbol.
      for(c in 1:nrow(f)){
        if(is.list(f[c,3]))
          f[c,3]=paste(unlist(f[c,3]),collapse=":")
        if(is.list(f[c,4]))
          f[c,4]=paste(unlist(f[c,4]),collapse=":")
        if(is.list(f[c,5]))
          f[c,5]=paste(unlist(f[c,5]),collapse=":")
      }

      fp = na.omit(rbind( f , fp ) )
    }

    i = i + 1
  }

  fp
}


numCores <- detectCores()

results = mcmapply(FUN=combine_all, MoreArgs=list(p_list , chars)  , mc.cores = numCores-1) 

The only function one needs to run is the last one, combine_all(), with p_list and chars as inputs.

The result is a data.frame that contains, for each string in p_list, all possible combinations (across the characters defined in chars) of all possible combinations of the positions at which each character occurs in that string.
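
To make that description concrete, here is a minimal base-R sketch for one string and one character. It mirrors what get_0_and_all_combn() does with combn(), but the names positions, subsets and all_choices are illustrative only and do not appear in the code above.

```r
# Positions of the character "M" in one of the example strings.
positions <- which(strsplit("MMERMTD", "")[[1]] == "M")

# All non-empty subsets of those positions, built size by size.
# as.list() keeps combn() from expanding a single number n into 1:n.
subsets <- unlist(
  lapply(seq_along(positions),
         function(k) combn(as.list(positions), k, simplify = FALSE)),
  recursive = FALSE
)

# Prepend 0, standing for "no position selected".
all_choices <- c(list(0L), subsets)

length(all_choices)
```

With three occurrences of "M" this gives 2^3 = 8 choices (7 non-empty subsets plus the 0 entry). expand.grid() then crosses these choices across the characters in chars, which is why the result grows quickly.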

I know it's a little bit complicated, but I don't know another way to explain the results.

Because my actual list (p_list) is much larger than the one in the example above, I would like to run this in parallel on more than one CPU core at a time.

For that purpose, as you can see, I used the parallel package. I ran it on a Linux box (because, as I understand it, mcmapply uses fork to create the worker processes), but I did not get any result, only an empty list.

Any ideas on how to improve the algorithm or make it run in parallel are welcome.

Thank you.

1 Answer

The problem here is how you call mcmapply (and mapply in general). If you don't supply any arguments to vectorize over (the ... arguments), it is expected to return a list of length 0; MoreArgs only holds the arguments that stay the same across calls.
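
A small sketch of that behavior, using base mapply with a hypothetical two-argument function add (not part of the question's code); mcmapply takes its arguments in the same way:

```r
# Hypothetical helper used only for this illustration.
add <- function(a, b) a + b

# Everything is passed through MoreArgs, so there is nothing to iterate over.
res_empty <- mapply(FUN = add, MoreArgs = list(a = 1, b = 10))
length(res_empty)

# Passing `a` as a vectorized argument calls add() once per element.
res_ok <- mapply(FUN = add, a = 1:3, MoreArgs = list(b = 10))
res_ok
```

The call in the question passes both p_list and chars through MoreArgs, which corresponds to the first case.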

I will use foreach because it's easier to work with. You can see this guide for parallelism in R with foreach.

Then combine_all becomes

combine_all <- function(p_list, chars) {

  p_names <- names(p_list)

  all_all_f <- foreach(i = seq_along(p_list)) %dopar% {

    p <- p_list[[i]][[1]]
    p_name <- p_names[i]

    all_f <- foreach(d = seq_along(p)) %do% {

      f <- get_pos_combn_with_infos(p[d], chars, p_name)
      # Collapse any list cell in the three position columns (3 to 5)
      # into a single string, joining its values with ":".
      for (r in seq_len(nrow(f))) {
        for (j in 3:5) {
          if (is.list(f[r, j]))
            f[r, j] <- paste(unlist(f[r, j]), collapse = ":")
        }
      }

      f
    }

    do.call("rbind", all_f)
  }

  do.call("rbind", all_all_f)
}

Then you do

library(foreach)
doParallel::registerDoParallel(parallel::detectCores() - 1)
the_res_you_want <- combine_all(p_list = p_list, chars = chars)
doParallel::stopImplicitCluster()

On Linux and macOS, this registers a fork cluster (mc-like). On Windows, where forking is not available, this code is likely not to work as is.

PS1: Beware that the resulting data frame can become quite large if you parallelize over many elements.

PS2: you should really keep the data frames with column-lists rather than collapsing them into strings. See http://r4ds.had.co.nz/many-models.html#list-columns-1.
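
As a rough illustration of PS2, the positions can stay in a list-column and be collapsed only when a string is needed for display. The data frame df and its column names are made up for this sketch.

```r
# Two of the example strings; df and its columns are illustrative.
df <- data.frame(seq = c("MAKM1", "DFFFM"), stringsAsFactors = FALSE)

# Store the positions of "M" as a list-column: one integer vector per row.
df$positions <- lapply(strsplit(df$seq, ""), function(ch) which(ch == "M"))

# The vectors remain usable as numbers...
lengths(df$positions)

# ...and can still be collapsed on demand, without altering df.
collapsed <- vapply(df$positions, paste, character(1), collapse = ":")
collapsed
```

Keeping the integer vectors avoids having to split and re-parse the ":"-separated strings later.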

2 Comments

To emulate the Windows behavior on Linux / macOS, use doParallel::registerDoParallel(cl <- parallel::makeCluster(2L)). With that setup, the code indeed chokes on missing objects ("globals").
However, with the doFuture backend things will work the same on all platforms (Linux, macOS, and Windows) and on all backends (not just forked ones). So, try the following with Florian's example above and it'll work: library("doFuture"); registerDoFuture(); plan(multiprocess). (In recent versions of future, plan(multiprocess) is deprecated; plan(multisession) is the replacement that works on all platforms.) For other types of parallel backends, see the main vignette of cran.r-project.org/package=future
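
The "missing globals" issue can be sketched with the base parallel package alone. A cluster created with makeCluster() starts fresh R sessions (the kind of workers Windows has to use), so objects a function depends on must be sent to them explicitly. The names offset and add_offset are hypothetical and only serve this illustration.

```r
library(parallel)

# Hypothetical global and a function that depends on it.
offset <- 10
add_offset <- function(x) x + offset

# Two fresh R worker sessions; unlike forked workers, they do not
# inherit the objects of the main session.
cl <- makeCluster(2L)

# Send the global that the function needs to every worker.
clusterExport(cl, "offset", envir = environment())

res <- parLapply(cl, 1:3, add_offset)
stopCluster(cl)

unlist(res)
```

Packages are in the same situation: anything the worker code relies on (purrr in the question) would also have to be loaded on the workers, for example with clusterEvalQ(cl, library(purrr)).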
