10

In R 3.0.2 on Linux 3.12.0, I am using the system() function to execute a number of tasks. The desired effect is for each of these tasks to run as it would if I had executed it from the command line (e.g. via Rscript), outside of R.

However, when executing them inside R via system(), each task is tied to the same single CPU from the master R process.

In other words:

When launched via Rscript directly from a bash shell, outside of R, each task runs on its own core where possible (this is the desired behavior)

When launched inside R via system(), each task runs on the same single core; the work is never spread across cores. If I have 100 tasks, they are all stuck on one core.

I cannot figure out how to spawn processes from inside R such that each process uses its own core.

I am using a simple test to consume CPU cycles so I can measure the effect using top/htop:

dd if=/dev/urandom bs=32k count=1000 | bzip2 -9 >> /dev/null

When this simple test is launched outside of R multiple times, each iteration gets its own core. But when I launch it inside of R:

system("dd if=/dev/urandom bs=32k count=2000 | bzip2 -9 >> /dev/null", ignore.stdout=TRUE,ignore.stderr=TRUE,wait=FALSE)

They are all stuck on a single core.
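
For example, to launch four of them concurrently:

cmd <- "dd if=/dev/urandom bs=32k count=2000 | bzip2 -9 >> /dev/null"
for (i in 1:4) {
  # fire-and-forget: do not wait for the child, discard its output
  system(cmd, ignore.stdout = TRUE, ignore.stderr = TRUE, wait = FALSE)
}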

Here is a visualization after running 4 simultaneous/concurrent iterations of system().

[Screenshot: htop showing all four processes sharing a single core]

Please help me: I need to be able to tell R to launch new tasks, each of them running on its own core.

UPDATE DEC 4 2013:

I tried a test in Python using this:

import os
import thread  # Python 2
thread.start_new_thread(os.system, ("/bin/dd if=/dev/urandom of=/dev/null bs=32k count=2000",))

I repeated the new thread several times, and as expected everything worked (multiple cores used, one per thread).

So I thought I would install the rPython package in R and try the same from within R:

python.exec("import thread")
python.exec("thread.start_new_thread(os.system,('/bin/dd if=/dev/urandom of=/dev/null bs=32k count=2000',))")

Unfortunately, once again everything was limited to a single core, even after repeated calls. Why is everything launched from within R limited to a single core?

9 Comments
  • I think this is impossible without using an add-on package, or at least the parallel package. You can find more explanation here. Commented Dec 2, 2013 at 9:26
  • Have you tried GNU parallel on your system? Or perhaps, if you are running 4 processes, you could try using xargs in your launch script with the -P 4 (max procs) option to try to force parallel execution (see the sketch after these comments)? Commented Dec 2, 2013 at 9:27
  • @agstudy, I have tried the parallel package. I couldn't even get that to work correctly, so I don't know if my Debian install of R 3.0.2 x64 is somehow hosed, or what. parallel was still limited to a single core. Commented Dec 2, 2013 at 10:00
  • @StephenHenderson, sorry mate, I don't see how either of those would work in this case. The actual commands I am generating with system() are each unique. Commented Dec 2, 2013 at 10:06
  • OK, if they are genuinely unique (i.e. different commands) it can't work, but often one runs through files applying the same command, e.g. zipping them (your example) or similar, in which case you can replace a loop with a parallel or xargs -P command over the list of filenames. That said, I have never tried it, so I don't know whether it works... I have, though, run multiple Rscripts in parallel from a bash shell. Commented Dec 2, 2013 at 10:15
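
For reference, the xargs approach suggested above, driven from R - a sketch that assumes a plain list of filenames in files.txt (the filename is illustrative):

# Run bzip2 -9 on each file listed in files.txt, at most 4 at a time.
system("xargs -P 4 -n 1 bzip2 -9 < files.txt", wait = FALSE)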

2 Answers

7
+50

Following on @agstudy's comment, you should get parallel to work first. On my system, this uses multiple cores:

f <- function(x) system("dd if=/dev/urandom bs=32k count=2000 | bzip2 -9 >> /dev/null",
                        ignore.stdout = TRUE, ignore.stderr = TRUE, wait = FALSE)
library(parallel)
mclapply(1:4, f, mc.cores = 4)

I would have written this in a comment myself, but it is too long. I know you have said that you have tried the parallel package, but I wanted to confirm that you are using it correctly. If it doesn't work, can you confirm that a non-system call uses mclapply correctly, like this one?

a <- mclapply(rep(1e8, 4), rnorm, mc.cores = 4)
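
Since mclapply on Linux is built on fork, an equivalent check is to spawn the children one at a time with mcparallel/mccollect from the same package and watch htop; on a healthy system each child is free to land on its own core (a sketch):

library(parallel)
# Fork four children; each runs its system() call independently.
jobs <- lapply(1:4, function(i)
  mcparallel(system("dd if=/dev/urandom of=/dev/null bs=32k count=2000",
                    ignore.stdout = TRUE, ignore.stderr = TRUE)))
mccollect(jobs)  # block until all four children finish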

Reading your comments, I suspect that your pthreads Linux package is out of date and broken. On my system, I am using libpthread-2.15.so (not 2.13). If you're on Ubuntu, you can grab the latest with apt-get install libpthread-stubs0.

Also, note that you should be using parallel, not multicore. If you look at the docs for parallel, you'll note that they have incorporated the work on multicore.
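
A quick way to verify that the mclapply you are calling really is the parallel one (a minimal sketch):

library(parallel)
environmentName(environment(mclapply))  # should print "parallel"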


Reading your next set of comments, I must insist that it is parallel and not multicore that has been included in R since 2.14. You can read about this on the CRAN Task View.

Getting parallel to work is crucial. I previously told you that you could compile it directly from source, but this is not correct. I guess the only way to recompile it would be to compile R from source.

Can you also verify that your CPU affinity is set correctly, and check whether R can detect the number of cores? Just run:

library(parallel)
mcaffinity()
# Should be c(1,2,3,4) for you.
detectCores()
# Should be 4 for you.
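
If mcaffinity() comes back with fewer CPUs than detectCores(), you can try widening the mask from inside R; passing a vector of CPU numbers to mcaffinity() sets the affinity (a sketch, Linux only):

library(parallel)
mcaffinity(1:detectCores())  # allow this process (and its forks) on every core
mcaffinity()                 # re-check: should now list all cores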

15 Comments

Hi, thanks for trying. The first block of code spawns four processes; I can see them in htop etc., but they are locked to just a single core, like the example screenshot. Your second example (the non-system call) also used just 100% of a single core. So now that we've got this new info, can you give me insight into why my parallel library is not working?
New info... I just tried it via CLI R instead of RStudio, and it gave me a segfault. Here is the pastebin: pastebin.com/1SWhH4Zd -- in addition, I checked the kernel log and found this: [586018.637080] rsession[28883]: segfault at 7f1e1eeda9d0 ip 00007f1e23912d8c sp 00007fff484ab730 error 4 in libpthread-2.13.so[7f1e2390b000+17000]. I did a quick Google but am not seeing anyone else with this issue with R. It appears to be the culprit, though, if only I knew why.
I just removed the parallel package, multicore, foreach, doParallel, doSNOW. Then I reinstalled multicore, which I think is what R 3.0.2 should use instead of parallel. It includes mclapply. Still no joy, same segfault at the CLI and single core. I cannot find anyone else that has run into this issue so not sure where to go next.
@user1530260 I have updated my answer, I suspect your pthreads is broken.
Could be related to OpenBLAS. It is possible that the different R sessions called OpenBLAS, which itself diverts all workload to a single core. See grokbase.com/t/r/r-sig-hpc/124qe5gmwn/parallel-and-openblas
2

I tested running:

system("dd if=/dev/urandom bs=32k count=2000 | bzip2 -9 >> /dev/null", ignore.stdout=TRUE,ignore.stderr=TRUE,wait=FALSE)
system("dd if=/dev/urandom bs=32k count=2000 | bzip2 -9 >> /dev/null", ignore.stdout=TRUE,ignore.stderr=TRUE,wait=FALSE)
system("dd if=/dev/urandom bs=32k count=2000 | bzip2 -9 >> /dev/null", ignore.stdout=TRUE,ignore.stderr=TRUE,wait=FALSE)
system("dd if=/dev/urandom bs=32k count=2000 | bzip2 -9 >> /dev/null", ignore.stdout=TRUE,ignore.stderr=TRUE,wait=FALSE)

on Linux 2.6.32 with R 3.0.2 and on Linux 3.8.0 with R 2.15.2. In both cases it takes up 4 CPU cores (as you would expect).

-- Edit --

I installed Linux 3.12 on a VirtualBox machine, and there R 3.0.2 also does what I expect: it takes up 4 CPUs. It even slowly wanders between the CPUs - each process does not stick to the same CPU but changes every second or so.

This leads me to believe your system has some local modification that forces R to use only one CPU.

From your description I would guess the local modifications are in R and not system wide (since your Python has no problems spawning more processes).

The modifications could be confined to your user account, so create a new user and try with that. If it works for the new user, we need to figure out what your userid has installed.

If it does not work for the new user, it could be globally installed R libraries that cause the problem. Install an older R version and try that. If the older version works, your R 3.0.2 installation is probably broken; remove it and re-install it.

4 Comments

I already completely reinstalled R from scratch after a complete purge. It made no difference. I am currently in the process of deploying a brand new physical server and will see if that solves it. Will take another couple days due to time constraints on my side. I cannot imagine what "modification" has occurred though in this particular config that is causing it to be bound to a single CPU.
Would you please confirm whether or not you did any special apt-get installs for libpthread etc., or just used library(parallel), which is built in for R 3.0.2? Can you show me a list of installed packages with pthread in the name? There does seem to be some sort of link on my system - the segfault in libpthread.so plus the limit to one core must be related.
I used the CRAN version of R. I did not do any special install of pthreads and did not use library(parallel). All I did was: add CRAN to sources.list; apt-get install r-base-core; run R; paste the 4 lines.
I have been unable to build the new physical server yet due to a winter storm - UPS couldn't deliver (and probably can't until Tuesday now). Since it has become clear that the issue is somehow with my configuration, I will continue working to that end (new system, reinstallation, etc.). Thank you for helping prove that it works; that is what is important.
