I have access to a large CPU cluster that does not have GPUs. Is it possible to speed up YOLO training by parallelizing it across multiple CPU nodes?
The docs say that the device parameter specifies the computational device(s) for training: a single GPU (device=0), multiple GPUs (device=0,1), CPU (device=cpu), or MPS for Apple silicon (device=mps).
What about multiple CPUs?
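For context, a minimal sketch of how the device argument is passed in the Ultralytics API (yolov8n.pt and coco8.yaml are placeholder names, adjust for your model and dataset):

```python
from ultralytics import YOLO

model = YOLO("yolov8n.pt")  # placeholder checkpoint

# device accepts a GPU index (0), a list of GPUs ([0, 1]), "cpu", or "mps"
model.train(data="coco8.yaml", epochs=3, device="cpu")
```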
1 Answer
You can use torch.set_num_threads(int) (docs) to control how many CPU threads PyTorch uses to execute operations.
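A minimal sketch of how that might look, assuming training runs in the same process and is launched after the thread settings (the thread counts are illustrative):

```python
import torch

# Must be called early, before any parallel work starts in the process.
torch.set_num_threads(8)          # intra-op threads: threads used inside a single operator
torch.set_num_interop_threads(2)  # inter-op threads: independent operators run concurrently

print(torch.get_num_threads(), torch.get_num_interop_threads())

# ...then start YOLO training with device="cpu" in this same process.
```

Setting the OMP_NUM_THREADS / MKL_NUM_THREADS environment variables before launching Python is another common way to cap the CPU threads PyTorch's backends use.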
1 Comment
Artem Lebedev
This did not work for me. I just tried it: training runs on 16 threads no matter what I set, and of those 16 only 8 actually compute.