
Essentially I'm following the example in the multithreading guide.

(I'm posting only the relevant part; the full example is at this repo.)

using Base.Threads: @spawn, nthreads  # needed for @spawn and nthreads()

embryos = [fertilising_room(population_model) for _ in 1:POPULATION_SIZE]

chunks = Iterators.partition(embryos, length(embryos) ÷ nthreads())
tasks = map(chunks) do chunk
    @spawn get_offspring(chunk)
end
all_offspring = vcat([fetch(task) for task in tasks]...)

@info "All offspring -> $(length(all_offspring))"
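For reference, here is a minimal, self-contained sketch of the same chunked `@spawn`/`fetch` pattern with a stand-in workload (`fake_offspring`, `items`, and `chunk_size` are my own placeholders for illustration, not names from the original code):

```julia
using Base.Threads: @spawn, nthreads

# Stand-in for get_offspring: a purely CPU-bound function, so the
# spawn/fetch pattern can be tested independently of the real workload.
fake_offspring(chunk) = [x^2 for x in chunk]

items = collect(1:1_000)

# Guard against nthreads() > length(items), which would give a chunk size of 0.
chunk_size = max(1, length(items) ÷ nthreads())
chunks = Iterators.partition(items, chunk_size)

tasks = map(chunks) do chunk
    @spawn fake_offspring(chunk)
end
results = vcat([fetch(t) for t in tasks]...)

@assert length(results) == length(items)
```

Note that when `length(items)` is not divisible by `nthreads()`, `Iterators.partition` produces one extra, shorter chunk, so there can be more tasks than threads; that is harmless here since each task is independent.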

These are the timings I get:

 % time julia examples/multi_only_crossover.jl 
  Activating project at `~/Code/julia/BraveNewAlgorithm.jl`
WARNING: using Distances.pairwise in module BraveNewAlgorithm conflicts with an existing identifier.
[ Info: Number of threads -> 1
[ Info: Reading parameters file
[ Info: All offspring -> 1000000
julia examples/multi_only_crossover.jl  7,25s user 0,31s system 114% cpu 6,629 total
% time julia --threads 2 examples/multi_only_crossover.jl
  Activating project at `~/Code/julia/BraveNewAlgorithm.jl`
WARNING: using Distances.pairwise in module BraveNewAlgorithm conflicts with an existing identifier.
[ Info: Number of threads -> 2
[ Info: Reading parameters file
[ Info: All offspring -> 1000000
julia --threads 2 examples/multi_only_crossover.jl  7,36s user 0,36s system 118% cpu 6,508 total
% time julia --threads 4 examples/multi_only_crossover.jl
  Activating project at `~/Code/julia/BraveNewAlgorithm.jl`
WARNING: using Distances.pairwise in module BraveNewAlgorithm conflicts with an existing identifier.
[ Info: Number of threads -> 4
[ Info: Reading parameters file
[ Info: All offspring -> 1000000
julia --threads 4 examples/multi_only_crossover.jl  7,88s user 0,35s system 134% cpu 6,139 total

Am I doing something wrong here?

  • Your problem appears not to benefit from multithreading. It is probably limited by some other more fundamental and slower resource like memory management or disk IO. Profiling the code to see where it is actually spending its time may be useful. Might be worth checking if the Warning is trying to tell you something important too... Commented Jul 30 at 19:46
  • Does it still happen if you choose a less ambitious population size like 1000 or 10000? I suspect that you may be running into virtual memory limitations here unless you have a lot of RAM. Commented Jul 31 at 7:48
  • @MartinBrown I do have a lot of RAM... yes, it happens all across the board... Commented Jul 31 at 10:35
  • @jjmerelo This does not depend on the amount of RAM unless you actually run out of RAM and swap is used instead (which drastically reduces performance, even sequentially). If the code is memory-bound, then what matters is the memory throughput. The latter is basically 8 × ram_frequency × number_of_channels bytes/s for modern (DDR4/DDR5) RAM DIMMs. See this article to get information about RAM on Linux. RAM throughput is a common bottleneck in most parallel codes (especially with many threads). But yes, profiling is critical here. Commented Jul 31 at 17:12
  • @jjmerelo To find out whether you have enough available RAM (or whether the slow swap storage is being used instead), a basic htop (or top) command is generally enough on Linux (or the task manager on Windows). Commented Jul 31 at 17:14
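A quick way to test the memory-bound hypothesis the commenters raise is to measure how much of the runtime goes to allocation and garbage collection. This is a generic sketch (the workload `alloc_heavy` is a made-up placeholder, not the original code): allocation-heavy tasks typically scale poorly with threads because all threads compete for the allocator and for memory bandwidth.

```julia
# An intentionally allocation-heavy workload: each iteration allocates
# a fresh temporary array via rand(100).
alloc_heavy(n) = sum(sum(rand(100)) for _ in 1:n)

alloc_heavy(10)  # warm up to exclude compilation from the measurement

# @timed returns a NamedTuple with .time (s), .gctime (s) and .bytes allocated.
stats = @timed alloc_heavy(10_000)
println("elapsed: ", stats.time, " s, GC: ", stats.gctime,
        " s, allocated: ", stats.bytes, " bytes")
```

If `gctime` is a large fraction of `time`, or the allocated byte count is huge, threading the outer loop will not help much; restructuring `get_offspring` to allocate less (preallocated buffers, in-place operations) is the more promising direction.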
