Essentially, I'm following the example in the guide
(posting only the relevant part; the full example is at this repo):
embryos = [fertilising_room(population_model) for _ in 1:POPULATION_SIZE]
chunks = Iterators.partition(embryos, length(embryos) ÷ nthreads())
tasks = map(chunks) do chunk
    @spawn get_offspring(chunk)
end
all_offspring = vcat([fetch(task) for task in tasks]...)
@info "All offspring -> $(length(all_offspring))"
This is how long it takes with 1, 2, and 4 threads:
% time julia examples/multi_only_crossover.jl
Activating project at `~/Code/julia/BraveNewAlgorithm.jl`
WARNING: using Distances.pairwise in module BraveNewAlgorithm conflicts with an existing identifier.
[ Info: Number of threads -> 1
[ Info: Reading parameters file
[ Info: All offspring -> 1000000
julia examples/multi_only_crossover.jl 7,25s user 0,31s system 114% cpu 6,629 total
% time julia --threads 2 examples/multi_only_crossover.jl
Activating project at `~/Code/julia/BraveNewAlgorithm.jl`
WARNING: using Distances.pairwise in module BraveNewAlgorithm conflicts with an existing identifier.
[ Info: Number of threads -> 2
[ Info: Reading parameters file
[ Info: All offspring -> 1000000
julia --threads 2 examples/multi_only_crossover.jl 7,36s user 0,36s system 118% cpu 6,508 total
% time julia --threads 4 examples/multi_only_crossover.jl
Activating project at `~/Code/julia/BraveNewAlgorithm.jl`
WARNING: using Distances.pairwise in module BraveNewAlgorithm conflicts with an existing identifier.
[ Info: Number of threads -> 4
[ Info: Reading parameters file
[ Info: All offspring -> 1000000
julia --threads 4 examples/multi_only_crossover.jl 7,88s user 0,35s system 134% cpu 6,139 total
Am I doing something wrong here?
8 * ram_frequency * number_of_channel bytes/s for modern (DDR4/DDR5) RAM DIMMs. See this article to get information about RAM on Linux. RAM throughput is a common bottleneck in most parallel codes (especially with many threads).

But yes, profiling is critical here. The htop (or top) command is generally enough on Linux (or the task manager on Windows).
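As a first profiling step, it may help to time the serial and threaded phases separately inside Julia, since the shell's time also counts startup, package loading, and compilation. The following is only a rough sketch: make_embryo and process_chunk are hypothetical stand-ins for fertilising_room and get_offspring (the real ones are in the repo), and cld is used instead of ÷ so that there are never more chunks than threads.

```julia
using Base.Threads: @spawn, nthreads

# Hypothetical stand-ins for the functions in the post (assumptions, not the real code).
make_embryo() = rand(8)                                 # plays the role of fertilising_room(population_model)
process_chunk(chunk) = [sum(abs2, e) for e in chunk]    # plays the role of get_offspring(chunk)

function run_phases(n)
    # Phase 1: serial construction, like the comprehension in the post.
    # This part runs on a single thread no matter what --threads is set to.
    t_build = @elapsed embryos = [make_embryo() for _ in 1:n]

    # Phase 2: threaded processing, one task per chunk.
    local results
    t_work = @elapsed begin
        # cld rounds up, so this yields at most nthreads() chunks.
        chunks = Iterators.partition(embryos, cld(length(embryos), nthreads()))
        tasks = map(chunks) do chunk
            @spawn process_chunk(chunk)
        end
        results = reduce(vcat, fetch.(tasks))
    end

    return (t_build = t_build, t_work = t_work, n_results = length(results))
end

run_phases(1_000)                  # warm-up call so compilation is not measured
timings = run_phases(100_000)
@info "Phase timings (seconds)" threads=nthreads() build=timings.t_build work=timings.t_work
```

If build is much larger than work, extra threads cannot reduce the total by much, because only the second phase runs in parallel. If work itself does not shrink as threads are added, memory throughput or allocation pressure would be the next things to check.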