In Julia, I created a Vector of Vectors named all_arrays like this:

using DataFrames
using StatsBase

list_of_numbers = 1:17

# Preallocate 1000 vectors of length 17
all_arrays = [zeros(Float64, 17) for i in 1:1000]
round = 1
while round != 1001
    # Draw 17 values from 1:17 (with replacement) and normalize them to sum to 1
    random_array = StatsBase.sample(1:17, length(list_of_numbers))
    random_array = random_array / sum(random_array)

    # Skip vectors that contain a zero or duplicate one already stored
    if (0.0 in random_array) || (random_array in all_arrays)
        continue
    end

    all_arrays[round] = random_array
    round += 1
    println(round)
end

The dimension of all_arrays is:

julia> size(all_arrays)
(1000,)

Then I want to convert all_arrays into a DataFrame with 1000×17 dimensions (note that each vector in all_arrays has shape (17,)). I tried it this way:

df = DataFrames.DataFrame(zeros(1000,17) , :auto)
for idx in 1:length(all_arrays)
    df[idx , :] = all_arrays[idx]
end

But I'm looking for a more straightforward way to do this, without a for loop and a prebuilt DataFrame. Is there one?

1 Answer

If you want simple code, use the following (it is about as long as the faster version below, but I find it conceptually simpler):

DataFrame(mapreduce(permutedims, vcat, all_arrays), :auto)

For data as small as you describe, this should be efficient enough.
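To see what the one-liner does, here is a minimal sketch on toy data. The vector vs and its values are made up for illustration and stand in for all_arrays; only Base functions are used.

```julia
# Toy stand-in for all_arrays: 3 vectors of length 2 (hypothetical values)
vs = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

# permutedims turns a length-2 vector into a 1×2 row matrix
row = permutedims(vs[1])

# vcat stacks those rows, so vector i becomes row i of the resulting matrix
m = mapreduce(permutedims, vcat, vs)

println(size(row))
println(size(m))
```

Passing the resulting matrix to DataFrame(m, :auto) then gives one column per vector element, with auto-generated column names.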

If you want something faster, use:

DataFrame([getindex.(all_arrays, i) for i in 1:17], :auto, copycols=false)
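The faster version builds the columns directly instead of stacking rows. A small sketch on made-up data (vs, ncols and the values are assumptions for illustration) shows that both constructions yield the same DataFrame:

```julia
using DataFrames

# Toy stand-in for all_arrays: 3 vectors of length 2 (hypothetical values)
vs = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
ncols = length(first(vs))

# getindex.(vs, i) broadcasts vs[k][i] over k, i.e. it extracts column i
cols = [getindex.(vs, i) for i in 1:ncols]

# copycols=false lets the DataFrame reuse these freshly built vectors
df_cols = DataFrame(cols, :auto, copycols=false)

# Row-stacking construction from the simpler snippet, for comparison
df_rows = DataFrame(mapreduce(permutedims, vcat, vs), :auto)

println(df_cols == df_rows)
```

Because each column is allocated exactly once and not copied again, this approach should need far fewer intermediate allocations than repeatedly concatenating rows.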

Here is a benchmark:

julia> using BenchmarkTools

julia> @btime DataFrame(mapreduce(permutedims, vcat, $all_arrays), :auto);
  7.257 ms (3971 allocations: 65.22 MiB)

julia> @btime DataFrame([getindex.($all_arrays, i) for i in 1:17], :auto, copycols=false);
  41.000 μs (88 allocations: 140.66 KiB)

Comments

Thanks! I specified that I want to avoid for loops and a prebuilt DataFrame; I was after a more straightforward way to do this, such as shorter code. The answer is precisely what I expected.
Sure, I just was not sure whether performance was your objective :). Performance is usually what Julia users ask for. I updated the answer.
Sure it is! I want higher performance when I avoid for loops! And let me tell you something: I wrote this code in Python and ran it on Colab, where it took more than 1 hour. In Julia, it took 0.7 seconds on my machine! It's fascinating!!!
Wait... if the reason to avoid for loops is performance, that is not correct. In Julia, for loops are (roughly) as efficient as vectorised code. Just use the form that is more intuitive for you or for the algorithm. There are still possible performance pitfalls in Julia, though. Have a look at these tips
A for loop is fast if it does not use global variables. The OP's solution used global variables, which slowed it down. One would need to either wrap everything in a function, use let, define the global variables as const, or (in Julia 1.8 or later) provide type assertions for them.
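As a rough sketch of the first option (wrapping the loop in a function), the row-by-row fill could look like this. The helper name rows_to_matrix and the toy data are hypothetical, and a plain Matrix is used instead of a DataFrame to keep the example dependency-free:

```julia
# Hypothetical helper: the loop lives inside a function, so m, i and v are
# local variables rather than untyped globals
function rows_to_matrix(vs)
    m = Matrix{Float64}(undef, length(vs), length(first(vs)))
    for (i, v) in enumerate(vs)
        m[i, :] = v   # copy vector i into row i
    end
    return m
end

# Toy stand-in for all_arrays (hypothetical values)
toy_arrays = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

m = rows_to_matrix(toy_arrays)
println(size(m))
```

The matrix can then be passed to DataFrame(m, :auto) if a DataFrame is needed.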