What would be the fastest way to merge a list of numpy arrays into one array, given that the length of the list and the size of the arrays (the same for all of them) are known in advance?

I tried two approaches:

As you can see, vstack is faster, but for some reason the first run takes three times longer than the second. I assume this is caused by (missing) preallocation. So how would I preallocate an array for vstack? Or do you know a faster method?

Thanks!

[UPDATE]

I want (25280, 320), not (80, 320, 320), which means merged_array = array(list_of_arrays) won't work for me. Thanks Joris for pointing that out!

Output:

0.547468900681 s merged_array = array(first_list_of_arrays)
0.547191858292 s merged_array = array(second_list_of_arrays)
0.656183958054 s vstack first
0.236850976944 s vstack second

Code:

import numpy
import time
width = 320
height = 320
n_matrices=80

secondmatrices = list()
for i in range(n_matrices):
    temp = numpy.random.rand(height, width).astype(numpy.float32)
    secondmatrices.append(numpy.round(temp*9))

firstmatrices = list()
for i in range(n_matrices):
    temp = numpy.random.rand(height, width).astype(numpy.float32)
    firstmatrices.append(numpy.round(temp*9))


# Approach 1: numpy.array stacks the matrices along a new first axis.
t1 = time.time()
first1 = numpy.array(firstmatrices)
print(time.time() - t1, "s merged_array = array(first_list_of_arrays)")

t1 = time.time()
second1 = numpy.array(secondmatrices)
print(time.time() - t1, "s merged_array = array(second_list_of_arrays)")

# Approach 2: repeated vstack. Popping from the end and prepending keeps
# the original order; note that this empties the list.
t1 = time.time()
first2 = firstmatrices.pop()
for i in range(len(firstmatrices)):
    first2 = numpy.vstack((firstmatrices.pop(), first2))
print(time.time() - t1, "s vstack first")

t1 = time.time()
second2 = secondmatrices.pop()
for i in range(len(secondmatrices)):
    second2 = numpy.vstack((secondmatrices.pop(), second2))
print(time.time() - t1, "s vstack second")
  • Use timeit to do simple performance testing in Python. It produces more accurate results. Commented May 17, 2011 at 12:45
  • What dimensions do you want the merged array to have? Because first1 is (80, 320, 320) and first2 is (25280, 320). Commented May 17, 2011 at 13:02
  • @joris, thanks for pointing that out. I want the second one, which was my initial approach. I will change it in the question. Commented May 17, 2011 at 13:06
  • Then you need vstack instead of dstack from eumiro's answer. Commented May 17, 2011 at 13:10

1 Answer

You have 80 arrays of 320x320? Then you probably want to use dstack:

first3 = numpy.dstack(firstmatrices)

This returns a single 320x320x80 array, since dstack stacks along a third axis; numpy.array(firstmatrices) returns 80x320x320 instead. Timings for both:

timeit numpy.dstack(firstmatrices)
10 loops, best of 3: 47.1 ms per loop


timeit numpy.array(firstmatrices)
1 loops, best of 3: 750 ms per loop
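
The resulting shapes can be checked directly. Below is a minimal sketch that uses small, non-square arrays so the axes are easy to tell apart; the sizes and variable names are illustrative assumptions, not the ones from the question.

```python
import numpy

# Small illustrative sizes (the question uses 80 arrays of 320x320).
n_matrices, height, width = 4, 3, 5
matrices = [numpy.random.rand(height, width).astype(numpy.float32)
            for _ in range(n_matrices)]

stacked_depth = numpy.dstack(matrices)  # stacks along a new third axis
stacked_first = numpy.array(matrices)   # stacks along a new first axis
stacked_rows = numpy.vstack(matrices)   # concatenates along the existing first axis

print(stacked_depth.shape)  # (3, 5, 4)
print(stacked_first.shape)  # (4, 3, 5)
print(stacked_rows.shape)   # (12, 5)
```

So dstack and numpy.array hold the same data with the axes in a different order, while vstack gives the two-dimensional layout.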

If you want to use vstack, it will return a 25600x320 array:

timeit numpy.vstack(firstmatrices)
100 loops, best of 3: 18.2 ms per loop
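
On the preallocation question: calling vstack inside a loop copies the growing result on every iteration, whereas a single vstack call on the whole list builds the result in one go. If you want to preallocate explicitly, one option is to create the output with numpy.empty and copy each array into its slice. The following is only a sketch; merge_preallocated is a hypothetical helper name, and the timings it prints will depend on your machine and numpy version.

```python
import timeit
import numpy

width, height, n_matrices = 320, 320, 80
matrices = [numpy.round(numpy.random.rand(height, width).astype(numpy.float32) * 9)
            for _ in range(n_matrices)]

def merge_preallocated(arrays):
    """Copy equally shaped 2-D arrays into one preallocated result."""
    rows, cols = arrays[0].shape
    merged = numpy.empty((len(arrays) * rows, cols), dtype=arrays[0].dtype)
    for i, block in enumerate(arrays):
        # Each array fills its own band of rows in the output.
        merged[i * rows:(i + 1) * rows] = block
    return merged

merged_a = numpy.vstack(matrices)
merged_b = merge_preallocated(matrices)
print(merged_a.shape, numpy.array_equal(merged_a, merged_b))

# Compare both approaches with timeit rather than time.time().
print(timeit.timeit(lambda: numpy.vstack(matrices), number=20), "s for 20 x vstack")
print(timeit.timeit(lambda: merge_preallocated(matrices), number=20), "s for 20 x preallocated")
```

Both approaches leave the input list untouched, unlike the pop-based loop in the question.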

1 Comment

Hi eumiro, sorry my question was unclear. I actually need (25280, 320), not (80, 320, 320). See the update to my question.
