
Suppose I have a list that contains lists of unequal length.

a = [ [ 1, 2, 3], [2], [2, 4] ]

What is the best way to obtain a zero-padded numpy array with a regular (rectangular) shape?

zero_a = [ [1, 2, 3], [2, 0, 0], [2, 4, 0] ]

I know I can use list operations like

n = max(map(len, a))
for x in a:
    x.extend([0] * (n - len(x)))  # pad each row in place up to length n
zero_a = np.array(a)

but I was wondering whether there is an easy numpy way to do this.

  • Have you made any attempts? Commented Nov 9, 2013 at 16:29
  • @megawac I updated my question. I am trying to find an alternative numpy method. Commented Nov 9, 2013 at 16:38
  • +1 to the question because I've wanted something like this before myself, and couldn't think of anything clean enough. (I sometimes use pd.DataFrame(a).fillna(0).values, but I've been on a pandas kick for a while. There should really be something numpy-native.) Commented Nov 9, 2013 at 17:01
  • There is a pad function in numpy 1.7. Commented Nov 9, 2013 at 17:08
  • @alko: true, but the first thing it does is call narray = np.array(array) on the argument, which if it's a list of lists with varying lengths will give us an array with dtype=object and lists as elements. It's good for padding existing ndarrays, but I can't see how to get it to help here. Commented Nov 9, 2013 at 17:15
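
The np.pad idea from the comments can still be applied one row at a time instead of to the whole nested list. A minimal sketch, assuming the list is non-empty and every row is a flat, non-empty list of numbers; the helper name pad_rows is made up for illustration:

```python
import numpy as np

def pad_rows(rows, fill=0):
    # Hypothetical helper: right-pad each row to the length of the longest row.
    width = max(len(row) for row in rows)
    padded = [
        np.pad(row, (0, width - len(row)), mode="constant", constant_values=fill)
        for row in rows
    ]
    return np.array(padded)

a = [[1, 2, 3], [2], [2, 4]]
zero_a = pad_rows(a)
print(zero_a)
```

This still builds one small array per row, so it is a convenience rather than a vectorized solution.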

2 Answers


Since numpy has to know the size of an array before it is initialized, the best solution would be a numpy-based constructor for this case. Sadly, as far as I know, there is none.

Probably not ideal, but a slightly faster solution is to create a numpy array of zeros and fill it with the list values.

import numpy as np

def pad_list(lst):
    # Pad every inner list in place with zeros, then convert.
    # Note: this mutates the input lists.
    inner_max_len = max(map(len, lst))
    for x in lst:
        x.extend([0] * (inner_max_len - len(x)))
    return np.array(lst)

def apply_to_zeros(lst, dtype=np.int64):
    # Allocate the final zero-filled array first, then copy values into it.
    inner_max_len = max(map(len, lst))
    result = np.zeros([len(lst), inner_max_len], dtype)
    for i, row in enumerate(lst):
        for j, val in enumerate(row):
            result[i, j] = val
    return result

Test case:

>>> pad_list([[ 1, 2, 3], [2], [2, 4]])
array([[1, 2, 3],
       [2, 0, 0],
       [2, 4, 0]])

>>> apply_to_zeros([[ 1, 2, 3], [2], [2, 4]])
array([[1, 2, 3],
       [2, 0, 0],
       [2, 4, 0]])

Performance:

>>> timeit.timeit('from __main__ import pad_list as f; f([[ 1, 2, 3], [2], [2, 4]])', number = 10000)
0.3937079906463623
>>> timeit.timeit('from __main__ import apply_to_zeros as f; f([[ 1, 2, 3], [2], [2, 4]])', number = 10000)
0.1344289779663086
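
The zeros-then-fill idea can also be written without the inner Python loop by using a boolean mask. This is only a sketch, not something timed here: the name pad_with_mask is hypothetical, and it assumes a non-empty list whose rows are flat, non-empty lists of numbers.

```python
import numpy as np

def pad_with_mask(lst, fill=0, dtype=np.int64):
    # Hypothetical helper: length of every row, as a column vector for broadcasting.
    lengths = np.array([len(row) for row in lst])
    # mask[i, j] is True where row i actually has a j-th element.
    mask = np.arange(lengths.max()) < lengths[:, None]
    result = np.full(mask.shape, fill, dtype=dtype)
    # Boolean assignment fills the True cells in row-major order,
    # which matches the order of the flattened input.
    result[mask] = np.concatenate(lst)
    return result

padded = pad_with_mask([[1, 2, 3], [2], [2, 4]])
print(padded)
```

Whether this beats the explicit loops depends on the number and length of the rows, so it is worth timing on your own data.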

Not strictly a function from numpy, but you could do something like this (Python 2):

from itertools import izip, izip_longest
import numpy
a=[[1,2,3], [4], [5,6]]
res1 = numpy.array(list(izip(*izip_longest(*a, fillvalue=0))))

or, alternatively:

res2=numpy.array(list(izip_longest(*a, fillvalue=0))).transpose()

If you use Python 3, use the built-in zip and itertools.zip_longest instead.
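
For reference, a Python 3 version of the second variant might look like this (a sketch using the same sample list; zip_longest pads the shorter rows with fillvalue, and the transpose restores the original row orientation):

```python
from itertools import zip_longest

import numpy as np

a = [[1, 2, 3], [4], [5, 6]]
# zip_longest(*a) yields columns: (1, 4, 5), (2, 0, 6), (3, 0, 0)
columns = list(zip_longest(*a, fillvalue=0))
# Transposing turns the columns back into padded rows.
res = np.array(columns).T
print(res)
```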

1 Comment

nice solution, but ties with manual padding on my machine (as expected -- key downside is generation of new list)
