Suppose I have a text file that contains data like this:

1  2  3  4  5
6  7  8 
9  10 11 12 13 14
15 16 17 18 19

How do I load it into a numpy array so it looks like this?

[1  2  3  4  5  0
 6  7  8  0  0  0
 9  10 11 12 13 14
 15 16 17 18 19 0 ]

The method I've been using so far involves reading the text file line by line, appending each row to a list, finding the row with the maximum length and padding the remaining rows accordingly.

Could anyone suggest a more efficient way?

Thank you very much!

1 Answer

Padding a list of lists can be done in various ways, but since you are already reading the rows from a file, I think itertools.zip_longest is a good place to start.

In [201]: txt = """1  2  3  4  5
     ...: 6  7  8 
     ...: 9  10 11 12 13 14
     ...: 15 16 17 18 19"""

Read and parse the text lines:

In [202]: alist = []
In [203]: for line in txt.splitlines():
     ...:     alist.append([int(i) for i in line.split()])
     ...:     
In [204]: alist
Out[204]: [[1, 2, 3, 4, 5], [6, 7, 8], [9, 10, 11, 12, 13, 14], [15, 16, 17, 18, 19]]

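zip_longest (shown here in its Python 3 form; Python 2 calls it izip_longest) takes a fillvalue. It yields columns rather than rows, so the result is transposed after conversion to an array (np is numpy, imported with import numpy as np):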

In [205]: from itertools import zip_longest
In [206]: list(zip_longest(*alist, fillvalue=0))
Out[206]: 
[(1, 6, 9, 15),
 (2, 7, 10, 16),
 (3, 8, 11, 17),
 (4, 0, 12, 18),
 (5, 0, 13, 19),
 (0, 0, 14, 0)]
In [207]: np.array(_).T
Out[207]: 
array([[ 1,  2,  3,  4,  5,  0],
       [ 6,  7,  8,  0,  0,  0],
       [ 9, 10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19,  0]])
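
The parse and zip_longest steps above can be wrapped in one helper. This is only a sketch: the name load_padded is made up for illustration, and io.StringIO stands in for the real file, whose name the question does not give.

```python
from io import StringIO
from itertools import zip_longest

import numpy as np


def load_padded(fileobj, fillvalue=0):
    """Read whitespace-separated integers, padding short rows with fillvalue."""
    # Parse each non-blank line into a list of ints.
    rows = [[int(tok) for tok in line.split()] for line in fileobj if line.strip()]
    # zip_longest pads the short rows but yields columns, so transpose back.
    columns = list(zip_longest(*rows, fillvalue=fillvalue))
    return np.array(columns).T


# StringIO stands in for an opened text file (hypothetical input).
sample = StringIO("1  2  3  4  5\n6  7  8 \n9  10 11 12 13 14\n15 16 17 18 19\n")
padded = load_padded(sample)
print(padded)
```

Passing an open file object instead of the StringIO should work the same way, since both are iterated line by line.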

zip(*) can also be used to 'transpose' the padded list of tuples from In [206] (assumed here to have been saved as alist1):

In [209]: list(zip(*alist1))
Out[209]: 
[(1, 2, 3, 4, 5, 0),
 (6, 7, 8, 0, 0, 0),
 (9, 10, 11, 12, 13, 14),
 (15, 16, 17, 18, 19, 0)]
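
For reference, the pad-then-transpose idea can be written without numpy at all. A minimal sketch, where padded_rows is just an illustrative name:

```python
from itertools import zip_longest

alist = [[1, 2, 3, 4, 5], [6, 7, 8], [9, 10, 11, 12, 13, 14], [15, 16, 17, 18, 19]]

# First pass pads the short rows and yields columns;
# the second zip(*) transposes the columns back into rows.
columns = zip_longest(*alist, fillvalue=0)
padded_rows = [list(row) for row in zip(*columns)]
print(padded_rows)
```

The result is a plain list of equal-length lists, which np.array can turn into a 2d array directly.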

I'm guessing you are doing something more like:

In [211]: maxlen = max([len(i) for i in alist])
In [212]: maxlen
Out[212]: 6
In [213]: arr = np.zeros((len(alist), maxlen),int)
In [214]: for row, line in zip(arr, alist):
     ...:     row[:len(line)] = line
     ...:     
In [215]: arr
Out[215]: 
array([[ 1,  2,  3,  4,  5,  0],
       [ 6,  7,  8,  0,  0,  0],
       [ 9, 10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19,  0]])

Which looks pretty good to me.
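
If you want that loop as a reusable function, something like the following should do. The name pad_rows and the fillvalue/dtype parameters are my own additions, not part of the session above.

```python
import numpy as np


def pad_rows(rows, fillvalue=0, dtype=int):
    """Copy ragged rows into a preallocated 2d array, padding on the right."""
    # default=0 keeps max() from failing on an empty list of rows.
    maxlen = max((len(r) for r in rows), default=0)
    out = np.full((len(rows), maxlen), fillvalue, dtype=dtype)
    # Iterating over a 2d array yields row views, so slice assignment
    # writes straight into out.
    for out_row, r in zip(out, rows):
        out_row[:len(r)] = r
    return out


alist = [[1, 2, 3, 4, 5], [6, 7, 8], [9, 10, 11, 12, 13, 14], [15, 16, 17, 18, 19]]
arr_minus1 = pad_rows(alist, fillvalue=-1)
print(arr_minus1)
```

Using np.full instead of np.zeros makes it possible to pad with a value other than 0, which helps when 0 is a legitimate data value.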

A regular poster, Divakar, likes to post a solution that uses cumsum. Let's see if I can reproduce it. It involves constructing a 1d mask that marks where the values from the lists are supposed to go. Working backwards from the finished arr (this shortcut only works here because none of the data values is 0), we need a mask like:

In [240]: mask=arr.ravel()>0
In [241]: mask
Out[241]: 
array([ True,  True,  True,  True,  True, False,  True,  True,  True,
       False, False, False,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True, False], dtype=bool)

So that:

In [242]: arr.flat[mask] = np.hstack(alist)

There's a trick to this mapping that I haven't quite internalized!


The trick is to broadcast the row lengths against [0,1,2,3,4,5]:

In [276]: lens=[len(i) for i in alist]
In [277]: maxlen=max(lens)
In [278]: mask=np.array(lens)[:,None]>np.arange(maxlen)
In [279]: mask
Out[279]: 
array([[ True,  True,  True,  True,  True, False],
       [ True,  True,  True, False, False, False],
       [ True,  True,  True,  True,  True,  True],
       [ True,  True,  True,  True,  True, False]], dtype=bool)
In [280]: arr = np.zeros((len(alist), maxlen),int)
In [281]: arr[mask] = np.hstack(alist)
In [282]: arr
Out[282]: 
array([[ 1,  2,  3,  4,  5,  0],
       [ 6,  7,  8,  0,  0,  0],
       [ 9, 10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19,  0]])
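
Because this mask is built from the row lengths rather than from the values, it also works when the data itself contains zeros. A sketch under the assumption that there is at least one row; pad_with_mask and the sample rows are made up for illustration:

```python
import numpy as np


def pad_with_mask(rows, fillvalue=0):
    """Pad ragged integer rows using a broadcast length mask (needs >= 1 row)."""
    lens = np.array([len(r) for r in rows])
    # Column vector of lengths vs. row vector of column indices:
    # mask[i, j] is True where column j lies inside row i.
    mask = lens[:, None] > np.arange(lens.max())
    out = np.full(mask.shape, fillvalue, dtype=int)
    # Boolean assignment fills the True slots in row-major order,
    # which matches the order of the concatenated rows.
    out[mask] = np.concatenate(rows)
    return out


rows_with_zero = [[0, 2], [3], [4, 0, 6]]
result = pad_with_mask(rows_with_zero, fillvalue=-1)
print(result)
```

The zeros in the input stay in place and only the positions past the end of each row receive the fill value.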

1 Comment

This worked! Thank you very much for answering my question! I wasn't aware that itertools had the zip_longest function! You were right, I was doing something similar to what you mentioned. I'm happy to learn something new!
