I have a data set like this:

data = np.array([[ 5, 31, 61],
                 [ 10, 31, 67],
                 [ 15, 31, 69],
                 [ 4, 31, 72],
                 [ 14, 31, 73],
                 [ 21, 31, 77],
                 [ 19, 31, 78]])

I want to convert it into a list of arrays, one array per row. I tried:

np.split(data,len(data))

#[array([[ 5, 31, 61]]),
# array([[10, 31, 67]]),
# array([[15, 31, 69]]),
# array([[ 4, 31, 72]]),
# array([[14, 31, 73]]),
# array([[21, 31, 77]]),
# array([[19, 31, 78]])]

But as you can see, each element is still a two-dimensional array (note the double [[ in the output). What I want is simply:

[np.array([5, 31, 61]),
np.array([10, 31, 67]),
np.array([15, 31, 69]),
np.array([4, 31, 72]),
np.array([14, 31, 73]),
np.array([21, 31, 77]),
np.array([19, 31, 78])]

Comments

  • Why do you need this format? You can simply use a loop to get what you need: [data[i] for i in range(len(data))] Commented Feb 10, 2022 at 6:57
  • The format is required by a script I use elsewhere. I usually avoid loops in Python, which is why I tried np.split. But sometimes a loop is the right tool, since it does the job simply here. Thanks for the answer. Commented Feb 10, 2022 at 7:04
  • np.split loops internally too; it builds a list. Have you tried list(data)? Commented Feb 10, 2022 at 7:31
  • np.split works too, but it does not seem to be optimized for OP's case. [*data] is the fastest option. Commented Feb 10, 2022 at 13:56

2 Answers

What about taking advantage of unpacking?

lst = [*data]

or:

lst = list(data)

output:

[array([ 5, 31, 61]),
 array([10, 31, 67]),
 array([15, 31, 69]),
 array([ 4, 31, 72]),
 array([14, 31, 73]),
 array([21, 31, 77]),
 array([19, 31, 78])]
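
As a quick sanity check, here is a minimal sketch (the variable names `rows_unpacked` and `rows_list` are my own, and the data is shortened to three rows) showing that both forms return a plain list of one-dimensional arrays. It also shows that the elements are views on the original array rather than copies, which is worth knowing if the downstream script modifies them.

```python
import numpy as np

data = np.array([[5, 31, 61],
                 [10, 31, 67],
                 [15, 31, 69]])

rows_unpacked = [*data]   # iterating a 2-D array yields its rows
rows_list = list(data)    # same result via the list constructor

print(type(rows_unpacked), len(rows_unpacked))
print(rows_unpacked[0].shape)                # each element is 1-D: (3,)
print(np.shares_memory(rows_list[0], data))  # rows are views, not copies
```

If independent copies are needed, something like `[row.copy() for row in data]` would be one way to get them.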

np.split can be used too, but to get one-dimensional pieces you have to apply it to one-dimensional data. So you might like to create a one-dimensional view of your data first:

%%timeit
data_ravel = data.ravel()
out = np.split(data_ravel, len(data))
>>> 14.5 µs ± 337 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

Note that creating the view is practically free (it took 0.13 µs on my computer).
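
To make the view claim concrete, here is a small sketch. The array built with `np.arange` is only a stand-in for the 7x3 data in the question; `ravel()` returns a view only when the array is contiguous in memory, which is the case here.

```python
import numpy as np

data = np.arange(21).reshape(7, 3)   # stand-in for the 7x3 data in the question

data_ravel = data.ravel()
print(np.shares_memory(data_ravel, data))   # True for a contiguous array

parts = np.split(data_ravel, len(data))
print(parts[0].shape)                       # (3,)

# For contrast: splitting the 2-D array directly keeps a leading axis
print(np.split(data, len(data))[0].shape)   # (1, 3)
```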

Internally, np.split does roughly the following; written out by hand it looks like this:

out = []
div_points = range(0, data.size+1, data.shape[1])
start = div_points[:-1]
end = div_points[1:]
out = list(data_ravel[i:j] for i,j in zip(start, end))
>>> 2.31 µs ± 44.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

Note that it's faster because I'm doing a couple of optimisations here:

  • using range instead of np.array for the split points
  • using a generator expression passed to list() instead of repeated list.append calls
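
For anyone who wants to reuse the hand-written split, here is a self-contained sketch. The helper name `split_rows` is hypothetical, it uses a list comprehension rather than `list()` around a generator, and it assumes a 2-D array with at least one column.

```python
import numpy as np

def split_rows(arr):
    """Hypothetical helper: split a 2-D array into a list of 1-D row views."""
    flat = arr.ravel()                        # view if arr is contiguous
    n_cols = arr.shape[1]
    bounds = range(0, arr.size + 1, n_cols)   # 0, n_cols, 2*n_cols, ...
    # Pair each start offset with the next one to slice out one row at a time
    return [flat[i:j] for i, j in zip(bounds[:-1], bounds[1:])]

data = np.arange(21).reshape(7, 3)   # stand-in for the data in the question
rows = split_rows(data)
print(len(rows), rows[0].shape)
```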

However, it can't compete with the plain approaches in @mozway's answer, which were the fastest in my tests:

%%timeit
out = [*data]
>>> 902 ns ± 8.09 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

%%timeit
out = list(data)
>>> 979 ns ± 12.1 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

%%timeit
out = [n for n in data]
>>> 1.04 µs ± 14.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

%%timeit
out = list(n for n in data)
>>> 1.37 µs ± 80.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
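
The `%%timeit` cells above only run in IPython or Jupyter. As a rough sketch, the same comparison can be reproduced in a plain script with the standard `timeit` module. The names `candidates` and `timings` and the repeat count are my own choices, the array is a stand-in for the question's data, and absolute numbers will differ by machine and NumPy version.

```python
import timeit
import numpy as np

data = np.arange(21).reshape(7, 3)   # stand-in for the data in the question

# Each candidate is a zero-argument callable so timeit can call it directly
candidates = {
    "[*data]": lambda: [*data],
    "list(data)": lambda: list(data),
    "[n for n in data]": lambda: [n for n in data],
    "list(n for n in data)": lambda: list(n for n in data),
    "np.split on ravel": lambda: np.split(data.ravel(), len(data)),
}

n_runs = 20_000
timings = {}
for label, fn in candidates.items():
    total_seconds = timeit.timeit(fn, number=n_runs)
    timings[label] = total_seconds / n_runs   # average seconds per call

# Print the candidates from fastest to slowest
for label, per_call in sorted(timings.items(), key=lambda kv: kv[1]):
    print(f"{label:<24} {per_call * 1e9:10.0f} ns per call")
```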
