2

I have a Series s:

s = pd.Series([1, 2])

What is an efficient way to make s look like this?

0    [1]
1    [2]
dtype: object

4 Answers

4

Here's one approach that extracts the values into a NumPy array and extends it to 2D by introducing a new axis with None/np.newaxis, so that tolist() returns one single-element list per row:

pd.Series(s.values[:,None].tolist())

Here's a similar one that extends to 2D by reshaping instead:

pd.Series(s.values.reshape(-1,1).tolist())
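As a quick sanity check, here is a minimal sketch showing that both approaches give the same Series of single-element lists. The variable names via_newaxis and via_reshape are only illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2])

# Approach 1: add a trailing axis with None (an alias for np.newaxis),
# turning shape (2,) into shape (2, 1).
via_newaxis = pd.Series(s.values[:, None].tolist())

# Approach 2: reshape to a single column; -1 lets NumPy infer the row count.
via_reshape = pd.Series(s.values.reshape(-1, 1).tolist())

print(via_newaxis.tolist())  # [[1], [2]]
print(via_reshape.tolist())  # [[1], [2]]
print(via_newaxis.dtype)     # object
```

In both cases tolist() on the 2D array produces a list of one-element lists, which the Series constructor stores as Python objects.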

Runtime test using @P-robot's setup:

In [43]: s = pd.Series(np.random.randint(1,10,1000))

In [44]: %timeit pd.Series(np.vstack(s.values).tolist()) # @Nickil Maveli's soln
100 loops, best of 3: 5.77 ms per loop

In [45]: %timeit pd.Series([[a] for a in s]) # @P-robot's soln
1000 loops, best of 3: 412 µs per loop

In [46]: %timeit s.apply(lambda x: [x]) # @mgc's soln
1000 loops, best of 3: 551 µs per loop

In [47]: %timeit pd.Series(s.values[:,None].tolist()) # Approach1
1000 loops, best of 3: 307 µs per loop

In [48]: %timeit pd.Series(s.values.reshape(-1,1).tolist()) # Approach2
1000 loops, best of 3: 306 µs per loop

5 Comments

Much better approach compared to mine.
@Divakar For the sake of completeness you could add an "Approach3" with pd.Series(memoryview(s.values.reshape(-1,1)).tolist()), as building the Series on the memoryview should be even faster (at least it is with my current config, and it is probably more visible with a larger sample).
@mgc Is it only for Python 3? At my end with Python 2.7, I am getting this error: "NotImplementedError: tolist() only supports byte views", when trying to do: memoryview(s.values.reshape(-1,1)).tolist().
@Divakar You're right, it's Python 3 only. My bad, I don't use Python 2.7 that much. I added "with my config" (but I should have defined it: Python 3.5.2 / NumPy 1.11.1) because I cannot guarantee that to be the case across all versions (so in the end it might not be such a good candidate for an "Approach 3" in your answer!).
@mgc I still very much appreciate the feedback, and someone with Python 3 could try it out too!
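For anyone on Python 3 who wants to try the memoryview idea from the comments above, here is a minimal sketch. It assumes an integer Series and makes no claim about speed, which, as noted, may differ between versions.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(3))

# Reshape to one column, then expose the array through the buffer protocol.
column = s.values.reshape(-1, 1)

# On Python 3, memoryview.tolist() handles multi-dimensional buffers
# and returns nested Python lists.
nested = memoryview(column).tolist()

result = pd.Series(nested)
print(result.tolist())  # [[0], [1], [2]]
```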
2

If you want the result to still be a pandas Series, you can use the apply method:

In [1]: import pandas as pd

In [2]: s = pd.Series([1, 2])

In [3]: s.apply(lambda x: [x])
Out[3]: 
0    [1]
1    [2]
dtype: object
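One practical difference from rebuilding the Series out of a plain list is that apply keeps the original index. A minimal sketch, assuming a string index that is not part of the question and is used here only for illustration:

```python
import pandas as pd

# Hypothetical labelled index, chosen only to make the difference visible.
s = pd.Series([1, 2], index=["a", "b"])

# apply() returns a Series aligned to the original index.
wrapped = s.apply(lambda x: [x])

# Rebuilding from a list gives a default 0..n-1 index unless one is passed in.
rebuilt = pd.Series([[a] for a in s])
rebuilt_with_index = pd.Series([[a] for a in s], index=s.index)

print(wrapped.index.tolist())             # ['a', 'b']
print(rebuilt.index.tolist())             # [0, 1]
print(rebuilt_with_index.index.tolist())  # ['a', 'b']
```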


1

This does it, although the result is a 2D NumPy object array rather than a Series:

import numpy as np

np.array([[a] for a in s], dtype=object)
array([[1],
       [2]], dtype=object)
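If a Series is needed in the end, the array can be converted back. A minimal sketch, with arr and as_series as illustrative names:

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2])

# The list comprehension builds a 2D object array of shape (2, 1).
arr = np.array([[a] for a in s], dtype=object)
print(arr.shape)  # (2, 1)

# tolist() turns the rows back into Python lists, one per Series element.
as_series = pd.Series(arr.tolist())
print(as_series.tolist())  # [[1], [2]]
```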


1

Adjusting atomh33ls' answer, here's a series of lists:

output = pd.Series([[a] for a in s])
type(output)
>> pandas.core.series.Series
type(output[0])
>> list

Timings for a selection of the suggestions:

import numpy as np, pandas as pd
s = pd.Series(np.random.randint(1,10,1000))

>> %timeit pd.Series(np.vstack(s.values).tolist())
100 loops, best of 3: 3.2 ms per loop

>> %timeit pd.Series([[a] for a in s])
1000 loops, best of 3: 393 µs per loop

>> %timeit s.apply(lambda x: [x])
1000 loops, best of 3: 473 µs per loop
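To rerun a comparison like this outside IPython, here is a rough sketch using the standard timeit module. The candidates dictionary and the repeat count of 100 are arbitrary choices for illustration, and the numbers will vary by machine and library version.

```python
import timeit

import numpy as np
import pandas as pd

s = pd.Series(np.random.randint(1, 10, 1000))

# Each candidate is a zero-argument callable returning a Series of lists.
candidates = {
    "list comprehension": lambda: pd.Series([[a] for a in s]),
    "apply": lambda: s.apply(lambda x: [x]),
    "newaxis": lambda: pd.Series(s.values[:, None].tolist()),
    "reshape": lambda: pd.Series(s.values.reshape(-1, 1).tolist()),
}

expected = [[x] for x in s.tolist()]

for name, func in candidates.items():
    # Check that every candidate produces the same content before timing it.
    assert func().tolist() == expected
    seconds = timeit.timeit(func, number=100)
    print(f"{name}: {seconds / 100 * 1e6:.0f} µs per call")
```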

