183

I have a dataframe with some columns like this:

A   B   C  
0   
4
5
6
7
7
6
5

The values in A can only range from 0 to 7.

Also, I have a list of 8 elements like this:

List = [2, 5, 6, 8, 12, 16, 26, 32]  # There are only 8 elements in this list

If the element in column A is n, I need to insert the nth element of List (counting from 0, as in the expected result below) in a new column, say 'D'.

How can I do this in one go without looping over the whole dataframe?

The resulting dataframe would look like this:

A   B   C   D
0           2
4           12
5           16
6           26
7           32
7           32
6           26
5           16

Note: The dataframe is huge and iteration is the last option. But I can also arrange the elements in 'List' in any other data structure, such as a dict, if necessary.

2
  • 1
    I think you need a (smaller) toy example, with the desired result. It sounds a little vague at the moment. Commented Oct 31, 2014 at 3:12
  • 39
    Never ever call a variable "List". In any language. Commented Jun 9, 2019 at 2:29

6 Answers

445

Just assign the list directly:

df['new_col'] = mylist

Alternative
Convert the list to a series or array and then assign:

se = pd.Series(mylist)
df['new_col'] = se.values

or

df['new_col'] = np.array(mylist)
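A minimal sketch of the direct assignment above, assuming a small made-up dataframe and list (the names `too_short` and `error_message` are illustrative only). It shows that a plain list is assigned by position and that its length has to match the number of rows:

```python
import pandas as pd

df = pd.DataFrame({"A": [0, 4, 5]})
mylist = [10, 20, 30]  # same length as df

# A plain list is assigned by position; the dataframe index plays no role.
df["new_col"] = mylist

# A list of the wrong length is rejected instead of being padded.
error_message = None
try:
    df["too_short"] = [1, 2]
except ValueError as exc:
    error_message = str(exc)

print(df)
print(error_message)
```

Note that this only works when the list already has one value per row; it does not look anything up based on column A.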

4 Comments

pykernel_launcher.py:1: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy """Entry point for launching an IPython kernel.
@sparrow will using pd.Series affect the dtype? I mean, will it leave floats as floats and strings as strings? Or will the elements within the list default to strings?
@IlyaRusin, it's a false positive which can be ignored in this case. For more info: stackoverflow.com/questions/20625582/…
This can be simplified to: df['new_col'] = pd.Series(mylist).values
61

IIUC, if you make your (unfortunately named) List into an ndarray, you can simply index into it naturally.

>>> import numpy as np
>>> m = np.arange(16)*10
>>> m[df.A]
array([  0,  40,  50,  60, 150, 150, 140, 130])
>>> df["D"] = m[df.A]
>>> df
    A   B   C    D
0   0 NaN NaN    0
1   4 NaN NaN   40
2   5 NaN NaN   50
3   6 NaN NaN   60
4  15 NaN NaN  150
5  15 NaN NaN  150
6  14 NaN NaN  140
7  13 NaN NaN  130

Here I built a new m, but if you use m = np.asarray(List), the same thing should work: the values in df.A will pick out the appropriate elements of m.


Note that if you're using an old version of numpy, you might have to use m[df.A.values] instead. In the past, numpy didn't play well with others, and some refactoring in pandas caused some headaches. Things have improved now.
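For reference, here is the same idea applied to the values from the question. This is a sketch under the assumption that every value in A is a valid position in the list; `lookup` is just an illustrative name for the converted list.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [0, 4, 5, 6, 7, 7, 6, 5]})
lookup = np.asarray([2, 5, 6, 8, 12, 16, 26, 32])

# Index the array with the whole column at once: each value in A
# selects the element at that position in lookup.
df["D"] = lookup[df["A"].to_numpy()]

print(df)
```

Passing the underlying array rather than the Series itself sidesteps the old numpy issue mentioned above.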

2 Comments

Hi @DSM. I get what you are saying but I am getting this error: Traceback (most recent call last): File "./b.py", line 24, in <module> d["D"] = m[d.A] IndexError: unsupported iterator index
@mane: urf, that's an old numpy bug. Does d["D"] = m[d.A.values] work for you?
20

A solution improving on the great one from @sparrow.

Let df be your dataset, and mylist the list with the values you want to add to the dataframe.

Let's suppose you want to call your new column simply new_column.

First make the list into a Series:

column_values = pd.Series(mylist)

Then use the insert function to add the column. This function has the advantage of letting you choose the position of the new column. In the following example we place the new column first from the left (by setting loc=0):

df.insert(loc=0, column='new_column', value=column_values)
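A small sketch of the same insert call on a dataframe whose index is not the default one. The labels "x", "y", "z" are made up for illustration. Because a Series is matched to the dataframe by index label, the Series is built on df.index here:

```python
import pandas as pd

df = pd.DataFrame({"A": [0, 4, 5]}, index=["x", "y", "z"])
mylist = [2, 12, 16]

# A Series with its own default index would not line up with x, y, z,
# so give it the dataframe's index explicitly.
column_values = pd.Series(mylist, index=df.index)

# loc=0 places the new column first from the left.
df.insert(loc=0, column="new_column", value=column_values)

print(df)
```

Passing the plain list as `value` is another option, since a list is assigned by position.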

1 Comment

This will not work if you changed the index of df to something other than the default 0, 1, 2, ... In that case you have to add this between the two lines: column_values.index = df.index
10

This is an old question, but I always try to use the fastest code!

I had a huge list of 69 million uint64 values. np.array() was fastest for me.

df['hashes'] = hashes
Time spent: 17.034842014312744

df['hashes'] = pd.Series(hashes).values
Time spent: 17.141014337539673

df['key'] = np.array(hashes)
Time spent: 10.724546194076538
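A rough way to repeat this kind of comparison on your own machine, sketched with timeit. The list size, the stand-in data, and the helper names `assign_list` and `assign_array` are assumptions for illustration; absolute timings depend on hardware, data, and library versions, so the ranking may not match the numbers above.

```python
import timeit

import numpy as np
import pandas as pd

hashes = list(range(200_000))  # stand-in for the real data
df = pd.DataFrame(index=range(len(hashes)))

def assign_list():
    # Assign the Python list directly.
    df["hashes"] = hashes

def assign_array():
    # Convert to an ndarray first, then assign.
    df["hashes"] = np.array(hashes)

t_list = timeit.timeit(assign_list, number=3)
t_array = timeit.timeit(assign_array, number=3)
print(f"list: {t_list:.3f}s  array: {t_array:.3f}s")
```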

8

First let's create the dataframe you had. I'll ignore columns B and C as they are not relevant.

import pandas as pd
df = pd.DataFrame({'A': [0, 4, 5, 6, 7, 7, 6, 5]})

And the mapping that you desire:

mapping = dict(enumerate([2,5,6,8,12,16,26,32]))

df['D'] = df['A'].map(mapping)

Done!

print(df)

Output:

   A   D
0  0   2
1  4  12
2  5  16
3  6  26
4  7  32
5  7  32
6  6  26
7  5  16
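One property of the map approach worth knowing, shown in a short sketch: a value that has no key in the mapping becomes NaN rather than raising an error. The value 9 below is a made-up out-of-range entry, not part of the original data.

```python
import pandas as pd

values = [2, 5, 6, 8, 12, 16, 26, 32]
mapping = dict(enumerate(values))  # {0: 2, 1: 5, ..., 7: 32}

s = pd.Series([0, 7, 9])  # 9 has no entry in the mapping
mapped = s.map(mapping)

print(mapped)
```

With the array-indexing approach, the same out-of-range value would raise an IndexError instead, so the choice also depends on how you want bad data to surface.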

4 Comments

I think the OP knows how to do this already. By my reading the issue is constructing D from the elements of A and List ("If the element in column A is n, I need to insert the n th element from the List in a new column, say 'D'.")
SO has turned into some kind of F(*& nanny state. Thanks to @DSM for the comment, but I couldn't correct the post until it was peer reviewed, and then it was rejected because it was too fast, and then I was able to peer review my own edit, and by then it was too late because a worse (IMHO) answer was "accepted". SO has really got some meta-nannies who are less than helpful!
Well, I can't speak for the nannies, but you'll find that your approach is about an order of magnitude slower on long arrays. In other respects, of course, choosing between np.array(List)[df.A] and df["A"].map(dict(enumerate(List))) is mostly a matter of preference.
Hi Phil, I only saw your solution and DSM's comment and then never got back to it since DSM's solution worked fine for me. But now looking at your solution, it works too. I have run DSM's solution on my dataset of about 200k entries and it runs in a couple of seconds with all the other calculations that I have. I am totally new to python-pandas and personally was not looking for anything elegant or great; whatever worked was fine. But honestly, thanks for the solution.
7

You can also use df.assign:

In [1559]: df
Out[1559]: 
   A   B   C
0  0 NaN NaN
1  4 NaN NaN
2  5 NaN NaN
3  6 NaN NaN
4  7 NaN NaN
5  7 NaN NaN
6  6 NaN NaN
7  5 NaN NaN

In [1560]: mylist = [2,5,6,8,12,16,26,32]

In [1567]: df = df.assign(D=mylist)

In [1568]: df
Out[1568]: 
   A   B   C   D
0  0 NaN NaN   2
1  4 NaN NaN   5
2  5 NaN NaN   6
3  6 NaN NaN   8
4  7 NaN NaN  12
5  7 NaN NaN  16
6  6 NaN NaN  26
7  5 NaN NaN  32
