Add column in dataframe from list

Question

I have a dataframe with some columns like this:

The values in A can only range from 0 to 7.

Also, I have a list of 8 elements like this:

List = [2, 5, 6, 8, 12, 16, 26, 32]  # There are only 8 elements in this list

If the element in column A is n, I need to insert the nth element from the List in a new column, say 'D'.

How can I do this in one go without looping over the whole dataframe?

The resulting dataframe would look like this:

A   B   C   D
0           2
4           12
5           16
6           26
7           32
7           32
6           26
5           16

Note: The dataframe is huge and iteration is the last option. But I can also arrange the elements in 'List' in any other data structure, like a dict, if necessary.

I think you need a (smaller) toy example, with the desired result. It sounds a little vague atm.
– Andy Hayden, Commented Oct 31, 2014 at 3:12

sparrow · Accepted Answer · 2019-04-11 19:53:55Z

445

Just assign the list directly:

df['new_col'] = mylist

Alternative
Convert the list to a series or array and then assign:

se = pd.Series(mylist)
df['new_col'] = se.values

or

df['new_col'] = np.array(mylist)
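A minimal sketch of the assignments above, assuming a toy `df` and `mylist` (illustrative data, not taken from the question). It also suggests why the Series variant uses `.values`: a list or array is assigned by position and must have the same length as the dataframe, while a Series is aligned on index labels first.

```python
import numpy as np
import pandas as pd

# Toy frame with a non-default index (illustrative data, not from the question).
df = pd.DataFrame({"A": [0, 4, 5]}, index=[10, 11, 12])
mylist = [2, 12, 16]

# Positional assignment: the list length must equal len(df).
df["from_list"] = mylist
df["from_array"] = np.array(mylist)

# A Series is aligned on index labels, so labels 0..2 do not match 10..12
# and every row of this column becomes NaN.
df["from_series"] = pd.Series(mylist)

# Stripping the index restores positional assignment.
df["from_values"] = pd.Series(mylist).to_numpy()

print(df)
```

With the default 0..n-1 index all four variants give the same column, so the difference only shows up after filtering, sorting, or setting a custom index.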

edited Apr 11, 2019 at 19:53

answered Jul 20, 2016 at 20:58

sparrow

11.6k · 12 gold badges · 61 silver badges · 76 bronze badges

4 Comments

franchb Over a year ago

pykernel_launcher.py:1: SettingWithCopyWarning:  A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead  See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy   """Entry point for launching an IPython kernel.

3kstc Over a year ago

@sparrow will using pd.Series affect the dtype? I mean, will it leave floats as floats and strings as strings? Or will the elements within the list default to strings?

sparrow Over a year ago

@IlyaRusin, it's a false positive which can be ignored in this case. For more info: stackoverflow.com/questions/20625582/…

smartse Over a year ago

This can be simplified to: df['new_col'] = pd.Series(mylist).values

edge-case · Accepted Answer · 2018-09-26 13:36:11Z

61

IIUC, if you make your (unfortunately named) List into an ndarray, you can simply index into it naturally.

>>> import numpy as np
>>> m = np.arange(16)*10
>>> m[df.A]
array([  0,  40,  50,  60, 150, 150, 140, 130])
>>> df["D"] = m[df.A]
>>> df
    A   B   C    D
0   0 NaN NaN    0
1   4 NaN NaN   40
2   5 NaN NaN   50
3   6 NaN NaN   60
4  15 NaN NaN  150
5  15 NaN NaN  150
6  14 NaN NaN  140
7  13 NaN NaN  130

Here I built a new m, but if you use m = np.asarray(List), the same thing should work: the values in df.A will pick out the appropriate elements of m.

Note that if you're using an old version of numpy, you might have to use m[df.A.values] instead; in the past, numpy didn't play well with others, and some refactoring in pandas caused some headaches. Things have improved now.
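Applied to the data from the question, the same indexing idea might look like the sketch below. The frame is rebuilt here with only column A, and the name `lookup` is an assumption for illustration. `to_numpy()` needs a reasonably recent pandas; on older versions `.values` plays the same role.

```python
import numpy as np
import pandas as pd

# Column A from the question; B and C are left out because they play no role.
df = pd.DataFrame({"A": [0, 4, 5, 6, 7, 7, 6, 5]})
List = [2, 5, 6, 8, 12, 16, 26, 32]

lookup = np.asarray(List)

# Integer-array indexing: each value n in column A selects lookup[n],
# for the whole column at once and without a Python-level loop.
df["D"] = lookup[df["A"].to_numpy()]

print(df)
```

Any value in A outside 0..7 would raise an IndexError here (or wrap around if negative), so this relies on the range guarantee stated in the question.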

edited Sep 26, 2018 at 13:36

edge-case

1,364 · 2 gold badges · 17 silver badges · 33 bronze badges

answered Oct 31, 2014 at 3:18

DSM

355k · 67 gold badges · 606 silver badges · 504 bronze badges

2 Comments

mane Over a year ago

Hi @DSM. I get what you are saying but I am getting this error: Traceback (most recent call last): File "./b.py", line 24, in <module> d["D"] = m[d.A] IndexError: unsupported iterator index

DSM Over a year ago

@mane: urf, that's an old numpy bug. Does d["D"] = m[d.A.values] work for you?

erip · Accepted Answer · 2018-07-10 14:21:52Z

20

A solution improving on the great one from @sparrow.

Let df be your dataset and mylist the list of values you want to add to the dataframe.

Let's suppose you want to call your new column simply new_column.

First make the list into a Series:

column_values = pd.Series(mylist)

Then use the insert function to add the column. This function has the advantage of letting you choose the position of the new column. In the following example we place the new column first from the left (by setting loc=0):

df.insert(loc=0, column='new_column', value=column_values)
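A small runnable sketch of `insert`, assuming a toy frame with string index labels (the data is illustrative). Here the plain list is passed as `value`, which keeps the assignment positional; a Series passed as `value` would be aligned on its index labels instead.

```python
import pandas as pd

# Toy frame with a non-default index (illustrative data).
df = pd.DataFrame({"A": [0, 4, 5]}, index=["x", "y", "z"])
mylist = [2, 12, 16]

# insert modifies df in place and returns None.
# loc=0 puts the new column first from the left.
df.insert(loc=0, column="new_column", value=mylist)

print(df)
```

Note that `insert` raises a ValueError if a column with that name already exists, unless `allow_duplicates=True` is passed.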

edited Jul 10, 2018 at 14:21

erip

17.1k · 11 gold badges · 73 silver badges · 131 bronze badges

answered Dec 7, 2017 at 11:39

Salvatore Cosentino

7,350 · 6 gold badges · 19 silver badges · 25 bronze badges

1 Comment

Guy s Over a year ago

This will not work if you changed the index of df to something other than 1,2,3... In that case you have to add this between the lines: column_values.index=df.index

Mehdi · Accepted Answer · 2019-10-17 11:52:35Z

10

Old question, but I always try to use the fastest code!

I had a huge list of 69 million uint64 values. np.array() was the fastest for me.

df['hashes'] = hashes
Time spent: 17.034842014312744

df['hashes'] = pd.Series(hashes).values
Time spent: 17.141014337539673

df['key'] = np.array(hashes)
Time spent: 10.724546194076538
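To repeat this kind of comparison on your own machine, something like the sketch below could be used. It is an assumption about how such timings might be taken, not the code behind the numbers above: the list is a small stand-in of 100,000 Python ints, and the function names are made up for illustration. Results will vary with data size, dtype, and library versions.

```python
import timeit

import numpy as np
import pandas as pd

# Small stand-in data; the answer above used about 69 million uint64 values.
n = 100_000
hashes = list(range(n))
df = pd.DataFrame(index=range(n))


def assign_list():
    df["hashes"] = hashes


def assign_series_values():
    df["hashes"] = pd.Series(hashes).values


def assign_array():
    df["hashes"] = np.array(hashes)


# Time each variant over a few repetitions and report the total.
for fn in (assign_list, assign_series_values, assign_array):
    seconds = timeit.timeit(fn, number=5)
    print(f"{fn.__name__}: {seconds:.4f} s for 5 runs")
```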

answered Oct 17, 2019 at 11:52

Mehdi

1,197 · 16 silver badges · 14 bronze badges

Toby Seo · Accepted Answer · 2014-10-31 04:04:44Z

8

First let's create the dataframe you had, I'll ignore columns B and C as they are not relevant.

import pandas as pd
df = pd.DataFrame({'A': [0, 4, 5, 6, 7, 7, 6, 5]})

And the mapping that you desire:

mapping = dict(enumerate([2,5,6,8,12,16,26,32]))

df['D'] = df['A'].map(mapping)

Done!

print(df)

Output:

   A   D
0  0   2
1  4  12
2  5  16
3  6  26
4  7  32
5  7  32
6  6  26
7  5  16
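For a self-contained run of the mapping approach, the sketch below repeats the steps with the question's data. The names `values` and `unmapped` are assumptions for illustration. It also shows one behaviour worth knowing about `map` with a dict: a key that is missing from the mapping produces NaN rather than raising an error.

```python
import pandas as pd

df = pd.DataFrame({"A": [0, 4, 5, 6, 7, 7, 6, 5]})
values = [2, 5, 6, 8, 12, 16, 26, 32]

# enumerate pairs each position with its value: {0: 2, 1: 5, ..., 7: 32}.
mapping = dict(enumerate(values))
df["D"] = df["A"].map(mapping)

# 99 is not a key in the mapping, so that entry becomes NaN
# and the resulting Series is upcast to float.
unmapped = pd.Series([0, 99]).map(mapping)

print(df)
print(unmapped)
```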

edited Oct 31, 2014 at 4:04

Toby Seo

517 · 2 gold badges · 5 silver badges · 14 bronze badges

answered Oct 31, 2014 at 3:36

Phil Cooper

5,887 · 1 gold badge · 27 silver badges · 41 bronze badges

4 Comments

DSM Over a year ago

I think the OP knows how to do this already. By my reading the issue is constructing D from the elements of A and List ("If the element in column A is n, I need to insert the n th element from the List in a new column, say 'D'.")

Phil Cooper Over a year ago

SO has turned into some kind of F(*& nanny state. Thanks to @DSM for the comment, but I couldn't correct the post until it was peer reviewed. And then it was rejected because it was too fast. And then I was able to peer review my own edit. And then it's too late because a worse (IMHO) answer was "accepted". SO has really got some meta-nannies who are less than helpful!!!!

DSM Over a year ago

Well, I can't speak for the nannies, but you'll find that your approach is about an order of magnitude slower on long arrays. In other respects, of course, choosing between np.array(List)[df.A] and df["A"].map(dict(enumerate(List))) is mostly a matter of preference.

mane Over a year ago

Hi Phil, I only saw your solution and DSM's comment and then never got back to it since DSM's solution worked fine for me. But now looking at your solution, it works too. I have run DSM's solution on my dataset of about 200k entries and it runs in a couple of seconds with all the other calculations that I have. I am totally new to python-pandas and personally was not looking for anything elegant or great; whatever worked was fine. But honestly, thanks for the solution.

Mayank Porwal · Accepted Answer · 2021-01-20 06:42:01Z

7

You can also use df.assign:

In [1559]: df
Out[1559]: 
   A   B   C
0  0 NaN NaN
1  4 NaN NaN
2  5 NaN NaN
3  6 NaN NaN
4  7 NaN NaN
5  7 NaN NaN
6  6 NaN NaN
7  5 NaN NaN

In [1560]: mylist = [2,5,6,8,12,16,26,32]

In [1567]: df = df.assign(D=mylist)

In [1568]: df
Out[1568]: 
   A   B   C   D
0  0 NaN NaN   2
1  4 NaN NaN   5
2  5 NaN NaN   6
3  6 NaN NaN   8
4  7 NaN NaN  12
5  7 NaN NaN  16
6  6 NaN NaN  26
7  5 NaN NaN  32
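In the session above, `assign` places the list values row by row, so D reads 2, 5, 6, ... in list order. If the lookup described in the question is wanted instead (the nth list element for each value n in A), `assign` can be combined with array indexing. The sketch below is one way to do that under the question's data; the names `row_by_row` and `looked_up` are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [0, 4, 5, 6, 7, 7, 6, 5]})
mylist = [2, 5, 6, 8, 12, 16, 26, 32]

# assign returns a new frame; the original df is left unchanged.
row_by_row = df.assign(D=mylist)

# A callable passed to assign receives the frame, so the new column
# can be computed from column A by indexing into the list as an array.
looked_up = df.assign(D=lambda d: np.asarray(mylist)[d["A"].to_numpy()])

print(row_by_row)
print(looked_up)
```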

answered Jan 20, 2021 at 6:42

Mayank Porwal

34.2k · 9 gold badges · 45 silver badges · 65 bronze badges
