Add A 1-D Numpy Array to DataFrame as a Row

Question

Is there a function which allows you to efficiently append a NumPy array directly to a DataFrame?

Variables:

df = pd.DataFrame(columns=['col1', 'col2', 'col3'])

Out[1]: +------+------+------+
        | col1 | col2 | col3 |
        +------+------+------+
        |      |      |      |
        +------+------+------+


arr = np.empty(3)

# array is populated with values. Random numbers are chosen in this example,
#    but in my program, the numbers are not arbitrary.
arr[0] = 756
arr[1] = 123
arr[2] = 452

Out[2]: array([756., 123., 452.])

How do I directly append arr to the end of df to get this?

+------+------+------+
| col1 | col2 | col3 |
+------+------+------+
|  756 |  123 |  452 |
+------+------+------+

I've tried using df.append(arr) but it doesn't accept NumPy arrays. I could convert the NumPy array into a DataFrame then append it, but I think that would be very inefficient, especially over millions of iterations. Is there a more efficient way to do it?

@rafaelc that's going to slow to a crawl very quickly. It starts to take 10 ms per row once you're at 100K rows, and there's another 900K+ to go.
– ALollz, Commented Oct 8, 2019 at 19:49
@ALollz but no one said there was a for loop and that we were appending at every iteration.
– rafaelc, Commented Oct 8, 2019 at 19:50
Can you give more information about how you're generating these numbers? Likely the best solution is going to be to preallocate everything, fill it accordingly, and then construct the DataFrame at the end.
– ALollz, Commented Oct 8, 2019 at 19:51
I was surprised there is no easy way to append a row of one data frame to another data frame!
– Farzad Amirjavid, Commented Aug 23, 2022 at 9:44
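
The preallocation idea from ALollz's comment can be sketched roughly as follows. This is only an illustration: `n_rows` and the way each row is computed are hypothetical stand-ins, and the sketch assumes the number of rows is known before the loop starts.

```python
import numpy as np
import pandas as pd

n_rows = 5  # hypothetical row count, assumed to be known in advance
columns = ['col1', 'col2', 'col3']

# Allocate the full block once instead of growing a DataFrame row by row.
data = np.empty((n_rows, len(columns)))

for i in range(n_rows):
    # Stand-in for the real computation that produces one row.
    arr = np.array([756, 123, 452]) + i
    data[i] = arr

# Build the DataFrame a single time, after the loop.
df = pd.DataFrame(data, columns=columns)
```

Filling a NumPy array in place avoids copying the whole table on every iteration, which is what makes per-row appends slow at large sizes.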

Guillaume · Accepted Answer · 2020-06-01 14:45:52Z

14

@BalrogOfMoria, is that really faster than simply creating the DataFrame to append?

df.append(pd.DataFrame(arr.reshape(1,-1), columns=list(df)), ignore_index=True)

Otherwise, @Wonton, you could simply concatenate the arrays and then write them to a DataFrame, which could then be appended to the original DataFrame.
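
A minimal sketch of that second suggestion, written with `pd.concat` so that it also runs on pandas 2.x, where `DataFrame.append` is no longer available. The starting DataFrame and the `rows` list are made-up sample data, not values from the question.

```python
import numpy as np
import pandas as pd

# An existing DataFrame that already holds data.
df = pd.DataFrame([[1.0, 2.0, 3.0]], columns=['col1', 'col2', 'col3'])

# Hypothetical batch of 1-D arrays collected before touching the DataFrame.
rows = [np.array([756.0, 123.0, 452.0]), np.array([10.0, 20.0, 30.0])]

# Stack them into one 2-D array, wrap it once, and concatenate once.
batch = pd.DataFrame(np.vstack(rows), columns=df.columns)
df = pd.concat([df, batch], ignore_index=True)
```

The point of batching is that the DataFrame is copied once per batch rather than once per row.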

answered Oct 8, 2019 at 20:05 by braulio, edited Jun 1, 2020 at 14:45 by Guillaume

1 Comment

Guillaume Over a year ago

It's useful if the DataFrame already exists and is populated with data.

Mehdi Shafiei · Accepted Answer · 2020-05-21 04:45:24Z

7

This will work:

df.append(pd.DataFrame(arr).T)

answered May 21, 2020 at 4:45 by Mehdi Shafiei

1 Comment

Muccagelato Over a year ago

It doesn't work if the column labels are integers that do not start from zero. It adds columns labeled from zero up to the smallest label in df.columns, and assigns the arr values only to the first len(arr) of them.
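
The alignment problem described in this comment also shows up with named columns, because a one-row DataFrame built from a bare array gets the default labels 0, 1, 2. The sketch below uses `pd.concat` (so it runs on pandas 2.x) and made-up sample data to compare the unlabeled and labeled versions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1.0, 2.0, 3.0]], columns=['col1', 'col2', 'col3'])
arr = np.array([756.0, 123.0, 452.0])

# Without explicit labels, the one-row frame gets columns 0, 1, 2.
unlabeled = pd.DataFrame(arr).T
misaligned = pd.concat([df, unlabeled], ignore_index=True)
# misaligned now has six columns, and the new row is NaN under col1..col3.

# Passing the target's column labels keeps the values aligned.
labeled = pd.DataFrame(arr.reshape(1, -1), columns=df.columns)
aligned = pd.concat([df, labeled], ignore_index=True)
```

pandas matches rows by column label rather than by position, so giving the new row the same labels as `df` is what makes the values land in the intended columns.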

BalrogOfMoria · Accepted Answer · 2019-10-08 19:54:29Z

4

@rafaelc's comment works only if your pandas DataFrame is indexed from 0 to len(df)-1, so it is not a general workaround and can easily introduce a silent bug in your code.

If you are sure that your NumPy array has the same columns as your pandas DataFrame, you could try using the append function with a dict keyed by column name, as follows:

data_to_append = {}
for i in range(len(df.columns)):
    data_to_append[df.columns[i]] = arr[i]
df = df.append(data_to_append, ignore_index = True)

You need to reassign the DataFrame because the append function does not modify it in place.

I hope it helps.
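
For readers on pandas 2.x, where `DataFrame.append` has been removed, the same dict-based idea might look like the sketch below. The starting DataFrame is made-up sample data, and the sketch assumes `arr` has exactly one value per column, in column order.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1.0, 2.0, 3.0]], columns=['col1', 'col2', 'col3'])
arr = np.array([756.0, 123.0, 452.0])

# Pair each column label with the array value at the same position.
row = dict(zip(df.columns, arr))

# A list holding one dict becomes a one-row DataFrame.
df = pd.concat([df, pd.DataFrame([row])], ignore_index=True)
```

As with append, the result has to be assigned back to `df`, since `pd.concat` returns a new object.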

answered Oct 8, 2019 at 19:54 by BalrogOfMoria

DrWhat · Accepted Answer · 2024-10-16 12:34:51Z

0

AttributeError: 'DataFrame' object has no attribute 'append'

From this Stack Exchange answer:

As of pandas 2.0, append (previously deprecated) was removed.

You need to use concat instead (for most applications):

df = pd.concat([df, pd.DataFrame([new_row])], ignore_index=True)

... it's also possible to use loc, although this only works if the new index is not already present in the DataFrame (typically, this will be the case if the index is a RangeIndex):

df.loc[len(df)] = new_row # only use with a RangeIndex!

See the original answer by mozway for more details.
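
The RangeIndex caveat can be illustrated with a small sketch. The two DataFrames here are made-up examples: one uses the default index, the other a custom integer index that happens to contain the label `len(df)`.

```python
import numpy as np
import pandas as pd

arr = np.array([756.0, 123.0, 452.0])
cols = ['col1', 'col2', 'col3']

# Default RangeIndex: len(df) is a label that does not exist yet,
# so the assignment adds a new row.
ok = pd.DataFrame([[1.0, 2.0, 3.0]], columns=cols)
ok.loc[len(ok)] = arr

# Custom integer index: label 1 already exists, so that row is
# overwritten and no row is added.
bad = pd.DataFrame([[1.0, 2.0, 3.0]], columns=cols, index=[1])
bad.loc[len(bad)] = arr
```

In the second case no error is raised and the original row is lost, which is why this pattern is only safe with a default 0 to len(df)-1 index.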

answered Oct 16, 2024 at 12:34 by DrWhat
