19

Is there a function which allows you to efficiently append a NumPy array directly to a DataFrame?

Variables:

df = pd.DataFrame(columns=['col1', 'col2', 'col3'])

Out[1]: +------+------+------+
        | col1 | col2 | col3 |
        +------+------+------+
        |      |      |      |
        +------+------+------+


arr = np.empty(3)

# array is populated with values. Random numbers are chosen in this example,
#    but in my program, the numbers are not arbitrary.
arr[0] = 756
arr[1] = 123
arr[2] = 452

Out[2]: array([756., 123., 452.])

How do I directly append arr to the end of df to get this?

+------+------+------+
| col1 | col2 | col3 |
+------+------+------+
|  756 |  123 |  452 |
+------+------+------+

I've tried using df.append(arr) but it doesn't accept NumPy arrays. I could convert the NumPy array into a DataFrame then append it, but I think that would be very inefficient, especially over millions of iterations. Is there a more efficient way to do it?

5
  • 2
    Use df.loc[len(df)] = arr Commented Oct 8, 2019 at 19:43
  • @rafaelc that's going to slow to a crawl very quickly. It starts to take 10 ms per row once you're at 100K rows, and there's another 900K+ to go. Commented Oct 8, 2019 at 19:49
  • @ALollz but no one said there was a for loop and that we were appending at every iteration. Commented Oct 8, 2019 at 19:50
  • 2
    can you give more information about how you're generating these numbers? Likely the best solution is going to be to preallocate everything, fill it accordingly and then construct the DataFrame at the end. Commented Oct 8, 2019 at 19:51
  • 1
    I was surprised there is no easy way to append a row of one data frame to another data frame! Commented Aug 23, 2022 at 9:44
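
The preallocation suggested in the comments could be sketched roughly as follows. This is only a minimal sketch, assuming the total number of rows is known up front; `n_rows` and `make_row` are hypothetical names standing in for however the values are really produced.

```python
import numpy as np
import pandas as pd

n_rows = 5  # hypothetical: total number of rows, assumed known in advance
columns = ['col1', 'col2', 'col3']

def make_row(i):
    # hypothetical placeholder for the real computation of one row
    return np.array([756, 123, 452], dtype=float) + i

# Allocate the full buffer once, fill it row by row,
# and build the DataFrame a single time at the end.
buffer = np.empty((n_rows, len(columns)))
for i in range(n_rows):
    buffer[i] = make_row(i)

df = pd.DataFrame(buffer, columns=columns)
```

Filling a NumPy buffer in place avoids copying the whole table on every iteration, which is what makes repeated appends slow.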

4 Answers

14

@BalrogOfMoira is that really faster than simply creating the dataframe to append?

df.append(pd.DataFrame(arr.reshape(1,-1), columns=list(df)), ignore_index=True)

Otherwise, @Wonton, you could simply concatenate the arrays and then write them to a data frame, which could then be appended to the original data frame.
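
A rough sketch of that second suggestion, stacking the arrays first and converting only once. It uses `pd.concat` instead of `df.append`, which is an assumption about the pandas version in use (`append` was removed in pandas 2.0); `new_rows` is a hypothetical list of arrays collected elsewhere.

```python
import numpy as np
import pandas as pd

# Existing, already populated frame.
df = pd.DataFrame({'col1': [1.0], 'col2': [2.0], 'col3': [3.0]})

# hypothetical: arrays collected during a loop
new_rows = [np.array([756., 123., 452.]), np.array([10., 20., 30.])]

# Stack into one 2-D array, wrap it in a DataFrame once,
# then concatenate onto the original a single time.
block = pd.DataFrame(np.vstack(new_rows), columns=df.columns)
df = pd.concat([df, block], ignore_index=True)
```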


1 Comment

It's useful if the DataFrame already exists and is populated with data.
7

This will work:

df.append(pd.DataFrame(arr).T)

1 Comment

It doesn't work if the column labels are integers that do not start from zero. It adds new columns labeled from zero up to the smallest label in df.columns, and assigns the arr values only to the first len(arr) of them.
4

@rafaelc's comment works only if your Pandas DataFrame is indexed from 0 to len(df)-1, so it is not a general workaround and it can easily produce a silent bug in your code.
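
To illustrate the kind of silent bug meant here, a small sketch with made-up data, where the index is not 0 to len(df)-1 (for example after rows have been filtered out):

```python
import numpy as np
import pandas as pd

arr = np.array([756., 123., 452.])

# A frame whose index labels are 1 and 2 rather than 0 and 1.
df = pd.DataFrame(
    [[1., 2., 3.], [4., 5., 6.]],
    index=[1, 2],
    columns=['col1', 'col2', 'col3'],
)

# len(df) is 2 and label 2 already exists, so this overwrites
# that row instead of adding a new one. No error is raised.
df.loc[len(df)] = arr
```

The frame still has two rows afterwards, and the original values of the row labeled 2 are gone.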

If you are sure that your NumPy array has the same columns as your Pandas DataFrame, you could try using the append function with a dict built from the column names, as follows:

data_to_append = {}
for i in range(len(df.columns)):
    data_to_append[df.columns[i]] = arr[i]
df = df.append(data_to_append, ignore_index = True)

You need to reassign the DataFrame because the append function does not modify it in place; it returns a new DataFrame.

I hope it helps.


0

AttributeError: 'DataFrame' object has no attribute 'append'

From this Stack Exchange answer:

As of pandas 2.0, append (previously deprecated) was removed.

You need to use concat instead (for most applications):

df = pd.concat([df, pd.DataFrame([new_row])], ignore_index=True)

... it's also possible to use loc, although this only works if the new index is not already present in the DataFrame (typically, this will be the case if the index is a RangeIndex):

df.loc[len(df)] = new_row # only use with a RangeIndex!

See the original answer by mozway for more details.
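
Applied to the NumPy array from the question, the concat approach might look like the sketch below. The existing row of data is made up for illustration; wrapping `arr` in a list is what makes it a single row rather than a single column.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [1.0], 'col2': [2.0], 'col3': [3.0]})  # made-up existing data
arr = np.array([756., 123., 452.])

# Wrap the 1-D array in a list so it becomes one row,
# and reuse the existing column labels so the values line up.
new_row = pd.DataFrame([arr], columns=df.columns)
df = pd.concat([df, new_row], ignore_index=True)
```

As with append, concat returns a new DataFrame on every call, so inside a long loop it is still worth collecting rows first and concatenating once.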
