I'm trying to add two pandas DataFrames together after first setting one of the existing columns as the index of each.

However, the top-rated method from the following thread raises an error:

(see: Adding two pandas dataframes)

Here is a simple example of the problem:

import pandas as pd
import numpy as np

a = np.array([['A',1.,2.,3.],['B',1.,2.,3.],['C',1.,2.,3.]])
a = pd.DataFrame(a)
a = a.set_index(0)

a 

     1    2    3
0               
A  1.0  2.0  3.0
B  1.0  2.0  3.0
C  1.0  2.0  3.0

b = np.array([['A',1.,2.,3.],['B',1.,2.,3.]])
b = pd.DataFrame(b)
b = b.set_index(0)

b

     1    2    3
0               
A  1.0  2.0  3.0
B  1.0  2.0  3.0

df_add = a.add(b,fill_value=1)

And the error:

Traceback (most recent call last):

  File "<ipython-input-150-885d92411f6c>", line 1, in <module>
    df_add = a.add(b,fill_value=1)

  File "/home/anaconda3/lib/python3.6/site-packages/pandas/core/ops.py", line 1234, in f
    return self._combine_frame(other, na_op, fill_value, level)

  File "/home/anaconda3/lib/python3.6/site-packages/pandas/core/frame.py", line 3490, in _combine_frame
    result = _arith_op(this.values, other.values)

  File "/home/anaconda3/lib/python3.6/site-packages/pandas/core/frame.py", line 3459, in _arith_op
    return func(left, right)

  File "/home/anaconda3/lib/python3.6/site-packages/pandas/core/ops.py", line 1195, in na_op
    result[mask] = op(xrav, yrav)

TypeError: must be str, not int

Any help on preventing this problem would be greatly appreciated.

1 Answer

The problem is in how the DataFrames are defined: a 2D NumPy array holds a single dtype, so mixing strings and floats converts all values to strings:

a = np.array([['A',1.,2.,3.],['B',1.,2.,3.],['C',1.,2.,3.]])
print (a)
[['A' '1.0' '2.0' '3.0']
 ['B' '1.0' '2.0' '3.0']
 ['C' '1.0' '2.0' '3.0']]
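
A quick way to confirm this is to inspect the dtypes before doing any arithmetic. The following is a minimal sketch, assuming only NumPy and pandas are available; the names `arr` and `df` are illustrative and not from the question:

```python
import numpy as np
import pandas as pd

# Mixing a string with floats forces NumPy to pick one common dtype: unicode text.
arr = np.array([['A', 1., 2., 3.], ['B', 1., 2., 3.]])
print(arr.dtype.kind)        # 'U' means a unicode string array
print(repr(arr[0, 1]))       # the float 1.0 has become the text '1.0'

# The DataFrame inherits the strings, so its columns are not numeric.
df = pd.DataFrame(arr).set_index(0)
print(df.dtypes)
```

If the columns are not reported as a float dtype, arithmetic such as `add` will work on text rather than numbers, which is what triggers the TypeError.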

One solution is to remove the string values from the array and specify the index with a list:

a = np.array([[1.,2.,3.],[1.,2.,3.],[1.,2.,3.]])
a = pd.DataFrame(a, index=list('ABC'))

b = np.array([[1.,2.,3.],[1.,2.,3.]])
b = pd.DataFrame(b, index=list('AB'))

df_add = a.add(b,fill_value=1)
print (df_add)
     0    1    2
A  2.0  4.0  6.0
B  2.0  4.0  6.0
C  2.0  3.0  4.0

Or convert the DataFrames to floats after setting the index:

a = np.array([['A',1.,2.,3.],['B',1.,2.,3.],['C',1.,2.,3.]])
a = pd.DataFrame(a)
a = a.set_index(0).astype(float)

b = np.array([['A',1.,2.,3.],['B',1.,2.,3.]])
b = pd.DataFrame(b)
b = b.set_index(0).astype(float)

df_add = a.add(b,fill_value=1)
print (df_add)
     1    2    3
0               
A  2.0  4.0  6.0
B  2.0  4.0  6.0
C  2.0  3.0  4.0
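
Another option, offered here as a sketch rather than as part of the original answer, is to skip the NumPy array entirely. This assumes the data starts out as plain Python lists: when pandas is given a list of rows, it infers a dtype per column, so the numeric columns should stay floats and no cast is needed.

```python
import pandas as pd

# Build the frames from lists of rows instead of a 2D NumPy array,
# so the label column and the numeric columns keep separate dtypes.
a = pd.DataFrame([['A', 1., 2., 3.], ['B', 1., 2., 3.], ['C', 1., 2., 3.]]).set_index(0)
b = pd.DataFrame([['A', 1., 2., 3.], ['B', 1., 2., 3.]]).set_index(0)

# Row 'C' is missing from b, so fill_value=1 stands in for its values.
df_add = a.add(b, fill_value=1)
print(df_add)
```

The result should match the two approaches above, with row C computed as the values in `a` plus the fill value of 1.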

2 Comments

Alternatively, the DataFrames can be converted using "convert_objects(convert_numeric=True)", but this functionality is deprecated.
@user8188120 - or a = a.set_index(0).astype(float)
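
For readers looking for a non-deprecated counterpart to convert_objects, one commonly used route is `pd.to_numeric` applied column by column. This is a sketch under the assumption that every remaining column holds numeric text; the name `converted` is illustrative:

```python
import numpy as np
import pandas as pd

a = np.array([['A', 1., 2., 3.], ['B', 1., 2., 3.], ['C', 1., 2., 3.]])
a = pd.DataFrame(a).set_index(0)

# Apply pd.to_numeric to each column; by default it raises on unparseable text.
converted = a.apply(pd.to_numeric)
print(converted.dtypes)
```

Unlike `astype(float)`, `pd.to_numeric` also accepts `errors='coerce'`, which turns values it cannot parse into NaN instead of raising.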
