0

I'm having problems multiplying values from two different dataframes. I'm doing a PCA regression and want to multiply all my loadings by the original values.

For example:

PCA dataframe:

PC1 PC2
X 0 1
X1 1 2
X2 2 1
X3 2 1
X4 3 2
X5 5 4

Original dataframe:

A A1 A2 A3 A4 A5
1 1 3 4 1 2 4
2 8 5 3 2 1 2
3 9 3 5 1 3 1

I then want to multiply PC1 with every row in the original dataframe such that:

PC1 = 0xA + 1xA1 + 2xA2 + 2xA3 + 3xA4 + 5xA5

Inserting the first row of the second dataframe: PC1 = 0x1 + 1x3 + 2x4 + 2x1 + 3x2 + 5x4 = 39
Second row: PC1 = 0x8 + 1x5 + 2x3 + 2x2 + 3x1 + 5x2 = 28
Third row: PC1 = 0x9 + 1x3 + 2x5 + 2x1 + 3x3 + 5x1 = 29

New dataframe:

   PC1  PC2
1   39  ...
2   28  ...
3   29  ...

And so on.
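The rule above says each component score is a weighted sum of one row's values, with that component's loadings as the weights. A plain-Python sketch of exactly that rule on the example numbers (the values are copied from the two tables; the names `loadings`, `rows` and `pc_scores` are only illustrative) gives 39, 28 and 29 for PC1 and 32, 33 and 31 for PC2:

```python
# Loadings: one row per original column (A, A1, ..., A5), one column per PC.
loadings = [
    [0, 1],  # A
    [1, 2],  # A1
    [2, 1],  # A2
    [2, 1],  # A3
    [3, 2],  # A4
    [5, 4],  # A5
]

# Original observations: one row per sample, columns A, A1, ..., A5.
rows = [
    [1, 3, 4, 1, 2, 4],
    [8, 5, 3, 2, 1, 2],
    [9, 3, 5, 1, 3, 1],
]


def pc_scores(row, loadings):
    """Return one score per component: the sum of loading * value over all columns."""
    n_components = len(loadings[0])
    return [
        sum(loadings[i][j] * row[i] for i in range(len(row)))
        for j in range(n_components)
    ]


scores = [pc_scores(row, loadings) for row in rows]
print(scores)  # [[39, 32], [28, 33], [29, 31]]
```

This loop is just a matrix product written out by hand, which is why the answers below reach for np.dot.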

My PCA dataframe has the shape (14, 4) and my value dataframe has the shape (159, 14).
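Those shapes fit an ordinary matrix product: the inner dimension (14 variables) matches, so a (159, 14) value matrix times a (14, 4) loading matrix gives a (159, 4) matrix of scores, one row per observation and one column per component. A minimal sketch with random placeholder data of the same shapes (the data and the column names PC1 to PC4 are assumptions for illustration, not the asker's values):

```python
import numpy as np
import pandas as pd

# Placeholder data with the shapes from the question (random, illustration only).
rng = np.random.default_rng(0)
values = pd.DataFrame(rng.normal(size=(159, 14)))  # observations x variables
loadings = pd.DataFrame(
    rng.normal(size=(14, 4)),
    columns=["PC1", "PC2", "PC3", "PC4"],  # assumed component names
)  # variables x components

# (159, 14) @ (14, 4) -> (159, 4); to_numpy() makes this a purely positional product.
scores = pd.DataFrame(
    values.to_numpy() @ loadings.to_numpy(),
    index=values.index,
    columns=loadings.columns,
)
print(scores.shape)  # (159, 4)
```

Note that this multiplies row i of the loadings with column i of the values purely by position, so both must list the 14 variables in the same order.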

3 Answers

2

You are looking for a dot product (matrix multiplication), which you can get with np.dot:

print(df)
    2  3
1       
X   0  1
X1  1  2
X2  2  1
X3  2  1
X4  3  2
X5  5  4
print(xf)
   2  3  4  5  6  7
1                  
1  1  3  4  1  2  4
2  8  5  3  2  1  2
3  9  3  5  1  3  1
print(pd.DataFrame(np.dot(xf, df), columns=['PC1', 'PC2']))
   PC1  PC2
0   39   32
1   28   33
2   29   31
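The transcript above assumes `df` (loadings) and `xf` (original values) already exist. A self-contained sketch that builds them from the question's tables and also keeps the original row labels (passing `index=xf.index` and `columns=df.columns` is an addition to the answer, not part of it):

```python
import numpy as np
import pandas as pd

# Loadings: rows are the original variables, columns are the components.
df = pd.DataFrame(
    {"PC1": [0, 1, 2, 2, 3, 5], "PC2": [1, 2, 1, 1, 2, 4]},
    index=["X", "X1", "X2", "X3", "X4", "X5"],
)

# Original values: rows are observations, columns are the variables.
xf = pd.DataFrame(
    [[1, 3, 4, 1, 2, 4],
     [8, 5, 3, 2, 1, 2],
     [9, 3, 5, 1, 3, 1]],
    index=[1, 2, 3],
    columns=["A", "A1", "A2", "A3", "A4", "A5"],
)

# np.dot works on the underlying arrays, so labels are ignored and only the
# shapes (3, 6) and (6, 2) matter; the result is a (3, 2) array.
scores = pd.DataFrame(np.dot(xf, df), index=xf.index, columns=df.columns)
print(scores)
```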

2

If the number of rows in the first DataFrame (the loadings, df1) matches the number of columns in the second (the original values, df2), you can multiply with DataFrame.dot against a NumPy array, which skips label alignment, and then rename the resulting columns to df1.columns:

df = df2.dot(df1.to_numpy()).rename(columns=dict(enumerate(df1.columns)))
print(df)
   PC1  PC2
1   39   32
2   28   33
3   29   31
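The `to_numpy()` call matters because `DataFrame.dot` aligns the columns of the left frame with the index of the right one; with labels A, A1, ..., A5 on one side and X, X1, ..., X5 on the other, it raises a `ValueError`. If row i of the loadings really belongs to column i of the data (an assumption worth checking on the real 14-variable data), another option is to relabel the loadings' index with `set_axis` so pandas keeps the PC1/PC2 names itself. A sketch on the question's numbers:

```python
import pandas as pd

df1 = pd.DataFrame(
    {"PC1": [0, 1, 2, 2, 3, 5], "PC2": [1, 2, 1, 1, 2, 4]},
    index=["X", "X1", "X2", "X3", "X4", "X5"],
)
df2 = pd.DataFrame(
    [[1, 3, 4, 1, 2, 4],
     [8, 5, 3, 2, 1, 2],
     [9, 3, 5, 1, 3, 1]],
    index=[1, 2, 3],
    columns=["A", "A1", "A2", "A3", "A4", "A5"],
)

# With mismatched labels, DataFrame.dot refuses to multiply.
try:
    df2.dot(df1)
except ValueError as exc:
    print("labels not aligned:", exc)

# Assumes position i in df1's index corresponds to column i of df2:
# give df1 the same labels, then the label-aware dot product works.
result = df2.dot(df1.set_axis(df2.columns, axis=0))
print(result)
```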


2

Use:

import io

import numpy as np
import pandas as pd

string = """PC1 PC2
X   0   1
X1  1   2
X2  2   1
X3  2   1
X4  3   2
X5  5   4"""
string2 = """A  A1  A2  A3  A4  A5
1   3   4   1   2   4
8   5   3   2   1   2
9   3   5   1   3   1"""

# Parse the whitespace-separated tables. The header of `string` has one field
# fewer than its data rows, so pandas uses the first column (X, X1, ...) as index.
df1 = pd.read_csv(io.StringIO(string), sep=r"\s+")
df2 = pd.read_csv(io.StringIO(string2), sep=r"\s+")

#Solution
# (3, 6) values times (6, 2) loadings -> (3, 2) scores
print(pd.DataFrame(np.dot(df2, df1), columns=['PC1', 'PC2']))

Output:

[image: the resulting DataFrame with columns PC1 and PC2]
