I want to merge two dataframes: one is an empty dataframe that has only column headers, and the other is a dataframe of size 18 x 600.

What I tried:

userQuestionVector1 = pd.read_csv("embedding1_3.csv")
userQuestionVector2 = pd.read_csv("embedding2_3.csv")
userQuestionVector = pd.concat([userQuestionVector1,userQuestionVector2],axis=1)
new_df = pd.DataFrame(columns=[vector])
df_userQuestionVector = new_df.append(userQuestionVector)
print(df_userQuestionVector)

Here, vector is a list of 600 strings:

['word2vec_q1_1', 'word2vec_q1_2', 'word2vec_q1_3', ..., 'word2vec_q1_300', 'word2vec_q2_1', ..., 'word2vec_q2_300']

The dimension of new_df is 0 x 600.

The dimensions of userQuestionVector1 and userQuestionVector2 are each 18 x 300.

The dimension of userQuestionVector is 18 x 600.

The output df_userQuestionVector is 18 x 1200, i.e., the two dataframes are merged side by side, and the second half is filled with NaN values.

  value1_1 value1_2 value1_3 ... value1_300 string1 string2 string3 ... string300
0 value2_1 value2_2 value2_3 ... value2_300  NaN     NaN     NaN   ...     NaN
1 value3_1 value3_2 value3_3 ... value3_300  NaN     NaN     NaN   ...     NaN
2 value4_1 value4_2 value4_3 ... value4_300  NaN     NaN     NaN   ...     NaN
.   .       .       .            .       .       .            .
.   .       .       .            .       .       .            .

The expected output is 18 x 600, i.e., the rows of userQuestionVector should be appended below the column headers of new_df.

   string1  string2  string3  ... string300
0  value1_1 value1_2 value1_3 ... value1_300
1  value2_1 value2_2 value2_3 ... value2_300
2  value3_1 value3_2 value3_3 ... value3_300
.   .       .       .            .       .    
.   .       .       .            .       .       

I also tried:

frames=[new_df, userQuestionVector]
df_userQuestionVector = pd.concat(frames,axis=0)

But this gives me the same result.

How should I solve this problem? Thank you.

  • What's in vector? Why not just use append with the two dataframes? Commented Jul 17, 2017 at 11:59
  • @gionni vector is a list of 600 strings. Look at my updated question. Commented Jul 17, 2017 at 12:05
  • @gionni Which two dataframes are you referring to? Commented Jul 17, 2017 at 12:06
  • Ignore it, sorry, I was misinterpreting the question :) Commented Jul 17, 2017 at 12:10

1 Answer

When reading the CSV files, set header to None. Then, instead of creating the empty new_df dataframe, assign vector directly as the column names of userQuestionVector, i.e., change the code to:

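import pandas as pd
# header=None keeps the first row of each file as data instead of column labels
userQuestionVector1 = pd.read_csv("embedding1_3.csv", header=None)
userQuestionVector2 = pd.read_csv("embedding2_3.csv", header=None)
userQuestionVector = pd.concat([userQuestionVector1, userQuestionVector2], axis=1)
# Assign the 600 names in vector as the column labels
userQuestionVector.columns = vector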
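
To see the effect of header=None without the original files, here is a small self-contained sketch. The CSV contents, the 3 x 2 shapes, and the shortened four-name vector list are made-up stand-ins for the real embedding files and the 600 names, so treat it as an illustration under those assumptions rather than the exact data from the question.

```python
import io

import pandas as pd

# Hypothetical stand-ins for embedding1_3.csv and embedding2_3.csv:
# header-less CSV text with 3 rows and 2 columns each.
csv1 = "0.1,0.2\n0.3,0.4\n0.5,0.6\n"
csv2 = "1.1,1.2\n1.3,1.4\n1.5,1.6\n"

# Hypothetical short version of the 600-name vector list.
vector = ["word2vec_q1_1", "word2vec_q1_2", "word2vec_q2_1", "word2vec_q2_2"]

# Default header handling: the first data row is consumed as column labels.
with_default = pd.read_csv(io.StringIO(csv1))
print(with_default.shape)  # one row fewer than the text contains

# header=None keeps every row as data and uses integer column labels.
q1 = pd.read_csv(io.StringIO(csv1), header=None)
q2 = pd.read_csv(io.StringIO(csv2), header=None)

# Place the frames side by side, then rename the columns in one step.
combined = pd.concat([q1, q2], axis=1)
combined.columns = vector
print(combined.shape)
print(combined.columns.tolist())
```

Because the column names are assigned after the concatenation, there is no second set of labels for pandas to align against, so no extra NaN-filled columns appear.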

Hope this helps.
