Suppose I have 3 dataframes that are wrapped in a list. The dataframes are:

df_1 = pd.DataFrame({'text':['a','b','c','d','e'],'num':[2,1,3,4,3]})
df_2 = pd.DataFrame({'text':['f','g','h','i','j'],'num':[1,2,3,4,3]})
df_3 = pd.DataFrame({'text':['k','l','m','n','o'],'num':[6,5,3,1,2]})

The list of the dfs is:

df_list = [df_1, df_2, df_3]

Now I want to write a for loop that goes over df_list, takes the text column from each df, and combines those columns into a new dataframe. Since each text column comes from a different dataframe, I want the column headers to be topic_1, topic_2, etc. The desired outcome is as follows:

  topic_1 topic_2 topic_3
0       a       f       k
1       b       g       l
2       c       h       m
3       d       i       n
4       e       j       o

I can easily extract the text columns as:

lst = []
for i in range(len(df_list)):
    lst.append(df_list[i]['text'].tolist())

It is just that I am stuck on the last part, namely bringing the columns into 1 df without using brute force.

2 Answers

You can extract the wanted columns with a list comprehension and concat them:

pd.concat([d['text'].rename(f'topic_{i}')
           for i,d in enumerate(df_list, start=1)],
          axis=1)

output:

  topic_1 topic_2 topic_3
0       a       f       k
1       b       g       l
2       c       h       m
3       d       i       n
4       e       j       o
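
One caveat worth sketching: pd.concat with axis=1 aligns rows on index labels, not on position. The frames in the question all share the default 0..4 index, so this does not matter there. The example below uses two hypothetical frames (left and right, not from the question) with non-matching indexes to show what happens, and how resetting the index first lines the columns up by position instead.

```python
import pandas as pd

# Hypothetical frames whose indexes do not line up (e.g. after filtering rows).
left = pd.DataFrame({"text": ["a", "b", "c"]}, index=[0, 1, 2])
right = pd.DataFrame({"text": ["f", "g", "h"]}, index=[5, 6, 7])
frames = [left, right]

# Plain concat aligns on index labels, so non-matching labels are filled with NaN.
misaligned = pd.concat(
    [d["text"].rename(f"topic_{i}") for i, d in enumerate(frames, start=1)],
    axis=1,
)

# Resetting each index first makes the columns line up by position.
aligned = pd.concat(
    [d["text"].reset_index(drop=True).rename(f"topic_{i}")
     for i, d in enumerate(frames, start=1)],
    axis=1,
)

print(misaligned)
print(aligned)
```

If your dataframes were sliced or filtered before this step, the reset_index(drop=True) variant is probably the one you want.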

Generally speaking, you want to avoid looping over a pandas DataFrame. In this solution, however, I do use a loop (a list comprehension) to rename the columns. This should work, assuming you just have these 3 dataframes:

import pandas as pd

df_1 = pd.DataFrame({'text':['a','b','c','d','e'],'num':[2,1,3,4,3]})
df_2 = pd.DataFrame({'text':['f','g','h','i','j'],'num':[1,2,3,4,3]})
df_3 = pd.DataFrame({'text':['k','l','m','n','o'],'num':[6,5,3,1,2]})

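# Select the 'text' column of each dataframe and concatenate them side by side
df_list = [df_1['text'], df_2['text'], df_3['text']]
df_combined = pd.concat(df_list, axis=1)
# Rename the columns to topic_1, topic_2, ...
df_combined.columns = [f"topic_{i+1}" for i in range(len(df_combined.columns))]
print(df_combined)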
  topic_1 topic_2 topic_3
0       a       f       k
1       b       g       l
2       c       h       m
3       d       i       n
4       e       j       o
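
If you would rather start from the lists you already extracted in the question, a possible alternative is to pass a dict of column name to list straight to the DataFrame constructor. This is only a sketch: it assumes every text column has the same length (the constructor raises a ValueError otherwise), and the name result is just illustrative.

```python
import pandas as pd

df_1 = pd.DataFrame({'text': ['a', 'b', 'c', 'd', 'e'], 'num': [2, 1, 3, 4, 3]})
df_2 = pd.DataFrame({'text': ['f', 'g', 'h', 'i', 'j'], 'num': [1, 2, 3, 4, 3]})
df_3 = pd.DataFrame({'text': ['k', 'l', 'm', 'n', 'o'], 'num': [6, 5, 3, 1, 2]})
df_list = [df_1, df_2, df_3]

# Same extraction as the loop in the question, written as a comprehension.
lst = [d['text'].tolist() for d in df_list]

# Map each list to a topic_<n> column; all lists must have equal length.
result = pd.DataFrame({f"topic_{i}": col for i, col in enumerate(lst, start=1)})
print(result)
```

Because this works on plain lists, the original indexes are ignored and the result gets a fresh 0..n-1 index.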
