Pandas DataFrame stack multiple column values into single column

Question

Assuming the following DataFrame:

  key.0 key.1 key.2  topic
1   abc   def   ghi      8
2   xab   xcd   xef      9

How can I combine the values of all the key.* columns into a single column 'key', that's associated with the topic value corresponding to the key.* columns? This is the result I want:

   topic  key
1      8  abc
2      8  def
3      8  ghi
4      9  xab
5      9  xcd
6      9  xef

Note that the number of key.N columns is not fixed; it depends on some external N.

ouroboros1 · Accepted Answer · 2025-03-21 07:36:08Z

60

You can melt your dataframe:

>>> keys = [c for c in df if c.startswith('key.')]
>>> pd.melt(df, id_vars='topic', value_vars=keys, value_name='key')

   topic variable  key
0      8    key.0  abc
1      9    key.0  xab
2      8    key.1  def
3      9    key.1  xcd
4      8    key.2  ghi
5      9    key.2  xef

The variable column also tells you which key.* column each value came from.

From pandas v0.20, melt is also available as a method of pd.DataFrame:

>>> df.melt('topic', value_name='key').drop('variable', axis=1)

   topic  key
0      8  abc
1      9  xab
2      8  def
3      9  xcd
4      8  ghi
5      9  xef
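Note that melt orders the rows by source column (all key.0 values first), whereas the question lists them grouped by topic. A minimal sketch of one way to get that ordering, assuming the sample frame from the question; the stable sort on topic is my addition, not part of the answer above:

```python
import pandas as pd

# Sample frame from the question
df = pd.DataFrame(
    {
        "key.0": ["abc", "xab"],
        "key.1": ["def", "xcd"],
        "key.2": ["ghi", "xef"],
        "topic": [8, 9],
    },
    index=[1, 2],
)

# Pick up however many key.N columns exist
keys = [c for c in df.columns if c.startswith("key.")]

result = (
    df.melt(id_vars="topic", value_vars=keys, value_name="key")
      # mergesort is stable, so key.0, key.1, key.2 keep their order within each topic
      .sort_values("topic", kind="mergesort")
      .drop(columns="variable")
      .reset_index(drop=True)
)
print(result)
```

The index here runs from 0 rather than 1; add 1 to result.index if the numbering in the question matters.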

edited Mar 21 at 7:36

ouroboros1


answered Dec 19, 2015 at 22:55

Alexander


miraculixx · Accepted Answer · 2015-12-19 23:42:51Z

7

After trying various ways, I find the following is more or less intuitive, provided stack's magic is understood:

# keep topic as index, stack other columns 'against' it
stacked = df.set_index('topic').stack()
# set the name of the new series created
df = stacked.reset_index(name='key')
# drop the 'source' level (key.*)
df.drop('level_1', axis=1, inplace=True)

The resulting dataframe is as required:

   topic  key
0      8  abc
1      8  def
2      8  ghi
3      9  xab
4      9  xcd
5      9  xef

You may want to print intermediary results to understand the process in full. If you don't mind having more columns than needed, the key steps are set_index('topic'), stack() and reset_index(name='key').
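For reference, the same steps can be written as one chain that does not depend on the auto-generated 'level_1' label and does not modify anything in place. This is only a sketch on the question's sample data; the level name "source" is an arbitrary name chosen for illustration:

```python
import pandas as pd

# Sample frame from the question
df = pd.DataFrame(
    {
        "key.0": ["abc", "xab"],
        "key.1": ["def", "xcd"],
        "key.2": ["ghi", "xef"],
        "topic": [8, 9],
    },
    index=[1, 2],
)

with_source = (
    df.set_index("topic")
      .rename_axis(columns="source")   # name the column axis so the stacked level is not 'level_1'
      .stack()                         # Series indexed by (topic, source)
      .reset_index(name="key")         # back to a frame: topic, source, key
)

# Drop the source column if only topic/key are needed
out = with_source.drop(columns="source")
print(out)
```

Keeping with_source around is handy if you later need to know which key.* column a value came from.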

edited Dec 19, 2015 at 23:42

answered Dec 19, 2015 at 23:09

miraculixx


2 Comments

ilyas patanam Over a year ago

I can't seem to find any documentation on the name argument for reset_index, could you explain how it works?

miraculixx Over a year ago

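It's the name argument of Series.reset_index(): stack() returns a Series, and name sets the label of the column that holds its values.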
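A small sketch of the difference, using a made-up two-row Series rather than the stacked data from the answer:

```python
import pandas as pd

# An unnamed Series whose index is called 'topic'
s = pd.Series(["abc", "def"], index=pd.Index([8, 9], name="topic"))

# Without name, the value column gets the default label 0
default_cols = s.reset_index().columns.tolist()

# With name, the value column takes the label you pass
named_cols = s.reset_index(name="key").columns.tolist()

print(default_cols)  # ['topic', 0]
print(named_cols)    # ['topic', 'key']
```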

BENY · Accepted Answer · 2017-09-15 13:07:54Z

7

OK, because one of the current questions has been marked as a duplicate of this one, I will answer here.

Using wide_to_long:

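# sep='.' matches the key.0, key.1, ... column names; 'age' is just a label for the suffix level
pd.wide_to_long(df, ['key'], i='topic', j='age', sep='.').reset_index().drop(columns='age')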
Out[123]: 
   topic  key
0      8  abc
1      9  xab
2      8  def
3      9  xcd
4      8  ghi
5      9  xef
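A self-contained sketch of the wide_to_long approach on the question's sample data. Two assumptions of mine: sep="." is passed because the columns are named key.0, key.1, and so on, and the suffix level is called "n" purely for illustration. I sort explicitly rather than rely on whatever row order wide_to_long returns:

```python
import pandas as pd

# Sample frame from the question
df = pd.DataFrame(
    {
        "key.0": ["abc", "xab"],
        "key.1": ["def", "xcd"],
        "key.2": ["ghi", "xef"],
        "topic": [8, 9],
    },
    index=[1, 2],
)

long_df = (
    # i must uniquely identify each row; topic does in this sample
    pd.wide_to_long(df, stubnames="key", i="topic", j="n", sep=".")
      .reset_index()                  # columns: topic, n, key
      .sort_values(["topic", "n"])    # group by topic, keep key.0, key.1, key.2 order
      .drop(columns="n")
      .reset_index(drop=True)
)
print(long_df)
```

If topic values can repeat in your real data, wide_to_long will complain that the id variables are not unique, and melt is probably the simpler choice.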

answered Sep 15, 2017 at 13:07

BENY
