Why does set of a pandas dataframe return column names of the dataframe?

Question

I was just tinkering around and found this amusing:

>>> import pandas as pd
>>> df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
>>> x = set(df)
>>> x
{'col2', 'col1'}

Why does pandas return column names as set values?

Because iterating directly over a DataFrame iterates over its column names.
– juanpa.arrivillaga, Commented Oct 11, 2018 at 20:06
I was just checking out the DataFrame class and trying to find the implementation of the __iter__ method, but couldn't find it. I am sorry if this is a stupid question. I am learning.
– Floydian, Commented Oct 11, 2018 at 20:12
It makes a little more sense if you consider that a DataFrame is a dict-like container of Series, with column names as keys and Series as values. When you iterate over a dict, it iterates over the keys.
– juanpa.arrivillaga, Commented Oct 11, 2018 at 20:14
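
The dict analogy from the comment above can be checked directly. This is a minimal sketch that reuses the frame from the question; the plain dict `d` and the variable names are introduced here only for comparison.

```python
import pandas as pd

# Plain dict of lists, used only for comparison (not in the original post).
d = {'col1': [1, 2], 'col2': [3, 4]}
df = pd.DataFrame(d)

# Iterating a dict yields its keys; iterating a DataFrame yields its column labels.
dict_keys = list(d)
frame_keys = list(df)
print(dict_keys)   # ['col1', 'col2']
print(frame_keys)  # ['col1', 'col2']

# Indexing by a label returns the column as a Series, like a dict lookup.
col = df['col1']
print(type(col).__name__)  # Series

# items() pairs each label with its Series, mirroring dict.items().
pairs = [(name, series.tolist()) for name, series in df.items()]
print(pairs)  # [('col1', [1, 2]), ('col2', [3, 4])]
```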

jpp · Accepted Answer · 2018-10-11 20:16:39Z

1

Because that's how __iter__ is defined in the source code for NDFrame, of which pd.DataFrame is a child:

def __iter__(self):
    """Iterate over infor axis"""
    return iter(self._info_axis)

pd.DataFrame._info_axis is used internally to store column labels:

import pandas as pd

df = pd.DataFrame(columns=list('abcd'))

# _info_axis is a private attribute; df.columns is the public equivalent
df._info_axis  # Index(['a', 'b', 'c', 'd'], dtype='object')

set iterates over the pd.DataFrame instance via __iter__, hashes each element, and returns a set of the unique column labels.
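
As a quick check of the behaviour described above, the sketch below compares `set(df)` with the public `df.columns` attribute. The frame `dup` with repeated labels is a made-up example, added only to show where "unique" matters.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})

# set(df) consumes iter(df), which yields the column labels.
labels = set(df)
same_as_columns = labels == set(df.columns)
print(same_as_columns)  # True

# Duplicate column labels are allowed in a DataFrame, but a set keeps one copy.
dup = pd.DataFrame([[1, 2, 3]], columns=['a', 'a', 'b'])
print(list(dup))         # ['a', 'a', 'b']
print(sorted(set(dup)))  # ['a', 'b']
```

Because sets are unordered, `set(df)` may print the labels in a different order than `df.columns`, which is why the question's output shows 'col2' first.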


user3483203 · Accepted Answer · 2018-10-11 20:22:40Z

1

You can find the implementation for __iter__ in DataFrame's parent class NDFrame:

def __iter__(self):
    """Iterate over infor axis"""
    return iter(self._info_axis)

It's essentially the same as calling keys on a DataFrame, which is defined in the same place. I'm including it here because its docstring is more helpful: it describes how _info_axis differs between Series, DataFrame and Panel:

def keys(self):
    """Get the 'info axis' (see Indexing for more)
    This is index for Series, columns for DataFrame and major_axis for
    Panel.
    """
    return self._info_axis
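
The relationship between `keys` and iteration can be verified through the public API alone. This is a small sketch under the assumption of a reasonably recent pandas; the Series `s` is a hypothetical example, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
s = pd.Series([10, 20], index=['x', 'y'])

# For a DataFrame, keys() returns the column Index.
df_keys = df.keys()
print(df_keys.equals(df.columns))  # True

# Iterating the frame yields the same labels as iterating keys().
print(list(df) == list(df.keys()))  # True

# For a Series, keys() returns the index, as the docstring says.
s_keys = s.keys()
print(s_keys.equals(s.index))  # True
```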

edited Oct 11, 2018 at 20:22

answered Oct 11, 2018 at 20:17

user3483203
