I was just tinkering around and found this amusing:

>>> import pandas as pd
>>> df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
>>> x = set(df)
>>> x
{'col2', 'col1'}

Why does pandas return column names as set values?

  • Same goes for tuple(), list(). Commented Oct 11, 2018 at 20:05
  • Because iterating directly over a DataFrame iterates over its column names. Commented Oct 11, 2018 at 20:06
  • I was just checking out the DataFrame class and was trying to find the implementation of the __iter__ method but couldn't find it. I am sorry if this is a stupid question. I am learning. Commented Oct 11, 2018 at 20:12
  • It makes a little more sense if you consider that a DataFrame is a dict-like container of Series, with column names as keys and Series as values. When you iterate over a dict, it iterates over the keys. Commented Oct 11, 2018 at 20:14
  • @Floydian you can find it in its base class, NDFrame. Commented Oct 11, 2018 at 20:15

2 Answers

Because that's how __iter__ is defined in the source code for NDFrame, of which pd.DataFrame is a child:

def __iter__(self):
    """Iterate over infor axis"""
    return iter(self._info_axis)

pd.DataFrame._info_axis is used internally to store column labels:

df = pd.DataFrame(columns=list('abcd'))

df._info_axis # Index(['a', 'b', 'c', 'd'], dtype='object')

set iterates the pd.DataFrame instance via __iter__, hashes each element, and returns a set of values corresponding to unique column labels.
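
A minimal sketch of that behaviour, assuming two small illustrative frames (the column names and values are made up for the example). It shows that set, list and tuple all consume the same iterator, and that set collapses duplicate labels while list keeps them:

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 2], "col2": [3, 4]})

# Iterating a DataFrame yields its column labels, not its rows or values.
labels = [label for label in df]

# set(), list() and tuple() all go through DataFrame.__iter__.
as_set = set(df)
as_list = list(df)
as_tuple = tuple(df)

# With duplicate column labels, list() keeps both, set() keeps one.
dup = pd.DataFrame([[1, 2]], columns=["a", "a"])
dup_list = list(dup)
dup_set = set(dup)

print(labels, as_set == set(df.columns), dup_list, dup_set)
```

Note that a set has no guaranteed order, which is why the question's output shows 'col2' before 'col1'.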

You can find the implementation for __iter__ in DataFrame's parent class NDFrame:

def __iter__(self):
    """Iterate over infor axis"""
    return iter(self._info_axis)

It's essentially the same as calling keys on a DataFrame, which is defined in the same place. I'm including it here because its docstring is more helpful and describes how _info_axis differs between Series, DataFrame and Panel:

def keys(self):
    """Get the 'info axis' (see Indexing for more)
    This is index for Series, columns for DataFrame and major_axis for
    Panel.
    """
    return self._info_axis
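
As a quick check of that equivalence, here is a small sketch (the frame and series are illustrative, not taken from the pandas source). It also shows one caveat worth knowing: for a Series, keys() returns the index, but iterating the Series itself yields its values:

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 2], "col2": [3, 4]})

# keys() on a DataFrame returns the column labels,
# the same labels that iteration produces.
df_keys = list(df.keys())
df_iter = list(df)

s = pd.Series([10, 20], index=["x", "y"])

# keys() on a Series returns its index...
s_keys = list(s.keys())
# ...but iterating a Series yields the values, not the index labels.
s_iter = list(s)

print(df_keys, df_iter, s_keys, s_iter)
```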
