
Let's say df is a typical pandas.DataFrame instance. I am trying to understand why list(df) returns a list of column names.

My goal is to track this down in the source code and understand how list(<pd.DataFrame>) ends up returning a list of column names.

So far, the best resources I've found are the following:

  • Get a list from Pandas DataFrame column headers
    • Summary: There are multiple ways of getting a list of DataFrame column names, and each varies either in performance or idiomatic convention.
  • SO Answer
    • Summary: DataFrame follows a dict-like convention, thus coercing with list() would return a list of the keys of this dict-like structure.
  • pandas.DataFrame source code:
    • I can't find anything in the source code that shows how list() creates a list of column names.
  • I don't know anything about pandas, but I would look for an __iter__() method on the DataFrame class. Commented Feb 27, 2022 at 17:46
  • Thanks, it seems the pd.DataFrame class inherits the __iter__() method from its parent class, NDFrame. Commented Mar 8, 2022 at 23:55
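To confirm where the method actually comes from without reading the whole file, one option is to walk the class's method resolution order (MRO) and report the first class whose own namespace defines `__iter__`. This is a minimal sketch: the helper name `defining_class` is hypothetical, and the result reflects whichever pandas version is installed (the comment above reports NDFrame).

```python
import inspect

import pandas as pd


def defining_class(cls: type, name: str) -> type:
    """Hypothetical helper: first class in cls's MRO whose own __dict__ defines `name`."""
    for klass in cls.__mro__:
        if name in vars(klass):
            return klass
    raise AttributeError(f"{cls.__name__} has no attribute {name!r}")


owner = defining_class(pd.DataFrame, "__iter__")
print(owner.__module__, owner.__name__)  # pandas.core.generic NDFrame

# Show the method body itself (works as long as pandas' .py sources are installed)
print(inspect.getsource(owner.__iter__))
```

The same helper works for any other dunder method, e.g. `defining_class(pd.DataFrame, "__contains__")`.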

2 Answers


DataFrames are iterable. That's why you can pass them to the list constructor.
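To see why being iterable is all that list() needs, here is a minimal sketch using a hypothetical class (`ToyFrame` is not part of pandas) that stores columns in a dict and, like a DataFrame, iterates over its column labels. list() calls iter() on its argument and collects whatever the iterator yields.

```python
class ToyFrame:
    """Hypothetical stand-in for a DataFrame: a mapping of column label -> values."""

    def __init__(self, data: dict) -> None:
        self._data = dict(data)

    @property
    def columns(self) -> list:
        return list(self._data)

    def __iter__(self):
        # Mirror the pandas behaviour: iterate over column labels, not rows
        return iter(self.columns)


toy = ToyFrame({"col1": [1, 2, 3], "col2": [4, 5, 6]})
print(list(toy))  # ['col1', 'col2']
```

Removing `__iter__` from the class would make `list(toy)` raise a TypeError, since the object would no longer be iterable.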

list(df) is equivalent to [c for c in df]. In both cases, DataFrame.__iter__ is called.
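One way to check this equivalence is to compare list(df), the comprehension, and a loop that drives the iterator protocol by hand with iter() and next(). A minimal sketch with a small, illustrative two-column frame:

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 2, 3], "col2": [4, 5, 6]})

via_list = list(df)                  # list() calls iter(df), i.e. df.__iter__()
via_comprehension = [c for c in df]  # the for clause also calls iter(df)

# The same thing spelled out with the iterator protocol
it = iter(df)
via_protocol = []
while True:
    try:
        via_protocol.append(next(it))
    except StopIteration:
        break

print(via_list, via_comprehension, via_protocol)
# ['col1', 'col2'] ['col1', 'col2'] ['col1', 'col2']
```

Note that len(df) is 3 here (the number of rows), even though iterating yields only the 2 column labels.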

When you iterate over a DataFrame, you get the column names.

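Why? The pandas documentation states that DataFrames follow the dict-like convention of iterating over the "keys" of the object, and for a DataFrame those keys are the column labels.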

Looking at the source, __iter__ (defined on the parent class NDFrame) returns iter(self._info_axis). For a DataFrame, _info_axis resolves to the columns Index.
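To see what `_info_axis` resolves to, one can inspect it directly. Note that `_info_axis` and `_info_axis_name` are private pandas attributes, so this is only a sketch for exploration and may change between versions; the public equivalent is `df.keys()`, which for a DataFrame returns the columns.

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 2, 3], "col2": [4, 5, 6]})

# Private attributes (internal, may change between pandas versions)
print(pd.DataFrame._info_axis_name)       # columns
print(df._info_axis.equals(df.columns))   # True

# NDFrame.__iter__ returns iter(self._info_axis), so all of these agree
print(list(df) == list(df._info_axis) == list(df.columns) == list(df.keys()))  # True
```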


As you correctly stated in your question, one can loosely think of a pandas DataFrame as a list of lists, or more precisely as a dict-like object.

Take a look at this code, which builds a DataFrame from a dict and then calls list() on both.

import pandas as pd

# create a dataframe
d = {'col1': [1, 2, 3], 'col2': [4, 5, 6]}
df = pd.DataFrame(d)

print(df)

# Iterating over a DataFrame yields its column labels
x = list(df)
print(x)

# Iterating over a dict yields its keys
x = list(d)
print(x)

The output of the two list() calls (for the DataFrame df and the dict d) is:

['col1', 'col2']
['col1', 'col2']

This result confirms your thinking that a DataFrame "follows a dict-like convention".
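The dict analogy goes a bit further than list(): several DataFrame methods and operators mirror their dict counterparts, with column labels playing the role of keys. A short comparison sketch, using the same d and df as above (the side-by-side pairing is only illustrative):

```python
import pandas as pd

d = {"col1": [1, 2, 3], "col2": [4, 5, 6]}
df = pd.DataFrame(d)

# Membership tests check keys / column labels, not values or row labels
print("col1" in d, "col1" in df)  # True True
print(1 in d, 1 in df)            # False False

# keys() returns the keys / column labels
print(list(d.keys()), list(df.keys()))  # ['col1', 'col2'] ['col1', 'col2']

# items() yields (key, value) pairs; for a DataFrame each value is a Series
for (k1, v1), (k2, v2) in zip(d.items(), df.items()):
    print(k1, v1, "|", k2, v2.tolist())
```

Note that `1 in df` is False even though 1 is a row label in df.index, because `in` only looks at the column labels.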
