Python retrieve row index of a Dataframe

Question

How can I retrieve the index of a single row in a DataFrame? I am able to retrieve the index of rows selected with df.loc:

idx = data.loc[data.name == "Smith"].index

I can also retrieve the row index from df.loc by filtering on data.index, like this:

idx = data.loc[data.index == 5].index

However, I cannot retrieve the index directly from the row itself (i.e., from row.index instead of df.loc[].index). I tried this code:

idx = data.iloc[5].index

The result of this code is the column names.

For context, I need the index of a specific row (rather than of rows selected with df.loc) because I want to use df.apply on each row: for every row, the function should copy the data from the row immediately above it.

def retrieve_gender(row):
    # This is panel data in which only the year-2000 values have been keyed in.
    # Time-invariant values in later years are the same as in 2000.
    if row["Year"] == 2000:
        pass
    elif row["Year"] == 2001:  # To keep it simple, use only 2001 as an example.
        idx = row.index  # Does not work: row.index holds the column labels.
        row["Gender"] = row.iloc[idx-1]["Gender"]
    return row["Gender"]


data["Gender"] = data.apply(retrieve_gender, axis=1)

SimbaPK · Accepted Answer · 2018-11-05 08:30:19Z


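With Pandas you can loop through your DataFrame row by row like this (assuming the default RangeIndex 0, 1, 2, ... and that each 2001 row directly follows the 2000 row it should copy from):

for index in range(1, len(df)):  # start at 1 so that index - 1 always exists
    if df.loc[index, 'Year'] == 2001:
        df.loc[index, 'Gender'] = df.loc[index - 1, 'Gender']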
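A runnable version of this loop on a small, made-up panel. The Year and Gender columns follow the question; the Name column and all values are hypothetical. It assumes a default RangeIndex, integer years, and that every 2001 row sits directly below its own 2000 row.

```python
import pandas as pd

# Hypothetical panel: Gender is keyed in only for 2000; the 2001 rows are missing it.
df = pd.DataFrame({
    "Name": ["Smith", "Smith", "Jones", "Jones"],
    "Year": [2000, 2001, 2000, 2001],
    "Gender": ["M", None, "F", None],
})

# Start at 1 so that index - 1 is always a valid label (assumes a RangeIndex 0..n-1).
for index in range(1, len(df)):
    if df.loc[index, "Year"] == 2001:
        # Copy Gender from the row directly above.
        df.loc[index, "Gender"] = df.loc[index - 1, "Gender"]

print(df["Gender"].tolist())  # ['M', 'M', 'F', 'F']
```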


John (comment):

I actually wrote a retrieve_data(df) function that uses iterrows() instead of a retrieve_data(row) function, and it worked. I'm just curious: is there really no way to do this with df.apply on each individual row?

jpp · Accepted Answer · 2018-11-05 09:45:38Z

`apply` gives series indexed by column labels

The problem with idx = data.iloc[5].index is that data.iloc[5] converts a row into a pd.Series indexed by column labels.
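To see the difference, compare a row's .index (its column labels) with its .name attribute, which pandas sets to the row's index label. The frame, its columns, and its non-default index labels below are hypothetical.

```python
import pandas as pd

# Hypothetical frame with a non-default index so the label differs from the position.
data = pd.DataFrame(
    {"Name": ["Smith", "Jones", "Brown"], "Year": [2000, 2001, 2000]},
    index=[10, 11, 12],
)

row = data.iloc[1]       # the second row, as a Series
print(list(row.index))   # column labels: ['Name', 'Year']
print(row.name)          # the row's index label: 11
```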

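In fact, what you are asking for is not possible with pd.DataFrame.apply alone: each row reaches your retrieve_gender function as a standalone series. Its index label is available as row.name, but the series carries no reference to the neighbouring rows, so the function cannot read the previous row's Gender without going back to data itself.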
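For completeness, a sketch of an apply-based version that reads the row label from row.name and looks up the previous row in data directly. This retrieve_gender is a modified, hypothetical rewrite of the question's function, and the sample values are made up; it assumes a default RangeIndex and that every 2001 row sits directly below its 2000 row. It still calls a Python function once per row, so the vectorised options are preferable.

```python
import pandas as pd

# Hypothetical sample panel.
data = pd.DataFrame({
    "Year": [2000, 2001, 2000, 2001],
    "Gender": ["M", None, "F", None],
})

def retrieve_gender(row):
    # With axis=1, row.name is the row's index label (0, 1, 2, ... for a RangeIndex).
    if row["Year"] == 2001:
        # The row Series cannot see other rows, so look back into data itself.
        return data.loc[row.name - 1, "Gender"]
    return row["Gender"]

data["Gender"] = data.apply(retrieve_gender, axis=1)
print(data["Gender"].tolist())  # ['M', 'M', 'F', 'F']
```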

Use vectorised logic instead

With Pandas, row-wise logic is inefficient and not recommended because it involves a Python-level loop. Use column-wise logic instead. Taking a step back, it seems you wish to implement two rules:

If Year is not 2001, leave Gender unchanged.
If Year is 2001, use Gender from previous row.

`np.where` + `shift`

For the above logic, you can use np.where with pd.Series.shift:

import numpy as np

data['Gender'] = np.where(data['Year'] == 2001, data['Gender'].shift(), data['Gender'])
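A self-contained run of the np.where approach on a small, hypothetical panel. shift() moves every value down by one position, so it relies on the rows being ordered with each 2001 row directly below its own 2000 row.

```python
import numpy as np
import pandas as pd

# Hypothetical sample panel, already ordered by person and year.
data = pd.DataFrame({
    "Year": [2000, 2001, 2000, 2001],
    "Gender": ["M", None, "F", None],
})

# Where Year is 2001 take the shifted (previous-row) value, otherwise keep the original.
data["Gender"] = np.where(data["Year"] == 2001, data["Gender"].shift(), data["Gender"])
print(data["Gender"].tolist())  # ['M', 'M', 'F', 'F']
```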

`mask` + `shift`

Alternatively, you can use mask + shift:

data['Gender'] = data['Gender'].mask(data['Year'] == 2001, data['Gender'].shift())
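The same hypothetical panel with mask: Series.mask replaces values where the condition is True with the corresponding value from the second argument and leaves the rest untouched. As with np.where, this assumes each 2001 row sits directly below its 2000 row.

```python
import pandas as pd

# Hypothetical sample panel, already ordered by person and year.
data = pd.DataFrame({
    "Year": [2000, 2001, 2000, 2001],
    "Gender": ["M", None, "F", None],
})

# Replace Gender only on 2001 rows, using the value from the row above.
data["Gender"] = data["Gender"].mask(data["Year"] == 2001, data["Gender"].shift())
print(data["Gender"].tolist())  # ['M', 'M', 'F', 'F']
```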
