How can I retrieve the index of a single row in a DataFrame? I am able to retrieve the index of rows selected with df.loc:
idx = data.loc[data.name == "Smith"].index
I can even retrieve a row's index through df.loc by filtering on data.index, like this:
idx = data.loc[data.index == 5].index
However, I cannot retrieve the index directly from the row itself (i.e., from row.index instead of df.loc[].index). I tried this code:
idx = data.iloc[5].index
This returns the column names rather than the row's index label.
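A minimal sketch of this behavior, using a made-up DataFrame (the values and index labels below are assumptions for illustration only): selecting a single row with `iloc` returns a Series whose `index` holds the column names, while the row's own label is kept in the Series' `name` attribute.

```python
import pandas as pd

# Hypothetical data, for illustration only.
data = pd.DataFrame(
    {
        "name": ["Adams", "Baker", "Clark", "Davis", "Evans", "Smith"],
        "Year": [2000, 2001, 2000, 2001, 2000, 2001],
    },
    index=[10, 11, 12, 13, 14, 15],
)

row = data.iloc[5]                 # a single row comes back as a Series
column_labels = list(row.index)    # the DataFrame's column names
row_label = row.name               # the row's index label in the DataFrame
position = data.index.get_loc(row_label)  # 0-based position of that label
```

The same `name` attribute is available on each row that `df.apply(..., axis=1)` passes to the function.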
For context, I need the index of a specific row (rather than of rows selected with df.loc) because I want to use df.apply on each row. I plan to use df.apply to run a function on every row that copies data from the row immediately above it.
def retrieve_gender(row):
    # This is panel data in which only the year-2000 rows are already filled in.
    # Time-invariant values in later years are the same as those in 2000.
    if row["Year"] == 2000:
        pass
    elif row["Year"] == 2001:  # To keep things simple, use only year 2001 as an example.
        idx = row.index  # This is wrong code.
        row["Gender"] = row.iloc[idx-1]["Gender"]
    return row["Gender"]
data["Gender"] = data.apply(retrieve_gender, axis=1)
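A hedged sketch of one way the intended copy could work, under stated assumptions: the data is sorted so that each person's 2000 row sits directly above their 2001 row, and a `PersonID` column exists (that column and all values below are hypothetical, not from the original data). Inside `apply`, `row.name` gives the row's label, and `data.index.get_loc` turns it into a position so the row above can be read with `iloc`. A group-wise forward fill is shown as an alternative that avoids row-by-row lookups.

```python
import pandas as pd

# Hypothetical panel data: Gender is only filled in for year 2000.
data = pd.DataFrame(
    {
        "PersonID": [1, 1, 2, 2],  # assumed identifier column
        "Year": [2000, 2001, 2000, 2001],
        "Gender": ["F", None, "M", None],
    }
)

def retrieve_gender(row):
    # Year-2000 rows already hold the value.
    if row["Year"] == 2000:
        return row["Gender"]
    # row.name is the row's index label; convert it to a position
    # and read the row immediately above from the original frame.
    pos = data.index.get_loc(row.name)
    return data.iloc[pos - 1]["Gender"]

via_apply = data.apply(retrieve_gender, axis=1)

# Alternative: forward-fill missing values within each person.
via_ffill = data.groupby("PersonID")["Gender"].ffill()
```

Note that the `apply` version reads from the original `data`, so with more than two years per person the row above may itself still be missing; the forward fill does not have that limitation.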