14

I'm new to Python & Pandas.

I want to find the index of a certain value (let's say security_id) in my pandas dataframe, because that is where the columns start. (There is an unknown number of rows with irrelevant data above the columns, as well as a number of empty 'columns' on the left side.)

As far as I see, the isin method only returns a boolean on whether the value exists, not its index.

How do I find the index of this value?

1

6 Answers 6

19

Get the index for rows matching search term in all columns

search = 'security_id' 
df.loc[df.isin([search]).any(axis=1)].index.tolist()

Rows filtered for matching search term in all columns

search = 'search term' 
df.loc[df.isin([search]).any(axis=1)]
Sign up to request clarification or add additional context in comments.

1 Comment

This is probably the most performant answer, as it uses .loc. Great answer!
3

value you are looking for is not duplicated:

poz=matrix[matrix==minv].dropna(axis=1,how='all').dropna(how='all')
value=poz.iloc[0,0]
index=poz.index.item()
column=poz.columns.item()

you can get its index and column

duplicated:

matrix=pd.DataFrame([[1,1],[1,np.NAN]],index=['q','g'],columns=['f','h'])
matrix
Out[83]: 
   f    h
q  1  1.0
g  1  NaN
poz=matrix[matrix==minv].dropna(axis=1,how='all').dropna(how='all')
index=poz.stack().index.tolist()
index
Out[87]: [('q', 'f'), ('q', 'h'), ('g', 'f')]

you will get a list

Comments

3

A oneliner solution avoiding explicit loops...

  • returning the entire row(s)

    df.iloc[np.flatnonzero((df=='security_id').values)//df.shape[1],:]

  • returning row(s) and column(s)

    df.iloc[ np.flatnonzero((df=='security_id').values)//df.shape[1], np.unique(np.flatnonzero((df=='security_id').values)%df.shape[1]) ]

Comments

2

Supposing that your DataFrame looks like the following :

      0       1            2      3    4
0     a      er          tfr    sdf   34
1    rt     tyh          fgd    thy  rer
2     1       2            3      4    5
3     6       7            8      9   10
4   dsf     wew  security_id   name  age
5   dfs    bgbf          121  jason   34
6  dddp    gpot         5754   mike   37
7  fpoo  werwrw          342   jack   31

Do the following :

for row in range(df.shape[0]): # df is the DataFrame
         for col in range(df.shape[1]):
             if df.get_value(row,col) == 'security_id':
                 print(row, col)
                 break

4 Comments

Thanks, this seems like a solution:) Though is the only way to find the value to iterate over both the rows and columns? Would there be a more efficient method?
No matter what you do, iteration will always be involved. Either you will do it , else Pandas would do it. Internally iteration will always be involved. Moreover iteration stops once, you get the ID. The worst case would be when security_id is the bottom-right element of your DataFrame ( O(mn) ). If the security_id is in the top-left half of the DataFrame, it won't be costly at all.
Moreover , you are asking for data cleaning. So, it is a cheap preprocessing step. Dont try to hyper-optimize everything. Premature optimization is the root of all evils. Remember.
Yeah that makes sense, I thought that might be the case (iteration in any case). Thanks for the explanation.
2

I think this question may have been asked before here. The accepted answer is pretty comprehensive and should help you find the index of a value in a column.

Edit: if the column that the value exists in is not known, then you could use:

for col in df.columns:
    df[df[col] == 'security_id'].index.tolist()

2 Comments

At the given question the column is known. In my case I don't know in which column the value appears. But I agree that it gives the direction to the answer for my question
Ahh, apologies! you could loop over the Column's in the data frame and apply the answer linked above. for col in df.columns: df[df[col] == 'security_id'].index.tolist(). This would also give you all occurrences of what you're looking for.
0

Function finds the positions of a value in a dataframe

import pandas as pd
import numpy as np

def pandasFindPositionsInDataframe(dfIn,findme):
    positions = []
    irow =0
    while ( irow < len(dfIn.index)):
        list_colPositions=dfIn.columns[dfIn.iloc[irow,:]==findme].tolist()   
        if list_colPositions != []:
            colu_iloc = dfIn.columns.get_loc(list_colPositions[0])
            positions.append([irow, colu_iloc])
        irow +=1

    return positions

1 Comment

How can this be made case insensitive?

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.