While Loop Alternative in Python

Question

I am working with a huge dataframe and want to create a new column based on a condition in another column. Right now I use a big while loop, and the calculation takes too long. Is there an easier way to do it?

With a lambda, for example?

def promo(dataframe, a):  
    i=0
    while i < len(dataframe)-1:
        i=i+1
        if dataframe.iloc[i-1,5] >= a:
            dataframe.iloc[i-1,6] = 1
        else:
            dataframe.iloc[i-1,6] = 0

    return dataframe

It would be easier to understand if you explained what you mean by "trying to create a new column, based on a condition in another column".
– Sociopath, Commented Sep 11, 2018 at 6:56
If you're using loops with dataframe rows, you're usually doing something wrong. Please show example input and output.
– OneCricketeer, Commented Sep 11, 2018 at 6:58
@cricket_007 particularly indexing for-loops. If you have to loop over rows, then you should at least use itertuples.
– juanpa.arrivillaga, Commented Sep 11, 2018 at 7:02

jezrael · Accepted Answer · answered Sep 11, 2018 at 7:02 (edited by M.T, 2018-09-11 08:07:54Z)

2

Avoid loops in pandas; they are slow compared to a vectorized solution. Build a boolean mask from the comparison and convert it to integers with astype(int), which turns True and False into 1 and 0:

import pandas as pd

dataframe = pd.DataFrame({'A':list('abcdef'),
                   'B':[4,5,4,5,5,4],
                   'C':[7,8,9,4,2,3],
                   'D':[1,3,5,7,1,0],
                   'E':list('aaabbb'),
                   'F':[5,3,6,9,2,4],
                   'G':[5,3,6,9,2,4]
})

a = 5
# Compare the 6th column (position 5) with a; astype(int) maps True/False to 1/0
dataframe['new'] = (dataframe.iloc[:,5] >= a).astype(int)
print(dataframe)
   A  B  C  D  E  F  G  new
0  a  4  7  1  a  5  5    1
1  b  5  8  3  a  3  3    0
2  c  4  9  5  a  6  6    1
3  d  5  4  7  b  9  9    1
4  e  5  2  1  b  2  2    0
5  f  4  3  0  b  4  4    0

If you want to overwrite the 7th column:

a = 5
# Assign by position to replace the values of the 7th column (position 6)
dataframe.iloc[:,6] = (dataframe.iloc[:,5] >= a).astype(int)
print(dataframe)
   A  B  C  D  E  F  G
0  a  4  7  1  a  5  1
1  b  5  8  3  a  3  0
2  c  4  9  5  a  6  1
3  d  5  4  7  b  9  1
4  e  5  2  1  b  2  0
5  f  4  3  0  b  4  0
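
To connect this back to the question, here is a minimal sketch of how the promo function could be written without a loop. The name promo_vectorized is hypothetical, and the sketch assumes the column positions 5 and 6 from the question and unique column names. Note that the loop in the question appears to stop one row short (it runs len(dataframe)-1 times), so the last row is never written; the vectorized version covers every row.

```python
import pandas as pd

def promo_vectorized(dataframe, a):
    """Hypothetical loop-free counterpart of promo() from the question."""
    # Work on a copy so the caller's dataframe is left unchanged.
    result = dataframe.copy()
    # Compare the 6th column (position 5) with the threshold in one step,
    # then store the 0/1 flags in the 7th column (position 6).
    target = result.columns[6]
    result[target] = (result.iloc[:, 5] >= a).astype(int)
    return result

df = pd.DataFrame({'A': list('abcdef'),
                   'B': [4, 5, 4, 5, 5, 4],
                   'C': [7, 8, 9, 4, 2, 3],
                   'D': [1, 3, 5, 7, 1, 0],
                   'E': list('aaabbb'),
                   'F': [5, 3, 6, 9, 2, 4],
                   'G': [5, 3, 6, 9, 2, 4]})

flagged = promo_vectorized(df, 5)
print(flagged['G'].tolist())  # [1, 0, 1, 1, 0, 0]
```

Returning a copy is a design choice of this sketch; the function in the question modifies the dataframe in place instead.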

edited Sep 11, 2018 at 8:07

M.T

answered Sep 11, 2018 at 7:02

jezrael

11 Comments

jezrael Over a year ago

@M.T - Thank you, +3

Antonio Andrés Over a year ago

I do not understand why you are using iloc[]. In this case I would use df.loc[:, 5] or df[5]. Can you explain whether there is any difference in performance?

jezrael Over a year ago

@AntonioAndrés - It depends. If you want to select the 6th column by position, as in my solution, then you need dataframe.iloc[:,5]. But if the columns have default names from a RangeIndex, it is possible to use df[5] or df.loc[:, 5].

Antonio Andrés Over a year ago

Ok, I understand: iloc[] selects by position for columns too, not only for rows. Thanks!
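
The distinction discussed in these comments can be illustrated with a small sketch. The two dataframes below are made up for illustration: one has named columns, the other has default integer column labels (a RangeIndex).

```python
import pandas as pd

# Named columns: iloc selects by position, loc selects by label.
named = pd.DataFrame({'A': [1, 2], 'F': [5, 3]})
by_position = named.iloc[:, 1]   # second column, whatever it is called
by_label = named.loc[:, 'F']     # the column labelled 'F'
same_named = by_position.equals(by_label)

# Default column labels 0, 1, ...: label 1 and position 1 coincide,
# so unnamed[1], unnamed.loc[:, 1] and unnamed.iloc[:, 1] pick the same column.
unnamed = pd.DataFrame([[1, 5], [2, 3]])
same_unnamed = unnamed[1].equals(unnamed.iloc[:, 1])

print(same_named, same_unnamed)
```

With named columns, an expression such as named[1] would raise a KeyError, because 1 is not a column label there.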

jezrael Over a year ago

@PV8 - Then use dataframe['new'] = np.where(dataframe.iloc[:,5] >= a, 1, 2)
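
As a runnable sketch of the np.where suggestion above, the snippet below uses a made-up single-column dataframe with the same values as column F in the answer. np.where takes the condition, the value to use where it is True, and the value to use where it is False.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'F': [5, 3, 6, 9, 2, 4]})
a = 5

# 1 where F >= a, otherwise 2 (any pair of values can be used here).
df['new'] = np.where(df['F'] >= a, 1, 2)
print(df['new'].tolist())  # [1, 2, 1, 1, 2, 2]
```

This is useful when the two outcomes are not simply 1 and 0, where astype(int) on the boolean mask would not be enough.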
