new python pandas dataframe column based on value of variable, using function

Question

I have a variable, 'ImageName' which ranges from 0-1600. I want to create a new variable, 'LocationCode', based on the value of 'ImageName'.

If 'ImageName' is less than 70, I want 'LocationCode' to be 1. If 'ImageName' is between 71 and 90, I want 'LocationCode' to be 2. I have 13 different codes in all. I'm not sure how to write this in Python pandas. Here's what I tried:

def spatLoc(ImageName):
    if ImageName <=70:
        LocationCode = 1
    elif ImageName >70 and ImageName <=90:
        LocationCode = 2
   return LocationCode

df['test'] = df.apply(spatLoc(df['ImageName'])

but it returned an error. I'm clearly not defining things the right way but I can't figure out how to.

return LocationCode should be indented as well. Looks like it is falling out of your function definition.
– FirebladeDan, Commented Jul 6, 2015 at 20:12

EdChum · Accepted Answer · 2015-07-06 20:26:46Z

You can just use 2 boolean masks:

df.loc[df['ImageName'] <= 70, 'Test'] = 1
df.loc[(df['ImageName'] > 70) & (df['ImageName'] <= 90), 'Test'] = 2

The masks set the value only in the rows where the boolean condition is met. For the second mask you need the & operator to combine the two conditions, and each condition must be enclosed in parentheses because & binds more tightly than the comparison operators.
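As a minimal, self-contained sketch of the mask approach (the sample values are made up for illustration; only the 70 and 90 thresholds come from the question), naming each mask first makes the parenthesised condition easier to read:

```python
import pandas as pd

# Hypothetical sample data; the real 'ImageName' column runs from 0 to 1600.
df = pd.DataFrame({'ImageName': [5, 70, 71, 90, 91]})

# Each comparison is wrapped in parentheses before combining with &.
low = df['ImageName'] <= 70
mid = (df['ImageName'] > 70) & (df['ImageName'] <= 90)

df.loc[low, 'Test'] = 1
df.loc[mid, 'Test'] = 2

# Rows matched by neither mask (here the value 91) are left as NaN.
print(df)
```

Because unmatched rows stay NaN, the new column ends up with a float dtype until every range has been assigned a code.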

Actually, I think it would be better to define your bin values and call cut. For example:

In [20]:
import numpy as np
import pandas as pd

df = pd.DataFrame({'ImageName': np.random.randint(0, 100, 20)})
df

Out[20]:
    ImageName
0          48
1          78
2           5
3           4
4           9
5          81
6          49
7          11
8          57
9          17
10         92
11         30
12         74
13         62
14         83
15         21
16         97
17         11
18         34
19         78

In [22]:    
df['group'] = pd.cut(df['ImageName'], range(0, 105, 10), right=False)
df

Out[22]:
    ImageName      group
0          48   [40, 50)
1          78   [70, 80)
2           5    [0, 10)
3           4    [0, 10)
4           9    [0, 10)
5          81   [80, 90)
6          49   [40, 50)
7          11   [10, 20)
8          57   [50, 60)
9          17   [10, 20)
10         92  [90, 100)
11         30   [30, 40)
12         74   [70, 80)
13         62   [60, 70)
14         83   [80, 90)
15         21   [20, 30)
16         97  [90, 100)
17         11   [10, 20)
18         34   [30, 40)
19         78   [70, 80)

Here the bin values were generated using range, but you could pass your own list of bin values. Once you have the bins, you can define a lookup dict:

In [32]:    
d = dict(zip(df['group'].unique(), range(len(df['group'].unique()))))
d

Out[32]:
{'[0, 10)': 2,
 '[10, 20)': 4,
 '[20, 30)': 9,
 '[30, 40)': 7,
 '[40, 50)': 0,
 '[50, 60)': 5,
 '[60, 70)': 8,
 '[70, 80)': 1,
 '[80, 90)': 3,
 '[90, 100)': 6}

You can now call map and add your new column:

In [33]:    
df['test'] = df['group'].map(d)
df

Out[33]:
    ImageName      group  test
0          48   [40, 50)     0
1          78   [70, 80)     1
2           5    [0, 10)     2
3           4    [0, 10)     2
4           9    [0, 10)     2
5          81   [80, 90)     3
6          49   [40, 50)     0
7          11   [10, 20)     4
8          57   [50, 60)     5
9          17   [10, 20)     4
10         92  [90, 100)     6
11         30   [30, 40)     7
12         74   [70, 80)     1
13         62   [60, 70)     8
14         83   [80, 90)     3
15         21   [20, 30)     9
16         97  [90, 100)     6
17         11   [10, 20)     4
18         34   [30, 40)     7
19         78   [70, 80)     1

The above can be modified to suit your needs; it just demonstrates an approach that should be fast and that avoids iterating over your df.
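Note that a dict built from unique() numbers the bins in order of first appearance, not in bin order. If codes that follow the bin order are wanted, one possible variation (not part of the original answer) is to read the integer codes from the categorical column that cut returns. The sample values below are illustrative:

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({'ImageName': [48, 78, 5, 92]})
df['group'] = pd.cut(df['ImageName'], list(range(0, 105, 10)), right=False)

# .cat.codes numbers the bins from 0 in bin order; add 1 so codes start at 1.
df['test'] = df['group'].cat.codes + 1
print(df)
```

Values that fall outside every bin get a category code of -1, so they would show up as 0 after adding 1.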

I've updated my answer to show a way that doesn't need 13 masks. It's slightly more complicated, but I think it's better because it's cleaner.
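For the thresholds in the question, a hedged sketch of passing your own bin edges might look like the following. The edges 70 and 90 and the 0-1600 range come from the question; the third label is a placeholder, since the remaining ranges are not given. The labels argument of cut assigns the code directly, so no lookup dict is needed in this variant:

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({'ImageName': [0, 48, 70, 71, 90, 91, 1600]})

# A full list would have 14 edges for the 13 codes; only three bins are shown.
bins = [0, 70, 90, 1600]
labels = [1, 2, 3]  # 3 is a placeholder for the ranges not listed in the question

# Bins are closed on the right by default: (0, 70], (70, 90], (90, 1600].
# include_lowest=True makes the first bin include 0 as well.
df['LocationCode'] = pd.cut(df['ImageName'], bins=bins, labels=labels,
                            include_lowest=True)
print(df)
```

The result is a categorical column; calling astype(int) on it converts the labels to plain integers when no value falls outside the bins.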

Ahmad Senousi · Answer · 2019-03-29 15:22 (edited by Bob Dalgleish, 2019-03-29 19:12:16Z)


With axis=1, apply passes each row of the DataFrame to the function in turn. In the spatLoc() function below, the parameter row holds the entire row as a pandas Series, and you read an individual column from it with dictionary-style lookup, using the column name (here ImageName) as the key.

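def spatLoc(row):
    # row is a single DataFrame row; look up the column by name
    if row['ImageName'] <= 70:
        return 1
    elif row['ImageName'] <= 90:
        return 2
    # the remaining codes would follow the same pattern;
    # return None instead of failing on values above 90
    return None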

df['test'] = df.apply(spatLoc, axis=1)
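Since only one column is needed, a possible alternative is to keep a function that takes a single value, as in the question, and apply it to that column alone. This is a sketch under assumptions: spat_loc and the sample data are illustrative names and values, and the ranges beyond 90 are left out because the question does not list them.

```python
import pandas as pd

def spat_loc(image_name):
    """Return the location code for a single ImageName value."""
    if image_name <= 70:
        return 1
    elif image_name <= 90:
        return 2
    return None  # placeholder: the other 11 ranges are not given in the question

# Hypothetical sample data.
df = pd.DataFrame({'ImageName': [10, 70, 85, 90]})

# Series.apply calls the function once per value; pass the function itself,
# not the result of calling it.
df['test'] = df['ImageName'].apply(spat_loc)
print(df)
```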

answered Mar 29, 2019 at 15:22 by Ahmad Senousi
edited Mar 29, 2019 at 19:12 by Bob Dalgleish

Please describe what you changed and why, to help others understand the problem and this answer.
– FZs, Commented over a year ago
