new column based on specific string info from two different columns Python Pandas

Question

I have two columns. I need to take specific string information from each one and create a new column whose values are based on it.

In the "Name" column I have well names. I need to look at the last 4 characters of each well name, and if they contain "H", label that row "HZ" in a new column.

I need to do the same thing if the "WELLTYPE" column contains specific words.

In Spotfire, a data analysis program, I can do all of this in one simple expression (see below).

case  
When right([UWI],4)~="H" Then "HZ" 
When [WELLTYPE]~="Horizontal" Then "HZ" 
When [WELLTYPE]~="Deviated" Then "D" 
When [WELLTYPE]~="Multilateral" Then "ML"
else "V"
End

What would be the best way to do this in Python Pandas?

Is there a simple, clean way to do this all at once, like in the Spotfire expression above?

Here is the data table with the two columns and the outcome column I am hoping for. The code to create it is below.

         Name          WELLTYPE What I Want
0  HH-001HST2    Oil Horizontal          HZ
1   HH-001HST    Oil_Horizontal          HZ
2     HB-002H               Oil          HZ
3      HB-002    Water_Deviated           D
4      HB-002  Oil_Multilateral          ML
5      HB-004               Oil           V
6      HB-005            Source           V
7      BB-007             Water           V

Here is the code to create the dataframe

import pandas as pd

# Dataframe with hopeful outcome
raw_data = {'Name': ['HH-001HST2', 'HH-001HST', 'HB-002H', 'HB-002', 'HB-002', 'HB-004', 'HB-005', 'BB-007'],
            'WELLTYPE': ['Oil Horizontal', 'Oil_Horizontal', 'Oil', 'Water_Deviated', 'Oil_Multilateral', 'Oil', 'Source', 'Water'],
            'What I Want': ['HZ', 'HZ', 'HZ', 'D', 'ML', 'V', 'V', 'V']}
df = pd.DataFrame(raw_data, columns=['Name', 'WELLTYPE', 'What I Want'])
df

Alexey Trofimov · Accepted Answer · 2017-04-13 19:12:09Z

2

Nested 'where' variant:

import numpy as np

# Conditions are checked in order; the first one that matches decides the label
df['What I Want'] = np.where(df.Name.str[-4:].str.contains('H'), 'HZ',
                    np.where(df.WELLTYPE.str.contains('Horizontal'), 'HZ',
                    np.where(df.WELLTYPE.str.contains('Deviated'), 'D',
                    np.where(df.WELLTYPE.str.contains('Multilateral'), 'ML',
                    'V'))))
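
For comparison, the same first-match-wins logic can probably be written flat with np.select, which reads closer to the Spotfire case expression. This is only a sketch: the column name well_class is a hypothetical stand-in for the output column, and the dataframe is rebuilt from the question's data so the snippet runs on its own.

```python
import numpy as np
import pandas as pd

# Same sample data as in the question (without the expected-output column)
df = pd.DataFrame({
    'Name': ['HH-001HST2', 'HH-001HST', 'HB-002H', 'HB-002',
             'HB-002', 'HB-004', 'HB-005', 'BB-007'],
    'WELLTYPE': ['Oil Horizontal', 'Oil_Horizontal', 'Oil', 'Water_Deviated',
                 'Oil_Multilateral', 'Oil', 'Source', 'Water'],
})

# One boolean mask per "When" branch, in priority order
conditions = [
    df['Name'].str[-4:].str.contains('H'),
    df['WELLTYPE'].str.contains('Horizontal'),
    df['WELLTYPE'].str.contains('Deviated'),
    df['WELLTYPE'].str.contains('Multilateral'),
]
# Label for each mask, in the same order
choices = ['HZ', 'HZ', 'D', 'ML']

# np.select takes the choice of the first condition that is True for each row;
# rows matching nothing get the default. 'well_class' is a hypothetical name.
df['well_class'] = np.select(conditions, choices, default='V')
print(df)
```

Adding another branch then means appending one entry to each list rather than nesting one level deeper.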

edited Apr 13, 2017 at 19:12

answered Apr 13, 2017 at 19:03


1 Comment

brandog Over a year ago

PERFECT, AWESOME! I was hoping it would be this easy!

nbraun · Accepted Answer · 2017-04-13 19:05:29Z

1

Using apply by row:

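def criteria(row):
    # Use `in` for substring tests: str.find returns 0 for a match at the
    # start of the string, so comparing its result with `> 0` misses that case.
    if 'H' in row['Name'][-4:]:
        return 'HZ'
    elif 'Horizontal' in row['WELLTYPE']:
        return 'HZ'
    elif 'Deviated' in row['WELLTYPE']:
        return 'D'
    elif 'Multilateral' in row['WELLTYPE']:
        return 'ML'
    else:
        return 'V'

# axis=1 passes each row to criteria as a Series
df['want'] = df.apply(criteria, axis=1)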
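
One point worth keeping in mind with row-wise string checks like these: str.find returns the index of the match, which is 0 when the match is at the very start of the string and -1 when there is no match. A small illustration, using values taken from the question's data (the variable names are only for this sketch):

```python
# Last four characters of the first well name in the question
tail = 'HH-001HST2'[-4:]                    # 'HST2'

# find() reports the position of the match, not a yes/no answer
start_match = tail.find('H')                # match at index 0
no_match = 'Oil'.find('Horizontal')         # -1 means "not found"
mid_match = 'Oil_Horizontal'.find('Horizontal')

# Comparing with > 0 treats a match at index 0 as "not found"
found_by_gt_zero = tail.find('H') > 0
# A membership test gives the intended answer
found_by_in = 'H' in tail

print(start_match, no_match, mid_match, found_by_gt_zero, found_by_in)
```

So a test written as find(...) > 0 would skip a name whose last four characters start with "H"; a membership test with in, or find(...) >= 0, avoids that.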

answered Apr 13, 2017 at 19:05


piRSquared · Accepted Answer · 2017-04-13 19:21:43Z

1

This feels more natural to me, though that is obviously subjective.

from_name = df.Name.str[-4:].str.contains('H').map({True: 'HZ'})

regex = '(Horizontal|Deviated|Multilateral)'
m = dict(Horizontal='HZ', Deviated='D', Multilateral='ML')
from_well = df.WELLTYPE.str.extract(regex, expand=False).map(m)

df['What I Want'] = from_name.fillna(from_well).fillna('V')

print(df)

         Name          WELLTYPE What I Want
0  HH-001HST2    Oil Horizontal          HZ
1   HH-001HST    Oil_Horizontal          HZ
2     HB-002H               Oil          HZ
3      HB-002    Water_Deviated           D
4      HB-002  Oil_Multilateral          ML
5      HB-004               Oil           V
6      HB-005            Source           V
7      BB-007             Water           V
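
For readers who have not used map before, it may help to look at the intermediate Series before they are combined. The sketch below rebuilds the question's data and puts the pieces side by side; the steps dataframe and its column names are hypothetical and exist only for inspection.

```python
import pandas as pd

# Same sample data as in the question
df = pd.DataFrame({
    'Name': ['HH-001HST2', 'HH-001HST', 'HB-002H', 'HB-002',
             'HB-002', 'HB-004', 'HB-005', 'BB-007'],
    'WELLTYPE': ['Oil Horizontal', 'Oil_Horizontal', 'Oil', 'Water_Deviated',
                 'Oil_Multilateral', 'Oil', 'Source', 'Water'],
})

# map() with a dict looks each value up as a key. False has no entry in the
# dict, so rows without an "H" in the last four characters become NaN.
from_name = df['Name'].str[-4:].str.contains('H').map({True: 'HZ'})

# extract() pulls out the matched keyword (NaN if none), then map() turns
# the keyword into its label.
labels = {'Horizontal': 'HZ', 'Deviated': 'D', 'Multilateral': 'ML'}
from_well = (df['WELLTYPE']
             .str.extract('(Horizontal|Deviated|Multilateral)', expand=False)
             .map(labels))

# fillna() only fills the gaps, so from_name takes priority over from_well,
# and 'V' is used where both are NaN.
combined = from_name.fillna(from_well).fillna('V')

# Hypothetical helper frame, just to view the steps next to each other
steps = pd.DataFrame({'from_name': from_name,
                      'from_well': from_well,
                      'combined': combined})
print(steps)
```

The order of the fillna calls is what encodes the priority of the rules: swapping from_name and from_well would let WELLTYPE win over the well name.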

answered Apr 13, 2017 at 19:21


3 Comments

brandog Over a year ago

Thanks. For me the nested where is easier to understand. I haven't used the map function yet.

piRSquared Over a year ago

@brandog the nested np.where is very similar to what you were doing before. I like providing information. Hopefully others will find it useful to see how to use the techniques I chose to use.

brandog Over a year ago

Yes, agreed. I will probably use them in the future.
