I have two columns. I need to take specific string information from each one and create a new column with new strings based on it.

In the column "Name" I have well names. I need to look at the last 4 characters of each well name, and if they contain "H", put "HZ" in a new column.

I need to do the same thing if the column "WELLTYPE" contains specific words.

In Spotfire, a data analysis program, I can do this all in one simple expression (see below).

case  
When right([UWI],4)~="H" Then "HZ" 
When [WELLTYPE]~="Horizontal" Then "HZ" 
When [WELLTYPE]~="Deviated" Then "D" 
When [WELLTYPE]~="Multilateral" Then "ML"
else "V"
End

What would be the best way to do this in Python Pandas?

Is there a simple, clean way to do this all at once, like in the Spotfire expression above?

Here is the data table with the two columns and the outcome column I am hoping for (it did not copy very well into this). I also provide the code for the table below.

   Name        WELLTYPE          What I Want
0  HH-001HST2  Oil Horizontal    HZ
1  HH-001HST   Oil_Horizontal    HZ
2  HB-002H     Oil               HZ
3  HB-002      Water_Deviated    D
4  HB-002      Oil_Multilateral  ML
5  HB-004      Oil               V
6  HB-005      Source            V
7  BB-007      Water             V

Here is the code to create the dataframe

import pandas as pd

# DataFrame with the desired outcome in the 'What I Want' column
raw_data = {'Name': ['HH-001HST2', 'HH-001HST', 'HB-002H', 'HB-002', 'HB-002', 'HB-004', 'HB-005', 'BB-007'],
            'WELLTYPE': ['Oil Horizontal', 'Oil_Horizontal', 'Oil', 'Water_Deviated', 'Oil_Multilateral', 'Oil', 'Source', 'Water'],
            'What I Want': ['HZ', 'HZ', 'HZ', 'D', 'ML', 'V', 'V', 'V']}
df = pd.DataFrame(raw_data, columns=['Name', 'WELLTYPE', 'What I Want'])
df

3 Answers


Nested 'where' variant:

import numpy as np

# Each np.where is one "When ... Then ..." branch; the final 'V' is the else.
df['What I Want'] = np.where(df.Name.str[-4:].str.contains('H'), 'HZ',
                       np.where(df.WELLTYPE.str.contains('Horizontal'), 'HZ',
                       np.where(df.WELLTYPE.str.contains('Deviated'), 'D',
                       np.where(df.WELLTYPE.str.contains('Multilateral'), 'ML',
                       'V'))))
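
For comparison, the same rules can also be written flat with np.select, which is probably the closest NumPy counterpart to a case/when expression. This is only a sketch of an alternative, not part of the answer above; the column name well_class is made up for illustration, and the data is copied from the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['HH-001HST2', 'HH-001HST', 'HB-002H', 'HB-002',
             'HB-002', 'HB-004', 'HB-005', 'BB-007'],
    'WELLTYPE': ['Oil Horizontal', 'Oil_Horizontal', 'Oil', 'Water_Deviated',
                 'Oil_Multilateral', 'Oil', 'Source', 'Water'],
})

# Conditions are evaluated in order and the first True one wins,
# like the When clauses in the Spotfire expression.
conditions = [
    df['Name'].str[-4:].str.contains('H'),
    df['WELLTYPE'].str.contains('Horizontal'),
    df['WELLTYPE'].str.contains('Deviated'),
    df['WELLTYPE'].str.contains('Multilateral'),
]
choices = ['HZ', 'HZ', 'D', 'ML']

# 'well_class' is a hypothetical column name; default plays the role of else.
df['well_class'] = np.select(conditions, choices, default='V')
print(df)
```

Adding another rule then means appending one condition and one choice instead of nesting one level deeper.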

1 Comment

Perfect, awesome! I was hoping it would be this easy!

Using apply by row:

def criteria(row):
    # Use 'in' rather than find(...) > 0: find returns 0 for a match at the
    # start of the string, which a '> 0' test would wrongly treat as no match.
    if 'H' in row.Name[-4:]:
        return 'HZ'
    elif 'Horizontal' in row.WELLTYPE:
        return 'HZ'
    elif 'Deviated' in row.WELLTYPE:
        return 'D'
    elif 'Multilateral' in row.WELLTYPE:
        return 'ML'
    else:
        return 'V'

df['want'] = df.apply(criteria, axis=1)
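
A note on substring tests in a row-wise function like this one: str.find returns the position of the match, so it gives 0 for a match at the very start and -1 for no match. A quick check using the first well name from the question shows why a "> 0" comparison is not a safe presence test, while "in" (or ">= 0") is:

```python
tail = 'HH-001HST2'[-4:]       # last 4 characters: 'HST2'

print(tail.find('H'))          # position of the match; 0 means "at the start"
print('002'.find('H'))         # -1 means "not found"
print(tail.find('H') > 0)      # False, even though 'H' is present
print('H' in tail)             # True
```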


This feels more natural to me, though that is obviously subjective.

from_name = df.Name.str[-4:].str.contains('H').map({True: 'HZ'})

regex = '(Horizontal|Deviated|Multilateral)'
m = dict(Horizontal='HZ', Deviated='D', Multilateral='ML')
from_well = df.WELLTYPE.str.extract(regex, expand=False).map(m)

df['What I Want'] = from_name.fillna(from_well).fillna('V')

print(df)

         Name          WELLTYPE What I Want
0  HH-001HST2    Oil Horizontal          HZ
1   HH-001HST    Oil_Horizontal          HZ
2     HB-002H               Oil          HZ
3      HB-002    Water_Deviated           D
4      HB-002  Oil_Multilateral          ML
5      HB-004               Oil           V
6      HB-005            Source           V
7      BB-007             Water           V
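
For anyone who has not used map yet, it may help to look at the intermediate Series. The sketch below reruns the same steps on three rows taken from the question; the variable names name, welltype, keyword and result are chosen here just for illustration.

```python
import pandas as pd

name = pd.Series(['HH-001HST2', 'HB-002', 'HB-004'])
welltype = pd.Series(['Oil Horizontal', 'Water_Deviated', 'Oil'])

# True maps to 'HZ'; False is not a key in the dict, so it becomes NaN.
from_name = name.str[-4:].str.contains('H').map({True: 'HZ'})

# extract returns the matched keyword, or NaN when none of them is found.
keyword = welltype.str.extract('(Horizontal|Deviated|Multilateral)', expand=False)
from_well = keyword.map({'Horizontal': 'HZ', 'Deviated': 'D', 'Multilateral': 'ML'})

# Fill the gaps in priority order: Name rule, then WELLTYPE rule, then 'V'.
result = from_name.fillna(from_well).fillna('V')

print(pd.DataFrame({'from_name': from_name, 'from_well': from_well, 'result': result}))
```

The NaN values are what make the chained fillna calls behave like an ordered list of fallbacks.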

3 Comments

Thanks. For me the nested where is easier to understand. I haven't used the map function yet.
@brandog the nested np.where is very similar to what you were doing before. I like providing information. Hopefully others will find it useful to see how to use the techniques I chose to use.
Yes, agreed. I will probably use them in the future.
