
I have a pandas DataFrame column named 'VALUE' that contains strings such as '-1.459NS'. I want to split it into two columns: 'VALUE' should hold the float -1.459 and 'UNIT' should hold the string 'NS'.

Is there a regex or non-regex way to do this, and which one is fastest? I need to do this for a million or more rows.

>>> import pandas as pd
>>> d = {'VALUE': ['-1.234NS', '0.22MH']}
>>> df=pd.DataFrame(data=d)
>>> df
      VALUE
0  -1.234NS
1    0.22MH

I want:

    VALUE    UNIT
0  -1.234    NS
1    0.22    MH

where VALUE is a float and UNIT is a string.

  • Are units always 2-characters long? Commented Aug 27, 2018 at 22:11
  • The units can be any number of characters Commented Aug 27, 2018 at 23:14

2 Answers


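df[column].str.extract returns a DataFrame with one column per capture group in the regex, labeled by the group's integer position (0, 1, ...). You can then use rename to give those columns meaningful names. Note that the extracted columns hold strings, so VALUE still needs astype(float) if you want an actual float column.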

>>> df.VALUE.str.extract(r'([-]?[\d.]*)([\w\D]*)').rename(columns={0:'VALUE', 1:'UNIT'})

    VALUE UNIT
0  -1.234   NS
1    0.22   MH
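The renaming and the float conversion can also be done together. The sketch below assumes the number is an optional sign, optional digits and an optional decimal part; it uses named groups, which str.extract turns into column names, and then converts VALUE with astype(float), since str.extract always produces strings.

```python
import pandas as pd

df = pd.DataFrame({'VALUE': ['-1.234NS', '0.22MH']})

# Named groups (?P<name>...) become the column names, so rename() is not needed.
# Assumed number format: optional sign, digits, optional decimal part.
parts = df['VALUE'].str.extract(r'(?P<VALUE>[-+]?\d*\.?\d+)(?P<UNIT>.*)')

# str.extract always returns strings, so convert the numeric column explicitly.
parts['VALUE'] = parts['VALUE'].astype(float)

# Overwrite the original column and add UNIT next to it.
df['VALUE'] = parts['VALUE']
df['UNIT'] = parts['UNIT']
print(df.dtypes)
```

If a row does not match the pattern, both groups come back as NaN for that row, and astype(float) leaves it as NaN.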

1 Comment

My approach would have been df.VALUE.str.extract('([+-.0-9]+)(.*)'). That also allows a plus sign, and UNIT may contain digits. I'm not a regex expert, but at least for the sample above it works as well.
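A quick check of the comment's idea on two hypothetical rows, one with a leading '+' and one whose unit contains a digit. This sketch moves '-' to the front of the character class so it is literal: in '[+-.0-9]' the '+-.' part is a character range, which happens to cover '+', ',', '-' and '.', so a comma would also be accepted as part of the number.

```python
import pandas as pd

# Hypothetical rows: a leading '+' and a unit that contains a digit.
df = pd.DataFrame({'VALUE': ['+3.5kg2', '-1.234NS']})

# A run of sign/digit/dot characters, then everything else as the unit.
# '-' comes first inside [...] so it is a literal character, not a range.
parts = df['VALUE'].str.extract(r'([-+.0-9]+)(.*)')
parts.columns = ['VALUE', 'UNIT']

# The numeric part is still a string; float() accepts a leading '+'.
parts['VALUE'] = parts['VALUE'].astype(float)
print(parts)
```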

Here is a non-regex way to solve this when the units do not have a fixed length. It splits the string into a float value and a string unit.

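s = '-1.234NS'
number = ''
unit = ''
for ch in s:
    # Sign, digit and '.' characters belong to the number until the first
    # other character; from there on, everything (digits included) is the unit.
    # Unlike splitting on '.', this also works for values without a decimal point.
    if not unit and (ch.isdigit() or ch in '+-.'):
        number += ch
    else:
        unit += ch
fl = float(number)  # -1.234, and unit == 'NS'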
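The question also asks which approach is fastest for a million-plus rows. The harness below is a rough sketch for measuring that yourself, not a benchmark result. split_value_unit, split_python and split_extract are hypothetical names. The data is synthetic, built from the two sample strings. split_extract reuses the first answer's pattern and adds the float conversion. Timings depend on your machine, your pandas version and your real strings.

```python
import timeit

import pandas as pd


def split_value_unit(s):
    """Hypothetical helper: '-1.234NS' -> (-1.234, 'NS'), without regex."""
    # Numeric prefix: an optional leading sign, then digits and '.'.
    i = 1 if s[:1] in ('+', '-') else 0
    while i < len(s) and (s[i].isdigit() or s[i] == '.'):
        i += 1
    return float(s[:i]), s[i:]


def split_python(col):
    # One Python call per row; zip(*...) turns (value, unit) pairs into two tuples.
    values, units = zip(*(split_value_unit(s) for s in col))
    return pd.DataFrame({'VALUE': list(values), 'UNIT': list(units)}, index=col.index)


def split_extract(col):
    # Regex approach from the first answer, plus the float conversion.
    out = col.str.extract(r'([-]?[\d.]*)([\w\D]*)').rename(columns={0: 'VALUE', 1: 'UNIT'})
    out['VALUE'] = out['VALUE'].astype(float)
    return out


# Synthetic data: 100,000 rows; scale this up to match your real data.
col = pd.Series(['-1.234NS', '0.22MH'] * 50_000)

for fn in (split_python, split_extract):
    seconds = timeit.timeit(lambda: fn(col), number=1)
    print(f'{fn.__name__}: {seconds:.3f} s')
```

Both functions return the same VALUE/UNIT frame, so whichever is faster on your data can replace the original column directly.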
