I want to extract numbers from strings like. They appear in many columns, so what is the most efficient way to remove the surrounding text and keep only the numbers? Is there a way other than using regex?
- Add the sample output. – Amit Vikram Singh, Apr 5, 2021 at 23:44
- It's always important to add 3 simple things to your question: first, samples of input; second, samples of output; and third, your efforts in the form of code. Kindly add these to your question to make it clear, thank you. – RavinderSingh13, Apr 5, 2021 at 23:47
- @RavinderSingh13 Thanks for letting me know. I've added sample input and outputs. – Ilovenoodles, Apr 6, 2021 at 1:04
- @AmitVikramSingh Thanks for letting me know. I've added them. – Ilovenoodles, Apr 6, 2021 at 1:04
2 Answers
Assuming you expect only one number per value in the column, you could try using str.extract here:
df["some_col"] = df["some_col"].str.extract(r'(\d+(?:\.\d+)?)')
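Since the question mentions many columns, the same pattern can be applied to each column in turn. The following is a minimal sketch, not a definitive implementation: the DataFrame, the column names `price` and `weight`, and the sample values are all invented for illustration. Passing `expand=False` makes `str.extract` return a Series, and `pd.to_numeric` converts the extracted strings to numbers.

```python
import pandas as pd

# Hypothetical sample data; column names and values are illustrative only.
df = pd.DataFrame({
    "price": ["USD 12.50", "about 7", "n/a"],
    "weight": ["3.2 kg", "10kg", "unknown"],
})

cols = ["price", "weight"]      # columns to clean (assumed names)
pattern = r"(\d+(?:\.\d+)?)"    # an integer or a decimal number

for col in cols:
    # Take the first number in each cell; cells without a match become NaN.
    extracted = df[col].str.extract(pattern, expand=False)
    df[col] = pd.to_numeric(extracted)
```

Only the first number in each cell is kept, so a cell holding several numbers would need `str.extractall` or `str.findall` instead.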
2 Comments
str.extract on multiple columns, by passing a list of columns.
I would use a function with a regex that matches the pattern you are seeing. Since you tagged pandas and dataframe, I am assuming you are working with a DataFrame, but a sample output would certainly help. Here is how I would tackle it:
import re

import numpy as np
import pandas as pd

# The fractional part is optional, so single-digit values such as "5" also match.
NUMBER = re.compile(r'\d+(?:\.\d+)?')

def extract_numbers(column1: str):
    # Return the first number found in the string, or NaN if there is none.
    for x in column1.split():
        match = NUMBER.search(x)
        if match:
            return float(match.group())
    return np.nan

df['Numbers'] = df['YourColumn'].apply(extract_numbers)
The result is a new column called "Numbers" that contains the number extracted from each string, or NaN when no number is found. Once you have that column, you can work with it however you please.
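To make that behavior concrete, here is a self-contained sketch of the same idea on a small DataFrame. The sample strings and the helper name `extract_number` are assumptions made for illustration. Unlike the function above, this variant searches the whole string rather than splitting on whitespace first, and it returns NaN for non-string values such as missing entries.

```python
import re

import numpy as np
import pandas as pd

NUMBER = re.compile(r"\d+(?:\.\d+)?")  # an integer or a decimal number

def extract_number(text):
    """Return the first number in text as a float, or NaN if none is found."""
    if not isinstance(text, str):  # e.g. missing values
        return np.nan
    match = NUMBER.search(text)
    return float(match.group()) if match else np.nan

# Hypothetical sample data, for illustration only.
df = pd.DataFrame(
    {"YourColumn": ["height 180.5 cm", "no digits here", "7 items", None]}
)
df["Numbers"] = df["YourColumn"].apply(extract_number)

# The new column is numeric, so ordinary arithmetic works; sum() skips NaN.
total = df["Numbers"].sum()
```

Because the extracted values are floats, aggregations and comparisons on the new column behave as they would on any other numeric column.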