2

I have the following dataframe:

data
A(1.2,2)
B(1,5)
A(5.8, 9)
B(8.9,0.9)

I would like to convert these float (str) objects to int. How do I do that?

Desired Output:

data
A(1,2)
B(1,5)
A(6, 9)
B(9,1)

What I have tried so far:

pd.to_numeric(df['data'])

But I get the following error: ValueError: Unable to parse string "A(1,2)" at position 0. How do I solve this?

4 Answers

2

Your strings are effectively namedtuples, so treat them as such:

  • define the expected named tuples
  • turn the strings into named tuples using pd.eval()
  • turn them back into the wanted string representation using an f-string
  • alternatively, use Series map() to change them to the wanted representation
import io

import pandas as pd

df = pd.read_csv(
    io.StringIO(
        """df
A(1.2,2)
B(1,5)
A(5.8, 9)
B(8.9,0.9)"""
    ),
    sep="\t",
).rename(columns={"df": "data"})


from collections import namedtuple

A = namedtuple("A", "x y")
B = namedtuple("B", "x y")
df = pd.DataFrame(
    {
        "data": [
            f"{type(nt).__name__}({round(nt.x,0):.0f},{round(nt.y):.0f})"
            for nt in pd.eval(df["data"])
        ]
    }
)

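Alternatively, use map() for rounding (str() on a namedtuple includes the field names, e.g. A(x=1, y=2), so the output shown at the end matches the f-string version above):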

df["data"] = pd.Series(pd.eval(df["data"])).map(
    lambda nt: str(nt._replace(x=int(round(nt.x, 0)), y=int(round(nt.y, 0))))
)
data
0 A(1,2)
1 B(1,5)
2 A(6,9)
3 B(9,1)
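
The namedtuple idea can also be sketched in plain Python, without relying on pd.eval(). This is a minimal sketch, not the answer's own code: the helper name round_namedtuple_string and the ALLOWED_NAMES mapping are hypothetical, and it assumes every row is a well-formed call to A or B with two numeric arguments. It uses the built-in eval() with builtins disabled, which is still not safe for untrusted input.

```python
from collections import namedtuple

A = namedtuple("A", "x y")
B = namedtuple("B", "x y")

# Hypothetical whitelist: only these names are visible to eval().
ALLOWED_NAMES = {"A": A, "B": B}


def round_namedtuple_string(s):
    """Parse e.g. 'A(1.2,2)' into a namedtuple, round both fields, format it back."""
    nt = eval(s, {"__builtins__": {}}, ALLOWED_NAMES)
    return f"{type(nt).__name__}({round(nt.x)},{round(nt.y)})"


rows = ["A(1.2,2)", "B(1,5)", "A(5.8, 9)", "B(8.9,0.9)"]
rounded = [round_namedtuple_string(s) for s in rows]
print(rounded)  # ['A(1,2)', 'B(1,5)', 'A(6,9)', 'B(9,1)']
```

On a DataFrame, the same helper could be applied with df["data"].map(round_namedtuple_string).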

1

You have a string, so you first need to separate the numbers from each other. A custom function is probably the easiest way:

def round_string(s):
    start = s.index('(') +1
    stop = s.index(')')
    l = s[start:stop].split(',')
    lst = [str(int(round(float(i)))) for i in l]
    return s[:start] + ','.join(lst) + s[stop:]

s = "B(8.9,0.9)"
round_string(s)
# 'B(9,1)'

Map the function over the DataFrame column:

df['data'].map(round_string)


1

If you don't need to round, the following should work. It truncates each number by replacing the decimal point and the digits after it with an empty string:

df['data'].str.replace(r'\.\d+', '', regex=True)
0     A(1,2)
1     B(1,5)
2    A(5, 9)
3     B(8,0)
Name: data, dtype: object

Rounding the values takes a bit more effort. Extract the parenthesized part with a regex and assign it to a temporary column, call eval on it and round each value in a comprehension, then replace the original parenthesized part with the new tuple in the given column.

df.assign(tup=df['data'].str.extract(r'(\(.*\))')).apply(
    lambda x: x['data'].replace(x['tup'], str(tuple(round(i) for i in eval(x['tup'])))),
    axis=1,
)

0    A(1, 2)
1    B(1, 5)
2    A(6, 9)
3    B(9, 1)
dtype: object
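
The rounding step can also be sketched without eval or a temporary column, by substituting each number in place with a regex callback. This is a minimal sketch under stated assumptions: the helper name round_numbers and the NUMBER pattern are hypothetical, and it assumes the numbers are non-negative and that the prefix (A, B) contains no digits.

```python
import re

# Matches an integer or a decimal number such as "5" or "5.8".
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def round_numbers(s):
    """Replace every number in the string with its rounded integer value."""
    return NUMBER.sub(lambda m: str(round(float(m.group()))), s)


samples = ["A(1.2,2)", "B(1,5)", "A(5.8, 9)", "B(8.9,0.9)"]
result = [round_numbers(s) for s in samples]
print(result)  # ['A(1,2)', 'B(1,5)', 'A(6, 9)', 'B(9,1)']
```

Existing whitespace is preserved, so "A(5.8, 9)" becomes "A(6, 9)". On a DataFrame this could be applied with df['data'].map(round_numbers).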

3 Comments

Thanks for the suggestion. But when I try to execute it, I get an error saying SyntaxError: invalid syntax. Sorry for the silly question, but how do I solve it?
Hello @disukumo, I tested the code twice, and it runs fine on my end with Python 3.7.5 and pandas 1.2.2. One possible reason might be an older pandas version; another might be that the sample data in the question differs from your actual data.
My Python version is 3.8 and pandas is 1.2.4. Not sure why it isn't working. Thanks for the help!
1

So what you want to do is convert the floats nested inside of strings into ints.

Furthermore, your output suggests you don't want to use the int function but probably round(x,0). (I say this because int(5.8) evaluates to 5, not 6.)
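
The difference is easy to check in plain Python. The snippet below is only an illustration using the values from the question; note also that the built-in round() rounds exact halves to the nearest even integer.

```python
values = [1.2, 5.8, 8.9, 0.9]

# int() truncates toward zero, so 5.8 becomes 5.
truncated = [int(v) for v in values]
print(truncated)  # [1, 5, 8, 0]

# round() with a single argument returns the nearest int, so 5.8 becomes 6.
rounded = [round(v) for v in values]
print(rounded)  # [1, 6, 9, 1]

# Exact halves go to the nearest even integer.
print(round(2.5), round(3.5))  # 2 4
```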

So a function like this applied to the dataframe will work:

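import re

def convert_fn_strs(fn):
    # Split on "(", "," and ")" and drop empty pieces: "A(1.2,2)" -> ["A", "1.2", "2"]
    val_list = [part for part in re.split('[(,)]', fn) if part]
    fn_name = val_list.pop(0)
    # Round each remaining piece to the nearest integer
    val_list = [round(float(x)) for x in val_list]
    return fn_name + str(tuple(val_list))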

