
This should be simple, but for some reason I can't see where I am going wrong.

I have a sample dataframe:

import numpy as np
import pandas as pd

df = pd.DataFrame({'name':('Nick', 'Nick', 'Nick', 'David'), 'num':(1, 2, 3, 4)})


I want to create a new column called link: if the value in name is 'Nick', link should be some text followed by that row's num value; otherwise it should be an empty string.

This is the code I am currently using:

df['link'] = np.where(df.name == "Nick","https://" + str(df.num), '')

But instead of the first row being:

0, Nick, 1, "https://1"

It is:

0, Nick, 1, "https://0    1\n1    2\n2    3\n3    4\nName: num, dtype: int64"

This means it is using the whole num column rather than the value in that row.

Any idea what I am doing wrong? Also, I need to do this for millions of rows; any suggestions for the most efficient way to do it?

1 Answer


Use df.num.astype(str), not str(df.num):

df['link'] = np.where(df.name=="Nick", "https://" + df.num.astype(str), '')

Output:

    name  num       link
0   Nick    1  https://1
1   Nick    2  https://2
2   Nick    3  https://3
3  David    4           
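On the side question about millions of rows: the np.where version builds the "https://" string for every row and then throws away the non-matching ones. A minimal sketch of one alternative, assuming the same sample df: fill the column with '' first, then convert and concatenate only the rows selected by a boolean mask via .loc. Whether this is actually faster depends on how many rows match, so it is worth timing both versions on the real data rather than assuming either wins.

```python
import pandas as pd

df = pd.DataFrame({'name': ('Nick', 'Nick', 'Nick', 'David'), 'num': (1, 2, 3, 4)})

# Boolean mask of the rows that should get a link
is_nick = df['name'] == 'Nick'

# Every row starts as an empty string (same fallback as the np.where version)
df['link'] = ''

# Build the link text only for the masked rows; .loc aligns the result on the index
df.loc[is_nick, 'link'] = 'https://' + df.loc[is_nick, 'num'].astype(str)

print(df)
```

The non-matching rows are never converted to strings, which is the only work this saves compared with np.where.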

Why?

df.num.astype(str) converts each item to a string:

0    1
1    2
2    3
3    4
Name: num, dtype: object
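A quick check of what astype(str) gives you, using the question's sample df: each element becomes its own string and the Series keeps one entry per row, so adding "https://" to it concatenates row by row.

```python
import pandas as pd

df = pd.DataFrame({'name': ('Nick', 'Nick', 'Nick', 'David'), 'num': (1, 2, 3, 4)})

as_text = df.num.astype(str)   # element-wise conversion, one string per row
print(as_text.tolist())        # ['1', '2', '3', '4']
print(len(as_text))            # 4, same length as df

links = 'https://' + as_text   # str + Series concatenates per element
print(links.tolist())          # ['https://1', 'https://2', 'https://3', 'https://4']
```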

str(df.num) converts the whole Series into a single string (its printed representation). Adding "https://" to it gives one ordinary string, which np.where then broadcasts to every matching row:

'0    1\n1    2\n2    3\n3    4\nName: num, dtype: int64'
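The broadcasting step can be reproduced directly on the sample df: "https://" + str(df.num) is a plain Python str, and np.where treats a scalar argument as if it were repeated for every row, so every 'Nick' row gets the identical long string.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'name': ('Nick', 'Nick', 'Nick', 'David'), 'num': (1, 2, 3, 4)})

whole = 'https://' + str(df.num)   # one str built from the Series' printed form
print(type(whole))                 # <class 'str'>

# np.where broadcasts the scalar `whole` against the boolean mask,
# so rows 0-2 all receive the same string and row 3 gets ''
result = np.where(df.name == 'Nick', whole, '')
print(all(r == whole for r in result[:3]))  # True
print(result[3] == '')                      # True
```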

1 Comment

THANK YOU! Very quick too! I guess my version turned the whole column into one string, whereas yours converts each value in the column to a string. Super!
