I have a dataframe:

ID
239200202
14700993
1153709258720067584

I also have an output indicating whether each ID is a bot or not, as an array: [1,1,0]. How can I combine the two into one dataframe like this:

ID Bot
239200202 bot
14700993 bot
1153709258720067584 Not bot

I tried this code, but it didn't work:

test = pd.read_csv('./user_data/user_lookup/dataset/test_dataframe.csv', index_col=1)
df = pd.DataFrame(columns=['UserID','Bot/Not'])
for index,row in test.iterrows():
   if test[index] == 1:
      df.loc[index,['UserID']] = test['User ID']
      df.loc[index,['Bot/Not']] = 'Bot'
   if test[index] == 0:
      df.loc[index, ['UserID']] = test['User ID']
      df.loc[index, ['Bot/Not']] = 'Not-Bot'
print(df)

It would be great if someone can help me out. Thank you

  • Does test_dataframe.csv contain only the IDs? Commented Jun 3, 2021 at 6:14
  • What is the name of the array? Commented Jun 3, 2021 at 6:14
  • Yes, test_dataframe.csv only has IDs. I dropped the rest of the columns because I don't need them. Commented Jun 3, 2021 at 6:15
  • The array is the output of: pred_logreg_test = logreg.predict(test_scaled). I am predicting whether each ID is a bot or not. Commented Jun 3, 2021 at 6:16
  • You can use pandas assign. For your array/list, you can use a list comprehension: arr = [1,1,0]; arr = ['Bot' if x==1 else 'Not-Bot' for x in arr] Commented Jun 3, 2021 at 6:17
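
The `assign` suggestion in the last comment could look roughly like the sketch below. The frame and the `arr` list are stand-ins built from the values in the question; in the asker's setup `arr` would presumably be `pred_logreg_test`, and the column name `Bot` is just an illustrative choice.

```python
import pandas as pd

# Stand-in data: the IDs from the question and a hypothetical prediction list.
test = pd.DataFrame({"ID": [239200202, 14700993, 1153709258720067584]})
arr = [1, 1, 0]

# Translate 1/0 into labels, then attach them as a new column.
labels = ["Bot" if x == 1 else "Not-Bot" for x in arr]
result = test.assign(Bot=labels)  # assign returns a new DataFrame; test is unchanged
print(result)
```

Because the labels are a plain list, they are attached by position, so the list must have exactly one entry per row.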

4 Answers

1

Based on the details given in the question, you can add a Bot column to the test dataframe as follows:

new_pred = ['bot' if x==1 else 'Not bot' for x in pred_logreg_test]
test['Bot'] = list(new_pred)


1

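Here is one way to solve this. Map the array onto labels by position (df['ID'].isin(array) would instead test whether an ID itself equals 0 or 1):

array = [1,1,0]
df['BOT'] = pd.Series(array, index=df.index).map({1: 'bot', 0: 'Not bot'})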


1

One option here is pd.concat, which joins the two DataFrames column-wise into one.

Also, try to avoid iterrows when working with DataFrames; it is substantially slower than column-wise operations.

Example:

import pandas as pd
import numpy as np

df = pd.DataFrame({'ID': [100, 101, 102]})
bot_not_bot = np.array([1,0,1])
df = pd.concat([df, pd.DataFrame({'bot/not bot': bot_not_bot})], axis=1)

Instead of using iterrows, use apply, which is typically faster on large DataFrames:

df['bot/not bot'] = df['bot/not bot'].apply(lambda x: 'Bot' if x else 'Not Bot')

Prefer column-wise operations like this over iterrows when working with DataFrames.
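
One caveat worth sketching: pd.concat with axis=1 aligns rows on index labels, not on position. The frame below uses a made-up non-default index (an assumption, chosen to mimic what read_csv with index_col can produce) to show the effect, followed by np.where as a vectorized alternative to apply. np.where is swapped in here for illustration and is not part of the answer above.

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose index is not 0..n-1, as can happen after read_csv(index_col=...).
df = pd.DataFrame({"ID": [100, 101, 102]}, index=[10, 11, 12])
bot_not_bot = np.array([1, 0, 1])

# concat(axis=1) aligns on index labels, so the mismatched indexes give 6 rows padded with NaN.
misaligned = pd.concat([df, pd.DataFrame({"bot/not bot": bot_not_bot})], axis=1)

# Assigning the array directly is positional, and np.where converts all values at once.
df["bot/not bot"] = np.where(bot_not_bot == 1, "Bot", "Not Bot")
print(df)
```

If the concat route is preferred, calling reset_index(drop=True) on the frame first is one way to make the two indexes line up.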


0

Use indexing into an array:

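import numpy as np
import pandas as pd

df = pd.DataFrame({'UserID': [239200202, 14700993, 1153709258720067584]})
is_bot = np.array([1,1,0])
# position 0 holds the label for 0, position 1 the label for 1
df['Bot'] = np.array(['not bot', 'bot'])[is_bot]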
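
To make the lookup explicit, here is a small sketch of what the indexing does. The `flags` array is a hypothetical case, not something from the question: if the predictions arrive as booleans, NumPy treats them as a mask rather than as positions, so casting to int first keeps the lookup behavior.

```python
import numpy as np

labels = np.array(["not bot", "bot"])  # position 0 -> "not bot", position 1 -> "bot"

is_bot = np.array([1, 1, 0])
print(labels[is_bot])  # integer indexing: one label per prediction

# Hypothetical boolean predictions: labels[flags] would be treated as a mask
# (and fail on the length mismatch), so cast to int first.
flags = np.array([True, True, False])
from_flags = labels[flags.astype(int)]
```

This approach assumes every prediction is exactly 0 or 1; any other value would index outside the two-element label array.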
