python error in implementing csv file

Question

I am getting this error when I run my feature engineering Python script on the Quora duplicate questions file. The part of the code I am running is below:

data = pd.read_csv('train.csv', sep='\t')
data = data.drop(['id', 'qid1', 'qid2'], axis=1)

The output is:

runfile('/Volumes/Macintosh HD/chrome/is_that_a_duplicate_quora_question-master/feature_engineering.py', wdir='/Volumes/Macintosh HD/chrome/is_that_a_duplicate_quora_question-master')

Traceback (most recent call last):

File "<ipython-input-31-e29a1095cc40>", line 1, in <module>
runfile('/Volumes/Macintosh HD/chrome/is_that_a_duplicate_quora_question-master/feature_engineering.py', wdir='/Volumes/Macintosh HD/chrome/is_that_a_duplicate_quora_question-master')

File "/Users/Yash/anaconda3/lib/python3.6/site-packages/spyder/utils/site/sitecustomize.py", line 705, in runfile
execfile(filename, namespace)

File "/Users/Yash/anaconda3/lib/python3.6/site-packages/spyder/utils/site/sitecustomize.py", line 102, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)

File "/Volumes/Macintosh HD/chrome/is_that_a_duplicate_quora_question-master/feature_engineering.py", line 55, in <module>
data = data.drop(['id','qid1','qid2'], axis=1)

File "/Users/Yash/anaconda3/lib/python3.6/site-packages/pandas/core/generic.py", line 2530, in drop
obj = obj._drop_axis(labels, axis, level=level, errors=errors)

File "/Users/Yash/anaconda3/lib/python3.6/site-packages/pandas/core/generic.py", line 2562, in _drop_axis
new_axis = axis.drop(labels, errors=errors)

File "/Users/Yash/anaconda3/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 3744, in drop
labels[mask])

ValueError: labels ['id' 'qid1' 'qid2'] not contained in axis

My CSV file looks like this:

"id","qid1","qid2","question1","question2","is_duplicate"
"0","1","2","What is the step by step guide to invest in share market in india?","What is the step by step guide to invest in share market?","0"
"1","3","4","What is the story of Kohinoor (Koh-i-Noor) Diamond?","What would happen if the Indian government stole the Kohinoor (Koh-i-Noor) diamond back?","0"

Please help me figure out the problem.

sep='\t' means use tab as the separator, but it looks like your data is comma-separated. sep=',' might work?
– sjw, Commented Apr 20, 2018 at 8:26
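A quick way to check the point made in this comment is to look at the columns pandas actually parsed. The sketch below is not from the original post: it uses a shortened, made-up row in the same shape as the CSV in the question and reads it from an in-memory string instead of train.csv. With sep='\t' and no tabs in the data, each line becomes a single field, so there is no separate 'id' column to drop.

```python
import io
import pandas as pd

# Assumption: a shortened stand-in for train.csv with the same header as in the question.
csv_text = (
    '"id","qid1","qid2","question1","question2","is_duplicate"\n'
    '"0","1","2","first question?","second question?","0"\n'
)

# Same call as in the question: tab separator on comma-separated data.
wrong = pd.read_csv(io.StringIO(csv_text), sep='\t')

# The whole header line ends up as one column name.
print(wrong.shape)
print(list(wrong.columns))

# Dropping by the expected names therefore fails.
# Older pandas raises ValueError (as in the traceback), newer versions raise KeyError.
try:
    wrong.drop(['id', 'qid1', 'qid2'], axis=1)
    drop_failed = False
except (KeyError, ValueError) as exc:
    drop_failed = True
    print("drop failed:", exc)
```

Printing data.columns right after read_csv is usually the fastest way to spot a wrong separator.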

Rachit kapadia · Accepted Answer · 2018-04-20 08:41:02Z

You need to remove the sep='\t' argument, because the content of the CSV file already uses a comma as the separator, which is the default for read_csv:

# sample.csv contains the following data

"id","qid1","qid2","question1","question2","is_duplicate"
"0","1","2","What is the step by step guide to invest in share market in india?","What is the step by step guide to invest in share market?","0"
"1","3","4","What is the story of Kohinoor (Koh-i-Noor) Diamond?","What would happen if the Indian government stole the Kohinoor (Koh-i-Noor) diamond back?","0"

import pandas as pd

df = pd.read_csv('sample.csv')  # the default separator is ','
data = df.drop(['id', 'qid1', 'qid2'], axis=1)
print(data)  # print is a function in Python 3

# the remaining data, shown here in CSV form:
"question1","question2","is_duplicate"
"What is the step by step guide to invest in share market in india?","What is the step by step guide to invest in share market?","0"
"What is the story of Kohinoor (Koh-i-Noor) Diamond?","What would happen if the Indian government stole the Kohinoor (Koh-i-Noor) diamond back?","0"
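For anyone who wants to try the fix without the original file, here is a minimal self-contained sketch. It assumes made-up, shortened rows with the same header as the question's CSV and reads them from an in-memory string rather than from sample.csv. With the default comma separator, the three id columns are parsed as separate columns and can be dropped.

```python
import io
import pandas as pd

# Assumption: shortened stand-in rows; only the header matches the question's file.
csv_text = (
    '"id","qid1","qid2","question1","question2","is_duplicate"\n'
    '"0","1","2","first question?","second question?","0"\n'
    '"1","3","4","third question?","fourth question?","0"\n'
)

# No sep argument: read_csv splits on commas by default.
data = pd.read_csv(io.StringIO(csv_text))

# All six header names are now real columns, so the drop succeeds.
data = data.drop(['id', 'qid1', 'qid2'], axis=1)

print(list(data.columns))
print(data.shape)
```

With the real file, the only change needed is replacing io.StringIO(csv_text) with the path to the CSV.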
