I have a Pandas DataFrame called "data" with 2 columns and 50 rows filled with one or two lines of text each, imported from a .tsv file. Some of the questions may contain integers and floats, besides strings. I am trying to extract the first word of every sentence (in both columns), but consistently get this error: AttributeError: 'DataFrame' object has no attribute 'str'.

At first, I thought the error was due to my wrong use of "data.str.split", but every variation I could find by Googling failed. Then I thought the file might not consist entirely of strings, so I tried "data.astype(str)" on it, but the same error remained. Any suggestions? Thanks a lot!

Here is my code:

import pandas as pd
questions = "questions.tsv"
data = pd.read_csv(questions, usecols = [3], nrows = 50, header=1, sep="\t")
data = data.astype(str)
first_words = data.str.split(None, 1)[0]
  • Yes, both work! Thanks so much! Just to learn, any idea why my approach failed? Commented Sep 15, 2017 at 4:46
  • It doesn't work because you can't call the .str accessor on a DataFrame directly. Commented Sep 15, 2017 at 4:53
  • Thanks, very grateful. Commented Sep 15, 2017 at 4:57

2 Answers

Use:

first_words = data.apply(lambda x: x.str.split().str[0])

Or:

first_words = data.applymap(lambda x: x.split()[0])

Sample:

data = pd.DataFrame({'a':['aa ss ss','ee rre', 1, 'r'],
                   'b':[4,'rrt ee', 'ee www ee', 6]})
print (data)
          a          b
0  aa ss ss          4
1    ee rre     rrt ee
2         1  ee www ee
3         r          6

data = data.astype(str)
first_words = data.apply(lambda x: x.str.split().str[0])
print (first_words)
    a    b
0  aa    4
1  ee  rrt
2   1   ee
3   r    6

first_words = data.applymap(lambda x: x.split()[0])
print (first_words)
    a    b
0  aa    4
1  ee  rrt
2   1   ee
3   r    6
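
Since the question mentions that some cells may hold integers or floats, here is a minimal sketch of the element-wise route that converts each value to a string itself, so it does not depend on calling astype(str) first. The helper name first_word is made up for illustration. As far as I know, recent pandas releases (2.1 and later) deprecate applymap in favour of DataFrame.map, so the sketch picks whichever one is available; treat that version detail as an assumption to check against your install.

```python
import pandas as pd

data = pd.DataFrame({'a': ['aa ss ss', 'ee rre', 1, 'r'],
                     'b': [4, 'rrt ee', 'ee www ee', 6]})


def first_word(value):
    """Return the first whitespace-separated token of a single cell.

    Hypothetical helper: str() is applied first so that ints and floats
    do not fail on .split(); an empty cell yields an empty string.
    """
    parts = str(value).split(None, 1)  # split at most once
    return parts[0] if parts else ''


# Use DataFrame.map where it exists, otherwise fall back to applymap.
if hasattr(pd.DataFrame, 'map'):
    first_words = data.map(first_word)
else:
    first_words = data.applymap(first_word)

print(first_words)
```

Note that a missing value (NaN) would come out as the string 'nan' with this helper, so filter or fill those first if that matters for your data.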
3 Comments

I could not understand what you said well, but from what I understood, it seemed like you were upset
Sorry, I didn't see x.str.split().str[0] in your answer.
Fantastic. So happy to see you thinking positively.

The problem is that you attempted to use the pd.Series.str string accessor on a pd.DataFrame. Unfortunately, it is a pd.Series-only attribute, which means you need to use it in a pd.Series context. You can accomplish this in several ways.

Setup
Assume your dataframe looked like this

              Col1               Col2
0   this is a test        hello world
1  this is another          pandas123
2            test3       tommy trojan
3         etcetera  one more sentence
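
The answer only shows the printed frame, so here is one way it could be built in order to run the options that follow. The values are copied from the table above; the construction itself is my assumption, not something the answer states.

```python
import pandas as pd

# Rebuild the example frame from the printed table (assumed construction).
df = pd.DataFrame({
    'Col1': ['this is a test', 'this is another', 'test3', 'etcetera'],
    'Col2': ['hello world', 'pandas123', 'tommy trojan', 'one more sentence'],
})

print(df)
```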

Option 1
Use stack to convert a 2-dimensional dataframe into a series... then use the string accessor

#  Make a
#  Series
#  /----\    
df.stack().str.split(n=1).str[0].unstack()
#                                 \_____/
#                                 Turn it
#                                   Back

       Col1       Col2
0      this      hello
1      this  pandas123
2     test3      tommy
3  etcetera        one

Option 2
Or you can use pd.DataFrame.apply to use the pd.Series.str accessor on each column separately.
This is covered in @jezrael's answer.

df.apply(lambda x: x.str.split(n=1).str[0])

       Col1       Col2
0      this      hello
1      this  pandas123
2     test3      tommy
3  etcetera        one

Option 3
Use a comprehension

pd.DataFrame({c: df[c].str.split(n=1).str[0] for c in df})

       Col1       Col2
0      this      hello
1      this  pandas123
2     test3      tommy
3  etcetera        one

You'll notice that in all options, we used the str accessor on a pd.Series object and not on a pd.DataFrame object.
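
As a quick sanity check of that point, the sketch below confirms that the frame itself has no str attribute while each of its columns does, and that the three options produce the same values. It rebuilds the example frame from the Setup table; the variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'Col1': ['this is a test', 'this is another', 'test3', 'etcetera'],
    'Col2': ['hello world', 'pandas123', 'tommy trojan', 'one more sentence'],
})

# The accessor lives on Series, not on DataFrame
# (this frame has no column that happens to be named "str").
frame_has_str = hasattr(df, 'str')
columns_have_str = all(hasattr(df[c], 'str') for c in df)

# The three options from the answer, each working column by column.
opt1 = df.stack().str.split(n=1).str[0].unstack()
opt2 = df.apply(lambda x: x.str.split(n=1).str[0])
opt3 = pd.DataFrame({c: df[c].str.split(n=1).str[0] for c in df})

# Compare plain values so index/dtype details do not get in the way.
all_agree = (opt1.values.tolist()
             == opt2.values.tolist()
             == opt3.values.tolist())

print(frame_has_str, columns_have_str, all_agree)
```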

3 Comments

Awesome! I think split(n=1) might improve efficiency a bit, because splitting stops after the first word (everything after is unnecessary). This was covered in my (now deleted) answer.
Added. Thanks for tip.
This is great, thanks. I am a starter, so I am grateful for this steep learning curve!
