I was trying to find out whether there's a way to iterate over a pandas DataFrame row by row and write the contents of each row to a new text file. My DataFrame consists of a single column called Reviews.

Review classification

I'm looking to do some sentiment analysis on movie reviews, and I need each review to be in a separate text file. Can somebody help me here?

  • That's going to be very inefficient, what's the purpose for that? Commented Nov 9, 2015 at 23:23
  • Just to perform classification. That's how my requirement is specified. Commented Nov 9, 2015 at 23:25
  • Make a filename variable that changes every time you write a new row, then open that filename in 'w' mode. Commented Nov 9, 2015 at 23:32
  • Can you please suggest a format for writing data from a DataFrame to a text file? @RNar I've been wondering about that for quite a while. Does to_csv work for this? Commented Nov 9, 2015 at 23:34
  • I wouldn't suggest it, no. Because you want to write a new file for each row, iterate through the rows and do something like f = open(filename, 'w') followed by f.write(row). Just make sure to change filename each time. Commented Nov 9, 2015 at 23:43

4 Answers

I've written something like this and it works. Anyway, thanks for your input, guys.

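i = 0  # file counter; it must be initialized before the loop
for index, row in p.iterrows():
    # Write the first (and only) column of this row to 0.txt, 1.txt, ...
    with open(str(i) + '.txt', 'w') as f:
        f.write(str(row.iloc[0]))
    i += 1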

where p is a dataframe.

1 Comment

For anyone else getting a Unicode error: pass encoding='utf-8' to the open() call, i.e. open(str(i)+'.txt', 'w', encoding='utf-8').
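Putting the answer and the encoding tip together, here's a rough sketch of how it could look as a reusable function. The function name write_reviews, the out_dir argument, and the default column name "Reviews" (taken from the question) are my own assumptions, not something from the original answer.

```python
from pathlib import Path

import pandas as pd


def write_reviews(df, out_dir, column="Reviews"):
    """Write each value of `column` to its own UTF-8 text file in `out_dir`.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)  # create the folder if missing
    written = []
    for i, text in enumerate(df[column]):
        target = out_path / f"{i}.txt"  # 0.txt, 1.txt, ...
        # An explicit encoding avoids Unicode errors on non-ASCII reviews
        target.write_text(str(text), encoding="utf-8")
        written.append(target)
    return written
```

Iterating over df[column] directly skips iterrows(), which is usually the slower option when only one column is needed.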

It's still inefficient, but since it's required here's one possible solution.

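import os
import pandas as pd
from io import StringIO

data="""
column1 column2
c1 c2
c3 c4
c5 c6
"""

# A raw string avoids an invalid-escape warning for the regex separator
df = pd.read_csv(StringIO(data), sep=r'\s+')

os.makedirs('testdir', exist_ok=True)  # the output directory must exist
for i, row in enumerate(df.values):
    filename = 'testdir/review{}.csv'.format(i)
    # Write this row's values as a single comma-separated line
    row.tofile(filename, sep=",", format="%s")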

This takes each row's values as a NumPy array and writes them to CSV files named review0.csv, review1.csv, and so on. Another option is to call DataFrame.to_csv inside the loop on a one-row slice of the frame.
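For the to_csv route mentioned above, something along these lines should work. It's only a sketch: the function name rows_to_csv_files and the prefix parameter are made up for illustration, and I'm assuming you don't want the header or index in each file.

```python
from pathlib import Path

import pandas as pd


def rows_to_csv_files(df, out_dir, prefix="review"):
    """Write every row of `df` to its own CSV file (hypothetical helper)."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    for i in range(len(df)):
        # Double brackets keep row i as a one-row DataFrame, so to_csv applies
        one_row = df.iloc[[i]]
        # header=False and index=False leave only the row's values in the file
        one_row.to_csv(out_path / f"{prefix}{i}.csv", header=False, index=False)
```

Drop header=False if each file should carry the column names as its first line.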

Here's another way to do it. This creates a destination folder if it doesn't exist.

import pandas as pd
from pathlib import Path

root_location = Path("/my/root/path")
root_location.mkdir(parents=True, exist_ok=True)  # create the folder if needed
df = pd.read_csv(my_csv)  # for example; my_csv is a placeholder for your CSV path

for index, row in df.iterrows():
    # One file per row, named after the row's file_name value
    with open(root_location / (str(row["file_name"]) + ".txt"), "w") as f:
        f.write(str(row["file_contents"]))

This is a simpler but possibly costly solution:

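# Note: to_csv also writes the header and index unless header=False, index=False are passed
for i in range(len(data_to_txt)):
    # iloc[[i]] keeps row i as a one-row DataFrame
    data_to_txt.iloc[[i]].to_csv(str(i)+".txt")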
