Python & Pandas: Writing data to specific columns in csv

Question

While using Python and Pandas, I'm running a script that analyzes txt files for word count and lexile scores. I can successfully run the script and write to csv. However, my output contains unexpected values, and I'm having difficulty writing the data to specific columns.

Here is the code:

import pandas as pd
import textstat
import csv

header = ["word_count", "flech"]

with open('data.csv', 'w', encoding='UTF8') as f:
    writer = csv.writer(f)

    writer.writerow(header)
    
for text_number in range(0, 2):

    f = open(f'\TXTs\text_{text_number}.txt', 'r')

    if f.mode == 'r':
        contents = f.read()
        
    text_data = (contents)

    word_count = textstat.lexicon_count(text_data, removepunct=True)
    flech = textstat.flesch_kincaid_grade(text_data)
   
    wc = pd.DataFrame([word_count])
    fl = pd.DataFrame([flech])
    
    def wc_count():
        wc.to_csv('output.csv', mode="a", header="word_count", index=False)
        
    def fl_count():
        fl.to_csv('output.csv', mode="a", header="flech", index=False)

    wc_count()
    fl_count()

I'd like the output to look like this, with the 2 & 271 values in the "word_count" column, and the -3.1 and 13 in the "flech" column:

word_count, flech
2, -3.1
271, 13

However, the output produced looks like this:

word_count, flech
    
0   
2   
0   
-3.1    
0   
271 
0   
13

Clearly, I've got some problems with my output. Any assistance would be greatly appreciated.

Muhammad Rasel · Accepted Answer · 2021-07-25 18:38:12Z

1

Instead of creating two DataFrames, try creating a single one and writing it to the csv.

flech = textstat.flesch_kincaid_grade(text_data) # change after this line
output_df = pd.DataFrame({"word_count": [word_count], "flech": [flech]})
output_df.to_csv('output.csv', mode="a", index=False)


2 Comments

Daniel Hutchinson Over a year ago

Many thanks! This works - however, it also appends the "word_count" and "flech" header for every txt file read, resulting in multiple header rows above the data values. Any ideas on how to eliminate this?

Muhammad Rasel Over a year ago

Try adding header=False after index=False.
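One caveat with passing header=False on every append: output.csv then never gets a header row at all, because the script in the question writes its header to data.csv, which is a different file. A minimal sketch of one way around this, assuming it is acceptable to check whether the file already exists; `append_row` is a hypothetical helper name, and the demo writes to a temporary directory rather than the real output path:

```python
import os
import tempfile

import pandas as pd


def append_row(path, word_count, flech):
    """Append one row to the CSV; write the header only if the file is new or empty."""
    row = pd.DataFrame({"word_count": [word_count], "flech": [flech]})
    write_header = not os.path.exists(path) or os.path.getsize(path) == 0
    row.to_csv(path, mode="a", index=False, header=write_header)


# Demo in a temporary directory, using the values quoted in the question.
out_path = os.path.join(tempfile.mkdtemp(), "output.csv")
append_row(out_path, 2, -3.1)
append_row(out_path, 271, 13)

with open(out_path) as fh:
    result = fh.read()
print(result)
```

The header check keeps the append-per-file structure of the original loop, at the cost of reopening the CSV once per text file.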

mozway · Accepted Answer · 2021-07-25 18:40:03Z

1

It looks like you're going to great lengths for something that is quite straightforward. Just use pandas' I/O functions to read and write your data: pandas.read_csv and pandas.DataFrame.to_csv

It is hard to give you the exact code without the data, but try something like:

with open(rf'\TXTs\text_{text_number}.txt', 'r') as f:  # raw string, so '\t' is not read as a tab
    text_data = f.read()

word_count = textstat.lexicon_count(text_data, removepunct=True)
flech = textstat.flesch_kincaid_grade(text_data)

df = pd.DataFrame({'word_count': word_count, 'flech': flech})

df.to_csv('output.csv', index=False)


5 Comments

Daniel Hutchinson Over a year ago

Many thanks for this - I'm learning Python/Pandas, and clearly have a lot to learn. Your code resulted in the following error: "ValueError: If using all scalar values, you must pass an index."

mozway Over a year ago

Can you provide the content of word_count and flech? Are those lists? Please read the documentation of pandas.DataFrame to see how to construct a DataFrame. The first example there is what I expected you to do (using a dictionary with column names as keys and lists as values).

Daniel Hutchinson Over a year ago

word_count & flech are both single numbers produced by the textstat module's analysis of the txt files. Since they are written via a loop, I think they would be in a list in this case.

mozway Over a year ago

OK, then you have to wrap each value in square brackets, as indicated by @Muhammad Rasel. However, you should NOT write your file line by line. Please give an example of the data, explain what you want to do and give the expected output. It is certainly possible to do what you want by writing the complete output file only once.
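For reference, the error and the square-bracket fix discussed in this thread can be reproduced in isolation. This is a small sketch using the first pair of values from the question; the `index=[0]` variant is an alternative that the error message itself points to, not something proposed in the thread:

```python
import pandas as pd

word_count, flech = 2, -3.1  # first pair of values quoted in the question

# A dict of bare scalars gives pandas no way to infer the number of rows.
try:
    pd.DataFrame({"word_count": word_count, "flech": flech})
    raised = False
except ValueError as err:
    raised = True
    print(err)

# Wrapping each scalar in a list yields a one-row DataFrame.
df = pd.DataFrame({"word_count": [word_count], "flech": [flech]})

# Alternative: keep the scalars and pass an explicit index.
df_alt = pd.DataFrame({"word_count": word_count, "flech": flech}, index=[0])

print(df)
```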

Daniel Hutchinson Over a year ago

The data is as follows: text_0.txt containing the text "Test text", and text_1.txt containing the text "Four score and seven years ago....[continues]". The loop runs these txt files through the textstat module. The module's functions work as follows: the word count function counts words (minus punctuation), providing values of 2 and 271. The Flech function calculates the lexile scores of the txt files, -3.1 & 13. I'd like the output of the textstat functions to be written to a csv under the columns indicated above. A way to do this once instead of line by line would be excellent.
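A possible shape for the write-once approach requested here: collect one dict per text file in a list, build a single DataFrame after the loop, and call to_csv once. This is only a sketch under stated assumptions. `count_words` and `grade` are stand-ins so the example runs without textstat installed (in the real script they would be textstat.lexicon_count(text, removepunct=True) and textstat.flesch_kincaid_grade(text)), `collect_scores` is a hypothetical helper name, and the sample files are short placeholders written to a temporary directory, not the actual texts:

```python
import tempfile
from pathlib import Path

import pandas as pd


def count_words(text):
    # Stand-in for textstat.lexicon_count(text, removepunct=True).
    return len(text.split())


def grade(text):
    # Placeholder for textstat.flesch_kincaid_grade(text).
    # Returns the average word length, NOT a readability score.
    words = text.split()
    return round(sum(len(w) for w in words) / len(words), 1)


def collect_scores(paths):
    """Read every file, score it, and return one DataFrame with a row per file."""
    rows = []
    for path in paths:
        text = Path(path).read_text(encoding="utf-8")
        rows.append({"word_count": count_words(text), "flech": grade(text)})
    return pd.DataFrame(rows, columns=["word_count", "flech"])


# Create two placeholder input files in a temporary directory.
workdir = Path(tempfile.mkdtemp())
samples = ["Test text", "Four score and seven years ago"]
paths = []
for i, sample in enumerate(samples):
    p = workdir / f"text_{i}.txt"
    p.write_text(sample, encoding="utf-8")
    paths.append(p)

df = collect_scores(paths)
df.to_csv(workdir / "output.csv", index=False)  # header and all rows written once
print(df)
```

Because the file is written a single time, the header appears exactly once and no append mode or header bookkeeping is needed.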
