Using Python and pandas, I'm running a script that analyzes txt files for word count and readability (Flesch-Kincaid grade) scores. The script runs and writes to a CSV file, but the output contains unexpected values, and I can't get the data into the correct columns.

Here is the code:

import pandas as pd
import textstat
import csv

header = ["word_count", "flech"]

with open('data.csv', 'w', encoding='UTF8') as f:
    writer = csv.writer(f)

    writer.writerow(header)
    
for text_number in range(0, 2):

    f = open(f'\TXTs\text_{text_number}.txt', 'r')

    if f.mode == 'r':
        contents = f.read()
        
    text_data = (contents)

    word_count = textstat.lexicon_count(text_data, removepunct=True)
    flech = textstat.flesch_kincaid_grade(text_data)
   
    wc = pd.DataFrame([word_count])
    fl = pd.DataFrame([flech])
    
    def wc_count():
        wc.to_csv('output.csv', mode="a", header="word_count", index=False)
        
    def fl_count():
        fl.to_csv('output.csv', mode="a", header="flech", index=False)

    wc_count()
    fl_count()

I'd like the output to look like this, with the 2 & 271 values in the "word_count" column, and the -3.1 and 13 in the "flech" column:

word_count, flech
2, -3.1
271, 13

However, the output produced looks like this:

word_count, flech
    
0   
2   
0   
-3.1    
0   
271 
0   
13  

Clearly, I've got some problems with my output. Any assistance would be greatly appreciated.

2 Answers

1

Instead of creating two DataFrames, create a single one and write it to the CSV file.

flech = textstat.flesch_kincaid_grade(text_data)  # change the code after this line
# One DataFrame with both values in a single row, one column per metric
output_df = pd.DataFrame({"word_count": [word_count], "flech": [flech]})
output_df.to_csv('output.csv', mode="a", index=False)

2 Comments

Many thanks! This works. However, it also writes the "word_count" and "flech" header for every txt file read, resulting in repeated header rows above the data values. Any ideas on how to eliminate this?
Try adding header=False after index=False.
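
Passing header=False on every write drops the header row altogether. One way to keep a single header is to write it only when the file does not exist yet. This is a minimal sketch, assuming the column names from the question; append_row is a hypothetical helper name, not part of pandas:

```python
import os
import pandas as pd


def append_row(path, word_count, flech):
    """Append one row to a CSV file, writing the header only for a new file."""
    # append_row is a hypothetical helper used for illustration only.
    row = pd.DataFrame({"word_count": [word_count], "flech": [flech]})
    write_header = not os.path.exists(path)  # header only on the first write
    row.to_csv(path, mode="a", index=False, header=write_header)
```

Because the file is opened in append mode, rows from an earlier run stay in it, so the output file would need to be deleted before each run to start fresh.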
1

It looks like you're going to great lengths for something that seems quite straightforward. Just use pandas' I/O functions to read and write your data: pandas.read_csv and pandas.DataFrame.to_csv.

It is hard to give you the exact code without the data, but try something like:

with open(f'\TXTs\text_{text_number}.txt', 'r') as f:
    text_data = f.read()

word_count = textstat.lexicon_count(text_data, removepunct=True)
flech = textstat.flesch_kincaid_grade(text_data)

df = pd.DataFrame({'word_count': word_count, 'flech': flech})

df.to_csv('output.csv', index=False)

5 Comments

Many thanks for this - I'm learning Python/Pandas, and clearly have a lot to learn. Your code resulted in the following error: "ValueError: If using all scalar values, you must pass an index."
Can you provide the content of word_count and flech? Are those lists? Please read the documentation of pandas.DataFrame to see how to construct a DataFrame. The first example there is what I had in mind (a dictionary with column names as keys and lists as values).
word_count and flech are both single numbers produced by the textstat module's analysis of the txt files. Since they are written via a loop, I think they would be in a list in this case.
OK, then you have to wrap each value in square brackets, as indicated by @Muhammad Rasel. However, you should NOT write your file line by line. Please give an example of the data, explain what you want to do, and give the expected output. It is certainly possible to do what you want by writing the complete output file only once.
The data is as follows: text_0.txt contains the text "Test text", and text_1.txt contains the text "Four score and seven years ago....[continues]". The loop runs these txt files through the textstat module. The module's functions work as follows: the word count function counts words (ignoring punctuation), giving values of 2 and 271. The Flesch-Kincaid function calculates the grade-level scores of the texts, -3.1 and 13. I'd like the output of the textstat functions to be written to a CSV file under the columns indicated above. A way to do this once instead of line by line would be excellent.
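
The "write the file only once" approach discussed in these comments could look roughly like the sketch below. It rests on some assumptions: the two metric functions are passed in as parameters (in the asker's script they would be textstat.lexicon_count and textstat.flesch_kincaid_grade), and build_table is a hypothetical helper name.

```python
import pandas as pd


def build_table(texts, count_words, grade_level):
    """Return one DataFrame with a row of metrics per text."""
    # build_table is a hypothetical helper; count_words and grade_level
    # stand in for the textstat functions used in the question.
    rows = []
    for text in texts:
        rows.append({
            "word_count": count_words(text),
            "flech": grade_level(text),
        })
    # Collect all rows first, then build the DataFrame in one step.
    return pd.DataFrame(rows, columns=["word_count", "flech"])
```

With the file contents read into a list called texts, a single call to build_table(...).to_csv('output.csv', index=False) then writes the header once, followed by one row per file.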
