
The following script properly identifies ASCII and non-ASCII lines, but I want a report per file, not per line. Because the print calls are inside the loop and I have many files, I get far too much output. How can I modify this code to get a single output per file? It should tell me whether there was any non-ASCII text in the file.

import os

for file in os.listdir('.'):
    if file.endswith('.txt'):

        with open(file) as f:
            content = f.readlines()

            for entry in content:
                try:
                    entry.encode('ascii')
                except UnicodeEncodeError:
                    print("it was not a ascii-encoded unicode string")
                    print(file)
                else:
                    print("It may have been an ascii-encoded unicode string")
                    print(file)

3 Comments

  • Remove the print statements you have, and put a print statement outside the with open(file) ... context manager but inside the for file in ... block. Commented Dec 29, 2016 at 18:53
  • If you think about the structure of your script, I think you will be able to determine the solution. Just think about storing the information you want to print while the script is evaluating each entry in content, and printing that information when the inner for loop is complete. Commented Dec 29, 2016 at 18:54
  • That depends on which output you want, and under what conditions. Your program is clearly written to evaluate every line of every file, so you'll have to unambiguously tell us what you do want. Commented Dec 29, 2016 at 18:54

1 Answer


For instance, if you want to show whether there was any non-ASCII line in the file, keep a flag that records whether you've found a bad line, but wait until the end of the file to report.

import os

for file in os.listdir('.'):
    if file.endswith('.txt'):

        with open(file) as f:
            content = f.readlines()
            # Assume the file is clean until a non-ASCII line turns up.
            good_file = True

            for entry in content:
                try:
                    entry.encode('ascii')
                except UnicodeEncodeError:
                    good_file = False
                    break  # one bad line settles it; skip the rest

        # Report once per file, after its lines have been checked.
        if good_file:
            print("It may have been an ASCII-encoded unicode string")
        else:
            print("It was not an ASCII-encoded unicode string")

        print(file)
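
The same flag idea can also be sketched as a small helper that returns one answer per file. This is only an illustration, not part of the original answer: the names is_ascii_file and report are made up here, and the helper reads raw bytes instead of decoded text, on the assumption that "non-ASCII" means any byte above 127. That assumption holds for ASCII-compatible encodings such as UTF-8 or Latin-1, and it avoids a decoding error while reading.

```python
import os


def is_ascii_file(path):
    """Return True if every byte in the file at `path` is ASCII (<= 127).

    Hypothetical helper: it checks raw bytes, not decoded text.
    """
    # Binary mode, so undecodable bytes cannot raise while reading.
    with open(path, 'rb') as f:
        # any() stops at the first non-ASCII byte it sees.
        return not any(byte > 127 for line in f for byte in line)


def report(directory='.'):
    """Print one status line per .txt file in `directory`."""
    for name in sorted(os.listdir(directory)):
        if name.endswith('.txt'):
            path = os.path.join(directory, name)
            status = 'ASCII only' if is_ascii_file(path) else 'contains non-ASCII'
            print(f'{name}: {status}')
```

Calling report() in the directory that holds the files prints a single line for each .txt file. Because any() short-circuits, a file is only read up to its first non-ASCII byte.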

4 Comments

Thank you so much, this did the trick and I just learned something :)
Excellent! An important part of programming is to determine when you have enough information to make a decision -- in this case, you don't know what you want to print until after you read the entire file.
Please remember to appropriately edit the question, and accept an answer to let SO archive this properly.
I did accept the answer. I truly appreciate the help.
