7

I have Python code that reads many files, but some of the files are extremely large, which causes errors in my other code. I want a way to check the character count of each file so that I can avoid reading the extremely large ones. Thanks.

5 Answers

9
os.stat(filepath).st_size

Assuming by ‘characters’ you mean bytes. ETA:

I need the total character count, just like what the command 'wc filename' gives me on Unix

In which mode? wc on its own will give you a line, word and byte count (the byte count being the same as stat's), not a count of Unicode characters.

There is a switch -m which will use the locale's current encoding to convert bytes to Unicode and then count code-points: is that really what you want? It doesn't make any sense to decode into Unicode if all you are looking for is too-long files. If you really must:

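import sys, codecs

def getUnicodeFileLength(filepath, charset=None):
    # Default to the encoding Python derives from the environment
    if charset is None:
        charset = sys.getfilesystemencoding()
    # Look up the StreamReader class for this encoding
    readerclass = codecs.getreader(charset)
    # 'replace' substitutes U+FFFD for undecodable bytes instead of raising
    reader = readerclass(open(filepath, 'rb'), 'replace')
    nchar = 0
    while True:
        chars = reader.read(1024*32)  # arbitrary chunk size
        if not chars:  # an empty string means end of file
            break
        nchar += len(chars)
    reader.close()
    return nchar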

sys.getfilesystemencoding() normally follows the locale encoding on Unix, which approximates what wc -m does. If you know the encoding yourself (e.g. 'utf-8'), pass that in instead.

I don't think you want to do this.
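
If a character count really is required, the same chunked approach can be sketched more briefly in Python 3, where open() does the decoding itself. The function name count_chars and the chunk size are illustrative choices, and locale.getpreferredencoding(False) is assumed here as a stand-in for the locale encoding that wc -m would use.

```python
import locale

def count_chars(filepath, encoding=None, chunk_size=32 * 1024):
    """Count decoded characters (code points) in a text file, chunk by chunk."""
    if encoding is None:
        # Assumption: the locale's preferred encoding approximates what `wc -m` uses.
        encoding = locale.getpreferredencoding(False)
    total = 0
    # errors='replace' substitutes U+FFFD for undecodable bytes instead of raising.
    # newline='' disables newline translation, so '\r\n' still counts as two characters.
    with open(filepath, 'r', encoding=encoding, errors='replace', newline='') as f:
        while True:
            chunk = f.read(chunk_size)  # reads up to chunk_size characters
            if not chunk:
                break
            total += len(chunk)
    return total
```

This still reads the whole file, so it is no cheaper than reading it normally; it only avoids holding the entire contents in memory at once.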

3 Comments

Hi bob, I need the total character count, just like what the command 'wc filename' gives me on Unix
@randeepsp: Update your question with additional information. Do not add this kind of important information in comments.
For counting bytes, os.path.getsize(filepath) is easier for me to remember than os.stat(filepath).st_size (thanks @Sapph)
7

If you want the Unicode character count for a text file in a specific encoding, you will have to read the entire file to get it.

However, if you want the byte count for a given file, you want os.path.getsize(), which should only need to do a stat on the file as long as your OS has stat() or an equivalent call (all Unixes and Windows do).
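
For the original goal of skipping very large files, a minimal sketch along these lines may be enough. The names MAX_BYTES, small_enough and read_small_files, and the 10 MiB limit, are hypothetical and chosen only for illustration.

```python
import os

MAX_BYTES = 10 * 1024 * 1024  # hypothetical limit: 10 MiB

def small_enough(filepath, max_bytes=MAX_BYTES):
    """Return True if the file's size in bytes does not exceed max_bytes."""
    return os.path.getsize(filepath) <= max_bytes

def read_small_files(filepaths, max_bytes=MAX_BYTES):
    """Yield (path, contents) for files within the limit, skipping larger ones."""
    for path in filepaths:
        if not small_enough(path, max_bytes):
            continue  # too large: skip it without opening it
        with open(path, 'rb') as f:
            yield path, f.read()
```

Because the check only stats the file, oversized files are never opened or read.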

1 Comment

With variable-width encodings such as UTF-8, different characters can take up different numbers of bytes, so the byte count and the character count need not match.
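
A small illustration of that difference, assuming UTF-8 as the encoding (the sample string is arbitrary):

```python
text = 'na\u00efve caf\u00e9'       # "naïve café": two non-ASCII characters
encoded = text.encode('utf-8')

char_count = len(text)      # counts code points
byte_count = len(encoded)   # counts bytes; each accented letter takes two bytes in UTF-8

print(char_count, byte_count)
```

The byte count is never smaller than the character count, so a byte-size limit is a safe upper bound when the aim is only to reject huge files.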
5

Try

import os
os.path.getsize(filePath)

to get the size of your file, in bytes.

4
os.path.getsize(path) 

Return the size, in bytes, of path. Raise os.error if the file does not exist or is inaccessible.
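
Since the call raises when the file is missing or unreadable, a caller scanning many files may want to catch that. A minimal sketch, with size_or_none as a hypothetical helper name:

```python
import os

def size_or_none(path):
    """Return the size of path in bytes, or None if it is missing or inaccessible."""
    try:
        return os.path.getsize(path)
    except OSError:  # os.error is an alias of OSError
        return None
```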

2

An alternative way, using an already open file:

import os

with open("file") as f:
    size = os.fstat(f.fileno()).st_size  # size in bytes
