How to check the character count of a file in Python

Question

I have Python code that reads many files, but some of the files are extremely large, which causes errors in other code. I want a way to check the character count of each file so that I can avoid reading the extremely large ones. Thanks.

bobince · Accepted Answer · 2010-01-06 14:48:23Z

9

os.stat(filepath).st_size

Assuming by ‘characters’ you mean bytes. ETA:

i need total character count just like what the command 'wc filename' gives me unix

In which mode? wc on its own will give you a line, word and byte count (same as stat), not Unicode characters.

There is a switch -m which will use the locale's current encoding to convert bytes to Unicode and then count code-points: is that really what you want? It doesn't make any sense to decode into Unicode if all you are looking for is too-long files. If you really must:

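import sys, codecs

def getUnicodeFileLength(filepath, charset=None):
    # Default to the encoding this answer treats as the locale encoding.
    if charset is None:
        charset = sys.getfilesystemencoding()
    readerclass = codecs.getreader(charset)  # the function name is all lowercase
    # 'replace' substitutes undecodable bytes instead of raising an error.
    reader = readerclass(open(filepath, 'rb'), 'replace')
    nchar = 0
    try:
        while True:
            chars = reader.read(1024*32)  # arbitrary chunk size
            if not chars:  # an empty result means end of file
                break
            nchar += len(chars)
    finally:
        reader.close()  # close the underlying file even if decoding fails
    return nchar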

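On most Unix systems sys.getfilesystemencoding() matches the locale encoding, which approximates what wc -m does (locale.getpreferredencoding() asks for the locale encoding directly). If you know the encoding yourself (e.g. 'utf-8') then pass that in instead.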
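On Python 3, a similar count can be sketched with the built-in open(), which decodes as it reads. This is only an illustration and not part of the original answer: the function name count_characters and its parameters are made up here, and the default encoding is assumed to come from locale.getpreferredencoding().

```python
import locale


def count_characters(filepath, encoding=None, chunk_size=32 * 1024):
    """Count decoded characters (code points) in a text file, chunk by chunk."""
    if encoding is None:
        # Assumption: fall back to the locale's preferred encoding.
        encoding = locale.getpreferredencoding(False)
    total = 0
    # newline="" turns off newline translation, so "\r\n" counts as two characters.
    # errors="replace" substitutes undecodable bytes instead of raising.
    with open(filepath, "r", encoding=encoding, errors="replace", newline="") as f:
        while True:
            chunk = f.read(chunk_size)  # in text mode this is a character count
            if not chunk:
                break
            total += len(chunk)
    return total
```

The whole file still has to be read and decoded, so this is far slower than a stat call and does not help if the goal is only to avoid touching large files.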

I don't think you want to do this.

edited Jan 6, 2010 at 14:48

answered Jan 6, 2010 at 5:03

bobince


3 Comments

randeepsp Over a year ago

hi bob , i need total character count just like what the command 'wc filename' gives me unix

S.Lott Over a year ago

@randeepsp: Update your question with additional information. Do not add this kind of important information in comments.

hobs Over a year ago

For counting bytes, os.path.getsize(filepath) is easier for me to remember than os.stat(filepath).st_size (thanks @Sapph)

Mike · Accepted Answer · 2010-01-06 05:14:40Z

7

If you want the unicode character count for a text file given a specific encoding, you will have to read in the entire file to do that.

However, if you want the byte count for a given file, you want os.path.getsize(), which should only need to do a stat on the file as long as your OS has stat() or an equivalent call (all Unixes and Windows do).

edited Jan 6, 2010 at 5:14

answered Jan 6, 2010 at 5:05

Mike


1 Comment

S.Lott Over a year ago

Because of UTF coding schemes, it's possible that you'll have characters with a varying number of bytes.
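To make the point concrete, here is a small sketch (the sample string is arbitrary) showing that the same text has a different length in characters and in UTF-8 bytes:

```python
# Escapes are used so the exact code points are unambiguous:
# "naïve café ☕" with precomposed ï (U+00EF), é (U+00E9) and U+2615.
text = "na\u00efve caf\u00e9 \u2615"
encoded = text.encode("utf-8")

print(len(text))     # 12 code points
print(len(encoded))  # 16 bytes: ï and é take 2 bytes each, U+2615 takes 3
```

A byte count from stat is therefore only an upper bound on the number of UTF-8 characters, which is usually enough for rejecting oversized files.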

Sapph · Accepted Answer · 2010-01-06 05:05:04Z

5

Try

import os
os.path.getsize(filePath)

to get the size of your file, in bytes.
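For the original goal of skipping extremely large files, the size check can be placed in front of the read. The following is a minimal sketch, not a definitive implementation: the names MAX_BYTES, small_enough and read_small_files are hypothetical, and the 10 MiB limit is an arbitrary placeholder.

```python
import os

MAX_BYTES = 10 * 1024 * 1024  # hypothetical limit (10 MiB); tune for your data


def small_enough(path, max_bytes=MAX_BYTES):
    """Return True if the file at path is no larger than max_bytes."""
    return os.path.getsize(path) <= max_bytes


def read_small_files(paths, max_bytes=MAX_BYTES):
    """Yield (path, contents) for files within the limit, skipping larger ones."""
    for path in paths:
        if not small_enough(path, max_bytes):
            continue  # too large: skip it without opening the file
        with open(path, "rb") as f:
            yield path, f.read()
```

Because os.path.getsize only queries file metadata, oversized files are rejected without reading any of their contents.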

answered Jan 6, 2010 at 5:05

Sapph


YOU · Accepted Answer · 2010-01-06 05:03:18Z

4

os.path.getsize(path)

Return the size, in bytes, of path. Raises OSError (os.error is an alias for it in Python 3) if the file does not exist or is inaccessible.
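Since the call raises for missing or inaccessible files, a batch job may want to catch that error rather than crash. A minimal sketch, assuming Python 3 (where os.error is the same class as OSError); the helper name size_or_none is hypothetical:

```python
import os


def size_or_none(path):
    """Return the file size in bytes, or None if the file cannot be stat'ed."""
    try:
        return os.path.getsize(path)
    except OSError:
        # Covers a missing file, a permission problem, and similar failures.
        return None
```

Returning None keeps "could not check" distinct from a genuinely empty file, whose size is 0.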

answered Jan 6, 2010 at 5:03

YOU


ghostdog74 · Accepted Answer · 2010-01-06 06:22:30Z

2

An alternative way, using a file that is already open:

import os

with open("file") as f:
    size = os.fstat(f.fileno()).st_size  # size in bytes

edited Jan 6, 2010 at 6:22

answered Jan 6, 2010 at 5:33

ghostdog74
