
I can't see the tqdm progress bar when I use this code to iterate over my open file:

with open(file_path, 'r') as f:
    for i, line in enumerate(tqdm(f)):
        print("line #: %s" % i)
        for j in tqdm(range(line_size)):
            ...

What's the right way to use tqdm here?


5 Answers


Avoid printing inside the loop when using tqdm. Also, apply tqdm only to the outer for-loop, not to the inner one.

from tqdm import tqdm
with open(file_path, 'r') as f:
    for i, line in enumerate(tqdm(f)):
        for j in range(line_size):
            ...

Some notes on using enumerate with tqdm are available here.
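If you do need per-line output, tqdm.write prints above the bar instead of breaking it. A minimal sketch (demo.txt is a stand-in file created just for the example):

```python
from tqdm import tqdm

# create a small stand-in file for the demo
with open('demo.txt', 'w') as f:
    f.write('alpha\nbeta\ngamma\n')

with open('demo.txt') as f:
    for i, line in enumerate(tqdm(f)):
        tqdm.write("line #: %s" % i)  # printed above the bar, not through it
```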



tqdm is not displaying a progress bar because it does not know the number of lines in the file.

In order to display a progress bar, you will first need to scan the file and count the number of lines, then pass it to tqdm as the total.

from tqdm import tqdm

with open('myfile.txt', 'r') as f:
    num_lines = sum(1 for line in f)

with open('myfile.txt', 'r') as f:
    for line in tqdm(f, total=num_lines):
        print(line)

Reminder: A for loop over the file object f will iterate over lines, reading until the next newline character is encountered.
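If the counting pass itself is slow on a huge file, counting newline bytes in fixed-size binary blocks is usually much faster than iterating decoded lines. A sketch (count_lines is a hypothetical helper; note it misses a final line that has no trailing newline):

```python
def count_lines(path, block_size=1 << 20):
    # count b'\n' occurrences in 1 MiB binary blocks instead of
    # decoding every line just to count them
    count = 0
    with open(path, 'rb') as f:
        while block := f.read(block_size):
            count += block.count(b'\n')
    return count
```

The result can then be passed as the total, e.g. tqdm(f, total=count_lines('myfile.txt')).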



I'm trying to do the same thing on a file containing all Wikipedia articles, so I don't want to count the total lines before starting to process. Also, it's a bz2-compressed file, so the len of the decompressed line overestimates the number of bytes read in that iteration, so...

import bz2
from pathlib import Path
from tqdm import tqdm

with tqdm(total=Path(filepath).stat().st_size) as pbar:
    with bz2.open(filepath) as fin:
        for i, line in enumerate(fin):
            if not i % 1000:
                pbar.update(fin.tell() - pbar.n)
            # do something with the decompressed line
    # Debug-by-print to see the attributes of `pbar`: 
    # print(vars(pbar))

Thank you Yohan Kuanke for your deleted answer. If moderators undelete it you can crib mine.

Comments:

This gives the right output but I found that calling fin.tell() / pbar.update() for every line of the file dramatically slowed down the iteration speed. Using an if i % 100 == 0: condition to update the pbar less frequently gave me a 10x speedup.
Excellent idea @BenPage! I'll add your optimization to the answer
You can't use this technique if you use the csv module to read your file (for example, with csv_lines=csv.reader(fin)). You get the error OSError: telling position disabled by next() call when you call fin.tell()
@Eponymous Yea. The code is designed to work on file pointers, not any arbitrary iterable. You have to apply the enumerate() wrapper and the code in this for loop around the file stream object rather than any other object (such as a csv_reader)... even if it's derived from a file stream. It may not pass through all the methods of a file stream object (such as .tell). You would need to create a generator using this code and put that generator inside the csv_reader parens e.g. csv_reader((... for i, line in enumerate(fin))) .
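Following up on @Eponymous's comment: the tell()-based updates can be wrapped in a generator and handed to csv.reader, so the csv module never touches the file object directly. A sketch under the same assumptions as the answer (lines_with_progress is a hypothetical helper; fin.tell() on a BZ2File reports the decompressed offset, so the bar is only approximate against the compressed size):

```python
import bz2
import csv
from pathlib import Path
from tqdm import tqdm

def lines_with_progress(filepath, every=1000):
    # yield decoded lines, updating the bar from the file object
    # directly so csv.reader never needs to call fin.tell() itself
    with tqdm(total=Path(filepath).stat().st_size) as pbar:
        with bz2.open(filepath, 'rb') as fin:
            for i, raw in enumerate(fin):
                if not i % every:
                    pbar.update(fin.tell() - pbar.n)
                yield raw.decode('utf-8')

# usage: rows = csv.reader(lines_with_progress('articles.csv.bz2'))
```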

If you are reading from a very large file, try this approach:

from tqdm import tqdm
import os
import sys

file_size = os.path.getsize(filename)
lines_read = []
pbar = tqdm(total=file_size, unit="MB")
with open(filename, 'r', encoding='UTF-8') as file:
    while (line := file.readline()):
        lines_read.append(line)
        pbar.update(sys.getsizeof(line) - sys.getsizeof('\n'))
pbar.close()

I left out the processing you might want to do before the append(line).

EDIT:

I changed len(line) to sys.getsizeof(line) - sys.getsizeof('\n'), as len(line) is not an accurate representation of how many bytes were actually read (see other posts about this). But even this is not 100% accurate, as sys.getsizeof(line) is not the real number of bytes read; it's a "close enough" hack if the file is very large.

I did try using f.tell() instead and subtracting a file pos delta in the while loop but f.tell with non-binary files is very slow in Python 3.8.10.

As per the link below, I also tried using f.tell() with Python 3.10 but that is still very slow.

If anyone has a better strategy, please feel free to edit this answer, but please provide some performance numbers before you do the edit. Remember that counting the number of lines before the loop is not acceptable for very large files and defeats the purpose of showing a progress bar altogether (try a 30 GB file with 300 million lines, for example).

Why f.tell() is slow in Python when reading a file in non-binary mode: https://bugs.python.org/issue11114
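One alternative that avoids both tell() and getsizeof(): read in binary mode, where len(raw) is the exact number of bytes consumed per line, so the running total matches os.path.getsize exactly. A sketch (read_lines_with_progress is a hypothetical helper, not from the answer above):

```python
import os
from tqdm import tqdm

def read_lines_with_progress(filename, encoding='utf-8'):
    # in binary mode, len(raw) is exactly the bytes consumed, so the
    # bar total matches os.path.getsize with no tell() calls at all
    with tqdm(total=os.path.getsize(filename), unit='B', unit_scale=True) as pbar:
        with open(filename, 'rb') as f:
            for raw in f:
                pbar.update(len(raw))
                yield raw.decode(encoding)
```

Usage: for line in read_lines_with_progress(filename): ... — decoding happens after the byte count, so multi-byte characters don't skew the bar.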

Comments:

Thanks a lot, I was confused about how to use tqdm with a big file that doesn't fit in memory.

In the case of reading a file with readlines(), the following can be used:

from tqdm import tqdm
with open(filename) as f:
    sentences = tqdm(f.readlines(),unit='MB')

Note that unit='MB' is only a display label here: tqdm is counting the lines returned by readlines(), not bytes, and readlines() reads the entire file into memory up front.
