
I can't see the tqdm progress bar when I use this code to iterate over my open file:

with open(file_path, 'r') as f:
    for i, line in enumerate(tqdm(f)):
        print("line #: %s" % i)
        for j in tqdm(range(line_size)):
            ...

What's the right way to use tqdm here?


5 Answers


Avoid printing inside the loop when using tqdm. Also, apply tqdm only to the outer for-loop, not to the inner one.

from tqdm import tqdm
with open(file_path, 'r') as f:
    for i, line in enumerate(tqdm(f)):
        for j in range(line_size):
            ...

Some notes on using enumerate with tqdm are available here.
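If you do need per-line output, tqdm.write prints above the bar instead of breaking it. A minimal sketch (demo.txt is a stand-in file created just for the example):

```python
from tqdm import tqdm

# create a small stand-in file for the demo
with open('demo.txt', 'w') as f:
    f.write('alpha\nbeta\ngamma\n')

with open('demo.txt') as f:
    for i, line in enumerate(tqdm(f)):
        tqdm.write("line #: %s" % i)  # printed above the bar, not through it
```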



tqdm is not displaying a progress bar because it does not know the number of lines in the file.

In order to display a progress bar, you will first need to scan the file and count the number of lines, then pass it to tqdm as the total.

from tqdm import tqdm

with open('myfile.txt', 'r') as f:
    num_lines = sum(1 for line in f)

with open('myfile.txt', 'r') as f:
    for line in tqdm(f, total=num_lines):
        print(line)

Reminder: A for loop over the file object f will iterate over lines, reading until the next newline character is encountered.
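If the counting pass itself is slow on a huge file, counting newline bytes in fixed-size binary blocks is usually much faster than iterating decoded lines. A sketch (count_lines is a hypothetical helper; note it misses a final line that has no trailing newline):

```python
def count_lines(path, block_size=1 << 20):
    # count b'\n' occurrences in 1 MiB binary blocks instead of
    # decoding every line just to count them
    count = 0
    with open(path, 'rb') as f:
        while block := f.read(block_size):
            count += block.count(b'\n')
    return count
```

The result can then be passed as the total, e.g. tqdm(f, total=count_lines('myfile.txt')).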



I'm trying to do the same thing on a file containing all Wikipedia articles, so I don't want to count the total lines before starting to process. Also, it's a bz2-compressed file, so the len of the decompressed line overestimates the number of bytes read in that iteration, so...

import bz2
from pathlib import Path
from tqdm import tqdm

with tqdm(total=Path(filepath).stat().st_size) as pbar:
    with bz2.open(filepath) as fin:
        for i, line in enumerate(fin):
            if not i % 1000:
                pbar.update(fin.tell() - pbar.n)
            # do something with the decompressed line
    # Debug-by-print to see the attributes of `pbar`: 
    # print(vars(pbar))

Thank you Yohan Kuanke for your deleted answer. If moderators undelete it you can crib mine.

Comments:

This gives the right output but I found that calling fin.tell() / pbar.update() for every line of the file dramatically slowed down the iteration speed. Using an if i % 100 == 0: condition to update the pbar less frequently gave me a 10x speedup.
Excellent idea @BenPage! I'll add your optimization to the answer
You can't use this technique if you use the csv module to read your file (for example, with csv_lines=csv.reader(fin)). You get the error OSError: telling position disabled by next() call when you call fin.tell()
@Eponymous Yea. The code is designed to work on file pointers, not any arbitrary iterable. You have to apply the enumerate() wrapper and the code in this for loop around the file stream object rather than any other object (such as a csv_reader)... even if it's derived from a file stream. It may not pass through all the methods of a file stream object (such as .tell). You would need to create a generator using this code and put that generator inside the csv_reader parens e.g. csv_reader((... for i, line in enumerate(fin))) .
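Following up on @Eponymous's comment: the tell()-based updates can be wrapped in a generator and handed to csv.reader, so the csv module never touches the file object directly. A sketch under the same assumptions as the answer (lines_with_progress is a hypothetical helper; fin.tell() on a BZ2File reports the decompressed offset, so the bar is only approximate against the compressed size):

```python
import bz2
import csv
from pathlib import Path
from tqdm import tqdm

def lines_with_progress(filepath, every=1000):
    # yield decoded lines, updating the bar from the file object
    # directly so csv.reader never needs to call fin.tell() itself
    with tqdm(total=Path(filepath).stat().st_size) as pbar:
        with bz2.open(filepath, 'rb') as fin:
            for i, raw in enumerate(fin):
                if not i % every:
                    pbar.update(fin.tell() - pbar.n)
                yield raw.decode('utf-8')

# usage: rows = csv.reader(lines_with_progress('articles.csv.bz2'))
```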

If you are reading from a very large file, try this approach:

from tqdm import tqdm
import os
import sys

file_size = os.path.getsize(filename)
lines_read = []
pbar = tqdm(total=file_size, unit="MB")
with open(filename, 'r', encoding='UTF-8') as file:
    while (line := file.readline()):
        lines_read.append(line)
        pbar.update(sys.getsizeof(line) - sys.getsizeof('\n'))
pbar.close()

I left out the processing you might want to do before the append(line).

EDIT:

I changed len(line) to sys.getsizeof(line) - sys.getsizeof('\n'), as len(line) is not an accurate representation of how many bytes were actually read (see other posts about this). But even this is not 100% accurate, as sys.getsizeof(line) is not the real number of bytes read; it's a "close enough" hack if the file is very large.

I did try using f.tell() instead and subtracting a file pos delta in the while loop but f.tell with non-binary files is very slow in Python 3.8.10.

As per the link below, I also tried using f.tell() with Python 3.10 but that is still very slow.

If anyone has a better strategy, please feel free to edit this answer, but please provide some performance numbers before you do the edit. Remember that counting the number of lines before the loop is not acceptable for very large files and defeats the purpose of showing a progress bar altogether (try a 30 GB file with 300 million lines, for example).

Why f.tell() is slow in Python when reading a file in non-binary mode: https://bugs.python.org/issue11114
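One alternative that avoids both tell() and getsizeof(): read in binary mode, where len(raw) is the exact number of bytes consumed per line, so the running total matches os.path.getsize exactly. A sketch (read_lines_with_progress is a hypothetical helper, not from the answer above):

```python
import os
from tqdm import tqdm

def read_lines_with_progress(filename, encoding='utf-8'):
    # in binary mode, len(raw) is exactly the bytes consumed, so the
    # bar total matches os.path.getsize with no tell() calls at all
    with tqdm(total=os.path.getsize(filename), unit='B', unit_scale=True) as pbar:
        with open(filename, 'rb') as f:
            for raw in f:
                pbar.update(len(raw))
                yield raw.decode(encoding)
```

Usage: for line in read_lines_with_progress(filename): ... — decoding happens after the byte count, so multi-byte characters don't skew the bar.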

Comments:

Thanks a lot, I was confused about how to use tqdm with a big file that doesn't fit in memory.

In the case of reading a file with readlines(), the following can be used:

from tqdm import tqdm
with open(filename) as f:
    sentences = tqdm(f.readlines(),unit='MB')

Note that unit='MB' is only a display label here: tqdm is counting the lines returned by readlines(), not bytes, and readlines() reads the entire file into memory up front.
