
I have a Python script that loads data from 10,000 to 12,000 files and performs some operations on them. Sometimes this process takes hours. In such cases, I would like to see how much progress the script has made.

Say I'm loading 10,000 files in a for loop. I don't want to do something like:

if n % 100 == 0:
    print("%d steps completed!" % n)

as this will evaluate the if condition thousands of times unnecessarily. I know the total cost of this if statement is still small compared to the hours the script takes to run, but I was curious whether Python has a more efficient feature for keeping track of progress.
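A minimal sketch for putting a number on that cost, using the standard-library timeit module. The function name per_check_seconds and the sample value n = 12345 are illustrative assumptions, and the result will differ between machines and Python versions.

```python
import timeit


def per_check_seconds(iterations=1_000_000):
    """Return the average time, in seconds, of one `n % 100 == 0` evaluation."""
    # `n` is set once in the setup string so only the comparison is timed.
    total = timeit.timeit("n % 100 == 0", setup="n = 12345", number=iterations)
    return total / iterations


if __name__ == "__main__":
    cost = per_check_seconds()
    # Scale the per-check cost up to a 10,000-file run for comparison.
    print(f"one check: {cost * 1e9:.1f} ns, 10,000 checks: {cost * 10_000 * 1e6:.1f} us")
```

Comparing the printed figure with the time a single file load takes shows how much of the run the check accounts for.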

  • Any such feature would involve an evaluation of the progress, and so would be no better than your simple print() every 100 files. You have correctly recognized that the if statement will cost next to nothing compared to all the IO you're doing, so you're trying to solve a non-issue. Commented Apr 11, 2022 at 15:49
  • If you have concerns about efficiency or performance, the best way is to measure it yourself. You can load 100 files and compare performance with and without the if. I believe a simple condition will have no discernible impact on performance. Commented Apr 11, 2022 at 15:55
  • The condition takes about 40 ns on my machine, and any basic operation like n+1 already takes 30 ns. This is close to the minimum time the CPython interpreter takes per instruction. If you want faster code, then you definitely need to use something other than Python (and especially CPython). Loading a file should take far more than 1000 ns whatever the target system (it requires 3 expensive syscalls). Commented Apr 11, 2022 at 15:57
  • I think another useful thing to consider is a way to keep track of which files have been processed and which haven't. This way, if your script ends up failing, you always know what still needs to be processed. For this, consider the sqlite3 module. Commented Apr 11, 2022 at 15:58
  • @smac89 I'm pretty sure that any call to sqlite3 would take longer than the print (or logging). And it would be executed at every step, like the "if" statement. Commented Apr 11, 2022 at 16:01
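The sqlite3 suggestion in the comments above could look roughly like the following sketch. The table name processed and the helper names open_tracker, mark_processed, is_processed and run are hypothetical, and handle stands in for whatever the script does with each file.

```python
import sqlite3


def open_tracker(db_path=":memory:"):
    """Open (or create) a small database that records finished files."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS processed (path TEXT PRIMARY KEY)")
    return conn


def is_processed(conn, path):
    """Return True if this path was recorded by an earlier run."""
    row = conn.execute("SELECT 1 FROM processed WHERE path = ?", (path,)).fetchone()
    return row is not None


def mark_processed(conn, path):
    """Record a finished path; committing makes it survive a crash."""
    conn.execute("INSERT OR IGNORE INTO processed (path) VALUES (?)", (path,))
    conn.commit()


def run(conn, paths, handle):
    """Call handle(path) for every path not yet recorded; return how many ran."""
    done = 0
    for path in paths:
        if is_processed(conn, path):
            continue  # skip work finished in a previous run
        handle(path)
        mark_processed(conn, path)
        done += 1
    return done
```

Passing a real file name instead of ":memory:" lets a restarted script skip what already finished. As the reply notes, the lookup and commit per file cost more than a print, so this buys resumability rather than speed.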

1 Answer


Try using the tqdm library, like this:

from tqdm import tqdm

for i in tqdm(range(<your_cycle_range>)):
    <your operations>

instead of

for n in range(<your_cycle_range>):
    ....
    ....
    if n % 100 == 0:
        print("%d steps completed!" % n)

PS: this assumes that you have a loop inside your Python script.
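For the file-loading case in the question, a fuller sketch might wrap the list of paths directly so the bar shows the total and an estimated time remaining. The function process_files and its byte counting are hypothetical stand-ins for the real work, and the fallback to a plain loop when tqdm is not installed is an added assumption, not part of the library.

```python
from pathlib import Path

try:
    from tqdm import tqdm  # third-party: pip install tqdm
except ImportError:
    tqdm = None  # assumption: run without a progress bar if tqdm is missing


def process_files(paths, show_progress=True):
    """Read every file and return the total number of bytes read."""
    iterable = paths
    if show_progress and tqdm is not None:
        # mininterval=1.0 redraws the bar at most about once per second.
        iterable = tqdm(paths, total=len(paths), unit="file", mininterval=1.0)
    total_bytes = 0
    for path in iterable:
        total_bytes += len(Path(path).read_bytes())
    return total_bytes
```

tqdm still does a small amount of bookkeeping on every iteration, so this is about readable feedback rather than lower overhead than the if.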


6 Comments

How is this more efficient than a simple if?
@PranavHosangadi Nothing can be more efficient than a plain if. But if the author is asking about other ways to implement this, tqdm just makes it prettier.
@PranavHosangadi - The simple if of the example doesn't give a hint of how far there is to go. tqdm does that in several different formats. tqdm is a very common way to give feedback and it makes a lot of sense to make it an answer here.
This example assumes that you already have a loop. Read the PS. There's no need for any further statements. @PranavHosangadi
@tdelaney sure this answers "how do I display progress?", but in the context of OP's question where they are worried an if statement is too expensive, I don't think something that displays a whole progress bar is a good solution. The answer should mention this, otherwise it gives the impression that tqdm will add less overhead than an if.