I have a simple line that searches for all .json files, except there are about 28k of them, so this one line takes about a minute to complete:

from pathlib import Path

files = list(Path("~/foo/bar/").expanduser().rglob("*.json"))

Is there a way to create a simple counter that shows the user how many files rglob() has found so far? Or even a harder solution with an estimated progress bar, since I know the total is around 28k (though this number will slowly grow with future updates)?

I prefer rich over tqdm, but any solution is better than nothing. Thanks.

2 Answers

If you are okay with a rustic look, this can be done without any external package in the following way:

from pathlib import Path

def show_count(iterable, start=1, text="items found"):
    for inx, item in enumerate(iterable, start):
        print("\r{} {}".format(inx, text), end="", flush=True)
        yield item

files = list(show_count(Path(".").rglob("*.json")))
print("\nFile search completed")

Explanation: I use the enumerate builtin to get the indices of items, starting at start (default 1). I tell print not to end with a newline, so the cursor stays on the same line, and I use a carriage return (\r) to go back to the start of that line, which results in overwriting the said line. I also use flush=True to make the text appear immediately.
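Redrawing the counter for every single file can add some overhead when there are tens of thousands of items. As a minimal sketch of the same idea, the variant below redraws only every few items. The name show_count_throttled and the every parameter are hypothetical and not part of the answer above.

```python
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def show_count_throttled(
    iterable: Iterable[T], every: int = 100, text: str = "items found"
) -> Iterator[T]:
    """Yield items unchanged, redrawing the counter once per `every` items."""
    count = 0  # stays 0 if the iterable is empty
    for count, item in enumerate(iterable, 1):
        if count % every == 0:
            # \r returns to the start of the line so the old count is overwritten
            print(f"\r{count} {text}", end="", flush=True)
        yield item
    # Final redraw so the last number shown is the true total, then a newline
    print(f"\r{count} {text}", flush=True)
```

It is used the same way as show_count, for example files = list(show_count_throttled(Path(".").rglob("*.json"), every=500)). The final redraw only runs if the generator is consumed to the end, which list() does.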

You could use Progress from rich, something like this:

from rich.progress import Progress
from pathlib import Path

PATH = Path("~/foo/bar").expanduser()
ESTIMATED_TOTAL = 25_000


with Progress(transient=True) as progress:
    task = progress.add_task("Finding json files", total=ESTIMATED_TOTAL)
    files: list[Path] = []
    for file in PATH.rglob("*.json"):
        progress.advance(task)
        files.append(file)
    print("Found", len(files), "files")

The key difference here is that you're building your list of JSON files one by one rather than calling list() on the rglob() generator, which gives you a place to advance the progress bar after each file is found.

1 Comment

Hmm... For some reason, the progress ends pretty quickly, in about 1 second. I ended up scrapping the estimated_total idea and just using add_task("foo", total=None) and progress.update(task, advance=1). I also added a TextColumn("{task.completed} json found.") inside Progress(), which showed the running count. Thanks for pointing me in the right direction though!
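The comment's variant is not shown in full, so here is a rough sketch of what it might look like, assuming a rich version that accepts total=None for a task with no known total. The find_json_files helper name and the exact column layout are assumptions, not the commenter's actual code.

```python
from pathlib import Path

from rich.progress import Progress, TextColumn


def find_json_files(root: Path) -> list[Path]:
    """Collect *.json files under root while showing a running count."""
    files: list[Path] = []
    with Progress(
        TextColumn("{task.description}"),
        TextColumn("{task.completed} json found."),
        transient=True,  # clear the display once the search finishes
    ) as progress:
        # total=None: no estimate, so only the running count is shown
        task = progress.add_task("Finding json files", total=None)
        for file in root.rglob("*.json"):
            files.append(file)
            progress.update(task, advance=1)
    return files
```

Calling find_json_files(Path("~/foo/bar").expanduser()) would then return the same list as before. Since no total is given, there is no estimate to keep in sync as the number of files grows.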
