Unix-based system. I'm trying to incur as little overhead as possible in the code I'm working on (it runs in a resource-constrained environment). In this particular code, we gather some basic disk usage stats. One suggestion was to replace the call to df with statfs, since df is a C utility that requires its own subprocess to run, whereas statfs is a system call and presumably has less overhead (and is what df calls anyway).
We're calling df with Python's subprocess.check_output() command:
import subprocess

DF_CMD = ["df", "-P", "-k"]

def get_disk_usage() -> str:
    try:
        output = subprocess.check_output(DF_CMD, text=True)
    except subprocess.CalledProcessError as e:
        raise RuntimeError(f"Failed to execute {DF_CMD}: {e}") from e
    return output
I want to hard-code our mount points (which we decided we're okay with) and replace the call to df in the above code with a call to statfs <mountpoint>. However, I'm unsure whether calling it from the same Python function will actually reduce overhead. I plan to use a profiler to check, but I'm curious whether anyone knows enough about the inner workings of Python/Unix to say what's going on under the hood.
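For reference, here is a minimal sketch of what that replacement might look like using Python's os.statvfs (the standard-library wrapper over the statvfs system call). The mount point list is a hypothetical placeholder, and the returned field names are my own choice, not anything df produces:

```python
import os

# Hypothetical hard-coded mount points; substitute your own.
MOUNT_POINTS = ["/"]

def get_disk_usage() -> dict:
    """Gather per-mount usage via the statvfs system call (no subprocess)."""
    usage = {}
    for mount in MOUNT_POINTS:
        st = os.statvfs(mount)
        block = st.f_frsize  # fundamental filesystem block size in bytes
        usage[mount] = {
            "total_kb": st.f_blocks * block // 1024,
            "free_kb": st.f_bfree * block // 1024,
            # f_bavail is the space available to unprivileged users,
            # which is what df reports in its "Available" column.
            "avail_kb": st.f_bavail * block // 1024,
        }
    return usage
```

Because this stays inside the existing process, it avoids the fork/exec of a child, the loading of the df binary, and the text parsing, at the cost of producing raw numbers rather than df's formatted table.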
And to be clear: by "overhead" I mean CPU and memory usage on the OS/machine.
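As a quick sanity check before reaching for a full profiler, a rough per-call timing comparison can be sketched like this. It only measures wall-clock time, not CPU or memory (for those you would use resource.getrusage or an external profiler), but it already exposes the subprocess cost:

```python
import os
import subprocess
import timeit

def df_subprocess() -> str:
    # One fork/exec of the df binary per call.
    return subprocess.check_output(["df", "-P", "-k"], text=True)

def statvfs_call():
    # One system call within the current process.
    return os.statvfs("/")

n = 50
df_t = timeit.timeit(df_subprocess, number=n) / n
sv_t = timeit.timeit(statvfs_call, number=n) / n
print(f"df via subprocess: {df_t * 1e6:.0f} us/call")
print(f"os.statvfs:        {sv_t * 1e6:.0f} us/call")
```

On a typical system the subprocess route costs on the order of milliseconds per call, while the system call is on the order of microseconds, though the exact numbers depend on the machine.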
df, or if you'll have to reconstruct the text output provided by df from the raw data provided by os.statvfs. You are still going to have to profile your actual replacement.)
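If the text output does need to be reconstructed, a line resembling df -P -k can be assembled from the statvfs fields. This is only a sketch: the filesystem-name column is stubbed out (statvfs does not report it), and column widths differ from real df output:

```python
import os

def df_line(mount: str) -> str:
    """Format one mount point roughly like a `df -P -k` row (a sketch)."""
    st = os.statvfs(mount)
    block = st.f_frsize
    total = st.f_blocks * block // 1024
    avail = st.f_bavail * block // 1024
    used = (st.f_blocks - st.f_bfree) * block // 1024
    # df computes capacity from used / (used + available to users).
    pct = 100 * used // (used + avail) if used + avail else 0
    # "-" stands in for the filesystem name, which statvfs cannot provide.
    return f"- {total} {used} {avail} {pct}% {mount}"

print("Filesystem 1024-blocks Used Available Capacity Mounted on")
print(df_line("/"))
```

Whether this reconstruction is worth it depends on the consumer: if downstream code only needs the numbers, returning them directly is simpler and cheaper than re-serializing to df's format.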