7

I'd like to create a git pre-commit hook for my project that runs autopep8 on files modified by the potential commit. I only want to run it on Python files, not the other C++ files, text files, etc. How can I programmatically detect whether a file is a Python file? Not all of the Python files in the repository have the .py extension, so I cannot rely upon that.

4
  • Your files should have imports at the top. As far as I know, C++ uses #include (or #import), so you could check whether a file starts with import lines or with #include/#import lines, which could tell you whether it is a Python file or a C++ file. Commented Jul 30, 2020 at 23:06
  • 6
    If the Python files don't have a .py extension, do they at least have a line like #!/usr/bin/python at the top? Commented Jul 30, 2020 at 23:13
  • 3
    You could try parsing the file using Python's ast module. If it parses, then it could be a Python file; if it fails, then it's either a Python file with a syntax error or it's not Python code at all. Commented Aug 3, 2020 at 2:42
  • Just to be clear, autopep8 does run on non-Python files. I tried it on a Bash script and it totally messed it up. Commented Feb 3, 2023 at 3:05
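
The ast suggestion in the comments above can be sketched roughly as follows. This is only an illustration, not a reliable detector: the helper name parses_as_python is made up here, and the check only tells you whether the text is syntactically valid for the interpreter running the hook.

```python
import ast


def parses_as_python(source: str) -> bool:
    """Return True if source is syntactically valid Python for this interpreter.

    Hypothetical helper name; it only wraps the standard ast.parse call.
    """
    try:
        ast.parse(source)
    except (SyntaxError, ValueError):
        # ValueError covers sources with null bytes on some Python versions.
        return False
    return True
```

Keep in mind that an empty file, or a shell script that contains nothing but comment lines, also parses as valid Python, so this is best combined with another signal such as the extension or the shebang.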

2 Answers

1

You can't.

At least not in the general case and with perfect accuracy. Your best bet is to make sure all the Python files in the repo have the .py extension, or are distinguished from other files in a small, fixed number of simple ways.

Your next best bet is the file command, which guesses a file's type from its contents.
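
A minimal sketch of calling file from Python, assuming it is installed and on PATH. The function names and the set of MIME types are assumptions made for this example; the exact string that file reports for a Python script varies between versions, so check what your system prints first.

```python
import shutil
import subprocess

# Assumed values: different versions of file report different MIME types
# for Python scripts. Adjust this set to match your system's output.
PYTHON_MIME_TYPES = {"text/x-python", "text/x-script.python"}


def mime_is_python(mime: str) -> bool:
    """Return True if a MIME string is one of the assumed Python types."""
    return mime.strip().lower() in PYTHON_MIME_TYPES


def file_says_python(path: str) -> bool:
    """Ask the file command for the MIME type of path and classify it."""
    if shutil.which("file") is None:
        return False  # file is not installed; treat as "not detected"
    result = subprocess.run(
        ["file", "--brief", "--mime-type", "--", path],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.returncode == 0 and mime_is_python(result.stdout)
```

file relies on heuristics such as the shebang line and common keywords, so it can still misclassify short scripts.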


1 Comment

Regarding "make sure all your python files in the repo do have .py extension": it might be possible to use symlinks for that, i.e. keep the original script with the .py extension in a folder somewhere, then set up a relative symlink to it without the extension.
0

I am surprised not to see a solid answer to this. I'm leaning toward:

  1. If it ends in ".py", it's a Python file
  2. If it has a "#! /usr/bin/env python[3]" line, it's a Python file

I know that leaves out things like scripts that hard-code the interpreter, such as:

#! /some/virtual/env/bin/python3

I'm tempted to check for #! followed by the word python anywhere.

If you want to do the same, a first cut (with some debug print statements) can look like:

import os
import re


def is_readable_py_file(filename: str) -> bool:
    """Determine if filename is a python file and return bool."""
    if not os.path.isfile(filename):
        return False

    if os.path.splitext(filename)[1] == ".py":
        return True

    # Allow #!-specified files without ".py" extension
    try:
        with open(filename) as infile:
            first_line = infile.readline()
            if re.match(r"\s*#!\s*/usr/bin/env\s\s*python", first_line):
                return True
    except Exception as exc:
        print(f"Caught exception: {exc}")
        print(f"Assuming not a Python file: '{filename}'")

    return False

I expect that no approach is ideal for everybody and I think this is quite crude, but if you just want to copy/paste to get started, have at it!

Oh, the alternative check I'm considering would be (it matches everything the /usr/bin/env one matches, so you can substitute it):

            if re.match(r"\s*#!.*python", first_line):  # python anywhere in shebang
                return True

4 Comments

To be valid, a shebang can't be preceded by whitespace.
You can simplify \s\s* to \s+
ipython could show up in a shebang. To avoid matching it, you could replace python in the regexes with \bpython (\b is a word boundary). There are probably other commands that could show up too, but that's the only one I can think of at the moment.
I'm afraid this is not enough. The .py extension is not required and, as the asker said, some of their files don't have it. The shebang at the top of the file is also optional, so you can't rely on that either. Instead, we need to read the file and look for a characteristically Pythonic construct that is common to all or most Python files and identifies them reliably. For example, the absence of curly braces combined with a statement that ends in a colon, like if condition:, may indicate Python. But I don't know whether this syntax exists in other languages, or whether there are better ways to identify Python files.
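
Putting the regex suggestions from the comments above together (no leading whitespace before #!, and \bpython so that ipython is not matched), a revised shebang check might look like the sketch below. The function names are invented for this example, and reading the file in binary mode is an added choice so that non-text files don't raise decoding errors.

```python
import re

# "#!" must be the very first bytes of the file; \b keeps "ipython"
# from matching while still allowing python3, python3.11, and so on.
SHEBANG_PYTHON = re.compile(rb"#!.*\bpython")


def line_is_python_shebang(first_line: bytes) -> bool:
    """Return True if first_line looks like a shebang that runs Python."""
    return SHEBANG_PYTHON.match(first_line) is not None


def has_python_shebang(filename: str) -> bool:
    """Read only the start of the file and test it for a Python shebang."""
    try:
        with open(filename, "rb") as infile:
            first_line = infile.readline(256)  # cap the read for huge lines
    except OSError:
        return False  # unreadable or missing: assume not a Python file
    return line_is_python_shebang(first_line)
```

This still says nothing about files that have neither a .py extension nor a shebang, and an interpreter path that merely contains a directory named python would be a false positive.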
