
I have a local git repo and I'm trying to find a way to get a specific version of my xlsx file into my Python code so I can process it using pandas.

I found the GitPython library, but I'm not sure how to use it correctly.

from git import Repo

repo = Repo(path_to_repo)
commit = repo.commit(sha)
targetfile = commit.tree / 'dataset.xlsx'

I don't know what to do next. I tried loading the file into pandas by its path, but of course that just loads the latest version from my working tree.

How can I load a previous version of the xlsx file into pandas?

  • Why not have Git check out the particular file and/or commit that you like (using git checkout <commit-hash> or git switch --detach <hash> for instance, or git restore to extract one particular file)? Then you can just use your OS's ordinary file facilities to read the file, now that it's not in Git any more. Commented Apr 14, 2022 at 11:10

1 Answer


When you ask for commit.tree / 'dataset.xlsx', you get back a git.Blob object:

>>> targetfile
<git.Blob "3137d9443f54325b8ad8a263b13053fee47fbff2">

If you want to read the contents of the object, you can use its data_stream attribute, which returns a file-like object:

>>> data = targetfile.data_stream.read()

Or you can use the stream_data method (don't look at me, I didn't name them), which writes data into a file-like object:

>>> import io
>>> buf = io.BytesIO()
>>> targetfile.stream_data(buf)
<git.Blob "3137d9443f54325b8ad8a263b13053fee47fbff2">
>>> buf.getvalue()
b'The contents of the file...'
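
To get from the blob to pandas, one option is to copy the blob into an in-memory buffer and hand that to pd.read_excel, which accepts file-like objects. The following is a minimal sketch, not a definitive implementation: the helper names blob_to_buffer and read_excel_at_commit are made up for illustration, and it assumes GitPython, pandas, and an xlsx engine such as openpyxl are installed. The buffer is used because an xlsx file is a zip archive, so the reader needs a seekable stream.

```python
import io


def blob_to_buffer(blob):
    """Copy a git blob's bytes into a seekable in-memory buffer.

    `blob` is expected to be a git.Blob (anything with a compatible
    stream_data(ostream) method works). Hypothetical helper name.
    """
    buf = io.BytesIO()
    blob.stream_data(buf)  # writes the blob's raw bytes into buf
    buf.seek(0)            # rewind so the reader starts at the beginning
    return buf


def read_excel_at_commit(repo_path, sha, file_path="dataset.xlsx"):
    """Load `file_path` as it existed at commit `sha` into a DataFrame.

    Hypothetical helper name. Third-party imports are kept local so that
    blob_to_buffer stays usable without them.
    """
    from git import Repo   # GitPython
    import pandas as pd    # assumes an xlsx engine (e.g. openpyxl) is installed

    repo = Repo(repo_path)
    blob = repo.commit(sha).tree / file_path
    return pd.read_excel(blob_to_buffer(blob))
```

With the names from the question, usage would look like df = read_excel_at_commit(path_to_repo, sha). Nothing is written to disk and the working tree is left untouched.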