101

GitHub limits the size of files you can push. So if you want to push a large file to your repo, you have to use Git LFS.

I know it's a bad idea to add binary files to a Git repo. But suppose I am using GitLab on my own server, there is no limit on file size in a repo, and I don't care if the repo becomes very large on my server. In that case, what is the advantage of Git LFS? Will git clone or git checkout be faster?

  • Have you compared the connection speed? Commented Feb 23, 2016 at 10:52
  • 1
    No. I am trying to figure it out in principle. Commented Feb 23, 2016 at 11:55
  • 3
    With git-lfs, clone will be MUCH quicker. Checkout takes a little longer: the time to download the files stored in LFS. But if you REALLY need to check in some binaries, LFS is the way to go. Commented Feb 23, 2016 at 18:17
  • atlassian.com/git/tutorials/git-lfs Commented Sep 8, 2017 at 19:09
  • 4
    You should clearly distinguish whether the large files are modified (heavily) or are just static assets in the repo. If a large file is added once and never modified, there is no use for LFS. If the large files are modified, then the accepted answer applies. Commented Feb 5, 2019 at 16:46

1 Answer

178

One specificity of Git (and other distributed systems) compared to centralized systems is that each repository contains the whole history of the project. Suppose you create a 100 MB file and modify it 100 times in a way that doesn't compress well. You'll end up with a 10 GB repository. This means that each clone will download 10 GB of data and eat 10 GB of disk space on each machine on which you're making a clone. What's even more frustrating: you'd still have to download these 10 GB of data even if you git rm the big files.
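The arithmetic in this example can be sketched in a few lines of Python. It assumes the worst case described here, where every revision is stored at full size because neither deltas nor compression help; the function name is purely illustrative.

```python
def history_size_mb(file_size_mb: int, revisions: int) -> int:
    """Worst-case history size for one file: every revision kept in full."""
    return file_size_mb * revisions


# 100 MB file, modified 100 times, none of the revisions compress well.
size_mb = history_size_mb(100, 100)
print(f"{size_mb} MB (~{size_mb / 1000:.0f} GB) downloaded by every clone")
```

Real repositories usually do somewhat better than this, since Git compresses objects and stores deltas where it can, so treat the figure as an upper bound.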

Putting big files in a separate system like git-lfs allows you to store only pointers to each version of the file in the repository, hence each clone will only download a tiny piece of data for each revision. The checkout will download only the version you are using, i.e. 100 MB in the example above. As a result, you would be using disk space on the server, but saving a lot of bandwidth and disk space on the client.
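To give a feel for how small such a pointer is, here is a minimal sketch that builds text in the Git LFS pointer format (a version line, a SHA-256 object ID, and the size in bytes). It only illustrates the format: in practice git-lfs writes these pointers itself when you add a tracked file, and the helper name `lfs_pointer` and the 1 MiB stand-in file are assumptions made for the example.

```python
import hashlib


def lfs_pointer(content: bytes) -> str:
    """Return pointer text for `content` in the Git LFS pointer layout."""
    oid = hashlib.sha256(content).hexdigest()
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{oid}\n"
        f"size {len(content)}\n"
    )


# Stand-in for a large binary file (1 MiB of zero bytes).
big_file = b"\0" * (1024 * 1024)
pointer = lfs_pointer(big_file)

print(pointer)
print(f"{len(pointer.encode())} bytes stored in Git instead of {len(big_file)}")
```

The pointer stays the same size no matter how large the real file is, which is why the history of a repository using LFS remains cheap to clone.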

In addition to this, the algorithm used by git gc (internally, git repack) does not always work well with big files. Recent versions of Git have made progress in this area and it should work reasonably well, but using a big repository with big files in it may eventually get you into trouble (like not having enough RAM to repack your repository).


9 Comments

I always spouted on about it slowing down the repo over time, but this is a great concrete example! Thanks for showing how the size compounds as well as the resource consumption!
So, using LFS is only good if you modify those large files frequently? What if I want to keep some software packages in the repo that I use but never modify?
@sanjivgupta In that scenario LFS will have very few benefits. By following the Git LFS process, you would mark the files as binary; then, if a file is accessed with git diff, that prevents it from potentially crashing because of a large file. Additionally, if you do decide to update one of those packages in the future, you will reap the intended benefits of LFS by cloning only the latest versions for the branch from which you are cloning. All that being said, you should use a package manager for that scenario whenever possible.
Can you clarify: does git lfs still store a separate copy of each pointed-to version of the binary file? Or does it somehow store only changes to the binary file in order to save storage space on the git lfs server, or perhaps it even stores only the latest copy of the binary file, and older versions are lost? I'm trying to understand the storage benefits of binary files on the git lfs server, if any.
@GabrielStaples git lfs stores the complete file as it is (without additional compression) and does not use any diff functions (see this thread for a filetype comparison).