While working on a project recently, I discovered that one of my collaborators had accidentally committed a backup folder of around 400 MB to the repository. I could delete the folder, but its contents would still exist in the history, which is why the project's .git folder had grown to 550 MB, far larger than the actual working files.

To the rescue came a third-party tool called git filter-repo. Here’s how it’s described:

“git filter-repo is a versatile tool for rewriting history, which includes capabilities I have not found anywhere else. It roughly falls into the same space of tool as git filter-branch but without the capitulation-inducing poor performance, with far more capabilities, and with a design that scales usability-wise beyond trivial rewriting cases. git filter-repo is now recommended by the git project instead of git filter-branch.”

The performance aspect interested me most: when I had tried to clean up the repository with git filter-branch, the process took over 25 minutes to complete. With git filter-repo, it took under 10 seconds.

Besides shrinking a bloated repository, this tool can also remove sensitive files (such as ones containing passwords) from the history, so that they don't get passed along to everyone you share the repo with.

It’s super easy to install.

On my local Mac, I simply ran brew install git-filter-repo

On the Linux servers that host the website, I installed it with pip: python3 -m pip install --user git-filter-repo

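Next, since wp-content/uploadsBU/ was the folder I wanted to remove entirely, I ran git filter-repo --path wp-content/uploadsBU/ --invert-paths --force. Here --path selects the folder, --invert-paths flips the selection so that everything except that folder is kept, and --force lets the tool run on a repository that is not a fresh clone.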
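For anyone who wants to repeat this step on several machines, here is a minimal sketch of how it could be wrapped in a shell function. The function name purge_path and the backup file name config.before-filter-repo are placeholders I made up for illustration; the copy of .git/config is an extra precaution, not something git filter-repo asks for.

```shell
# purge_path REPO_DIR TARGET
# Removes TARGET from every commit of the repository in REPO_DIR.
# The function name and the backup file name are illustrative placeholders.
purge_path() {
    repo_dir=${1:-}
    target=${2:-}
    if [ -z "$repo_dir" ] || [ -z "$target" ]; then
        echo "usage: purge_path <repo-dir> <path-to-remove>" >&2
        return 2
    fi
    # Stop early if the tool is not on PATH.
    if ! command -v git-filter-repo >/dev/null 2>&1; then
        echo "git-filter-repo not found on PATH" >&2
        return 127
    fi
    # Keep a copy of the config so remotes and branch settings can be looked up afterwards.
    cp "$repo_dir/.git/config" "$repo_dir/.git/config.before-filter-repo" || return 1
    # --path selects the folder; --invert-paths keeps everything except that selection.
    git -C "$repo_dir" filter-repo --path "$target" --invert-paths --force
}

# Example call (run from the repository root):
# purge_path . wp-content/uploadsBU/
```

Since the rewrite cannot be undone, it is probably worth trying this on a throwaway copy of the repository first.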

This brought the .git folder down to a reasonable 161 MB.
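To check the effect yourself, something like the following sketch could be used to compare the size of the .git folder before and after the rewrite. The helper names git_dir_kb and report_savings are hypothetical, and the example call simply plugs in the 550 MB and 161 MB figures from this post.

```shell
# git_dir_kb REPO_DIR: print the size of REPO_DIR/.git in kilobytes.
git_dir_kb() {
    du -sk "${1:-.}/.git" | cut -f1
}

# report_savings BEFORE_KB AFTER_KB: print the reclaimed space in whole megabytes.
report_savings() {
    echo "Reclaimed $(( ($1 - $2) / 1024 )) MB"
}

# Example with the figures from this post (550 MB before, 161 MB after):
report_savings $((550 * 1024)) $((161 * 1024))   # prints "Reclaimed 389 MB"
```

In practice you would store the output of git_dir_kb in a variable before running the rewrite, call it again afterwards, and pass both numbers to report_savings. Running git count-objects -vH gives a similar picture from Git's own point of view.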

I discovered accidentally that this has the unfortunate effect of pretty much wiping your .git/config file, so I had to re-add my remotes and branch associations.

Then I force-pushed to origin, repeated the steps on the servers I controlled, and instructed my teammates to do the same on their own local copies.
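The re-adding and force-pushing could look roughly like the sketch below. The function name republish and the URL in the example call are made up; substitute your own remote URL and branch name. It assumes the remote is called origin, as in my setup.

```shell
# republish REMOTE_URL BRANCH
# Re-adds origin if it is missing, force-pushes BRANCH, and restores its upstream tracking.
# Run from the repository root. The function name is a placeholder.
republish() {
    url=${1:-}
    branch=${2:-}
    if [ -z "$url" ] || [ -z "$branch" ]; then
        echo "usage: republish <remote-url> <branch>" >&2
        return 2
    fi
    # Only add the remote when it is no longer configured.
    git remote get-url origin >/dev/null 2>&1 || git remote add origin "$url" || return 1
    # -u (--set-upstream) records origin/BRANCH as the upstream again.
    git push --force -u origin "$branch"
}

# Example call with a made-up URL:
# republish git@example.com:team/site.git main
```

A force push overwrites the history on the remote, so it is best to warn everyone with access before running it.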

I still have to learn:

  • why does this delete the config file?
  • how can I have this history purge automatically take effect on my peers’ repos at their next fetch?

If anyone has the answers to these, please leave a comment below :)
