Remove a file from the whole Git history

I know this question has already been asked, but in every answer I found, the situation is slightly different from mine and I don't see how to adapt it.

So here is the problem:

I cloned a repository and added a folder to work in. In this folder, I added .csv files and .py files that use them. I tried to push this but realised it was taking too long, as two of the CSV files are very big. So I ran

git rm files

and then committed. I tried to push again and only then realised that removing a file doesn't remove it from the Git history. So now, since the last completed push, I have 2 commits: one where I added the files, and one where I deleted some of the .csv files.

I would like your help deleting the last 2 commits. Is that feasible? Thanks.
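The situation in the question can be rebuilt in a throwaway repository to see why the push stays slow. This is a sketch, not the asker's real repo: big.csv, analysis.py and README are hypothetical stand-ins, and the repository, identity and branch name are set up only for the demo. After `git rm` and a new commit, the latest snapshot no longer has the file, but the commit before it still does, so pushing the branch still has to upload it.

```shell
#!/bin/sh
set -e
# Throwaway repository; every name below is a hypothetical placeholder.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # use "master" whatever init.defaultBranch says
git config user.name "Demo"
git config user.email "demo@example.com"

echo "notes" > README
git add README
git commit -q -m "Last pushed commit"

echo "a,b,c" > big.csv                     # stands in for one of the very big CSV files
echo "print('analysis')" > analysis.py
git add big.csv analysis.py
git commit -q -m "Add CSV and Python files"

git rm -q big.csv
git commit -q -m "Remove big CSV"

git ls-tree --name-only HEAD               # README, analysis.py: the file is gone here...
git ls-tree --name-only HEAD~1             # ...but still recorded in the previous commit
git log --oneline -- big.csv               # both new commits touch big.csv
```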

There are 3 answers

eftshift0

filter-branch, as has been advised, is fine if we are talking about really big histories. If we are talking about only a handful of revisions, you can remove the files by amending the revision where you added them and then cherry-picking the later ones, or with an interactive rebase.
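The interactive-rebase route can be sketched in a throwaway repository. So that it runs without an editor, this sketch sets GIT_SEQUENCE_EDITOR to a sed command that turns the first `pick` in the todo list into `edit`; that trick, the repository layout and the unwanted file a.txt are assumptions for the demo (interactively you would just change the word in your editor). Git stops on that commit, the file is taken out of it with `git rm --cached` plus `git commit --amend`, and `git rebase --continue` replays the later commits.

```shell
#!/bin/sh
set -e
# Throwaway repository; a.txt is a hypothetical unwanted file added in master~2.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"
git config user.email "demo@example.com"

echo base > base.txt;   git add base.txt;   git commit -q -m "base"        # master~3
echo big > a.txt;       echo one > one.txt
git add a.txt one.txt;                      git commit -q -m "add files"   # master~2
echo two > two.txt;     git add two.txt;    git commit -q -m "two"         # master~1
echo three > three.txt; git add three.txt;  git commit -q -m "three"       # master

# Non-interactive stand-in for editing the todo list: mark the first commit "edit".
GIT_SEQUENCE_EDITOR="sed -i.bak -e '1s/^pick/edit/'" git rebase -i master~3

# The rebase has stopped on "add files": take a.txt out of that commit.
git rm -q --cached a.txt          # a.txt stays on disk as an untracked file
git commit -q --amend --no-edit
git rebase --continue             # replays "two" and "three" on top
```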

One example: say I added the file a.txt in master~2 and I don't want it in the history anymore.

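git checkout master~2              # detached HEAD at the commit that added a.txt
git rm --cached a.txt              # remove a.txt from that snapshot; the file stays on disk
git commit --amend --no-edit       # replace the commit with one that lacks a.txt
git cherry-pick master~2..master   # copy the later commits on top
git branch -f master               # point master at this revision
git checkout master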

That should be enough.
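Here is the same sequence run end to end in a throwaway repository, as a sketch to try before touching a real branch. The repository, the file names, and the assumption that a.txt arrived in master~2 and that no later commit touches it are all made up for the demo; if a later commit did modify a.txt, the cherry-pick would stop instead of applying cleanly.

```shell
#!/bin/sh
set -e
# Throwaway repository; a.txt is a hypothetical unwanted file added in master~2.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"
git config user.email "demo@example.com"

echo base > base.txt;   git add base.txt;   git commit -q -m "base"
echo big > a.txt;       echo one > one.txt
git add a.txt one.txt;                      git commit -q -m "add files"   # master~2
echo two > two.txt;     git add two.txt;    git commit -q -m "two"         # master~1
echo three > three.txt; git add three.txt;  git commit -q -m "three"       # master

git checkout -q master~2              # detached HEAD at the commit that added a.txt
git rm -q --cached a.txt              # drop it from the index; the file stays on disk
git commit -q --amend --no-edit       # replacement commit without a.txt
git cherry-pick master~2..master      # copy "two" and "three" on top
git branch -f master                  # move master to the rewritten tip (HEAD)
git checkout -q master
git log --oneline --stat
```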

torek

... I would like ... to delete the last 2 commits. Is that feasible?

You can't quite delete commits, but you can easily tell Git to forget them.

The way this works is pretty simple, in the end. We start by noting that each commit saves a snapshot, and also stores the hash ID of its parent commit (along with your commit log message and your name as author and so on). This forms a backwards-pointing chain of commits.

If we let single uppercase letters stand in for commit hash IDs, we can draw this chain:

... <-F  <-G  <-H   <--master

Note that the branch name, master in this case, stores the hash ID of the last commit in the chain. (When something stores the hash ID of a commit, we say that this thing points to the commit, hence the arrows. The name master points to H, H points to G, and so on.)

The way Git finds these commits is to read the hash ID of H out of master, which locates commit H, then read commit H and show it. Then, having read H, Git has the hash ID of commit G, so Git can read G and show it, and so on.
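To see this chain in an actual repository, the sketch below builds a throwaway repo with three hypothetical commits named F, G and H after the drawing. It prints the hash ID stored in master, the raw contents of commit H (whose `parent` line holds G's hash ID), and the backwards walk that `git log` does from there.

```shell
#!/bin/sh
set -e
# Throwaway repository with three commits named after the drawing: F, G, H.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"
git config user.email "demo@example.com"
for name in F G H; do
    echo "$name" > file.txt
    git add file.txt
    git commit -q -m "$name"
done

git rev-parse master                      # hash ID stored in the name master: commit H
git cat-file -p master                    # tree, parent (G's hash ID), author, committer, message
git log --format='%h %s  parent: %p'      # H, G, F: each found through the previous one's parent
```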

When we make a new commit, Git actually does this by:

  • writing out the snapshot;
  • writing out the author and log message and so on;
  • having the new commit point back to the current commit;
  • and last, but most important, writing the hash ID of the new commit into the branch name.
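Git's plumbing commands can perform the steps in the list above one at a time. In this sketch (throwaway repo, hypothetical file and commit names), `git write-tree` saves the snapshot from the index, `git commit-tree` writes the commit object with the author, message and parent hash ID, and `git update-ref` writes the new hash ID into master. This is roughly what `git commit` does for you, not a claim about its exact internals.

```shell
#!/bin/sh
set -e
# Throwaway repository whose master already points at a commit "H".
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"
git config user.email "demo@example.com"
echo H > file.txt; git add file.txt; git commit -q -m "H"

echo I > file.txt
git add file.txt
tree=$(git write-tree)                                # write out the snapshot (from the index)
parent=$(git rev-parse HEAD)                          # the current commit, H
new=$(git commit-tree -p "$parent" -m "I" "$tree")    # author, message, and the link back to H
git update-ref refs/heads/master "$new" "$parent"     # last: store the new hash ID in master
git log --oneline master                              # I, then H
```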

So if we had:

...--F--G--H

and we added --I:

...--F--G--H--I

then Git has changed the name master to store the hash ID of commit I. Eventually we have:

...--F--G--H--I--J   <-- master

If we made several unwanted commits, we can tell Git: Re-set the name master to point to commit H instead of commit J. There are several ways to do that, but the first one to reach for, in this case, is git reset --hard (while we have master checked out, and be sure you don't have anything you are concerned with losing, because git reset --hard tells Git to throw everything out):

git checkout master
git reset --hard HEAD~2

The ~2 suffix tells Git to count back two steps (technically, two first-parent steps, which matters when we have merge commits in our chain; here we don't, so it does not matter). If master currently points to J, Git counts back twice: J to I, then I to H. Git then replaces our work with the contents of commit H and makes the name master point to H instead of J:

             I--J
            /
...--F--G--H   <-- master

Now that I and J are hard to find, they appear to be deleted.
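The reset can be tried safely in a throwaway repository first. This sketch (hypothetical commits named G through J after the drawing) shows that HEAD~2 and HEAD^1^1 name the same commit, moves master back to H, and then shows that J has not really vanished: `git reset` records the old tip in ORIG_HEAD, and the reflog also keeps it for a while, which is why it is only hard to find rather than gone.

```shell
#!/bin/sh
set -e
# Throwaway repository with commits G, H, I, J on master (J is the tip).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"
git config user.email "demo@example.com"
for name in G H I J; do
    echo "$name" > file.txt; git add file.txt; git commit -q -m "$name"
done

h=$(git rev-parse HEAD~2)
git rev-parse HEAD~2 HEAD^1^1     # same hash twice: ~2 is two first-parent steps
git checkout -q master
git reset -q --hard HEAD~2        # master now points at H; file.txt contains "H"
git log --oneline                 # H, G
git log --oneline -1 ORIG_HEAD    # J is still in the repository, just no longer on master
```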

The drawback to this is that if we've had our Git tell some other Git: Here, take copies of commits I and J, that other Git has the two commits and will re-introduce them to our own Git even after our Git has forgotten them. But if we have never successfully sent the two commits anywhere else, we're the only one who has them, so if we forget them, they're as good as gone.

(If we have pushed them, we can have our Git, and their Git, and every other Git that has picked them up since then, all forget them, and then they will be gone. But obviously this gets hard quickly.)
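For the pushed case, the usual way to make another Git forget commits is a force push. The sketch below is a demo under stated assumptions, not a recipe for a shared repository: a local bare repository stands in for the remote, I and J are pushed to it, master is reset to H, and `git push --force-with-lease` overwrites the remote's master only if it still points where our remote-tracking branch says it does. Any clone that already fetched I and J still has them, which is the hard part mentioned above.

```shell
#!/bin/sh
set -e
# A local bare repository plays the role of the other Git.
top=$(mktemp -d)
git init -q --bare "$top/remote.git"
git init -q "$top/work"
cd "$top/work"
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"
git config user.email "demo@example.com"
git remote add origin "$top/remote.git"

for name in H I J; do
    echo "$name" > file.txt; git add file.txt; git commit -q -m "$name"
done
git push -q origin master                     # the other Git now has I and J

git reset -q --hard HEAD~2                    # our Git forgets I and J
# Overwrite the remote's master, but only if nobody has moved it since our last push.
git push -q --force-with-lease origin master
git --git-dir="$top/remote.git" log --oneline master   # only H remains there too
```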

Romain Valeri

I find the first example in the git filter-branch documentation a very good fit for your situation. Take a look (source):

Suppose you want to remove a file (containing confidential information or copyright violation) from all commits:

git filter-branch --tree-filter 'rm filename' HEAD
# and see also the variant further in the example description
git filter-branch --index-filter 'git rm --cached --ignore-unmatch filename' HEAD

(See the details on the doc page; I refrained from copy-pasting the whole thing here.)
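The index-filter variant can be tried on a throwaway copy of the situation from the question. In this sketch, big.csv, analysis.py and the commit messages are hypothetical. `--prune-empty` is a documented filter-branch option that is not part of the quoted example; it drops commits that end up changing nothing, so the commit that only deleted big.csv disappears. Setting FILTER_BRANCH_SQUELCH_WARNING=1 just skips the warning and pause that recent Git versions show before filter-branch runs.

```shell
#!/bin/sh
set -e
# Throwaway repository mirroring the question: add CSV + Python files, then git rm the CSV.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"
git config user.email "demo@example.com"
echo "notes" > README; git add README; git commit -q -m "Last pushed commit"
echo "a,b,c" > big.csv; echo "print('analysis')" > analysis.py
git add big.csv analysis.py; git commit -q -m "Add CSV and Python files"
git rm -q big.csv; git commit -q -m "Remove big CSV"

export FILTER_BRANCH_SQUELCH_WARNING=1      # skip the warning and its delay
git filter-branch --index-filter 'git rm --cached --ignore-unmatch big.csv' --prune-empty HEAD

git log --oneline --stat master             # two commits; big.csv appears in neither
git show-ref refs/original/refs/heads/master  # backup of the old branch kept by filter-branch
```

Only master is pushed, so the backup under refs/original/ stays local and does not make the push large again.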