git replace file content in history

I would like to replace the content of a file in all branches with the current visible version of that file. I have been trying something like this, but it does not work:

FILE_PATH="$1"

cp "$FILE_PATH" "/tmp/file_to_replace_in_git_history"

git filter-repo --path "$FILE_PATH" --force --blob-callback "
  if filepath == b'$FILE_PATH':
    with open('/tmp/file_to_replace_in_git_history', 'rb') as f:
      blob.data = f.read()
"

rm "/tmp/file_to_replace_in_git_history"

I searched but could not find a ready-to-use solution. Perhaps you have a better idea.

I want to remove encrypted files that could contain passwords and similar data that should not be there. The files are binary, so they should be replaced completely.

There are 3 answers

1
phd On BEST ANSWER

Replacing an entire file's content seems to be an area where git filter-repo has problems. The blob callback doesn't know the filename (a blob carries only content, not a path), and --replace-text can only replace line by line, AFAIU.

I can solve your task with git filter-branch. Yes, I know it's outdated and not recommended. As invoked here with HEAD, it rewrites only the current branch. I use --tree-filter, which is extra slow because it checks out every commit, updates it, and commits it back. Still, here is a working solution:

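FILE_PATH="$1"

# Keep a copy of the current version outside the working tree
cp "$FILE_PATH" "/tmp/file_to_replace_in_git_history"

# Overwrite the file in every commit reachable from HEAD
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch --tree-filter "
    cp '/tmp/file_to_replace_in_git_history' '$FILE_PATH'
" HEAD

rm "/tmp/file_to_replace_in_git_history"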

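After that, run git log -- "$FILE_PATH". The log should contain only one commit, the initial commit, because the filter places the same content in every commit.

Upd. To process all branches and rewrite tags, run this instead (while the temporary file still exists):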

FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch --tree-filter "
    cp '/tmp/file_to_replace_in_git_history' '$FILE_PATH'
" --tag-name-filter cat -- --all
1
CodeWizard On

How to prevent it from happening?

  • First, let's start with how to prevent it next time.
  • You should use smudge/clean filters to scan and filter your sensitive information.
  • Clean and smudge are filters that run whenever you stage a file (clean) and whenever you check a file out into the working directory (smudge).

Smudge / clean

Read all about them and how to set them up here:
https://git-scm.com/book/en/v2/Customizing-Git-Git-Attributes

It turns out that you can write your own filters for doing substitutions in files on commit/checkout.

These are called clean and smudge filters.

In the .gitattributes file, you can set a filter for particular paths and then set up scripts that will process files just before they’re checked out (“smudge”) and just before they’re staged (“clean”).

These filters can be set to do all sorts of fun things.
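
To make the contract concrete: a filter is just a program that receives the file content on stdin and writes the transformed content to stdout. Below is a minimal Python sketch of such a pair, assuming the same database.ip key and the two addresses used in the demo script in this answer. The function names and the idea of writing the filter in Python are my own illustration, not something the linked book prescribes.

```python
import re
import sys

# Values borrowed from the demo in this answer; treat them as placeholders.
PROD_IP = b"10.10.10.10"
LOCAL_IP = b"127.0.0.1"

# Match a whole "database.ip=<value>" line; (?m) makes ^ and $ work per line.
_DB_IP_LINE = re.compile(rb"(?m)^database\.ip=.*$")


def clean(data: bytes) -> bytes:
    """Applied when staging: store the production value in the repository."""
    return _DB_IP_LINE.sub(b"database.ip=" + PROD_IP, data)


def smudge(data: bytes) -> bytes:
    """Applied on checkout: put the local value into the working tree."""
    return _DB_IP_LINE.sub(b"database.ip=" + LOCAL_IP, data)


def run_filter(transform) -> None:
    """Git pipes the file content to stdin and reads the result from stdout."""
    sys.stdout.buffer.write(transform(sys.stdin.buffer.read()))


# Quick demonstration on in-memory data instead of a real checkout.
sample = b"## Database\ndatabase.ip=127.0.0.1\nfeature1.env=DEV\n"
print(clean(sample).decode())
```

A real filter script would end with run_filter(clean) or run_filter(smudge) and be registered through git config filter.<name>.clean and filter.<name>.smudge. Working on bytes rather than text keeps the sketch safe for files that are not valid UTF-8.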


How to remove sensitive data from the repository

  • You can use git filter-branch or BFG.

BFG

https://rtyley.github.io/bfg-repo-cleaner/

BFG Repo-Cleaner

  • The BFG is a simpler, faster alternative to git-filter-branch for cleansing bad data out of your Git repository history.

Examples (from the official site)

Replace all passwords listed in a file (prefix lines 'regex:' or 'glob:' if required) with ***REMOVED*** wherever they occur in your repository:

$ bfg --replace-text passwords.txt  my-repo.git

Smudge/Clean demo

#!/bin/bash

# Source: https://github.com/nirgeier/git-scripts/blob/master/smudge/remove-localhost/remove-localhost.sh

### Define the desired filters.
### For the simplicity of the demo we use it inline
### In real life it can be any path to actual script

### The ip which we wish to use
### In real life it can be password, ip , token or any other value
DB_IP_LOCAL=127.0.0.1
DB_IP_PROD=10.10.10.10

# Generate the .env file
echo -e "* Initializing\t .env file"
cat << EOF >> .env
## Database
##  * Local:      <Any Value>
##  * Production: 10.10.10.10
database.ip=0.0.0.0

## Feature1
feature1.env=DEV
feature1.key=f1-key
feature1.name=feature1
EOF

### Init the empty repository
echo -e "* Initializing demo repository"

## Init git repo
git init --quiet

### The filters below call gsed (GNU sed), as needed on macOS; elsewhere plain sed should work

# Clean is applied when we add the file to the staging area
echo -e "* Define clean filter"
git config --local filter.cleanLocalhost.clean  "gsed -e 's/database.ip=.*/database.ip=${DB_IP_PROD}/g'"

# Smudge is applied when we check out the file
echo -e "* Define smudge filter"
git config --local filter.cleanLocalhost.smudge "gsed -e 's/database.ip=.*/database.ip=${DB_IP_LOCAL}/g'"

###  Define the filters 
echo -e "* Adding filters (smudge-clean) to demo repository"
echo '.env text eol=lf filter=cleanLocalhost' > .gitattributes

### Commit the file again after we set up the filter
echo -e "* Adding second commit"
echo 'Second Commit' >> README.md
2
Hasturkun On

Here's a tiny script I wrote using filter-repo to replace a single file's contents, inspired by insert-beginning from the git filter-repo contrib scripts.

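#!/bin/bash

SRC_FILE="$1"          # file holding the new content
TO_REPLACE_PATH="$2"   # path inside the repository to overwrite

# hash-object -w stores SRC_FILE as a blob and prints its ID; every
# non-delete change to TO_REPLACE_PATH is then pointed at that blob
git filter-repo --commit-callback "for change in commit.file_changes:
    if (change.type != b'D' and change.filename == b'${TO_REPLACE_PATH}'):
        change.blob_id = b'$(git hash-object -w "${SRC_FILE}")'"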

Note that this handles commits, not blobs. It replaces any modification to the file matching the name with a blob created from the new source file.
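
As a side note on why a single blob ID is enough here: the ID depends only on the file content, not on its path or on the commit that references it. The sketch below reproduces the computation in Python, assuming the default SHA-1 object format and that no clean filter or line-ending conversion applies to the file. The helper name git_blob_id is made up for this illustration.

```python
import hashlib


def git_blob_id(data: bytes) -> str:
    """Return the SHA-1 object ID Git assigns to a blob with this content."""
    # A blob is hashed as: "blob <size in bytes>\0" followed by the raw content.
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()


# Same value that `echo hello | git hash-object --stdin` prints.
print(git_blob_id(b"hello\n"))  # ce013625030ba8dba906f756967f9e9ca394464a
```

Because the ID is a pure function of the content, every rewritten commit that touches the path ends up referencing the same object.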

Also note that this does not currently use --force, though you may need to add it. The filter-repo documentation strongly suggests cloning a local repository with --no-local rather than using --force, which can be risky.