Using subtree-merge strategy, history is not merging


I'm trying to use an external SVN repository as a subtree in my repository by 'subtree-merging' it in. I believe this should keep the history of the library's files intact, but it's not working: the files merged into a subtree on my master branch have no history beyond the commit in which I added them. Here's the log to show what I mean; exactly what I'm doing to get to this state follows.

lappy8086:YACYAML jamie$ git log --graph 
* commit 0cc6c4e5061741e67d009f3375ce1d2bcd3ab540
| Author: James Montgomerie
| Date:   Thu May 17 12:04:43 2012 +0100
| 
|     Subtree-merge in libYAML (from a git-svn checkout).
|  
* commit b5af5af109d77f6adafebc3dcf5a4796a5035a2e
  Author: James Montgomerie
  Date:   Thu May 17 11:47:32 2012 +0100
  
      First commit, add .gitignore.

Here's what I'm doing to try to get this to work:

# check out SVN repo
git svn clone http://svn.pyyaml.org/libyaml/branches/stable libYAML

# create my repo
mkdir YACYAML
cd YACYAML
git init
touch .gitignore
git add .gitignore
git commit -m "First commit, add .gitignore"

# Fetch from git-svn repo I got earlier
git remote add libyaml-svn ../libYAML/
git fetch libyaml-svn
git checkout -b libyaml-svn libyaml-svn/master

# Switch back to master, and try to merge in subtree
git checkout master
git read-tree --prefix=libYAML/ -u libyaml-svn/master
git commit -m "Merge in libYAML as subtree (from git-svn checkout of SVN repo)"

This 'works', but, as I said, I expect my history to include the full history from the libYAML repo, and it doesn't; the log looks like the one above.

There are 3 answers

th_in_gs

Well, one answer was to install git-subtree and use it to:

git subtree add --prefix=libYAML/ ../libYAML master

which results in what I was looking for (and expected) from doing it manually:

lappy8086:YACYAML jamie$ git log --graph
*   commit 453d464cfc140c798d0dea85ab667fe16250181d
|\  Merge: 9fb083d 0ca365a
| | Author: James Montgomerie 
| | Date:   Thu May 17 14:32:36 2012 +0100
| | 
| |     Add 'libYAML/' from commit '0ca365adeb5711bf918d4401e98fce00bab8b3ec'
| |     
| |     git-subtree-dir: libYAML
| |     git-subtree-mainline: 9fb083d923011dd990222da2a58eda42e5220cde
| |     git-subtree-split: 0ca365adeb5711bf918d4401e98fce00bab8b3ec
| |   
| * commit 0ca365adeb5711bf918d4401e98fce00bab8b3ec
| | Author: xi
| | Date:   Sun May 29 05:52:36 2011 +0000
| | 
| |     Bumped the version number and updated the announcement.
| |     
| |     git-svn-id: http://svn.pyyaml.org/libyaml/branches/stable@374 18f92427-320e-0410-9341-c67f048884a3
| |   
| * commit 210b313e5ab158f32d8f09db6a8df8cb9bd6a982
| | Author: xi
| | Date:   Sun May 29 05:29:39 2011 +0000
| | 
| |     Added support for pkg-config.
| |     
| |     git-svn-id: http://svn.pyyaml.org/libyaml/branches/stable@373 18f92427-320e-0410-9341-c67f048884a3
...etc...

I'd still like to know the correct way to do this without the dependency on git-subtree though.
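
For the manual route, the recipe usually documented for subtree merges records the library's tip as a second parent with git merge -s ours --no-commit before running read-tree; without that step the commit has a single parent, so none of the library's commits are reachable from it. A minimal sketch in a throwaway sandbox, assuming Git 2.9 or later (for --allow-unrelated-histories); the lib repository and its yaml.c file are stand-ins made up for illustration, not the real libYAML checkout:

```shell
#!/bin/sh
set -eu

# Throwaway sandbox so nothing touches a real repository.
work=$(mktemp -d)
cd "$work"

# Identity used only for the demo commits.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

# Stand-in for the git-svn checkout: a library with two commits.
git init -q lib
(
  cd lib
  git symbolic-ref HEAD refs/heads/master
  echo one > yaml.c
  git add yaml.c
  git commit -q -m "lib: first"
  echo two >> yaml.c
  git commit -q -am "lib: second"
)

# The main repository, as in the question.
git init -q main
cd main
git symbolic-ref HEAD refs/heads/master
touch .gitignore
git add .gitignore
git commit -q -m "First commit, add .gitignore"

git remote add libyaml-svn ../lib
git fetch -q libyaml-svn

# Start a merge that records libyaml-svn/master as a parent but keeps
# our tree unchanged, and stop before committing.
git merge -q -s ours --no-commit --allow-unrelated-histories libyaml-svn/master

# Read the library's tree into the libYAML/ prefix, then conclude the merge.
git read-tree --prefix=libYAML/ -u libyaml-svn/master
git commit -q -m "Merge libYAML as a subtree"

# The graph should now include the library's commits.
git log --graph --oneline
```

Whether git log --follow then tracks individual files across the move into libYAML/ is a separate matter from the commits showing up in the graph.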

jdsumsion

There appears to be an inconsistency in how git log [--follow] <filename> behaves when a merge includes the reparenting of a tree inside a subdir, like git subtree does.

I ran a few experiments, and if you introduce a synthetic re-parenting commit on the source codeline right before the first subtree merge, then the history starts being reported via git log --follow <filename>.

Options I can see:

  1. Fix git log to follow renames that happen during merges
  2. Change git subtree to create two commits for every add, re-parenting the tree in one commit first, and then merging the reparented commit after that
  3. Work around the problem manually by accomplishing #2 with .git/info/grafts and git filter-branch

Workaround:

$ git log --grep git-subtree-mainline
commit 8789f3c80122d1fc52ff43ab776a7b186f51c3c6
Merge: 0c11300 4757376
Author: John Sumsion <email>
Date:   Wed Apr 17 09:43:21 2013

    Add 'some-subdir/' from commit 'f54875a391499f910eeb8d6ff3e6b00f9778a8ab'

    git-subtree-dir: some-subdir
    git-subtree-mainline: 0c113003278e58d32116c8bd5a60f2c848b61bbb
    git-subtree-split: f54875a391499f910eeb8d6ff3e6b00f9778a8ab
$ git checkout -b fix f54875a391499f910eeb8d6ff3e6b00f9778a8ab
Switched to a new branch 'fix'
$ mkdir -p some-subdir
$ git mv <files> some-subdir
$ git commit -m "Re-parenting files before subtree merge to preserve 'git log --follow' history"
$ echo "<orig_merge> <orig_parent> <fixed_merge_parent>" >> .git/info/grafts
$ git filter-branch --index-filter true --tag-name-filter cat master

Where the following commits are:

  • orig_merge: 8789f3c80122d1fc52ff43ab776a7b186f51c3c6
  • orig_parent: 0c113003278e58d32116c8bd5a60f2c848b61bbb
  • fixed_merge_parent: the SHA of the re-parenting commit created by the git commit step above
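
On newer Git versions the grafts file is deprecated, as far as I can tell, and git replace --graft plays the same role. A minimal sketch in a throwaway repository: the three variables mirror the placeholders above, while the branch name side and the commit messages are made up for illustration, and empty commits stand in for real content.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

git init -q repo
cd repo
git symbolic-ref HEAD refs/heads/master

# Mainline commit that becomes the first parent of the merge.
git commit -q --allow-empty -m "mainline"
orig_parent=$(git rev-parse HEAD)

# Unrelated imported history, plus a later re-parenting commit on top of it.
git checkout -q --orphan side
git commit -q --allow-empty -m "imported history"
old_side=$(git rev-parse HEAD)
git commit -q --allow-empty -m "re-parenting commit"
fixed_merge_parent=$(git rev-parse HEAD)

# The original merge points at the old tip of the imported history.
git checkout -q master
git merge -q --allow-unrelated-histories -m "subtree merge" "$old_side"
orig_merge=$(git rev-parse HEAD)

# Rewrite the merge's parent list: same first parent, new second parent.
git replace --graft "$orig_merge" "$orig_parent" "$fixed_merge_parent"

# Prints the merge followed by its (now replaced) parents.
git rev-list --parents -n 1 "$orig_merge"
```

The replacement lives under refs/replace/ and only affects this repository until a history rewrite such as the filter-branch step makes it permanent.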

Unfortunately, subsequent changes merged in via git subtree after the first subtree merge do NOT appear to be reported via git log --follow <filename> even when the first subtree merge is synthetically re-parented.

For some reason, I seem to remember that this was working ok in the Git 1.7.x timeframe, but that is a vague memory from the distant past that I don't have time to research. The above was observed with Git 1.8.3.2.

VLRoyrenn

Adding to what jdsumsion said, subtree-merge (or git subtree, which does the same thing in one step) won't work, since all it does is give you a merge commit that moves all files from the root to your sub-directory. In order to have your file history be maintained, the file would need to always have been at its final location, which would require rewriting all previous commits.

The way to do that is not git filter-branch, which is one little bash script whose own documentation very much does not want you to use it. Use git-filter-repo instead.

The procedure just involves fetching the external project as its own remote, as with the subtree merge, then making a local tracking branch and rewriting all commits on that branch so that they have retroactively always used the path you want. You can then merge the branch into your main project with the --allow-unrelated-histories flag.

The use of bash variables is mainly for ease of reuse and readability. I don't expect this to work if you want your sub-directory to contain spaces and the like, but it should be fairly easy to adjust by hand in that sort of case.

export SUBTREE_PREFIX="MySubproject"

git remote add -f "${SUBTREE_PREFIX:?}-remote" https://my-git-repo.invalid/Subproject.git

git checkout -b "${SUBTREE_PREFIX:?}-master" "${SUBTREE_PREFIX:?}-remote/master"

# --force is to skip the "freshly cloned repo" check.
# All the refs we'll be operating on are fresh, even if the repo isn't

# Remove --dry-run once you've checked .git/filter-repo/fast-export.filtered
# to be sure that everything is correct.
git filter-repo --refs "${SUBTREE_PREFIX:?}-master" --to-subdirectory-filter "${SUBTREE_PREFIX:?}" --force --dry-run

git checkout master
git merge "${SUBTREE_PREFIX:?}-master" --allow-unrelated-histories

# Repeat for however many repos you need to add
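
After the merge, one way to check the result is to ask for the history of a file under the prefix. The sketch below does this in a throwaway sandbox; it does not run git filter-repo at all, and instead fakes the rewritten branch by committing a made-up file (lib.c) under the prefix from the start, which is the state the rewrite is meant to produce.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

SUBTREE_PREFIX="MySubproject"

git init -q main
cd main
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m "main: first commit"

# Stand-in for the rewritten "${SUBTREE_PREFIX}-master" branch: every commit
# already keeps its files under the prefix.
git checkout -q --orphan "${SUBTREE_PREFIX}-master"
mkdir -p "$SUBTREE_PREFIX"
echo one > "$SUBTREE_PREFIX/lib.c"
git add "$SUBTREE_PREFIX/lib.c"
git commit -q -m "sub: first"
echo two >> "$SUBTREE_PREFIX/lib.c"
git commit -q -am "sub: second"

git checkout -q master
git merge -q --allow-unrelated-histories -m "Merge ${SUBTREE_PREFIX}" "${SUBTREE_PREFIX}-master"

# Path-limited log: lists the sub-project commits that touched the file.
git log --oneline -- "$SUBTREE_PREFIX/lib.c"
```

Because the path never changes in this history, a plain path-limited git log is enough here and --follow is not needed.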

Speaking for myself, given how the entire point of the manipulation is to group the commit history of multiple repositories into one, I would also want to prefix the commit messages with which subproject these are from, so I can keep track afterwards.

git filter-repo --refs "${SUBTREE_PREFIX:?}-master" --to-subdirectory-filter "${SUBTREE_PREFIX:?}" --message-callback="return message if message.startswith(b'${SUBTREE_PREFIX:?}:') else b'${SUBTREE_PREFIX:?}: ' + message" --force --dry-run

Also, some git servers will reject your push if it contains commits that were not committed by you. git rebase normally sets the committer to you while leaving the commit author intact, but here you need to do it manually.

git filter-repo --refs "${SUBTREE_PREFIX:?}-master" --to-subdirectory-filter "${SUBTREE_PREFIX:?}" --commit-callback '
        # filter-repo passes names and emails as bytes, hence the b"..." literals.
        commit.committer_name = b"You"
        commit.committer_email = b"you@example.com"  # placeholder address
' --message-callback="return message if message.startswith(b'${SUBTREE_PREFIX:?}:') else b'${SUBTREE_PREFIX:?}: ' + message" --force --dry-run
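
To confirm what a server-side check would see, the author and committer of each commit can be printed side by side with git log format placeholders. A small sketch in a throwaway repository; the names and addresses are invented for the demo, and the commit is created directly rather than through filter-repo:

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
cd "$work"
git init -q repo
cd repo

# One commit whose author and committer differ, as after the rewrite above.
GIT_AUTHOR_NAME="Upstream Author" GIT_AUTHOR_EMAIL=upstream@example.invalid \
GIT_COMMITTER_NAME="You" GIT_COMMITTER_EMAIL=you@example.invalid \
  git commit -q --allow-empty -m "MySubproject: imported commit"

# %an/%ae are the author fields, %cn/%ce the committer fields.
git log --format='author=%an <%ae> committer=%cn <%ce>'
```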

Keep in mind that, unlike with git subtree or a submodule, you won't be able to separately maintain the standalone and altered copies of the project, since they will no longer share any history. If this is a third-party library that you want to keep as a vendored, up-to-date copy in your tree, you will find that merging upstream changes isn't really possible.