Git Squash by author - All author commits into a single commit


I am trying to squash many commits into a single one; the problem is that I need to do that by author (name or email).

The case:

Let's say I have a branch called feature-a, and in this branch I have many commits from many authors. How can I squash all commits by a given author (by email, for example) into a single commit? I want to do that to be able to merge all of the author's commits into master.

Any help here?

Thanks in advance

There are 3 answers

2
VonC On

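With Kenkron's caveats in mind, you could do something like this from master (or from a scratch branch created off it):

# %ae (author email) rather than %an: author names can contain spaces, which would break the cut below.
SORTED_GIT_LOGS=$(git log --reverse --pretty="format:%ae %H" master..feature_a | sort -s -k1,1 | cut -d' ' -f2)
for LOG in $SORTED_GIT_LOGS; do
    git cherry-pick "$LOG"
done

The git log --reverse --pretty="format:%ae %H" master..feature_a | sort -s -k1,1 part lists the feature_a commits oldest first (not the ones from master, because of the master..feature_a syntax) and groups them by author. The stable sort (-s) keeps each author's commits in their original order, whereas sort -g is meant for numeric input and is not the right flag for author names.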

You would still need to do an interactive rebase to squash the (now ordered by author) commits on master.
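The grouping step can be tried on its own, without touching a repository. The following is a minimal sketch with made-up emails and hashes; order_by_author is a hypothetical helper name, and it relies on sort -s (stable sort), which GNU and BSD sort provide.

```shell
# Hypothetical helper: reads "<author-email> <hash>" lines (oldest first) and
# prints the hashes grouped by author, each group still in its original order.
order_by_author() {
    sort -s -k1,1 | cut -d' ' -f2
}

# Made-up data standing in for:
#   git log --reverse --pretty='format:%ae %H' master..feature_a
printf '%s\n' 'bob@example.com h1' 'alice@example.com h2' \
              'bob@example.com h3' 'alice@example.com h4' | order_by_author
```

This prints h2, h4, h1, h3: both of alice's commits first, then both of bob's, each pair in the order it was written.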

0
Kenkron On

Be careful rewriting history

The end result you want might be possible if you create branches for each author, cherry-pick the commits from each author into the right branch, then squash those changes. However, I don't think that will work if these commits meaningfully depend on each other.

If you have a series of commits:

            Author1                Author2                Author1
version1 ---commit---> version2 ---commit---> version3 ---commit--->...

If you were to try to extract the changes from Author2 and apply them to version1, there's a good chance the result won't make sense (for example, if Author2 modifies code that Author1 created).
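The branch-per-author idea described above could be sketched roughly as follows. This is an illustration under assumptions, not a tested recipe: the function name squash_per_author and the squashed/<email> branch names are made up here, merges are skipped, and the loop simply stops at the first conflict, which is exactly the situation the caveat above warns about.

```shell
#!/usr/bin/env bash
# Sketch only: squash_per_author and the "squashed/<email>" branch names are
# illustrative assumptions, not something from the answers above.
squash_per_author() {
    local base=$1 topic=$2 email name commits
    for email in $(git log --format='%ae' "$base".."$topic" | sort -u); do
        # This author's commits, oldest first. Merges are skipped because a
        # plain cherry-pick cannot replay them.
        commits=$(git log --reverse --no-merges -F --author="<$email>" \
                      --format='%H' "$base".."$topic")
        [ -n "$commits" ] || continue
        name=$(git log -1 -F --author="<$email>" --format='%an' "$base".."$topic")
        git checkout -q -b "squashed/$email" "$base" || return 1
        # Apply every change to the index without committing; this stops at
        # the first conflict, which is likely when authors touch the same code.
        git cherry-pick --no-commit $commits || return 1
        git commit -q --author="$name <$email>" \
            -m "Squashed commits by $email" || return 1
    done
}

# Usage (run in a clean working tree): squash_per_author master feature_a
```

Each resulting branch holds one commit on top of the base, so the branches can be reviewed and merged one author at a time.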

4
John Vandenberg On

I needed to do a similar rewrite on an unnecessarily large repository while the repo was offline. The approach I took was an automated 'interactive' rebase using GIT_SEQUENCE_EDITOR, which is covered in this answer by @james-foucar & @pfalcon.

For this to work well, I found it better to first remove the merges from the section of the history being rewritten. In my case, this was done using lots of git rebase --onto, which is covered amply in other questions on Stack Overflow.

I created a small script generate-similiar-commit-squashes.sh to generate the pick & squash commands so that consecutive similar commits would be squashed. I used author-date-and-shortlog to match similar commits, but you only need author (my gist has a comment about how to make it match only on author).

$ generate-similiar-commit-squashes.sh > /tmp/git-rebase-todo-list

The output looks like

...
pick aaff1c556004539a54a7a33ce2fb859af0c4238c [email protected]
squash aa190ea2323ece42f1cd212041bf61b94d751d5c [email protected]
pick aab8c98981a8d824d2bc0d5278d59bc1a22cc7b0 [email protected]_config.yml
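The gist itself is not reproduced here, so the following is only a sketch of what an author-only variant might look like. make_todo_by_author is a hypothetical name; it assumes a linear history and input lines of the form "<hash> <author-email>", oldest first.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not the gist from the answer): prints a rebase todo
# list in which each run of consecutive commits by the same author is
# squashed into the first commit of that run.
make_todo_by_author() {
    awk '{
        if (NR > 1 && $2 == prev) print "squash", $1
        else                      print "pick", $1
        prev = $2
    }'
}

# Possible use inside a repository with a single root commit:
#   root=$(git rev-list --max-parents=0 HEAD)
#   git log --reverse --format='%H %ae' "$root"..HEAD |
#       make_todo_by_author > /tmp/git-rebase-todo-list
```

Only consecutive commits are merged, so an author whose commits are interleaved with someone else's still ends up with several commits.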

The repository was also full of self-reverts with the same style 'Update xyz' commit messages. When squashed, they resulted in empty commits.

The commits I was merging had identical commit messages. git rebase -i offers a revised commit message with all squashed commit messages appended, which would have been repetitive. To address that, I used a small perl script from this answer to remove duplicate lines from the commit message offered by git rebase. It is easier to keep the script in a file, because it is invoked from a shell variable (GIT_EDITOR), where quoting an inline script would be awkward.

$ echo 'print if ! $x{$_}++' > /tmp/strip-seen-lines.pl

Now for the final step:

$ GIT_EDITOR='perl -i -n -f /tmp/strip-seen-lines.pl ' \
  GIT_SEQUENCE_EDITOR='cat /tmp/git-rebase-todo-list >' \
  git rebase --keep-empty -i $(git rev-list --max-parents=0 HEAD)
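The trailing > in GIT_SEQUENCE_EDITOR works because git hands the editor string to the shell and appends the path of the todo file as an argument, so the command becomes a redirect into that file. The sketch below emulates that call outside git; the sh -c line is an approximation of what git does, and the temporary files are stand-ins.

```shell
list=$(mktemp)   # stands in for /tmp/git-rebase-todo-list
todo=$(mktemp)   # stands in for the todo file git would pass to the editor
printf '%s\n' 'pick 1111111 first' 'squash 2222222 second' > "$list"
printf '%s\n' 'pick 1111111 first' 'pick 2222222 second' > "$todo"

editor="cat $list >"
# Approximation of git's invocation: the editor string becomes a shell
# command and the file path is appended as its last argument.
sh -c "$editor \"\$@\"" "$editor" "$todo"

cat "$todo"   # now holds the prepared pick/squash list
```

After the call, the todo file contains the prepared list instead of the one git generated, which is what lets the rebase run without opening an editor.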

Despite using --keep-empty, git complained a few times through this process about empty commits. It would dump me out to the console with an incomplete git rebase. To skip the empty commit and resume processing, the following two commands were needed (rather frequently in my case).

$ git reset HEAD^
$ GIT_EDITOR='perl -i -n -f /tmp/strip-seen-lines.pl ' git rebase --continue

Again, despite --keep-empty, I found I had no empty commits in the final git history, so the resets above had removed them all. I assume something is wrong with my git, version 2.14.1. Processing ~10000 commits like this took just over 10 minutes on a crappy laptop.