Finding the first commit on a branch with GitPython


I'm writing a git post-receive hook using Python and Git-Python that gathers information about the commits contained in a push, then updates our bug tracker and IM with a summary. I'm having trouble in the case where a push creates a branch (i.e. the fromrev parameter to post-receive is all zeroes) and also spans several commits on that branch. I'm walking the list of parents backwards from the torev commit, but I can't figure out how to tell which commit is the first one in the branch, i.e. when to stop looking.
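For reference, git feeds post-receive one line per updated ref on standard input, in the form `<old-value> <new-value> <ref-name>`. An all-zero old value means the ref was created, and an all-zero new value means it was deleted. Below is a minimal sketch of telling those cases apart. The helper names `is_zero` and `classify_update` are hypothetical. The zero check accepts any length so that it also covers the 64-character zero id used by SHA-256 repositories.

```python
def is_zero(sha):
    """True for the all-zero object name git uses for a missing ref value."""
    return sha.strip("0") == ""


def classify_update(line):
    """Split one post-receive input line and label the kind of ref update."""
    fromrev, torev, refname = line.split()
    if is_zero(fromrev):
        kind = "create"   # new branch: there is no old tip to stop at
    elif is_zero(torev):
        kind = "delete"   # branch removed: nothing new to report
    else:
        kind = "update"   # existing branch moved from fromrev to torev
    return kind, fromrev, torev, refname
```

In the hook this would be used as `for line in sys.stdin: kind, fromrev, torev, refname = classify_update(line)`.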

On the command line I can do

git rev-list this-branch ^not-that-branch ^master

which will give me exactly the list of commits in this-branch, and no others. I've tried to replicate this using the Commit.iter_parents method which is documented to take the same parameters as git-rev-list but it doesn't like positional parameters as far as I can see, and I can't find a set of keyword params that work.
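When only one other branch needs excluding, `Repo.iter_commits` may be enough on its own. It passes `rev` to `git rev-list` as a single positional argument and turns extra keyword arguments into options. So the range `master..this-branch` (rev-list shorthand for `this-branch ^master`), combined with `reverse=True`, should list the branch's commits oldest first. The sketch below assumes a GitPython `Repo` is passed in. `first_commit_on` is a hypothetical helper.

```python
def first_commit_on(repo, branch, base="master"):
    """Oldest commit on `branch` that is not reachable from `base`, or None."""
    # "base..branch" is rev-list shorthand for "branch ^base";
    # reverse=True becomes --reverse, so the oldest commit comes first.
    commits = repo.iter_commits("%s..%s" % (base, branch), reverse=True)
    return next(iter(commits), None)
```

For example, `first_commit_on(git.Repo('.'), 'this-branch')`. Excluding several branches at once, as in the command above, is easier through the raw `repo.git.rev_list(...)` wrapper, which forwards each positional argument unchanged.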

I read the doco for Dulwich but it wasn't clear whether it would do anything very differently from Git-Python.

My (simplified) code looks like this. When a push starts a new branch it currently only looks at the first commit and then stops:

import sys
import git

repo = git.Repo('.')
for line in sys.stdin:
    (fromrev, torev, refname) = line.rstrip().split(' ')
    commit = repo.commit(torev)
    maxdepth = 25    # just so we don't go too far back in the tree
    if fromrev == ('0' * 40):
        maxdepth = 1
    depth = 0
    while depth < maxdepth:
        if commit.hexsha == fromrev:
            # Reached the start of the push
            break
        print('{sha} by {name}: {msg}'.format(
            sha=commit.hexsha[:7], name=commit.author.name, msg=commit.summary))
        if not commit.parents:
            break    # reached a root commit
        commit = commit.parents[0]
        depth += 1
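GitPython's `repo.git` wrapper forwards positional arguments to the git command as-is, and the method name `rev_list` becomes `rev-list`. That means the command-line exclusion can be used directly in the hook instead of walking parents by hand. The sketch below assumes `repo` is a `git.Repo` and `refname` is the full `refs/heads/...` name from the hook input. `pushed_commits` and `report` are hypothetical helpers.

```python
ZERO = "0" * 40


def pushed_commits(repo, fromrev, torev, refname):
    """Return the SHAs a push added to `refname`, oldest first."""
    if fromrev == ZERO:
        # New branch: exclude everything reachable from the other local branches.
        others = [head.path for head in repo.heads if head.path != refname]
        output = repo.git.rev_list("--reverse", torev, "--not", *others)
    else:
        # Existing branch: only the commits between the old and new tips.
        output = repo.git.rev_list("--reverse", "%s..%s" % (fromrev, torev))
    return output.split()


def report(repo, lines):
    """Print one summary line per pushed commit."""
    for line in lines:
        fromrev, torev, refname = line.split()
        if torev == ZERO:
            continue  # branch deletion: nothing new to report
        for sha in pushed_commits(repo, fromrev, torev, refname):
            commit = repo.commit(sha)
            print("{sha} by {name}: {msg}".format(
                sha=commit.hexsha[:7], name=commit.author.name, msg=commit.summary))
```

In the hook this would be called as `report(git.Repo('.'), sys.stdin)`. Post-receive runs after all refs have been updated, so two branches created in the same push exclude each other, and commits they share are reported for neither.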


Answer by chlunde:

I just played around with dulwich; maybe there's a much better way to do this (with a builtin walker?). Assuming there's just one new branch (or multiple new branches with nothing in common):

#!/usr/bin/env python3
import sys
from dulwich.repo import Repo
from dulwich.objects import ZERO_SHA


def walk(repo, sha, shas, callback=None, depth=100):
    # Depth-first walk that stops at commits already in `shas`.
    if sha not in shas and depth > 0:
        shas.add(sha)

        if callback:
            callback(sha)

        for parent in repo[sha].parents:
            walk(repo, parent, shas, callback, depth - 1)


def reachable_from_other_branches(repo, this_branch):
    # Every commit reachable from any local branch except this_branch.
    shas = set()

    for branch in repo.refs.keys():
        if branch.startswith(b"refs/heads/") and branch != this_branch:
            walk(repo, repo.refs[branch], shas)

    return shas


def branch_commits(repo, fromrev, torev, branchname):
    if fromrev == ZERO_SHA:
        ends = reachable_from_other_branches(repo, branchname)
    else:
        ends = {fromrev}

    def print_callback(sha):
        commit = repo[sha]
        msg = commit.message.split(b"\n")[0]
        print('{sha} by {author}: {msg}'.format(
            sha=sha[:7].decode(),
            author=commit.author.decode("utf-8", "replace"),
            msg=msg.decode("utf-8", "replace")))

    print(branchname.decode())
    walk(repo, torev, ends, print_callback)


repo = Repo(".")
for line in sys.stdin:
    # dulwich uses bytes for SHAs and ref names, so encode the hook input.
    fromrev, torev, refname = line.rstrip().encode().split(b' ')
    branch_commits(repo, fromrev, torev, refname)
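dulwich does have a built-in walker. `Repo.get_walker()` forwards its keyword arguments to `dulwich.walk.Walker`, whose `exclude` list plays the role of `^ref` in rev-list and whose `reverse=True` yields the oldest commit first (buffering the whole result to do so). The sketch below could replace the recursive `walk` for the new-branch case. `new_branch_commits` is a hypothetical helper, and refs and SHAs are bytes, as in current dulwich.

```python
def new_branch_commits(repo, torev, branchname):
    """Commits reachable from `torev` but from no other local branch, oldest first."""
    other_heads = [
        repo.refs[ref]
        for ref in repo.refs.keys()
        if ref.startswith(b"refs/heads/") and ref != branchname
    ]
    # exclude= acts like ^ref in `git rev-list`; reverse=True puts the oldest first.
    walker = repo.get_walker(include=[torev], exclude=other_heads, reverse=True)
    return [entry.commit for entry in walker]
```

For example, with `repo = Repo(".")`, `new_branch_commits(repo, torev, refname)[0]` would be the first commit on the new branch, provided the push added at least one commit not found on other branches.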
Answer by jrial:

Using pure Git-Python, it can also be done. I have not found a set of kwargs that does it in one go either, but one can simply build a set of the master branch's SHAs, then use iter_commits on the branch being examined to find the first commit that doesn't appear in the parent branch:

from git import Repo

repo_path = '.'
repo = Repo(repo_path)
parent_branch = repo.branches.master
examine_branch = repo.branches.test_feature_branch

other_shas = set()
for parent_commit in repo.iter_commits(rev=parent_branch):
    other_shas.add(parent_commit.hexsha)

# By default iter_commits yields newest first, so the last match is the
# oldest commit on examine_branch that is not on parent_branch.
first_commit = None
for commit in repo.iter_commits(rev=examine_branch):
    if commit.hexsha not in other_shas:
        first_commit = commit

if first_commit is not None:
    print('%s by %s: %s' % (first_commit.hexsha[:7],
          first_commit.author.name, first_commit.summary))

And if you really want to be sure to exclude all commits on all other branches, you can wrap that first for-loop in another for-loop over repo.branches:

other_shas = set()
for branch in repo.branches:
    if branch != examine_branch:
        for commit in repo.iter_commits(rev=branch):
            other_shas.add(commit.hexsha)
  • Caveat 1: the 2nd approach shows the first commit that does not appear on any other branch, which is not necessarily the first commit on this branch. If feat_b was branched off feat_a, which itself came from master, then examining feat_a shows the first feat_a commit made after feat_b was branched off: feat_a's earlier commits are also on feat_b.
  • Caveat 2: git rev-list and both of these solutions only work as long as the branch isn't merged back into master yet. You're literally asking it to list all commits on this branch but not on the other.
  • Remark: the 2nd approach is overkill and takes a fair bit more time to complete. A better approach is to limit the other branches to a list of known merge branches, should you have more than just master.
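Both caveats can be checked on a toy history. The sketch below is a dependency-free model, not GitPython. `parents` maps each made-up commit ID to its parent IDs, and the hypothetical `first_unique` follows first parents from a tip until it reaches a commit reachable from one of the excluded tips.

```python
def reachable(parents, tips):
    """Every commit reachable from any of `tips` by following parent links."""
    seen, stack = set(), list(tips)
    while stack:
        sha = stack.pop()
        if sha not in seen:
            seen.add(sha)
            stack.extend(parents.get(sha, ()))
    return seen


def first_unique(parents, tip, exclude):
    """Oldest first-parent ancestor of `tip` not reachable from `exclude`, or None."""
    hidden = reachable(parents, exclude)
    first, sha = None, tip
    while sha is not None and sha not in hidden:
        first = sha
        sha = parents[sha][0] if parents[sha] else None
    return first


# Made-up history: master is A-B; feat_a adds C-D-E; feat_b forks at D and adds F;
# M is a later merge of feat_a (tip E) into master.
parents = {"A": [], "B": ["A"], "C": ["B"], "D": ["C"], "E": ["D"],
           "F": ["D"], "M": ["B", "E"]}

print(first_unique(parents, "E", ["B"]))       # C: excluding master only
print(first_unique(parents, "E", ["B", "F"]))  # E: Caveat 1, feat_b hides C and D
print(first_unique(parents, "E", ["M"]))       # None: Caveat 2, already merged
```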
Answer by jelmer:

Something like this will find the first (oldest) commit reachable from HEAD:

from dulwich.repo import Repo

x = Repo('.')
print(list(x.get_walker(include=[x.head()]))[-1].commit)

(Note that this uses O(n) memory for large repositories; use an iterator to get around that.)
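One way to take the last walker entry without building the list is to drain the iterator through a `collections.deque` with `maxlen=1`, which keeps only the most recent item. The helper `last_item` below is hypothetical. With dulwich it would be called as `last_item(x.get_walker(include=[x.head()]))`, and the result's `.commit` used as before.

```python
from collections import deque


def last_item(iterable, default=None):
    """Consume `iterable` and return its final element, holding at most one item."""
    tail = deque(iterable, maxlen=1)
    return tail[0] if tail else default
```

The walk itself still visits every commit, so this saves memory but not time.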