I'm working in a multi-project Gradle build implemented using Spring Boot and Java, and version-controlled using Git. One of the subprojects depends on a particular library (let's call it com.example:mylib), declared in that subproject's build.gradle file. This library was added a while back, and I would like to find the commit that added it so I can get some context as to why it was added and whether it's still needed. There are no imports of the library in my code, but since Spring Boot's auto-configuration works in part based on which libraries are on the classpath, the absence of direct usage within my code is not sufficient to determine whether or how the library is used.

To add a wrinkle to this scenario, the project has been restructured several times since it started: subprojects have been renamed, moved into child projects, and so on. The build files themselves have been restructured and reordered as well. Additionally, the Gradle files have switched from Groovy (build.gradle) to Kotlin (build.gradle.kts) and then back again.

build.gradle

// ...
dependencies {
    // ...
    compile 'com.example:mylib'
    // ...
}
// ...

build.gradle.kts

// ...
dependencies {
    // ...
    implementation("com.example:mylib:1.0.5")
    // ...
}
// ...

I know of a few manual ways to trace back and find the change. For instance, I could use git blame and work my way back through the changes to the line, manually switching files as the renames and restructurings happen. But these approaches are time-consuming and tedious when there have been a lot of changes.

How can I quickly and easily find the commit that added the library?

There are 2 answers

M. Justin On

git bisect can be used to quickly find the change in an automated fashion.

Given that develop has the library dependency, and abc123 is a commit before the library dependency was added, the following will find the commit that added the dependency:

git bisect start
git bisect new develop
git bisect old abc123
echo "! grep -rI --include \*.gradle\* mylib ." > /tmp/bisect.sh 
chmod u+x /tmp/bisect.sh 
git bisect run /tmp/bisect.sh
git bisect reset

git bisect start starts a new bisect session. We then need to identify at least one old and one new commit, that is to say, one commit before the change ("old") and one commit after the change ("new"). git bisect new develop specifies develop as a "new" commit, and git bisect old abc123 specifies abc123 as an "old" commit. (Note that the standard terminology is bad/good, not new/old, but as this is more of a change than a breakage, I'm choosing to use the alternate terms "old" and "new" instead of the standard terms.)

Git will then employ a binary search algorithm, checking out a commit roughly halfway between the old and new commits.

output:

Bisecting: 818 revisions left to test after this (roughly 10 steps)
[1234567890abcdef1234567890abcdef12345678] Committed some stuff
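
The "roughly 10 steps" figure is what you'd expect from halving the candidate range on every round. As a back-of-the-envelope check (a simple repeated-halving count, which I'm assuming is close to, but not necessarily identical to, Git's own estimate), using the 818 revisions from the output above:

```shell
# Count how many halvings it takes to narrow 818 candidates down to one.
# This is a rough estimate for illustration, not Git's internal calculation.
remaining=818
steps=0
while [ "$remaining" -gt 1 ]; do
    remaining=$(( (remaining + 1) / 2 ))   # keep the larger half, rounding up
    steps=$(( steps + 1 ))
done
echo "about $steps steps"
```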

At this point, you could manually identify this commit as old or new (git bisect old/git bisect new), with Git binary searching its way to the commit that introduces the change. However, we can automate this process by having Git check each candidate commit with a script by using git bisect run <scriptname.sh>. If the script exits with code 0, the commit is identified as "old"; if it exits with a code between 1 and 127 (other than 125), it's "new". Exit code 125 tells Git to skip a commit that cannot be tested.
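
To make the exit code rules concrete, here is a small sketch that maps an exit code to the verdict git bisect run would draw from it. The verdict_for_exit_code function is a hypothetical helper made up for this illustration; it is not part of Git.

```shell
# Map a script's exit code to the verdict git bisect run draws from it.
# verdict_for_exit_code is a hypothetical helper, for illustration only.
verdict_for_exit_code() {
    code=$1
    if [ "$code" -eq 0 ]; then
        echo "old"      # the change is not present yet
    elif [ "$code" -eq 125 ]; then
        echo "skip"     # the commit cannot be tested
    elif [ "$code" -ge 1 ] && [ "$code" -le 127 ]; then
        echo "new"      # the change is present
    else
        echo "abort"    # any other code stops the bisect run
    fi
}

for code in 0 1 125 127 128; do
    echo "exit $code -> $(verdict_for_exit_code "$code")"
done
```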

For this case, we can use the command grep -rI --include \*.gradle\* mylib . (the trailing . is the directory to search). This uses grep to recursively search non-binary files whose names match the pattern *.gradle* for the term "mylib". This will match build.gradle and build.gradle.kts, but skip many other files that we don't care about (thus speeding up the process).
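
If you want to convince yourself of what the --include filter does before running the bisect, something like the following can be tried in a scratch directory. The directory layout and file names here are made up for the demonstration, and -l is added so that only the matching file names are printed.

```shell
# Scratch directory with a Groovy build file, a Kotlin build file,
# and a non-Gradle file that also happens to mention mylib.
demo=$(mktemp -d)
mkdir -p "$demo/sub1" "$demo/sub2/src"
printf "dependencies {\n    compile 'com.example:mylib'\n}\n" > "$demo/sub1/build.gradle"
printf 'dependencies {\n    implementation("com.example:mylib:1.0.5")\n}\n' > "$demo/sub2/build.gradle.kts"
printf '// mentions mylib, but is not a Gradle file\n' > "$demo/sub2/src/Notes.java"

# -r: recurse, -I: ignore binary files, -l: print only matching file names.
matches=$(cd "$demo" && grep -rIl --include='*.gradle*' mylib . | sort)
echo "$matches"
```

Both build files should be listed, while Notes.java is filtered out by the file name pattern even though it contains the search term.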

However, grep's exit code is the opposite of what we want. It exits with 0 (which git bisect run reads as "old") on a match, but a match means the dependency is present, so we want a non-zero ("new") exit code instead. The exit code can be inverted by prefixing the command with the ! reserved word, giving ! grep -rI --include \*.gradle\* mylib . as the full command.

This command is written to the /tmp/bisect.sh script, which is then made executable:

echo "! grep -rI --include \*.gradle\* mylib ." > /tmp/bisect.sh 
chmod u+x /tmp/bisect.sh 
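
If the one-line negation feels too terse, the same check could be spelled out with explicit exit codes. This is only a sketch of an equivalent script: the temporary paths come from mktemp, and the two throwaway directories merely stand in for checked-out commits so the script can be tried outside a bisect session.

```shell
# Write a more explicit variant of the test script (path chosen by mktemp).
script=$(mktemp)
cat > "$script" <<'EOF'
#!/bin/sh
# Exit 1 ("new") if any Gradle build file mentions mylib, else 0 ("old").
if grep -rIq --include='*.gradle*' mylib .; then
    exit 1
fi
exit 0
EOF
chmod u+x "$script"   # needed when git bisect run invokes the script by path

# Two throwaway directories stand in for checked-out commits.
with_dep=$(mktemp -d)
without_dep=$(mktemp -d)
echo "compile 'com.example:mylib'" > "$with_dep/build.gradle"
echo "// no dependencies yet" > "$without_dep/build.gradle"

status_with=0
(cd "$with_dep" && sh "$script") || status_with=$?
status_without=0
(cd "$without_dep" && sh "$script") || status_without=$?
echo "with dependency: exit $status_with; without: exit $status_without"
```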

Now that we have the script, we pass it to git bisect run to get the commit where the dependency was added:

git bisect run /tmp/bisect.sh

output:

abcdefg1234567890abcdefg1234567890abcdef is the first new commit
commit abcdefg1234567890abcdefg1234567890abcdef
Author: John Doe <[email protected]>
Date:   Tue May 26 17:37:25 2020 -0400

    adding a new library for fun and profit
    
    1. add the library
    2. ???
    3. profit

 subproject1/build.gradle                           |   6 +-
 .../nested/subpackage/MyAwesomeClass.java          |  37 -------
 .../nested/subpackage/LessAwesomeClass.java        | 118 +++++++++++++++++++++
 .../nested/BoilerplateJunk.java                    |  32 ++++--
 subproject1/src/main/resources/application.yaml    |  12 +++
 .../another/subpackage/Whatever.java               |  28 +++++
 6 files changed, 189 insertions(+), 44 deletions(-)

Finally, we call git bisect reset to end the bisect session and change back to the branch we were on before we started the bisect session.
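
To try the whole workflow without touching a real project, the steps can be replayed against a throwaway repository. Everything below (the repository contents, the commit messages, the commit helper function, the demo identity) is invented for the demonstration. The check is passed inline via sh -c rather than through a script file, and the result is read from refs/bisect/new, which is where Git records the first "new" commit when the old/new terms are in use.

```shell
# Build a throwaway repository with a known "add mylib" commit.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
commit() {
    git add -A
    git -c user.name=demo -c user.email=demo@example.invalid commit -q -m "$1"
}

echo "// build file" > build.gradle
commit "initial build file"
old_commit=$(git rev-parse HEAD)

for i in 1 2 3; do
    echo "// unrelated change $i" >> build.gradle
    commit "unrelated change $i"
done

echo "dependencies { compile 'com.example:mylib' }" >> build.gradle
commit "add mylib"
expected=$(git rev-parse HEAD)

for i in 4 5; do
    echo "// unrelated change $i" >> build.gradle
    commit "unrelated change $i"
done

# Same bisect steps as in the answer, with the check passed inline.
git bisect start >/dev/null
git bisect new HEAD >/dev/null
git bisect old "$old_commit" >/dev/null
git bisect run sh -c '! grep -rIq --include="*.gradle*" mylib .' >/dev/null
found=$(git rev-parse refs/bisect/new)
git bisect reset >/dev/null 2>&1

echo "expected: $expected"
echo "found:    $found"
```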

M. Justin On

git log -S displays only the revisions in which the number of occurrences of a given text string changes. The last item in the list will be the revision in which the text string was first added. In the case of the library, this will be the commit in which the dependency was first added (assuming the build file is the only location the dependency has ever appeared).

git log -S mylib

The git log documentation describes this option as follows:

-S<string>

Look for differences that change the number of occurrences of the specified string (i.e. addition/deletion) in a file. Intended for the scripter’s use.

It is useful when you’re looking for an exact block of code (like a struct), and want to know the history of that block since it first came into being: use the feature iteratively to feed the interesting block in the preimage back into -S, and keep going until you get the very first version of the block.

Binary files are searched as well.
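
Here is a small sketch of that behaviour in a throwaway repository (the file contents, commit messages and commit helper are made up for the demonstration). A commit that merely reorders the build file leaves the number of occurrences unchanged, so it should not be listed. The search is also case-sensitive, so the string has to match the spelling used in the build file.

```shell
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
commit() {
    git add -A
    git -c user.name=demo -c user.email=demo@example.invalid commit -q -m "$1"
}

printf 'plugins {}\ndependencies {\n}\n' > build.gradle
commit "initial build file"

printf "plugins {}\ndependencies {\n    compile 'com.example:mylib'\n}\n" > build.gradle
commit "add mylib"
added=$(git rev-parse HEAD)

# Reordering the file keeps the number of occurrences of mylib at one.
printf "dependencies {\n    compile 'com.example:mylib'\n}\nplugins {}\n" > build.gradle
commit "reorder build file"

listed=$(git log -S mylib --format=%H)        # only the "add mylib" commit
wrong_case=$(git log -S myLib --format=%H)    # different case: no commits
echo "listed: $listed"
echo "wrong case: ${wrong_case:-<none>}"
```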

If you want to output just the revision in which the dependency was first added, this can be done with some command piping. One way to do this is:

git log -S mylib --pretty=format:"%H" | tail -1 | xargs git log -1 --stat

output:

commit abcdefg1234567890abcdefg1234567890abcdef
Author: John Doe <[email protected]>
Date:   Tue May 26 17:37:25 2020 -0400

    adding a new library for fun and profit
    
    1. add the library
    2. ???
    3. profit

 subproject1/build.gradle                           |   6 +-
 .../nested/subpackage/MyAwesomeClass.java          |  37 -------
 .../nested/subpackage/LessAwesomeClass.java        | 118 +++++++++++++++++++++
 .../nested/BoilerplateJunk.java                    |  32 ++++--
 subproject1/src/main/resources/application.yaml    |  12 +++
 .../another/subpackage/Whatever.java               |  28 +++++
 6 files changed, 189 insertions(+), 44 deletions(-)
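
The reason for taking the last line is that git log -S lists every commit that changes the number of occurrences, newest first, so removals and re-additions show up as well. A sketch in a throwaway repository (contents, commit messages and the commit helper are invented for the demonstration), where the dependency is added, removed and then added again:

```shell
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
commit() {
    git add -A
    git -c user.name=demo -c user.email=demo@example.invalid commit -q -m "$1"
}

echo "// build file" > build.gradle
commit "initial build file"

echo "compile 'com.example:mylib'" >> build.gradle
commit "add mylib"
first_added=$(git rev-parse HEAD)

echo "// build file" > build.gradle
commit "remove mylib"

echo "compile 'com.example:mylib'" >> build.gradle
commit "re-add mylib"

# Three commits change the number of occurrences; the oldest is listed last.
count=$(git log -S mylib --format=%H | wc -l | tr -d ' ')
first=$(git log -S mylib --pretty=format:"%H" | tail -n 1)
echo "$count commits touch the string; first added in $first"
git log -1 --stat "$first"
```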