I am using the R package git2r to interface with libgit2. I would like to obtain the list of files that were updated in each commit, similar to the output from git log --stat or git log --name-only. However, I am unable to obtain the files that were included in the initial commit. Below I provide code to set up an example Git repository as well as my attempted solutions based on my research.
Reproducible example
The code below creates a temporary directory in /tmp, creates empty text files, and then commits each file separately.
# Create example Git repo
path <- tempfile("so-git2r-ex-")
dir.create(path)
setwd(path)
# Set the number of fake files
n_files <- 3
file.create(paste0("file", 1:n_files, ".txt"))
library("git2r")
repo <- init(".")
for (i in 1:n_files) {
  add(repo, sprintf("file%d.txt", i))
  commit(repo, sprintf("Added file %d", i))
}
Option 1 - compare diff of two trees
This SO post recommends you perform a diff comparing the tree object of the desired commit and its parent commit. This works well, except for the initial commit because there is no parent commit to compare it to.
get_files_from_diff <- function(c1, c2) {
  # Obtain files updated in commit c1.
  # c2 is the commit that preceded c1.
  # diff() takes the old tree first, so the parent's tree comes first.
  git_diff <- diff(tree(c2), tree(c1))
  files <- vapply(git_diff@files, function(x) x@new_file, character(1))
  return(files)
}
log <- commits(repo)
n <- length(log)
for (i in 1:n) {
  print(i)
  if (i == n) {
    print("Unclear how to obtain list of files from initial commit.")
  } else {
    files <- get_files_from_diff(log[[i]], log[[i + 1]])
    print(files)
  }
}
Option 2 - Parse commit summary
This SO post suggests obtaining commit information, such as the files changed, by parsing the commit summary. This gives output very similar to git log --stat, but again the exception is the initial commit: it lists no files. Looking at the source code, the files in the commit summary are obtained via the same method as above, which explains why no files are displayed for the initial commit (it has no parent commit).
for (i in 1:n) {
  summary(log[[i]])
}
Update
This should be possible. The Git command diff-tree has a flag --root to compare the root commit to a NULL tree (source). From the man page:
--root When --root is specified the initial commit will be shown as a big creation event. This is equivalent to a diff against the NULL tree.
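For reference, the --root behaviour quoted above can be seen from R by shelling out to the git command-line client instead of going through git2r. This is only a sketch under assumptions: it requires git to be on the PATH, and files_in_commit_cli is a hypothetical helper name, not a git2r function.

```r
# Hypothetical helper (not part of git2r): list the files touched by a commit
# by calling the git command-line client.
# Assumption: the `git` executable is installed and on the PATH.
files_in_commit_cli <- function(repo_path, sha) {
  args <- c("-C", shQuote(repo_path),   # run git inside the repository
            "diff-tree",
            "--root",                   # diff a root commit against the NULL tree
            "--no-commit-id",           # do not print the commit id line
            "--name-only",              # print file names only
            "-r",                       # recurse into subdirectories
            sha)
  # stdout = TRUE captures the output as a character vector, one file per line
  system2("git", args, stdout = TRUE)
}
```

Because --root is passed, the same call works for the initial commit and for later commits, for example files_in_commit_cli(path, sha) with the SHA of any commit in the example repository.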
Furthermore, the libgit2 library has the function git_diff_tree_to_tree, which accepts a NULL tree. Unfortunately, it is unclear to me if it is possible to pass a NULL tree to the git2r C function git2r_diff via the git2r diff method for git-tree objects. Is there a way to create a NULL tree object with git2r?
> tree()
Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function ‘tree’ for signature ‘"missing"’
> tree(NULL)
Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function ‘tree’ for signature ‘"NULL"’
I came up with a solution based on the insight from my colleague that you can obtain the files currently being tracked by inspecting the git_tree object. This shows all the files that have been tracked up to this point, but since the root commit is the first commit, these files must have been added in that commit.
The summary method prints the files, and the underlying data frame can be captured using the as method.
The function below obtains the files from the root commit. While it is not apparent in this small example, the main complication is that subdirectories are represented as trees, so you need to recursively search the tree to obtain all the filenames.
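One way the recursive walk described above could look is sketched here. It is a reconstruction under assumptions, not necessarily the original function: tree_to_df, list_tree_files, and files_in_root_commit are hypothetical helper names, the entry data frame is assumed to have type, sha, and name columns, and the conversion is wrapped so that it uses as(x, "data.frame") for S4-style git2r objects and as.data.frame(x) otherwise.

```r
library("git2r")

# Hypothetical helper: convert a git_tree to a data frame of its entries.
# Assumption: S4-era git2r supports as(x, "data.frame"), while S3-era
# releases provide as.data.frame() instead.
tree_to_df <- function(x) {
  if (isS4(x)) as(x, "data.frame") else as.data.frame(x)
}

# Hypothetical helper: recursively collect the paths of all blobs (files)
# reachable from a git_tree, descending into subdirectories (trees).
list_tree_files <- function(repo, tree_obj, prefix = "") {
  entries <- tree_to_df(tree_obj)
  files <- character()
  for (i in seq_len(nrow(entries))) {
    type <- as.character(entries$type[i])
    path <- paste0(prefix, as.character(entries$name[i]))
    if (type == "blob") {
      # A blob is a file: record its full path.
      files <- c(files, path)
    } else if (type == "tree") {
      # A tree is a subdirectory: look it up and recurse with the new prefix.
      subtree <- lookup(repo, as.character(entries$sha[i]))
      files <- c(files, list_tree_files(repo, subtree, paste0(path, "/")))
    }
    # Other entry types (e.g. submodule commits) are skipped.
  }
  files
}

# Files in a root commit: everything in its tree was added by that commit.
files_in_root_commit <- function(repo, commit) {
  stopifnot(length(parents(commit)) == 0)  # a root commit has no parents
  list_tree_files(repo, tree(commit))
}
```

With the example repository from the question, files_in_root_commit(repo, log[[n]]) would be the call for the initial commit, since commits() returns the most recent commit first.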