Is it possible to remotely access and parse Git revision history?


I have a use case where I need to inspect Git repositories as part of a web service. The average repo will be very large (1 GB or more) because the repos hold video game projects. I only need simple operations, such as listing the revision history.

Right now I'm implementing it via API calls to the remote Git hosting services (GitHub, Bitbucket, etc.). This works okay, but there are some great Git projects, like GitVersion, that use LibGit2Sharp and only work with real Git repos, and I cannot easily write a workaround for them.

I feel like this is a long shot, but I was wondering whether anyone has discussed or begun work on an implementation of LibGit2Sharp that works with the major Git hosts via their APIs. Obviously not every action available in libgit2 will work through an API interface, but at least most read-only actions should.

If this is an entirely new feature request, I'd like to get the opinion of someone with knowledge of the LibGit2Sharp codebase about how difficult it would be to implement.

Best answer, by Carlos Martín Nieto:

Git only specifies a network protocol for fetching, pushing, and creating an archive. Nothing else can be done via the Git protocol, and providers will likely disable the archive service so they can rely on their existing caching solutions instead.
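As an illustration of how little is available without fetching: about the only thing you can read remotely is the ref advertisement that opens a fetch, which `git ls-remote` prints as one `<object id><TAB><ref name>` pair per line. Below is a minimal sketch, assuming a `git` executable on the PATH; the function names are made up for this example. It gives you branch and tag tips, but no history behind them.

```python
import subprocess
from typing import Dict


def parse_ls_remote(output: str) -> Dict[str, str]:
    """Map each advertised ref name to the object ID it points at."""
    refs: Dict[str, str] = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        # Each line is "<object id>\t<ref name>".
        object_id, _, ref_name = line.partition("\t")
        refs[ref_name.strip()] = object_id.strip()
    return refs


def list_remote_refs(url: str) -> Dict[str, str]:
    """Ask the remote for its refs; no commits, trees or blobs are downloaded."""
    result = subprocess.run(
        ["git", "ls-remote", url],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_ls_remote(result.stdout)
```

Anything beyond these tips (parents, messages, diffs) requires the objects themselves, which means fetching.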

"If this is an entirely new feature request, I'd like to get the opinion of someone with knowledge of the LibGit2Sharp codebase about how difficult it would be to implement."

This feature would be out of scope, and impossible besides, since Git provides no way to perform these tasks against a remote.

Once you stop speaking Git, you leave the Git world and enter each provider's API. Replicating Git operations and git commands on top of each provider's API is a whole project unto itself, and one that is likely to run you into those providers' API limits, since in-depth analysis of repositories is generally not why they offer these services.

Not to mention that looking up each necessary object over HTTP would be extremely slow, and you would likely gain nothing over simply grabbing a gigabyte or two from the network.
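A back-of-the-envelope comparison makes the point concrete. Every number below is an assumption chosen purely for illustration (object count, round-trip time, bandwidth); plug in your own measurements.

```python
def per_object_seconds(object_count: int, round_trip_s: float) -> float:
    """Total time if every object costs one sequential HTTP round trip."""
    return object_count * round_trip_s


def bulk_download_seconds(size_bytes: int, bytes_per_second: float) -> float:
    """Total time to pull the whole repository in one transfer."""
    return size_bytes / bytes_per_second


# Assumed: 100,000 commit/tree lookups at 100 ms each.
lookup_time = per_object_seconds(100_000, 0.1)

# Assumed: a 1 GB repository over a 100 Mbit/s link (12.5 MB/s).
clone_time = bulk_download_seconds(10**9, 12.5e6)

print(f"per-object lookups: {lookup_time / 3600:.1f} h")
print(f"bulk download:      {clone_time:.0f} s")
```

Under these assumptions the per-object approach takes hours while the bulk download takes on the order of a minute or two, and that is before any API rate limit is applied.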

But if all you need are answers to a few questions that the APIs themselves can easily provide (say, the latest commit and its relationship to different branches), and you do need the logic in GitVersion, then you're probably better off making its history analysis pluggable so that you can feed in the data from your API lookups.
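What "pluggable" could look like is sketched here, with the caveat that this is not GitVersion's actual interface or algorithm. `HistorySource`, `CommitInfo`, `InMemoryHistory` and `commits_since_latest_tag` are hypothetical names, and the tag-distance calculation is only a stand-in for whatever analysis you need. The idea is that the analysis asks a narrow interface for refs and parent links, so the data can come from a local repository or from API lookups.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Protocol, Tuple


@dataclass(frozen=True)
class CommitInfo:
    sha: str
    parents: Tuple[str, ...]  # first entry is the first parent


class HistorySource(Protocol):
    """Hypothetical seam: the only questions the analysis may ask."""

    def head(self, branch: str) -> str: ...
    def commit(self, sha: str) -> CommitInfo: ...
    def tags(self) -> Dict[str, str]: ...  # tag name -> commit sha


class InMemoryHistory:
    """Stand-in for an API-backed source, filled with data fetched earlier."""

    def __init__(self, heads: Dict[str, str],
                 commits: Dict[str, CommitInfo],
                 tags: Dict[str, str]) -> None:
        self._heads, self._commits, self._tags = heads, commits, tags

    def head(self, branch: str) -> str:
        return self._heads[branch]

    def commit(self, sha: str) -> CommitInfo:
        return self._commits[sha]

    def tags(self) -> Dict[str, str]:
        return dict(self._tags)


def commits_since_latest_tag(source: HistorySource,
                             branch: str) -> Optional[Tuple[str, int]]:
    """Follow first parents from the branch tip to the nearest tagged commit."""
    tagged = {sha: name for name, sha in source.tags().items()}
    sha: Optional[str] = source.head(branch)
    distance = 0
    while sha is not None:
        if sha in tagged:
            return tagged[sha], distance
        parents = source.commit(sha).parents
        sha = parents[0] if parents else None
        distance += 1
    return None  # no tag reachable along the first-parent chain
```

An API-backed class would implement the same three methods with calls to the provider's REST endpoints, ideally caching results so that the walk does not burn through rate limits.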

I'm not familiar with how GitVersion makes its decisions. If it needs more than the references, their relationships to each other, and the tags, and instead wants to look at the repositories themselves, and you do need GitVersion rather than a replica of some of its logic, then I would recommend downloading the repositories and performing all the analysis there. Renting a bit of disk space from some provider is a much more efficient use of time than trying to fit each provider's API into some idealised version of a git command, where you still have to figure out the edge cases of both the command and the API you're using.
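For the download-and-analyse route, a minimal sketch follows, assuming a `git` executable on the PATH; the helper names are invented for this example. It makes a bare clone (no working tree) and reads the revision history with `git log`. The optional `--filter=blob:none` flag requests a partial clone that skips file contents, which can help with asset-heavy game repositories, but it only works if the host supports partial clone, and tools that need file contents will still trigger downloads later.

```python
import subprocess
from typing import Dict, List

FIELD_SEP = "\x1f"  # ASCII unit separator, emitted by git for %x1f
# hash, parent hashes, author name, author date (strict ISO 8601), subject
LOG_FORMAT = "%H%x1f%P%x1f%an%x1f%aI%x1f%s"


def clone_bare(url: str, dest: str, blobless: bool = False) -> None:
    """Clone without a working tree; optionally skip blobs (partial clone)."""
    cmd = ["git", "clone", "--bare"]
    if blobless:
        cmd.append("--filter=blob:none")  # assumption: the host supports it
    subprocess.run(cmd + [url, dest], check=True)


def parse_log(output: str) -> List[Dict[str, object]]:
    """Turn `git log --format=LOG_FORMAT` output into a list of dicts."""
    commits: List[Dict[str, object]] = []
    for line in output.splitlines():
        if not line:
            continue
        sha, parents, author, date, subject = line.split(FIELD_SEP, 4)
        commits.append({
            "sha": sha,
            "parents": parents.split(),  # empty list for a root commit
            "author": author,
            "date": date,
            "subject": subject,
        })
    return commits


def list_history(repo_dir: str, rev: str = "HEAD") -> List[Dict[str, object]]:
    """List the commits reachable from `rev`, newest first."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "log", f"--format={LOG_FORMAT}", rev],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_log(result.stdout)
```

Once the clone exists, later runs only need `git fetch` inside it, so the large initial transfer is paid once per repository.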