I want to query the latest remote HEAD commit for 10000+ https git repositories, every hour. Basically this x10000:
remote=https://github.com/torvalds/linux
git ls-remote $remote HEAD | awk '{print $1}'
Most, but not all, remotes are on a single server (github.com). I do not want to use the GitHub API, because some repos are not on GitHub, and because the API has limits.
Hence I want to use the git https remote protocol, but I would prefer to implement this with (lib)curl instead of git to get more control over the https settings, and hopefully do requests in parallel over the same connection.
Where can I find more information about what http request git ls-remote is making under the hood (using the "smart" git protocol), such that I can perform the same call with libcurl?
I had a look at the HTTP transfer protocols spec and the docs on Git-Internals-Transfer-Protocols, but these are very generic and don't go into the details of ls-remote.
I think the Discover references section of the http protocol documentation is what you want.
If you're interacting with GitHub, you need to use the "smart" protocol, because:
So, following the documentation, we need to run:
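As a rough illustration of that request, here is a minimal Python sketch using only the standard library. The spec defines ref discovery as a GET on `$GIT_URL/info/refs?service=git-upload-pack`, and a smart server replies with the `application/x-git-upload-pack-advertisement` media type. The function names `info_refs_url` and `fetch_info_refs` are made up for this example, and the sketch assumes the server answers in the original (version 0) advertisement format, which is what you should get when no `Git-Protocol` header is sent. It has not been tuned for any particular host.

```python
import urllib.request


def info_refs_url(remote: str) -> str:
    """Build the smart-HTTP ref discovery URL for a remote (hypothetical helper)."""
    return remote.rstrip("/") + "/info/refs?service=git-upload-pack"


def fetch_info_refs(remote: str, timeout: float = 10.0) -> bytes:
    """GET the ref advertisement and return the raw response body."""
    request = urllib.request.Request(info_refs_url(remote))
    with urllib.request.urlopen(request, timeout=timeout) as response:
        content_type = response.headers.get("Content-Type", "")
        # A smart server answers with this media type; anything else suggests
        # a dumb server or an error page.
        if not content_type.startswith("application/x-git-upload-pack-advertisement"):
            raise ValueError(f"unexpected Content-Type: {content_type!r}")
        return response.read()
```

With libcurl the same thing is a plain GET on the URL returned by `info_refs_url`, so it should fit the multi interface if you want many transfers sharing connections.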
This produces binary output, which curl will by default not display on your terminal. If we dump it to a file (-o refs.txt) and then inspect the file, we see we have almost exactly the output of git ls-remote.
Compare:
And:
There's some protocol data there you would need to decode based on the documentation, but otherwise this provides you with the same list of references as git ls-remote.
Most servers should support the "smart" protocol.
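The protocol data in that response is framed as pkt-lines: each record starts with a four-digit hex length that counts the length field itself, and `0000` is a flush packet. Below is a minimal Python sketch of the decoding, assuming the version 0 advertisement layout from the spec (a `# service=git-upload-pack` line, a flush, then one `<object-id> <refname>` record per ref, with the capability list after a NUL byte on the first ref). The names `iter_pkt_lines` and `parse_ref_advertisement` are hypothetical, and edge cases such as empty repositories are not handled.

```python
from typing import Dict, Iterator, Optional


def iter_pkt_lines(data: bytes) -> Iterator[Optional[bytes]]:
    """Yield each pkt-line payload; a flush packet ("0000") yields None."""
    pos = 0
    while pos < len(data):
        # The 4-byte hex prefix is the total record length, prefix included.
        length = int(data[pos:pos + 4], 16)
        if length == 0:
            yield None
            pos += 4
            continue
        if length < 4 or pos + length > len(data):
            raise ValueError("malformed pkt-line")
        yield data[pos + 4:pos + length]
        pos += length


def parse_ref_advertisement(data: bytes) -> Dict[str, str]:
    """Map ref names (including HEAD) to object ids."""
    refs: Dict[str, str] = {}
    for payload in iter_pkt_lines(data):
        # Skip flush packets and the leading service announcement.
        if payload is None or payload.startswith(b"# service="):
            continue
        line = payload.rstrip(b"\n")
        # The first ref carries the capability list after a NUL byte; drop it.
        line = line.split(b"\0", 1)[0]
        object_id, _, name = line.partition(b" ")
        refs[name.decode()] = object_id.decode()
    return refs
```

Given the bytes saved in refs.txt, `parse_ref_advertisement(data)["HEAD"]` would then correspond to the first column of `git ls-remote $remote HEAD`.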