The GitHub Archive project states:
GitHub Archive is a project to record the public GitHub timeline, archive it, and make it easily accessible for further analysis.
This archive is also queryable through Google BigQuery. However, it looks like either I'm missing something or only a portion of the data is available.
Indeed, the following query returns only 1636 WatchEvents (started or stopped), whereas the Rails repository has more than 14300 watchers.
SELECT actor_attributes_login, created_at, payload_action
FROM [githubarchive:github.timeline]
WHERE repository_name = "rails"
  AND type = "WatchEvent"
ORDER BY created_at ASC;
It looks like the oldest retrieved piece of data is more or less 2.5 months old.
Is the data truncated (which would seem strange for an archive)? Or is there a BigQuery limit or quota that I'm not aware of?
That's correct. The project/crawler went live on March 11th of this year, hence the current archive starts on that day. There is a note about this on the githubarchive.org page, but I guess I should make it more visible and explicit.
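To see the boundaries of what is archived for a repository, an aggregate over the same table and filters as the query in the question should do. This is only a sketch: the aliases earliest_event, latest_event and watch_events are made up for illustration, and it assumes that created_at values sort chronologically, so that MIN and MAX give the first and last recorded events.

```sql
-- Sketch only: summarizes the archived WatchEvents for repositories named "rails".
-- Assumes created_at sorts chronologically; the aliases are illustrative names.
SELECT
  MIN(created_at) AS earliest_event,
  MAX(created_at) AS latest_event,
  COUNT(*) AS watch_events
FROM [githubarchive:github.timeline]
WHERE repository_name = "rails"
  AND type = "WatchEvent";
```

If earliest_event falls on or shortly after the day the crawler went live, the low count comes from the archive's start date rather than from a BigQuery limit.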
There is a thread with the GitHub team about making more of their history available, but I don't have an ETA for it yet. Fingers crossed :-)