Retrieving languages and stargazers of GitHub repos

121 views Asked by At

I'm new to SQL and GitHubArchieve and trying to get the list of languages and stargazers of the popular repositories on GitHub.

The information I'm looking for are repo id, repo languages (languages + percentage), repo stargazers (and their timezones).

So, my solution to this was by retrieving the most popular repositories and then parsing the languages_url and stargazers_url to get their information.

I started with the following query to retrieve the popular repositories:

SELECT
  repo.name,
  repo.id,
  repo.url,
  payload,
  COUNT(*) watch_count,
  JSON_EXTRACT_SCALAR(payload, '$.watch.action.repository.languages_url') AS repo_languages_url 
FROM [githubarchive:month.201601],
WHERE type = 'WatchEvent'
GROUP BY 1,2,3,4,6
HAVING watch_count >= 2000 ORDER BY watch_count DESC
LIMIT 1000

The payload for the watch events was just: {"action":"started"} so I didn't get any information about the languages and the contributors of the repositories.

Any suggestions to get the information I'm looking for? Is there any way for retrieving that information without parsing?

Thanks in advance

0

There are 0 answers