I am currently trying to obtain the top 100 java repositories having maximum number of stars and less than 100 commits using GitHub Archive and BigQuery. Could you please help to come up with a query for obtaining the top 100 repositories having maximum number of stars.
The final query that I have obtained is:
SELECT repository_name
FROM [githubarchive:github.timeline]
WHERE repository_language = 'Java'
AND PARSE_UTC_USEC(repository_created_at) BETWEEN PARSE_UTC_USEC('1996-01-01 00:00:00') AND PARSE_UTC_USEC('2015-05-30 00:00:00')
GROUP BY repository_name
HAVING COUNT(*) < 100
ORDER BY COUNT(*) DESC
LIMIT 100
I think this query will work for you. Your existing query would not run because the
ORDER BY
clause references an aggregate calculation.ORDER BY
requires the expression to refer to a field. Moving theCOUNT
into theSELECT
clause fixes that part.Furthermore, if you're looking for counts of git commits, you should actually check that the timeline event is a commit by adding
AND payload_commit IS NOT NULL
to theWHERE
clause!