I am currently trying to obtain the top 100 java repositories having maximum number of stars and less than 100 commits using GitHub Archive and BigQuery. Could you please help to come up with a query for this purpose.
The initial query I have written to obtain the top 100 records for Java language as follows:
SELECT repository_name
FROM [githubarchive:github.timeline]
WHERE repository_language = 'Java'
AND PARSE_UTC_USEC(repository_created_at) >= PARSE_UTC_USEC('1996-01-01 00:00:00')
AND PARSE_UTC_USEC(repository_created_at) < PARSE_UTC_USEC('2015-05-30 00:00:00')
GROUP BY repository_name
LIMIT 100
Add a
HAVING
clause to put a condition on the group:I don't know if the
100
of theLIMIT
is related to the100
maximum commit count. If so, it's not needed. If not (and you actually want to limit the rows to 100) then add an ORDER BY to chose which 100 rows you want (rather than 100 random ones).I also simplified the date range condition to a
BETWEEN
.Although the above is valid SQL (tested OK in MySQL, Postgres, SQLServer and Oracle), in case bigquery can't handle order by
COUNT(*)
(as reported by OP), use a subquery: