How to obtain java repositories having maximum number of stars in GitHub-Archive

134 views Asked by At

I am currently trying to obtain the top 100 java repositories having maximum number of stars and less than 100 commits using GitHub Archive and BigQuery. Could you please help to come up with a query for obtaining the top 100 repositories having maximum number of stars.

The final query that I have obtained is:

SELECT repository_name
FROM [githubarchive:github.timeline]
WHERE repository_language = 'Java' 
AND PARSE_UTC_USEC(repository_created_at) BETWEEN PARSE_UTC_USEC('1996-01-01 00:00:00') AND PARSE_UTC_USEC('2015-05-30 00:00:00') 
GROUP BY repository_name
HAVING COUNT(*) < 100 
ORDER BY COUNT(*) DESC 
LIMIT 100
1

There are 1 answers

2
Matthew Wesley On

I think this query will work for you. Your existing query would not run because the ORDER BY clause references an aggregate calculation. ORDER BY requires the expression to refer to a field. Moving the COUNT into the SELECT clause fixes that part.

Furthermore, if you're looking for counts of git commits, you should actually check that the timeline event is a commit by adding AND payload_commit IS NOT NULL to the WHERE clause!

SELECT
  repository_name,
  COUNT(1) AS CommitCount
FROM
  [githubarchive:github.timeline]
WHERE
  repository_language = 'Java'
  AND PARSE_UTC_USEC(repository_created_at)
    BETWEEN PARSE_UTC_USEC('1996-01-01 00:00:00')
    AND PARSE_UTC_USEC('2015-05-30 00:00:00')
AND payload_commit IS NOT NULL
GROUP BY
  repository_name
HAVING
  CommitCount < 100
ORDER BY
  CommitCount DESC
LIMIT
  100