I'm having issues getting data from GitHub Archive.
The main issue is encoding {} and .. in my URL. Maybe I am misreading the GitHub API or not understanding encoding correctly.
require 'open-uri'
require 'faraday'
conn = Faraday.new(:url => 'http://data.githubarchive.org/') do |faraday|
  faraday.request :url_encoded             # form-encode POST params
  faraday.response :logger                 # log requests to STDOUT
  faraday.adapter Faraday.default_adapter  # make requests with Net::HTTP
end
#query = '2015-01-01-15.json.gz' #this one works!!
query = '2015-01-01-{0..23}.json.gz' #this one doesn't work
encoded_query = URI.encode(query)
response = conn.get(encoded_query)
p response.body
The GitHub Archive example for retrieving a range of files uses wget with that same {0..23} pattern in the URL.
The {0..23} part is being interpreted by wget itself as the range 0 through 23. You can test this by executing that command with the -v flag, which shows each URL being requested in turn.
In other words, wget is substituting values into the URL and then getting that new URL. This isn't obvious behavior, nor is it well documented, but you can find mention of it "out there", for instance in "All the Wget Commands You Should Know".
To do what you want, you need to iterate over the range in Ruby using something like this untested code:
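A minimal sketch of that iteration follows. It assumes the files are named with un-padded hours, as in the 2015-01-01-15.json.gz example from the question. To stay dependency-free it uses Net::HTTP from the standard library instead of Faraday. The names BASE_URL, archive_file_names and each_archive are made up for illustration. Redirects are not followed and non-success responses are silently skipped, so treat it as a starting point rather than a finished downloader.

```ruby
require 'net/http'
require 'uri'

# Same host as in the question; adjust if the archive has moved.
BASE_URL = 'http://data.githubarchive.org/'

# Build one file name per hour. The range is expanded here, on the client,
# so the server only ever sees plain names such as "2015-01-01-15.json.gz".
def archive_file_names(date, hours = 0..23)
  hours.map { |hour| "#{date}-#{hour}.json.gz" }
end

# Request each hourly file in turn and yield its name and raw (still gzipped)
# body. Responses that are not 2xx are skipped.
def each_archive(date, hours = 0..23)
  archive_file_names(date, hours).each do |name|
    response = Net::HTTP.get_response(URI.join(BASE_URL, name))
    yield name, response.body if response.is_a?(Net::HTTPSuccess)
  end
end

# Usage (performs real HTTP requests, so it is left commented out):
# each_archive('2015-01-01') { |name, body| File.binwrite(name, body) }
```

Because every request uses a literal file name, nothing needs to be URL-encoded. The {0..23} pattern never reaches the server.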