URL encoding issues with curly braces

Question

URL encoding issues with curly braces

606 views Asked by Andy Kwong At 11 June 2015 at 19:37

I'm having issues getting data from GitHub Archive.

The main issue is my problem with encoding {} and .. in my URL. Maybe I am misreading the Github API or not understanding encoding correctly.

require 'open-uri'
require 'faraday'

conn = Faraday.new(:url => 'http://data.githubarchive.org/') do |faraday|
  faraday.request  :url_encoded             # form-encode POST params
  faraday.response :logger                  # log requests to STDOUT
  faraday.adapter  Faraday.default_adapter  # make requests with Net::HTTP
end

#query = '2015-01-01-15.json.gz' #this one works!!
query = '2015-01-01-{0..23}.json.gz' #this one doesn't work
encoded_query = URI.encode(query)

response = conn.get(encoded_query)
p response.body

Original Q&A

There are 2 answers

seane On 11 June 2015 at 20:40

To get a better idea of what's going wrong, let's start with the example given in the GitHub documentation:

wget http://data.githubarchive.org/2015-01-01-{0..23}.json.gz

The thing to note here is that {0..23} is automagically getting expanded by bash. You can see this by running the following command:

echo {0..23}
> 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23

This means wget doesn't get called just once, but instead gets called a total of 24 times. The problem you're having is that Ruby doesn't automagically expand {0..23} like bash does, and instead you're making a literal call to http://data.githubarchive.org/2015-01-01-{0..23}.json.gz, which doesn't exist.

Instead you will need to loop through 0..23 yourself and make a single call every time:

(0..23).each do |n|
  query = "2015-01-01-#{n}.json.gz"
  encoded_query = URI.encode(query)
  response = conn.get(encoded_query)
  p response.body
end

**the Tin Man** · Accepted Answer · 2015-06-11T20:39:10+00:00

The GitHub Archive example for retrieving a range of files is:

wget http://data.githubarchive.org/2015-01-01-{0..23}.json.gz

The {0..23} part is being interpreted by wget itself as a range of 0 .. 23. You can test this by executing that command with the -v flag which returns:

wget -v http://data.githubarchive.org/2015-01-01-{0..1}.json.gz
--2015-06-11 13:31:07--  http://data.githubarchive.org/2015-01-01-0.json.gz
Resolving data.githubarchive.org... 74.125.25.128, 2607:f8b0:400e:c03::80
Connecting to data.githubarchive.org|74.125.25.128|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2615399 (2.5M) [application/x-gzip]
Saving to: '2015-01-01-0.json.gz'

2015-01-01-0.json.gz                                        100%[===========================================================================================================================================>]   2.49M  3.03MB/s   in 0.8s

2015-06-11 13:31:09 (3.03 MB/s) - '2015-01-01-0.json.gz' saved [2615399/2615399]

--2015-06-11 13:31:09--  http://data.githubarchive.org/2015-01-01-1.json.gz
Reusing existing connection to data.githubarchive.org:80.
HTTP request sent, awaiting response... 200 OK
Length: 2535599 (2.4M) [application/x-gzip]
Saving to: '2015-01-01-1.json.gz'

2015-01-01-1.json.gz                                        100%[===========================================================================================================================================>]   2.42M   867KB/s   in 2.9s

2015-06-11 13:31:11 (867 KB/s) - '2015-01-01-1.json.gz' saved [2535599/2535599]

FINISHED --2015-06-11 13:31:11--
Total wall clock time: 4.3s
Downloaded: 2 files, 4.9M in 3.7s (1.33 MB/s)

In other words, wget is substituting values into the URL and then getting that new URL. This isn't obvious behavior, nor is it well documented, but you can find mention of it "out there". For instance in "All the Wget Commands You Should Know":

7. Download a list of sequentially numbered files from a server
wget http://example.com/images/{1..20}.jpg

To do what you want, you need to iterate over the range in Ruby using something like this untested code:

0.upto(23) do |i|
  response = conn.get("/2015-01-01-#{ i }.json.gz")
  p response.body
end

TechQA.

URL encoding issues with curly braces

There are 2 answers

Related Questions in RUBY

Related Questions in GITHUB-API

Related Questions in URL-ENCODING

Related Questions in FARADAY

Popular Questions

Popular Tags

Trending Questions