In Ruby/Rails, how can I encode/escape special characters in URLs?

How do I encode or 'escape' the URL before I use OpenURI to open(url)?

We're using OpenURI to open a remote URL and return the XML:

getresult = open(url).read

The problem is that the URL contains user-input text, which may include spaces and other characters such as "+", "&", and "?", so we need to escape the URL safely. I have seen lots of examples for Net::HTTP, but have not found any for OpenURI.

We also need to be able to un-escape a similar string we receive in a session variable, so we need the reciprocal function.

There are 4 answers

1
the Tin Man On BEST ANSWER

Ruby has the built-in URI library, and there is also the Addressable gem, in particular Addressable::URI.

I prefer Addressable::URI. It's very full-featured and handles the encoding for you when you use the query_values= method.

I've seen some discussions about URI going through some growing pains, so I tend to leave it alone for handling encoding/escaping until these things get sorted out.

3
Jacob On

Ruby Standard Library to the rescue:

require 'uri'
user_text = URI.escape(user_text)
url = "http://example.com/#{user_text}"
result = open(url).read

See more at the docs for the URI::Escape module. It also has a method to do the inverse (unescape).
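For comparison, here is a minimal sketch of the same escape/unescape round trip using the form-encoding helpers in the standard uri library (URI.encode_www_form_component and its inverse). The sample input and the /search path are assumptions made up for illustration. Note that these helpers encode a space as "+", which suits query strings rather than path segments.

```ruby
require 'uri'

# Hypothetical user input containing the characters mentioned in the question.
user_text = 'a b+c&d?e'

# Form-style encoding: a space becomes "+", reserved characters become %XX.
encoded = URI.encode_www_form_component(user_text)
# => "a+b%2Bc%26d%3Fe"

# The reciprocal function recovers the original string.
decoded = URI.decode_www_form_component(encoded)
# => "a b+c&d?e"

# A whole query string can also be built from key/value pairs in one call.
query = URI.encode_www_form('q' => user_text, 'page' => 1)
url = "http://example.com/search?#{query}"
```

The resulting url can then be passed to OpenURI as before.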

4
Arsen7 On

The main thing you have to consider is that you have to escape the keys and values separately before you compose the full URL.

All the methods which get the full URL and try to escape it afterwards are broken, because they cannot tell whether any & or = character was supposed to be a separator, or maybe a part of the value (or part of the key).
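To make the ambiguity concrete, here is a small sketch using the standard uri library. The sample value is an assumption chosen to contain both separator characters.

```ruby
require 'uri'

# Hypothetical user input that happens to contain "&" and "=".
value = 'salt&pepper=1'

# Composed without escaping: a parser sees two parameters instead of one.
naive = "q=#{value}"
naive_pairs = URI.decode_www_form(naive)
# => [["q", "salt"], ["pepper", "1"]]

# Escaped before composing: the value survives as a single parameter.
safe = "q=#{URI.encode_www_form_component(value)}"
safe_pairs = URI.decode_www_form(safe)
# => [["q", "salt&pepper=1"]]
```

Once the naive string has been built, no escaping function can recover which "&" was data and which was a separator.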

The CGI library seems to do a good job, except for the space character, which was traditionally encoded as +, and nowadays should be encoded as %20. But this is an easy fix.

Please, consider the following:

require 'cgi'

def encode_component(s)
  # The space-encoding is a problem:
  CGI.escape(s).gsub('+','%20')
end

def url_with_params(path, args = {})
  return path if args.empty?
  path + "?" + args.map do |k,v|
    "#{encode_component(k.to_s)}=#{encode_component(v.to_s)}" 
  end.join("&")
end

def params_from_url(url)
  path, query = url.split('?', 2)
  return [path, {}] unless query
  q = query.split('&').inject({}) do |memo, p|
    k, v = p.split('=', 2)
    # A bare key without "=" leaves v nil; to_s turns it into an empty value.
    memo[CGI.unescape(k.to_s)] = CGI.unescape(v.to_s)
    memo
  end
  [path, q]
end

u = url_with_params( "http://example.com",
                            "x[1]"  => "& ?=/",
                            "2+2=4" => "true" )

# "http://example.com?x%5B1%5D=%26%20%3F%3D%2F&2%2B2%3D4=true"

params_from_url(u)
# ["http://example.com", {"x[1]"=>"& ?=/", "2+2=4"=>"true"}]
2
Ernest On

Don't use URI.escape: it has been deprecated since Ruby 1.9 and was removed in Ruby 3.0.

Rails' Active Support adds Hash#to_query:

 {foo: 'asd asdf', bar: '"<#$dfs'}.to_query
 # => "bar=%22%3C%23%24dfs&foo=asd+asdf"

As you can see, it also sorts the query parameters by key, so the same hash always produces the same query string, which is good for HTTP caching.
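Outside Rails, a rough approximation of the same idea for flat hashes can be sketched with the standard uri library. This is not Active Support's implementation; sorted_query is a hypothetical helper name, and nested hashes or arrays are not handled.

```ruby
require 'uri'

# Hypothetical helper: flat hashes only, no nested params or arrays.
def sorted_query(params)
  # Sort by stringified key so symbol and string keys can be mixed.
  URI.encode_www_form(params.sort_by { |key, _| key.to_s })
end

sorted_query({ foo: 'asd asdf', bar: '"<#$dfs' })
# => "bar=%22%3C%23%24dfs&foo=asd+asdf"
```

Because the pairs are sorted before encoding, the insertion order of the hash does not affect the resulting string.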