The following link shows how to parse a page from a URL: https://github.com/swannodette/enlive-tutorial/blob/master/src/tutorial/scrape1.clj
However, I need to go through a SOCKS5 proxy, and I can't figure out how to use a proxy inside enlive. I do know how to use a proxy with httpclient, but then how do I parse the result that httpclient returns? I have the following code, but the last line returns an empty result:
(:require [clojure.set :as set]
          [clj-http.client :as client]
          [clj-http.conn-mgr :as conn-mgr]
          [clj-time.core :as time]
          [jsoup.soup :as soup]
          [clj-time.coerce :as tc]
          [net.cgrand.enlive-html :as html])
(def a (client/get "https://news.ycombinator.com/"
                   {:connection-manager (conn-mgr/make-socks-proxied-conn-manager "127.0.0.1" 9150)
                    :socket-timeout 10000
                    :conn-timeout 10000
                    :client-params {"http.useragent" "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/534.20 (KHTML, like Gecko) Chrome/11.0.672.2 Safari/534.20"}}))
(def b (html/html-resource a))
(html/select b [:td.title :a])
When using enlive, the html-resource fn performs a fetch from a URL and then converts the result to a data structure it can parse. It seems that when you pass it an already fulfilled request (the response map that clj-http returns), it just hands that map back instead of throwing an error, which is why the select comes up empty. Either way, the function you want is html-snippet, and you will want to pass it the body of your response, i.e. the string under its :body key, rather than the whole map.
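A minimal sketch of that fix, assuming the response body is an HTML string. So that it runs without a network or proxy, `a` here is a hypothetical stand-in map with made-up sample markup; in the question's code, `a` would be the real response returned by `client/get`.

```clojure
(require '[net.cgrand.enlive-html :as html])

;; Hypothetical stand-in for the clj-http response `a` from the question.
;; A real response map carries the page HTML as a string under :body.
(def a {:status 200
        :body "<table><tr><td class=\"title\"><a href=\"http://example.com\">Example</a></td></tr></table>"})

;; Parse the body string, not the whole response map.
(def b (html/html-snippet (:body a)))

;; Select the links inside td.title and pull out their text.
(map html/text (html/select b [:td.title :a]))
```

With the real response, the only line that changes from the question is the definition of `b`; the `html/select` call stays the same.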