Can Nokogiri interpret JavaScript? - Web Scraping


We are trying to scrape the availabilities on this page: http://www.equityapartments.com/new-york/new-york-city-apartments/midtown-west/mantena-apartments.aspx

I need my spider to click "All Floorplans" and fetch all the availabilities. However, I believe the data is actually loaded by a JavaScript request. Is there a way for my Nokogiri spider to render it, or to simulate clicking the buttons?


There are 2 answers

Answer by Maxim

Nokogiri is just an HTML/XML parser. It also lets you search the parsed document, but it does not execute JavaScript.

To interact with web pages, you need something else, e.g. Watir (browser automation) driving PhantomJS (a headless browser).

Combining them all together:

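require 'watir'
require 'nokogiri'

# PhantomJS is a real (headless) browser, so the page's JavaScript runs
browser = Watir::Browser.new(:phantomjs)

browser.goto(your_url_above)
# Click the link as a user would; link(text:) matches the text exactly
browser.link(text: 'All Floorplans').click

# Hand the rendered HTML (after JavaScript has run) to Nokogiri
document = Nokogiri::HTML(browser.html)
document.search(...)
browser.close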
Answer by Milind

Yes, you can, provided the floorplan elements have an id or class that you can find in the page's HTML.

You will need FirePath to help you get the XPath of the elements, and then you can iterate over them with it. For example, I recently worked on webpagescraper, which scrapes HTML from fundly.com.

Because every title element in the HTML had the same class, I could get every title on https://fundly.com/search/%60 with a selector on that class name, like this:

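require 'rubygems'
require 'nokogiri'
require 'open-uri'

# Fetch and parse the static HTML (no JavaScript is executed here)
doc = Nokogiri::HTML(URI.open('https://fundly.com/search/%60'))

@campaign_titles = []
# Every campaign title is an <h4> with the class "f-width-100"
doc.search('h4.f-width-100').each do |title|
  @campaign_titles << title.text
end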

Refer to my project above if you need more help extracting values from a website.