Nokogiri - find the value inside a javascript array


I'm trying to scrape a page using Nokogiri. I want to get a value from inside a JavaScript array, like the value of 'b' in this code:

<script>
     var foo = [bar, [a, b, c , d], value, some value, . . ]
</script>

I got the script block by using doc.search("script")[18].content. How can I get the value of 'b' from it?


There are 2 answers

Answer by the Tin Man (best answer)

You can do this pretty easily:

require 'nokogiri'

doc = Nokogiri::HTML('<script>
     var foo = [bar, [a, b, c , d], value, some value, . . ]
</script>
')

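js = doc.at('script').text           # raw text of the first <script> element
right_side = js.split('=', 2).last   # everything after the first '='
b = right_side.split(',')[2]         # third comma-separated field
b # => " b"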

Testing with a real value:

require 'nokogiri'

doc = Nokogiri::HTML('<script>
     var foo = [bar, [a, 123, c , d], value, some value, . . ]
</script>
')

js = doc.at('script').text
right_side = js.split('=', 2).last
b = right_side.split(',')[2]
b # => " 123"
b.to_i # => 123

The downside is that it depends on how the JavaScript string is formatted, which makes it fragile: any change to the number or order of the elements ahead of the value shifts the index. You get to decide whether you want to go down that path.
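
To make that fragility concrete, here is a small sketch that wraps the split-based extraction above in a hypothetical helper, third_comma_field, and runs it on two made-up inputs. The second input is an assumption about how the page's script might change, not something taken from the question.

```ruby
# Hypothetical helper wrapping the index-based extraction shown above.
def third_comma_field(js)
  js.split('=', 2).last.split(',')[2]
end

# Assumed sample inputs: the second one has an extra nested array up front.
original  = 'var foo = [bar, [a, 123, c , d], value]'
reordered = 'var foo = [[x, y], bar, [a, 123, c , d], value]'

third_comma_field(original).strip   # => "123"
third_comma_field(reordered).strip  # => "bar"
```

The code still runs on the second input, but it quietly returns the wrong element, which is the kind of failure to watch for.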

Remember, all content in HTML source is a string, so you can tear things up using normal string processing once you've narrowed down what you want to look at.
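
As one possible way to narrow things down a little more defensively, the sketch below uses a regular expression instead of a fixed comma index. The helper name extract_nested_item is hypothetical, and the pattern assumes the value lives in the first nested array of a `var` assignment with no deeper nesting. The script text is hard-coded so the example runs without any gems; in practice it would come from doc.at('script').text.

```ruby
# Script text as it might be returned by doc.at('script').text (assumed sample).
js = "\n     var foo = [bar, [a, 123, c , d], value, some value]\n"

# Hypothetical helper, not part of Nokogiri: returns the item at `index`
# from the first array nested inside the array assigned to `var_name`.
def extract_nested_item(js, var_name, index)
  # Match "var <name> = [ ... [ <captured items> ]", with no other brackets
  # between the outer "[" and the first nested "[".
  pattern = /\bvar\s+#{Regexp.escape(var_name)}\s*=\s*\[[^\[\]]*\[([^\[\]]*)\]/
  match = js.match(pattern)
  return nil unless match

  # Split the captured items on commas and trim the surrounding whitespace.
  match[1].split(',').map(&:strip)[index]
end

b = extract_nested_item(js, 'foo', 1)
b # => "123"
```

This tolerates extra whitespace and changes after the nested array, but it is still plain string processing and will break if the structure before the nested array changes.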

Answer by Chase

First, install the rkelly-remix gem. The original rkelly seems abandoned, and the remix does ES6 (sweet).

Require 'rkelly' and instantiate a parser:

require 'rkelly'
parser = RKelly::Parser.new

Then grab the script as you are with something like:

require 'nokogiri'

doc = '<script> var foo = [bar, [a, b, c , d], 1, 2, 3, 4] </script>'
d = Nokogiri::HTML doc
js = d.search('script').text   # concatenated text of every <script> element

Next, parse that with rkelly-remix:

ast = parser.parse(js)

Then you can iterate over the nodes and play with their values. Your example seems a bit incomplete, so I can't offer much more than this. If you want to interrogate b any further, you'll need more of the JS that sets the value. From here you can use execjs or The Ruby Racer (therubyracer) to eval the JS if you want.
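
If the array literal happens to contain only JSON-compatible values (numbers, double-quoted strings, nested arrays), a lighter alternative to evaluating the JavaScript is Ruby's standard json library. This is a different technique from the parser approach above, and the sample script below is an assumption: the question's original array uses bare identifiers such as bar, which are not valid JSON and would raise JSON::ParserError.

```ruby
require 'json'

# Assumed sample: every element is a JSON-compatible literal.
js = 'var foo = ["bar", [10, 123, 30, 40], 1, 2, 3, 4];'

# Take everything after the first '=', then trim whitespace and the trailing ';'.
right_side = js.split('=', 2).last.strip.chomp(';')

foo = JSON.parse(right_side)  # a real Ruby Array, nested arrays included
b = foo[1][1]                 # second item of the nested array
b # => 123
```

Because the result is a real Ruby array, the value is addressed by its position in the structure rather than by counting commas in a string.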

Hope this helps!