There is a site that uses strapdown.js that I am trying to mirror with httrack or wget, but I fall short because the site serves Markdown rather than HTML. Only strapdown converts the Markdown links into HTML links, so the client needs to execute the JavaScript first and then search for links in the generated DOM.
Is there a tool available that can do this?
I have tried
wget -erobots=off --no-parent --wait=3 --limit-rate=20K -r -p -U "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)" -A htm,html,css,js,json,gif,jpeg,jpg,bmp http://my.si.te
and
httrack -w -v --extended-parsing=N -n -t -r -p -U "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)" --robots=0 http://my.si.te "+*" "-r6"
Any help is highly appreciated.
If you are comfortable writing your client in Java, I have used HtmlUnit for this.
A stripped-down example to fetch a page with JavaScript would look like the following. It's adapted from an actual script I use to scrape one of the sites I administer. I've used strapdownjs.com as the example. You'll have to ignore the CSS warnings if you run it, but you'll notice that it finds and outputs the link to bootswatch.com, which is generated by JavaScript from the Markdown in the page source. You might prefer the tool's own Getting started page.
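If a full JavaScript engine is more than you need, there is a lighter route that is not HtmlUnit at all: strapdown reads its Markdown from an `<xmp>` (or `<textarea>`) element in the page source, so the link targets are already present in the raw download and can be pulled out without rendering anything. The following is only a minimal sketch under that assumption. The class name `StrapdownLinks`, the helper `extractLinks`, and the sample page are made up for illustration, and the regular expression only covers inline links of the form `[text](target)`, not reference-style links or bare URLs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of HtmlUnit, wget, or httrack.
class StrapdownLinks {

    // Body of the first <xmp> or <textarea> element, which is where
    // strapdown expects to find the Markdown (assumption about the target site).
    private static final Pattern CONTAINER = Pattern.compile(
            "<(xmp|textarea)\\b[^>]*>(.*?)</\\1>",
            Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

    // Inline Markdown links: [text](target) or [text](target "title").
    // Group 1 captures the target only.
    private static final Pattern INLINE_LINK = Pattern.compile(
            "\\[[^\\]]*\\]\\(\\s*([^)\\s]+)[^)]*\\)");

    // Returns the link targets found in the Markdown block, in document order.
    // Returns an empty list if the page has no <xmp>/<textarea> block.
    static List<String> extractLinks(String pageSource) {
        List<String> links = new ArrayList<>();
        Matcher container = CONTAINER.matcher(pageSource);
        if (!container.find()) {
            return links;
        }
        Matcher link = INLINE_LINK.matcher(container.group(2));
        while (link.find()) {
            links.add(link.group(1));
        }
        return links;
    }

    public static void main(String[] args) {
        // Made-up page in the usual strapdown layout; in practice this string
        // would be the raw file that wget or httrack already downloaded.
        String sample =
                "<!DOCTYPE html>\n<html>\n<title>Demo</title>\n"
              + "<xmp theme=\"united\" style=\"display:none;\">\n"
              + "# Demo\n"
              + "Themes come from [Bootswatch](http://bootswatch.com).\n"
              + "See also the [docs](docs/index.html \"Documentation\").\n"
              + "</xmp>\n"
              + "<script src=\"strapdown.js\"></script>\n</html>";

        for (String href : extractLinks(sample)) {
            System.out.println(href);
        }
    }
}
```

Relative targets such as `docs/index.html` still have to be resolved against the page URL before being handed back to the crawler, for example as a URL list for wget's `-i` option. If the site builds links in some other way than plain inline Markdown, rendering the page with HtmlUnit remains the more robust choice.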