I'm a total newbie at HtmlUnit and I'm trying to scrape a vBulletin web forum. I'm having trouble getting it to enter the username and password and actually log in.
Here's my code so far:
package scraper;

import java.io.IOException;
import java.net.UnknownHostException;

import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.FailingHttpStatusCodeException;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlForm;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class Scraper {

    public static void main(String[] args) {
        try {
            Scraper ocau = new Scraper("http://forums.overclockers.com.au/forumdisplay.php?f=15&order=desc");
        } catch (UnknownHostException e) {
            e.printStackTrace();
        }
    }

    public Scraper(String url) throws UnknownHostException {
        WebClient webClient = new WebClient(BrowserVersion.FIREFOX_24);
        webClient.getOptions().setJavaScriptEnabled(false);
        webClient.getOptions().setCssEnabled(false);
        HtmlPage page;
        try {
            page = webClient.getPage(url);
            HtmlForm login = page.getForms().get(0);
            System.out.println(login);
        } catch (FailingHttpStatusCodeException | IOException e) {
            e.printStackTrace();
        }
        webClient.closeAllWindows();
    }
}
The output of this is just the login form (I think):
HtmlForm[<form action="login.php?do=login" method="post" onsubmit="md5hash(vb_login_password, vb_login_md5password, vb_login_md5password_utf, 0)">]
The script/form on the page:
<script type="text/javascript" src="clientscript/vbulletin_md5.js?v=384"></script>
<form action="login.php?do=login" method="post" onsubmit="md5hash(vb_login_password, vb_login_md5password, vb_login_md5password_utf, 0)">
<input type="hidden" name="do" value="login" />
<input type="hidden" name="url" value="/forumdisplay.php?f=15&order=desc" />
<input type="hidden" name="vb_login_md5password" />
<input type="hidden" name="vb_login_md5password_utf" />
<input type="hidden" name="s" value="" />
<input type="hidden" name="securitytoken" value="guest" />
I'm not too sure where to go from here to actually enter the username/password and click submit. I read this answer that said that I need to set vb_login_md5password and vb_login_md5password_utf, which are hidden inputs on the page, but I have no idea how to reference or set these. There is a JavaScript MD5 script referenced in the HTML at src="clientscript/vbulletin_md5.js?v=384".
Any help would be greatly appreciated.
Edit: Thanks to arya, it is now working. I had to use this code to log in and print the page:
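// Requires: import com.gargoylesoftware.htmlunit.html.HtmlElement;
// Type the credentials into the username and password inputs of the login form.
((HtmlElement) page.getFirstByXPath("//fieldset/table/tbody/tr/td/input")).type("secretusername");
((HtmlElement) page.getFirstByXPath("//fieldset/table/tbody/tr[2]/td/input")).type("secretpassword");
// Click the submit button; click() returns the page that loads afterwards.
HtmlPage loggedin = ((HtmlElement) page.getFirstByXPath("//tr[4]/td/input")).click();
System.out.println(loggedin.asXml());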
Try entering the values via XPath and see if that works.
If that does not work, you would have to capture the login traffic with something like Fiddler and emulate the request with HtmlUnit. Let me know if it does not work so I can edit my answer.
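If you do end up emulating the request by hand, note that the question's code disables JavaScript, so the form's onsubmit handler (md5hash) will not run and the two hidden md5 fields will stay empty. Below is a rough sketch of computing such a hash yourself with only the JDK. It assumes that md5hash stores the lowercase hex MD5 of the trimmed password, which I have not verified against this forum's vbulletin_md5.js. The class name Md5Login and the method md5Hex are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of HtmlUnit or vBulletin.
final class Md5Login {

    private Md5Login() {
    }

    // Returns the lowercase hex MD5 digest of the trimmed input, encoded as UTF-8.
    // Assumption: this matches what the forum's md5hash() script would produce.
    static String md5Hex(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(text.trim().getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                // Mask to 0..255 so negative bytes format as two hex digits.
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every Java platform.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(md5Hex("secretpassword"));
    }
}
```

The resulting string could then be written into the hidden inputs before submitting, for example with login.getInputByName("vb_login_md5password").setValueAttribute(hash) and the same for vb_login_md5password_utf. I am assuming both fields take the same value for a plain ASCII password; a Fiddler capture of a real browser login would confirm that.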