How to auto submit forms using Java headless browser HtmlUnit

Question

Asked by Munny on 07 November 2023 at 23:57 (106 views)

I'm quite new to HtmlUnit. Here is what I'm trying to do:

We have a Crystal server that we call to fetch reports.

We use the RESTful APIs exposed by the Crystal server to fetch them.

However, Crystal doesn't have a direct API for fetching the report document itself.

Instead, one of the API endpoints returns a final link. Opening that link in a regular browser loads the PDF document after roughly three redirects.

I'm trying to reproduce this browser behavior in Java using the HtmlUnit library:

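try (final WebClient webClient = new WebClient()) {
    // Run page scripts, since a redirect may be triggered by JavaScript.
    webClient.getOptions().setJavaScriptEnabled(true);
    // Keep going on script errors and on failing HTTP status codes.
    webClient.getOptions().setThrowExceptionOnScriptError(false);
    webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);
    // Follow HTTP (Location header) redirects automatically.
    webClient.getOptions().setRedirectEnabled(true);
    // htmlPage is assumed to be declared outside this block.
    htmlPage = webClient.getPage(linkString);
}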

With this I get as far as the second redirect, but not to the document itself.

Any suggestions on how to reach the final page, which is the document?

Do I need to capture the intermediate result and make the third call myself with a new WebClient, or is there an easier way to reach the final page?

There are 2 answers

happyDayJeffrey On 08 November 2023 at 01:34

I don't know exactly what getPage() returns here: a DOM that you can query or modify, or the PDF document data? The answer determines how to handle it.

If I had a problem like yours, I would do the following:

1. Find the final URL by following the redirects one at a time.

2. Use an HTTP client to call that URL with the correct request method.

3. Read the data from the HTTP response body (it may be a Blob, JSON, etc.).

4. Convert the data to a PDF file with an open-source library such as Apache PDFBox.

Then you have what you want.
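
Steps 1 to 3 above could be sketched with the JDK's own java.net.http.HttpClient (Java 11+), as one possible choice of HTTP tool. This is only a sketch under assumptions: the class name RedirectDownloader, the method fetchFinal, the 10-hop limit and the output file report.pdf are made up for illustration, and nothing here is specific to Crystal. It only follows redirects sent through the HTTP Location header, always with GET; a redirect done by JavaScript or by a self-submitting form would not be followed this way. If the final body is already a PDF, it can be written to disk as-is.

```java
import java.io.IOException;
import java.net.CookieManager;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;

public class RedirectDownloader {

    /** Follows HTTP 3xx redirects by hand and returns the first non-redirect response. */
    public static HttpResponse<byte[]> fetchFinal(HttpClient client, URI start, int maxHops)
            throws IOException, InterruptedException {
        URI current = start;
        for (int hop = 0; hop <= maxHops; hop++) {
            HttpRequest request = HttpRequest.newBuilder(current).GET().build();
            HttpResponse<byte[]> response =
                    client.send(request, HttpResponse.BodyHandlers.ofByteArray());
            int status = response.statusCode();
            Optional<String> location = response.headers().firstValue("Location");
            // Anything that is not a 3xx with a Location header is treated as the end.
            if (status < 300 || status >= 400 || location.isEmpty()) {
                return response;
            }
            // Log each hop so the redirect chain can be compared with the browser's.
            System.out.println(status + " " + current + " -> " + location.get());
            // Location may be relative, so resolve it against the current URL.
            current = current.resolve(location.get());
        }
        throw new IOException("More than " + maxHops + " redirects from " + start);
    }

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newBuilder()
                // Redirects are followed manually in fetchFinal.
                .followRedirects(HttpClient.Redirect.NEVER)
                // Assumption: the server may rely on session cookies between hops.
                .cookieHandler(new CookieManager())
                .build();
        HttpResponse<byte[]> response = fetchFinal(client, URI.create(args[0]), 10);
        System.out.println("Final status " + response.statusCode() + ", content type "
                + response.headers().firstValue("Content-Type").orElse("unknown"));
        // Hypothetical output file name.
        Files.write(Path.of("report.pdf"), response.body());
    }
}
```

The logged hops show where the chain stops. If it ends on an HTML page with status 200 instead of the PDF, the next step is probably driven by script or a form rather than by an HTTP header.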

**RBRi** · Accepted Answer · 2023-11-08T06:09:25+00:00

There might be several reasons for this. One is the way the redirect is done: by HTTP header or by JavaScript. Both are supported, but if the redirect is done by JavaScript, a bit more code is sometimes required.

Second, handling non-HTML responses looks easy when you are a real person in front of a real browser, but for headless browsers it is not that simple (see https://www.htmlunit.org/filedownload-howto.html for details on how HtmlUnit tries to do that).

What you can do:

First, try to understand which page you reach with your current code: check whether the page type is HtmlPage or UnexpectedPage. If you got an HtmlPage, use asXml() to see what you really got, and try to work out how a browser moves on from there.
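
The check described above might look roughly like the following. It is a sketch, not the author's code: the class PageInspector, the method inspect, the 5 second wait and the file name report.pdf are hypothetical, and the imports assume HtmlUnit 3.x (package org.htmlunit; 2.x releases use com.gargoylesoftware.htmlunit instead). The wait followed by re-reading the window is there because a JavaScript-driven redirect can replace the page after getPage() has already returned.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

import org.htmlunit.Page;
import org.htmlunit.WebClient;
import org.htmlunit.html.HtmlPage;

public class PageInspector {

    /**
     * Reports what kind of page was reached. HTML is dumped for inspection;
     * anything else (for example an UnexpectedPage holding a PDF) is saved to target.
     */
    public static String inspect(Page page, Path target) throws IOException {
        System.out.println(page.getClass().getSimpleName() + " at " + page.getUrl()
                + " (" + page.getWebResponse().getContentType() + ")");
        if (page instanceof HtmlPage) {
            // Shows the markup HtmlUnit stopped on, e.g. a form or a script redirect.
            System.out.println(((HtmlPage) page).asXml());
            return "html";
        }
        // Non-HTML response: copy the raw response body to disk.
        try (InputStream in = page.getWebResponse().getContentAsStream()) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
        return "saved";
    }

    public static void main(String[] args) throws Exception {
        try (WebClient webClient = new WebClient()) {
            webClient.getOptions().setThrowExceptionOnScriptError(false);
            webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);
            webClient.getPage(args[0]);
            // Give background JavaScript up to 5 seconds to run, then re-read the
            // window, since the page returned by getPage() may have been replaced.
            webClient.waitForBackgroundJavaScript(5_000);
            Page current = webClient.getCurrentWindow().getEnclosedPage();
            System.out.println(inspect(current, Paths.get("report.pdf")));
        }
    }
}
```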

Next, check the number of windows you have. The download may open a new window containing the content (again, see https://www.htmlunit.org/filedownload-howto.html). You can ask the webClient for its list of windows and compare before and after.
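
A before/after comparison of the windows could be sketched as below. The class WindowLister and the method describeWindows are hypothetical names, the 5 second wait is arbitrary, and the imports again assume HtmlUnit 3.x (org.htmlunit). Note that getWebWindows() lists frames as well as top-level windows.

```java
import java.util.ArrayList;
import java.util.List;

import org.htmlunit.Page;
import org.htmlunit.WebClient;
import org.htmlunit.WebWindow;

public class WindowLister {

    /** One line per window (frames included): page class, URL and content type. */
    public static List<String> describeWindows(WebClient webClient) {
        List<String> lines = new ArrayList<>();
        for (WebWindow window : webClient.getWebWindows()) {
            Page page = window.getEnclosedPage();
            if (page == null) {
                lines.add("(window without a page)");
            } else {
                lines.add(page.getClass().getSimpleName() + " " + page.getUrl()
                        + " " + page.getWebResponse().getContentType());
            }
        }
        return lines;
    }

    public static void main(String[] args) throws Exception {
        try (WebClient webClient = new WebClient()) {
            webClient.getOptions().setThrowExceptionOnScriptError(false);
            webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);
            int before = webClient.getWebWindows().size();
            webClient.getPage(args[0]);
            // Let scripts that might open a download window run first.
            webClient.waitForBackgroundJavaScript(5_000);
            List<String> after = describeWindows(webClient);
            System.out.println("Windows before: " + before + ", after: " + after.size());
            after.forEach(System.out::println);
        }
    }
}
```

A window whose page is an UnexpectedPage with a PDF content type would be the likely place to read the document from.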

Finally, feel free to open an issue on GitHub and I will try to help with more details.
