Watson Content Analytics: How to make a web crawler plug-in that sends a POST request to get data?


I have a WCA 3.5.0 server and I need to get documents from a site using the web crawler. The problem is that I have to send a POST request to the site to get any data (initially the site consists only of a form with some fields and a submit button that sends the request to the server). So my POST request body should look something like this:

{"DateFrom":"2000-01-01T00:00:00","DateTo":"2030-01-01T23:59:59","Bundles":[{"Name":"the test name that i passed","Type":-1}],"Company":[],"Transaction":[],"Text":""}
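For illustration, here is a minimal Java sketch of the request I mean, built with the standard java.net.http API (Java 11+) outside of WCA. The endpoint URL is a placeholder, and the Content-Type header is an assumption (the form presumably posts JSON); the class and method names are made up for this example. The sketch only builds the request and prints it, it does not send anything.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Builds (but does not send) the POST request described above.
class SearchPostRequest {
    // JSON body copied from the question.
    static final String BODY =
        "{\"DateFrom\":\"2000-01-01T00:00:00\",\"DateTo\":\"2030-01-01T23:59:59\","
        + "\"Bundles\":[{\"Name\":\"the test name that i passed\",\"Type\":-1}],"
        + "\"Company\":[],\"Transaction\":[],\"Text\":\"\"}";

    // Creates a POST request carrying the given JSON body.
    static HttpRequest build(String url, String jsonBody) {
        return HttpRequest.newBuilder(URI.create(url))
            // Assumption: the server expects a JSON payload.
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(jsonBody, StandardCharsets.UTF_8))
            .build();
    }

    public static void main(String[] args) {
        // Hypothetical endpoint; the real form action URL is not part of the question.
        HttpRequest request = build("http://example.invalid/search", BODY);
        System.out.println(request.method() + " " + request.uri());
        System.out.println(request.headers().map());
        System.out.println(BODY);
    }
}
```

The point is that the body is a separate part of the request from the URL and the headers, which is exactly the part the crawler would have to be able to set.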

I was thinking about writing a prefetch plug-in for the web crawler. But from the documentation I've found, it looks like this is hardly possible:

"The first element ([0]) in the argument array that is passed to your plug-in is an object of type PrefetchPluginArg1, which is an interface that extends the interface PrefetchPluginArg. This is the only argument and the only argument type that is passed to the prefetch plug-in."

The PrefetchPluginArg1 interface has only the methods getHTTPHeader(), setHTTPHeader(), getURL(), setURL(), doFetch(), and setFetch(), where:

  • The getHTTPHeader method returns a String that contains all of the content of the HTTP request header that the crawler sends so that the crawler can download the document.
  • The getURL method returns the URL (in String form) of a document that the crawler downloads. You can use this URL to decide if the document requires additional information in the request header, such as a cookie.

And it looks like there is no way to change the request body.
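To make the limitation concrete, here is a rough sketch of what a prefetch step can touch. It does not use the real IBM classes: PrefetchArgSketch is a hypothetical stand-in that only mirrors the six method names listed above, with signatures I am assuming, and SimpleArg and CookiePrefetchSketch are invented for the example. I also assume the header is a string of CRLF-terminated lines.

```java
// Hypothetical stand-in mirroring the method names listed above.
// This is NOT the IBM interface; the signatures are assumptions.
interface PrefetchArgSketch {
    String getHTTPHeader();
    void setHTTPHeader(String header);
    String getURL();
    void setURL(String url);
    boolean doFetch();
    void setFetch(boolean fetch);
}

// Simple in-memory implementation used only to exercise the sketch.
class SimpleArg implements PrefetchArgSketch {
    private String url;
    private String header;
    private boolean fetch = true;

    SimpleArg(String url, String header) {
        this.url = url;
        this.header = header;
    }

    public String getHTTPHeader() { return header; }
    public void setHTTPHeader(String header) { this.header = header; }
    public String getURL() { return url; }
    public void setURL(String url) { this.url = url; }
    public boolean doFetch() { return fetch; }
    public void setFetch(boolean fetch) { this.fetch = fetch; }
}

class CookiePrefetchSketch {
    // Appends a Cookie line to the header for URLs under the given prefix.
    // Only the header, the URL, and the fetch flag are reachable from here;
    // nothing in this interface exposes a request body.
    static void process(PrefetchArgSketch arg, String urlPrefix, String cookie) {
        if (arg.getURL().startsWith(urlPrefix)) {
            arg.setHTTPHeader(arg.getHTTPHeader() + "Cookie: " + cookie + "\r\n");
        }
    }

    public static void main(String[] args) {
        SimpleArg arg = new SimpleArg("http://example.invalid/form", "User-Agent: crawler\r\n");
        process(arg, "http://example.invalid/", "session=abc");
        System.out.print(arg.getHTTPHeader());
    }
}
```

With an interface shaped like this, a plug-in can add a cookie or rewrite the URL, but I see no hook through which the JSON payload could be attached.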

So, is it really possible to control the POST request body and not only the header? If so, could you please share some information on how to solve this task?

