Using web harvest on Android


I'm building a mobile app that uses the Web-Harvest API to extract data from a website and store it in a file; the app then processes that data and displays it. My problem is that when using Web-Harvest in desktop Java, the paths of the config file and the output file refer to the local disk, like "C:/config.xml" and "C:/docs". In an Android project in Eclipse, however, the config file has to live inside the project, and the output file has to go into the project or the cache. Can anybody tell me which path I should use to read the Web-Harvest config, and which path I should use to write the output XML file?

There is 1 answer

AlvaroSantisteban On

I have the same issue and, sadly, I haven't been able to make it work. I tried the following approach, but it throws an exception.

// Read the config from a raw resource (res/raw/webharvestconfig.*) instead of a disk path
InputStream in_s = context.getResources().openRawResource(R.raw.webharvestconfig);
InputSource inputSource = new InputSource(in_s);
ScraperConfiguration config = new ScraperConfiguration(inputSource);

The exception:

04-25 16:47:26.835: W/System.err(1057): org.webharvest.exception.ParserException: asset
04-25 16:47:26.835: W/System.err(1057): at org.webharvest.definition.XmlParser.parse(Unknown Source)
04-25 16:47:26.846: W/System.err(1057): at org.webharvest.definition.XmlNode.getInstance(Unknown Source)
04-25 16:47:26.846: W/System.err(1057): at org.webharvest.definition.ScraperConfiguration.createFromInputStream(Unknown Source)
04-25 16:47:26.846: W/System.err(1057): at org.webharvest.definition.ScraperConfiguration.<init>(Unknown Source)

I also tried supplying the XML string directly through an InputStream, like this:

InputStream in = new ByteArrayInputStream("<?xml version=\"1.0\" encoding=\"UTF-8\"?><config charset=\"UTF-8\"><html-to-xml> <http url=\"http://www.google.com\"/> </html-to-xml></config>".getBytes());

That did not work either, although the exception was different.
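One way to narrow this down is to check whether the XML itself is the problem. The sketch below parses the same string with the platform's standard XML parser, without involving Web-Harvest at all. The class and method names (ConfigXmlCheck, rootElementName) are made up for illustration, and StandardCharsets assumes Java 7 or a reasonably recent Android API level. If this parses cleanly, the failure is more likely in how Web-Harvest reads the source on Android than in the config content.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper for illustration; it is not part of Web-Harvest.
final class ConfigXmlCheck {

    // The same configuration string used in the attempt above.
    static final String CONFIG_XML =
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
            + "<config charset=\"UTF-8\"><html-to-xml> "
            + "<http url=\"http://www.google.com\"/> "
            + "</html-to-xml></config>";

    // Parses the XML with the standard parser and returns the root element name.
    // Throws SAXException if the string is not well-formed XML.
    static String rootElementName(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        // Encode explicitly so the bytes match the encoding declared in the prolog.
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(in));
        return doc.getDocumentElement().getNodeName();
    }
}
```

Calling ConfigXmlCheck.rootElementName(ConfigXmlCheck.CONFIG_XML) should return the name of the root element if the string is well-formed, and throw a SAXException otherwise.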

I don't know why it fails; in theory, it should work. While researching the exception shown above, I even found an example that uses ScraperConfiguration the same way I do, but...

For the sake of completeness, and to give more information, I am providing the source code of the ScraperConfiguration class.

If I'm able to make it work, I will edit this post.
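In the meantime, one untested idea for the path question: Android has no "C:/" style locations, but the app does own real directories on the file system, such as the ones returned by context.getCacheDir() and context.getFilesDir(). The sketch below copies a stream into such a directory so that the result has an ordinary absolute path. ConfigFileCopier and copyToDir are hypothetical names, and the code is plain java.io so it does not depend on Android or Web-Harvest. Whether Web-Harvest then reads that file correctly on Android is an assumption I have not verified.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper for illustration; it is not part of Web-Harvest or Android.
final class ConfigFileCopier {

    // Copies everything from source into targetDir/fileName and returns the new file.
    // The source stream is closed when the copy finishes.
    static File copyToDir(InputStream source, File targetDir, String fileName)
            throws IOException {
        // Create the directory if needed (a cache dir normally exists already).
        if (!targetDir.isDirectory() && !targetDir.mkdirs()) {
            throw new IOException("Cannot create directory: " + targetDir);
        }
        File target = new File(targetDir, fileName);
        try (InputStream in = source;
             OutputStream out = new FileOutputStream(target)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
        }
        return target;
    }
}
```

On Android, the call could look like copyToDir(context.getResources().openRawResource(R.raw.webharvestconfig), context.getCacheDir(), "webharvestconfig.xml"), where the file name is arbitrary. The returned file's getAbsolutePath() could then stand in for "C:/config.xml", and the same directory could stand in for the output location "C:/docs".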