Watson Discovery News Java API to fetch Top Stories

I am trying to develop a Java program that queries Watson Discovery News. My goal is to filter top stories for a specified date range. Below is the query I am using in the program. Is there an API that lets me retrieve the filtered top story headlines and their corresponding site links?

{
  "query": "\"IBM\",language:(english|en)",
  "filter": "crawl_date>2017-06-26T12:00:00-0400,crawl_date<2017-08-26T12:00:00-0400",
  "count": 5,
  "return": "title,url,host,crawl_date"
}

Thanks in advance.

There is 1 answer

Answer by Leonardo Kenji Shikida

Yes, it's possible. Your code will look like this:

import com.ibm.watson.developer_cloud.discovery.v1.Discovery;
import com.ibm.watson.developer_cloud.discovery.v1.model.QueryOptions;
import com.ibm.watson.developer_cloud.discovery.v1.model.QueryResponse;

public class DiscoveryNewsDemo {

    public static void main(String[] args) {
        // The constructor argument is the API version date
        Discovery discovery = new Discovery("2017-09-01");
        discovery.setEndPoint("https://gateway.watsonplatform.net/discovery/api/");
        discovery.setUsernameAndPassword("<username>", "<password>"); // replace with your service credentials

        // "system" and "news" address the pre-built Discovery News collection
        String environmentId = "system";
        String collectionId = "news";
        QueryOptions queryOptions = new QueryOptions.Builder(environmentId, collectionId)
                .query("IBM,language:(english|en)")
                // restrict results to documents crawled inside this date range
                .filter("crawl_date>2017-06-26T12:00:00-0400,crawl_date<2017-08-26T12:00:00-0400")
                .count(5)
                // return only the fields needed for headline and link
                .addReturnField("title")
                .addReturnField("url")
                .addReturnField("host")
                .addReturnField("crawl_date")
                .build();
        QueryResponse queryResponse = discovery.query(queryOptions).execute();

        System.out.println(queryResponse);
    }

}

However, note that Discovery News seems to keep only the last 90 days of data, so you will most likely have to replace

.filter("crawl_date>2017-06-26T12:00:00-0400,crawl_date<2017-08-26T12:00:00-0400")

with something more recent (I am writing this answer on November 27, 2017), such as

.filter("crawl_date>2017-10-26T12:00:00-0400,crawl_date<2017-11-26T12:00:00-0400")
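Because that window keeps moving, the date range can be computed instead of hard-coded. The following is only a minimal sketch using java.time: the class and method names (CrawlDateFilter, between, lastDays) are hypothetical and not part of the Watson SDK, and it assumes the service accepts the same timestamp shape used in the filter strings above.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of the Watson SDK.
public class CrawlDateFilter {

    // Same timestamp shape as the filters above, e.g. 2017-10-26T12:00:00-0400
    private static final DateTimeFormatter STAMP =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ssZ");

    /** Builds "crawl_date>FROM,crawl_date<TO" for an explicit range. */
    public static String between(OffsetDateTime from, OffsetDateTime to) {
        return "crawl_date>" + from.format(STAMP) + ",crawl_date<" + to.format(STAMP);
    }

    /** Builds a range covering the given number of days up to "now". */
    public static String lastDays(OffsetDateTime now, long days) {
        return between(now.minusDays(days), now);
    }

    public static void main(String[] args) {
        // Prints a filter for the 30 days before the current moment
        System.out.println(lastDays(OffsetDateTime.now(), 30));
    }
}
```

The resulting string could then be passed to .filter(...) in place of the literal one.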

If you're using Maven, you need to add this dependency:

...
  <dependencies>
    <dependency>
        <groupId>com.ibm.watson.developer_cloud</groupId>
        <artifactId>discovery</artifactId>
        <version>4.0.0</version>
    </dependency>
  </dependencies>
...

Unfortunately, the Discovery News documentation does not explicitly state how to access the default system news data repository. The trick is to use these values:

...
            String environmentId = "system";
            String collectionId = "news";
            QueryOptions queryOptions = new QueryOptions.Builder(environmentId, collectionId)
...

It will return results like this:

Nov 27, 2017 8:29:04 AM okhttp3.internal.platform.Platform log
INFO: --> GET https://gateway.watsonplatform.net/discovery/api/v1/environments/system/collections/news/query?version=2017-09-01&filter=crawl_date%3E2017-10-26T12:00:00-0400,crawl_date%3C2017-11-26T12:00:00-0400&query=IBM,language:(english%7Cen)&count=5&return=title,url,host,crawl_date http/1.1
Nov 27, 2017 8:29:06 AM okhttp3.internal.platform.Platform log
INFO: <-- 200 OK https://gateway.watsonplatform.net/discovery/api/v1/environments/system/collections/news/query?version=2017-09-01&filter=crawl_date%3E2017-10-26T12:00:00-0400,crawl_date%3C2017-11-26T12:00:00-0400&query=IBM,language:(english%7Cen)&count=5&return=title,url,host,crawl_date (1912ms, unknown-length body)
{
  "matching_results": 19922,
  "results": [
    {
      "score": 3.2059636,
      "host": "mcts-mcitp.com",
      "crawl_date": "2017-11-18T06:41:05Z",
      "id": "txAkTtulpLhTd6FqrCCqWhiXfaQFLOz8SQBH7DiC2FqGwOIDPl4udzDBV0_6p0xK",
      "title": "C9020-667 IBM New Workloads Sales V1",
      "url": "http://www.mcts-mcitp.com/2017/11/17/c9020-667-ibm-new-workloads-sales-v1/"
    },
    {
      "score": 3.1392605,
      "host": "developer.com",
      "crawl_date": "2017-11-03T00:38:21Z",
      "id": "h1kKQlSfPKM3guslW9wfVJyyvWoTDv8UHdOksDaqCEC3CV3Ya8sWl6JXDgn02yMN",
      "title": "IBM Renames Its Cloud Computing Service",
      "url": "https://www.developer.com/daily_news/ibm-renames-its-cloud-computing-service.html"
    },
    {
      "score": 3.0926523,
      "host": "prominic.net",
      "crawl_date": "2017-10-30T22:08:03Z",
      "id": "S94YGfvGGL3K84j2i75GmvcSfxJRu9b7CS73vSncmYr3gHBObjbLF7rziVtdOkJn",
      "title": "Prominic.NET Offers Free Hosting to IBM Champions - Prominic.NET",
      "url": "https://prominic.net/2017/05/08/prominic-net-offers-free-hosting-ibm-champions/"
    },
    {
      "score": 3.0866172,
      "host": "mcts-mcitp.com",
      "crawl_date": "2017-11-16T00:32:11Z",
      "id": "KG6QUpkwnv02cD0oi2EN_dSCpcpbfi3JtuIjULVER1cWtxVZ77bV8vWkaHY5atsW",
      "title": "C9020-662 IBM Virtualized Storage V1",
      "url": "http://www.mcts-mcitp.com/2017/11/15/c9020-662-ibm-virtualized-storage-v1/"
    },
    {
      "score": 3.0718818,
      "host": "techrepublic.com",
      "crawl_date": "2017-11-10T15:42:00Z",
      "id": "qgFND-rmtO0F5xkZLqbam-Di2vdjQ2iIkeTR8DqR3vNgturI2AuPJn6sDI_mz_Ct",
      "title": "Conjure VR objects with your voice, using new IBM Watson service - Video | ZDNet",
      "url": "http://www.techrepublic.com/videos/conjure-vr-objects-with-your-voice-using-new-ibm-watson-service/"
    }
  ]
}
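To get just the headline and site link out of results like these, something along the following lines may work. It is a sketch under an assumption: that the results are available as a list of field maps (for example from queryResponse.getResults(), whose exact return type depends on the SDK version, so check it against the version you use). The HeadlineLinks class is hypothetical and uses only the standard library.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; assumes each result is a map of field name to value.
public class HeadlineLinks {

    /** Returns one "title - url" line per result, skipping results that lack either field. */
    public static List<String> format(List<Map<String, Object>> results) {
        List<String> lines = new ArrayList<>();
        for (Map<String, Object> result : results) {
            Object title = result.get("title");
            Object url = result.get("url");
            if (title != null && url != null) {
                lines.add(title + " - " + url);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // One result copied from the sample response above
        Map<String, Object> result = new HashMap<>();
        result.put("title", "IBM Renames Its Cloud Computing Service");
        result.put("url", "https://www.developer.com/daily_news/ibm-renames-its-cloud-computing-service.html");

        for (String line : format(Collections.singletonList(result))) {
            System.out.println(line);
        }
    }
}
```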