Proper Formatting of Data for Watson Retrieve & Rank


Thanks for your time.

I would like to know your thoughts on the best practice for formatting specific data to be uploaded to Watson Retrieve and Rank.

I am building a service for answering questions about municipal laws and ordinances to help educate newly elected officials in resource/network poor rural areas.

Here is the conundrum I am facing:

Let's say there are 200 towns in the region that I am servicing. Each town has similar but different sets of ordinances and regulations. Everyone who poses a question to the system will pose 'relatively' similar questions in terms of what they are trying to accomplish. However, the answer will greatly differ depending on the town.

I.e., zoning regulations will be similar across towns, but retrieving the wrong town's ordinance would be completely useless, despite being fairly close.

"What is the setback ordinance for Smallville?" might pull up any town's setback ordinance, or something related to Smallville that is not its setback ordinance.

I have all the documents detailing the ordinances and regs needed. I'm just looking for some advice on how to structure it to ensure people get the accurate data.

Should I create a separate cluster for each individual town's set of documents? Should I put everything in one cluster and just train rigorously to refine the accuracy, or is there another path I haven't thought of?

Thanks again,

Matt

There is 1 answer

Sayuri Mizuguchi (best answer)

This is just a small pointer to help you find your solution.

Having many questions mapped to a single answering document suggests that the use case here might be a good fit for the Natural Language Classifier (NLC), or some combination of NLC and Retrieve and Rank (RnR).
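To make the "combination" idea a bit more concrete, here is a minimal sketch of one way it could look, not a tested RnR setup. It assumes every document is indexed with a `town` field (the field names `town`, `topic` and `body`, the helper functions, and the town "Riverton" are all hypothetical), and it uses a naive name match as a stand-in for a real classifier such as NLC. The only real Solr feature it relies on is the standard `fq` (filter query) parameter, which restricts results without changing how they are scored.

```python
import json
from urllib.parse import urlencode


def make_doc(doc_id, town, topic, body):
    """Build one document to index. Field names are assumptions;
    use whatever fields your own Solr schema defines."""
    return {"id": doc_id, "town": town, "topic": topic, "body": body}


def detect_town(question, towns):
    """Stand-in for a classifier (e.g. NLC): naive, case-insensitive
    match of a known town name inside the question text."""
    lowered = question.lower()
    for town in towns:
        if town.lower() in lowered:
            return town
    return None


def build_query_params(question, towns):
    """Build Solr query parameters, filtering by town when one is detected."""
    params = {"q": question, "wt": "json"}
    town = detect_town(question, towns)
    if town is not None:
        # fq limits the candidate set to this town's documents only.
        # Town names containing quotes would need escaping first.
        params["fq"] = 'town:"{}"'.format(town)
    return params


towns = ["Smallville", "Riverton"]  # hypothetical town list
docs = [
    make_doc("smallville-zoning-01", "Smallville", "setback", "Setback text ..."),
    make_doc("riverton-zoning-01", "Riverton", "setback", "Setback text ..."),
]

params = build_query_params("What is the setback ordinance for Smallville?", towns)
print(json.dumps(docs[0]))
print(urlencode(params))
```

With this layout everything can live in one cluster and one collection: the filter keeps another town's near-identical ordinance out of the results, and the ranker only has to order documents within the right town.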

I really recommend that you take a look at these articles on Medium:

  • Part I - Developing with IBM Watson Retrieve and Rank: Solr Configuration

  • Part II - Developing with IBM Watson Retrieve and Rank: Training and Evaluation

  • Part III - Developing with IBM Watson Retrieve and Rank: Custom Features (Important for your question).

Links for reference:

  • See the official documentation about preparing training data in RnR.
  • See the official documentation for using NLC.