Apache Solr mapping custom JSON can't index nested documents

I have an issue that I can't solve by following the documentation. I have custom JSON that I can't flatten because it comes from an external source. I'm omitting parts of it to make it easier to read and understand. This is the document I'm trying to index:

{
"status": "ok",
"message": {
    ... some other fields ...
    "title": "The good and the bad",
    "editor": [
        {"given": "Bianca", "family": "Racioppe", "sequence": "additional", "affiliation": [] },
        {"given": "Virginia", "family": "C\u00e1neva", "sequence": "additional", "affiliation": []}],
    ... other fields ...
}}

This is my schema:

<field name="editors" type="string" multiValued="true" indexed="true" stored="true">
    <field name="given_name" type="string" indexed="true" stored="true"/>
    <field name="family_name" type="string" indexed="true" stored="true"/>
    <field name="affiliation" type="string" multiValued="true" indexed="true" stored="true"/>
    <field name="orcid" type="string" indexed="true" stored="true"/>
</field>
<field name="title" type="string" multiValued="false" indexed="true" stored="true"/>
<field... other fields ... />

I've also tried <field name="editors.given_name".... />

And this is the mapping I'm using:

curl -X POST "http://localhost:8983/solr/papers/update/json/docs?
split=/message|/message/autor|/message/editor
&f=editors:/message/editor
&f=editors.given_name:/message/editor/given
&f=editors.family_name:/message/editor/family
&f=editors.affiliation:/message/editor/affiliation/name
&f=title:/message/title
&f=.... other fields....
&commit=true" -H 'Content-type:application/json' --data-binary @file.json

The indexing works fine for all the fields except for the "editors" field, nothing happens! What am I doing wrong or missing?

Thanks.

1 answer

Answered by Hammad Shabbir

Indexing nested documents in Solr involves three basic steps:

  • Defining a schema
  • Posting nested documents
  • Querying nested documents

Step 1

Define the child fields in the schema the same way you define the parent's fields, as shown here. You can change attributes such as type, stored, and multiValued according to your business requirements.

  <field name="editor" type="text_general"/>
  <field name="editors" type="string" multiValued="false"/>
  <field name="family" type="string" indexed="true" stored="true"/>
  <field name="given" type="string" indexed="true" stored="true"/>
  <field name="id" type="string" multiValued="false" indexed="true" required="true" stored="true"/>
  <field name="sequence" type="string" indexed="true" stored="true"/>

Step 2

To index, post the nested JSON to Solr. This example includes only a few fields to keep the sample short.

curl --location --request POST 'http://localhost:8983/solr/stackoverflow_core/update?commit=true&overwrite=true&wt=json' \
--header 'Accept: application/json' \
--header 'Content-Type: application/json' \
--data-raw '[{
  "title": "The good and the bad",
  "editor": [
    {
      "given": "Bianca",
      "family": "Racioppe",
      "sequence": "additional"
    },
    {
      "given": "Virginia",
      "family": "C\u00e1neva",
      "sequence": "additional"
    }
  ]
}]'
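
Since the source JSON arrives wrapped in "status"/"message" and can't be changed upstream, one option is to reshape it on the client before posting it as above. The following is a minimal Python sketch of that idea, not part of the original answer. The function name to_solr_doc and the child id scheme are assumptions made for illustration; the only grounded part is that the schema in Step 1 declares id as required.

```python
import json


def to_solr_doc(payload, doc_id):
    """Reshape the external payload into the nested form posted to /update.

    `doc_id` and the child id scheme are hypothetical; adapt them to
    whatever unique key your data actually provides.
    """
    message = payload["message"]
    editors = []
    for i, editor in enumerate(message.get("editor", [])):
        editors.append({
            "id": f"{doc_id}/editor/{i}",  # assumed id scheme for children
            "given": editor.get("given"),
            "family": editor.get("family"),
            "sequence": editor.get("sequence"),
        })
    return {"id": doc_id, "title": message.get("title"), "editor": editors}


# Trimmed copy of the document from the question.
sample = {
    "status": "ok",
    "message": {
        "title": "The good and the bad",
        "editor": [
            {"given": "Bianca", "family": "Racioppe",
             "sequence": "additional", "affiliation": []},
            {"given": "Virginia", "family": "C\u00e1neva",
             "sequence": "additional", "affiliation": []},
        ],
    },
}

doc = to_solr_doc(sample, "paper-1")
# The JSON array printed here is the body to send with --data-binary.
print(json.dumps([doc], ensure_ascii=False, indent=2))
```

The sketch drops "affiliation" because the sample posted in Step 2 does not include it; add it back once the schema has a matching field.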

Step 3

To query nested documents in Solr, use the block join query parsers and the child doc transformer. Note that Solr does not store a nested structure: it indexes the parent and its children as separate, flat documents at the same level. The parent-child relationship is a logical one; block join queries match across it, and the child doc transformer returns the children attached to their parent in the results.

https://lucene.apache.org/solr/guide/8_0/searching-nested-documents.html
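
As a rough illustration of Step 3, the Python sketch below builds the request parameters for a block join parent query combined with the child doc transformer. It is an assumption-laden example rather than the answer's own code: the helper names are hypothetical, and using title:* as the parent filter assumes that only parent documents carry a title, which holds for the sample above but may not hold for your data. A dedicated marker field on parents is usually a safer filter.

```python
from urllib.parse import urlencode


def build_params(parent_filter, child_query):
    """Parameters for 'parents whose children match child_query'.

    `parent_filter` must match all parent documents and no children.
    """
    return {
        # Block join parent parser: returns parents of matching children.
        "q": '{!parent which="%s"}%s' % (parent_filter, child_query),
        # Child doc transformer: attaches the children to each parent.
        "fl": "*,[child parentFilter=%s]" % parent_filter,
        "wt": "json",
    }


def build_url(core_url, parent_filter, child_query):
    """Full /select URL; core_url is an assumed local core address."""
    return core_url + "/select?" + urlencode(build_params(parent_filter, child_query))


params = build_params("title:*", "given:Bianca")
url = build_url("http://localhost:8983/solr/stackoverflow_core",
                "title:*", "given:Bianca")
print(url)
```

The resulting URL can be passed to curl or opened in a browser; the query should return the parent document with its editor children nested under it, provided the documents were indexed together as one block.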