Elasticsearch processor for shingles similar to split?


Is there a processor that will do shingles or can I make a custom one somehow?

In the pipeline processor below, I split on the space character, but I'd also like to combine words like a shingle analyzer would:

PUT _ingest/pipeline/split
{
  "processors": [
    {
      "split": {
        "field": "title",
        "target_field": "title_suggest.input",
        "separator": "\\s+"
      }
    }
  ]
}

Example:

For the title "Senior Business Developer", the suggestion field should contain these terms:

  1. Senior Business Developer
  2. Business Developer
  3. Developer
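
Put differently, the indexed document would look roughly like this. It is only a sketch of the desired result, assuming the same title_suggest.input target field as in the split pipeline above:

{
  "title": "Senior Business Developer",
  "title_suggest": {
    "input": [
      "Senior Business Developer",
      "Business Developer",
      "Developer"
    ]
  }
}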

Here are the links to the article and answer that inspired this question:

  1. https://blog.mimacom.com/autocomplete-elasticsearch-part3/
  2. How to combine completion, suggestion and match phrase across multiple text fields?

Answer by Mark Petersen (best answer):

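Here is one solution I came up with, using a custom Painless script in a script processor. It splits the title on single spaces and stores every contiguous word sequence (not only the three suffixes from the question) in title_suggest: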

PUT _ingest/pipeline/shingle
{
  "description" : "Create basic shingles from the title field and store them in the title_suggest field",
  "processors" : [
    {
      "script": {
        "lang": "painless",
        "source": """
              // Split s on the single delimiter character d without using a regex.
              String[] split(String s, char d) {
                // First pass: count the delimiters to size the result array.
                int count = 0;

                for (char c : s.toCharArray()) {
                    if (c == d) {
                        ++count;
                    }
                }

                if (count == 0) {
                    return new String[] {s};
                }

                // Second pass: copy each substring between delimiters.
                String[] r = new String[count + 1];
                int i0 = 0, i1 = 0;
                count = 0;

                for (char c : s.toCharArray()) {
                    if (c == d) {
                        r[count++] = s.substring(i0, i1);
                        i0 = i1 + 1;
                    }

                    ++i1;
                }

                // Last token after the final delimiter.
                r[count] = s.substring(i0, i1);

                return r;
              }

              // Skip documents that have no title.
              if (!ctx.containsKey('title') || ctx['title'] == null) { return; }
              def title_words = split(ctx['title'], (char)' ');
              def title_suggest = [];
              // For each start word i, emit every shingle that begins at i.
              for (def i = 0; i < title_words.length; i++) {
                def shingle = title_words[i];
                title_suggest.add(shingle);
                for (def j = i + 1; j < title_words.length; j++) {
                  shingle = shingle + ' ' + title_words[j];
                  title_suggest.add(shingle);
                }
              }
              ctx['title_suggest'] = title_suggest;
              
            """
      }
    }
  ]
}
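
To check the pipeline before indexing real documents, the simulate API can be used. This is a minimal sketch, assuming the pipeline was stored under the name shingle as above; the sample document is made up for illustration:

POST _ingest/pipeline/shingle/_simulate
{
  "docs": [
    {
      "_source": {
        "title": "Senior Business Developer"
      }
    }
  ]
}

With the script as written, the title_suggest field in the simulated document should contain "Senior", "Senior Business", "Senior Business Developer", "Business", "Business Developer" and "Developer". If only the trailing shingles from the question are wanted, the inner loop would need to be changed so that each start word is joined with all remaining words before being added.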