Elasticsearch sorting only by alphabetical not by numeric

726 views Asked by At

I am having problem with sorting in PHP, here is my mapping:

{
  "jdbc": {
    "mappings": {
      "jdbc": {
        "properties": { 
          "admitted_date": {
            "type": "date",
            "format": "dateOptionalTime"
          },
          "et_tax": {
            "type": "string"
          },  
          "jt_tax": {
            "type": "string"
          }, 
          "loc_cityname": {
            "type": "string"
          }, 
          "location_countryname": {
            "type": "string"
          },
          "location_primary": {
            "type": "string"
          },  
          "pd_firstName": {
            "type": "string"
          } 
        }
      }
    }
  }
}

When I use order the result by sort, it will order the results with alphanumeric, it will load the results with numeric as first. I need to order the results only starting letter alphabets. Now it orders like this:

http://localhost:9200/jdbc/_search?pretty=true&sort=pd_lawFirmName:asc

  1. BM&A
  2. Gomez-Acebo & Pombo
  3. Addleshaw Goddard

How to order the results like this?

  1. Addleshaw Goddard
  2. BM&A
  3. Gomez-Acebo & Pombo

Here is my query i using for indexing

{
    "type" : "jdbc",
    "jdbc" : {
        "driver" : "com.mysql.jdbc.Driver",
        "url" : "jdbc:mysql://localhost:3306/dbname",
        "user" : "user",
        "password" : "pass",
        "sql" : "SQL QUERY",
        "poll" : "24h",
        "strategy" : "simple", 
        "scale" : 0,
        "autocommit" : true,
        "bulk_size" : 5000,
        "max_bulk_requests" : 30,
        "bulk_flush_interval" : "5s",
        "fetchsize" : 100,
        "max_rows" : 149669,
        "max_retries" : 3,
        "max_retries_wait" : "10s",
        "locale" : "in",
        "digesting" : true,
        "mappings": {
        "sorting": {
        "properties": { 
        "pd_lawFirmName": {
        "type": "string",
        "fields": {
          "raw": {
            "type": "string",
            "index": "not_analyzed"
          }
        }
      }
      }
    }
  }
  }
}
1

There are 1 answers

6
Andrei Stefan On

This is like that because Elasticsearch will tokenize the text using the default analyzer, which is standard. For example, McDermott Will Amery is indexed like:

              "amery",
              "mcdermott",
              "will"

If you want to sort like that, I would suggest to change the mapping of your pd_lawFirmName in something like this:

  "pd_lawFirmName": {
    "type": "string",
    "fields": {
      "raw": {
        "type": "string",
        "index": "not_analyzed"
      }
    }
  }

and sort by the raw subfield:

http://localhost:9200/jdbc/_search?pretty=true&sort=pd_lawFirmName.raw:asc