Using a field instead of "_id" for a more-like-this query


I have a slug field that I want to use, instead of the "_id" field, to identify the document that serves as the reference for a more_like_this query. But instead of treating it as a reference, the "doc" clause seems to use it as query text to compare against. Since slug is a unique field with a simple analyzer, the query returns exactly one result, like the following. As far as I know, there is no way to use a custom field as the _id field: https://github.com/elastic/elasticsearch/issues/6730

So is a double lookup (first finding Elasticsearch's _id for the slug, then running more_like_this against that _id) the only way to achieve what I am looking for? Someone seems to have asked a similar question three years ago, but it has no answer.

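from elasticsearch_dsl import Q

# "doc" supplies an artificial document rather than a reference to an indexed one
ArticleDocument.search().query(
    "bool",
    should=Q(
        "more_like_this",
        fields=["slug", "text"],
        like={
            "doc": {"slug": "OEXxySDEPWaUfgTT54QvBg"},
            "_index": "article",
            "_type": "doc",
        },
        min_doc_freq=1,
        min_term_freq=1,
    ),
).to_queryset()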

Returns:

<ArticleQuerySet [<Article: OEXxySDEPWaUfgTT54QvBg)>]>
1
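
For illustration, the double lookup described in the question could be sketched as below. This is a sketch under assumptions, not a confirmed solution: `search` is a hypothetical callable standing in for whatever client call executes a search body, the helper names are made up for this example, and the match query with operator "and" is only one guess at how to hit a slug indexed with the simple analyzer (a keyword field with a term query would be the more usual choice). The `like` entry references the document by `_index` and `_id`; clusters that still use mapping types may also need a `_type` key.

```python
from typing import Any, Callable, Dict, List, Optional

# Hypothetical signature: (index, body) -> raw search response as a dict.
SearchFn = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def slug_lookup_body(slug: str) -> Dict[str, Any]:
    """Search body for step 1: find the document that carries this slug."""
    return {
        "size": 1,
        "_source": False,  # only the hit's _id is needed
        "query": {"match": {"slug": {"query": slug, "operator": "and"}}},
    }


def mlt_by_id_body(index: str, doc_id: str, fields: List[str]) -> Dict[str, Any]:
    """Search body for step 2: more_like_this referencing an indexed document."""
    return {
        "query": {
            "more_like_this": {
                "fields": fields,
                "like": [{"_index": index, "_id": doc_id}],
                "min_term_freq": 1,
                "min_doc_freq": 1,
            }
        }
    }


def more_like_slug(
    search: SearchFn, index: str, slug: str, fields: List[str]
) -> Optional[Dict[str, Any]]:
    """Resolve the slug to an _id, then run more_like_this against that _id."""
    hits = search(index, slug_lookup_body(slug))["hits"]["hits"]
    if not hits:
        return None  # no document with this slug
    return search(index, mlt_by_id_body(index, hits[0]["_id"], fields))
```

Passing only "text" in `fields` keeps the unique slug itself out of the similarity comparison.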

There is 1 answer

dejanmarich

You can use one of your document's fields as the _id when ingesting the data, so that no separate lookup is needed later.

Logstash

output {
    elasticsearch {
        hosts => ["localhost:9200"]
        index => "my_name"
        document_id => "%{some_field_id}"
    }
}

Spark (Scala)

DF.saveToEs("index_name" + "/some_type", Map("es.mapping.id" -> "some_field_id"))

Index API (the _id is the last segment of the request path, here 1)

PUT twitter/_doc/1
{
    "user" : "kimchy",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elasticsearch"
}

Response:

{
    "_shards" : {
        "total" : 2,
        "failed" : 0,
        "successful" : 2
    },
    "_index" : "twitter",
    "_type" : "_doc",
    "_id" : "1",
    "_version" : 1,
    "_seq_no" : 0,
    "_primary_term" : 1,
    "result" : "created"
}
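
Once the slug is stored as the _id, more_like_this can reference the document directly, with no prior lookup. A minimal Python sketch under stated assumptions: the helper names are hypothetical, documents are plain dicts with a "slug" key, and the action dicts follow the `_index` / `_id` / `_source` shape that the bulk helper of the official Python client accepts. Clusters that still use mapping types may need a `_type` key as well.

```python
from typing import Any, Dict, Iterable, Iterator, List


def actions_with_slug_id(
    index: str, docs: Iterable[Dict[str, Any]]
) -> Iterator[Dict[str, Any]]:
    """Yield bulk actions whose _id is taken from each document's slug."""
    for doc in docs:
        yield {"_index": index, "_id": doc["slug"], "_source": doc}


def mlt_like_slug(index: str, slug: str, fields: List[str]) -> Dict[str, Any]:
    """Build a more_like_this body that points at the document by its slug,
    which works only if the slug was used as the _id at index time."""
    return {
        "query": {
            "more_like_this": {
                "fields": fields,
                "like": [{"_index": index, "_id": slug}],
                "min_term_freq": 1,
                "min_doc_freq": 1,
            }
        }
    }
```

Existing documents keep their old ids, so an index populated earlier would have to be reindexed before the slug can be used this way.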