Is it possible to transform the data before retrieving it from MongoDB?

Let's say I have a collection with only one field, BlogText. When a user searches for a word and that word is present in BlogText, I want to:

  1. Retrieve only the 10 words before and the 10 words after the matched word, preceded and followed by an ellipsis.
  2. Replace the matched word with <b>matched word</b>.

For example, if the search query is 1500, I want to retrieve the following:

... has been the industry's standard dummy text ever since the <b>1500</b>s, when an unknown printer took a galley of type and ...

given that the original text in BlogText is:

Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.

I know this can be done on the server as well, but I want to avoid retrieving data that I don't need (referring to the 1st point).
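
To pin down the expected output, the two requirements can be sketched in plain JavaScript. This is only a reference for what the result should look like, not a MongoDB-side solution. `makeSnippet` and its parameters are hypothetical names; the sketch assumes whitespace-separated words, handles only the first occurrence, and always emits both ellipses.

```javascript
// Hypothetical reference helper: shows the desired output, runs outside MongoDB.
function makeSnippet(text, term, wordsAround = 10) {
  const pos = text.indexOf(term); // first occurrence only
  if (pos === -1) return null;    // no match, no snippet

  const head = text.slice(0, pos);
  const tail = text.slice(pos + term.length);

  // Parts of the matched word that surround the term, e.g. the "s," in "1500s,".
  const prefix = head.match(/\S*$/)[0];
  const suffix = tail.match(/^\S*/)[0];

  // Whole words strictly before and after the matched word.
  const before = head
    .slice(0, head.length - prefix.length)
    .split(/\s+/)
    .filter(Boolean)
    .slice(-wordsAround);
  const after = tail
    .slice(suffix.length)
    .split(/\s+/)
    .filter(Boolean)
    .slice(0, wordsAround);

  const matched = prefix + "<b>" + term + "</b>" + suffix;
  return ["...", ...before, matched, ...after, "..."].join(" ");
}

const sample =
  "Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, " +
  "when an unknown printer took a galley of type and scrambled it to make a type specimen book.";
console.log(makeSnippet(sample, "1500"));
// ... has been the industry's standard dummy text ever since the <b>1500</b>s, when an unknown printer took a galley of type and ...
```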

There is 1 answer

Alex Blex On

You can return a substring of a long text using aggregation.

Assuming you need a substring around the first occurrence of the matched term, and that a single space is used as the word delimiter, the pipeline can look like this:

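db.collection.aggregate([
    // Keep only documents whose BlogText contains the search term.
    { $match: { BlogText: /1500/ } },
    { $project: {
        match: {
            $let: {
                // pos: code point index of the first occurrence of the term.
                vars: { pos: { $indexOfCP: [ "$BlogText", "1500" ] }},
                in: { $concat: [
                    // Text before the term: split on spaces, keep the last 10 pieces, rejoin.
                    { $reduce: {
                        input: { $slice: [
                            { $split: [
                                { $substrCP: [ "$BlogText", 0, "$$pos" ] },
                                " "
                            ]},
                            -10
                        ]},
                        initialValue: "",
                        in: { $concat : [ "$$value", " ", "$$this" ] }
                    }},
                    // Text from the term onwards: split on spaces, keep the first 10 pieces, rejoin.
                    { $reduce: {
                        input: { $slice: [
                            { $split: [
                                { $substrCP: [ "$BlogText", "$$pos", { $strLenCP: "$BlogText" } ] },
                                " "
                            ]},
                            10
                        ]},
                        initialValue: "",
                        in: { $concat : [ "$$value", " ", "$$this" ] }
                    }}
                ]}
            }
        }
    }}
]);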
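
The pipeline above returns the surrounding words but adds neither the ellipses nor the <b> tags. A possible extension, as a sketch only: the helper below builds a similar pipeline for an arbitrary term. `buildSnippetPipeline` and `wordsAround` are hypothetical names, not part of MongoDB. It still assumes a single space as the word delimiter and only the first occurrence, and it does not HTML-escape the stored text.

```javascript
// Hypothetical helper: builds a snippet pipeline for any search term.
function buildSnippetPipeline(term, wordsAround = 10) {
  // Escape regex metacharacters so the term is matched literally.
  const escaped = term.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // $literal keeps a term that starts with "$" from being read as a field path.
  const lit = { $literal: term };
  // Length in code points, to line up with $indexOfCP / $substrCP.
  const termLen = [...term].length;

  // Splits a substring of BlogText on spaces.
  const words = (start, count) => ({
    $split: [{ $substrCP: ["$BlogText", start, count] }, " "],
  });
  // Rejoins the pieces, putting the space before or after each piece.
  const join = (input, spaceFirst) => ({
    $reduce: {
      input,
      initialValue: "",
      in: {
        $concat: spaceFirst
          ? ["$$value", " ", "$$this"]
          : ["$$value", "$$this", " "],
      },
    },
  });

  const textLen = { $strLenCP: "$BlogText" };
  const afterStart = { $add: ["$$pos", termLen] };

  return [
    { $match: { BlogText: { $regex: escaped } } },
    { $project: { match: { $let: {
      vars: { pos: { $indexOfCP: ["$BlogText", lit] } },
      in: { $concat: [
        "...",
        // wordsAround words plus the (possibly empty) start of the matched word.
        join({ $slice: [words(0, "$$pos"), -(wordsAround + 1)] }, true),
        "<b>", lit, "</b>",
        // The (possibly empty) rest of the matched word plus wordsAround words.
        join({ $slice: [words(afterStart, textLen), wordsAround + 1] }, false),
        "...",
      ] },
    } } } },
  ];
}
```

It would be used as db.collection.aggregate(buildSnippetPipeline("1500")). The extra slice element on each side accounts for the fragment of the matched word around the term, such as the "s," in "1500s,".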