How do I fix a slow search function?

I am looking into the search functionality of a Vue application built on Grove-Vue. The application uses MarkLogic's REST API endpoint /v1/search.

A two-word search (across about 2 million documents in about 10 collections) takes 14 seconds to execute. It takes 28 seconds if pageLength is set to 20, and 42 seconds for 30 records.

To isolate the problem, I ran the same two-word search with MarkLogic's search.search function in the Query Console, and it takes about the same time.

const search = require('/MarkLogic/appservices/search/search');
search.search("one two");

This rules out the UI layer as the source of the delay. There is negligible difference with or without options, so the options do not appear to be the problem either.

I looked at the query tuning documentation and added query-meters to the search.search call. I see this:

"elapsedTime": "PT14.190265S", 
"requests": 0, 
"listCacheHits": 53, 
"listCacheMisses": 18, 
"listSize": 73324, 
"inMemoryListHits": 0, 
"tripleCacheHits": 0, 
"tripleCacheMisses": 0, 
"tripleValueCacheHits": 0, 
"tripleValueCacheMisses": 0, 
"expandedTreeCacheHits": 19, 
"expandedTreeCacheMisses": 0, 
"compressedTreeCacheHits": 0, 
"compressedTreeCacheMisses": 0, 
"compressedTreeSize": 0, 
"inMemoryCompressedTreeHits": 0, 
"valueCacheHits": 35, 
"valueCacheMisses": 35, 
"regexpCacheHits": 23, 
"regexpCacheMisses": 8, 
"linkCacheHits": 0, 
"linkCacheMisses": 0, 
"filterHits": 19, 
"filterMisses": 0, 
"fragmentsAdded": 0, 
"fragmentsDeleted": 0, 
"fsProgramCacheHits": 0, 
"fsProgramCacheMisses": 0, 
"dbProgramCacheHits": 0, 
"dbProgramCacheMisses": 0, 
"envProgramCacheHits": 0, 
"envProgramCacheMisses": 0, 
"fsMainModuleSequenceCacheHits": 0, 
"fsMainModuleSequenceCacheMisses": 0, 
"dbMainModuleSequenceCacheHits": 0, 
"dbMainModuleSequenceCacheMisses": 0, 
"fsLibraryModuleCacheHits": 0, 
"fsLibraryModuleCacheMisses": 25, 
"dbLibraryModuleCacheHits": 0, 
"dbLibraryModuleCacheMisses": 2, 
"readLocks": 0, 
"writeLocks": 0, 
"lockTime": 0, 
"contemporaneousTimestampTime": 0, 
"compileTime": 0.210152, 
"commitTime": 0, 
"runTime": 0, 
"indexingTime": 0, 
"fsSchemaCacheHits": 0, 
"fsSchemaCacheMisses": 0, 
"dbSchemaCacheHits": 0, 
"dbSchemaCacheMisses": 0, 
"envSchemaCacheHits": 16184, 
"envSchemaCacheMisses": 0, 
"fragments": [
], 
...

I also used query-trace, and this is the output:

2024-01-22 16:13:07.072 Info: /MarkLogic/appservices/search/search-impl.xqy at 2666:37: impl:apply-search(map:map(<map:map xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" .../>), (), fn:false())
2024-01-22 16:13:07.072 Info: /MarkLogic/appservices/search/search-impl.xqy at 2666:37: Analyzing path for search: fn:collection()
2024-01-22 16:13:07.072 Info: /MarkLogic/appservices/search/search-impl.xqy at 2666:37: Step 1 is searchable: fn:collection()
2024-01-22 16:13:07.072 Info: /MarkLogic/appservices/search/search-impl.xqy at 2666:37: Path is fully searchable.
2024-01-22 16:13:07.072 Info: /MarkLogic/appservices/search/search-impl.xqy at 2666:37: Gathering constraints.
2024-01-22 16:13:07.072 Info: /MarkLogic/appservices/search/search-impl.xqy at 2666:37: Search query contributed 2 constraints: cts:and-query((cts:word-query("one", ("lang=en"), 1), cts:word-query("two", ("lang=en"), 1)), ())
2024-01-22 16:13:07.072 Info: /MarkLogic/appservices/search/search-impl.xqy at 2666:37: Executing search.
2024-01-22 16:13:07.077 Info: /MarkLogic/appservices/search/search-impl.xqy at 2666:37: Selected 24292 fragments to filter

I examined the query-trace output in ErrorLog.txt and found no unsearchable steps.

The query-meters output shows a fair number of cache misses. The documentation says that cache misses indicate the query might be optimizable, either by rewriting the parts of the query that miss so they make better use of the indexes, or by adding indexes the query can use. Because the application uses the MarkLogic REST API, I assume I cannot rewrite the query itself. Is there a way to optimize this query further? If not, what do I need to change in the indexes so that the query can use them?

I am assuming the cache misses are the only reason for the slow search. Any pointers from here?

The server specs (disk space, RAM, etc.) meet MarkLogic's recommendations.

EDIT: The options specify <search-option>unfiltered</search-option>, so you are right, the search is unfiltered. There is very little difference when re-running the search. The disk usage for all of MarkLogic (installation, forests, logs, etc.) is ~300G, so the documents are not very large. Search times with the suggested changes:

default: PT13.944414S

return-results false: PT0.233757S

return-results false and snippet = raw (instead of = apply): PT0.02006S

Just with snippet = raw (instead of = apply): PT12.063621S

So it does look like most of the time is spent generating snippets. Is there a way to speed that up?

These are the snippet settings being passed in the options:

  <transform-results apply="snippet">
    <preferred-matches>
      <element ns="http://marklogic.com/entity-services" name="instance"/>
    </preferred-matches>
    <max-matches>1</max-matches>
    <max-snippet-chars>1000</max-snippet-chars>
    <per-match-tokens>20</per-match-tokens>
  </transform-results>

  <return-query>1</return-query>

  <!-- This controls the snippet size toggle -->
  <operator name="results">
    <state name="compact">
      <transform-results apply="snippet">
        <preferred-matches>
          <element ns="http://marklogic.com/entity-services" name="instance"/>
          <json-property>instance</json-property>
        </preferred-matches>
        <max-matches>5</max-matches>
        <max-snippet-chars>2000</max-snippet-chars>
        <per-match-tokens>100</per-match-tokens>
      </transform-results>
    </state>
    <state name="detailed">
      <transform-results apply="snippet">
        <preferred-matches>
          <element ns="http://marklogic.com/entity-services" name="instance"/>
          <json-property>instance</json-property>
        </preferred-matches>
        <max-matches>5</max-matches>
        <max-snippet-chars>2000</max-snippet-chars>
        <per-match-tokens>100</per-match-tokens>
      </transform-results>
    </state>
  </operator>

EDIT2: Elapsed times when running the same function above repeatedly:

1-PT14.17
2-PT13.96
3-PT13.96
4-PT13.91

With <max-matches>1</max-matches>: PT13.96862S

Added <max-snippet-chars>200</max-snippet-chars>: PT13.97606S

Removed the per-match-tokens tag: PT13.977932S

Added back <per-match-tokens>10</per-match-tokens>: PT14.173813S

Mads Hansen On

It sounds as if the majority of the time is from reading the documents for the snippets.

If reading and processing 10 documents in the search results is taking 10-15 seconds, I'd be suspicious about the size of the documents and/or the capacity of the cluster.
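
As a rough check on that suspicion, the timings reported in the question can be broken down with a few lines of JavaScript. This is only a sketch of the arithmetic: the helper name is made up for illustration, the durations are copied from the question, and the page length of 10 for the 14-second run is an assumption (it is the REST API default, but the question does not state it).

```javascript
// Parse a seconds-only ISO 8601 duration such as "PT13.944414S" into seconds.
// (Hypothetical helper; it only handles the PT<seconds>S form seen in the question.)
function durationToSeconds(iso) {
  const match = /^PT(\d+(?:\.\d+)?)S$/.exec(iso);
  if (!match) throw new Error("Unsupported duration: " + iso);
  return parseFloat(match[1]);
}

// Timings copied from the question's first edit.
const timings = {
  full: durationToSeconds("PT13.944414S"),       // default options
  noResults: durationToSeconds("PT0.233757S"),   // return-results false
  rawResults: durationToSeconds("PT12.063621S"), // raw instead of snippet
};

// Time attributable to returning results at all, and to snippeting on top of raw output.
const resultCost = timings.full - timings.noResults;
const snippetOverRaw = timings.full - timings.rawResults;

// Page length vs. elapsed seconds from the question; the 10 is an assumed default.
const runs = [[10, 14], [20, 28], [30, 42]];
const secondsPerResult = runs.map(([pageLength, seconds]) => seconds / pageLength);

console.log("returning results:", resultCost.toFixed(2), "s");
console.log("snippet vs raw:", snippetOverRaw.toFixed(2), "s");
console.log("seconds per result:", secondsPerResult);
```

Under those assumptions the cost works out to about 1.4 seconds per returned result. Nearly all of the elapsed time (roughly 13.7 of 13.9 seconds) appears only once results are returned at all, while switching from snippets to raw output saves only about 1.9 seconds. That pattern points at reading the documents rather than at the index lookup.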

If the searches are fast, but processing the documents is slow(er than you would like), you could try separating the search from the snippeting.

For instance, you could use apply="empty-snippet", which returns the information needed to build the snippets yourself, and then decorate the results asynchronously after the search has returned.

https://docs.marklogic.com/guide/search-dev/query-options#id_58295

The apply="empty-snippet" option returns no result node, but does return an empty search:snippet element for each search:result. The search:result wrapper element does have the information (for example, the URI and path to the node) needed to access the node and perform your own transformation on the matching search node(s), so you can write your own code outside of the Search API to process the results.
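
A minimal client-side sketch of that split might look like the following, assuming the Vue app can call the REST API directly. Several things here are assumptions rather than anything from the question: the options name `no-snippet` is hypothetical (persisted query options that set apply="empty-snippet"), `fetchFn` stands in for whatever HTTP client the app uses, the response is assumed to be the JSON form of the /v1/search response with a `results` array whose entries carry a `uri`, and `makeSnippet` is a deliberately naive stand-in, not MarkLogic's snippeting algorithm.

```javascript
// Build the /v1/search URL for the fast first phase (no snippets generated).
// "no-snippet" is a hypothetical name for persisted options using apply="empty-snippet".
function buildSearchUrl(baseUrl, qtext, pageLength) {
  const params = new URLSearchParams({
    q: qtext,
    options: "no-snippet",
    format: "json",
    pageLength: String(pageLength),
  });
  return baseUrl + "/v1/search?" + params.toString();
}

// Naive snippet: a window of text around the earliest occurrence of any search term.
function makeSnippet(text, terms, windowChars) {
  const lower = text.toLowerCase();
  const hits = terms
    .map((term) => lower.indexOf(term.toLowerCase()))
    .filter((index) => index >= 0);
  if (hits.length === 0) return text.slice(0, windowChars);
  const start = Math.max(0, Math.min(...hits) - Math.floor(windowChars / 2));
  return text.slice(start, start + windowChars);
}

// Phase 1: run the search and hand the bare results to the UI right away.
// Phase 2: fetch each document and decorate its result as it arrives.
async function searchThenDecorate(fetchFn, baseUrl, qtext, onResults, onSnippet) {
  const response = await (await fetchFn(buildSearchUrl(baseUrl, qtext, 10))).json();
  const results = response.results || [];
  onResults(results);
  const terms = qtext.split(/\s+/).filter(Boolean);
  await Promise.all(results.map(async (result) => {
    const docUrl = baseUrl + "/v1/documents?uri=" + encodeURIComponent(result.uri);
    const text = await (await fetchFn(docUrl)).text();
    onSnippet(result.uri, makeSnippet(text, terms, 200));
  }));
}
```

Because the second phase starts only after the result list has been handed to the UI, slow document reads would delay the snippets but not the page itself.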