I have a large index definition that takes too long to index. I suspect the main problem is caused by the many LEFT OUTER JOINs generated.
I saw this question, but can't find documentation about using source: :query
, which seems to be part of the solution.
My index definition and the resulting query can be found here: https://gist.github.com/jonsgold/fdd7660bf8bc98897612
How can I optimize the generated query to run faster during indexing?
This is the solution that worked best (from the linked question). Basically, you can remove a piece of the main query
sql_query
and define it separately as asql_joined_field
in thesphinx.conf
file.It's important to add all relevant sql conditions to each
sql_joined_field
(such as sharding indexes by modulo on the ID). Here's the new definition:The magic that defines the field
site
as a separate query is the optionsource: :query
at the end of the line.Notice the core index definition has the parameter
delta?: false
, while the delta index definition has the parameterdelta?: true
. That's so I could use the conditionWHERE incidents.delta = 1
in the delta index and filter out irrelevant records.I found sharding didn't perform any better, so I reverted to one unified index.
See the whole index definition here: https://gist.github.com/jonsgold/05e2aea640320ee9d8b2.
Important to remember!
The Sphinx document ID offset must be handled manually. That is, whenever an index for another model is added or removed, my calculated document ID will change. This must be updated.
So, in my example, if I added an index for a different model (not
:incident
), I would have to runrake ts:configure
to find out my new offset and changeincidents.id * 51 + 7
accordingly.