thinking sphinx scope yielding unexpected results from range input

63 views Asked by At

I have an active record scope that I am attempting to duplicate in a sphinx_scope:

scope :active, -> (date) { where("DATE(?) BETWEEN active_date AND hide_date-1 ", date) } # between is inclusive

This is close enough for government work, unless someone could show me a better approach:

sphinx_scope(:search_active) { |today|
  {:with => {:active_date => 100.years.ago.to_i..today.to_s.to_time.to_i, :hide_date => today.to_s.to_time.to_i..100.years.from_now.to_i}
  }

(yes, today.to_s.to_time.to_i is a little awkward...)

My problem is that the results don't seem to be accurate. For example, the query without the scope yields

sphinxQL> SELECT weight(),* FROM `standard_core` WHERE MATCH('1910.15') AND `sphinx_deleted` = 0 LIMIT 0, 10 OPTION field_weights=(section_number=2); 
+----------+------+----------------+--------------------+-------------+-----------+------------+------------+-----------------------+-----------------------------------------------------------------------------------+---------------------+
| weight() | id   | sphinx_deleted | sphinx_internal_id | active_date | hide_date | created_at | updated_at | sphinx_internal_class | title_sort                                                                        | section_number_sort |
+----------+------+----------------+--------------------+-------------+-----------+------------+------------+-----------------------+-----------------------------------------------------------------------------------+---------------------+
|     8557 | 3633 |              0 |                908 |  1436936400 | 297642704 | 1451164539 | 1451164539 | Standard              | § 1910.15 Shipyard employment.                                                    | 1910.15             |
|     6549 | 3637 |              0 |                909 |  1436936400 | 297642704 | 1451164539 | 1451164539 | Standard              | § 1910.15(a) Adoption and extension of established safety and health...           | 1910.15(a)          |
|     6549 | 3641 |              0 |                910 |  1436936400 | 297642704 | 1451164539 | 1451164539 | Standard              | § 1910.15(b) Definitions. For purposes of this section:                           | 1910.15(b)          |

But with the scope, the most relevant results are missing:

sphinxQL> SELECT weight() as weight,* FROM `standard_core` WHERE MATCH('1910.15') AND `active_date` BETWEEN -1672108252 AND 1482127200 AND `hide_date` BETWEEN 1482127200 AND 4639325348 AND `sphinx_deleted` = 0 ORDER BY weight DESC LIMIT 0, 10 OPTION field_weights=(section_number=2);
+--------+------+----------------+--------------------+-------------+------------+------------+------------+-----------------------+-----------------------------------------------------------------------------------+---------------------+
| weight | id   | sphinx_deleted | sphinx_internal_id | active_date | hide_date  | created_at | updated_at | sphinx_internal_class | title_sort                                                                        | section_number_sort |
+--------+------+----------------+--------------------+-------------+------------+------------+------------+-----------------------+-----------------------------------------------------------------------------------+---------------------+
|   4566 | 5469 |              0 |               1367 |  1436936400 | 1484632800 | 1451167759 | 1451167759 | Standard              | § 1910.27(d)(1)(vi) Ladder wells shall have a clear width of at least 15...       | 1910.27(d)(1)(vi)   |
|   4549 | 5413 |              0 |               1353 |  1436936400 | 1484632800 | 1451167757 | 1451167757 | Standard              | § 1910.27(c)(2) Ladders without cages or wells. A clear width of at least 15...   | 1910.27(c)(2)       |
|   4549 | 5453 |              0 |               1363 |  1436936400 | 1484632800 | 1451167758 | 1451167758 | Standard              | § 1910.27(d)(1)(ii) Cages or wells (except as provided in subparagraph (5) of...  | 1910.27(d)(1)(ii)   |

I don't think this is actually a thinking sphinx error, but rather something with sphinx itself. Or, more likely... something I'm misunderstanding :p

Can anyone shed some light on what's going on here?

[edit] ---------------------------------------------------------

OK, with @DGM, I discovered that the values for hide_date are not correct in the sphinx database record. For the record id 3633, the mysql database record 908 has a date value of 2115-07-15.

'2115-07-15'.to_time.to_i
=> 4592610000 

Clearly, there is a bit of discrepancy between 297642704 and 4592610000.

So I'm either figuring the range incorrectly or the sphinx database is being populated with an error.

indices/standard_index.rb

ThinkingSphinx::Index.define :standard, :with => :real_time do
  # fields
  indexes title, :sortable => true
  indexes content
  indexes section_number, :sortable => true

  # attributes
  has active_date, :type => :timestamp
  has hide_date,   :type => :timestamp
  has created_at,  :type => :timestamp
  has updated_at,  :type => :timestamp
end

Is this losing something in the translation between a mysql date field and a sphinx timestamp?

2

There are 2 answers

3
pat On BEST ANSWER

I'm not entirely sure what the problem is here, but some thoughts:

If you're storing those dates as string values, then that's what will be passed through to Sphinx. The :type settings are only used for generating the Sphinx configuration file, not for translating attribute values (though certainly, there's a strong argument they should do both!). Of course, that doesn't quite explain how 2115-07-15 becomes 297642704, but could be part of what's at play. Again, only if the model's returning hide_date as a String rather than a Date.

A separate issue is that Sphinx stores timestamps as unsigned integers, so anything before 1st Jan 1970 should be avoided.

Not a solution to the problem, but perhaps to the overarching problem: I would recommend using integer representations of the dates themselves in Sphinx. e.g. 2115-07-15 becomes 21150715.

So, something like the following in your model:

def hide_date_to_i
  hide_date.to_s.gsub('-', '').to_i
end

And then in the index definition:

has hide_date_to_i, :as => :hide_date, :type => :integer

You'll need to update your scope accordingly as well.

Hopefully that'll get things working accordingly, but if not, should at least make the Sphinx values easier to debug!

1
DGM On

Your hide_date data appears to be too low compared to the other time ranges: 297642704