Sphinx: correctly match a JSON array that contains all given values

I'm trying to match rows whose JSON array contains all of the given elements.

Example search items:

['gym', 'sofa']

Expected rows matched:

['gym', 'sofa']
['gym', 'sofa', 'bed']
['pool', 'gym', 'sofa']
['pool', 'gym', 'sofa', 'bed']

Should not match:

['pool', 'gym', 'bed']
['pool', 'sofa', 'bed']

I store the items as JSON text in a column that is indexed as a JSON attribute.

Table example:

ID  ITEMS
1   {'items': ['gym', 'sofa']}
2   {'items': ['gym', 'sofa', 'bed']}
3   {'items': ['pool', 'gym', 'bed']}
4   {'items': ['pool', 'sofa', 'bed']}

My sphinx.conf is something like:

source srcItems
{
    type            = mysql

    sql_host        = localhost
    sql_user        = root
    sql_pass        =
    sql_db          = items
    sql_port        = 3306    # optional, default is 3306

    sql_query        = \
        SELECT id, items \
        FROM items

    sql_attr_json        = items
}

index items
{
    source          = srcItems
    path            = /opt/local/var/sphinx/data/items
}

indexer
{
    mem_limit        = 128M
}

searchd
{
    listen          = 9312
    listen          = 9306:mysql41
    log             = /opt/local/var/sphinx/log/searchd.log
    query_log       = /opt/local/var/sphinx/log/query.log
    read_timeout    = 5
    max_children    = 30
    pid_file        = /opt/local/var/sphinx/log/searchd.pid
    max_matches     = 1000
    seamless_rotate = 1
    preopen_indexes = 1
    unlink_old      = 1
    workers         = threads # for RT to work
    binlog_path     = /opt/local/var/sphinx/data
}

I tried the following queries, which return no rows:

SELECT id, 
    ALL(var='gym' AND var='sofa' FOR var IN items.items) as i 
FROM items 
WHERE i=1;

SELECT id, 
    ANY(var='gym' AND var='sofa' FOR var IN items.items) as i 
FROM items 
WHERE i=1;

I also tried the following, which return the wrong rows:

SELECT id, 
    ALL(var='gym' OR var='sofa' FOR var IN items.items) as i 
FROM items 
WHERE i=1;

SELECT id, 
    ANY(var='gym' OR var='sofa' FOR var IN items.items) as i 
FROM items 
WHERE i=1;
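
The behaviour of these four attempts can be reproduced with a small Python model. It is only a sketch: it assumes that ANY() and ALL() evaluate the condition against one array element at a time, and the `rows` data and helper names are illustrative, not part of Sphinx.

```python
# Toy model of ANY()/ALL() over a JSON array: the condition is evaluated
# against a single element (var) at a time.
rows = {
    1: ["gym", "sofa"],
    2: ["gym", "sofa", "bed"],
    3: ["pool", "gym", "bed"],
    4: ["pool", "sofa", "bed"],
}


def matching_ids(quantifier, condition):
    """Return ids of rows where quantifier(condition(var) for var in items) holds."""
    return [rid for rid, items in rows.items()
            if quantifier(condition(var) for var in items)]


def both(var):
    # A single element can never equal 'gym' and 'sofa' at the same time.
    return var == "gym" and var == "sofa"


def either(var):
    return var == "gym" or var == "sofa"


all_and = matching_ids(all, both)    # ALL(var='gym' AND var='sofa' ...)
any_and = matching_ids(any, both)    # ANY(var='gym' AND var='sofa' ...)
all_or = matching_ids(all, either)   # ALL(var='gym' OR var='sofa' ...)
any_or = matching_ids(any, either)   # ANY(var='gym' OR var='sofa' ...)

# What is actually wanted: one containment test per searched value.
wanted = [rid for rid, items in rows.items()
          if all(value in items for value in ("gym", "sofa"))]

print(all_and, any_and, all_or, any_or, wanted)
```

In this model the AND variants match nothing, the OR variants match either too few or too many rows, and only a separate containment test per value selects rows 1 and 2.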

I got the expected results when I did:

SELECT id, 
    IN(items.items, 'gym') AS gym, 
    IN(items.items, 'sofa') AS sofa 
FROM items 
WHERE gym = 1 AND sofa = 1 

However, it slows the query down substantially and makes building the query a lot more complicated.

What am I doing wrong?

What is the correct way to do this query in Sphinx?

There are 2 answers

barryhunter On

I think

SELECT id FROM items  
  WHERE items.items IN('gym')
  AND items.items IN('sofa')

might work. If not, try

SELECT id, 
    IN(items.items, 'gym')+IN(items.items, 'sofa') AS i
FROM items 
WHERE i = 2 

otherwise

SELECT id, 
   ANY(var='gym' FOR var IN items.items)+ANY(var='sofa' FOR var IN items.items) as i 
FROM items 
WHERE i=2;
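
Since building the query by hand was a concern, here is a minimal Python sketch that generates the summed IN() form for any list of values. The function name `build_all_items_query` is hypothetical, the index name `items` is taken from the posted sphinx.conf, and the backslash escaping of quotes is an assumption that should be checked against your Sphinx version.

```python
def build_all_items_query(index, attr, values):
    """Build a SphinxQL query matching rows whose JSON array holds every value."""
    if not values:
        raise ValueError("at least one value is required")

    def quote(value):
        # Assumption: backslashes and single quotes are backslash-escaped.
        return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"

    # One IN() test per value; each contributes 1 when the value is present.
    terms = "+".join(f"IN({attr}, {quote(v)})" for v in values)
    # A row matches only if every test succeeded, i.e. the sum equals the count.
    return f"SELECT id, {terms} AS i FROM {index} WHERE i = {len(values)}"


query = build_all_items_query("items", "items.items", ["gym", "sofa"])
print(query)
```

The generated string has the same shape as the second query above, so adding a third item only means passing a longer list.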

mumia On

Ended up using:

SELECT id, 
    IN(items.items, 'gym') AS gym, 
    IN(items.items, 'sofa') AS sofa 
FROM items 
WHERE gym = 1 AND sofa = 1

It works fine; the extra query time isn't much of an issue and can be mitigated with better hardware.

Thanks