Delete documents from Solr index - index is not touched


I'm trying to understand why my Solr index is not touched at all when I delete every document in it!

So far I've tried sending a delete query directly to Solr:

curl 'http://localhost:8080/solr/update?stream.body=<delete><query>*:*</query></delete>&commit=true'
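For reference, here is a minimal Python sketch that builds the same GET URL with the query-string values properly percent-encoded (the curl command above sends `<`, `>`, `*` and `:` raw). It assumes the server at http://localhost:8080/solr accepts the stream.body parameter, which depends on the Solr version and configuration. `build_stream_body_url` is a hypothetical helper, and it only builds the URL; nothing is sent to Solr.

```python
from urllib.parse import urlencode


def build_stream_body_url(base_url, delete_xml, commit=True):
    """Build a GET /update URL carrying an XML delete command in stream.body.

    Hypothetical helper: parameters are passed as an ordered list of pairs,
    so the caller controls their order in the query string.
    """
    params = [("stream.body", delete_xml)]
    if commit:
        params.append(("commit", "true"))
    # urlencode percent-encodes '<', '>', '*', ':' and '/' inside the XML
    return base_url.rstrip("/") + "/update?" + urlencode(params)


url = build_stream_body_url(
    "http://localhost:8080/solr",
    "<delete><query>*:*</query></delete>",
)
print(url)
```

Passing pairs instead of a dict keeps the parameter order explicit, which makes it easy to try variations such as putting `commit` first.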

I've also tried pysolr:

from pysolr import Solr
conn = Solr('http://localhost:8080/solr/')
conn.delete(q='*:*')

Both commands produce the same output in catalina.log.

No matter how I try this (I even tried from the admin panel), the index still shows the same number of docs:

Num Docs: 323
Max Doc: 323
Version: 52
Segment Count: 1
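A more direct check than the admin page is to ask Solr for the match count of `*:*`. In Lucene terms, Num Docs excludes deleted documents, while Max Doc can still count them until their segments are merged, so after a committed delete-all the count below should be 0. This is a minimal sketch assuming the standard /select handler and the JSON response writer (`wt=json`), where the count sits under `response.numFound`. `count_query_url` and `num_found` are hypothetical helpers, and `sample` is an illustrative response body, not real output; no request is sent.

```python
import json
from urllib.parse import urlencode


def count_query_url(base_url, q="*:*"):
    """Build a /select URL that returns only the hit count (rows=0)."""
    params = [("q", q), ("rows", "0"), ("wt", "json")]
    return base_url.rstrip("/") + "/select?" + urlencode(params)


def num_found(response_text):
    """Extract response.numFound from a Solr JSON response body."""
    return json.loads(response_text)["response"]["numFound"]


# Illustrative body shaped like Solr's JSON output (not captured from a server)
sample = ('{"responseHeader": {"status": 0}, '
          '"response": {"numFound": 323, "start": 0, "docs": []}}')

print(count_query_url("http://localhost:8080/solr"))
print(num_found(sample))
```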

At first I thought it was a permission issue with my solr/data folder, but it was not. I also commented out the cache settings in my solrconfig.xml; the result was the same.

It would be great if anyone has any tips!


Later edit:

Every time I run the above commands, the only files modified are the ones with the later timestamp, all of them in the spellchecker directory. You can also see that running the commands manually changes the file ownership from www-data to root:

data/index:
total 2112
-rw-r--r-- 1 www-data root 1268535 2012-08-10 13:41 _f.fdt
-rw-r--r-- 1 www-data root    2618 2012-08-10 13:41 _f.fdx
-rw-r--r-- 1 www-data root    1135 2012-08-10 13:41 _f.fnm
-rw-r--r-- 1 www-data root  201513 2012-08-10 13:41 _f_Lucene40_0.frq
-rw-r--r-- 1 www-data root  207400 2012-08-10 13:41 _f_Lucene40_0.prx
-rw-r--r-- 1 www-data root  419705 2012-08-10 13:41 _f_Lucene40_0.tim
-rw-r--r-- 1 www-data root   11199 2012-08-10 13:41 _f_Lucene40_0.tip
-rw-r--r-- 1 www-data root     245 2012-08-10 13:41 _f_nrm.cfe
-rw-r--r-- 1 www-data root    2751 2012-08-10 13:41 _f_nrm.cfs
-rw-r--r-- 1 www-data root     382 2012-08-10 13:41 _f.si
-rw-r--r-- 1 www-data root      20 2012-08-10 13:41 segments.gen
-rw-r--r-- 1 www-data root      98 2012-08-10 13:41 segments_h
-rw-r--r-- 1 root     root       0 2012-08-10 13:55 write.lock

data/spellchecker:
total 792
-rw-r--r-- 1 root root 129251 2012-08-10 14:16 _q.fdt
-rw-r--r-- 1 root root  84282 2012-08-10 14:16 _q.fdx
-rw-r--r-- 1 root root   1119 2012-08-10 14:16 _q.fnm
-rw-r--r-- 1 root root 288855 2012-08-10 14:16 _q_Lucene40_0.frq
-rw-r--r-- 1 root root 257208 2012-08-10 14:16 _q_Lucene40_0.tim
-rw-r--r-- 1 root root   9355 2012-08-10 14:16 _q_Lucene40_0.tip
-rw-r--r-- 1 root root    306 2012-08-10 14:16 _q.si
-rw-r--r-- 1 root root     69 2012-08-10 14:16 segments_1p
-rw-r--r-- 1 root root     20 2012-08-10 14:16 segments.gen

data/tlog:
total 444
-rw-r--r-- 1 www-data root 363169 2012-08-10 12:11 tlog.0000000000000000019
-rw-r--r-- 1 www-data root  79280 2012-08-10 12:11 tlog.0000000000000000020
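Because ownership shifts from www-data to root when the commands are run manually, a quick sanity check is to list every file under the data directory that is not owned by the servlet container's user. This is a minimal sketch for Unix-like systems; the data directory path and the `www-data` user name are assumptions to adjust for your setup, and `files_not_owned_by` is a hypothetical helper.

```python
import os
import pwd


def files_not_owned_by(root, user):
    """Return sorted (path, owner) pairs under root whose owner is not `user`.

    Raises KeyError if `user` does not exist on this system.
    """
    expected_uid = pwd.getpwnam(user).pw_uid
    mismatches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            if st.st_uid != expected_uid:
                try:
                    owner = pwd.getpwuid(st.st_uid).pw_name
                except KeyError:  # uid with no passwd entry
                    owner = str(st.st_uid)
                mismatches.append((path, owner))
    return sorted(mismatches)


# Usage (path and user name are assumptions for this setup):
# for path, owner in files_not_owned_by("/path/to/solr/data", "www-data"):
#     print(owner, path)
```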

Here is what catalina.log shows (truncated) after running conn.delete(q='*:*'):

Aug 10, 2012 3:17:57 PM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: start commit{flags=0,version=0,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false}
Aug 10, 2012 3:17:57 PM org.apache.solr.search.SolrIndexSearcher <init>
INFO: Opening Searcher@1d4eeb5 main
Aug 10, 2012 3:17:57 PM org.apache.solr.core.QuerySenderListener newSearcher
INFO: QuerySenderListener sending requests to Searcher@1d4eeb5 main{StandardDirectoryReader(segments_h:52 _f(4.0):C323)}
Aug 10, 2012 3:17:57 PM org.apache.solr.core.QuerySenderListener newSearcher
INFO: QuerySenderListener done.
Aug 10, 2012 3:17:57 PM org.apache.solr.handler.component.SpellCheckComponent$SpellCheckerListener buildSpellIndex
INFO: Building spell index for spell checker: default
Aug 10, 2012 3:17:57 PM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: end_commit_flush
Aug 10, 2012 3:18:02 PM org.apache.solr.core.SolrCore registerSearcher
INFO: [collection1] Registered new searcher Searcher@1d4eeb5 main{StandardDirectoryReader(segments_h:52 _f(4.0):C323)}
Aug 10, 2012 3:18:02 PM org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: [collection1] webapp=/solr path=/update/ params={commit=true} {deleteByQuery=*:*,commit=} 0 5608
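To see exactly which request parameters and update operations reached Solr, the LogUpdateProcessor line can be split apart mechanically. This is a minimal sketch using a regular expression written against the single line shown above; whether other Solr versions use the same layout is an assumption, as is the reading of the two trailing numbers as a status code and elapsed milliseconds. `parse_update_log` is a hypothetical helper.

```python
import re

# Matches: path=... params={...} {...} <status> <elapsed>
LOG_RE = re.compile(
    r"path=(?P<path>\S+) params=\{(?P<params>[^}]*)\} "
    r"\{(?P<ops>[^}]*)\} (?P<status>\d+) (?P<elapsed>\d+)"
)


def parse_update_log(line):
    """Split a LogUpdateProcessor 'finish' line into its parts, or return None."""
    m = LOG_RE.search(line)
    if m is None:
        return None

    def to_dict(s):
        # "a=1,b=" -> {"a": "1", "b": ""}; assumes values contain no commas
        return dict(item.split("=", 1) for item in s.split(",") if item)

    return {
        "path": m.group("path"),
        "params": to_dict(m.group("params")),
        "ops": to_dict(m.group("ops")),
        "status": int(m.group("status")),    # assumed: status code
        "elapsed": int(m.group("elapsed")),  # assumed: milliseconds
    }


line = ("INFO: [collection1] webapp=/solr path=/update/ params={commit=true} "
        "{deleteByQuery=*:*,commit=} 0 5608")
print(parse_update_log(line))
```

This makes it easy to check whether `commit` arrived as a request parameter and which operations were recorded for the request.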

Second later edit:

I tried deleting by id and it works! So for some reason, deleting by q=*:* fails...
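For comparison, the two XML update messages differ only in the child element of `<delete>`. This is a minimal sketch that builds either one with the standard library's ElementTree, so the id or query text is escaped correctly. `delete_xml` is a hypothetical helper, and `some-id` is a placeholder, not an id from the question.

```python
import xml.etree.ElementTree as ET


def delete_xml(doc_id=None, query=None):
    """Build a Solr XML <delete> message by id or by query (exactly one)."""
    if (doc_id is None) == (query is None):
        raise ValueError("pass exactly one of doc_id or query")
    root = ET.Element("delete")
    if doc_id is not None:
        ET.SubElement(root, "id").text = str(doc_id)
    else:
        ET.SubElement(root, "query").text = query
    # encoding="unicode" returns a str without an XML declaration
    return ET.tostring(root, encoding="unicode")


print(delete_xml(doc_id="some-id"))  # placeholder id
print(delete_xml(query="*:*"))
```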


There are 3 answers

marius_5 (best answer)

I found out that the issue was in my schema.xml.

I rewrote it and now it works like a charm!

Chris

Give this a shot with curl; it has worked for me in the past:

curl 'http://localhost:8080/solr/update/?commit=true' -H "Content-Type: text/xml" --data-binary '<delete><query>*:*</query></delete>'
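The same POST can be prepared with Python's standard library. This is a minimal sketch mirroring the curl command, assuming the Solr URL from the question; `build_delete_request` is a hypothetical helper. The request is only constructed here: passing it to `urllib.request.urlopen` would actually send it and delete every matching document.

```python
from urllib.request import Request
from xml.sax.saxutils import escape


def build_delete_request(base_url, query="*:*"):
    """Prepare a POST /update/?commit=true carrying an XML delete-by-query."""
    # escape() guards against '<', '>' and '&' inside the query text
    body = "<delete><query>{}</query></delete>".format(escape(query))
    return Request(
        base_url.rstrip("/") + "/update/?commit=true",
        data=body.encode("utf-8"),
        headers={"Content-Type": "text/xml"},
        method="POST",
    )


req = build_delete_request("http://localhost:8080/solr")
# urllib.request.urlopen(req) would send it (this deletes all documents!)
```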

Hope it helps.

Paige Cook

I think you need to switch the order of your commit and stream.body parameters. Judging by the last entry in your catalina log, the commit value does not appear to be included with the delete query.

Try this:

 curl 'http://localhost:8080/solr/update?commit=true&stream.body=<delete><query>*:*</query></delete>'