I am trying to query the windows desktop search API using SQL.
I have to say I really HATE the windows 7 search GUI, and so I decided to write my own. I have a lot of files indexed (approx 1.000.000), and I want to do a search for names. Something like: Show me every name which contains "bunny".
But here i run into a performance problem. Searching for
SELECT "System.ItemPathDisplay"
FROM "SystemIndex"
WHERE System.FileName LIKE 'egon%'
is really fast. Also the %egon
alternative. But %egon%
takes forever. I am not sure if it is in the nature of the index (I understand that the possibilities increase enormously) or if I am doing something wrong.
The question is:
- Is it correct that the windows index is only a big SQL database?
- If so, where can I find exact information about the structure of the DB (primary keys, indexes).
If I have that, its basically just optimizing SQL.
Alternative question: Does anybody knows a fast SQL statement to find all files with egon somewhere in the name.
Edit: Why I do not like the search GUI
Well, its just not intuitive, compared to XP. If you disable the dog and use the old XP interface, I could create a search query like:
- All files older than 1 month
- bigger than 10 MB
- name pattern
*_homework_*.docx
Try this in Windows 7 without "learning" the syntax. And hell, I do not want to learn another syntax just to find one file.
The other main problem are maybe my search habits. Most of the time I somehow know the file name (or parts) and simply want the location. And if you use the search this way you ran into several problem:
- First of all, you always have to prefix it with name:
- Then the folder name layout is stupid (it is ordering by parent folder, not full path, I think, because.. tada... see next point)
- Then, even more annoying, if you have a list of results and you try to sort them, it takes forever
And now I really think my system has a bug. I tried to quickly check it, searched in some average size folder for "test" and he found some files. Then I tried to sort them for folders (to verify my second point) and now he is just searching forever... I mean really, while I am typing he tries to find the word "hello"... oh, finished - he found approx 20 files. So, now, lets try something.... Ok, now it seems like he has recovered.. But still, to slow for my taste...
So, enough cursing about search :-)
It looks like they're building an index on the name, so it can use the index as long as you've specified the beginning of the string, but if you haven't, it has to use a table scan.
Assuming they're using Microsoft's full-text search engine, then try using something like:
... WHERE system.filename CONTAINS 'egon'
There are basically two choices: it'll be rejected as invalid (i.e. this SQL interface doesn't support their F-T search extension) or else it'll be quite a bit faster.
EDIT:Oops -- the syntax should be "contains(system.filename, 'egon')". Sorry 'bout that.