How to create an index using Whoosh

Question


Asked by user3422243 on 11 June 2015 at 11:04 (4k views)

I am trying to use Whoosh for text searching for the first time. I want to search for documents containing the word "XML". Because I am new to Whoosh, I started with a program that searches for a word in a single document, a text file (myRoko.txt):

import os
from whoosh import index
from whoosh.index import open_dir
from whoosh.fields import Schema, ID, TEXT
from whoosh.qparser import QueryParser

# Create the index directory on the first run
if not os.path.exists("indexdir3"):
    os.mkdir("indexdir3")

# "name" is stored so it is returned in hits; "content" is indexed but not stored
schema = Schema(name=ID(stored=True), content=TEXT)
ix = index.create_in("indexdir3", schema)
writer = ix.writer()
path = "myRoko.txt"

# Read the whole file and add it to the index as one document
with open(path, "r") as f:
    content = f.read()
    writer.add_document(name=path, content=content)

writer.commit()

# Reopen the index and search the "content" field for "XML"
ix = open_dir("indexdir3")
query_b = QueryParser('content', ix.schema).parse('XML')
with ix.searcher() as srch:
    res_b = srch.search(query_b)
    print(res_b[0])

The above code is supposed to print the document that contains the word "XML". However, it raises the following error:

    raise ValueError("%r is not unicode or sequence" % value)

    ValueError: 'A large number of documents are now represented and stored      
    as XML document on the web. Thus ................

What could be the cause of this error?

Original Q&A

There are 2 answers

**Assem** · Answer 1 · 2015-06-27T12:30:07+00:00

You have a Unicode problem. Whoosh expects unicode strings as field values, but under Python 2 open(...).read() returns a byte string (str). To fix it, open the text file so that its contents are decoded to unicode:

import codecs

# Decode the file as UTF-8 so that content is a unicode string
with codecs.open(path, "r", "utf-8") as f:
    content = f.read()

and use a unicode string for the file name:

path = u"myRoko.txt"

After these fixes, I got this result:

<Hit {'name': u'myRoko.txt'}>
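For comparison, io.open performs the same decoding step as codecs.open and behaves the same way on Python 2.6+ and Python 3. The minimal sketch below assumes the file is UTF-8; read_text and the sample file are hypothetical stand-ins for myRoko.txt. It contrasts the file's raw bytes with the decoded text that a Whoosh field value needs.

```python
import io
import os
import shutil
import tempfile


def read_text(path, encoding="utf-8"):
    """Read a whole file as unicode text with an explicit codec (hypothetical helper)."""
    with io.open(path, "r", encoding=encoding) as f:
        return f.read()


# Create a throwaway UTF-8 file standing in for myRoko.txt (assumption)
tmp_dir = tempfile.mkdtemp()
sample_path = os.path.join(tmp_dir, "sample.txt")
with io.open(sample_path, "w", encoding="utf-8") as f:
    f.write(u"A large number of documents are now stored as XML.")

# Raw bytes: on Python 2, open(path, "r").read() also returns a byte string
with io.open(sample_path, "rb") as f:
    raw = f.read()

# Decoded unicode text: the kind of value to pass as content= to add_document
text = read_text(sample_path)

shutil.rmtree(tmp_dir)  # remove the temporary directory
```

Only text (not raw) would be handed to writer.add_document in the question's script.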

**AudioBubble** · Answer 2 · 2016-09-07T17:33:15+00:00


writer.add_document(name=unicode(path), content=unicode(content))

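Both field values have to be unicode. Note that on Python 2, unicode(content) decodes with the default ASCII codec, so it raises UnicodeDecodeError if the file contains non-ASCII bytes; unicode(content, "utf-8") avoids this for a UTF-8 file.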
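A hypothetical to_text helper makes the decoding explicit instead of leaving it to the default codec that unicode() uses; UTF-8 is assumed as the file encoding. It runs unchanged on Python 2.6+ and Python 3, since bytes names the byte-string type in both.

```python
def to_text(value, encoding="utf-8"):
    """Decode byte strings to unicode with an explicit codec (hypothetical helper).

    Values that are already unicode (or are not strings at all) are returned unchanged.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Hypothetical usage with the names from the question's script:
# writer.add_document(name=to_text(path), content=to_text(content))
```

Decoding once at the boundary where the file is read keeps every value handed to Whoosh in the same (unicode) type.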
