How to index plain text files for search in Sphinx


I scanned dozens of articles and forum threads and looked through the official documentation, but couldn't find an answer. This article sounds promising, since it says that "the data to be indexed can generally come from very different sources: SQL databases, plain text files, HTML files", but unfortunately, like all the other articles and forum threads, it is devoted to MySQL.

It is rather strange to hear that Sphinx is so cool, that it can do this and that, that it can do practically anything you want with any data source you like. But where are all the examples with data sources other than MySQL? I am looking for just one tiny, trivial step-by-step example of a Sphinx configuration for the easiest data source in the world: plain text files. Let's say I've installed Sphinx and want to scan my home directory (recursively) to find all plain text files containing "Hello world". What should I do to implement this?

Prerequisites:

  • Ubuntu
  • sudo apt-get install sphinxsearch
  • ... what is next????

There is 1 answer

Lakshman Srikanth D (best answer)

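Before proceeding, have a look at "Sphinx without SQL!".

Here is how I would do it.

We are going to use Sphinx's sql_file_field option, which makes the indexer treat a column's value as a file path and index the contents of that file. The table of file paths still lives in MySQL. Here is a PHP script that creates such a table and fills it with the path of every file in a particular directory (using scandir).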

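<?php
// Connect first and stop if the connection fails.
// The database name must match sql_db in sphinx.conf (filetest).
$con = mysqli_connect("localhost", "root", "password", "filetest");
if (mysqli_connect_errno()) {
    die("Failed to connect to MySQL: " . mysqli_connect_error());
}

// One row per file; the text column holds the absolute file path.
mysqli_query($con, "CREATE TABLE IF NOT EXISTS fileindex (id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY, text VARCHAR(255) NOT NULL)");

$base = '/absolute/path/to/your/dir/';

// A prepared statement keeps quotes in file names from breaking the query.
$stmt = mysqli_prepare($con, "INSERT INTO fileindex (text) VALUES (?)");

foreach (scandir($base) as $entry) {
    $path = $base . $entry;
    // Test the full path: is_dir($entry) alone would be resolved
    // against the current working directory, not against $base.
    if (is_file($path)) {
        mysqli_stmt_bind_param($stmt, "s", $path);
        mysqli_stmt_execute($stmt);
    }
}

mysqli_stmt_close($stmt);
mysqli_close($con);
?>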
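
The question asks for a recursive scan, while scandir lists only one directory level. Below is a minimal sketch of a recursive walk using PHP's SPL directory iterators. The function name collectTextFiles and the choice to treat only .txt files as "plain text" are my own assumptions, not part of the original script.

```php
<?php
// Hypothetical helper (not from the original answer): walk $root
// recursively and return the paths of all *.txt files, sorted.
// The returned paths are absolute if $root is absolute.
function collectTextFiles(string $root): array
{
    $paths = [];
    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS),
        RecursiveIteratorIterator::LEAVES_ONLY,
        // Skip subdirectories that cannot be opened instead of aborting.
        RecursiveIteratorIterator::CATCH_GET_CHILD
    );
    foreach ($iterator as $file) {
        // Assumption: "plain text" means a .txt extension; adjust as needed.
        if ($file->isFile() && strtolower($file->getExtension()) === 'txt') {
            $paths[] = $file->getPathname();
        }
    }
    sort($paths);
    return $paths;
}
```

Each returned path can then be inserted into the fileindex table in place of the scandir loop.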

Below is the sphinx.conf file that indexes the table of file paths. Note sql_file_field: it makes the indexer read and index the contents of the files whose paths are stored in the text column.

source src1
{
    type            = mysql
    sql_host        = localhost
    sql_user        = root
    sql_pass        = password
    sql_db          = filetest
    sql_port        = 3306  # optional, default is 3306
    sql_query_pre   = SET CHARACTER_SET_RESULTS=utf8
    sql_query_pre   = SET NAMES utf8
    sql_query       = SELECT id, text FROM fileindex
    sql_file_field  = text
}

index filename
{
    source          = src1
    path            = /var/lib/sphinxsearch/data/files
    docinfo         = extern
}

indexer
{
    mem_limit   = 128M
}

searchd
{
    log                 = /var/log/sphinxsearch/searchd.log
    pid_file            = /var/log/sphinxsearch/searchd.pid
}

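After creating the table and saving the configuration as /etc/sphinxsearch/sphinx.conf, run sudo indexer filename --rotate (drop --rotate if searchd is not running yet) and your index is ready. Then type search followed by a keyword, for example search "Hello world", to get results. If your Sphinx version no longer ships the search utility, query searchd through its API or SphinxQL instead.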
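
If you want to avoid MySQL entirely, Sphinx also provides an xmlpipe2 source type, which reads documents as XML from the output of a command. Below is a minimal sketch of a generator for that XML, assuming the files are UTF-8 text. The function names buildXmlpipe2 and xmlText, the field name content and the attribute name path are my own choices for illustration.

```php
<?php
// Hypothetical helper: turn a list of file paths into an xmlpipe2 stream.
// One full-text field ("content") and one string attribute ("path").
function buildXmlpipe2(array $paths): string
{
    $xml  = "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n";
    $xml .= "<sphinx:docset>\n";
    $xml .= "<sphinx:schema>\n";
    $xml .= "  <sphinx:field name=\"content\"/>\n";
    $xml .= "  <sphinx:attr name=\"path\" type=\"string\"/>\n";
    $xml .= "</sphinx:schema>\n";

    $id = 0;
    foreach ($paths as $path) {
        $text = @file_get_contents($path);
        if ($text === false) {
            continue; // skip files that cannot be read
        }
        $id++; // document IDs must be unique positive integers
        $xml .= "<sphinx:document id=\"$id\">\n";
        $xml .= "  <content>" . xmlText($text) . "</content>\n";
        $xml .= "  <path>" . xmlText($path) . "</path>\n";
        $xml .= "</sphinx:document>\n";
    }
    return $xml . "</sphinx:docset>\n";
}

// Escape markup characters, replace invalid UTF-8 sequences, and strip
// control characters that are not allowed in XML 1.0.
function xmlText(string $s): string
{
    $s = htmlspecialchars($s, ENT_XML1 | ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
    return preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F]/', '', $s);
}
```

A script that echoes the result, for example echo buildXmlpipe2(['/absolute/path/to/your/dir/a.txt']);, could then be wired into sphinx.conf with a source that sets type = xmlpipe2 and xmlpipe_command = php /path/to/script.php (the script path is hypothetical) in place of the sql_* settings. The sketch builds the whole stream in memory, so for a large directory tree it would be better to echo each document as it is produced.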