Searching a very large ARPA file quickly in Java


I have an ARPA file of almost 1 GB, and I need to look up entries in it in less than one minute. I have searched a lot but have not found a suitable answer yet. I do not think I should have to read the whole file; I only need to jump to a specific line and read that whole line. The lines of an ARPA file do not all have the same length, but the file does follow a specific format.

File Format

\data\

ngram 1=19

ngram 2=234

ngram 3=1013

\1-grams:

-1.7132 puluh -3.8008

-1.9782 satu -3.8368

\2-grams:

-1.5403 dalam dua -1.0560

-3.1626 dalam ini 0.0000

\3-grams:

-1.8726 itu dan tiga

-1.9654 itu dan untuk

\end\

As you can see in the sample file, there are 19 lines of 1-grams, 234 lines of 2-grams and 1013 lines of 3-grams. I give the program the string part of a line and get back the numbers to the left and to the right of that string. The input string tells me which section of the file to search (for example, a two-word string can only appear in the 2-grams section). I need a way to avoid reading the file completely, because the file is very big and reading all of it takes a lot of time. I think a good approach would be to jump to the specific line and read the whole line, without using an index file.
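To make the lookup result concrete, here is a minimal sketch of how one entry line from the sample could be split into its parts, assuming fields are separated by whitespace and that the trailing number may be missing (as in the 3-gram lines above). The class name `ArpaEntry` and its field names are hypothetical, chosen only for illustration.

```java
import java.util.Arrays;

// Hypothetical holder for one ARPA entry line: left number, n-gram text, optional right number.
class ArpaEntry {
    final double logProb;   // number on the left of the string
    final String ngram;     // the words, joined by single spaces
    final Double backoff;   // number on the right, or null when the line has none

    ArpaEntry(double logProb, String ngram, Double backoff) {
        this.logProb = logProb;
        this.ngram = ngram;
        this.backoff = backoff;
    }

    // Parses a line from the "\N-grams:" section, where `order` is N.
    static ArpaEntry parse(String line, int order) {
        String[] tok = line.trim().split("\\s+");
        if (tok.length != order + 1 && tok.length != order + 2) {
            throw new IllegalArgumentException("Unexpected field count: " + line);
        }
        double logProb = Double.parseDouble(tok[0]);
        String ngram = String.join(" ", Arrays.copyOfRange(tok, 1, order + 1));
        Double backoff = (tok.length == order + 2) ? Double.valueOf(tok[order + 1]) : null;
        return new ArpaEntry(logProb, ngram, backoff);
    }

    public static void main(String[] args) {
        ArpaEntry e = parse("-1.5403 dalam dua -1.0560", 2);
        System.out.println(e.ngram + " -> " + e.logProb + ", " + e.backoff);
    }
}
```

The `order` argument is the N of the section the line came from, which is also the number of words in the search string.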

It would be great if you could help me with my assignment.


There is 1 answer

Speck On

I don't know what an ARPA file is; I'm assuming it's some sort of text file.

What you want to do is first index the file, so that each String maps to the line numbers in the file where it occurs.

That's a big file, so you'd probably store the index in a separate file.

First, before the user does any searching, you'd build the index. Then, for each search, you'd look up in the index the line numbers where the String the user is looking for is found.
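The two steps could be sketched roughly as below. This is only an illustration under a few assumptions: the file is UTF-8 text laid out like the sample in the question, and the index records the byte offset of each line rather than its line number, because lines of different lengths cannot be reached by line number alone while `RandomAccessFile.seek` can jump straight to a byte offset. The class name `ArpaOffsetIndex` and its methods are hypothetical, and the index is kept in an in-memory map here for brevity.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical index: n-gram text -> byte offset of the line that contains it.
class ArpaOffsetIndex {
    private final Map<String, Long> offsets = new HashMap<>();
    private final String path;
    private int order = 0; // N of the current "\N-grams:" section, 0 outside any section

    ArpaOffsetIndex(String path) throws IOException {
        this.path = path;
        build();
    }

    // One sequential pass over the file, done once before any search.
    private void build() throws IOException {
        try (InputStream in = new BufferedInputStream(new FileInputStream(path), 1 << 16)) {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            long pos = 0, lineStart = 0;
            int b;
            while ((b = in.read()) != -1) {
                pos++;
                if (b == '\n') {
                    record(new String(buf.toByteArray(), StandardCharsets.UTF_8), lineStart);
                    buf.reset();
                    lineStart = pos;
                } else {
                    buf.write(b);
                }
            }
            if (buf.size() > 0) {
                record(new String(buf.toByteArray(), StandardCharsets.UTF_8), lineStart);
            }
        }
    }

    private void record(String rawLine, long offset) {
        String line = rawLine.trim();
        if (line.isEmpty()) return;
        if (line.startsWith("\\")) {
            // "\2-grams:" switches to order 2; "\data\" and "\end\" reset to 0.
            order = line.endsWith("-grams:")
                    ? Integer.parseInt(line.substring(1, line.indexOf('-')))
                    : 0;
            return;
        }
        if (order == 0) return; // header lines such as "ngram 1=19"
        String[] tok = line.split("\\s+");
        if (tok.length < order + 1) return; // malformed line, skipped
        offsets.put(String.join(" ", Arrays.copyOfRange(tok, 1, order + 1)), offset);
    }

    // Jumps directly to the stored offset and reads only that one line; null if not indexed.
    String lookup(String ngram) throws IOException {
        Long off = offsets.get(ngram);
        if (off == null) return null;
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            raf.seek(off);
            String raw = raf.readLine(); // readLine maps each byte to one char
            return new String(raw.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        ArpaOffsetIndex index = new ArpaOffsetIndex(args[0]);
        System.out.println(index.lookup(args[1]));
    }
}
```

Building the index still reads the whole file once, but each search after that costs one map lookup and one seek. For a file of around 1 GB the map may not fit comfortably in memory, which is where writing the index to a separate file comes in.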