How does OEIS do subsequence search?

510 views Asked by At

The Online Encyclopedia of Integer Sequences supports search for sequences containing your query as a subsequence, eg. searching for subseq:212,364,420,428 will return the 8*n+4 sequence. (http://oeis.org/search?q=subseq:212,364,420,428)

This amazing feature was apparently implemented by Russ Cox as by http://oeis.org/wiki/User:Russ_Cox/OEIS_Server_Features, but it is not specified with what algorithm this is implemented.

I'm wondering how it is done. Clearly going through nearly a million of sequences for every search is impractical for a search engine. Just keeping an index (which is how the same Russ Cox did Google Code Regex Search) of the first number and brute forcing the rest also doesn't work, as numbers like 0 is in nearly all sequences. In fact some queries like 0 1 match a high percentage of the total database, so the algorithm needs a running time sensitive to the desired output size.

Does anyone happen to know how this feature is implemented?

2

There are 2 answers

0
ElKamina On

My guess is part of the data is stored in an inverted index. That is each number is linked to a set of series, and when multiple sequences are entered, the set of common sequences is shown. This is extremely fast and used by almost every search engine.

Storing as a suffix trees or any linked data structure is useless for this application.

At least for some set of sequences (eg ax+b), I think it is better to save them parametrically rather than storing the actual sequence.

0
Amrinder Arora On

First of all, that online search only seems to work with numbers up to a 1000. Does it work for larger numbers too? Secondly, just out of curiosity, for the example that you have provided, for some reason, OEIS does not list A000027, which is just natural numbers, but obviously it should match.

Database Based Solution

If this was implemented purely in DB, for a 4 item search, it would be something like this.

Tables

sequence {seqid, seqname, etc..}

seqitem {value, seqid, location }

Query

select si1.ds, si1.location, si2.location .... from seqitem si1, seqitem si2, seqitem si3, seqitem si4 where si1.seqid = si2.seqid and si2.seqid = si3.seqid and si3.seqid = si4.seqid and si1.location < si2.location and si2.location < si3.location and si3.location < si4.location and si1.value =$v1 and si2.value = $v2 and si3.value = $v3 and si4.value = $v4