What object databases allow indexing of everything in the database?

Currently, db4o does not allow indexing on the contents of collections. What object databases do allow indexing of any individual field in the database?

Example:

class RootClass
{
   string thisIsIndexed; // Field can be indexed for quick searching.
   IList contentsNotIndexed = new ArrayList(); // Holds SubClass instances; creates a 1-to-many relationship.
}

class SubClass
{
   string thisIsNotIndexed; // Field cannot be indexed.
}

For db4o to search by the field "thisIsNotIndexed", it has to load each complete object into memory and then use LINQ-to-Objects to scan through the field. This is slow, because a single search could potentially load the entire database into RAM. The workaround is to put every field you want to search by in the root object, but this seems like an artificial limitation.
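To make the cost concrete, here is a minimal sketch of what such a scan amounts to. It is plain Java with no db4o calls, and the names `ScanRootClass`, `ScanSubClass` and `FullScanSketch` are hypothetical stand-ins for the classes above; the list `allRoots` plays the role of "every root object loaded from the database".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for the question's classes; no db4o API is used.
class ScanSubClass {
    final String thisIsNotIndexed;

    ScanSubClass(String value) {
        this.thisIsNotIndexed = value;
    }
}

class ScanRootClass {
    final String thisIsIndexed;
    final List<ScanSubClass> contentsNotIndexed = new ArrayList<>();

    ScanRootClass(String value) {
        this.thisIsIndexed = value;
    }
}

class FullScanSketch {
    // Without an index, every root and (in the worst case) every sub-object
    // is visited, so the cost grows with the size of the whole data set.
    static List<ScanRootClass> findRootsBySubField(List<ScanRootClass> allRoots, String wanted) {
        List<ScanRootClass> matches = new ArrayList<>();
        for (ScanRootClass root : allRoots) {
            for (ScanSubClass sub : root.contentsNotIndexed) {
                if (sub.thisIsNotIndexed.equals(wanted)) {
                    matches.add(root);
                    break; // one matching sub-object is enough for this root
                }
            }
        }
        return matches;
    }
}
```

The point is only the shape of the loop: nothing can be skipped, because there is no lookup structure keyed on the sub-object's field.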

Are there any object databases that do not suffer from this limitation, and allow indexing of any string in a sub-object?

Update

Answer #1:

I found a method which gives the best of both worlds: ease of use (with a hierarchical structure), and blindingly fast native queries using full indexing on the whole tree. It involves a bit of a trick, and a method that caches the contents of parent nodes:

  1. Create the nested hierarchy as normal.
  2. For each sub-node, create a reverse reference to the node's parent.
  3. You can now query the leaf nodes. We are halfway there: we can query, but it's slow, because searching by some parameter in a parent node requires a join to navigate up the tree.
  4. To speed it up, create a "cache" parameter in each leaf node that caches the search terms held in the parent node. It starts out as null; the first time its accessor method is called it does the expensive join and mirrors the field, and from that point on the search is extremely quick.
  5. This works well for data that never changes, e.g. temperature samples over time. If the data is going to change, then you need some way of clearing the cached values if the value in the root node changes, perhaps by setting a "dirty" flag in each leaf node.
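The steps above can be sketched roughly as follows. This is plain Java with no db4o calls, and every name (`ParentNode`, `LeafNode`, `getCachedParentName`, `markDirty`) is hypothetical. In a real database, `cachedParentName` would be the field you put the index on, and the leaf would presumably need to be stored again after the cache is filled so the index sees the new value.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical parent node; its name is the value we want to search leaves by.
class ParentNode {
    private String name;
    final List<LeafNode> children = new ArrayList<>();

    ParentNode(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }

    // Step 5: changing the parent marks every leaf's mirrored copy as stale.
    void setName(String newName) {
        this.name = newName;
        for (LeafNode leaf : children) {
            leaf.markDirty();
        }
    }

    LeafNode addLeaf(double value) {
        LeafNode leaf = new LeafNode(this, value);
        children.add(leaf);
        return leaf;
    }
}

// Hypothetical leaf node, e.g. one temperature sample.
class LeafNode {
    final ParentNode parent;          // step 2: reverse reference to the parent
    final double value;
    private String cachedParentName;  // step 4: mirrored field, null until first use
    private boolean dirty = false;    // step 5: set when the parent changes

    LeafNode(ParentNode parent, double value) {
        this.parent = parent;
        this.value = value;
    }

    void markDirty() {
        dirty = true;
    }

    // The first call (or the first call after a change) walks up to the parent,
    // which stands in for the expensive join; later calls reuse the mirrored value.
    String getCachedParentName() {
        if (cachedParentName == null || dirty) {
            cachedParentName = parent.getName();
            dirty = false;
        }
        return cachedParentName;
    }
}
```

The trade-off is the usual one for denormalised data: reads become cheap, but every write to the parent has to touch all of its leaves.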

Answer #2:

If you use an array instead of a List, you can descend into the child node using SODA. SODA doesn't support descending into a List, so you simply can't query its contents with SODA (or anything else that depends on SODA, such as LINQ, QBE, native queries, etc.).

There are 2 answers

Sam Stainsby (best answer)

I'm basing this on my experience with db4o under Scala & Java, but hopefully this is still valid: the field 'contentsNotIndexed' holds ArrayList instances, so indexing that field should only assist you in querying those ArrayList instances. If you want to query the contents of those lists efficiently, you would have to define an index on the objects you expect to find inside the lists and descend your query into the ArrayList under the 'contentsNotIndexed' field. I don't know the internals of ArrayList well enough to suggest where that might descend, though.

Depending on your needs, you can also design your class to use an array instead of an ArrayList in some cases to achieve the effect you want.

Gamlor

Well, you can index SubClass.thisIsNotIndexed in your example, and therefore you can quickly find the SubClass instances.

But of course you are right that you cannot index collections. By that I mean it's not possible to efficiently query whether a collection contains certain elements, for example to find all RootClass instances which contain a certain SubClass. That case will be slow because of the lack of proper collection indexing.

In db4o you have to work around this issue. An example would be to add a field on the SubClass which contains the reference to the parent. Then you can do the query efficiently.
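Here is a rough sketch of why the parent reference helps. It is plain Java with no db4o calls; `IndexedRoot`, `IndexedSub` and `ParentReferenceSketch` are hypothetical names, and the `HashMap` is only a stand-in for a database index on the sub-object's field. The lookup goes index, then sub-object, then parent, so the collections themselves are never scanned.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for RootClass.
class IndexedRoot {
    final String thisIsIndexed;
    final List<IndexedSub> contents = new ArrayList<>();

    IndexedRoot(String value) {
        this.thisIsIndexed = value;
    }
}

// Hypothetical stand-in for SubClass, with the extra parent field.
class IndexedSub {
    final IndexedRoot parent;      // the added reference back to the owner
    final String thisIsNotIndexed;

    IndexedSub(IndexedRoot parent, String value) {
        this.parent = parent;
        this.thisIsNotIndexed = value;
    }
}

class ParentReferenceSketch {
    // Stand-in for a database index on IndexedSub.thisIsNotIndexed.
    private final Map<String, List<IndexedSub>> subIndex = new HashMap<>();

    IndexedSub addSub(IndexedRoot parent, String value) {
        IndexedSub sub = new IndexedSub(parent, value);
        parent.contents.add(sub);
        subIndex.computeIfAbsent(value, k -> new ArrayList<>()).add(sub);
        return sub;
    }

    // Finds the matching sub-objects through the index, then follows each
    // parent reference; a set removes roots that match more than once.
    Set<IndexedRoot> rootsContaining(String value) {
        Set<IndexedRoot> roots = new LinkedHashSet<>();
        for (IndexedSub sub : subIndex.getOrDefault(value, Collections.emptyList())) {
            roots.add(sub.parent);
        }
        return roots;
    }
}
```

The cost of the lookup now depends on the number of matching sub-objects rather than on the total number of roots.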

Another small thing: you can set an index on a collection field, but that's just an index on the reference to the collection object. It would allow you to find the object which has a reference to a certain collection instance, which is usually pretty useless.

I guess larger object databases do support indexing of collections and the queries which go with it.