How to create and maintain CouchDB/PouchDB doc _ids

I'm pretty new to CouchDB and I'm trying to wrap my mind around how doc _ids are used. What I have read and learned so far is that I should generate the doc _id myself so I can use the B-tree for indexing/mapping. Suggested tools are Docuri or pouchdb/collate. Let some code speak for itself:

    // define a docuri route
    Docuri.routes({
        ':type/:name/:created_at': 'list'
    });

    var doc = {}; 
        doc.name = 'Testname_1';
        doc.type = 'List';
        doc.created_at = Math.floor(Date.now() / 1000);
        doc.updated_at = Math.floor(Date.now() / 1000);
        doc._id = Docuri.list(doc);

    console.log(doc);
    // {
    //     _id: "List/Testname_1/1433973431"
    //     created_at: 1433973431
    //     name: "Testname_1"
    //     type: "List"
    //     updated_at: 1433973431
    // }

Next I would add some items to the list, with the following doc structure.

    // define a docuri route
    Docuri.routes({
        '/:list_id/:type/:item/:created_at': 'item'
    });

    var doc = {}; 
        doc.item = 'Item_1';
        doc.type = 'Item';
        doc.list_id = 'List/Testname_1/1433973431';
        doc.created_at = Math.floor(Date.now() / 1000);
        doc.updated_at = Math.floor(Date.now() / 1000);
        doc._id = Docuri.item(doc);

    console.log(doc);
    // {
    //     _id: "List/Testname_1/1433973431/Item/Item_1/1433973431"
    //     list_id: "List/Testname_1/1433973431"
    //     created_at: 1433973431
    //     item: "Item_1"
    //     type: "Item"
    //     updated_at: 1433973431
    // }

Question No.1

Is this a good structure for smaller databases?

Question No.2

(And this is what bugs me most.) Let's say I use the List _ids in links like <a href="List/Testname_1/1433973431/">Testname_1</a>. What if the list name changes: should I change the List _id too, and then change the list_id of all the corresponding Items?

This seems pretty odd to me, since I would normally not change the ID of a database entry.

But on the other hand, a user would expect the HTML link to correspond to the new list name.

Maybe someone can push me in the right direction on how to manage and use _ids in CouchDB and PouchDB.

Edit

Here are the two tutorials where I read about the UUIDs:

Before deciding on using a random value as doc _id, read the section When not to use map reduce

Use domain specific document ids where possible. With CouchDB it is best practice to use meaningful ids.

http://docs.ehealthafrica.org/couchdb-best-practices/

In this example, you're getting all those "indexes" for free, each time a document is added to the database. It doesn't take up any additional space on disk compared to the randomly-generated UUIDs, and you don't have to wait for a view to get built up, nor do you have to understand the map/reduce API at all.

Of course, this system starts to get shaky when you need to search by a variety of criteria: e.g. all albums sorted by year, artists sorted by age, etc. And you can only sort strings – not numbers, booleans, arrays, or arbitrary JSON objects, like the map/reduce API supports. But for a lot of simple applications, you can get by without using the query() API at all.

Performance tip: if you're just using the randomly-generated doc IDs, then you're not only missing out on an opportunity to get a free index – you're also incurring the overhead of building an index you're never going to use. So use and abuse your doc IDs!

http://pouchdb.com/2014/05/01/secondary-indexes-have-landed-in-pouchdb.html

There are 2 answers

0
superfly (best answer)

I ended up using the two Helper Scripts, docuri and speakingurl.

The entries in my "List" database now have a new field slug. First I use speakingUrl to create a slug from the List Name provided by the User, and then use docuri to generate the _id with slug value.

    // define a docuri route that uses the slug instead of the name
    docuri.routes({ ':type/:slug/:created_at': 'list' });

    // create a URL-safe slug from the list name provided by the user
    var slug = speakingUrl('My List Name is test');

    var listObj = {};
    listObj.name = 'My List Name is test';
    listObj.type = 'list';
    listObj.created_at = Math.floor(Date.now() / 1000);
    listObj.updated_at = Math.floor(Date.now() / 1000);
    listObj.slug = slug;
    listObj._id = docuri.list(listObj);

My list docs look like this:

[
  {
    "id": "list/my-list-name-is-test/1436098113",
    "key": "list/my-list-name-is-test/1436098113",
    "value": {
      "rev": "1-d96c34ce1732e3e8088c4fa9d6e54c14"
    },
    "doc": {
      "name": "My List Name is test",
      "type": "list",
      "created_at": 1436098113,
      "updated_at": 1436098113,
      "slug": "my-list-name-is-test",
      "_id": "list/my-list-name-is-test/1436098113",
      "_rev": "1-d96c34ce1732e3e8088c4fa9d6e54c14"
    }
  }
]

To get the lists sorted by name in PouchDB/CouchDB, query with this key range:

    { startkey: 'list', endkey: 'list\uffff' }

With this setup I can use the slug field for the list URL, e.g. www.foo.bar/list/my-list-name-is-test. On the destination page I use the URL slug to query the list's items with the following key range:

{ startkey: 'item/' + URL_SLUG_VAR, endkey: 'item/' + URL_SLUG_VAR + '\uffff' }
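One detail of the range above may be worth checking: a prefix without a trailing slash also matches lists whose slug merely starts with the same text. The sketch below is an in-memory illustration only. `prefixRange` and `inRange` are hypothetical helpers, not PouchDB functions, and plain JavaScript string comparison stands in for the database's key ordering, which can differ for some characters.

```javascript
// Hypothetical helper: builds startkey/endkey options for an _id prefix.
function prefixRange(prefix) {
  return { startkey: prefix, endkey: prefix + '\uffff' };
}

// In-memory stand-in for the range check on the sorted _id index.
// Assumption: plain string comparison approximates the real key ordering.
function inRange(id, range) {
  return id >= range.startkey && id <= range.endkey;
}

var ids = [
  'item/my-list-name-is-test-2/Other item/1436098200',
  'item/my-list-name-is-test/This is the item Title/1436098113',
  'list/my-list-name-is-test/1436098113'
];

var slug = 'my-list-name-is-test';
var loose = prefixRange('item/' + slug);        // as in the range above
var strict = prefixRange('item/' + slug + '/'); // closes the slug segment

var looseMatches = ids.filter(function (id) { return inRange(id, loose); });
var strictMatches = ids.filter(function (id) { return inRange(id, strict); });

console.log(looseMatches.length);  // 2: also catches "my-list-name-is-test-2"
console.log(strictMatches.length); // 1: only items of this exact list
```

With PouchDB, the strict range would presumably be passed along the lines of `db.allDocs({ startkey: strict.startkey, endkey: strict.endkey, include_docs: true })`.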

My Item docs look like this:

[
  {
    "id": "item/my-list-name-is-test/This is the item Title/1436098113",
    "key": "item/my-list-name-is-test/This is the item Title/1436098113",
    "value": {
      "rev": "1-c023db010d075d6a9129288b0649554d"
    },
    "doc": {
      "Title": "This is the item Title",
      "type": "item",
      "created_at": 1436098113,
      "updated_at": 1436098113,
      "slug": "this-is-the-item-title",
      "_id": "item/my-list-name-is-test/This is the item Title/1436098113",
      "_rev": "1-c023db010d075d6a9129288b0649554d"
    }
  }
]

When the user now changes the list name, the slug stays the same, so the query for the items keeps working.

The downside of this solution is that when the user changes the list name, the slug does NOT change, so the URL stays the way it was first created. This is IMHO not the best usability, since the user would expect the URL of their list to correspond to the NEW list name.

I'm still considering also changing the corresponding items' _ids when the list name changes. But this "feels" like the wrong way to go with regard to database performance and design.
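To make the cost of that idea concrete: a document's _id cannot be edited in place, so "renaming" means writing a copy under the new _id and deleting the old document. The sketch below only builds the payload for such a move. `planRename` is a hypothetical helper, and it assumes the _id layout shown above, with the list slug as the second path segment and no `/` inside item titles.

```javascript
// Hypothetical helper: builds a bulk payload that "moves" docs to a new list slug.
// Each move is a copy under the new _id plus a deletion stub for the old _id.
function planRename(docs, oldSlug, newSlug) {
  var payload = [];
  docs.forEach(function (doc) {
    var parts = doc._id.split('/');
    if (parts[1] !== oldSlug) { return; } // not part of this list
    parts[1] = newSlug;

    // Copy every field except _id and _rev; the copy is a brand-new document.
    var copy = {};
    Object.keys(doc).forEach(function (key) {
      if (key !== '_id' && key !== '_rev') { copy[key] = doc[key]; }
    });
    copy._id = parts.join('/');
    if (doc.type === 'list') { copy.slug = newSlug; }
    payload.push(copy);

    // Deletion stub for the old document; it needs the current _rev.
    payload.push({ _id: doc._id, _rev: doc._rev, _deleted: true });
  });
  return payload;
}

var docs = [
  { _id: 'list/old-name/1436098113', _rev: '1-a', type: 'list', name: 'New name', slug: 'old-name' },
  { _id: 'item/old-name/First item/1436098113', _rev: '1-b', type: 'item', Title: 'First item' }
];

var payload = planRename(docs, 'old-name', 'new-name');
console.log(payload.length); // 4: two writes per affected document
```

The payload would presumably go to `db.bulkDocs(payload)`. Every rename costs two writes per document, and as far as I know a bulk write is not applied as a single transaction, so a failure halfway through could leave items under both the old and the new prefix.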

If anyone comes up with a better solution or any suggestions, please post them.

1
natevw

Docuri is an interesting idea, and I'm all for CouchDB tricks and "hacks" like that, but please don't be too misled by it. It is a trick and it is something of a "hack".

I basically have only a few policies/habits with document ids:

  • they are completely random and meaningless (to the app code), although I often prefix them with their type strictly only for debugging convenience — any code that needs to know the document type gets it from a doc.type or similar field and not from doc._id. So I might call a document "photo-1qr333qew3qadeiof" just so I can notice it in web logs or something, but app logic does not assume anything based on the id.
  • occasionally I need to ensure uniqueness of a related document, for example a "user" document may want exactly (or at least, no more than) one related "profile" document. Or, a better example, maybe I want to make sure a particular transaction occurs no more than once: avoid duplicate purchase or something. So I take a "tuple" of perhaps the user id and the content id and identify the purchase record by 'txn-'+hash(username + song._id). Then if that particular combination accidentally happens again I will get a 409* because of the deterministically derived id.
  • even less often, I will sometimes make special ids for an app (think a doc called "SHARED_CONFIG" or "MY_APP_GLOBAL_COUNTER" or something…) so it can access them directly in limited cases for specific purposes.
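The second case can be sketched roughly as follows. This is only an illustration under assumptions: `purchaseId` is a hypothetical helper, SHA-256 from Node's built-in `crypto` module stands in for the unspecified `hash`, and the tuple is serialized with `JSON.stringify` rather than plain concatenation so that ('ab', 'c') and ('a', 'bc') cannot collide.

```javascript
var crypto = require('crypto'); // Node.js built-in module

// Hypothetical helper: derives a deterministic purchase id from (username, songId).
function purchaseId(username, songId) {
  var digest = crypto.createHash('sha256')
    .update(JSON.stringify([username, songId])) // unambiguous tuple encoding
    .digest('hex');
  return 'txn-' + digest;
}

var first = purchaseId('alice', 'song-42');
var retry = purchaseId('alice', 'song-42');
var other = purchaseId('bob', 'song-42');

console.log(first === retry); // true: the same purchase maps to the same _id
console.log(first === other); // false: a different user gets a different _id
```

Because `first` and `retry` are identical, a second attempt to create a document with that _id (without a `_rev`) is what triggers the 409 mentioned above.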

The point is that you should, by default, use some sort of UUID for your documents unless you have a strong reason not to. The fact that CouchDB only provides atomicity at the document level means you might put a tiny bit more meaning into your document id (as in the second case), and you will also see tricks like Docuri that use the id to "optimize" some cases, but start by thinking of ids as nothing more than "meaningless but unique" strings.

You normally use views to make meaningful [secondary] indexes based on the data inside the document. (And yes, again, you can use/abuse the "primary" index, i.e. the database itself via _all_docs, in special cases as an optimization/trick/hack, but this would not be normal practice.)
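As a rough illustration of the view approach applied to the question's data: the map function below indexes items by their `list_id` field, so the list's name (or slug) can change without touching any _id. `buildIndex` is a hypothetical in-memory stand-in for the view engine, and the short random-looking ids are made up for the example.

```javascript
// Map function in (doc, emit) form; field names follow the question's Item docs.
function itemsByList(doc, emit) {
  if (doc.type === 'Item') {
    emit(doc.list_id, null);
  }
}

// Hypothetical stand-in for a view build: runs the map over all docs
// and sorts the emitted rows by key.
function buildIndex(docs, map) {
  var rows = [];
  docs.forEach(function (doc) {
    map(doc, function (key, value) {
      rows.push({ id: doc._id, key: key, value: value });
    });
  });
  return rows.sort(function (a, b) {
    return a.key < b.key ? -1 : a.key > b.key ? 1 : 0;
  });
}

var docs = [
  { _id: 'a1f3', type: 'List', name: 'Renamed list' },
  { _id: 'b7c9', type: 'Item', item: 'Item_1', list_id: 'a1f3' },
  { _id: 'c2d8', type: 'Item', item: 'Item_2', list_id: 'a1f3' },
  { _id: 'd4e0', type: 'Item', item: 'Item_3', list_id: 'zzzz' }
];

var index = buildIndex(docs, itemsByList);
var itemsOfList = index.filter(function (row) { return row.key === 'a1f3'; });
console.log(itemsOfList.map(function (row) { return row.id; })); // ids b7c9 and c2d8
```

In PouchDB the same map function could presumably be used with something like `db.query(itemsByList, { key: 'a1f3', include_docs: true })`, or stored in a design document so the index is persisted.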

[*proper handling of the last case is more complicated under Cloudant and probably CouchDB 2.0 due to its totally broken quorum handling but that's a different topic.]