I have experience with ORM frameworks and I am starting to understand the structure of NoSQL database solutions. I will work through some examples based on object models.
I have the document model below and I want to think through how to handle a few scenarios:
- Save a post with a few tags
- Show the tag list with post counts
- Update a tag
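using System.Collections.Generic;

// A post is stored as one document, with its tags embedded.
public class Post
{
    public string Title { get; set; }
    public List<Tag> Tags { get; set; }
}

// A tag is identified only by its name.
public class Tag
{
    public string Name { get; set; }
}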
A few questions come to mind about these scenarios.
The Post class is a document that is saved together with its tags. In an RDBMS, Tag and Post have a many-to-many relationship, but as I understand it there are no relationships in NoSQL, so the post object is saved with all of its members. The "show tag list with post count" scenario would then require a heavy query over all posts every time, so don't I lose all the benefits of NoSQL in this scenario?
Won't updating a tag name be a complex job? I have to query all posts, find the ones that have that tag name, and update them. It also requires a multi-document transaction and a long-running process, so a failure would leave my database inconsistent, because NoSQL has no support for multi-document transactions. How can I handle this?
I am not trying to show the cons of NoSQL compared to RDBMS (SQL) systems. I am just trying to find out whether my thinking about these scenarios is correct; there may be something I missed, or the things that look bad may not be as bad as I thought. I need scalability, which is why I am interested in NoSQL solutions.
First of all, NoSQL is just a buzzword that covers many different database types, such as key-value stores, document stores, and graph databases. See http://nosql-database.org/ for a list of different types and implementations. Some of these systems also provide transactional guarantees, e.g. in your case, that a Post is written completely to the database.
I will now focus on key-value stores, since they seem to be a very prominent kind of NoSQL system.
Regarding your first question: you are right, one would not use a strict relationship like a foreign key in an RDBMS. Instead, you would just keep a list of tags with each post instance.
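As a rough illustration of keeping the tag list inside the post, here is a minimal Python sketch. The `posts` dict is only a hypothetical in-memory stand-in for a key-value store, and the key format (`post:1`) and the `save_post` helper are assumptions made for this example, not the API of any particular database.

```python
# Hypothetical in-memory stand-in for a key-value store: key -> document.
posts = {}


def save_post(post_id, title, tags):
    """Store the whole post, including its tag names, under a single key."""
    posts[post_id] = {"title": title, "tags": list(tags)}


# Each post is written in one operation; there is no separate tag table.
save_post("post:1", "Getting started with NoSql", ["NoSql", "Sql"])
save_post("post:2", "Joins explained", ["Sql"])
```

Because a post and its tags live under one key, saving a post touches a single document, which is the kind of write that stores with per-document guarantees can complete as a unit.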
To query by tag, you maintain a so-called inverted index ( http://en.wikipedia.org/wiki/Inverted_index ) that gives you all document IDs for a given tag.
This makes it pretty easy to do a post count.
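The inverted index and the post count can be sketched as follows. This is only an illustration in plain Python: the sample `posts` data and the helper names `build_tag_index` and `tag_counts` are assumptions for the example. A real store would more likely update the index whenever a post is written, rather than rebuilding it from all posts.

```python
from collections import defaultdict

# Hypothetical sample documents: post id -> post with embedded tag names.
posts = {
    "post:1": {"title": "Getting started with NoSql", "tags": ["NoSql", "Sql"]},
    "post:2": {"title": "Joins explained", "tags": ["Sql"]},
}


def build_tag_index(posts):
    """Build the inverted index: tag name -> set of ids of posts with that tag."""
    index = defaultdict(set)
    for post_id, post in posts.items():
        for tag in post["tags"]:
            index[tag].add(post_id)
    return index


def tag_counts(index):
    """The post count per tag is just the size of each index entry."""
    return {tag: len(post_ids) for tag, post_ids in index.items()}


index = build_tag_index(posts)
counts = tag_counts(index)
```

With the index in place, showing the tag list with post counts reads one small entry per tag and never scans the posts themselves.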
Updating tag names is actually not that complex. If you have map-reduce based access to your data, you can, for example, rename the tag 'Sql' to 'SQL' with a simple job.
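A rename job of this kind might look like the sketch below. It is written in plain Python rather than against a real map-reduce framework, and the names `map_rename` and `rename_tag` are hypothetical. It is effectively a map-only job: each post is rewritten independently, so no reduce step is needed.

```python
def map_rename(post_id, post, old, new):
    """Map step: emit an updated copy of each post that carries the old tag."""
    if old in post["tags"]:
        renamed = (new if tag == old else tag for tag in post["tags"])
        # dict.fromkeys removes duplicates while keeping the original order.
        yield post_id, {**post, "tags": list(dict.fromkeys(renamed))}


def rename_tag(posts, old, new):
    """Apply the map step to every post, writing back one document at a time."""
    updated = 0
    for post_id, post in list(posts.items()):
        for key, doc in map_rename(post_id, post, old, new):
            posts[key] = doc  # single-document write, no multi-document transaction
            updated += 1
    return updated


# Hypothetical sample data.
posts = {
    "post:1": {"title": "Getting started with NoSql", "tags": ["NoSql", "Sql"]},
    "post:2": {"title": "Joins explained", "tags": ["Sql"]},
}
first_run = rename_tag(posts, "Sql", "SQL")
second_run = rename_tag(posts, "Sql", "SQL")
```

Posts that were already renamed no longer match the old tag, so the job can simply be run again after a failure. Until it finishes, readers may see a mix of old and new names. In this sketch an inverted index would have to be updated separately.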
But I don't think that renaming tags is a common thing to do. The problem with long process times and inconsistency is stated by Brewer in the CAP theorem ( http://en.wikipedia.org/wiki/CAP_theorem ), which basically says that you cannot have consistency, availability and partition tolerance all at the same time; you have to trade off at least one for the other two. In your case: if you want a consistent update of a tag (such that no two documents can be read where one has the tag 'Sql' and the other already has 'SQL'), you would have to lock the table for other readers, and therefore you would not have availability.
Final thought: if you want to build a highly available platform that scales well, you should not think too much in relational terms.