Mnesia Fragmentation and replication: resultant availability and reliability

Following the solutions to the question I asked recently about Mnesia fragmentation, I still have a number of challenges. Consider the following scenario (the question I am asking is based on what follows below):

You have a data driven enterprise application which should be highly available
within the enterprise. If the internal information source is down for any reason,
the enterprise applications must switch to fetch data from a recovery center
which is offsite (remote).

You decide to have the database replicated onto two nodes within the enterprise
(referred to as DB Side A and DB Side B). These two run on separate
hardware but are linked together with, say, a Fast Ethernet or optical fibre link.
Logically, you create some kind of tunnel or secure channel between these
two Mnesia DBs. The two (A and B) should hold the same replica of the data and be
in sync all the time.

Meanwhile, the recovery center too must hold the same copy of the data and stay in
sync all the time, just in case local data access is cut off due to an attack
or hardware failure. So the same database schema must be replicated across the three
sites (Side A, Side B and the recovery center).

Now, within the enterprise, the application middleware is capable of switching data requests among the database sites. If A is down, then without the application realizing it, the request is re-routed to Database B, and so on. The middleware layer can be configured for load balancing (request multiplexing) or for flexible failover.
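A minimal sketch of such an ordered failover policy in the middleware, assuming hypothetical node names and using plain rpc:call/4 to run the transaction on the chosen replica:

```erlang
%% Sketch only: the node names are placeholders for the real (redacted) ones.
%% Try each replica in order; stop at the first successful transaction.
-module(db_failover).
-export([write/1]).

-define(DB_NODES, ['db_side_A@host_a', 'db_side_B@host_b', 'db_recovery@host_r']).

write(Record) ->
    try_nodes(?DB_NODES, Record).

try_nodes([], _Record) ->
    {error, all_db_nodes_unavailable};
try_nodes([Node | Rest], Record) ->
    %% rpc:call/4 returns {badrpc, Reason} when the node is down or
    %% unreachable, otherwise the result of mnesia:transaction/1 there.
    case rpc:call(Node, mnesia, transaction,
                  [fun() -> mnesia:write(Record) end]) of
        {atomic, ok}      -> {ok, Node};
        {badrpc, _Reason} -> try_nodes(Rest, Record);
        {aborted, Reason} -> {error, Reason}
    end.
```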

Further Analysis:

At database/schema creation time, all involved nodes must be up and running
Mnesia. To achieve this, you create, say: '[email protected]',
'[email protected]' and finally '[email protected]'.
Now, at table creation, you want your Mnesia tables fragmented. So you decide on the following parameters:

n_disc_only_copies =:= number of nodes involved in the pool =:= 3
    Reason: The documentation says this parameter regulates how many
    disc_only_copies replicas each fragment should have, and you want
    each table to have each of its fragments on every Mnesia node.
node_pool =:= all nodes involved =:= ['[email protected]',
                                      '[email protected]',
                                      '[email protected]']
All your tables are then created based on the following arrangement:

    Nodes = ['[email protected]',
             '[email protected]',
             '[email protected]'],
    NoOfFragments = 16,
    {atomic, ok} = mnesia:create_table(TABLE_NAME,
        [{frag_properties, [{node_pool, Nodes},
                            {n_fragments, NoOfFragments},
                            {n_disc_only_copies, length(Nodes)}]},
         {index, []},
         {attributes, record_info(fields, RECORD_NAME_HERE)}]),
NOTE: In the syntax above, RECORD_NAME_HERE cannot actually be a variable, since records must be known at compile time in Erlang. After installation, you see that every fragment of each table, say table_name_frag2, appears on every node's file system.
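One detail worth keeping in mind with this installation: once a table is fragmented, plain mnesia:transaction/1 only sees the base fragment. Access must go through mnesia:activity/4 with the mnesia_frag access module, roughly as follows (the table and record here are placeholders):

```erlang
%% The mnesia_frag access module hashes the record key to pick the right
%% fragment; without it, reads and writes only touch fragment 1.
write(Record) ->
    mnesia:activity(transaction,
                    fun() -> mnesia:write(Record) end,
                    [], mnesia_frag).

read(Table, Key) ->
    mnesia:activity(transaction,
                    fun() -> mnesia:read({Table, Key}) end,
                    [], mnesia_frag).
```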

Challenges and arising Questions:
After following what is listed above, your first database start is fine, since Mnesia is running on all nodes. Several challenges show up as the application runs, and I list them below:

  1. Suppose you decide that all writes are first tried on DB Side A; if Side A is unavailable at that instant, the call is retried on DB Side B, and then on the recovery center. If the call fails on all three database nodes, the application's network middleware layer reports back that all the database servers are unavailable. (This decision could be influenced by the fact that if you let applications write randomly to your Mnesia replicas, it is very possible to end up with inconsistent-database errors when your Mnesia nodes lose their network connection to each other while writes are being committed on each of them by different Erlang applications. If you decide on having master_nodes, you could be at risk of losing data.) So, by behaviour, you are forcing DB Side A to be the master. This leaves the other database nodes idle the whole time DB Side A is up and running: no matter how many requests hit Side A, as long as it does not go down, no request will reach Side B or the recovery center at all.

  2. On start, Mnesia normally needs to see all involved nodes running (Mnesia must be running on all of them) so that it can do its negotiations and consistency checks. This means that if Mnesia goes down on every node, it must be started on all of them before it can fully initialize and load tables. It is even worse if the Erlang VM dies along with Mnesia on a remote site. Admittedly, various tweaks and scripts here and there can help restart the entire VM plus the intended applications whenever it goes down.
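A common pattern on restart is to start Mnesia, wait a bounded time for the tables, and treat force-loading from the local copy as a last, logged resort, since force_load_table/1 can discard updates committed elsewhere. A sketch, with the table list and timeout as placeholders:

```erlang
%% Bounded table loading on node restart. Tables is a list of table
%% names; 60 seconds is an arbitrary example timeout.
start_db(Tables) ->
    ok = mnesia:start(),
    case mnesia:wait_for_tables(Tables, 60000) of
        ok ->
            ok;
        {timeout, Remaining} ->
            %% The other replicas never came back; force-loading the
            %% local copy risks losing remote updates, so log it loudly.
            error_logger:warning_msg("Force loading tables: ~p~n", [Remaining]),
            [mnesia:force_load_table(T) || T <- Remaining],
            ok;
        {error, Reason} ->
            {error, Reason}
    end.
```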

To cut a long story short, let me go to the questions.

Questions:

  1. What should a database administrator do when Mnesia generates the event inconsistent_database, starting to run database behind a partitioned network, in a situation where setting a Mnesia master node is not desirable (for fear of data loss)?

  2. What is the consequence of the Mnesia event inconsistent_database, starting to run database behind a partitioned network as regards my application? What if I do not react to this event and let things continue as they are? Am I losing data?

  3. In large mnesia clusters, what can one do if Mnesia goes down together with the Erlang VM on a remote site? Are there any known good methods of automatically handling this situation?

  4. There are times when one or two nodes are unreachable due to network problems or failures, and Mnesia on the surviving node reports that a given file does not exist, especially in cases where you have indexes. So, at run time, what would the behaviour of my application be if some replicas go down? Would you advise me to have a master node within a Mnesia cluster?
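Regarding question 1: whatever recovery policy is chosen, the event must first be observed. A process can subscribe to Mnesia's system events and react to inconsistent_database, for instance by alerting an operator rather than auto-picking a winner. A sketch:

```erlang
-module(mnesia_monitor).
-export([start/0]).

%% Spawn a process that subscribes to mnesia system events; events are
%% delivered as messages to the subscribing process.
start() ->
    spawn(fun() ->
                  {ok, _Node} = mnesia:subscribe(system),
                  loop()
          end).

loop() ->
    receive
        {mnesia_system_event, {inconsistent_database, Context, Node}} ->
            %% Context is running_partitioned_network or
            %% starting_partitioned_network. Do not silently pick a
            %% master here; surface the partition for a human decision.
            error_logger:error_msg("Partition detected (~p) involving ~p~n",
                                   [Context, Node]),
            loop();
        _Other ->
            loop()
    end.
```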

As you answer the questions above, you could also comment on the layout described at the beginning and whether or not it would ensure availability. Feel free to share your personal experience of working with fragmented and replicated Mnesia databases in production. With reference to the linked (quoted) question at the very beginning of this text, do provide alternative settings that could offer more reliability at database creation time, say in terms of the number of fragments, operating system dependencies, node pool size, table copy types, and so on.
