Friday, 13 December 2013

Why Statement Based Replication Diverges?

MySQL Statement-Based Replication Services has plenty of opportunities to diverge. I will attempt to list the most common here, in no particular order.

Changes Applied Directly to a Replica

If you write to the Replica, it’s data will diverge from the Master’s data. Yes, I know, it sounds stupid. Depending on how your developers are accessing the database, though, it might be quite difficult to decide if a given SQL statement is being pushed to the Master or to the Replicas by just looking at the code. Take for example a Java Framework-based Data Access Layer relying on Transaction Annotations to tell if a given transaction is read-only or read-write. Transactions can span multiple methods, sometimes located far from each other in the code. If all information you have is the code non-trivial to tell when a transaction is read-only or read-write, and sometimes misplaced Annotations make it to production, damaging whole database clusters at one, many times not in immediately noticeable ways.

Misconfiguration at Replication Setup

Setting up Replication (at least in MySQL 5.1 and 5.5 series) is a manual process and can be quite error-prone depending on the DBA or Operations Engineer experience and how reliable is the source of the information you're using to determine what should be the next transaction for a given MySQL Replica. Missing a digit on the SQL statement that sets up Replication is enough to make a Database Replica diverge by a considerable amount of transactions.

Incorrect Transaction Serialisation

MySQL is multi-threaded. This means that at any given point-in-time there might be as many transactions changing the database as there are Processor Cores available on the Master Host. In order to enable the replication implementation to work, MySQL serialises the transactions in a file (the Binary Log File). This is done automatically, by the Database Server, on-the-fly. Deciding the order in which to write transactions to the Binary Log File is a non-trivial task, specially in very busy database servers. I won't be surprised if MySQL sometimes gets the serialisation order wrong, causing the Replicas to diverge from the master on a given statement, specially for statement-based replication set-ups. If you are using some sort of high-availability supervisor and maintain the database cluster by failing over form one master to another (I do), or if you add with fresh clones from arbitrarily chosen Database Replicas, constantly shuffling the cracks in the data, this might even make this kind of divergences harder to spot or track.

Database Replica Crash

Unfortunately, every now and again a Database Server Crashes, for several different reasons. Depending on the workload and local instance configuration, it is possible that up to a second of transactions might be lost after a crash. It is standard practice to lower ACID compliant restrictions of a Database Replicas to allow it to catch up to the master faster in special, controlled situations (e.g., right after adding a local index). This technique is specially useful on write-intensive workloads where you're reaching the limits of your hardware, where the disproportional number of writes makes it really though for a Replica Database to catch up with the Master during peak hours.

Non-Deterministic SQL Statements

SQL is a Fourth-Generation Declarative Language. This implies certain SQL statement interpretation liberties to both the Database Server and the Developer. This leads to a lot of apparently harmless mistakes, like attempting to throttle the rate at which certain changes happen with “LIMIT” clauses, for example by writing statements like the ones below:

UPDATE some_table SET a_column = X WHERE other_col = “some-value” LIMIT 10;

DELETE FROM another_table WHERE some_column = Y LIMIT 100;

Both statements select rows to be changed upfront, to satisfy their respective LIMIT clauses. As the ANSI-SQL (and most implementations, including MySQL) don't place any guarantees on result set order (or if the rows to be chosen come from the beginning or the end of the result set), it's quite possible that the rows selected to be changed on the master are different from the rows selected by each and every Database Replica. In MySQL case, this will be dictated mostly by the presence of rows matching the conditions in memory, which in turn is a function of the queries answered by that specific MySQL instance which in turn is a round-robin-balanced share of the traffic that has been hitting this system for the last few hours… to put it short, this is probably a good Random Number Generator Implementation, with very high entropy.

Unnecessary to say, if the same valid SQL statement applied to the Master and to the Replicas has the ability to change different sets of rows in both servers, we have to account for the fact that Statement-Based Replication as implemented by MySQL will diverge. It's not a matter of "if", it's just a matter of "when".


  1. You can mitigate the first category by putting 'read-only' in my.cnf on the replicas.

    1. This only works for users without the "SUPER" privilege. Users with the "SUPER" privilege ignore the read-only flag and are able to write even when the flag is on.