Friday, March 16, 2018

Discovering rows that have been updated since last checkpoint

One important problem that requires a solution is to decide whether
a row has been updated since the last checkpoint or not.

Most implementations use some kind of mechanism that requires extra
memory resources and/or CPU resources to handle this.

NDB uses the fact that each row is already stamped with a timestamp.
The timestamp is what we call a global checkpoint id. A new global
checkpoint is created about once every 2 seconds (can be faster or
slower by configuration).

Thus we will overestimate the number of rows written since last checkpoint
with a little bit, but with checkpoints taking a few minutes, the extra overhead
of this is only around 1%.

Thus when we scan rows we check the global checkpoint id of the row, if
it is bigger than the global checkpoint that the last checkpoint had fully
covered we will write the row as changed since last checkpoint. Actually
we also have the same information on the page level, thus we can check
the page header and very quickly scan past an entire page if it hasn't been
updated since last checkpoint.

The same type of scanning is used also to bring a restarting node up to
synch with the live node. This algorithm has been present in NDB since
MySQL 5.1.

No comments: