Testing MariaDB/MySQL backup restoration in Kubernetes

What is it about?

In short: how do you test MySQL/MariaDB backups in an automated manner in Kubernetes? This article briefly describes how to do it.

Why?

There are still many people who host their own database servers. Even on K8S (yeah, I know - this might be an antipattern, but not always; it's an academic discussion imo).

So, having your own DB server, you need to take care of taking backups. And of recovering data when there's a need. And you can only say that you've got backups covered (and really - ONLY then) when those backups are tested.

Testing a backup file - the procedure

  1. Fetch the backup file from a remote location
  2. Import it into a new/vanilla MySQL/MariaDB server
  3. Run integrity tests (e.g. CHECK TABLE) on each table that was recovered (sketched below)
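
Done by hand, the whole test boils down to just a few commands. A minimal sketch, assuming the backup sits in a GCS bucket as a gzipped SQL dump (the bucket path and credentials are made up; mysqlcheck runs CHECK TABLE for you):

```sh
# 1. Fetch the backup file from remote storage (bucket path is a placeholder)
gsutil cp gs://my-backups/db-latest.sql.gz /tmp/backup.sql.gz

# 2. Import it into a fresh MySQL/MariaDB server
gunzip -c /tmp/backup.sql.gz | mysql -h 127.0.0.1 -u root -p"${MYSQL_ROOT_PASSWORD}"

# 3. Run CHECK TABLE against every table that was restored
mysqlcheck --check --all-databases -h 127.0.0.1 -u root -p"${MYSQL_ROOT_PASSWORD}"
```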

Putting it into Kubernetes

The K8S object would be a CronJob, of course. I thought about it a bit and came to the conclusion that the following assumptions should hold for this workload:

  1. Start a MySQL/MariaDB container from an upstream image (e.g. the MariaDB image)
  2. Create another container (using a custom image), that:
    1. Fetches the backup archive from the remote location
    2. Unarchives this backup archive
    3. Imports the data into the MySQL/MariaDB server container running in the same pod
    4. Runs CHECK TABLE SQL commands to verify data integrity
    5. Sends the SHUTDOWN command to the MySQL/MariaDB container, so that container exits with exit code 0 (SUCCESS)
  3. If at any point of this procedure anything fails - the whole CronJob fails. And Kubernetes monitoring (e.g. Prometheus) may easily alert on that. A sketch of such a CronJob follows below.
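
A minimal sketch of what such a CronJob could look like - the image tags, schedule and passwords are placeholders, not the exact manifest I use:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: mariadb-backup-test
spec:
  schedule: "0 4 * * *"            # run the restore test every night
  jobTemplate:
    spec:
      backoffLimit: 0              # a failed test should fail loudly, not retry forever
      template:
        spec:
          restartPolicy: Never
          containers:
            # 1. vanilla MariaDB from the upstream image - the restore target
            - name: mariadb
              image: mariadb:10.11
              env:
                - name: MARIADB_ROOT_PASSWORD
                  value: "throwaway"   # the instance is discarded after the test anyway
            # 2. the custom "test" container: fetch, import, CHECK TABLE, SHUTDOWN
            - name: backup-test
              image: registry.example.com/mariadb-backup-test:latest
              env:
                - name: MYSQL_ROOT_PASSWORD
                  value: "throwaway"
```

Both containers live in one pod, so they talk over 127.0.0.1, and the Job only counts as succeeded once every container has exited with code 0 - which is exactly why the SHUTDOWN at the end matters.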

And that's it. This way the test environment is always fresh, no data is left, disk space is recovered and that should be enough.

Show me the code!

Ok, ok - BUT! Don't judge. I wrote it just for myself, for my internal use => it's not pretty code. But it's enough for the job. You may rewrite it in any language of your choice - that's obvious.

  1. The container which runs the tests:
    1. Containerfile which defines the "test" container image
    2. gsutil.repo used by this container
    3. entrypoint.sh used by this container (roughly sketched below)
  2. The CronJob resource
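
I won't paste the files verbatim here, but a stripped-down sketch of what such an entrypoint.sh does (the bucket path and credentials are made up, and the real script has more error handling) looks roughly like this:

```sh
#!/usr/bin/env bash
set -euo pipefail  # any failed step fails this container, and with it the whole Job

MYSQL=(mysql -h 127.0.0.1 -u root -p"${MYSQL_ROOT_PASSWORD}")

# Wait until the MariaDB container in the same pod accepts connections
until "${MYSQL[@]}" -e "SELECT 1" >/dev/null 2>&1; do sleep 2; done

# Fetch, unpack, import, CHECK TABLE - the same commands as in the procedure above
gsutil cp gs://my-backups/db-latest.sql.gz /tmp/backup.sql.gz
gunzip -c /tmp/backup.sql.gz | "${MYSQL[@]}"
mysqlcheck --check --all-databases -h 127.0.0.1 -u root -p"${MYSQL_ROOT_PASSWORD}"

# Shut the server down, so the MariaDB container also exits with 0
"${MYSQL[@]}" -e "SHUTDOWN"
```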

What's missing here?

My experience tells me that there's one more thing that should be checked in terms of database backups: the size of the backup. Sometimes it happens that the data volume shrinks for some reason, and those kinds of situations should be investigated. The above article doesn't really cover this matter. That's mainly because, imo, this kind of test is hard to do from this kind of CronJob when Prometheus is used for monitoring.

Prometheus is pull-based, meaning there needs to be some service which exposes metrics for it to scrape. The CronJob described above doesn't provide any API serving data to Prometheus.

There are several ways to solve this kind of problem. But hey - I'm leaving this to you, as it heavily depends on your monitoring setup.
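
Just to give one idea: if you happen to run a Prometheus Pushgateway, the test container could push the size of the fetched backup as a gauge, and you could alert on sudden drops. The Pushgateway URL and metric name below are made up:

```sh
# Push the size of the fetched backup to a Pushgateway (URL and job name are placeholders)
BACKUP_SIZE=$(stat -c%s /tmp/backup.sql.gz)
cat <<EOF | curl --data-binary @- http://pushgateway.monitoring.svc:9091/metrics/job/mariadb-backup-test
# TYPE mariadb_backup_size_bytes gauge
mariadb_backup_size_bytes ${BACKUP_SIZE}
EOF
```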
