In a previous post I discussed four reasons why database schema changes need to be automated. Automation allows to validate schema changes in a CI pipeline. The earlier defects are detected, the easier and less costly they are to resolve. This is referred to as the Shift Left approach. Recently, I had to setup database automation for a project using Flyway and GitLab. This post deep-dives on how to setup a database schema definition using Flyway and validate it using GitLab CI/CD.
- What is Flyway
- Basic schema validation
- Validate workflow issues
- Only validate when needed
- Bonus: validate schema definition locally
- Conclusion
What is Flyway?
Flyway is a tool to manage the lifecycle of database schemas. It supports a wide variety of database engines via JDBC-drivers. The schema definition is kept in a directory with SQL migration files. The file naming convention includes a semantic version number. Based on these version numbers Flyway transitions (aka “migrates”) a database to a new state. Flyway ensures all migrations are executed in order of semantic version number and exactly-once. In case an error occurs it immediately terminates the migration process. A schema history table is maintained to hold the history of applied migrations. This table contains all migration files, their checksum and the time at which they it has been executed. This can be useful information when tracking down production incidents.
Below is an example of a simple database definition using Flyway. Schema version 1
defines an example
table. Version 2
adds the col
column to the example
table.
|
|
|
|
|
|
|
|
A schema definition can be applied on an empty database by running flyway migrate
. Given the example above, the version of the schema will be 2
and contain the example
table with the id
and col
columns.
|
|
Basic schema validation
Let’s first talk about an assumption: the development workflow. All of the explanations and examples assume an trunk based development approach with merge requests. The implication is that only a single master
branch is used. Developers create a short living feature branch followed by a merge request to get changes merged into master
. The software in the master
branch is considered able to be successfully pushed through a delivery pipeline up until production. Automated validation and (human) review of merge requests should prevent errors from happening in live environments.
What would a basic validation of a schema need to cover? At least it needs to capture syntax errors in migration files. Next to that, all sql statements in the migration need to be checked for compatibility with the RDBMS version, like MySQL 8.0. Last but not least, each new migration file needs to be validated on compatibility with the previous schema version.
Performing this validation using static code analysis tools is near impossible. Tools always run behind the features of the RDBMS. Edge cases of database features or sql dialects are often not included. Applying the full schema definition for each (update to a) merge request to a running database provides the most thorough validation. It will naturally be up-to-date with the versions of databases in live environments. Developers also don’t need to be limited in the use of sql or database features due to lacking tools. It is important to ensure that each validation is performed against a fully empty database in an isolated environment. This prevents false positives due to leftovers of a previous validation run.
GitLab CI/CD allows running pipelines when merge requests are created or updated. These pipelines are defined in a .gitlab-ci.yml
file in the repository and can include multiple jobs. Each job runs in a configurable docker container to which the git repository is available. Next to that, a service can be started which is exclusively available for that job. This allows to run flyway
from the official Flyway container maintainer against a MySQL database running as a service. Cool.
A very simple but working example .gitlab-ci.yml
file is included below. It will start a database server as a service and run flyway migrate
to apply the database definition to this clean and empty database. GitLab will run this job for every change in the repository. However, more about that later!
|
|
A few notes on this example:
- The image name defines the version of the flyway container image to use, in this case
boxfuse/flyway:5.2-alpine
. Theentrypoint
must be defined as an array with a single empty string. This allows GitLab CI to run all scripts in a shell in the container. - The
before_scripts
creates aflyway.conf
configuration file with database details. - The MySQL database server is started from the
mysql:8.0
container image. Thealias
becomes the hostname at which the server is reachable. - The
script
applies the full schema definition. When theflyway
command exists normally the job succeeds with a green check mark.
To illustrate this basic validation I created two merge requests. One with a good migration and one with an erroneous migration. The merge requests makes it perfectly clear that an error occurred while running the pipeline.
Merge request with successful pipeline | Merge request with failing pipeline |
The logs of the jobs provide all the details both the developer and the reviewer of the merge request need to assess the results. The log of the job with the good migration provides details about the amount of migrations applied and the final version of the schema:
|
|
Whereas the log of the job with the erroneous migration shows a clear error message:
|
|
Validate workflow issues
Now that we have basic validation of schema definitions in place we can look at some validations to ensure the developer did adhere to the right workflow.
The first workflow issue occurs with a merge request which is not rebased with the master
branch. Suppose a developer started a feature branch based on the schema definition with version 2
. In this feature a new column needs to be added to the schema definition. The developer creates a new migration with version 3
to handle this. When the feature is finished the developer creates his merge request. However, in the meantime another developer already got a merge request accepted which also included a migration with version 3
. This is assumed to either already be applied to the production database or on its way to be applied. The validation of the merge request with the new feature should fail as it is based on an outdated schema definition. The developer should rebase his branch, ensure his schema definition is valid again and update his merge request. The schema version for this feature will become 4
.
Another error can be caused by a developer who makes a change to an existing migration file already merged into master
. As this can cause drift in the definition it should be validated for.
Validating these workflow issues in a merge request requires two steps. The first step applies the schema definition from the master
branch. This will ensure that the validation database is in the same state as the live environments. After that the schema definition of the merge request should be applied. Running flyway migrate
will fail when there are duplicate version numbers. It will also fail when the checksum of a migration file does not match the checksum stored in the schema history table.
This can be accomplished with GitLab CI by extending the before_script
section of the previous example with the items below. These steps clone the master
branch into the ~/db-reference-branch
directory and run flyway migrate
from there.
|
|
The full example merge request with a development workflow issue can be found here. The job log has concluded with the following details:
|
|
Only validate when needed
The full validation of a database schema definition with Flyway is in place now. As mentioned earlier the job runs for every update of the repository. Not all updates do include changes to the schema definition though. Depending on the size and complexity of the schema definition the validation can take quite a while. GitLab includes the only/except
feature to only run a job under specific conditions. It is currently still in alpha, but worth the try! The full specification can be found in the GitLab docs here.
Given our workflow the job only needs to run for any change to a SQL file or the Flyway configuration. This can be implemented by adding the following lines to the GitLab job definition.
|
|
Bonus: validate schema definition locally
The GitLab CI/CD platform uses runners to execute the jobs in isolation for each trigger of a pipeline. When using the hosted version of GitLab the jobs run on a managed and shared set of runners. However, it is also possible to run a job on your development computer before your code is pushed to the repository.
In order to use this feature ensure that Docker and the GitLab Runner are installed. Installation instructions for Docker and GitLab Runner. MacOS users can just install them via Homebrew with brew install docker
and brew install gitlab-runner
. Please not that the installation instructions might want you to register the GitLab runner or start it as a service. This is not needed to just run a job before your code is pushed.
Now to validate the schema definition using the test:migrations
definition run gitlab-runner exec docker test:migrations
. Only committed changes will be included in the job run!
Conclusion
Using the GitLab CI/CD platform to validate Flyway schema definitions can prevent a wide range of errors while applying a schema definition to a live environment. In this blog post we built an example .gitlab-ci.yml
file which validates on syntax errors, database engine compatibility issues, logical migration issues, non-rebased merge requests and changes to earlier migration files. The full source code can be found in the jappievw/validating-flyway-migrations-with-gitlab GitLab repository.
Happy validating schema definitions!
comments powered by Disqus