Validating Flyway schema definitions with GitLab CI/CD

In a previous post I discussed four reasons why database schema changes need to be automated. Automation allows to validate schema changes in a CI pipeline. The earlier defects are detected, the easier and less costly they are to resolve. This is referred to as the Shift Left approach. Recently, I had to setup database automation for a project using Flyway and GitLab. This post deep-dives on how to setup a database schema definition using Flyway and validate it using GitLab CI/CD.

What is Flyway
Basic schema validation
Validate workflow issues
Only validate when needed
Bonus: validate schema definition locally
Conclusion

What is Flyway?

Flyway is a tool to manage the lifecycle of database schemas. It supports a wide variety of database engines via JDBC-drivers. The schema definition is kept in a directory with SQL migration files. The file naming convention includes a semantic version number. Based on these version numbers Flyway transitions (aka “migrates”) a database to a new state. Flyway ensures all migrations are executed in order of semantic version number and exactly-once. In case an error occurs it immediately terminates the migration process. A schema history table is maintained to hold the history of applied migrations. This table contains all migration files, their checksum and the time at which they it has been executed. This can be useful information when tracking down production incidents.

Below is an example of a simple database definition using Flyway. Schema version 1 defines an example table. Version 2 adds the col column to the example table.

1
2
3
4
5
6
7
8
$ tree
.
├── flyway.conf
└── migrations
    ├── V1__create_example_table.sql
    └── V2__add_column_to_example.sql

1 directory, 3 files

1
2
$ cat flyway.conf
flyway.locations=filesystem:./migrations

1
2
3
4
5
$ cat migrations/V1__create_example_table.sql
CREATE TABLE `example` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

1
2
$ cat migrations/V2__add_column_to_example.sql
ALTER TABLE `example` ADD COLUMN `col` VARCHAR(255);

A schema definition can be applied on an empty database by running flyway migrate. Given the example above, the version of the schema will be 2 and contain the example table with the id and col columns.

1
2
3
4
5
6
7
8
9
$ flyway migrate
Flyway Community Edition 5.2.4 by Boxfuse
Database: jdbc:mysql://database:3306 (MySQL 8.0)
Successfully validated 2 migrations (execution time 00:00.038s)
Creating Schema History table: `example`.`flyway_schema_history`
Current version of schema `example`: << Empty Schema >>
Migrating schema `example` to version 1 - create example table
Migrating schema `example` to version 2 - add column to example
Successfully applied 2 migrations to schema `example` (execution time 00:00.283s)

Basic schema validation

Let’s first talk about an assumption: the development workflow. All of the explanations and examples assume an trunk based development approach with merge requests. The implication is that only a single master branch is used. Developers create a short living feature branch followed by a merge request to get changes merged into master. The software in the master branch is considered able to be successfully pushed through a delivery pipeline up until production. Automated validation and (human) review of merge requests should prevent errors from happening in live environments.

What would a basic validation of a schema need to cover? At least it needs to capture syntax errors in migration files. Next to that, all sql statements in the migration need to be checked for compatibility with the RDBMS version, like MySQL 8.0. Last but not least, each new migration file needs to be validated on compatibility with the previous schema version.

Performing this validation using static code analysis tools is near impossible. Tools always run behind the features of the RDBMS. Edge cases of database features or sql dialects are often not included. Applying the full schema definition for each (update to a) merge request to a running database provides the most thorough validation. It will naturally be up-to-date with the versions of databases in live environments. Developers also don’t need to be limited in the use of sql or database features due to lacking tools. It is important to ensure that each validation is performed against a fully empty database in an isolated environment. This prevents false positives due to leftovers of a previous validation run.

GitLab CI/CD allows running pipelines when merge requests are created or updated. These pipelines are defined in a .gitlab-ci.yml file in the repository and can include multiple jobs. Each job runs in a configurable docker container to which the git repository is available. Next to that, a service can be started which is exclusively available for that job. This allows to run flyway from the official Flyway container maintainer against a MySQL database running as a service. Cool.

A very simple but working example .gitlab-ci.yml file is included below. It will start a database server as a service and run flyway migrate to apply the database definition to this clean and empty database. GitLab will run this job for every change in the repository. However, more about that later!

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
test:migrations:
  image:
    name: boxfuse/flyway:5.2-alpine
    entrypoint: [""]
  before_script:
    - echo "flyway.url=jdbc:mysql://database:3306?useSSL=false" >> ${HOME}/flyway.conf
    - echo "flyway.user=root" >> ${HOME}/flyway.conf
    - echo "flyway.password=" >> ${HOME}/flyway.conf
    - echo "flyway.schemas=${MYSQL_DATABASE}" >> ${HOME}/flyway.conf
  script: flyway migrate
  services:
    - name: mysql:8.0
      alias: database
  variables:
    MYSQL_DATABASE: "example"
    MYSQL_ALLOW_EMPTY_PASSWORD: "true"

A few notes on this example:

The image name defines the version of the flyway container image to use, in this case boxfuse/flyway:5.2-alpine. The entrypoint must be defined as an array with a single empty string. This allows GitLab CI to run all scripts in a shell in the container.
The before_scripts creates a flyway.conf configuration file with database details.
The MySQL database server is started from the mysql:8.0 container image. The alias becomes the hostname at which the server is reachable.
The script applies the full schema definition. When the flyway command exists normally the job succeeds with a green check mark.

To illustrate this basic validation I created two merge requests. One with a good migration and one with an erroneous migration. The merge requests makes it perfectly clear that an error occurred while running the pipeline.


Merge request with successful pipeline	Merge request with failing pipeline

The logs of the jobs provide all the details both the developer and the reviewer of the merge request need to assess the results. The log of the job with the good migration provides details about the amount of migrations applied and the final version of the schema:

1
2
3
Migrating schema `example` to version 3 - create a demo table
Successfully applied 3 migrations to schema `example` (execution time 00:00.215s)
Job succeeded

Whereas the log of the job with the erroneous migration shows a clear error message:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
Migrating schema `example` to version 3 - new erroneous validation
WARNING: DB: A table must have at least 1 column (SQL State: 42000 - Error Code: 1113)
ERROR: Migration of schema `example` to version 3 - new erroneous validation failed! Please restore backups and roll back database and code!
ERROR: 
Migration V3__new_erroneous_validation.sql failed
-------------------------------------------------
SQL State  : 42000
Error Code : 1113
Message    : A table must have at least 1 column
Location   : ./migrations/V3__new_erroneous_validation.sql (/builds/jappievw/validating-flyway-migrations-with-gitlab/./migrations/V3__new_erroneous_validation.sql)
Line       : 1
Statement  : CREATE TABLE `syntax-error`

ERROR: Job failed: exit code 1

Validate workflow issues

Now that we have basic validation of schema definitions in place we can look at some validations to ensure the developer did adhere to the right workflow.

The first workflow issue occurs with a merge request which is not rebased with the master branch. Suppose a developer started a feature branch based on the schema definition with version 2. In this feature a new column needs to be added to the schema definition. The developer creates a new migration with version 3 to handle this. When the feature is finished the developer creates his merge request. However, in the meantime another developer already got a merge request accepted which also included a migration with version 3. This is assumed to either already be applied to the production database or on its way to be applied. The validation of the merge request with the new feature should fail as it is based on an outdated schema definition. The developer should rebase his branch, ensure his schema definition is valid again and update his merge request. The schema version for this feature will become 4.

Another error can be caused by a developer who makes a change to an existing migration file already merged into master. As this can cause drift in the definition it should be validated for.

Validating these workflow issues in a merge request requires two steps. The first step applies the schema definition from the master branch. This will ensure that the validation database is in the same state as the live environments. After that the schema definition of the merge request should be applied. Running flyway migrate will fail when there are duplicate version numbers. It will also fail when the checksum of a migration file does not match the checksum stored in the schema history table.

This can be accomplished with GitLab CI by extending the before_script section of the previous example with the items below. These steps clone the master branch into the ~/db-reference-branch directory and run flyway migrate from there.

1
2
3
- apk add --no-cache git
- git clone ${CI_REPOSITORY_URL} --reference ${CI_PROJECT_DIR} --branch master --single-branch ~/db-reference-branch --depth 1
- pushd ~/db-reference-branch && flyway migrate && popd

The full example merge request with a development workflow issue can be found here. The job log has concluded with the following details:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
$ git clone ${CI_REPOSITORY_URL} --reference ${CI_PROJECT_DIR} --branch master --single-branch ~/db-reference-branch --depth 1
Cloning into '/root/db-reference-branch'...
$ pushd ~/db-reference-branch && flyway migrate && popd
~/db-reference-branch /builds/jappievw/validating-flyway-migrations-with-gitlab
Flyway Community Edition 5.2.4 by Boxfuse
Database: jdbc:mysql://database:3306 (MySQL 8.0)
Successfully validated 2 migrations (execution time 00:00.021s)
Creating Schema History table: `example`.`flyway_schema_history`
Current version of schema `example`: << Empty Schema >>
Migrating schema `example` to version 1 - create example table
Migrating schema `example` to version 2 - add column to example
Successfully applied 2 migrations to schema `example` (execution time 00:00.220s)
/builds/jappievw/validating-flyway-migrations-with-gitlab
$ flyway migrate
Flyway Community Edition 5.2.4 by Boxfuse
Database: jdbc:mysql://database:3306 (MySQL 8.0)
ERROR: Validate failed: Migration checksum mismatch for migration version 2
-> Applied to database : -604576161
-> Resolved locally    : -1487664217
ERROR: Job failed: exit code 1

Only validate when needed

The full validation of a database schema definition with Flyway is in place now. As mentioned earlier the job runs for every update of the repository. Not all updates do include changes to the schema definition though. Depending on the size and complexity of the schema definition the validation can take quite a while. GitLab includes the only/except feature to only run a job under specific conditions. It is currently still in alpha, but worth the try! The full specification can be found in the GitLab docs here.

Given our workflow the job only needs to run for any change to a SQL file or the Flyway configuration. This can be implemented by adding the following lines to the GitLab job definition.

1
2
3
4
5
6
only:
  refs:
    - merge_requests
  changes:
    - flyway.conf
    - migrations/*.sql

Bonus: validate schema definition locally

The GitLab CI/CD platform uses runners to execute the jobs in isolation for each trigger of a pipeline. When using the hosted version of GitLab the jobs run on a managed and shared set of runners. However, it is also possible to run a job on your development computer before your code is pushed to the repository.

In order to use this feature ensure that Docker and the GitLab Runner are installed. Installation instructions for Docker and GitLab Runner. MacOS users can just install them via Homebrew with brew install docker and brew install gitlab-runner. Please not that the installation instructions might want you to register the GitLab runner or start it as a service. This is not needed to just run a job before your code is pushed.

Now to validate the schema definition using the test:migrations definition run gitlab-runner exec docker test:migrations. Only committed changes will be included in the job run!

Conclusion

Using the GitLab CI/CD platform to validate Flyway schema definitions can prevent a wide range of errors while applying a schema definition to a live environment. In this blog post we built an example .gitlab-ci.yml file which validates on syntax errors, database engine compatibility issues, logical migration issues, non-rebased merge requests and changes to earlier migration files. The full source code can be found in the jappievw/validating-flyway-migrations-with-gitlab GitLab repository.

Happy validating schema definitions!

Share Tweet