Using Flyway has totally transformed how I think about and deal with the lifecycle of databases I manage. It has opened the doors to automation, better auditing, far less human error, and streamlined my day to day work.
This post goes over some lessons learned during my experience with using Flyway over the years and assumes a basic familiarity with the tool. So if you haven’t read the Get Started on Flyway yet, start there.
This will focus on SQL Server usage, but most of it is applicable to any database.
There are a ton of options for source control these days and most of the time we end up inheriting whatever has been decreed by organizational standards. I’m going to focus on Git(Hub) though since it is extremely popular these days, easy to use, and fits well into a workflow based around Flyway. Most, if not all, of the below can also apply to Atlassian’s BitBucket or Gitlab with minor changes.
This article will only cover source control as far as database changes are concerned. Version controlling the actual database is highly dependent on the RDBMS being used and not in the scope of this.
If you are also pursuing source control for the database itself, I would recommend using separate repositories as I have found that Git tends to get very messy with so many files that have similar contents and names, causing merge conflicts to crop up frequently.
CODEOWNERS file to assign all relevant file types to be owned by the DBA/reviewer team and make sure that we can view diffs properly:
This will enable all pull requests coming in from developers to be automatically assigned to the DBA team for review.
This is probably going to make some hardcore Git fans cry because to achieve a decent workflow without relying on third party tools to do all the heavy lifting we need to do some wacky stuff with branches that you’d normally not want to do static code.
In smaller shops, it may make sense to have a separate repository for each instance or even each database. At a larger scale with more complexity, it probably makes more sense to address things based on application associations. A lot of the following will depend on preference and what is available in your organization, but should provide a high level roadmap.
We can start by putting all of our changes into a DB change repository and create branches specific for each application. These could also be in dedicated app repositories as well depending on your setup.
We’ll need a protected branch for each environment that is going to be managed through a source controlled, Flyway release coordination. Lower environments may not be good candidates due to organization ownership or the “throwaway” nature of environments that are often reset.
Branches might then look like:
As well as any other feature, bug, or hotfix branches that will come and go over time.
When a branch containing new changes is created, it should be branched off of the lowest environment you’re including in your process. In this case we’d branch off of QA to make sure we have the latest set of verified changes. A developer can apply these to their local copy to ensure they are up to date. If a shared development environment is in use, it should already be up to date or refreshed regularly to align it with the changes in QA.
Moving forwards, the commit from the merge can be used to pull it back out into a clean branch to guarantee no other commits have been made. This new branch is then merged again and again via pull requests into each higher branch until it reaches master. After this, the merge-commit version of the branch can be deleted as it has completed its lifecycle.
Based on the above, we have a branching strategy and know what traits our change scripts should have, but how do we organize them in a branch?
\---MyDatabase | .gitattributes | CODEOWNERS | flyway.conf | README.md | +---Migrations | V201709190907__Fixing_Timey_Wimey_Conversions.sql | \---Rollbacks RB201709190907__Fixing_Timey_Wimey_Conversions.sql
This allows the flexibility of multiple databases or a single database contained in the application’s branch.
Now if we place the Flyway application folder at the same level as our repository folder:
+---RepositoryName | \---MyDatabase | \---flyway-4.2.0
We can configure the
flyway.locations value appropriately based on its relative location:
While ideally a database change can be fully automated, there are often scenarios where scripts need to be execution in a coordinated sequence with validation by other teams. In high stakes environments there may also be temporary backups created of objects that require a manual cleanup once the release is confirmed a success far after the actual SQL deployment.
The pull request description then becomes an ideal place to catalog special deployment steps, exceptions, and gotcha’s that may be associated with this release. The steps may vary by environment, so each pull request may have different information. This way we can consolidate the proposed changes and steps in a single place that is external to any ticketing system that may be used in a wider, more generic sense within an organization.
After the branch’s changes have been applied, merge them in.
If a release only has a single script targeting a single database, things are pretty easy. But what about when there are large scale coordinated releases that involve many versions and/or change scripts?
If you have multiple versions targeted for a single release, but that release requires staggered deployment of scripts (i.e. execute script 1, wait for verification, execute script 2, take a backup, execute script 3) then these should be managed as separate files within separate branches to manage the deployment flow. While I highly discourage this type of release coordination, it is often unavoidable for many large scale systems.
Treat each section of code that can be deployed autonomously as a separate release altogether and simply repeat the release process when it is time to deploy each subsequent portion of changes.
Using multiple files (and thus versions) for a single release can be useful for reducing the volume of a single file or creating a logical separation for different types of changes. Since version numbers in this context aren’t meaningful in a traditional way, we can utilize them more flexibly.
If this is the case, it should be kept in mind that scripts should be able to exist and be released independently without creating a broken state in the database if your database does not support DDL transactions. If your database supports DDL transactions or you are not performing any DDL for a given set of migrations, then this section is only relevant in the scenario where Flyway fails to rollback your group and you need a failsafe plan.
If you have five scripts for a release and Flyway fails to migrate the third, then the application should still be able to function with only the first two applied. Doing otherwise creates a more complicated and risky rollback scenario wherein you need to:
If the first two versions can exist independently, however:
This reduces the time, risk, and manual steps needed to fix a production environment when you cannot afford to increase odds of mistakes being made. You’re doing fewer rollbacks, fewer releases, and putting yourself in the best situation possible given the circumstances.
It is not hard to imagine how error prone and difficult the first process becomes if there were to be issues with multiple scripts in your release - you’d be rolling back and re-applying everything several times over. In the second, however, with each subsequent failure your fix process grows smaller and faster as you lock down each successful migration.
This may create the need for larger than desired change scripts in order to create scripts that contain non-breaking sets of changes, but the payoff is certainly worth it.
While Flyway excels at being useful primarily due to its simple approach to database change control, there are still some configuration settings that can make life a lot easier. Many of these are not unique to a particular RDBMS product.
Because we now have developers versioning files based on when they save the file, the idea of sequential versions is totally out the window. Just because Developer A finished their work before developer B doesn’t mean that Script A can be released before Script B. So we need to change our configuration to reflect that:
This setting allows any valid, pending migration script that is detected to be migrated when Flyway is run.
Now we don’t need to worry about version collisions and the database doesn’t need to care about the sequential version numbers. But what we still need to account for is how to make sure Script A doesn’t make it onto the master branch at the same time as Script B, wherein Flyway will not know that B needs to be released before A….
According to Flyway documentation, the group parameter tells Flyway
Whether to group all pending migrations together in the same transaction \ when applying them (only recommended for databases with support for DDL transactions)
This feature is a no brainer if and when you ever need to release multiple versions in the same release:
Rollback scenarios become very straightforward with this feature and you almost don’t need to worry about manual rollbacks. But as a responsible DBA you’d never actually do a production release without a rollback script, right?
While a totally optional feature, callbacks can greatly increase the value proposition of Flyway.
For example, Panko is a lightweight set of callbacks to allow an object level diff to be associated with each deployment. This can be useful for sanity checks, creating an automated changelog, and adding an additional level of detail to the built in schema log table.
When it comes to SQL Server, Microsoft has significantly increased their commitment to the JDBC driver working with this driver is easier than the jTDS driver included by default with Flyway. I’ve had trouble getting Windows Authentication to work with jTDS also, so ODBC is a no brainer.
Otherwise, the standard jTDS drivers are usually sufficient.
There may be some configuration required on the host where Flyway is being executed from to properly take advantage of Windows Authentication for SQL Server.
Download the latest Microsoft JDBC Driver.
Add the appropriate Auth DLL file for the machine’s architecture to a valid
JAVA_PATH folder (
echo %PATH% to see a list of them on the machine).
Add the JDBC jar file to the
/driver folder of Flyway.
Modify the configuration file at `/config/flyway.conf’ to use the JDBC driver, not the jTDS that is default, and point it to a known server that you have Windows Authentication access to:
Uncomment and set the user and password fields to be blank:
Test your connection.
If using traditional version numbering is preferable, Flyway has a great blog post on using git hooks to help prevent developers from committed conflicting version numbers.
Jeremy Jarrel saved me a lot of time when I first started using Flyway by writing up a great post which recommends using timestamps instead of sequential integers for versions in file names. In the real world we’re not working in a vacuum, and neither are our developers, so using a timestamp greatly reduces the odds of two people creating the same version number for a release. People tend to care about application versions, much less so whether the database is at 3.4.56 or not.
While slightly bothersome when saving, I’ve never had issues when using a the following timestamp format with Flyway’s default naming convention:
VMMDDYYHHMM__MyScriptName.sql --> V0928171048__MyScriptName.sql
This approach, however, also requires out of order migrations.
This is another best practice that Jeremy’s blog post saved me a lot of headache with. It seems like a no-brainer, but it isn’t until you try to re-run a failed migration that didn’t fully roll back (like a DDL statement) and realize your script won’t run at all now that you kick yourself for not doing this sooner. This is not an easy one to cajole developers into doing consistently, but it is worth it.
Despite Flyway’s stance on rollbacks being that there shouldn’t be any, the real world tends to necessitate them. Whether it is a regulatory requirement or just as a peace of mind, they often need to exist. And though Flyway is ideally handling most of this process elegantly so that manual rollbacks are a thing of the past, one can never be too safe. There are two pain points that rollbacks have caused in my experience:
Because of these, I wholeheartedly believe there should always be a rollback available. You always need a way to recover and merely adding more scripts on top of the original is not always a viable option. Writing scripts with proper checks and transactions can certainly help mitigate this, but no one is perfect and no deployment is ever a guaranteed success, nor is it guaranteed your release tool is going to work without error.
I recommend keeping a separate file with the same version and name but with a rollback-specific prefix:
In addition to un-doing all of the changes of the primary release script, this script should contain a
DELETE statement to clean up the
schema_version table as well so that it properly reflects the current state of the database. Since the version number is in the file, it is easy to script a statement based on the filename to remove itself.
This eliminates the need for manual grooming of Flyways schema table, a step which is easily forgotten amidst a production crisis. It also streamlines the rollback process which may need to happen many times during early development phases. After a rollback, developers just need to merge in updated versions of the rollback and migration scripts before running the normal migration process again.