Using Liquibase with Databricks Data Lakehouses

Verified on: February 29, 2024

A data lakehouse is a new, open data management architecture that combines the flexibility, cost-efficiency, and scale of data lakes with the data management and ACID transactions of data warehouses, enabling business intelligence (BI) and machine learning (ML) on all data.

The lakehouse architecture and Databricks SQL bring cloud data warehousing capabilities to your data lakes. Using familiar data structures, relations, and management tools, you can model a highly performant, cost-effective data warehouse that runs directly on your data lake.

For more information on Databricks, see the Databricks website.

Prerequisites

Set up Liquibase

  1. Dive into Liquibase concepts with an Introduction to Liquibase.
  2. Download and install Liquibase on your machine.
  3. (optional) Enable Liquibase Pro capabilities

    To apply a Liquibase Pro key to your project, add the following property to the Liquibase properties file:

    liquibase.licenseKey: <paste key here>
    

Set up Databricks

  1. Create a Databricks account and workspace

    If you don't already have a Databricks account and workspace, follow the Databricks Getting Started instructions.

  2. Navigate to your Workspaces tab and click the Open Workspace button in the upper right of the page.

  3. Create a SQL Warehouse

    If you don't have a SQL Warehouse set up, follow the Databricks instructions on Creating a SQL Warehouse.

  4. Create a catalog

    If you don't already have a catalog set up, follow the Databricks instructions on Create and Manage Catalogs.

  5. Click the SQL Editor option in the left navigation, enter your SQL to create your database (also called a schema), and click the Run button

    CREATE DATABASE IF NOT EXISTS <catalog_name>.<database_name>;

  6. Your database is configured and ready to use.

Install drivers

All users

To use Databricks with Liquibase, you need to install two additional JAR files.

  1. Download the JAR files

    • Download the Databricks JDBC driver (DatabricksJDBC42-<version>.zip) from the driver download site and unzip the archive to locate the DatabricksJDBC42.jar file.
    • Download the Liquibase Databricks extension (liquibase-databricks-<version>.jar) from the GitHub Assets listed at the end of the release notes.
  2. Place your JAR file(s) in the <liquibase_install_dir>/lib directory.

    • DatabricksJDBC42.jar
    • liquibase-databricks-<version>.jar

    Note

    If you are running your project on macOS or Linux, you might need to run the following command in your terminal (you can also add it to your bash profile) to allow the dependencies to work properly:

    export JAVA_OPTS=--add-opens=java.base/java.nio=ALL-UNNAMED
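Before running Liquibase, it can help to confirm that both JAR files actually landed in the lib directory. The following is a minimal sketch, not part of Liquibase itself: the function name check_databricks_jars and its messages are hypothetical, and the lib directory path is whatever you pass in.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether both Databricks JARs are present in a
# Liquibase lib directory. Prints one line per missing file and returns
# non-zero if anything is missing.
check_databricks_jars() {
  local lib_dir="$1"
  local missing=0

  # The JDBC driver JAR has a fixed name once the zip is extracted.
  if [ ! -f "$lib_dir/DatabricksJDBC42.jar" ]; then
    echo "missing: DatabricksJDBC42.jar"
    missing=1
  fi

  # The extension JAR carries a version suffix, so match it with a glob.
  if ! ls "$lib_dir"/liquibase-databricks-*.jar >/dev/null 2>&1; then
    echo "missing: liquibase-databricks-<version>.jar"
    missing=1
  fi

  return "$missing"
}
```

For example, check_databricks_jars "<liquibase_install_dir>/lib" prints nothing and returns 0 when both files are in place.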

Maven users (additional step)

If you use Maven, note that this database does not provide its driver JAR on a public Maven repository, so you must install a local copy and add it as a dependency to your pom.xml file.

<dependency>
    <groupId>com.databricks</groupId>
    <artifactId>databricks-jdbc</artifactId>
    <version>[2.6.36,)</version>
</dependency>
<dependency>
    <groupId>org.liquibase.ext</groupId>
    <artifactId>liquibase-databricks</artifactId>
    <version>[1.1.3,)</version>
</dependency>
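One common way to install a local copy of a JAR is the Maven install:install-file goal. The sketch below only prints the command so you can review it before running it yourself. The function name print_driver_install_cmd is hypothetical, and the JAR path and version you pass in are assumptions that must match your downloaded driver and satisfy the version range in the pom.xml above.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run helper: print (do not run) the Maven command that
# installs the downloaded driver JAR into the local repository under the
# coordinates used in the pom.xml snippet above.
# Arguments: path to DatabricksJDBC42.jar, driver version.
print_driver_install_cmd() {
  local jar_path="$1" version="$2"
  printf 'mvn install:install-file -Dfile=%s -DgroupId=com.databricks -DartifactId=databricks-jdbc -Dversion=%s -Dpackaging=jar\n' \
    "$jar_path" "$version"
}
```

Printing the command first keeps the step reviewable. Paste the output into your terminal once the path and version look right.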

Verify installation

Run the following command to confirm you have successfully installed everything:

liquibase --version

Review the libraries listing in the output for the two newly installed JAR files: DatabricksJDBC42.jar and liquibase-databricks-<version>.jar.

Database connection

Configure connection

  1. Specify the database JDBC URL in the liquibase.properties file (defaults file), along with other properties you want to set a default value for. Liquibase does not parse the URL.

    liquibase.command.url: jdbc:databricks://<your_workspace_host_name>:443/default;transportMode=http;ssl=1;AuthMech=3;httpPath=/sql/1.0/warehouses/<your_warehouse_id>;ConnCatalog=<your_catalog>;ConnSchema=<your_schema>;
    

    Note

    Your base JDBC connection string can be found on the SQL Warehouses -> your_warehouse -> Connection details tab.

    Note

    Additional information on specifying the Databricks JDBC connection can be found in the Databricks JDBC Driver documentation.

  2. Specify your username and password in the liquibase.properties file (defaults file)

    1. The username is the literal string “token”, whichever User or Service Principal you want Liquibase to authenticate as.
    # Enter the username for your Target database.
    liquibase.command.username: token
    
    2. The password is the access token of that User or Service Principal. It is usually passed in dynamically, for example from a CI secret store such as GitHub Actions secrets.
    # Enter the password for your Target database.
    liquibase.command.password: <your_token_here>
    

    Tip

    To find or set up your Databricks user token:

    1. Log into your Databricks workspace.
    2. Access the User Settings. Click on your profile at the bottom left corner of the workspace, then select "User Settings" from the menu.
    3. Navigate to the Access Tokens tab. In the User Settings window, you will find a tab for "Access Tokens."
    4. Generate a New Token. If you haven't already created a token, you can generate a new one by clicking on the "Generate New Token" button. You'll be asked to provide a description for the token and, optionally, set an expiration time for it.
    5. Copy the Token. Once the token is generated, make sure to copy and save it securely. This token will not be shown again, and you'll need it to establish connections to your Databricks SQL Warehouse.
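As an alternative to hard-coding the token in liquibase.properties, recent Liquibase versions also read connection settings from LIQUIBASE_COMMAND_* environment variables. The following is a minimal sketch under stated assumptions: both function names are hypothetical, DATABRICKS_TOKEN is an assumed name for wherever your CI system exposes the secret, and the URL template is the one shown in step 1.

```shell
#!/usr/bin/env bash
# Hypothetical helper: assemble the JDBC URL from step 1 from its parts.
# Arguments: workspace host name, warehouse ID, catalog, schema.
build_databricks_url() {
  local host="$1" warehouse_id="$2" catalog="$3" schema="$4"
  printf 'jdbc:databricks://%s:443/default;transportMode=http;ssl=1;AuthMech=3;httpPath=/sql/1.0/warehouses/%s;ConnCatalog=%s;ConnSchema=%s;' \
    "$host" "$warehouse_id" "$catalog" "$schema"
}

# Hypothetical helper: export the connection settings as environment
# variables. Takes the same four arguments as build_databricks_url and
# expects the token in DATABRICKS_TOKEN (an assumed variable name).
export_liquibase_connection() {
  if [ -z "${DATABRICKS_TOKEN:-}" ]; then
    echo "DATABRICKS_TOKEN is not set" >&2
    return 1
  fi
  LIQUIBASE_COMMAND_URL="$(build_databricks_url "$@")"
  LIQUIBASE_COMMAND_USERNAME="token"   # literal string, as in step 2
  LIQUIBASE_COMMAND_PASSWORD="$DATABRICKS_TOKEN"
  export LIQUIBASE_COMMAND_URL LIQUIBASE_COMMAND_USERNAME LIQUIBASE_COMMAND_PASSWORD
}
```

Calling export_liquibase_connection <your_workspace_host_name> <your_warehouse_id> <your_catalog> <your_schema> in the same shell before running liquibase keeps the token out of files that might be committed.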

Test connection

  1. Create a text file called changelog (.xml, .sql, .json, or .yaml) in your project directory and add a changeset.

    If you already created a changelog using the init project command, you can use that instead of creating a new file. When adding onto an existing changelog, be sure to only add the changeset and to not duplicate the changelog header.

    --liquibase formatted sql
    
    --changeset my_name:1
    CREATE TABLE test_table 
    (
      test_id INT, 
      test_column INT, 
      PRIMARY KEY (test_id)
    )
    

    <?xml version="1.0" encoding="UTF-8"?>
    <databaseChangeLog
      xmlns="http://www.liquibase.org/xml/ns/dbchangelog"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xmlns:ext="http://www.liquibase.org/xml/ns/dbchangelog-ext"
      xmlns:pro="http://www.liquibase.org/xml/ns/pro"
      xsi:schemaLocation="http://www.liquibase.org/xml/ns/dbchangelog
        http://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-latest.xsd
        http://www.liquibase.org/xml/ns/dbchangelog-ext http://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-ext.xsd
        http://www.liquibase.org/xml/ns/pro http://www.liquibase.org/xml/ns/pro/liquibase-pro-latest.xsd">
    
      <changeSet id="1" author="my_name">
        <createTable tableName="test_table">
          <column name="test_id" type="int">
            <constraints primaryKey="true"/>
          </column>
          <column name="test_column" type="INT"/>
        </createTable>
      </changeSet>
    
    </databaseChangeLog>
    

    databaseChangeLog:
      - changeSet:
          id: 1
          author: my_name
          changes:
            - createTable:
                tableName: test_table
                columns:
                  - column:
                      name: test_column
                      type: INT
                      constraints:
                        primaryKey: true
                        nullable: false
    

    {
      "databaseChangeLog": [
        {
          "changeSet": {
            "id": "1",
            "author": "my_name",
            "changes": [
              {
                "createTable": {
                  "tableName": "test_table",
                  "columns": [
                    {
                      "column": {
                        "name": "test_column",
                        "type": "INT",
                        "constraints": {
                          "primaryKey": true,
                          "nullable": false
                        }
                      }
                    }
                  ]
                }
              }
            ]
          }
        }
      ]
    }
    

  2. Navigate to your project folder in the CLI and run the Liquibase status command to see whether the connection is successful:

    liquibase status --changelog-file=<changelog.xml>
    

    If your connection is successful, you'll see a message like this:

    1 changeset has not been applied to <your_jdbc_url>
    Liquibase command 'status' was executed successfully.
    

    Tip

    If you see this error message:

    Connection could not be created to jdbc:databricks://...; with driver 
    com.databricks.client.jdbc.Driver.  
    
    [Databricks][JDBCDriver](500593) Communication link failure. Failed to connect to server. 
    Reason: javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: 
    PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: 
    unable to find valid certification path to requested target.
    

    Likely cause: this issue was seen with Java 1.8. The required SSL certificate is not available in that version of Java.

    Suggested resolution: upgrade Java to a more recent version.

  3. Inspect the SQL with the update-sql command. Then, make changes to your database with the update command.

    liquibase update-sql --changelog-file=<changelog.xml>
    liquibase update --changelog-file=<changelog.xml>
    

    If your update is successful, Liquibase runs each changeset and displays a summary message ending with:

    Liquibase: Update has been successful.
    Liquibase command 'update' was executed successfully.
    
  4. From a database UI tool, ensure that your database contains the test_table you added along with the DATABASECHANGELOG table and DATABASECHANGELOGLOCK table.
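The status, update-sql, and update steps above can be chained in a small script that stops at the first failing command. This is a minimal sketch that assumes liquibase is on your PATH and the connection is already configured. The function name run_liquibase_checks is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run the three commands from the steps above in order
# and stop as soon as one of them fails.
# Argument: path to the changelog file.
run_liquibase_checks() {
  local changelog="$1"

  # 1. Confirm the connection works and list pending changesets.
  liquibase status --changelog-file="$changelog" || return 1

  # 2. Preview the SQL that would be executed, without changing the database.
  liquibase update-sql --changelog-file="$changelog" || return 1

  # 3. Apply the changesets.
  liquibase update --changelog-file="$changelog"
}
```

For example, run_liquibase_checks changelog.xml only reaches the update step if status and update-sql both succeed.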

Now you're ready to start making deployments with Liquibase!

Troubleshooting

If you use v1.1.3 of the Liquibase Databricks extension, you may receive this error when running Liquibase:

Unexpected error running Liquibase: 
Error executing SQL SELECT MD5SUM FROM main.default.DATABASECHANGELOG WHERE MD5SUM IS NOT NULL: [Databricks][JDBCDriver](500540) Error caught in BackgroundFetcher. Foreground thread ID: 1. Background thread ID: 20. 
Error caught: Could not initialize class com.databricks.client.jdbc42.internal.apache.arrow.memory.util.MemoryUtil.

To resolve this, append ;UserAgentEntry=Liquibase;EnableArrow=0; to your JDBC URL. For example:

jdbc:databricks://<host>:<port>/<schema>;transportMode=http;ssl=1;AuthMech=3;httpPath=/sql/1.0/warehouses/<warehouse>;ConnCatalog=<catalog>;UserAgentEntry=Liquibase;EnableArrow=0;
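If the URL is assembled in a script, the workaround can be appended programmatically. The following sketch is illustrative only: the function name append_arrow_workaround is hypothetical, and it assumes the URL uses the semicolon-separated property style shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append the UserAgentEntry/EnableArrow workaround to a
# Databricks JDBC URL. Leaves the URL unchanged if EnableArrow=0 is already
# present, and adds a separating semicolon only when one is missing.
append_arrow_workaround() {
  local url="$1"
  case "$url" in
    *EnableArrow=0*) printf '%s' "$url" ;;
    *";") printf '%sUserAgentEntry=Liquibase;EnableArrow=0;' "$url" ;;
    *) printf '%s;UserAgentEntry=Liquibase;EnableArrow=0;' "$url" ;;
  esac
}
```

Because the function returns an already patched URL unchanged, it is safe to call more than once on the same value.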