Streamlining BI with dbt Cloud and Lightdash - A Developer's Guide

May 10, 2024

In the world of data analytics, efficiency is key. As our thirst for insights grows, so does the need for streamlined processes that get us from raw data to impactful visualizations without unnecessary delays. That's where dbt Cloud and Lightdash step in, offering an integrated development workflow that turns raw data into dashboards rapidly. In this post, we'll explore a development flow that maximizes your productivity without the need to install dbt Core locally.

A BI DevOps flow for dbt Cloud + Lightdash, supporting those less interested in setting up local CLI tools.

Developing with dbt Cloud: Start Coding Without Hassles

Dbt Cloud's IDE is the starting point of your development journey. Here’s what you do:

1. Branching Out: Create a new feature branch right from the IDE. This is your sandbox, where all the magic happens.

2. Modeling Time: Get down to business with your SQL models and `model.yml` configurations. For good measure, run and test all new or changed models before committing changes to the remote branch.

3. Commit to Change: Once you’re confident with your work, commit those changes, push them upstream, and let the IDE guide you to the next step.

GitHub Actions: The Bridge to Lightdash

As the GitHub Action kicks in, a Lightdash preview project is born. Here's the cool part – this project is named after your branch, making it easy to track. This seamless connection allows you to jump into Lightdash and start testing your visual prototypes in no time, ensuring everything aligns with your expectations before it hits production.

From Preview to Production: The Final Stretch

Once you've explored your charts and are happy with the results, you're almost home. It's time to get your PR reviewed and deployed.

Adopt this workflow into your project

Here's the code and a breakdown of how it works. You'll need to start from the Lightdash docs on how to setup GitHub Actions for preview projects. Those details live here. Make note of all the variables/secrets needed for setting up GitHub, then see below as you will want to modify how you save the profiles.yml code to your GitHub secret.

Once you have the basics covered from the Lightdash docs, here are the modifications to make so this is more dbt Cloud IDE friendly.

1. Modify the profiles.yml to follow this example structure, replacing config parameters with your project details as needed:

dbt_project:
  target: prod
  outputs:
	  # this is the production target
    prod:
      type: bigquery
      method: oauth
      keyfile: keyfile.json 
      project: gcp_project
      dataset: prod # Or whatever your production dbt schema is
      job_execution_timeout_seconds: 1500
      job_retries: 1
      location: US
      priority: interactive
      threads: 4
      
      # this is the pull request target
    pr:
      type: bigquery
      method: oauth
      keyfile: keyfile.json 
      project: gcp_project
      dataset: "{{ env_var('DBT_SCHEMA') }}" # Will use this in the GH Action yml
      job_execution_timeout_seconds: 1500
      job_retries: 1
      location: US
      priority: interactive
      threads: 4

As you can see, there is now one pr entry in the outputs which uses an environment variable for the dataset. The exact name of the dataset will be provided by the DBT_SCHEMA environment variable defined in the GitHub Actions workflow.

You would store this modified profiles.yml as a secret in your GitHub repository's settings. Navigate to your repository on GitHub, click on Settings > Secrets > New repository secret. Add the entire content of the modified profiles.yml as the secret value, and name the secret something like DBT_PROFILES. All other secrets should remain as directed from the Lightdash doc.

2. Next, see below for the full GitHub Actions code. Place this lightdash_pr_start_preview.yml file under a .github/workflows path in your repo root directory.

# This should work for either open source or Lightdash Cloud
name: lightdash_cloud_pr_start_preview

on:
  pull_request:
    branches: [ "main", "master" ]
    types: [opened, reopened, synchronize]

env:
  DBT_VERSION: "1.7.0" # or whatever version you're on
  JOB_ID_NUM: 12345 # Get this from the url of your dbt Cloud job

jobs:
  preview:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-node@v3.4.1
      - uses: actions/setup-python@v1
        with:
          python-version: "3.9.x"

      - name: Copy Google credentials file
        env: 
          GOOGLE_CREDENTIALS: ${{ secrets.GCP_SA_KEY }} # or whatever your json key secret is named for GCP auth
        if: "${{ env.GOOGLE_CREDENTIALS != '' }}"
        id: create-json
        uses: jsdaniell/create-json@1.1.2
        with:
          name: "googlecredentials.json"
          json: ${{ env.GOOGLE_CREDENTIALS }}

      - name: Move credentials to /tmp 
        run: mv googlecredentials.json /tmp || true

      - name: Locate dbt_project.yml
        run: echo "PROJECT_DIR=$(find . -name "dbt_project.yml"  | sed  's/dbt_project.yml//g')" >> $GITHUB_ENV

      - name: Get lightdash version 
        uses: sergeysova/jq-action@v2
        id: version 
        env:
          LIGHTDASH_URL: ${{ secrets.LIGHTDASH_URL_CLOUD }}   
        with: 
          cmd: curl -s "${LIGHTDASH_URL}/api/v1/health" | jq -r '.results.version'

      - name: Set DBT_SCHEMA based on PR
        env:
          PR_NUMBER: ${{ github.event.pull_request.number }}
        run: echo "DBT_SCHEMA=dbt_cloud_pr_${JOB_ID_NUM}_${PR_NUMBER}" >> $GITHUB_ENV
  
      - name: Copy profiles.yml
        env:
          PROFILES_YML_CONTENT: ${{ secrets.DBT_PROFILES }} # This should match the name of your secret
          SCHEMA: ${{ env.DBT_SCHEMA }}
        run: |
          echo "$PROFILES_YML_CONTENT" | sed "s/{{ env_var('DBT_SCHEMA') }}/${SCHEMA}/g" > profiles.yml
      
      - name: Install dbt
        run: |
          pip install dbt-bigquery==$DBT_VERSION #Change if using a different data warehouse
          dbt deps --project-dir "$PROJECT_DIR" --profiles-dir . --target pr
      - name: Install lightdash CLI
        run: npm install -g "@lightdash/cli@${{ steps.version.outputs.value }}" || npm install -g @lightdash/cli@latest

      - name: Lightdash CLI start preview
        id: start-preview
        env:
          LIGHTDASH_API_KEY: ${{ secrets.LIGHTDASH_API_KEY_CLOUD }}          
          LIGHTDASH_PROJECT: ${{ secrets.LIGHTDASH_PROJECT_CLOUD }}          
          LIGHTDASH_URL: ${{ secrets.LIGHTDASH_URL_CLOUD }}    
          GOOGLE_APPLICATION_CREDENTIALS: '/tmp/googlecredentials.json'
          DBT_TARGET: 'pr'
        run: |
          lightdash start-preview --project-dir "$PROJECT_DIR" --profiles-dir . --name ${{ github.head_ref }} --target $DBT_TARGET
            
      - uses: jwalton/gh-find-current-pr@v1
        id: finder
        
      - name: Leave a comment after deployment
        uses: marocchino/sticky-pull-request-comment@v2
        with:
          number: ${{ steps.finder.outputs.pr }}
          message: |
            :rocket: Deployed ${{ github.sha }} to ${{ steps.start-preview.outputs.url }}

This GitHub Action, named lightdash_cloud_pr_start_preview, is designed to automate the process of launching a preview environment for a Lightdash project upon a pull request (PR) event against the main or master branches. It is configured to trigger when a pull request is opened, reopened, or synchronized.

The environment variables specify the DBT version and a job identification number. The job named "preview" runs on the latest version of Ubuntu and comprises steps to set up the environment, including checking out the code, setting up Python and Node.js, copying Google credentials, finding the dbt_project.yml file, and dynamically setting the DBT schema based on the PR number.

It continues with installing the correct versions of dbt for BigQuery and the Lightdash CLI, and then uses the Lightdash CLI to start a preview environment. Subsequently, it finds the current PR and leaves a comment with the deployment URL, providing direct access to the deployed preview version relevant to the git SHA of the commit that triggered the action.

3. Lastly, you need a preview project close step or you'll have to manually delete an excessive number of projects. This script should stay the same as the Lightdash docs example, but for clarity we will include the latest version here:

name: close-preview

on:
  pull_request:
    types: [closed]

jobs:
  preview:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-node@v3.4.1

      - name: Get lightdash version 
        uses: sergeysova/jq-action@v2
        id: version 
        env:
          LIGHTDASH_URL: ${{ secrets.LIGHTDASH_URL_CLOUD }}   
        with: 
          cmd: curl -s "${LIGHTDASH_URL}/api/v1/health" | jq -r '.results.version'

      - name: Install lightdash CLI
        run: npm install -g "@lightdash/cli@${{ steps.version.outputs.value }}" || npm install -g @lightdash/cli@latest

      - name: Lightdash CLI stop preview 
        env:
          LIGHTDASH_API_KEY: ${{ secrets.LIGHTDASH_API_KEY_CLOUD }}          
          LIGHTDASH_PROJECT: ${{ secrets.LIGHTDASH_PROJECT_CLOUD }}          
          LIGHTDASH_URL: ${{ secrets.LIGHTDASH_URL_CLOUD }}          

        run:  lightdash stop-preview --name ${{ github.head_ref }}
        

And that's it! Implementing the above, along with the GitHub Actions for Lightdash Validate and Automatically Deploy Your Changes Using GitHub Action will have you setup for a great dbt + Lightdash development workflow! If you're interested in how we might help you with a similar solution, feel free to schedule an intro call here.