Streamlining BI with dbt Cloud and Lightdash - A Developer's Guide
May 10, 2024
In the world of data analytics, efficiency is key. As our thirst for insights grows, so does the need for streamlined processes that get us from raw data to impactful visualizations without unnecessary delays. That's where dbt Cloud and Lightdash step in, offering an integrated development workflow that turns raw data into dashboards rapidly. In this post, we'll explore a development flow that maximizes your productivity without the need to install dbt Core locally.
Developing with dbt Cloud: Start Coding Without Hassles
Dbt Cloud's IDE is the starting point of your development journey. Here’s what you do:
1. Branching Out: Create a new feature branch right from the IDE. This is your sandbox, where all the magic happens.
2. Modeling Time: Get down to business with your SQL models and `model.yml` configurations. For good measure, run and test all new or changed models before committing changes to the remote branch.
3. Commit to Change: Once you’re confident with your work, commit those changes, push them upstream, and let the IDE guide you to the next step.
GitHub Actions: The Bridge to Lightdash
As the GitHub Action kicks in, a Lightdash preview project is born. Here's the cool part – this project is named after your branch, making it easy to track. This seamless connection allows you to jump into Lightdash and start testing your visual prototypes in no time, ensuring everything aligns with your expectations before it hits production.
From Preview to Production: The Final Stretch
Once you've explored your charts and are happy with the results, you're almost home. It's time to get your PR reviewed and deployed.
Adopt this workflow into your project
Here's the code and a breakdown of how it works. You'll need to start from the Lightdash docs on how to setup GitHub Actions for preview projects. Those details live here. Make note of all the variables/secrets needed for setting up GitHub, then see below as you will want to modify how you save the profiles.yml
code to your GitHub secret.
Once you have the basics covered from the Lightdash docs, here are the modifications to make so this is more dbt Cloud IDE friendly.
1. Modify the profiles.yml
to follow this example structure, replacing config parameters with your project details as needed:
dbt_project:
target: prod
outputs:
# this is the production target
prod:
type: bigquery
method: oauth
keyfile: keyfile.json
project: gcp_project
dataset: prod # Or whatever your production dbt schema is
job_execution_timeout_seconds: 1500
job_retries: 1
location: US
priority: interactive
threads: 4
# this is the pull request target
pr:
type: bigquery
method: oauth
keyfile: keyfile.json
project: gcp_project
dataset: "{{ env_var('DBT_SCHEMA') }}" # Will use this in the GH Action yml
job_execution_timeout_seconds: 1500
job_retries: 1
location: US
priority: interactive
threads: 4
As you can see, there is now one pr
entry in the outputs
which uses an environment variable for the dataset
. The exact name of the dataset
will be provided by the DBT_SCHEMA
environment variable defined in the GitHub Actions workflow.
You would store this modified profiles.yml
as a secret in your GitHub repository's settings. Navigate to your repository on GitHub, click on Settings > Secrets > New repository secret. Add the entire content of the modified profiles.yml
as the secret value, and name the secret something like DBT_PROFILES
. All other secrets should remain as directed from the Lightdash doc.
2. Next, see below for the full GitHub Actions code. Place this lightdash_pr_start_preview.yml
file under a .github/workflows
path in your repo root directory.
# This should work for either open source or Lightdash Cloud
name: lightdash_cloud_pr_start_preview
on:
pull_request:
branches: [ "main", "master" ]
types: [opened, reopened, synchronize]
env:
DBT_VERSION: "1.7.0" # or whatever version you're on
JOB_ID_NUM: 12345 # Get this from the url of your dbt Cloud job
jobs:
preview:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- uses: actions/setup-node@v3.4.1
- uses: actions/setup-python@v1
with:
python-version: "3.9.x"
- name: Copy Google credentials file
env:
GOOGLE_CREDENTIALS: ${{ secrets.GCP_SA_KEY }} # or whatever your json key secret is named for GCP auth
if: "${{ env.GOOGLE_CREDENTIALS != '' }}"
id: create-json
uses: jsdaniell/create-json@1.1.2
with:
name: "googlecredentials.json"
json: ${{ env.GOOGLE_CREDENTIALS }}
- name: Move credentials to /tmp
run: mv googlecredentials.json /tmp || true
- name: Locate dbt_project.yml
run: echo "PROJECT_DIR=$(find . -name "dbt_project.yml" | sed 's/dbt_project.yml//g')" >> $GITHUB_ENV
- name: Get lightdash version
uses: sergeysova/jq-action@v2
id: version
env:
LIGHTDASH_URL: ${{ secrets.LIGHTDASH_URL_CLOUD }}
with:
cmd: curl -s "${LIGHTDASH_URL}/api/v1/health" | jq -r '.results.version'
- name: Set DBT_SCHEMA based on PR
env:
PR_NUMBER: ${{ github.event.pull_request.number }}
run: echo "DBT_SCHEMA=dbt_cloud_pr_${JOB_ID_NUM}_${PR_NUMBER}" >> $GITHUB_ENV
- name: Copy profiles.yml
env:
PROFILES_YML_CONTENT: ${{ secrets.DBT_PROFILES }} # This should match the name of your secret
SCHEMA: ${{ env.DBT_SCHEMA }}
run: |
echo "$PROFILES_YML_CONTENT" | sed "s/{{ env_var('DBT_SCHEMA') }}/${SCHEMA}/g" > profiles.yml
- name: Install dbt
run: |
pip install dbt-bigquery==$DBT_VERSION #Change if using a different data warehouse
dbt deps --project-dir "$PROJECT_DIR" --profiles-dir . --target pr
- name: Install lightdash CLI
run: npm install -g "@lightdash/cli@${{ steps.version.outputs.value }}" || npm install -g @lightdash/cli@latest
- name: Lightdash CLI start preview
id: start-preview
env:
LIGHTDASH_API_KEY: ${{ secrets.LIGHTDASH_API_KEY_CLOUD }}
LIGHTDASH_PROJECT: ${{ secrets.LIGHTDASH_PROJECT_CLOUD }}
LIGHTDASH_URL: ${{ secrets.LIGHTDASH_URL_CLOUD }}
GOOGLE_APPLICATION_CREDENTIALS: '/tmp/googlecredentials.json'
DBT_TARGET: 'pr'
run: |
lightdash start-preview --project-dir "$PROJECT_DIR" --profiles-dir . --name ${{ github.head_ref }} --target $DBT_TARGET
- uses: jwalton/gh-find-current-pr@v1
id: finder
- name: Leave a comment after deployment
uses: marocchino/sticky-pull-request-comment@v2
with:
number: ${{ steps.finder.outputs.pr }}
message: |
:rocket: Deployed ${{ github.sha }} to ${{ steps.start-preview.outputs.url }}
This GitHub Action, named lightdash_cloud_pr_start_preview
, is designed to automate the process of launching a preview environment for a Lightdash project upon a pull request (PR) event against the main
or master
branches. It is configured to trigger when a pull request is opened, reopened, or synchronized.
The environment variables specify the DBT version and a job identification number. The job named "preview" runs on the latest version of Ubuntu and comprises steps to set up the environment, including checking out the code, setting up Python and Node.js, copying Google credentials, finding the dbt_project.yml
file, and dynamically setting the DBT schema based on the PR number.
It continues with installing the correct versions of dbt for BigQuery and the Lightdash CLI, and then uses the Lightdash CLI to start a preview environment. Subsequently, it finds the current PR and leaves a comment with the deployment URL, providing direct access to the deployed preview version relevant to the git SHA of the commit that triggered the action.
3. Lastly, you need a preview project close step or you'll have to manually delete an excessive number of projects. This script should stay the same as the Lightdash docs example, but for clarity we will include the latest version here:
name: close-preview
on:
pull_request:
types: [closed]
jobs:
preview:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- uses: actions/setup-node@v3.4.1
- name: Get lightdash version
uses: sergeysova/jq-action@v2
id: version
env:
LIGHTDASH_URL: ${{ secrets.LIGHTDASH_URL_CLOUD }}
with:
cmd: curl -s "${LIGHTDASH_URL}/api/v1/health" | jq -r '.results.version'
- name: Install lightdash CLI
run: npm install -g "@lightdash/cli@${{ steps.version.outputs.value }}" || npm install -g @lightdash/cli@latest
- name: Lightdash CLI stop preview
env:
LIGHTDASH_API_KEY: ${{ secrets.LIGHTDASH_API_KEY_CLOUD }}
LIGHTDASH_PROJECT: ${{ secrets.LIGHTDASH_PROJECT_CLOUD }}
LIGHTDASH_URL: ${{ secrets.LIGHTDASH_URL_CLOUD }}
run: lightdash stop-preview --name ${{ github.head_ref }}
And that's it! Implementing the above, along with the GitHub Actions for Lightdash Validate and Automatically Deploy Your Changes Using GitHub Action will have you setup for a great dbt + Lightdash development workflow! If you're interested in how we might help you with a similar solution, feel free to schedule an intro call here.