
I currently have a YAML file set up. At the moment, it allows one Databricks notebook to run upon a successful push to a GitHub branch.

However, I haven’t been able to configure it to handle multiple Databricks notebooks during a single merge.

Here is my YAML file:

name: run-sql-notebooks
 
on:
  push:
    branches:
      - dev
 
env:
  DATABRICKS_HOST: https://mydatabricks.cloud.databricks.com
  DATABRICKS_CLUSTER_ID: 1255-122012-pwryxyz1
  DEV: dev
 
jobs:
  run_sql_notebooks:
    name: Run updated SQL notebooks
    runs-on: ubuntu-latest
 
    steps:
      - name: Checkout repo
        uses: actions/checkout@v3
        with:
          fetch-depth: 0
 
      - name: Find modified SQL notebooks in last commit
        id: changed_sqls
        run: |
          NOTEBOOKS=$(git diff --name-only HEAD^ HEAD | grep '\.sql$' || true)
          echo "notebooks<<EOF" >> $GITHUB_OUTPUT
          echo "$NOTEBOOKS" >> $GITHUB_OUTPUT
          echo "EOF" >> $GITHUB_OUTPUT
 
      - name: Run modified SQL notebooks on Databricks
        if: steps.changed_sqls.outputs.notebooks != ''
        uses: databricks/run-notebook@v0
        with:
          run-name: "GitHub Actions - ${{ github.run_number }}"
          local-notebook-path: ${{ steps.changed_sqls.outputs.notebooks }}
          git-commit: ${{ github.sha }}
          existing-cluster-id: ${{ env.DATABRICKS_CLUSTER_ID }}
          notebook-params-json: |
            {
            "env": "${{ env.DEV }}"
            }
        env:
          DATABRICKS_HOST: ${{ env.DATABRICKS_HOST }}
          DATABRICKS_TOKEN: ${{ secrets.DB_SECRET }}

  • Could you please describe in more detail what you would like to achieve? Commented May 31 at 21:35

1 Answer


At the outset, this seems like a question you should be asking the maintainers of the run-notebook action you are using. You can try requesting this as a feature on their GitHub repo.

In the meanwhile, there are several options you can try:

  1. You can try using a dynamic matrix in GitHub Actions (see the workflow sketch after this list). This SO answer seems to be a good starting point.

  2. Since Databricks notebooks can run other notebooks, you can create an orchestration notebook that takes the list of notebooks as input. You then call that single notebook from your GitHub workflow and let it handle running the rest of the notebooks.
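
For option 1, here is a minimal sketch of what a dynamic matrix could look like, reusing your changed-files step. The detect_changes job name, the jq conversion of the file list to a JSON array, and the fromJSON fan-out are my own assumptions layered on top of your workflow, not something the run-notebook action requires; the workflow-level env block stays as you already have it:

  detect_changes:
    runs-on: ubuntu-latest
    outputs:
      notebooks: ${{ steps.changed_sqls.outputs.notebooks }}
    steps:
      - uses: actions/checkout@v3
        with:
          fetch-depth: 0
      - id: changed_sqls
        run: |
          # Emit the changed .sql paths as a JSON array, e.g. ["a.sql","b.sql"]
          NOTEBOOKS=$(git diff --name-only HEAD^ HEAD | { grep '\.sql$' || true; } | jq -R -s -c 'split("\n") | map(select(length > 0))')
          echo "notebooks=$NOTEBOOKS" >> $GITHUB_OUTPUT

  run_sql_notebooks:
    needs: detect_changes
    if: needs.detect_changes.outputs.notebooks != '[]'
    runs-on: ubuntu-latest
    strategy:
      matrix:
        notebook: ${{ fromJSON(needs.detect_changes.outputs.notebooks) }}
    steps:
      - uses: actions/checkout@v3
      - uses: databricks/run-notebook@v0
        with:
          local-notebook-path: ${{ matrix.notebook }}
          existing-cluster-id: ${{ env.DATABRICKS_CLUSTER_ID }}
          notebook-params-json: |
            {
              "env": "${{ env.DEV }}"
            }
        env:
          DATABRICKS_HOST: ${{ env.DATABRICKS_HOST }}
          DATABRICKS_TOKEN: ${{ secrets.DB_SECRET }}

This fans each changed notebook out into its own matrix job, so they run as separate Databricks runs in parallel.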

For option 2, here is what a sample orchestration notebook might look like:

dbutils.widgets.text("notebookPaths", "")
# ...
paths = dbutils.widgets.get("notebookPaths")
for path in paths.split(" "):  # assuming the paths are separated by a space
    if path:  # skip empty entries
        dbutils.notebook.run(path, 600)  # second argument is the timeout in seconds; adjust as needed

You will also need to pass any required arguments to the orchestrating notebook and ensure that it passes the right arguments on to the child notebooks.
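
For example, the workflow step that calls the orchestrator might look something like the sketch below. The notebooks/orchestrator.py path and the notebookPaths parameter name are assumptions; you would also want your changed-files step to join the paths with spaces (e.g. by piping through tr '\n' ' ') so the value stays on one line inside the JSON and matches the split(" ") above:

      - name: Run orchestration notebook on Databricks
        if: steps.changed_sqls.outputs.notebooks != ''
        uses: databricks/run-notebook@v0
        with:
          run-name: "GitHub Actions - ${{ github.run_number }}"
          local-notebook-path: notebooks/orchestrator.py   # hypothetical path to the orchestration notebook
          existing-cluster-id: ${{ env.DATABRICKS_CLUSTER_ID }}
          notebook-params-json: |
            {
              "env": "${{ env.DEV }}",
              "notebookPaths": "${{ steps.changed_sqls.outputs.notebooks }}"
            }
        env:
          DATABRICKS_HOST: ${{ env.DATABRICKS_HOST }}
          DATABRICKS_TOKEN: ${{ secrets.DB_SECRET }}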

If you don't want to run the notebooks sequentially, you can use Python's multiprocessing module (or a thread pool) to run multiple notebooks in parallel.
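
For instance, here is a minimal sketch inside the orchestration notebook using a thread pool, a common variation on the multiprocessing idea, since dbutils.notebook.run simply blocks while the child job runs (the 600-second timeout and max_workers=4 are placeholder values):

from concurrent.futures import ThreadPoolExecutor

paths = [p for p in dbutils.widgets.get("notebookPaths").split(" ") if p]

def run_notebook(path):
    # dbutils.notebook.run blocks until the child notebook finishes,
    # so running each call on its own thread gives you parallelism
    return dbutils.notebook.run(path, 600)

with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(run_notebook, paths))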
