
I currently have a YAML file set up. At the moment, it allows one Databricks notebook to run upon a successful push to a GitHub branch.

However, I haven’t been able to configure it to handle multiple Databricks notebooks during a single merge.

Here is my YAML file:

name: run-sql-notebooks
 
on:
  push:
    branches:
      - dev
 
env:
  DATABRICKS_HOST: https://mydatabricks.cloud.databricks.com
  DATABRICKS_CLUSTER_ID: 1255-122012-pwryxyz1
  DEV: dev
 
jobs:
  run_sql_notebooks:
    name: Run updated SQL notebooks
    runs-on: ubuntu-latest
 
    steps:
      - name: Checkout repo
        uses: actions/checkout@v3
        with:
          fetch-depth: 0
 
      - name: Find modified SQL notebooks in last commit
        id: changed_sqls
        run: |
          NOTEBOOKS=$(git diff --name-only HEAD^ HEAD | grep '\.sql$' || true)
          echo "notebooks<<EOF" >> $GITHUB_OUTPUT
          echo "$NOTEBOOKS" >> $GITHUB_OUTPUT
          echo "EOF" >> $GITHUB_OUTPUT
 
      - name: Run modified SQL notebooks on Databricks
        if: steps.changed_sqls.outputs.notebooks != ''
        uses: databricks/run-notebook@v0
        with:
          run-name: "GitHub Actions - ${{ github.run_number }}"
          local-notebook-path: ${{ steps.changed_sqls.outputs.notebooks }}
          git-commit: ${{ github.sha }}
          existing-cluster-id: ${{ env.DATABRICKS_CLUSTER_ID }}
          notebook-params-json: |
            {
            "env": "${{ env.DEV }}"
            }
        env:
          DATABRICKS_HOST: ${{ env.DATABRICKS_HOST }}
          DATABRICKS_TOKEN: ${{ secrets.DB_SECRET }}

  • Could you please describe in more detail what you would like to achieve? Commented May 31 at 21:35

1 Answer


At the outset, this seems like a question you should be asking the maintainers of the run-notebook action you are using. You can try requesting this as a feature on their GitHub repo.

In the meanwhile, there are several options you can try:

  1. You can try using a dynamic matrix in GitHub Actions (see the workflow sketch after this list). This SO answer seems to be a good starting point.

  2. Since Databricks notebooks can run other notebooks, you can create an orchestration notebook that takes the list of notebooks as input. You then call that single notebook from your GitHub workflow and let it handle running the rest of the notebooks.
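
For option 1, here is a minimal sketch of what a dynamic matrix could look like, reusing your changed-files step. The detect_changes job name, the jq conversion of the file list to a JSON array, and the fromJSON fan-out are my own assumptions layered on top of your workflow, not something the run-notebook action requires; the workflow-level env block stays as you already have it:

  detect_changes:
    runs-on: ubuntu-latest
    outputs:
      notebooks: ${{ steps.changed_sqls.outputs.notebooks }}
    steps:
      - uses: actions/checkout@v3
        with:
          fetch-depth: 0
      - id: changed_sqls
        run: |
          # Emit the changed .sql paths as a JSON array, e.g. ["a.sql","b.sql"]
          NOTEBOOKS=$(git diff --name-only HEAD^ HEAD | { grep '\.sql$' || true; } | jq -R -s -c 'split("\n") | map(select(length > 0))')
          echo "notebooks=$NOTEBOOKS" >> $GITHUB_OUTPUT

  run_sql_notebooks:
    needs: detect_changes
    if: needs.detect_changes.outputs.notebooks != '[]'
    runs-on: ubuntu-latest
    strategy:
      matrix:
        notebook: ${{ fromJSON(needs.detect_changes.outputs.notebooks) }}
    steps:
      - uses: actions/checkout@v3
      - uses: databricks/run-notebook@v0
        with:
          local-notebook-path: ${{ matrix.notebook }}
          existing-cluster-id: ${{ env.DATABRICKS_CLUSTER_ID }}
          notebook-params-json: |
            {
              "env": "${{ env.DEV }}"
            }
        env:
          DATABRICKS_HOST: ${{ env.DATABRICKS_HOST }}
          DATABRICKS_TOKEN: ${{ secrets.DB_SECRET }}

This fans each changed notebook out into its own matrix job, so they run as separate Databricks runs in parallel.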

For option 2, here is what a sample orchestration notebook might look like:

dbutils.widgets.text("notebookPaths", "")
# ...
paths = dbutils.widgets.get("notebookPaths")
for path in paths.split(" "):  # assuming the paths are separated by a space
    if path:  # skip empty entries
        dbutils.notebook.run(path, 600)  # second argument is the timeout in seconds; adjust as needed

You will also need to pass any required arguments to the orchestrating notebook and ensure that it passes the right arguments on to the child notebooks.
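
For example, the workflow step that calls the orchestrator might look something like the sketch below. The notebooks/orchestrator.py path and the notebookPaths parameter name are assumptions; you would also want your changed-files step to join the paths with spaces (e.g. by piping through tr '\n' ' ') so the value stays on one line inside the JSON and matches the split(" ") above:

      - name: Run orchestration notebook on Databricks
        if: steps.changed_sqls.outputs.notebooks != ''
        uses: databricks/run-notebook@v0
        with:
          run-name: "GitHub Actions - ${{ github.run_number }}"
          local-notebook-path: notebooks/orchestrator.py   # hypothetical path to the orchestration notebook
          existing-cluster-id: ${{ env.DATABRICKS_CLUSTER_ID }}
          notebook-params-json: |
            {
              "env": "${{ env.DEV }}",
              "notebookPaths": "${{ steps.changed_sqls.outputs.notebooks }}"
            }
        env:
          DATABRICKS_HOST: ${{ env.DATABRICKS_HOST }}
          DATABRICKS_TOKEN: ${{ secrets.DB_SECRET }}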

If you don't want to run the notebooks sequentially, you can use Python's multiprocessing module (or a thread pool) to run multiple notebooks in parallel.
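
For instance, here is a minimal sketch inside the orchestration notebook using a thread pool, a common variation on the multiprocessing idea, since dbutils.notebook.run simply blocks while the child job runs (the 600-second timeout and max_workers=4 are placeholder values):

from concurrent.futures import ThreadPoolExecutor

paths = [p for p in dbutils.widgets.get("notebookPaths").split(" ") if p]

def run_notebook(path):
    # dbutils.notebook.run blocks until the child notebook finishes,
    # so running each call on its own thread gives you parallelism
    return dbutils.notebook.run(path, 600)

with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(run_notebook, paths))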
