We upgraded Databricks from 10.3 to 10.4 LTS, but the Python version stayed at 3.8.10.

Question: In Databricks Runtime 10.4 LTS, how can we upgrade Python from 3.8.10 to 3.10?

UPDATE: I would like to use some of the new features in Python 3.10, such as the match/case statement.

  • What is the reason for the upgrade? What functionality do you want to add with it? Commented Jul 5, 2022 at 6:10
  • @AlexOtt Good question (it prompted me to add the UPDATE section to my post above). Commented Jul 5, 2022 at 15:30

2 Answers

You might be able to install Python 3.10.5 in a Docker image that the cluster uses instead of the standard runtime.

https://docs.databricks.com/clusters/custom-containers.html

You can build on the minimal configuration. Here is a minimal example:

FROM databricksruntime/minimal:experimental

# Installs python 3.10 and virtualenv for Spark and Notebooks
RUN apt-get update \
  && apt-get install -y \
    python3.10 \
    virtualenv \
  && apt-get clean \
  && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*

# Initialize the default environment that Spark and notebooks will use
RUN virtualenv -p python3.10 --system-site-packages /databricks/python3

# Specifies where Spark will look for the python process
ENV PYSPARK_PYTHON=/databricks/python3/bin/python3

You will need to install all other Python libraries yourself, so the process is a bit more tedious.
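
Once a cluster has started from the custom image, it is worth confirming from a notebook cell that the interpreter really is 3.10 before relying on match/case. The following is a minimal sketch that uses only the standard library; the helper name meets_minimum is made up for illustration and is not part of any Databricks API.

```python
import sys


def meets_minimum(version_info=None, minimum=(3, 10)):
    """Return True if the (major, minor, ...) version is at least `minimum`.

    Hypothetical helper for illustration; defaults to the running interpreter.
    """
    if version_info is None:
        version_info = sys.version_info
    # Compare only (major, minor) so patch levels do not affect the result.
    return tuple(version_info[:2]) >= tuple(minimum)


# Report the interpreter the notebook is actually running on.
print(sys.executable)
print(sys.version)
print("match/case available:", meets_minimum())
```

If the image was picked up as intended, sys.executable should point into /databricks/python3, the path set through PYSPARK_PYTHON above.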

It might not be possible to upgrade the Python version inside a Databricks cluster. Each cluster has a predefined configuration that consists of specific versions of Spark, Scala, and Python.

We upgraded Databricks from 10.3 to 10.4 LTS. But the python version did not change from python 3.8.10

  • This is because both Databricks 10.3 and 10.4 LTS ship with Python 3.8.10.

One solution would have been to edit the cluster and switch to a Databricks runtime that provides the required Python version. To do this, navigate to Compute -> click on your cluster -> Edit, and choose the required Databricks runtime.

But currently, the highest Python version supported in Azure Databricks is Python 3.9.5, provided by Databricks Runtime 11.1. Refer to the Microsoft documentation to learn more about the features and configurations of the Databricks runtimes.
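
To make the comparison concrete, the runtime-to-Python pairs mentioned in this thread can be put into a small lookup. This is only a sketch: the table holds just the three versions quoted above and is not an official or complete list, and the names RUNTIME_PYTHON and runtimes_with_python are hypothetical.

```python
# Runtime -> bundled Python version, using only the values quoted in this thread.
RUNTIME_PYTHON = {
    "10.3": (3, 8, 10),
    "10.4 LTS": (3, 8, 10),
    "11.1": (3, 9, 5),
}


def runtimes_with_python(minimum):
    """Return the runtimes in the table whose Python version is at least `minimum`."""
    return [name for name, version in RUNTIME_PYTHON.items() if version >= minimum]


print(runtimes_with_python((3, 9)))   # runtimes offering Python 3.9 or newer
print(runtimes_with_python((3, 10)))  # runtimes offering Python 3.10 or newer
```

With these values the second call returns an empty list, which is why switching runtimes alone does not get you to Python 3.10.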
