I'm running a Python script inside a Conda-based Docker container that processes geospatial data. The script runs a two-step GDAL workflow: it uses gdaldem color-relief to create a colorized GeoTIFF, then gdal2tiles.py to generate map tiles from that result.
gdaldem completes successfully every time. However, the script hangs indefinitely as soon as it calls gdal2tiles.py. It produces no error output and, surprisingly, even the timeout argument of subprocess.run does not trigger an exception. The whole process just freezes with these logs:
```
2025-08-02 09:53:22,602 - INFO - Successfully created GeoTIFF: /app/geotiffs/skjav/reflectivity/reflectivity_20250802T094500Z.tif
2025-08-02 09:53:22,636 - INFO - Step 1: Colorizing /app/geotiffs/skjav/reflectivity/reflectivity_20250802T094500Z.tif with gdaldem.
2025-08-02 09:53:22,672 - INFO - Successfully colorized GeoTIFF to /app/static/tiles/skjav/reflectivity/20250802T094500Z/colorized.tif
2025-08-02 09:53:22,672 - INFO - Step 2: Generating tiles from /app/static/tiles/skjav/reflectivity/20250802T094500Z/colorized.tif with gdal2tiles.py.
<-- HANGS HERE -->
```
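A quick baseline check, not part of the original script: the hypothetical helper `timeout_fires` below runs a child that simply sleeps past its deadline. If subprocess.run raises TimeoutExpired here, the timeout mechanism itself works in the container, which would suggest the problem is specific to the gdal2tiles.py call.

```python
import subprocess
import sys


def timeout_fires(seconds: float = 1.0) -> bool:
    """Return True if subprocess.run raises TimeoutExpired for a child that sleeps too long."""
    # The child sleeps far longer than the timeout; using the current
    # interpreter means no external binary is needed for the check.
    cmd = [sys.executable, "-c", "import time; time.sleep(30)"]
    try:
        subprocess.run(cmd, capture_output=True, text=True, timeout=seconds)
    except subprocess.TimeoutExpired:
        return True
    return False


if __name__ == "__main__":
    print("timeout fired:", timeout_fires())
```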
The code snippet in question:
```python
import logging
import subprocess

# Excerpt from inside a function: create_color_map_file, product_config and the
# path variables are defined elsewhere in the script.
try:
    logging.info("coloring with gdaldem.")
    color_map_content = create_color_map_file(product_config['cmap'], product_config['vmin'],
                                              product_config['vmax'])
    with open(color_file_path, 'w') as f:
        f.write(color_map_content)
    # Step 1: colorize with gdaldem color-relief (-alpha adds an alpha channel)
    cmd_colorize = ['gdaldem', 'color-relief', geotiff_path, color_file_path, colorized_tiff_path, '-alpha']
    subprocess.run(cmd_colorize, check=True, capture_output=True, text=True, timeout=60)
    logging.info(f"colored geotiff to {colorized_tiff_path}")
    # Step 2: generate tiles with gdal2tiles.py (this is the call that hangs)
    logging.info(f"generating tiles {colorized_tiff_path} with gdal2tiles.py.")
    cmd_gdal2tiles = [
        'gdal2tiles.py',
        '--profile=raster',
        '--zoom=5-12',
        '--webp-quality=90',
        colorized_tiff_path,
        output_tile_dir
    ]
    subprocess.run(cmd_gdal2tiles, check=True, capture_output=True, text=True, timeout=180)
    logging.info(f"success - {output_tile_dir}")
    return output_tile_dir
# (the matching except clause is not shown in this snippet)
```
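The body of create_color_map_file is not shown above. For reference, a hypothetical sketch of what such a helper might produce, assuming it writes gdaldem color-relief's plain-text color table (one "value R G B A" line per stop, plus an "nv" line for nodata). Here `colors` is an explicit list of RGB stops spread evenly between vmin and vmax, standing in for whatever product_config['cmap'] actually contains.

```python
from typing import Sequence, Tuple

RGB = Tuple[int, int, int]


def create_color_map_file(colors: Sequence[RGB], vmin: float, vmax: float) -> str:
    """Build the text of a gdaldem color-relief color file (hypothetical stand-in)."""
    if len(colors) < 2:
        raise ValueError("need at least two color stops")
    # Nodata pixels become fully transparent when gdaldem is run with -alpha.
    lines = ["nv 0 0 0 0"]
    step = (vmax - vmin) / (len(colors) - 1)
    for i, (r, g, b) in enumerate(colors):
        value = vmin + i * step
        # gdaldem interpolates linearly between consecutive entries.
        lines.append(f"{value:g} {r} {g} {b} 255")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(create_color_map_file([(0, 0, 255), (0, 255, 0), (255, 0, 0)], 0, 60), end="")
```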
What could cause gdal2tiles.py to hang so completely that it ignores the timeout from Python's subprocess module? Is there a known issue with running gdal2tiles.py non-interactively from a Python script inside a Docker container that could lead to this kind of deadlock?
Ruled out environment path issues: I added a diagnostic log (shutil.which('gdal2tiles.py')), which confirmed that the script finds the modern version of gdal2tiles.py inside the conda environment (/opt/conda/envs/radar-env/bin/gdal2tiles.py).
Ruled out multiprocessing: the hang occurs even with the --processes flag removed from the command.
Ruled out output format: the hang persists whether I use --webp-quality=90 or remove it and fall back to the default PNG tiles.
I also tried replacing subprocess.run with the lower-level subprocess.Popen and proc.communicate(timeout=...). This also hung and never raised the TimeoutExpired exception.
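One way to rule pipes out entirely is to not use them. The following is a diagnostic sketch under stated assumptions, not a confirmed fix: it assumes a POSIX container, and `run_with_group_timeout` is a hypothetical helper name. It sends output to a temporary file instead of a pipe, gives the child an empty stdin, starts the child in its own session, and on timeout kills the whole process group with os.killpg, so any processes gdal2tiles.py spawns are killed too.

```python
import os
import signal
import subprocess
import sys
import tempfile
from typing import Sequence


def run_with_group_timeout(cmd: Sequence[str], timeout: float) -> str:
    """Run cmd in a new session and kill its whole process group on timeout.

    Hypothetical helper (POSIX only). Output goes to a temporary file rather
    than a pipe, so no descendant can hold a pipe open and stall the parent.
    """
    args = list(cmd)
    with tempfile.TemporaryFile() as log:
        proc = subprocess.Popen(
            args,
            stdin=subprocess.DEVNULL,   # never block waiting for interactive input
            stdout=log,
            stderr=subprocess.STDOUT,   # merge stderr into the same file
            start_new_session=True,     # child calls setsid(): its pid is the group id
        )
        try:
            returncode = proc.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            try:
                # Kill the child and everything it spawned in the same group.
                os.killpg(proc.pid, signal.SIGKILL)
            except ProcessLookupError:
                pass  # the group already exited
            proc.wait()
            raise
        log.seek(0)
        output = log.read().decode(errors="replace")
    if returncode != 0:
        raise subprocess.CalledProcessError(returncode, args, output=output)
    return output


if __name__ == "__main__":
    # In the original script this would be run_with_group_timeout(cmd_gdal2tiles, 180).
    print(run_with_group_timeout([sys.executable, "-c", "print('ok')"], timeout=10))
```

If this variant completes or times out cleanly where the original call froze, the difference between the two narrows down where the hang comes from.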
`try:` but where is `except:`? Maybe you have `except: pass` and it hides some errors.
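Following up on that comment, a sketch of exception handling that logs failures instead of swallowing them. `run_logged` is a hypothetical wrapper name, not something from the original script. One detail worth knowing: TimeoutExpired.stderr is bytes (or None) even when text=True is passed, so it is logged with %r here.

```python
import logging
import subprocess
import sys
from typing import Optional, Sequence


def run_logged(cmd: Sequence[str], timeout: float) -> Optional[str]:
    """Run cmd and log, rather than hide, any failure details; return stdout or None."""
    args = list(cmd)
    try:
        result = subprocess.run(args, check=True, capture_output=True,
                                text=True, timeout=timeout)
        return result.stdout
    except subprocess.CalledProcessError as exc:
        # Non-zero exit status: stderr usually explains why.
        logging.error("%s failed with code %s: %s", args[0], exc.returncode, exc.stderr)
    except subprocess.TimeoutExpired as exc:
        # Partial output captured before the kill; bytes or None regardless of text=True.
        logging.error("%s timed out after %ss; partial stderr: %r", args[0], timeout, exc.stderr)
    except FileNotFoundError:
        logging.error("%s not found on PATH", args[0])
    return None


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    print(run_logged([sys.executable, "-c", "print('hi')"], timeout=10), end="")
```

Returning None keeps the sketch short; re-raising after logging would let the caller decide whether a failed step should abort the whole workflow.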