When moving from Celery 4.3 to Celery 5.4, we saw unexpected results and rolled back until we could do more research to understand what happened.
Three things occurred:
- Several nodes picked up the same job and each ran it. From my understanding, 'visibility_timeout': 86400 should prevent this. Is there another configuration I should use?
- We had a huge spike of reserved tasks, rather than tasks waiting in the queue. This should not be possible, since we run 5 workers per node with worker_prefetch_multiplier = 1, but the limit was not honored after upgrading to 5.4, and it is unclear why. The count seemed unbounded: it went into the hundreds of reserved tasks. Is there a new way to set this in 5.4?
- The Redis key celery was not set. My interpretation is that there were no "queued tasks" because they had all been moved to reserved tasks. Did this key get migrated to something new in 5.4?
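For the third point, it may help to check more than the bare celery key. The sketch below assumes kombu's Redis transport defaults as I understand them (separator "\x06\x16" and priority steps 0, 3, 6, 9), where only priority 0 uses the bare queue name and the other steps use suffixed list keys. priority_queue_keys is a hypothetical helper, not part of kombu, and the defaults should be verified against the installed kombu version.

```python
# Assumed kombu Redis transport defaults; verify against the installed kombu.
PRIORITY_SEP = "\x06\x16"
PRIORITY_STEPS = (0, 3, 6, 9)


def priority_queue_keys(queue="celery", steps=PRIORITY_STEPS, sep=PRIORITY_SEP):
    """Return the Redis list keys that may hold pending messages for `queue`."""
    # Priority 0 uses the bare queue name; other steps append separator + step.
    return [queue if step == 0 else f"{queue}{sep}{step}" for step in steps]


if __name__ == "__main__":
    for key in priority_queue_keys():
        print(repr(key))
```

Each key can then be measured with redis-py, for example redis.Redis.from_url("redis://my-redis/0").llen(key). Redis removes a list key once the list is empty, so a missing celery key is also what a fully drained queue looks like. Delivered but unacknowledged messages should show up in the unacked hash instead, assuming the default key name.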
When I try to replicate this in a controlled environment using celery --app=my.celery.app:app -b redis://my-redis/0 inspect reserved or related commands, I get the results I expect. Does something different happen when the workers are spread across nodes?
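To compare nodes programmatically rather than by eye, here is a minimal sketch that summarizes the mapping returned by inspect reserved (worker name to list of task dicts, each with an "id" field). summarize_reserved is a hypothetical helper and the worker names in the usage note are made up.

```python
from collections import defaultdict


def summarize_reserved(reserved):
    """Summarize a mapping of worker name -> list of reserved task dicts.

    Returns (per-worker counts, task ids reserved by more than one worker).
    """
    counts = {}
    owners = defaultdict(set)
    # The inspect call can return None when no worker replies in time.
    for worker, tasks in (reserved or {}).items():
        counts[worker] = len(tasks)
        for task in tasks:
            owners[task["id"]].add(worker)
    duplicates = {tid: sorted(ws) for tid, ws in owners.items() if len(ws) > 1}
    return counts, duplicates
```

With a live app this would be called as summarize_reserved(app.control.inspect().reserved()). A count above the expected per-node limit, or any entry in the duplicates mapping (for example a task id listed under both celery@node-a and celery@node-b), would reproduce the first two symptoms.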
Configuration Background
- Python 3.8 (moving to 3.11 asap)
- Celery 4.3 -> 5.4
- Redis client: redis==4.3.6 -> redis==4.5.2
- kombu==5.5.3
- Multiple Celery instances, each running 5 workers against the Redis broker
result_serializer = 'json'
timezone = 'UTC'
enable_utc = True
task_track_started = True
task_acks_late = True
worker_prefetch_multiplier = 1
worker_send_task_events = True
worker_max_memory_per_child = int(settings.get('CELERY_MEMORY_LIMIT'))
broker_transport_options = {
    'visibility_timeout': 86400,
    'queue_order_strategy': 'priority',
}
result_backend_transport_options = {'visibility_timeout': 86400}
visibility_timeout = 86400
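For reference, the ceiling I expect from these settings can be written out as arithmetic. This is a sketch based on my reading of the Celery docs, where the prefetch limit is worker_prefetch_multiplier times the number of concurrent processes and a multiplier of 0 removes the limit. prefetch_limit is a hypothetical helper, not a Celery API.

```python
def prefetch_limit(concurrency, prefetch_multiplier):
    """Messages one worker node may hold unacknowledged at a time."""
    # A multiplier of 0 disables the limit in Celery; model that as None.
    if prefetch_multiplier == 0:
        return None
    return concurrency * prefetch_multiplier


if __name__ == "__main__":
    # Values from the configuration above: --concurrency=5, multiplier 1.
    print(prefetch_limit(5, 1))
```

With concurrency 5 and a multiplier of 1, each node should hold at most 5 messages at a time, so hundreds of reserved tasks suggests the limit was not applied.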
Application Start under 4.3:
CELERY_WORKERS is set to 5
celery -A my.celery.app:app -E -l info --concurrency=${CELERY_WORKERS} worker 2>&1
Application Start under 5.4:
CELERY_WORKERS is set to 5
celery --app=my.celery.app:app worker --loglevel=info --concurrency=${CELERY_WORKERS} 2>&1
Could this be a bug in Celery?
I have tried every configuration knob I could find. None of them seems to yield the expected number of reserved tasks.
Any insight into a configuration that I may be missing?