We're running a Rails application on Heroku with Postgres 15.7 (on the Heroku Postgres standard-2 plan). Before upgrading to 15.7, we were on Postgres 12.x with Heroku Postgres standard-0.
We have large/complex/slow queries, but this wasn't an issue before the upgrade. Ever since we moved to 15.7, we've been getting a lot of out-of-memory errors, along with other (probably related) issues. A few examples:
ActiveRecord::StatementInvalid: PG::OutOfMemory: ERROR: out of memory
DETAIL: Failed on request of size 21 in memory context "CachedPlan".
ActiveRecord::StatementInvalid: PG::OutOfMemory: ERROR: out of memory
DETAIL: Failed on request of size 292 in memory context "CacheMemoryContext".
ActiveRecord::StatementInvalid: PG::OutOfMemory: ERROR: out of memory
DETAIL: Failed on request of size 27336 in memory context "MessageContext".
The errors happen on large queries, but also regularly on simple queries when other queries are running at the same time.
Automatic backups/snapshots fail about 40% of the time, and pulling the database to a local machine fails unless the Heroku dynos are restarted at the same time (probably because of the brief downtime and the reduced load on Postgres).
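Since PostgreSQL 14 there is a pg_backend_memory_contexts view that shows how much memory the contexts named in these errors (CachedPlan, CacheMemoryContext, MessageContext) are holding, though only for the backend running the query. A diagnostic sketch along those lines (we haven't managed to catch a failing backend with it yet):

-- Inspect the memory contexts of the current backend (PostgreSQL 14+);
-- the context names match the ones in the error details above.
SELECT name, parent, total_bytes, used_bytes
FROM pg_backend_memory_contexts
ORDER BY total_bytes DESC
LIMIT 10;

-- For another backend, pg_log_backend_memory_contexts(<pid>) writes its
-- contexts to the server log, but it needs privileges that a managed
-- plan like Heroku's may not grant.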
With the plan upgrade, RAM increased from 4 GB to 8 GB. We have looked into work_mem and other configuration settings, both increasing and decreasing them (work_mem is now back at the default), which was also suggested by Heroku support, who can't help us any further.
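For reference, work_mem can be checked per session and scoped per database or role rather than cluster-wide; something like the following (the database and role names are placeholders, and we're not sure how much of this Heroku's credentials permit):

-- Effective value for the current session:
SHOW work_mem;

-- Scope the setting per database or per role instead of globally
-- (names below are placeholders):
ALTER DATABASE our_database SET work_mem = '16MB';
ALTER ROLE reporting SET work_mem = '32MB';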
What would be a logical place to look next? I understand that optimising the queries would be best, but we don't have the resources to do enough of that for it to have an effect in the short term, so I'm looking at other options. The fact that everything seemed to work fine before upgrading the version and the plan (RAM) makes me think there must be some other way to improve performance.
For some context: we collect customer feedback and show the results in graphs. For all previous months, we use materialized views to get the basic data and apply calculations/filters on top of that; only for the current day is live data fetched from the database.
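To illustrate the pattern (a heavily simplified sketch; the real views, tables, and columns differ):

-- Simplified example of one of our per-month rollups (names made up):
CREATE MATERIALIZED VIEW feedback_monthly AS
SELECT date_trunc('month', created_at) AS month,
       score,
       count(*) AS responses
FROM feedback_responses
GROUP BY 1, 2;

-- A unique index is required for REFRESH ... CONCURRENTLY:
CREATE UNIQUE INDEX ON feedback_monthly (month, score);

-- Refreshed periodically:
REFRESH MATERIALIZED VIEW CONCURRENTLY feedback_monthly;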
pg:settings:
auto-explain: false
auto-explain.log-analyze: false
auto-explain.log-buffers: false
auto-explain.log-format: text
auto-explain.log-min-duration: -1
auto-explain.log-nested-statements: false
auto-explain.log-triggers: false
auto-explain.log-verbose: false
log-connections: true
log-lock-waits: true
log-min-duration-statement: 2000
log-min-error-statement: error
log-statement: ddl
pgbouncer-default-pool-size: 300
pgbouncer-max-client-conn: 10000
pgbouncer-max-db-connections: 300
track-functions: none
pg:info:
Plan: Standard 2
Status: Available
Data Size: 13.6 GB / 256 GB (5.32%)
Tables: 60
PG Version: 15.7
Connections: 31/400
Connection Pooling: Available
Credentials: 1
Fork/Follow: Available
Rollback: earliest from 2024-08-23 09:23 UTC
Created: 2024-04-15 13:54
Region: eu
Data Encryption: In Use
Continuous Protection: On
Enhanced Certificates: Off
Upgradable Extensions: Yes
Maintenance: not required
Maintenance window: Thursdays 19:00 to 23:00 UTC
Full SHOW ALL; output: https://codefile.io/f/dE7rz5cxip