Postgres Query Slow 0.05% of the time

Question

I have the following tables:

CREATE TABLE IF NOT EXISTS users
(
    id                  NUMERIC(20, 0)                           NOT NULL DEFAULT NEXTVAL('users_sequence') PRIMARY KEY,
    list_id             NUMERIC(20, 0)                           NOT NULL,
    first_name          VARCHAR(512)   DEFAULT NULL              NULL,
    last_name           VARCHAR(512)   DEFAULT NULL              NULL,
    full_name VARCHAR(1024) GENERATED ALWAYS AS
        (CASE
        WHEN first_name IS NULL AND
             last_name IS NULL THEN NULL
        ELSE
             (TRIM(COALESCE(first_name, '') || ' ' || COALESCE(last_name, ''))) END) STORED,
    deleted_at          TIMESTAMP      DEFAULT NULL              NULL,
    -- Some ~20 columns
);

CREATE TABLE IF NOT EXISTS user_aliases
(
    id         NUMERIC(20, 0)            NOT NULL DEFAULT NEXTVAL('user_aliases_sequence') PRIMARY KEY,
    entry_id   NUMERIC(20, 0)            NOT NULL,
    list_id             NUMERIC(20, 0)                           NOT NULL,
    first_name VARCHAR(512) DEFAULT NULL NULL,
    last_name  VARCHAR(512) DEFAULT NULL NULL,
    full_name  VARCHAR(1024) GENERATED ALWAYS AS
        (CASE
            WHEN first_name IS NULL AND
                 last_name IS NULL THEN NULL
            ELSE
                 (TRIM(COALESCE(first_name, '') || ' ' || COALESCE(last_name, ''))) END) STORED,
    deleted_at TIMESTAMP      DEFAULT NULL NULL,
    CONSTRAINT fk_user_aliases_entry_id FOREIGN KEY (entry_id) REFERENCES users (id) ON UPDATE CASCADE ON DELETE CASCADE
);

and the following indices:

CREATE INDEX users_full_name_idx ON users USING GIST (full_name gist_trgm_ops, list_id) WHERE deleted_at IS NULL AND full_name IS NOT NULL;

CREATE INDEX user_aliases_full_name_idx ON user_aliases USING GIST (full_name gist_trgm_ops, list_id) WHERE deleted_at IS NULL AND full_name IS NOT NULL;

and I have the following functions:

And my query is:

SELECT id  
FROM (
    SELECT
      e.id,  
      e.full_name  
    FROM users e  
    WHERE e.full_name % :value  
      AND e.deleted_at IS NULL  
      AND e.list_id IN (:lists)  

    UNION

    SELECT
      a.entry_id AS id,  
      a.full_name  
    FROM user_aliases a  
    WHERE a.full_name % :value  
      AND a.deleted_at IS NULL  
      AND a.list_id IN (:lists)
) filter_table
WHERE LEVENSHTEIN_LESS_EQUAL(full_name, :value, :threshold) <= :threshold  
LIMIT :limit

I have around ~500K records in the parent (users) table, and again ~500K records in the child (user_aliases) table. 99.5% of the time, my query takes around ~200ms, which works for me. However, sometimes (and I believe when there's some load on the server) it takes around 20seconds. I have ~140 connections at the moment, majority of which is idle (connection pooling).

The server has the following resources:

8 vCPU
32GB memory
500 GB SSD

and some related bits of my postgresql.conf are as follows, happy to provide further, just don't know what:

max_connections = 300     
shared_buffers = 12GB           
effective_cache_size = 16GB
maintenance_work_mem = 1GB
work_mem = 32MB
max_wal_size = 1GB
min_wal_size = 80MB

And the query plan for is as follows:

Limit  (cost=285.91..286.11 rows=10 width=20) (actual time=238.326..238.330 rows=0 loops=1)
  Output: filter_table.id
  Buffers: shared hit=40038
  ->  Subquery Scan on filter_table  (cost=285.91..286.23 rows=16 width=20) (actual time=238.323..238.327 rows=0 loops=1)
        Output: filter_table.id
        Buffers: shared hit=40038
        ->  HashAggregate  (cost=285.91..286.07 rows=16 width=536) (actual time=238.322..238.325 rows=0 loops=1)
"              Output: e.id, e.full_name"
"              Group Key: e.id, e.full_name"
              Batches: 1  Memory Usage: 24kB
              Buffers: shared hit=40038
              ->  Append  (cost=13.58..285.83 rows=16 width=536) (actual time=238.317..238.320 rows=0 loops=1)
                    Buffers: shared hit=40038
                    ->  Bitmap Heap Scan on master.users e  (cost=13.58..199.25 rows=11 width=27) (actual time=164.384..164.385 rows=0 loops=1)
"                          Output: e.id, e.full_name"
                          Recheck Cond: (((e.full_name)::text % 'hasan can saral'::text) AND (e.deleted_at IS NULL))
"                          Filter: ((e.list_id = ANY ('{1,3,5,6,7,10,20,30,40,50,60,70,80,90,100,110,120,130,140,150,160,170,180,190,200,210,220,230,240,250,260,270,280,300,310,320,330,340,350,360,370,380,390,400,410,420,430,440,450,460,470,480,490,500,510,520,530,540,550,560,570,580,590,600,610,620,630,640,650,660,670,680,690,700,710,720,730,740,750,760,770,780,790,800,810,820,830,840,850,860,870,880,890,900,910,920,930,940,950,960,970,980,990,1000,1010,1020,1030,1040,1050,1060,1070,1080,1090,1100,1110,1120,1130,1140,1150,1160,1170,1180,1190,1200,1210,1220,1230,1240,1250,1260,1270,1280,1290,1300,1310,1320,1330,1340,1350,1360,1370,1380,1390,1400,1410,1420,1430,1440,1450,1460,1470,1480,1490,1500,1510,1520,1530,1540,1550,1560,1570,1580,1590,1600,1610,1620,1630,1640,1650,1660,1670,1680,1690,1700,1710,1720,1730,1740,1750,1760,1770,1780,1790,1800,1810,1820,1830,1840,1850,1860,1870,1880,1890,1900,1910,1920,1930,1940,1950,1960,1970,1980,1990,2000,2010,2020,2030,2040,2050,2060,2070,2080,2090,2100,2110,2120,2130,2140,2150,2160,2170,2180,2190,2200,2210,2220,2230,2240,2250,2260,2270,2280,2290,2300,2310,2320,2330,2340,2350,2360,2370,2380,2390,2400,2410,2420,2430,2440,2450,2460,2470,2480,2490,2500,2510,2520,2530,2540,2550,2560,2570,2580,2590,2600,2610,2620,2630,2640,2650,2660,2670,2680,2690,2700,2710,2720,2730,2740,2750,2760,2770,2780,2790,2800,2810,2820,2830,2840,2850,2860,2870,2880,2890,2900,2910,2920,2930,2940,2950,2960,2970,2980,2990,3000,3010,3020,3030,3040,3050,3060,3070,3080,3090,3100,3110,3120,3130,3140,3150,3160,3170,3180}'::numeric[])) AND (levenshtein_less_equal((e.full_name)::text, 'hasan can saral'::text, 2) <= 2))"
                          Rows Removed by Filter: 161
                          Heap Blocks: exact=145
                          Buffers: shared hit=27180
                          ->  Bitmap Index Scan on users_full_name_idx  (cost=0.00..12.77 rows=48 width=0) (actual time=163.986..163.987 rows=161 loops=1)
                                Index Cond: ((e.full_name)::text % 'hasan can saral'::text)
                                Buffers: shared hit=27035
                    ->  Bitmap Heap Scan on master.user_aliases a  (cost=9.36..86.34 rows=5 width=38) (actual time=73.928..73.928 rows=0 loops=1)
"                          Output: a.entry_id, a.full_name"
                          Recheck Cond: (((a.full_name)::text % 'hasan can saral'::text) AND (a.deleted_at IS NULL))
"                          Filter: ((a.list_id = ANY ('{1,3,5,6,7,10,20,30,40,50,60,70,80,90,100,110,120,130,140,150,160,170,180,190,200,210,220,230,240,250,260,270,280,300,310,320,330,340,350,360,370,380,390,400,410,420,430,440,450,460,470,480,490,500,510,520,530,540,550,560,570,580,590,600,610,620,630,640,650,660,670,680,690,700,710,720,730,740,750,760,770,780,790,800,810,820,830,840,850,860,870,880,890,900,910,920,930,940,950,960,970,980,990,1000,1010,1020,1030,1040,1050,1060,1070,1080,1090,1100,1110,1120,1130,1140,1150,1160,1170,1180,1190,1200,1210,1220,1230,1240,1250,1260,1270,1280,1290,1300,1310,1320,1330,1340,1350,1360,1370,1380,1390,1400,1410,1420,1430,1440,1450,1460,1470,1480,1490,1500,1510,1520,1530,1540,1550,1560,1570,1580,1590,1600,1610,1620,1630,1640,1650,1660,1670,1680,1690,1700,1710,1720,1730,1740,1750,1760,1770,1780,1790,1800,1810,1820,1830,1840,1850,1860,1870,1880,1890,1900,1910,1920,1930,1940,1950,1960,1970,1980,1990,2000,2010,2020,2030,2040,2050,2060,2070,2080,2090,2100,2110,2120,2130,2140,2150,2160,2170,2180,2190,2200,2210,2220,2230,2240,2250,2260,2270,2280,2290,2300,2310,2320,2330,2340,2350,2360,2370,2380,2390,2400,2410,2420,2430,2440,2450,2460,2470,2480,2490,2500,2510,2520,2530,2540,2550,2560,2570,2580,2590,2600,2610,2620,2630,2640,2650,2660,2670,2680,2690,2700,2710,2720,2730,2740,2750,2760,2770,2780,2790,2800,2810,2820,2830,2840,2850,2860,2870,2880,2890,2900,2910,2920,2930,2940,2950,2960,2970,2980,2990,3000,3010,3020,3030,3040,3050,3060,3070,3080,3090,3100,3110,3120,3130,3140,3150,3160,3170,3180}'::numeric[])) AND (levenshtein_less_equal((a.full_name)::text, 'hasan can saral'::text, 2) <= 2))"
                          Rows Removed by Filter: 62
                          Heap Blocks: exact=49
                          Buffers: shared hit=12858
                          ->  Bitmap Index Scan on user_aliases_full_name_idx  (cost=0.00..8.56 rows=20 width=0) (actual time=73.735..73.735 rows=62 loops=1)
                                Index Cond: ((a.full_name)::text % 'hasan can saral'::text)
                                Buffers: shared hit=12809
"Settings: effective_cache_size = '16GB', search_path = 'master', work_mem = '32MB'"
Planning Time: 2.445 ms
Execution Time: 238.404 ms

Why does my query performs poorly, and how do I resolve it?

Update:

I got the slow query plan using auto_explain (the query took 25s to execute):

Limit  (cost=10645.98..10646.18 rows=10 width=20) (actual time=47782.688..47782.691 rows=0 loops=1)
  Output: filter_table.id
  Buffers: shared hit=3749709
  ->  Subquery Scan on filter_table  (cost=10645.98..10667.42 rows=1072 width=20) (actual time=47782.684..47782.687 rows=0 loops=1)
        Output: filter_table.id
        Buffers: shared hit=3749709
        ->  HashAggregate  (cost=10645.98..10656.70 rows=1072 width=536) (actual time=47782.682..47782.684 rows=0 loops=1)
"              Output: e.id, e.full_name"
"              Group Key: e.id, e.full_name"
              Batches: 1  Memory Usage: 73kB
              Buffers: shared hit=3749709
              ->  Append  (cost=1449.29..10640.62 rows=1072 width=536) (actual time=47782.666..47782.669 rows=0 loops=1)
                    Buffers: shared hit=3749709
                    ->  Bitmap Heap Scan on master.users e  (cost=1449.29..10533.25 rows=1068 width=27) (actual time=47636.308..47636.310 rows=0 loops=1)
"                          Output: e.id, e.full_name"
"                          Recheck Cond: (((e.full_name)::text % 'SOME RELATIVELY LONG VALUE HERE'::text) AND (e.list_id = ANY ('{1,3,5,6,7,10,20,30,40,50,60,70,80,90,100,110,120,130,140,150,160,170,180,190,200,210,220,230,240,250,260,270,280,300,310,320,330,340,350,360,370,380,390,400,410,420,430,440,450,460,470,480,490,500,510,520,530,540,550,560,570,580,590,600,610,620,630,640,650,660,670,680,690,700,710,720,730,740,750,760,770,780,790,800,810,820,830,840,850,860,870,880,890,900,910,920,930,940,950,960,970,980,990,1000,1010,1020,1030,1040,1050,1060,1070,1080,1090,1100,1110,1120,1130,1140,1150,1160,1170,1180,1190,1200,1210,1220,1230,1240,1250,1260,1270,1280,1290,1300,1310,1320,1330,1340,1350,1360,1370,1380,1390,1400,1410,1420,1430,1440,1450,1460,1470,1480,1490,1500,1510,1520,1530,1540,1550,1560,1570,1580,1590,1600,1610,1620,1630,1640,1650,1660,1670,1680,1690,1700,1710,1720,1730,1740,1750,1760,1770,1780,1790,1800,1810,1820,1830,1840,1850,1860,1870,1880,1890,1900,1910,1920,1930,1940,1950,1960,1970,1980,1990,2000,2010,2020,2030,2040,2050,2060,2070,2080,2090,2100,2110,2120,2130,2140,2150,2160,2170,2180,2190,2200,2210,2220,2230,2240,2250,2260,2270,2280,2290,2300,2310,2320,2330,2340,2350,2360,2370,2380,2390,2400,2410,2420,2430,2440,2450,2460,2470,2480,2490,2500,2510,2520,2530,2540,2550,2560,2570,2580,2590,2600,2610,2620,2630,2640,2650,2660,2670,2680,2690,2700,2710,2720,2730,2740,2750,2760,2770,2780,2790,2800,2810,2820,2830,2840,2850,2860,2870,2880,2890,2900,2910,2920,2930,2940,2950,2960,2970,2980,2990,3000,3010,3020,3030,3040,3050,3060,3070,3080,3090,3100,3110,3120,3130,3140,3150,3160,3170,3180}'::numeric[])) AND (e.deleted_at IS NULL))"
"                          Filter: (levenshtein_less_equal((e.full_name)::text, 'SOME RELATIVELY LONG VALUE HERE'::text, 2) <= 2)"
                          Rows Removed by Filter: 75
                          Heap Blocks: exact=57
                          Buffers: shared hit=3736002
                          ->  Bitmap Index Scan on users_full_name_idx  (cost=0.00..1448.22 rows=3203 width=0) (actual time=47634.248..47634.248 rows=75 loops=1)
"                                Index Cond: (((e.full_name)::text % 'SOME RELATIVELY LONG VALUE HERE'::text) AND (e.list_id = ANY ('{1,3,5,6,7,10,20,30,40,50,60,70,80,90,100,110,120,130,140,150,160,170,180,190,200,210,220,230,240,250,260,270,280,300,310,320,330,340,350,360,370,380,390,400,410,420,430,440,450,460,470,480,490,500,510,520,530,540,550,560,570,580,590,600,610,620,630,640,650,660,670,680,690,700,710,720,730,740,750,760,770,780,790,800,810,820,830,840,850,860,870,880,890,900,910,920,930,940,950,960,970,980,990,1000,1010,1020,1030,1040,1050,1060,1070,1080,1090,1100,1110,1120,1130,1140,1150,1160,1170,1180,1190,1200,1210,1220,1230,1240,1250,1260,1270,1280,1290,1300,1310,1320,1330,1340,1350,1360,1370,1380,1390,1400,1410,1420,1430,1440,1450,1460,1470,1480,1490,1500,1510,1520,1530,1540,1550,1560,1570,1580,1590,1600,1610,1620,1630,1640,1650,1660,1670,1680,1690,1700,1710,1720,1730,1740,1750,1760,1770,1780,1790,1800,1810,1820,1830,1840,1850,1860,1870,1880,1890,1900,1910,1920,1930,1940,1950,1960,1970,1980,1990,2000,2010,2020,2030,2040,2050,2060,2070,2080,2090,2100,2110,2120,2130,2140,2150,2160,2170,2180,2190,2200,2210,2220,2230,2240,2250,2260,2270,2280,2290,2300,2310,2320,2330,2340,2350,2360,2370,2380,2390,2400,2410,2420,2430,2440,2450,2460,2470,2480,2490,2500,2510,2520,2530,2540,2550,2560,2570,2580,2590,2600,2610,2620,2630,2640,2650,2660,2670,2680,2690,2700,2710,2720,2730,2740,2750,2760,2770,2780,2790,2800,2810,2820,2830,2840,2850,2860,2870,2880,2890,2900,2910,2920,2930,2940,2950,2960,2970,2980,2990,3000,3010,3020,3030,3040,3050,3060,3070,3080,3090,3100,3110,3120,3130,3140,3150,3160,3170,3180}'::numeric[])))"
                                Buffers: shared hit=3735945
                    ->  Bitmap Heap Scan on master.user_aliases a  (cost=9.36..91.29 rows=4 width=38) (actual time=146.350..146.351 rows=0 loops=1)
"                          Output: a.entry_id, a.full_name"
                          Recheck Cond: (((a.full_name)::text % 'SOME RELATIVELY LONG VALUE HERE'::text) AND (a.deleted_at IS NULL))
"                          Filter: ((a.list_id = ANY ('{1,3,5,6,7,10,20,30,40,50,60,70,80,90,100,110,120,130,140,150,160,170,180,190,200,210,220,230,240,250,260,270,280,300,310,320,330,340,350,360,370,380,390,400,410,420,430,440,450,460,470,480,490,500,510,520,530,540,550,560,570,580,590,600,610,620,630,640,650,660,670,680,690,700,710,720,730,740,750,760,770,780,790,800,810,820,830,840,850,860,870,880,890,900,910,920,930,940,950,960,970,980,990,1000,1010,1020,1030,1040,1050,1060,1070,1080,1090,1100,1110,1120,1130,1140,1150,1160,1170,1180,1190,1200,1210,1220,1230,1240,1250,1260,1270,1280,1290,1300,1310,1320,1330,1340,1350,1360,1370,1380,1390,1400,1410,1420,1430,1440,1450,1460,1470,1480,1490,1500,1510,1520,1530,1540,1550,1560,1570,1580,1590,1600,1610,1620,1630,1640,1650,1660,1670,1680,1690,1700,1710,1720,1730,1740,1750,1760,1770,1780,1790,1800,1810,1820,1830,1840,1850,1860,1870,1880,1890,1900,1910,1920,1930,1940,1950,1960,1970,1980,1990,2000,2010,2020,2030,2040,2050,2060,2070,2080,2090,2100,2110,2120,2130,2140,2150,2160,2170,2180,2190,2200,2210,2220,2230,2240,2250,2260,2270,2280,2290,2300,2310,2320,2330,2340,2350,2360,2370,2380,2390,2400,2410,2420,2430,2440,2450,2460,2470,2480,2490,2500,2510,2520,2530,2540,2550,2560,2570,2580,2590,2600,2610,2620,2630,2640,2650,2660,2670,2680,2690,2700,2710,2720,2730,2740,2750,2760,2770,2780,2790,2800,2810,2820,2830,2840,2850,2860,2870,2880,2890,2900,2910,2920,2930,2940,2950,2960,2970,2980,2990,3000,3010,3020,3030,3040,3050,3060,3070,3080,3090,3100,3110,3120,3130,3140,3150,3160,3170,3180}'::numeric[])) AND (levenshtein_less_equal((a.full_name)::text, 'SOME RELATIVELY LONG VALUE HERE'::text, 2) <= 2))"
                          Rows Removed by Filter: 4
                          Heap Blocks: exact=4
                          Buffers: shared hit=13707
                          ->  Bitmap Index Scan on user_aliases_full_name_idx  (cost=0.00..8.56 rows=20 width=0) (actual time=146.099..146.099 rows=4 loops=1)
                                Index Cond: ((a.full_name)::text % 'SOME RELATIVELY LONG VALUE HERE'::text)
                                Buffers: shared hit=13703
"Settings: effective_cache_size = '16GB', search_path = 'master', work_mem = '32MB'"
Planning Time: 5.653 ms
Execution Time: 47783.054 ms

Update #2:

The query is slow for some specific, and relatively long filter values, and fast for most.

Changing the query to:

SELECT id  
FROM (
    (SELECT
      e.id,  
      e.full_name  
    FROM users e  
    WHERE e.full_name % :value  
      AND e.deleted_at IS NULL  
      AND e.list_id IN (:lists)  
      LIMIT :limit)
    UNION
    (SELECT
      a.entry_id AS id,  
      a.full_name  
    FROM user_aliases a  
    WHERE a.full_name % :value  
      AND a.deleted_at IS NULL  
      AND a.list_id IN (:lists)
      LIMIT :limit)
) filter_table
WHERE LEVENSHTEIN_LESS_EQUAL(full_name, :value, :threshold) <= :threshold  
LIMIT :limit

basically, adding LIMIT clauses to every subquery of UNION makes the query fast again. So there is a solution. But why?

Start with the query plan, using explain(analyze, verbose, buffers, settings) and share the result in plain text as an update of your question. Besides that, do you need UNION or would UNION ALL be good enough? And why don't you filter in the subquery? — Frank Heikens
– Frank Heikens, Commented May 14 at 18:33
Offtopic: Why NUMERIC(20, 0) ? INT and BIGINT are smaller in size and are the natural choice for identifiers. The indexes will also be smaller, resulting in faster performance. — Frank Heikens
– Frank Heikens, Commented May 14 at 18:36
You need to capture also the plan when it is slow and compare the two. — mustaccio
– mustaccio, Commented May 15 at 17:59
I'd check the selectivity of the full name table. It looks like substring matching is proving less effective in specific subset your use case. If you look at the filter values that trigger it, is it a case of someone using a suffix/prefix naming scheme? — user1937198
– user1937198, Commented May 26 at 14:57

user1937198 · Accepted Answer · 2025-05-29 17:57:53Z

2

+50

The key difference between the two query plans is in the number of rows hit, with far more rows being retrieved in users_full_name_idx in the expensive case. Given that this is sub-string matching and that the example queries are relatively long, it suggests that some of your users are using prefix/suffix naming conventions such that large numbers of users have common substrings, which then makes looking up those substrings expensive.

answered May 29 at 17:57

user1937198

2761 silver badge5 bronze badges

Add a comment |

Stack Exchange Network

Postgres Query Slow 0.05% of the time

1 Answer 1

Your Answer

Hot Network Questions

Postgres Query Slow 0.05% of the time

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Related

Hot Network Questions