I have a table with 50 million rows. One column, u_sphinx, is very important; its possible values are 1, 2 and 3. Currently all rows have the value 3, but when I check for new rows (u_sphinx = 1) the query is very slow. What could be wrong? Could the index be broken? Server: Debian, 8GB, 4x Intel(R) Xeon(R) CPU E3-1220 V2 @ 3.10GHz

Table structure:

base=> \d u_user
Table "public.u_user"
         Column          |       Type        |                       Modifiers                       
 u_ip                    | character varying | 
 u_agent                 | text              | 
 u_agent_js              | text              | 
 u_resolution_id         | integer           | 
 u_os                    | character varying | 
 u_os_id                 | smallint          | 
 u_platform              | character varying | 
 u_language              | character varying | 
 u_language_id           | smallint          | 
 u_language_js           | character varying | 
 u_cookie                | smallint          | 
 u_java                  | smallint          | 
 u_color_depth           | integer           | 
 u_flash                 | character varying | 
 u_charset               | character varying | 
 u_doctype               | character varying | 
 u_compat_mode           | character varying | 
 u_sex                   | character varying | 
 u_age                   | character varying | 
 u_theme                 | character varying | 
 u_behave                | character varying | 
 u_targeting             | character varying | 
 u_resolution            | character varying | 
 u_user_hash             | bigint            | 
 u_tech_hash             | character varying | 
 u_last_target_data_time | integer           | 
 u_last_target_prof_time | integer           | 
 u_id                    | bigint            | not null default nextval('u_user_u_id_seq'::regclass)
 u_sphinx                | smallint          | not null default 1::smallint
Indexes:
    "u_user_u_id_pk" PRIMARY KEY, btree (u_id)
    "u_user_hash_index" btree (u_user_hash)
    "u_user_u_sphinx_ind" btree (u_sphinx)

Slow query:

base=> explain analyze SELECT u_id FROM u_user WHERE u_sphinx = 1 LIMIT 1;
                                                         QUERY PLAN                                                          
-----------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..0.15 rows=1 width=8) (actual time=485146.252..485146.252 rows=0 loops=1)
   ->  Seq Scan on u_user  (cost=0.00..3023707.80 rows=19848860 width=8) (actual time=485146.249..485146.249 rows=0 loops=1)
         Filter: (u_sphinx = 1)
         Rows Removed by Filter: 23170476
 Total runtime: 485160.241 ms
(5 rows)

Solved:

After adding a partial index:

base=> explain analyze SELECT u_id FROM u_user WHERE u_sphinx = 1 LIMIT 1;
                                                              QUERY PLAN                                                              
--------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.27..4.28 rows=1 width=8) (actual time=0.063..0.063 rows=0 loops=1)
   ->  Index Scan using u_user_u_sphinx_index_1 on u_user  (cost=0.27..4.28 rows=1 width=8) (actual time=0.061..0.061 rows=0 loops=1)
         Index Cond: (u_sphinx = 1)
 Total runtime: 0.106 ms

Thanks to @Kouber Saparev

2 Answers

Try making a partial index.

CREATE INDEX u_user_u_sphinx_idx ON u_user (u_sphinx) WHERE u_sphinx = 1;
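
On a table of this size, a plain CREATE INDEX blocks writes to the table while the index is built. As a hedged alternative to the statement above, assuming the table has to stay writable and the command is run outside a transaction block, the same partial index can be built with CONCURRENTLY and the plan re-checked afterwards:

```sql
-- Build the partial index without blocking concurrent INSERT/UPDATE/DELETE.
-- CREATE INDEX CONCURRENTLY cannot be run inside a transaction block.
CREATE INDEX CONCURRENTLY u_user_u_sphinx_idx
    ON u_user (u_sphinx)
    WHERE u_sphinx = 1;

-- Check that the planner now uses the partial index for the slow query.
EXPLAIN ANALYZE
SELECT u_id FROM u_user WHERE u_sphinx = 1 LIMIT 1;
```

Because the index only contains rows where u_sphinx = 1, it stays tiny while almost all rows hold the value 3, so the lookup should be cheap whether or not a matching row exists.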

3 Comments

Should I delete the existing index and create three new ones, one for each value? Or should I keep the first index?
It depends on what other queries you have over that data, but generally if really almost "all rows have value 3" as you wrote, then you don't need the old index at all. It won't be used for the value 3 anyway.
What happens if you just add that index and run EXPLAIN ANALYZE?
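
One way to decide whether the old full index is still worth keeping is to look at how often it is actually scanned. The following is only a sketch: it assumes the statistics collector is enabled and that the counters cover a representative period of normal workload.

```sql
-- Scan counts and on-disk size for every index on u_user.
SELECT indexrelname,
       idx_scan,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE relname = 'u_user';

-- If idx_scan for u_user_u_sphinx_ind stays at 0, the index can be dropped:
-- DROP INDEX u_user_u_sphinx_ind;
```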

Your query plan suggests the planner believes the value 1 is so common that it is cheaper to read a page or two sequentially and find a matching row there than to pay the overhead of walking an index and fetching a row from a random disk page. The estimate shows it: the Seq Scan node expects 19848860 rows with u_sphinx = 1, yet the scan actually found none.

This may indicate that ANALYZE has not been run on the table, so the planner does not have up-to-date statistics:

ANALYZE u_user;
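
To check whether stale statistics are really the cause, the planner's view of the column can be inspected directly. This is a sketch that assumes the table lives in the public schema, as the \d output in the question indicates:

```sql
-- When were statistics last gathered, manually or by autovacuum?
SELECT last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'u_user';

-- What value distribution does the planner assume for u_sphinx?
SELECT n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE schemaname = 'public'
  AND tablename  = 'u_user'
  AND attname    = 'u_sphinx';
```

If most_common_vals still lists 1 with a large frequency even though every row now holds 3, the statistics predate the bulk update, and rerunning ANALYZE should bring the row estimate for u_sphinx = 1 down.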
