40

Or better said: when should you use an array as a field data type in a table?

Which solution provides better search results?


6 Answers

17

I avoid arrays for 2 reasons:

  • by storing more than one attribute value in a cell you violate the first normal form (theoretical);
  • you have to perform extra, non-SQL-related processing each time you need to work with individual elements of the arrays (practical, but a direct consequence of the theoretical one); see the sketch below.
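For illustration, here is a minimal sketch of the normalized alternative that the first-normal-form argument points to (table and column names are hypothetical, not taken from the question):

-- array version: several phone numbers stored in one cell (violates 1NF)
create table person_array (
    id     serial primary key,
    name   text not null,
    phones text[]
);

-- normalized version: one phone number per row in a child table
create table person (
    id   serial primary key,
    name text not null
);

create table person_phone (
    person_id int not null references person(id),
    phone     text not null
);

-- working with an individual element requires array machinery...
select id from person_array where '555-0100' = any(phones);

-- ...while the child table stays plain SQL
select person_id from person_phone where phone = '555-0100';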

5 Comments

I'm a contractor (construction) and have little time for theory. Does that sound bad? I respect that, but I do what comes first and what works. Why do RDBMS vendors provide array data types (for tables) if they violate theory?
Yep, it sounds bad. Theory exists to provide a solid foundation for engineering. There are various reasons why SQL vendors ignore theory: they have little time for theory, they don't know the theory, they copy features from competitors, etc.
Point taken. I go back to my hammer and saw.
I wonder if arrays have a niche for creating many-to-many relations that need to be updated atomically.
I get keeping things standard SQL (probably for different reasons than you...), but not fitting "theoretically"? That's a really dumb reason to avoid a valuable feature.
14

I've considered this problem as well, and the conclusion I came to is to use arrays when you want to eliminate table joins. The number of elements contained in each array isn't as important as the size of the tables involved. If there are only a few thousand rows in each table, then joining to get the 50 sub-rows shouldn't be a big problem. If you get into tens or hundreds of thousands of rows, though, you're likely to start chewing through a lot of processor time and disk I/O.
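To make the comparison concrete, the two shapes being weighed look roughly like this (a sketch with made-up table and column names, not taken from the question):

-- join approach: the sub-rows live in a child table
select p.id, c.item
from parent p
join child c on c.parent_id = p.id
where p.id = 42;

-- array approach: the same sub-rows live in an array column, no join needed
select id, items
from parent_with_array
where id = 42;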

2 Comments

I could use the zip per county analogy. A county has so many zips. If a table T record needs to know of two counties, how many zips does the row know of? Do I keep an array of immutable county names in T and county-zip in T2?
I believe that GiST indexes can handle that sort of problem. In general, though, DBMSs don't deal well with this. This question also applies: stackoverflow.com/questions/256997/…
11

I know this post is 15 years old at this point, but it was my top Google result, so I figured I'd chime in.

As always, it depends on what your data distribution looks like and what your queries look like. A common use case where arrays come up is tags (think hashtags, etc.). Even this post is tagged with [arrays] and [postgresql]. Tags typically have a fairly heavy-tailed distribution (a few tags account for the vast majority of occurrences). For this type of data, unless you're just counting tags, you'll almost always be better off using an array of strings. The rationale is that most of the time you probably care about the tags associated with individual documents.

There is a great post about this on Database Soup. The conclusion is:

The overall winner is an array of text, with a GIN index. This is better for one-tag searches, worlds faster for two-tag searches, and competitive at other tasks. It's also the smallest representation, and becomes smaller and faster still if you actually put the array of tags in the documents table. Still, there are times that you would want to use the traditional child table with plain text tags: if you build tag clouds a lot or if you never search for two tags and your ORM can't deal with Postgres arrays.
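A minimal sketch of that winning combination, assuming a hypothetical documents table: a text[] tags column, a GIN index over it, and the containment operator for one- and two-tag searches:

create table documents (
    id   serial primary key,
    body text,
    tags text[]
);

-- GIN index over the whole array; the default array_ops opclass covers text[]
create index documents_tags_gin on documents using gin (tags);

-- one-tag search
select id from documents where tags @> array['postgresql'];

-- two-tag search (documents carrying both tags)
select id from documents where tags @> array['postgresql', 'arrays'];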


8

Don't know how long these links stay live, so I'll paste the results below: http://sqlfiddle.com/#!17/55761/2

TL;DR: searching a table index and then joining is fast, BUT adding a GIN index (using gin__int_ops) to a single table with an array column can be faster. Additionally, the flexibility of being able to match "some" or a small number of your array values might be a better option, e.g. for a tagging system.

-- single table: tags stored in an int[] column alongside the data
create table data (
    id serial primary key,
    tags int[],
    data jsonb
);

-- child table used for the join-based comparison
create table tags (
    id serial primary key,
    data_id int references data(id)
);

-- gin__int_ops is provided by the intarray extension
CREATE EXTENSION IF NOT EXISTS intarray;

CREATE INDEX gin_tags ON data USING GIN(tags gin__int_ops);

-- force index usage so both plans can be compared
SET enable_seqscan to off;

-- 100,000 rows, all tagged '{5}'...
with rand as (SELECT generate_series(1,100000) AS id)
insert into data (tags) select '{5}' from rand;

-- ...except a single row tagged '{1}'
update data set tags = '{1}' where id = 47300;

-- one child row per data row
with rand as (SELECT generate_series(1,100000) AS id)
INSERT INTO tags(data_id) select id from rand;

Running:

  select data.id, data.data, data.tags
  from data, tags where tags.data_id = data.id and tags.id = 47300;

and

  select data.id, data.data, data.tags
  from data where data.tags && '{1}';

Yields:

Record Count: 1; Execution Time: 3ms
QUERY PLAN
Nested Loop (cost=0.58..16.63 rows=1 width=61)
-> Index Scan using tags_pkey on tags (cost=0.29..8.31 rows=1 width=4)
Index Cond: (id = 47300)
-> Index Scan using data_pkey on data (cost=0.29..8.31 rows=1 width=61)
Index Cond: (id = tags.data_id)

and

Record Count: 1; Execution Time: 1ms
QUERY PLAN
Bitmap Heap Scan on data (cost=15.88..718.31 rows=500 width=61)
Recheck Cond: (tags && '{1}'::integer[])
-> Bitmap Index Scan on gin_tags (cost=0.00..15.75 rows=500 width=0)
Index Cond: (tags && '{1}'::integer[])


0

The tables will always provide better search results, assuming you're querying something within the actual array. With a sub-table you can index the contents trivially, whereas with an array you'd have to literally create 50 indexes (one for each potential element within the array).

3 Comments

I don't think this is the case; from what I've read, you can create indexes on arrays just like on any other column type.
That may be correct, but all of the examples I saw were using expression indexes tied to specific elements of the array.
No, you don't have to create 50 indexes. You create just one, on the array.
0

I think arrays should be reserved for certain custom data. For foreign keys it's better to use a link table (or at least one column per key). That way you have data control at the DB level and easy join queries. You need a join to get the full data set anyway, even if the keys are stored in arrays, and arrays are much more complicated to work with than standard SQL.
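A minimal sketch of the link-table approach described above (names are invented for the example); the foreign keys are what give you data control at the database level:

create table author (
    id   serial primary key,
    name text not null
);

create table book (
    id    serial primary key,
    title text not null
);

-- link table: one row per book/author pair, enforced by foreign keys
create table book_author (
    book_id   int not null references book(id),
    author_id int not null references author(id),
    primary key (book_id, author_id)
);

-- getting the full data set is an ordinary join
select b.title, a.name
from book b
join book_author ba on ba.book_id = b.id
join author a on a.id = ba.author_id;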

