51

It seems to me that the functionality of the PostgreSQL array datatype overlaps a lot with the standard one-to-many and many-to-many relationships.

For example, a table called users could have an array field called "favorite_colors", or there could be a separate table called "favorite_colors" and a join table between "users" and "favorite_colors".

In what cases is the array datatype OK to use instead of a full-blown join?


5 Answers

43

An array should not be used like a relation. Rather, it should hold indexed values that belong very tightly to a single row. For example, if you had a table with the results of football matches, then you would not need to do

id team1 team2 goals1 goals2

but would do

id team[2] goals[2]

Most people would agree that normalizing this example into two tables would be silly.

So, all in all, I would use an array where you are not interested in making relations and where you would otherwise add fields like field1, field2, field3.
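As a minimal sketch of the layout above in actual PostgreSQL (the table name `matches`, the column names, and the sample rows are all made up for illustration), this could look like the following. The query at the end shows one way to pull a single team's goals out of every match, assuming PostgreSQL 9.5 or later for `array_position`:

```sql
-- Hypothetical table; all names and rows are illustrative only.
CREATE TABLE matches (
    id    integer PRIMARY KEY,
    team  text[2],     -- note: PostgreSQL does not enforce the declared size
    goals integer[2]
);

INSERT INTO matches VALUES
    (1, '{Ajax,Feyenoord}', '{2,1}'),
    (2, '{PSV,Ajax}',       '{0,3}');

-- Goals scored by one team across all matches.
-- array_position returns the 1-based index of the team within the array,
-- which is then used to pick the matching entry from goals.
SELECT id,
       goals[array_position(team, 'Ajax'::text)] AS ajax_goals
FROM matches
WHERE 'Ajax' = ANY (team);
```

The lookup works, but it is noticeably clumsier than a plain column comparison, which is worth weighing if per-team queries are common.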


4 Comments

I would give two upvotes for this answer if it was possible ;-)
Another example, relating to machine-based translation. (Sort of.) stackoverflow.com/q/4967012/562459
If array data type is used, how would you extract the data of the performance of one team in all matches easily?
You'll typically want to know who was home and who was away. And to compare home vs away performance of a given team. But I'll pretend it's pong
11

I totally agree with @marc. Arrays are to be used when you are absolutely sure you don't need to create any relationship between the items in the array and any other table. They suit a tightly coupled one-to-many relationship.
A typical example is a multiple-choice question system. Since other questions don't need to be aware of the options of a given question, the options can be stored in an array.
For example:

CREATE TABLE Question (
  id integer PRIMARY KEY,
  question TEXT,
  options VARCHAR(255)[],
  answer VARCHAR(255)
);

This is much better than creating a question_options table and getting the options with a join.
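A short sketch of how such a table might be used, assuming the `Question` definition above (the sample row and the aliases `opt` and `idx` are invented for illustration). It reads one option by index, counts the options, and expands the array into rows with `unnest ... WITH ORDINALITY` (PostgreSQL 9.4+):

```sql
-- Illustrative row; the values are placeholders.
INSERT INTO Question (id, question, options, answer)
VALUES (1, 'Which color is the sky?',
        ARRAY['Blue', 'Green', 'Red'], 'Blue');

-- Arrays are 1-based: options[1] is the first option.
SELECT question,
       options[1]           AS first_option,
       cardinality(options) AS option_count
FROM Question
WHERE id = 1;

-- Expand the options into one row each, keeping their position.
SELECT q.id, o.opt, o.idx
FROM Question AS q,
     unnest(q.options) WITH ORDINALITY AS o(opt, idx)
WHERE q.id = 1;
```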

2 Comments

If I understand this correctly, another criterion is that, even within one table, a row won't need a relationship with the items in another row.
One thing I will say specifically about that example: if the user can dynamically create question "forms", it may be useful to have sets of options that can be reused for specific questions, in which case a 1-M would likely be a better choice. Other than that, great example.
8

One incredibly handy use case is tagging:

CREATE TABLE posts (
    title TEXT,
    tags TEXT[]
);

-- Select all posts with tag 'kitty'
SELECT * FROM posts WHERE tags @> '{kitty}';
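If you go this route, a GIN index on the array column is probably worth trying, since the default GIN operator class for arrays supports the `@>` containment operator. A minimal sketch, assuming the `posts` table above (the index name is arbitrary):

```sql
-- GIN index so that containment/overlap searches on tags can use an index.
CREATE INDEX posts_tags_gin ON posts USING GIN (tags);

-- Posts tagged with both 'kitty' and 'cute' (containment).
SELECT * FROM posts WHERE tags @> ARRAY['kitty', 'cute'];

-- Posts tagged with either 'kitty' or 'puppy' (overlap).
SELECT * FROM posts WHERE tags && ARRAY['kitty', 'puppy'];
```

Whether the planner actually uses the index depends on table size and statistics, so check with `EXPLAIN` on your own data.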

3 Comments

tsvectors are certainly better at doing this specific thing, but this is still a good example
Why is this the case? To me, tagging is the prima facie case for a fully normalized, many-to-many relationship.
@Dogweather Completely agree, this data model is very poorly thought through. OP, arrays are pretty much made for the exact opposite of this use case: read the giant green tip box on PostgreSQL's Arrays page, which says "Arrays are not sets; searching for specific array elements can be a sign of database misdesign." If you're searching for posts by tags a lot (which is kind of the entire point of tags...), this is going to scale horrendously.
4

The PostgreSQL documentation gives good examples:

CREATE TABLE sal_emp (
    name            text,
    pay_by_quarter  integer[],
    schedule        text[][]
);

The above command will create a table named sal_emp with a column of type text (name), a one-dimensional array of type integer (pay_by_quarter), which represents the employee's salary by quarter, and a two-dimensional array of text (schedule), which represents the employee's weekly schedule.

Or, if you prefer:

CREATE TABLE tictactoe (
    squares   integer[3][3]
);
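To show how the `sal_emp` table is filled and queried, here is a short sketch along the lines of the examples in the same documentation chapter. Array literals are written in curly braces, and elements are accessed with 1-based subscripts:

```sql
INSERT INTO sal_emp
    VALUES ('Bill',
            '{10000, 10000, 10000, 10000}',
            '{{"meeting", "lunch"}, {"training", "presentation"}}');

INSERT INTO sal_emp
    VALUES ('Carol',
            '{20000, 25000, 25000, 25000}',
            '{{"breakfast", "consulting"}, {"meeting", "lunch"}}');

-- Employees whose pay changed between the first and second quarter.
SELECT name FROM sal_emp WHERE pay_by_quarter[1] <> pay_by_quarter[2];

-- A slice of the two-dimensional schedule array.
SELECT schedule[1:2][1:1] FROM sal_emp WHERE name = 'Bill';
```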


3

If I want to store a set of values of the same type, and those values have no other attributes, I prefer to use arrays.

One example is storing contact numbers for a user. We usually want to store a main number and an alternate one, and in that case I prefer to use an array.

CREATE TABLE students (  
    name text,
    contacts varchar ARRAY -- or varchar[]
);

But if the data have additional attributes, don't use an array. Stored cards are one example: a card has an expiry date and other details.

Storing tags as an array is also a bad idea, because a tag can be associated with multiple posts.

Don't use arrays in such cases.
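For the tags case, a rough sketch of the normalized alternative would be a many-to-many layout with a join table. All table and column names here are hypothetical, and `posts` is assumed to have an `id` column:

```sql
-- Hypothetical normalized schema for tagging.
CREATE TABLE posts (
    id    integer PRIMARY KEY,
    title text
);

CREATE TABLE tags (
    id   integer PRIMARY KEY,
    name text UNIQUE NOT NULL
);

-- Join table: one row per (post, tag) pair.
CREATE TABLE post_tags (
    post_id integer REFERENCES posts (id),
    tag_id  integer REFERENCES tags (id),
    PRIMARY KEY (post_id, tag_id)
);

-- All posts with the tag 'kitty'.
SELECT p.*
FROM posts AS p
JOIN post_tags AS pt ON pt.post_id = p.id
JOIN tags      AS t  ON t.id = pt.tag_id
WHERE t.name = 'kitty';
```

This costs two joins per lookup, but each tag exists exactly once and can carry its own attributes later.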
