0

I'm struggling with the syntax for a join when I've got an array in stored in JSONB. I've searched for examples and I can't find the magic sauce that makes this work in PostgreSQL 9.6

I've got the following structure stored in a JSONB column in a table called disruption_history. The element is called data:

"message": {
        "id": 352,
        "preRecordedMessageList": {
            "preRecordedMessageCodes": [804, 2110, 1864, 1599]
        }
}

I then have another standard table called message_library

component_code       | integer                | not null
message_text         | character varying(255) | not null

I'm trying to produce the text for each set of message codes. So something like

SELECT
    ml.message_text 
FROM
    message_library ml, disruption_history dh
WHERE 
    jsonb_array_elements_text(dh.data->'message'->'preRecordedMessageList'
->'preRecordedMessageCodes')) = ml.component_code

I get

ERROR: operator does not exist: text = integer

even if I try to cast the numbers to integer I get argument of WHERE must not return a set.

Can someone help please?

1
  • Not sure if this matters, but you have an extra closing parenthesis on the last line. Commented Aug 3, 2017 at 16:45

2 Answers 2

1
select message_library.message_text 
from disruption_history
join lateral jsonb_array_elements_text(data->'message'->'preRecordedMessageList'->'preRecordedMessageCodes') v
on true
join message_library
on v.value::int = message_library.component_code
Sign up to request clarification or add additional context in comments.

2 Comments

Hey @Oto, do you think there would be merit in highlighting why your solution works. I am sure it does, but there is always a teaching moment available.
Thanks, this works too but suffers the same ordering issue as I've described above.
0

You can use the following query:

SELECT
    CAST(dh.data->'message'->>'id' AS INTEGER) AS message_id, 
    ml.message_text 
FROM
    disruption_history dh
    JOIN message_library ml 
        ON ml.component_code IN 
            (SELECT 
                CAST(jsonb_array_elements_text(
                   dh.data->'message'->'preRecordedMessageList'->'preRecordedMessageCodes'
                                              ) 
                AS INTEGER)
            ) ;

Note that I have used an explicit join (avoid the implicit ones!).

The trick here is to convert your preRecordedMessageCodes into a set of texts, by using the jsonb_array_elements_text function, that are further CAST to integer, and then compared to the ml.component_code (by using an IN condition):

You can check the whole setup at dbfiddle here

Note also that this structure produces an awful execution plan, that requires whole sequential scans of both tables. I have not been able to find any kind of index that helps the queries.

Note that this won't work if you have arrays with NULLs in them, which I assume wouldn't make sense.


Keeping order:

If you want to keep the elements of the array in order, you need to use a WITH ORDINALITY predicate to obtain not only the array element, but also its relative position, and use it to ORDER BY

-- Keeping order
SELECT
    CAST(dh.data->'message'->>'id' AS INTEGER) AS message_id, 
    ml.message_text 
FROM
    disruption_history dh
    JOIN LATERAL
        jsonb_array_elements_text(dh.data->'message'->'preRecordedMessageList'->'preRecordedMessageCodes')
        WITH ORDINALITY AS x(mc, ord) /* We will want to use 'ord' to order by */
        ON true
    JOIN message_library ml ON ml.component_code = cast(mc AS INTEGER)
ORDER BY
    message_id, ord ;

Watch this at dbfiddle here


Alternative:

If the structure of your json data is always the same, I would strongly recommend that you normalize your design (at least partially):

CREATE TABLE disruption_history_no_json
(
    disruption_history_id SERIAL PRIMARY KEY,
    message_id INTEGER,
    pre_recorded_message_codes INTEGER[]
) ;

CREATE INDEX idx_disruption_history_no_json_pre_recorded_message_codes
    ON disruption_history_no_json USING GIN (pre_recorded_message_codes) ;

Would allow for a much simpler and efficient and simpler query:

SELECT
    message_id, 
    ml.message_text 
FROM
    disruption_history_no_json dh
    JOIN message_library ml 
        ON ml.component_code = ANY(pre_recorded_message_codes) ;

Check everything together at dbfiddle here

JSON(B) allows you not to normalize, and not to have to think much about your table structures, but you pay a steep price in performance and maintainability.

3 Comments

Thanks for the comprehensive answer. This works however the codes in the JSON array are processed in a seemingly random order. I need them to be converted into text in the order in that array as they form a readable sentence. In my case I'm getting 'Marble Arch . is closed station" rather than 'Marble Arch station is closed.' This is the same for the answer from @Oto as well which works as well. Not sure which is the most efficient way of doing this of course.
Relational theory doesn't have the concept of order, so most of what you do in SQL doesn't have it either, unless you explicitly ask for a given order. You didn't specify this as a requirement. Never assume an order in SQL. See the updated versions that keeps the order.
That's nailed it, thanks a lot. I understand the lack of order in SQL. I just forgot to add it as a requirement :-) As you say it would be better to normalise but this is part of a complex changing structure, kind of the thing that the JSONB structure is used for.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.