0

New to Bigquery, and googling could not really point me to the solution of the problem.

I am trying to use a where clause in a subquery to filter and pick the latest row for each other row in the main query. In postgres I'd normally do it like this:

SELECT
    *
FROM
    table_a AS a
    LEFT JOIN LATERAL
    (
        SELECT
            score,
            CONCAT( "AB", id ) AS id 
        FROM
            table_b AS b
        WHERE
            id = a.company_id
            and 
            b.date < a.date
        ORDER BY
            b.date DESC 
        LIMIT
            1
    ) ON true

WHERE
    id LIKE 'AB%'

ORDER BY
    createdAt DESC

so this would essentially run the subquery against each row and pick the latest row from table B based on a given row's date from table A.

So if table A would have a row

id date
12 2021-05-XX

and table B:

id date value
12 2022-01-XX 99
12 2021-02-XX 98
12 2020-03-XX 97
12 2019-04-XX 96

It would have joined only the row with 2021-02-XX to table a.


In another example, with

Table A:

id date
15 2021-01-XX

Table B:

id date value
15 2022-01-XX 99
15 2021-02-XX 98
15 2020-03-XX 97
15 2019-04-XX 96

it would join only the row with date: 2020-03-XX, value: 97.

Hope that is clear, not really sure how to write this query to work

Thanks for help!

6
  • Please explain your Id column in Table B, because those id values are not unique identifiers... Commented Sep 23, 2022 at 9:27
  • Some of your logic doesn't make sense for the values you are wanting to return. For example b.date > a.updatedAt then a limit to get the most recent, but you say in the first example the table_a value should be 2021-05-XX while the table_b value should be 2021-02-XX, similarly with your second example table_a = 2021-01-XX , table_b = 2022-03-XX. In both cases there are more recent values. Is there additional logic that you are applying? Commented Sep 23, 2022 at 9:50
  • @Dai ID is identifier of customer, where can have multiple value entries but only one for each date, so ID is not unique. Commented Sep 23, 2022 at 9:55
  • @DanielZagales I think mod editing my post (when formatting the tables) changed my dates of years.. let me edit it Commented Sep 23, 2022 at 9:57
  • @DanielZagales previously i had it by year Commented Sep 23, 2022 at 9:58

2 Answers 2

1

You can replace some of your correlated sub-select logic with a simple join and qualify statement.

Try the following:

SELECT *
FROM table_a a
LEFT JOIN table_b b
  ON a.id = b.id
WHERE b.date < a.date
QUALIFY ROW_NUMBER() OVER (PARTITION BY b.id ORDER BY b.date desc) = 1

With your sample data it produces: enter image description here

This should work for both truncated dates (YYYY-MM) as well as full dates (YYYY-MM-DD)

Sign up to request clarification or add additional context in comments.

Comments

0

Something like below should work for your requirements

    WITH
  latest_record AS (
  SELECT
    a.id,
    value,b.date, a.createdAt
  FROM
    `gcp-project-name.data-set-name.A` AS a
  JOIN
    `gcp-project-name.data-set-name.B` b
  ON
    ( a.id = b.id
      AND b.date < a.updatedAt )
  ORDER BY
    b.date DESC
  LIMIT
    1 )
SELECT
  *
FROM
  latest_record

I ran this with table A as

enter image description here

and table B as

enter image description here

and get result

enter image description here

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.