How to select rows with conditional values of one column in SQL

Question

Say I have this table:

id	timeline
1	BASELINE
1	MIDTIME
1	ENDTIME
2	BASELINE
2	MIDTIME
3	BASELINE
4	BASELINE
5	BASELINE
5	MIDTIME
5	ENDTIME
6	MIDTIME
6	ENDTIME
7	RISK
7	RISK

This is what the data looks like, except that the real table has more rows (a few thousand).

How do I get the output so that it will look like this:

id	timeline
1	BASELINE
1	MIDTIME
2	BASELINE
2	MIDTIME
5	BASELINE
5	MIDTIME

How do I select the first two rows of each ID that has two specific timeline values (BASELINE and MIDTIME)? Notice that id 6 has MIDTIME and ENDTIME, and id 7 has two RISK rows; I don't want these two ids.

I used

SELECT * 
FROM df 
WHERE id IN (SELECT id FROM df GROUP BY id HAVING COUNT(*)=2)

and got the IDs that have two timeline values (output below), but I don't know how to keep only the rows with BASELINE and MIDTIME.

id | timeline
---|---------
 1 | BASELINE
 1 | MIDTIME
 2 | BASELINE
 2 | MIDTIME
 5 | BASELINE
 5 | MIDTIME
 6 | MIDTIME     ---- don't want this
 6 | ENDTIME     ---- don't want this
 7 | RISK        ---- don't want this
 7 | RISK        ---- don't want this

Many Thanks.

Also, do the first two timeline values need to be BASELINE and MIDTIME? If so, how do you order the timeline values?
– e_i_pi, commented Jul 22, 2020 at 5:09

Fahmi · Accepted Answer · 2020-07-22 05:15:13Z

2

You can try using EXISTS:


    -- keep BASELINE/MIDTIME rows only for ids that have both values
    select * from t t1
    where t1.timeline in ('BASELINE','MIDTIME')
      and exists
        (select 1 from t t2
         where t1.id = t2.id and t2.timeline in ('BASELINE','MIDTIME')
         group by t2.id having count(distinct t2.timeline) = 2)

OUTPUT:

id  timeline
1   BASELINE
1   MIDTIME
2   BASELINE
2   MIDTIME
5   BASELINE
5   MIDTIME
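To check the EXISTS approach against the sample rows from the question, here is a minimal sketch that loads them into an in-memory SQLite database through Python's standard `sqlite3` module. The harness, the function name `run_exists_query`, and the `ORDER BY` clause are my additions (assumptions, not part of the answer); the table name `t` follows the answer's query. Other engines may differ in details.

```python
import sqlite3

# Sample rows copied from the question.
ROWS = [
    (1, "BASELINE"), (1, "MIDTIME"), (1, "ENDTIME"),
    (2, "BASELINE"), (2, "MIDTIME"),
    (3, "BASELINE"),
    (4, "BASELINE"),
    (5, "BASELINE"), (5, "MIDTIME"), (5, "ENDTIME"),
    (6, "MIDTIME"), (6, "ENDTIME"),
    (7, "RISK"), (7, "RISK"),
]

# The answer's query, with columns qualified and an ORDER BY added
# only so that the result order is deterministic.
EXISTS_QUERY = """
SELECT t1.id, t1.timeline
FROM t AS t1
WHERE t1.timeline IN ('BASELINE', 'MIDTIME')
  AND EXISTS (
        SELECT 1
        FROM t AS t2
        WHERE t2.id = t1.id
          AND t2.timeline IN ('BASELINE', 'MIDTIME')
        GROUP BY t2.id
        HAVING COUNT(DISTINCT t2.timeline) = 2
      )
ORDER BY t1.id, t1.timeline
"""


def run_exists_query(rows=ROWS):
    """Load rows into an in-memory table `t` and return the query result."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER, timeline TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
    result = conn.execute(EXISTS_QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in run_exists_query():
        print(row)
```

The `COUNT(DISTINCT ...) = 2` test is what rejects id 6 (only MIDTIME survives the filter) and id 7 (no matching rows at all), while ids 1 and 5 pass because their ENDTIME row is filtered out before counting.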

edited Jul 22, 2020 at 5:15

answered Jul 22, 2020 at 5:03


1 Comment

Fahmi Over a year ago

@Iwishworldpeace, glad that it helped u :)

Ivan Verges · Answer · 2020-07-22 05:37:33Z

0

I think this query should give you the result you want.

NOTE: As I understand it, you don't want any ID that has an "ENDTIME" row, yet in your sample data there is an "ENDTIME" for ID 1. I assumed this was an error, so I wrote a query that excludes every ID containing "ENDTIME" (or "RISK").

WITH CTE AS
(
    SELECT
        id
    FROM
        df
    WHERE
        timeline IN ('ENDTIME', 'RISK')
)
SELECT
    id,
    timeline
FROM
    df
WHERE
    id NOT IN (SELECT id FROM CTE);

edited Jul 22, 2020 at 5:37

answered Jul 22, 2020 at 5:09


4 Comments

Iwishworldpeace Over a year ago

ID 1 has 3 values and only those 2 values with BASELINE and MIDTIME are needed.

Ivan Verges Over a year ago

So you only want to exclude ID 6 for having that value? Is ID 6 the last id (MAX)? If so, it's easy to fix. If not, you should give more criteria for excluding IDs beyond having the "ENDTIME" value.

Iwishworldpeace Over a year ago

OK, I think I will add more to clarify.

Ivan Verges Over a year ago

Got it. Define all the criteria for including or excluding a record. For example: if an ID has BASELINE, MIDTIME and ENDTIME, then it should bring BASELINE and MIDTIME but not ENDTIME. If an ID has only MIDTIME and ENDTIME, it should not be included.

e_i_pi · Answer · 2020-07-22 21:25:21Z

0

There are probably a number of ways to do this. Here's one that picks up the BASELINE and MIDTIME rows for IDs where both exist, ensuring there are only 2 rows per returned ID. Without knowing the ordering of timeline, I don't think it's possible to go further:

SELECT
      id
    , timeline
FROM (
    SELECT
          *
        , SUM(CASE WHEN timeline = 'BASELINE' THEN 1 ELSE 0 END) OVER (PARTITION BY id) AS BaselineCount
        , SUM(CASE WHEN timeline = 'MIDTIME' THEN 1 ELSE 0 END) OVER (PARTITION BY id) AS MidtimeCount
    FROM df
    WHERE df.timeline IN ('BASELINE', 'MIDTIME')
) subquery
WHERE subquery.BaselineCount > 0
AND subquery.MidtimeCount > 0
GROUP BY
      id
    , timeline
;
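The query above relies on window functions (`OVER (PARTITION BY id)`). As a hedged alternative for engines or versions without them, the same per-ID counts can be moved into a `HAVING` clause. This is a sketch of that variation, not the answer's own method: the Python `sqlite3` harness and the function name `select_baseline_midtime` are assumptions made for illustration, and `SELECT DISTINCT` stands in for the answer's `GROUP BY id, timeline` to collapse duplicate rows.

```python
import sqlite3

# Conditional aggregation in HAVING instead of window functions.
HAVING_QUERY = """
SELECT DISTINCT d.id, d.timeline
FROM df AS d
WHERE d.timeline IN ('BASELINE', 'MIDTIME')
  AND d.id IN (
        SELECT id
        FROM df
        GROUP BY id
        HAVING SUM(CASE WHEN timeline = 'BASELINE' THEN 1 ELSE 0 END) > 0
           AND SUM(CASE WHEN timeline = 'MIDTIME'  THEN 1 ELSE 0 END) > 0
      )
ORDER BY d.id, d.timeline
"""


def select_baseline_midtime(rows):
    """Load (id, timeline) rows into an in-memory `df` table and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE df (id INTEGER, timeline TEXT)")
    conn.executemany("INSERT INTO df VALUES (?, ?)", rows)
    result = conn.execute(HAVING_QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    # A few rows in the shape of the question's data, plus a duplicated
    # BASELINE for id 8 (hypothetical) to show the de-duplication.
    sample = [
        (1, "BASELINE"), (1, "MIDTIME"), (1, "ENDTIME"),
        (6, "MIDTIME"), (6, "ENDTIME"),
        (7, "RISK"), (7, "RISK"),
        (8, "BASELINE"), (8, "BASELINE"), (8, "MIDTIME"),
    ]
    for row in select_baseline_midtime(sample):
        print(row)
```

An ID qualifies only if both sums are positive, so an ID with a repeated BASELINE but no MIDTIME is still rejected, which a plain `COUNT(*) = 2` check would not guarantee.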

edited Jul 22, 2020 at 21:25

answered Jul 22, 2020 at 5:15


2 Comments

e_i_pi Over a year ago

Updated the answer to fix a bug

user330315 Over a year ago

BaselineCount = sum(...) is invalid standard SQL (unless there is a column named BaselineCount in the table)
