
Say I have this table:

id timeline
1 BASELINE
1 MIDTIME
1 ENDTIME
2 BASELINE
2 MIDTIME
3 BASELINE
4 BASELINE
5 BASELINE
5 MIDTIME
5 ENDTIME
6 MIDTIME
6 ENDTIME
7 RISK
7 RISK

This is what the data looks like, except that the real data has more observations (a few thousand).

How do I get output that looks like this?

id timeline
1 BASELINE
1 MIDTIME
2 BASELINE
2 MIDTIME
5 BASELINE
5 MIDTIME

How do I select the first two rows of each ID that has two specific timeline values (BASELINE and MIDTIME)? Notice that id 6 has MIDTIME and ENDTIME, and id 7 has two RISK rows; I don't want these two ids.

I used

SELECT * 
FROM df 
WHERE id IN (SELECT id FROM df GROUP BY id HAVING COUNT(*)=2)

and got the IDs with two timeline values (output below), but I don't know how to get only the rows with BASELINE and MIDTIME.

id | timeline |
---|----------|
 1 | BASELINE |
 1 | MIDTIME  |
 2 | BASELINE |
 2 | MIDTIME  |
 5 | BASELINE |
 5 | MIDTIME  |
 6 | MIDTIME  |    ---- don't want this
 6 | ENDTIME  |    ---- don't want this
 7 | RISK     |    ---- don't want this
 7 | RISK     |    ---- don't want this

Many Thanks.

  • Which SQL engine are you using? Commented Jul 22, 2020 at 5:07
  • I am actually using the sql package in R. Commented Jul 22, 2020 at 5:08
  • Also, do the first two timeline values need to be BASELINE and MIDTIME? If so, how do you order the timeline values? Commented Jul 22, 2020 at 5:09
  • Yes, the two values need to be BASELINE and MIDTIME. Commented Jul 22, 2020 at 5:10

3 Answers


You can try using EXISTS:

    -- t is the question's df; keep BASELINE/MIDTIME rows only for ids that have both
    select * from t t1
    where t1.timeline in ('BASELINE', 'MIDTIME')
      and exists
        (select 1 from t t2
         where t2.id = t1.id and t2.timeline in ('BASELINE', 'MIDTIME')
         group by t2.id
         having count(distinct t2.timeline) = 2)

OUTPUT:

id  timeline
1   BASELINE
1   MIDTIME
2   BASELINE
2   MIDTIME
5   BASELINE
5   MIDTIME
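
To check the result locally, here is a minimal sketch that loads the question's sample rows into an in-memory SQLite database through Python's built-in sqlite3 module. Using SQLite is an assumption (the question only mentions an sql package in R), and the helper name select_baseline_midtime is made up for illustration. The query is a variant of the one above that keeps the asker's original IN (...) shape but counts distinct BASELINE/MIDTIME values instead of all rows.

```python
import sqlite3

# Sample rows copied from the question's table.
ROWS = [
    (1, "BASELINE"), (1, "MIDTIME"), (1, "ENDTIME"),
    (2, "BASELINE"), (2, "MIDTIME"),
    (3, "BASELINE"),
    (4, "BASELINE"),
    (5, "BASELINE"), (5, "MIDTIME"), (5, "ENDTIME"),
    (6, "MIDTIME"), (6, "ENDTIME"),
    (7, "RISK"), (7, "RISK"),
]

# Keep BASELINE/MIDTIME rows, but only for ids that have both values.
QUERY = """
SELECT id, timeline
FROM df
WHERE timeline IN ('BASELINE', 'MIDTIME')
  AND id IN (
      SELECT id
      FROM df
      WHERE timeline IN ('BASELINE', 'MIDTIME')
      GROUP BY id
      HAVING COUNT(DISTINCT timeline) = 2
  )
ORDER BY id, timeline
"""


def select_baseline_midtime(rows):
    """Load rows into a throwaway table named df and run QUERY (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE df (id INTEGER, timeline TEXT)")
        conn.executemany("INSERT INTO df VALUES (?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in select_baseline_midtime(ROWS):
        print(row)
```

On the sample data this returns the BASELINE and MIDTIME rows for ids 1, 2 and 5 only. The inner filter on timeline is what lets ids 1 and 5 through even though they also have an ENDTIME row.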

1 Comment

@Iwishworldpeace, glad that it helped u :)

I think this query should give you the result you want.

NOTE: As I understand it, you don't want any ID that has an "ENDTIME" row, yet in your sample data there is an "ENDTIME" for ID 1. I assumed this was an error, so I wrote a query that excludes every id that has an "ENDTIME" or "RISK" row.

WITH CTE AS
(
    SELECT
        id
    FROM
        df
    WHERE
        timeline IN ('ENDTIME', 'RISK')
)
SELECT
    id,
    timeline
FROM
    df
WHERE
    id NOT IN (SELECT id FROM CTE);
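
To see how this query behaves on the question's sample rows, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is an assumption about the engine, and the helper name run_exclusion_query is made up for illustration. An ORDER BY is added only to make the output deterministic.

```python
import sqlite3

# Sample rows copied from the question's table.
ROWS = [
    (1, "BASELINE"), (1, "MIDTIME"), (1, "ENDTIME"),
    (2, "BASELINE"), (2, "MIDTIME"),
    (3, "BASELINE"),
    (4, "BASELINE"),
    (5, "BASELINE"), (5, "MIDTIME"), (5, "ENDTIME"),
    (6, "MIDTIME"), (6, "ENDTIME"),
    (7, "RISK"), (7, "RISK"),
]

# Same logic as the CTE query: drop every id that has an ENDTIME or RISK row.
QUERY = """
WITH CTE AS (
    SELECT id FROM df WHERE timeline IN ('ENDTIME', 'RISK')
)
SELECT id, timeline
FROM df
WHERE id NOT IN (SELECT id FROM CTE)
ORDER BY id, timeline
"""


def run_exclusion_query(rows):
    """Load rows into a throwaway table named df and run QUERY (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE df (id INTEGER, timeline TEXT)")
        conn.executemany("INSERT INTO df VALUES (?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in run_exclusion_query(ROWS):
        print(row)
```

On the sample data this returns rows for ids 2, 3 and 4. Ids 1 and 5 are dropped because of their ENDTIME row, and ids 3 and 4 are kept even though they only have BASELINE, so the result matches the requested output only if those cases cannot occur in the real data.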

4 Comments

ID 1 has 3 values, and only the two rows with BASELINE and MIDTIME are needed.
So you only want to exclude ID 6 for having that value? Is ID 6 the last id (MAX)? If so, it's easy to fix. If not, you should give more criteria for excluding IDs beyond having the "ENDTIME" value.
OK, I think I will add more to clarify.
Got it. Define all the criteria for including or excluding a record. For example: if an ID has BASELINE, MIDTIME and ENDTIME, the query should return BASELINE and MIDTIME but not ENDTIME. If an ID has only MIDTIME and ENDTIME, it should not be included.

There are probably a number of ways to do this. Here's one that picks up the BASELINE and MIDTIME rows for ids that have both values, ensuring there are only 2 rows per returned ID. Without knowing how timeline is ordered, I don't think it's possible to go further:

SELECT
      id
    , timeline
FROM (
    SELECT
          *
        , SUM(CASE WHEN timeline = 'BASELINE' THEN 1 ELSE 0 END) OVER (PARTITION BY id) AS BaselineCount
        , SUM(CASE WHEN timeline = 'MIDTIME' THEN 1 ELSE 0 END) OVER (PARTITION BY id) AS MidtimeCount
    FROM df
    WHERE df.timeline IN ('BASELINE', 'MIDTIME')
) subquery
WHERE subquery.BaselineCount > 0
AND subquery.MidtimeCount > 0
GROUP BY
      id
    , timeline
;
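
As a quick check of the "only 2 rows per returned ID" claim, here is a minimal sketch that runs the same query through Python's built-in sqlite3 module. SQLite is an assumption about the engine (window functions need SQLite 3.25 or newer), the helper name run_window_query is made up, and the duplicate (2, 'BASELINE') row is a hypothetical addition that is not in the question's data.

```python
import sqlite3

# Sample rows from the question, plus one hypothetical duplicate for id 2.
ROWS = [
    (1, "BASELINE"), (1, "MIDTIME"), (1, "ENDTIME"),
    (2, "BASELINE"), (2, "MIDTIME"),
    (2, "BASELINE"),  # hypothetical duplicate, not in the original data
    (3, "BASELINE"),
    (4, "BASELINE"),
    (5, "BASELINE"), (5, "MIDTIME"), (5, "ENDTIME"),
    (6, "MIDTIME"), (6, "ENDTIME"),
    (7, "RISK"), (7, "RISK"),
]

# Per-id window counts of each value; the outer GROUP BY collapses duplicates.
# Requires window function support (SQLite 3.25+).
QUERY = """
SELECT id, timeline
FROM (
    SELECT
          *
        , SUM(CASE WHEN timeline = 'BASELINE' THEN 1 ELSE 0 END)
              OVER (PARTITION BY id) AS BaselineCount
        , SUM(CASE WHEN timeline = 'MIDTIME' THEN 1 ELSE 0 END)
              OVER (PARTITION BY id) AS MidtimeCount
    FROM df
    WHERE df.timeline IN ('BASELINE', 'MIDTIME')
) subquery
WHERE subquery.BaselineCount > 0
  AND subquery.MidtimeCount > 0
GROUP BY id, timeline
ORDER BY id, timeline
"""


def run_window_query(rows):
    """Load rows into a throwaway table named df and run QUERY (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE df (id INTEGER, timeline TEXT)")
        conn.executemany("INSERT INTO df VALUES (?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in run_window_query(ROWS):
        print(row)
```

Even with the duplicated BASELINE row, id 2 comes back with exactly one BASELINE and one MIDTIME row, because the final GROUP BY on id and timeline removes repeats.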

2 Comments

Updated the answer to fix a bug
BaselineCount = sum(...) is invalid standard SQL (unless there is a column named BaselineCount in the table)
