Say I have this table:
| id | timeline |
|---|---|
| 1 | BASELINE |
| 1 | MIDTIME |
| 1 | ENDTIME |
| 2 | BASELINE |
| 2 | MIDTIME |
| 3 | BASELINE |
| 4 | BASELINE |
| 5 | BASELINE |
| 5 | MIDTIME |
| 5 | ENDTIME |
| 6 | MIDTIME |
| 6 | ENDTIME |
| 7 | RISK |
| 7 | RISK |
This is what the data looks like, except that the real data has many more observations (a few thousand).
How do I get output that looks like this?
| id | timeline |
|---|---|
| 1 | BASELINE |
| 1 | MIDTIME |
| 2 | BASELINE |
| 2 | MIDTIME |
| 5 | BASELINE |
| 5 | MIDTIME |
How do I select the two rows of each id that has both of two specific timeline values (BASELINE and MIDTIME)? Note that id 6 has MIDTIME and ENDTIME, and id 7 has two RISK rows; I don't want these two ids.
I used:

    SELECT *
    FROM df
    WHERE id IN (SELECT id FROM df GROUP BY id HAVING COUNT(*)=2)

and got ids with two timeline values (output below), but I don't know how to keep only the rows with BASELINE and MIDTIME.
| id | timeline | note |
|---|---|---|
| 1 | BASELINE | |
| 1 | MIDTIME | |
| 2 | BASELINE | |
| 2 | MIDTIME | |
| 5 | BASELINE | |
| 5 | MIDTIME | |
| 6 | MIDTIME | don't want this |
| 6 | ENDTIME | don't want this |
| 7 | RISK | don't want this |
| 7 | RISK | don't want this |
Many thanks.
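For reference, here is a minimal sketch of one way the requirement above could be expressed. It is not a definitive answer. It assumes the table is named `df` as in the query above, that the SQL engine is SQLite (the Python `sqlite3` harness and the helper name `select_baseline_midtime` are only there to make the example self-contained and are hypothetical), and that "has both values" means the id has at least one BASELINE row and at least one MIDTIME row. The idea is to filter to the two wanted timeline values first, then keep only ids where both distinct values are present.

```python
import sqlite3

# Sample data copied from the table in the question.
rows = [
    (1, "BASELINE"), (1, "MIDTIME"), (1, "ENDTIME"),
    (2, "BASELINE"), (2, "MIDTIME"),
    (3, "BASELINE"),
    (4, "BASELINE"),
    (5, "BASELINE"), (5, "MIDTIME"), (5, "ENDTIME"),
    (6, "MIDTIME"), (6, "ENDTIME"),
    (7, "RISK"), (7, "RISK"),
]

# Keep only BASELINE/MIDTIME rows, and only for ids that have BOTH values.
# COUNT(DISTINCT timeline) = 2 rejects ids with just one of the two values
# (ids 3, 4, 6) and ids with neither (id 7).
QUERY = """
SELECT id, timeline
FROM df
WHERE timeline IN ('BASELINE', 'MIDTIME')
  AND id IN (
      SELECT id
      FROM df
      WHERE timeline IN ('BASELINE', 'MIDTIME')
      GROUP BY id
      HAVING COUNT(DISTINCT timeline) = 2
  )
ORDER BY id, timeline
"""


def select_baseline_midtime(data):
    """Load data into an in-memory SQLite table named df and run QUERY."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE df (id INTEGER, timeline TEXT)")
        conn.executemany("INSERT INTO df VALUES (?, ?)", data)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


result = select_baseline_midtime(rows)
for row in result:
    print(row)
```

Filtering on `timeline` in both the outer query and the subquery is what separates this from `HAVING COUNT(*)=2`: the count only considers the two wanted values, so extra rows such as ENDTIME neither disqualify an id nor appear in the output.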