
I have one dataframe of sessions: one row per session, so SID is unique. Each session has a doctor name.

SID  Doctor   Patient
1    robby    david
2    langdon  sara
3    langdon  michael

I have another dataframe with the SID and a record of who opened the patient file. The opener can be the doctor or anyone else from the clinic. If two different people from the clinic open the patient file in the same session, there are two rows with the same SID that differ only in opener_name.

SID  opener_name
1    robby
1    dana
2    dana

I want to add two true/false columns to the sessions dataframe:

  • If the doctor opened the file

  • If anyone opened the file at all (either the doctor or anyone else)

Not every session is opened by anyone; a session that was never opened does not appear in the second dataframe at all.

The output I desire is this:

SID  Doctor   Patient  is_doctor_opened  is_anyone_opened
1    robby    david    True              True
2    langdon  sara     False             True
3    langdon  michael  False             False

If I merge the two dataframes on session ID, I get duplicate rows, and I'm not sure how to get rid of the duplicates in that scenario.
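A minimal sketch of the duplicate-row problem, assuming the example tables above are loaded into DataFrames named `sessions` and `openers` (names chosen here for illustration). A plain left merge on `SID` returns one row per opener, so session 1 shows up twice and session 3 gets NaN:

```python
import pandas as pd

# Example data from the question (variable names are illustrative)
sessions = pd.DataFrame({
    'SID': [1, 2, 3],
    'Doctor': ['robby', 'langdon', 'langdon'],
    'Patient': ['david', 'sara', 'michael'],
})
openers = pd.DataFrame({
    'SID': [1, 1, 2],
    'opener_name': ['robby', 'dana', 'dana'],
})

# A left merge keeps every session, but repeats a session once per opener
merged = sessions.merge(openers, on='SID', how='left')
print(merged)
print(len(merged))  # 4 rows for 3 sessions
```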

I've also tried playing around with simple booleans but I run into problems.

How do I get an organized dataframe with the booleans and keep it to one session, one row?

Semyaz is a new contributor to this site.
  • Would be better if you added sample source tables and a desired resulting table. Commented Nov 16 at 12:22
  • I'm voting to reopen, but more details would help: 1) Show your code please, even if it's just the merge. For one thing, showing the variable names helps keep answers consistent. 2) What do you mean by "playing around with simple booleans"? Please edit to clarify and show code if possible. 3) What research have you done? E.g. are you aware of .drop_duplicates()? Commented Nov 18 at 13:54

2 Answers

Answer 1

This is a classic join + groupby task in pandas. Below are two approaches; both produce the two flags you showed, though the flag columns may come out in a different order than in your table. Pick whichever reads better to you.

I'll use your example data and show code + result.

import pandas as pd

sessions = pd.DataFrame({
    'SID': [1, 2, 3],
    'Doctor': ['robby', 'langdon', 'langdon'],
    'Patient': ['david', 'sara', 'michael']})

openers = pd.DataFrame({
    'SID': [1, 1, 2],
    'opener_name': ['robby', 'dana', 'dana']})

Method A: simple and fast (using sets)

# Work on a copy so `sessions` stays untouched for Method B below
# (otherwise Method B's merges would create _x/_y suffixed columns)
result_a = sessions.copy()

# is_doctor_opened: attach each session's doctor to its opener rows...
merged = openers.merge(sessions[['SID', 'Doctor']], on='SID', how='left')
# ...and keep the SIDs where the opener is that session's doctor
doctor_open_sids = merged.loc[
    merged['opener_name'] == merged['Doctor'], 'SID'
].unique()
result_a['is_doctor_opened'] = result_a['SID'].isin(doctor_open_sids)

# is_anyone_opened: is there any opener row at all for the SID?
opened_sids = set(openers['SID'])
result_a['is_anyone_opened'] = result_a['SID'].isin(opened_sids)

# isin() already returns booleans, so no astype(bool) is needed
print(result_a)
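The "(SID, opener) pairs" idea can also be used literally, without any merge: build a set of `(SID, opener_name)` tuples and test each session's `(SID, Doctor)` pair against it. A minimal sketch using the example data; `pairs` and `result_pairs` are names chosen here for illustration:

```python
import pandas as pd

sessions = pd.DataFrame({
    'SID': [1, 2, 3],
    'Doctor': ['robby', 'langdon', 'langdon'],
    'Patient': ['david', 'sara', 'michael'],
})
openers = pd.DataFrame({
    'SID': [1, 1, 2],
    'opener_name': ['robby', 'dana', 'dana'],
})

# Every (session, opener) combination that actually happened
pairs = set(zip(openers['SID'], openers['opener_name']))
# Every session that was opened by anyone
opened_sids = set(openers['SID'])

result_pairs = sessions.assign(
    # True when this session's own doctor is among its openers
    is_doctor_opened=[(sid, doc) in pairs
                      for sid, doc in zip(sessions['SID'], sessions['Doctor'])],
    # True when the session has any opener at all
    is_anyone_opened=sessions['SID'].isin(opened_sids),
)
print(result_pairs)
```

The list comprehension runs in plain Python, so on very large frames the vectorized merge-based methods are likely faster; at this size the difference should not matter.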

Method B: explicit merge + groupby (robust to duplicate opener rows)

This uses groupby(...).any() so duplicates don’t make extra rows.

# 1) is_anyone_opened: any record for SID?
anyone = openers[['SID']].drop_duplicates().assign(is_anyone_opened=True)

# 2) is_doctor_opened: merge opener rows with sessions to compare 
# names, then groupby SID
merged = openers.merge(sessions[['SID', 'Doctor']], on='SID', how='left')
merged['is_doctor_opened'] = merged['opener_name'] == merged['Doctor']
doctor_flag = merged.groupby('SID', as_index=False)['is_doctor_opened'].any()

# 3) left-join these flags back to sessions; missing -> False
result = (
    sessions
    .merge(anyone, on='SID', how='left')
    .merge(doctor_flag, on='SID', how='left')
    .fillna({'is_anyone_opened': False, 'is_doctor_opened': False})
)

# convert to bool
result['is_anyone_opened'] = result['is_anyone_opened'].astype(bool)
result['is_doctor_opened'] = result['is_doctor_opened'].astype(bool)

print(result)
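In Method B the left merges leave NaN for sessions nobody opened, so the flag columns are object dtype until the final astype(bool), and recent pandas versions may warn about downcasting in fillna. One alternative sketch keeps everything boolean: compute the per-SID flag with groupby(...).any(), then line it up with the sessions using Series.reindex(..., fill_value=False). `doctor_any` and `result_reindex` are illustrative names; the data is the example from the question:

```python
import pandas as pd

sessions = pd.DataFrame({
    'SID': [1, 2, 3],
    'Doctor': ['robby', 'langdon', 'langdon'],
    'Patient': ['david', 'sara', 'michael'],
})
openers = pd.DataFrame({
    'SID': [1, 1, 2],
    'opener_name': ['robby', 'dana', 'dana'],
})

# One row per opener, with the session's doctor attached
merged = openers.merge(sessions[['SID', 'Doctor']], on='SID', how='left')

# Per opened SID: did any opener match the doctor?
doctor_any = (merged['opener_name'] == merged['Doctor']).groupby(merged['SID']).any()

result_reindex = sessions.assign(
    # SIDs missing from doctor_any were never opened, so they get False
    is_doctor_opened=doctor_any.reindex(sessions['SID'], fill_value=False).to_numpy(),
    is_anyone_opened=sessions['SID'].isin(openers['SID']).to_numpy(),
)
print(result_reindex)
print(result_reindex.dtypes)
```

The `.to_numpy()` calls drop the SID-based index so the values are assigned by position, which matches the row order of `sessions`.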
Answer 2

If I merge the two files on session ID, I will get duplicate rows, and I'm not sure how to [get] rid of the duplicates in that scenario.

For is_doctor_opened, merge on Doctor as well. For is_anyone_opened, de-dupe before merging.
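A minimal sketch of those two hints on their own, using the example data from the question. Instead of checking for NaN afterwards, it swaps in `merge(..., indicator=True)`, which adds a `_merge` column marking each row as `'both'` or `'left_only'`; the variable names are illustrative:

```python
import pandas as pd

sessions = pd.DataFrame({
    'SID': [1, 2, 3],
    'Doctor': ['robby', 'langdon', 'langdon'],
    'Patient': ['david', 'sara', 'michael'],
})
openers = pd.DataFrame({
    'SID': [1, 1, 2],
    'opener_name': ['robby', 'dana', 'dana'],
})

# is_doctor_opened: merge on SID *and* the doctor's name.
# drop_duplicates() guards against the same person opening a file twice.
doctor_hits = openers.drop_duplicates().rename(columns={'opener_name': 'Doctor'})
with_doctor = sessions.merge(doctor_hits, on=['SID', 'Doctor'],
                             how='left', indicator=True)

# is_anyone_opened: de-dupe the SIDs first so each session matches at most once
opened = openers[['SID']].drop_duplicates()
with_anyone = sessions.merge(opened, on='SID', how='left', indicator=True)

result_ind = sessions.assign(
    is_doctor_opened=with_doctor['_merge'].eq('both').to_numpy(),
    is_anyone_opened=with_anyone['_merge'].eq('both').to_numpy(),
)
print(result_ind)
```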

Here I merge the actual values, then call .notna() afterwards to get the desired booleans. This technique gives the most readable intermediate values, IMHO.

Style note: I like to use chaining, but this chain is pretty lengthy, so you might prefer to rewrite it in a more imperative style.

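# `sessions` and `openers` hold the example data from the question
result = (
    sessions
    # Setup for `is_doctor_opened`: match on SID *and* doctor name.
    # drop_duplicates() keeps one row per (SID, opener), so a doctor who
    # opened the same file twice doesn't duplicate the session row.
    .merge(
        openers.drop_duplicates()
        .rename(columns={'opener_name': 'doctor_opener_name'}),
        left_on=['SID', 'Doctor'],
        right_on=['SID', 'doctor_opener_name'],
        how='left',
    )
    # Setup for `is_anyone_opened`: one row per SID, openers collected in a list
    .merge(
        openers.groupby('SID').agg(list),
        on='SID',
        how='left',
    )
    # Switch to boolean
    .assign(
        is_doctor_opened=lambda d: d['doctor_opener_name'].notna(),
        is_anyone_opened=lambda d: d['opener_name'].notna(),
    )
    .drop(columns=['doctor_opener_name', 'opener_name'])
)
print(result)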
   SID   Doctor  Patient  is_doctor_opened  is_anyone_opened
0    1    robby    david              True              True
1    2  langdon     sara             False              True
2    3  langdon  michael             False             False

Intermediates:

   SID   Doctor  Patient doctor_opener_name    opener_name
0    1    robby    david              robby  [robby, dana]
1    2  langdon     sara                NaN         [dana]
2    3  langdon  michael                NaN            NaN

1 Comment

P.S. I tried a few variations before settling on this one, including setting the "opened" columns before merging. This version is the cleanest and has the best intermediates IMHO.
