
I have an SQL query that I want to implement on a pandas dataframe. The SQL filters out a t_id if act_dt is null in more than 90 percent of its rows of type HIST. I have a dataframe with all the columns, but I need to implement this logic in pandas.

Below is the SQL query:

SELECT *
FROM
(
    SELECT
        t_id,
        s_id,
        t_name,
        type,
        act_dt
    FROM
        tblstg
) t
WHERE t.t_id NOT IN
(
    SELECT t_id FROM
    (
        SELECT t_id,
               CASE WHEN (CAST(SUM(CASE WHEN act_dt IS NULL THEN 1 ELSE 0 END) AS FLOAT) / COUNT(*) * 100) > 90
                    THEN 1 ELSE 0 END AS removal_flg
        FROM tblstg
        WHERE type = 'HIST'
        GROUP BY t_id
    ) st
    WHERE st.removal_flg = 1
)
Dataframe:

I have a pandas dataframe like this:

t_id   s_id      t_name    type   act_dt
T1     china     android   HIST   jan
T1     mumbai    android   HIST   feb
T1     dubai     apple     EXT
T2     japan     nokia     HIST
T3     japan     apple     HIST   jan
T3     koria               HIST
T3     japan1    apple     HIST
T3     japan2    apple     HIST
T3     japan3    apple     HIST
T3     japan4    apple     HIST
T3     japan5    apple     HIST
T3     japan6    apple     HIST
T3     japan7    apple     HIST
T3     japan8    apple     HIST
T3     dubai     nokia     EXT

Final result:

t_id   s_id      t_name    type   act_dt
T1     china     android   HIST   jan
T1     mumbai    android   HIST   feb
T1     dubai     apple     EXT    

i.e. drop T2 and T3, because for those t_id values act_dt is null in 90% or more of the type-HIST records, regardless of s_id.

I have written the code below to identify the t_id values where 90% of the act_dt data is null, so I can drop them from the main dataframe raw_data, but it raises an error from the third line onward. How can I get the list of t_id values that meet the criteria?

raw_data['filter_dt'] = raw_data['act_dt'].isnull().astype(int)

filterrecords = raw_data[raw_data.type == 'HIST'].groupby("t_id").filter_dt.sum()

countoftids = raw_data[raw_data.type == 'HIST'].groupby("t_id").count()

finalflg = filterrecords / countoftids

finaltids = raw_data['t_id'][finalflg > 0.90]
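For reference, the division in the snippet above fails because `.count()` on a grouped DataFrame returns one non-null count per column (including the datetime column, hence the `dtype('<M8[ns]')` error) rather than a plain row count. A sketch of a working variant, using hypothetical sample data:

```python
import pandas as pd

# Hypothetical sample data mirroring the dataframe in the question
raw_data = pd.DataFrame({
    't_id':   ['T1', 'T1', 'T1', 'T2', 'T3', 'T3'],
    's_id':   ['china', 'mumbai', 'dubai', 'japan', 'japan', 'koria'],
    't_name': ['android', 'android', 'apple', 'nokia', 'apple', None],
    'type':   ['HIST', 'HIST', 'EXT', 'HIST', 'HIST', 'HIST'],
    'act_dt': ['jan', 'feb', None, None, 'jan', None],
})

hist = raw_data[raw_data['type'] == 'HIST']

# Null count per t_id divided by total HIST row count per t_id.
# .size() is used instead of .count(): .count() excludes nulls and,
# on a grouped DataFrame, returns one count per column.
null_frac = (hist['act_dt'].isnull().groupby(hist['t_id']).sum()
             / hist.groupby('t_id').size())

# t_ids where more than 90% of HIST act_dt values are null
drop_ids = null_frac[null_frac > 0.90].index.tolist()

result = raw_data[~raw_data['t_id'].isin(drop_ids)]
```

In this small sample only T2 (100% null for HIST) crosses the threshold, so only its rows are dropped.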
       
  • Why don't you make an attempt first? Commented Jun 21, 2017 at 12:32
  • I added the code I tried, but I am struggling to get it working, so I asked the question here. Commented Jun 21, 2017 at 12:42
  • Do you have sample input data and expected output? Commented Jun 21, 2017 at 13:08
  • Added, if that helps. Commented Jun 21, 2017 at 13:30

1 Answer


Let's try this:

df.groupby('t_id').filter(lambda x: (x[x['type'] == "HIST"].act_dt.count() / x[x['type'] == "HIST"].act_dt.values.shape[0]) > .9)

Output:

  t_id    s_id   t_name  type act_dt
0   T1   china  android  HIST    jan
1   T1  mumbai  android  HIST    feb
2   T1   dubai    apple   EXT    NaN

To see the list of unique t_id values retained:

print(df.groupby('t_id').filter(lambda x: (x[x['type'] == "HIST"].act_dt.count() / x[x['type'] == "HIST"].act_dt.values.shape[0]) > .9)['t_id'].unique())

Output:

['T1']

Comments

Thank you Scott! I tried this but got the error: "ufunc true_divide cannot use operands with types dtype('<M8[ns]') and dtype('int64')"
Try that new statement: df.groupby('t_id').filter(lambda x: (x[x['type'] == "HIST"].act_dt.count() / x[x['type'] == "HIST"].act_dt.values.shape[0]) > .9)
Thanks Scott! I got the output. But is there any way to get the unique t_id values that were skipped or dropped, so I can test it?
If you add ['t_id'].unique() to the end of that statement, it will give you a list of unique ids.
Hi Scott, it is working fine, but it is somehow including some other records as well. For example, there is one t_id that has 3 instances of type HIST, of which 2 are null (there are EXT instances as well, but we are only interested in HIST). Any idea why?
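Editorial note on the edge case raised in the last comment: the one-liner keeps a group only when more than 90% of its HIST act_dt values are non-null, which is stricter than the SQL's "drop when more than 90% are null" (a group that is, say, two-thirds null is kept by the SQL but dropped here), and a t_id with no HIST rows at all divides by zero. A sketch that mirrors the SQL condition directly, on hypothetical data:

```python
import pandas as pd

# Hypothetical sample: T4 is half-null for HIST, T5 has no HIST rows,
# T6 is fully null for HIST
df = pd.DataFrame({
    't_id':   ['T4', 'T4', 'T5', 'T6'],
    'type':   ['HIST', 'HIST', 'EXT', 'HIST'],
    'act_dt': ['jan', None, None, None],
})

def keep(g):
    hist = g.loc[g['type'] == 'HIST', 'act_dt']
    # t_ids with no HIST rows never appear in the SQL's NOT IN subquery,
    # so they are kept
    if hist.empty:
        return True
    # drop only when strictly more than 90% of HIST act_dt values are null
    return hist.isnull().mean() <= 0.90

out = df.groupby('t_id').filter(keep)
```

Here T4 (50% null) and T5 (no HIST rows) survive, while T6 (100% null) is dropped, matching the SQL's removal_flg logic.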
