
I am trying to write a function for a dataframe, but it is too complex for me. I have a dataframe that looks like this:

df1

     hour    production ....      
0     1          10
0     2          20
0     1          30
0     3          40
0     1          40
0     4          30
0     1          20
0     4          10

I am trying to create a function that would do the following:

  1. Group the data by hour
  2. Calculate the 90% confidence interval of production for each hour
  3. If the production value of a row falls outside the 90% confidence interval for its hour, mark it as unusual in a new column
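For step 2, the usual choice (and what the code in this question computes with `sem` and `t.ppf`) is a two-sided Student-t interval for an hour's mean production. With $n$ rows in the hour, sample mean $m$, sample standard deviation $s$ (ddof = 1) and confidence level $c = 0.90$:

```latex
\[
  m \pm h, \qquad
  h = t_{(1+c)/2,\; n-1} \cdot \frac{s}{\sqrt{n}}
\]
% t_{q, n-1}: the q-quantile of Student's t with n-1 degrees of freedom (scipy: t.ppf(q, n - 1))
% s / sqrt(n): the standard error of the mean (scipy: sem(data))
```

Note that this interval bounds the hour's mean, so the rule in step 3 compares individual rows against an interval for the mean.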

Below are the steps I currently take to do this for one hour at a time:

Calculate the confidence interval:

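from numpy import mean
from scipy.stats import sem, t

confidence = 0.90
data = df1['production']  # production values of the hour being processed
n = len(data)
m = mean(data)
std_err = sem(data)  # standard error of the mean (ddof=1)
h = std_err * t.ppf((1 + confidence) / 2, n - 1)  # half-width of the t interval
lower_interval = m - h
upper_interval = m + h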
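As a worked check of the calculation above, here it is applied to the hour-1 rows of `df1` (production 10, 30, 40, 20). This is a sketch that assumes NumPy and SciPy are installed; it also compares the manual formula with `scipy.stats.t.interval`, which returns the same bounds in one call (the confidence level is passed positionally because older SciPy versions named that argument `alpha`).

```python
import numpy as np
from scipy.stats import sem, t

# Hour-1 rows of df1 from the question
hour1 = np.array([10, 30, 40, 20])

confidence = 0.90
n = len(hour1)
m = hour1.mean()
std_err = sem(hour1)  # sample std (ddof=1) / sqrt(n)
h = std_err * t.ppf((1 + confidence) / 2, n - 1)
lower_interval, upper_interval = m - h, m + h

# SciPy can produce the same bounds directly
lo2, hi2 = t.interval(confidence, n - 1, loc=m, scale=std_err)

print(round(lower_interval, 2), round(upper_interval, 2))  # prints: 9.81 40.19
```

All four hour-1 values lie inside roughly (9.81, 40.19), so none of them would be marked unusual.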

Then:

def confidence_interval(x):
    # 1 if this row's production lies outside the interval computed above, else 0
    if x['production'] > upper_interval:
        return 1
    if x['production'] < lower_interval:
        return 1
    return 0

df1['unusual'] = df1.apply(confidence_interval, axis=1)
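The row-wise `apply` calls a Python function once per row. The same 0/1 flag can be computed in one vectorized comparison. This is a sketch: the bounds below are hypothetical numbers standing in for `lower_interval` and `upper_interval`, and the frame is rebuilt with a default index.

```python
import pandas as pd

# Example frame from the question (default RangeIndex)
df1 = pd.DataFrame({'hour': [1, 2, 1, 3, 1, 4, 1, 4],
                    'production': [10, 20, 30, 40, 40, 30, 20, 10]})

# Hypothetical bounds, standing in for lower_interval / upper_interval
lower_interval, upper_interval = 15.0, 35.0

# Same rule as confidence_interval(): 1 if outside the bounds, else 0
outside = (df1['production'] < lower_interval) | (df1['production'] > upper_interval)
df1['unusual'] = outside.astype(int)
```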

I am doing this separately for each value of hour, then merging all the results back together into the original dataframe.

Can anyone help me create a function that does all of the above at once? I had a go but just can't get my head around it.
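The per-hour-then-merge workflow described above can be collapsed into one pass: compute each hour's bounds once in a small table indexed by hour, then attach them to every row. This sketch assumes pandas and SciPy; it uses `Series.map` on the hour column as the join step instead of `DataFrame.merge`, which keeps the original row index intact.

```python
import pandas as pd
from scipy.stats import t

df1 = pd.DataFrame({'hour': [1, 2, 1, 3, 1, 4, 1, 4],
                    'production': [10, 20, 30, 40, 40, 30, 20, 10]})

confidence = 0.90

# One row per hour: mean, standard error (ddof=1) and row count
stats = df1.groupby('hour')['production'].agg(m='mean', std_err='sem', n='count')

# Half-width of the t interval for each hour (NaN for one-row hours)
h = stats['std_err'] * t.ppf((1 + confidence) / 2, stats['n'] - 1)
stats['lower'] = stats['m'] - h
stats['upper'] = stats['m'] + h

# Look up each row's hour bounds, then flag values strictly outside them
lower = df1['hour'].map(stats['lower'])
upper = df1['hour'].map(stats['upper'])
df1['unusual'] = (~df1['production'].between(lower, upper, inclusive='neither')).astype(int)
```

The resulting flags match the output shown in the answer below, including the hours that have only one row.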

1 Answer


Create a custom function and apply it per group with GroupBy.transform; inside it, use Series.between to test whether each value lies in the interval and invert the mask with ~:

from numpy import mean  # scipy.mean was only a NumPy alias and is gone from recent SciPy
from scipy.stats import sem, t

def confidence_interval(data):
    confidence = 0.90
    n = len(data)
    m = mean(data)
    std_err = sem(data)
    h = std_err * t.ppf((1 + confidence) / 2, n - 1)
    lower_interval = m - h
    upper_interval = m + h
    # print(lower_interval, upper_interval)
    # True where a value lies outside the open interval; pandas >= 1.3
    # spells the old inclusive=False as inclusive='neither'
    return ~data.between(lower_interval, upper_interval, inclusive='neither')

df1['new'] = df1.groupby('hour')['production'].transform(confidence_interval).astype(int)
print(df1)
   hour  production  new
0     1          10    0
0     2          20    1
0     1          30    0
0     3          40    1
0     1          40    0
0     4          30    0
0     1          20    0
0     4          10    0
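Hours 2 and 3 are flagged because each has a single row: the standard error with ddof=1 is NaN, Student's t with 0 degrees of freedom is undefined, so both bounds are NaN and `between` returns False for every comparison. A minimal sketch of that mechanism, assuming pandas and SciPy:

```python
import pandas as pd
from scipy.stats import t

# A one-row group, like hour 2 or hour 3 in df1
single = pd.Series([20])

std_err = single.sem()                      # ddof=1 with n=1 -> NaN
h = std_err * t.ppf(0.95, len(single) - 1)  # 0 degrees of freedom -> NaN
lower, upper = single.mean() - h, single.mean() + h

inside = single.between(lower, upper, inclusive='neither')
print(std_err, inside.iloc[0])  # prints: nan False, so the row gets flagged as 1
```

If one-row hours should not count as unusual, they would need to be handled separately, for example by checking the group size before building the interval.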

2 Comments

reset_index may not be needed if you use transform instead of apply.
@QuangHoang - Thank you.
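The comment's point is that `transform` returns one value per original row, aligned to the original index, so its result can be assigned straight back as a column; `apply` with a reducing function returns one value per group, indexed by hour, which is what otherwise forces a `reset_index` or merge. A small sketch assuming pandas:

```python
import pandas as pd

df1 = pd.DataFrame({'hour': [1, 2, 1, 3, 1, 4, 1, 4],
                    'production': [10, 20, 30, 40, 40, 30, 20, 10]})

g = df1.groupby('hour')['production']

# transform: same length and index as df1, ready to assign as a column
per_row = g.transform('mean')

# apply with a reducing function: one value per hour, indexed by hour
per_hour = g.apply(lambda s: s.mean())
```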
