Subset a df using an if statement - Pandas

Question

I want to create and return a subset of a df using an if statement. In the code below I have two lists of values, and the df I return should depend on which list a given value belongs to.

The value in place will appear in either normal or different, and that determines how the df is subsetted.

Below is my attempt. place only ever holds a single value, so it will never equal either list in full. Is it possible to return the df when place matches a single element of these lists?

I'm hoping to return df1 to be used for subsequent tasks.

import pandas as pd

df = pd.DataFrame({
    'period' : [1.0, 1.0, 2.0, 2.0, 3.0, 4.0, 5.0, 7.0, 7.0, 8.0, 9.0],                                
    })

place = 'a'

normal = ['a','b']
different = ['v','w','x','y','z']

different_subset_start = 2
normal_subset_start = 4
subset_end = 8

for val in df:
    if place in different:
        print('place is different')
        df1 = df[(df['period'] >= different_subset_start) & (df['period'] <= subset_end)].drop_duplicates(subset = 'period')
        return df1
    elif place in normal:
        print('place is normal')
        df1 = df[(df['period'] >= normal_subset_start) & (df['period'] <= subset_end)].drop_duplicates(subset = 'period')
        return df1
    else:
        print('Incorrect input for Day. Day Floater could not be scheduled. Please check input value')
    return

print(df1)

Intended output would be to return df1 to be used later.

brentertainer · Accepted Answer · 2019-08-08 03:32:39Z


To check whether an object is contained in a collection, rather than whether it is equal to that collection, use in.

if place in different:

and similarly

elif place in normal:
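As a minimal sketch of the difference, using the place and normal values from the question: equality compares the string with the whole list, while in tests whether the string is one of the list's elements.

```python
place = 'a'
normal = ['a', 'b']

# Equality compares the single string against the entire list.
is_equal = (place == normal)
# Membership asks whether the string is one of the list's elements.
is_member = (place in normal)

print(is_equal)   # False
print(is_member)  # True
```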

EDIT:

Here is how it could look as a function. Write def my_function_name(arguments):, then indent the rest of your code so that it belongs to that function. Like this:

import pandas as pd

def get_subset(df, place):
    normal = ['a','b']
    different = ['v','w','x','y','z']

    different_subset_start = 2
    normal_subset_start = 4
    subset_end = 8

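    if place in different:
        # place is one of the "different" values: subset from the earlier start
        df1 = df[(df['period'] >= different_subset_start) & (df['period'] <= subset_end)].drop_duplicates(subset = 'period')
    elif place in normal:
        # place is one of the "normal" values: subset from the later start
        df1 = df[(df['period'] >= normal_subset_start) & (df['period'] <= subset_end)].drop_duplicates(subset = 'period')
    else:
        # place is in neither list: there is nothing to subset
        df1 = None
    return df1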

df = pd.DataFrame({
    'period' : [1.0, 1.0, 2.0, 2.0, 3.0, 4.0, 5.0, 7.0, 7.0, 8.0, 9.0],                             
    })

place = 'a'

print(get_subset(df, place))
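If you would rather always get a DataFrame back, one possible variant returns an empty frame for an unrecognised place instead of None. This is only a sketch: the name get_subset_or_empty and the subset_end default argument are illustrative and not part of the original code.

```python
import pandas as pd

def get_subset_or_empty(df, place, subset_end=8):
    # Hypothetical variant of get_subset that never returns None.
    normal = ['a', 'b']
    different = ['v', 'w', 'x', 'y', 'z']

    if place in different:
        start = 2
    elif place in normal:
        start = 4
    else:
        # Unknown place: return zero rows but keep the input's columns.
        return df.iloc[0:0]

    mask = (df['period'] >= start) & (df['period'] <= subset_end)
    return df[mask].drop_duplicates(subset='period')

df = pd.DataFrame({
    'period': [1.0, 1.0, 2.0, 2.0, 3.0, 4.0, 5.0, 7.0, 7.0, 8.0, 9.0],
})

result = get_subset_or_empty(df, 'a')   # place is in normal
unknown = get_subset_or_empty(df, 'q')  # place is in neither list

print(result['period'].tolist())  # [4.0, 5.0, 7.0, 8.0]
print(unknown.empty)              # True
```

The caller can then test unknown.empty rather than comparing against None, and later code that expects a DataFrame keeps working.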

edited Aug 8, 2019 at 3:32

answered Aug 8, 2019 at 3:14

brentertainer


7 Comments

jonboy Over a year ago

Thanks @brentertainer. Is it possible to return df1 outside of the if statement?

brentertainer Over a year ago

@jonboy When I read return, I think of the return statement, like what you would write to return a value from a function/method. Would you please clarify what you mean? There is no function/method in your example.

jonboy Over a year ago

I'm learning as I go, I think I need a function where it returns the subsetted df based on the value in place

brentertainer Over a year ago

Sure, you can do that. You need to be careful to define df1 in every case. If the else condition is reached in your code, df1 will not be defined. So you might do df1 = None there or df1 = pd.DataFrame() to make it an empty DataFrame so at least it's the same type.

jonboy Over a year ago

I've amended the question. It's still incorrect, but would something like this work?


Valdi_Bo · Accepted Answer · 2019-08-08 03:58:53Z

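Look at for val in df: in your code. This loop is odd, as you never use the val variable. Iterating over a DataFrame also yields its column labels, not its rows, so the loop body would run only once here.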
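A small sketch of what the loop variable actually receives, using a shortened version of the question's DataFrame:

```python
import pandas as pd

df = pd.DataFrame({'period': [1.0, 1.0, 2.0]})

# Iterating over a DataFrame walks its column labels, not its rows.
labels = [val for val in df]

print(labels)  # ['period']
```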

Change the last fragment of your code to something like this:

def fn():
    if place in different:
        print('place is different')
        return df[df.period.between(different_subset_start, subset_end)]\
            .drop_duplicates(subset='period')
    elif place in normal:
        print('place is normal')
        return df[df.period.between(normal_subset_start, subset_end)]\
            .drop_duplicates(subset = 'period')
    else:
        print('Incorrect input for place. Please check value')

In your case, subset='period' is superfluous, as period is the only column in your DataFrame.

The last return is also not needed. If execution reaches the end of a function without hitting a return statement, the function implicitly returns None.
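To illustrate that behaviour on its own, here is a small sketch with a hypothetical helper, check_place, in which only one branch has an explicit return:

```python
def check_place(place):
    # Hypothetical helper: only the 'a' branch returns explicitly.
    if place == 'a':
        return 'normal'
    # No return statement is reached for other values,
    # so Python hands back None on its own.

found = check_place('a')
missing = check_place('z')

print(found)            # normal
print(missing is None)  # True
```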

Yet another detail: If your DataFrame has a single column then maybe a Series would be enough?
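As a rough sketch of that idea, assuming the same period values and the normal bounds (4 and 8) from the question, the subsetting could be done on a Series directly. between includes both endpoints by default.

```python
import pandas as pd

period = pd.Series(
    [1.0, 1.0, 2.0, 2.0, 3.0, 4.0, 5.0, 7.0, 7.0, 8.0, 9.0],
    name='period',
)

# Keep values from 4 to 8 inclusive, then drop repeated values.
subset = period[period.between(4, 8)].drop_duplicates()

print(subset.tolist())  # [4.0, 5.0, 7.0, 8.0]
```

With a Series there is no subset argument to pass to drop_duplicates, since there is only one set of values to compare.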
