I want to create and return a subset of a df using an if statement. In the code below I have two lists of values, and the subset I want depends on which list a given value belongs to.

The value stored in place will appear in either normal or different, and that determines how the df is subsetted.

Below is my attempt. place only ever holds a single value, so it will never equal either list as a whole. Is it possible to return the df when place matches one of the items in these lists?

I'm hoping to return df1 to be used for subsequent tasks.

import pandas as pd

df = pd.DataFrame({
    'period' : [1.0, 1.0, 2.0, 2.0, 3.0, 4.0, 5.0, 7.0, 7.0, 8.0, 9.0],                                
    })

place = 'a'

normal = ['a','b']
different = ['v','w','x','y','z']

different_subset_start = 2
normal_subset_start = 4
subset_end = 8

for val in df:
    if place in different:
        print('place is different')
        df1 = df[(df['period'] >= different_subset_start) & (df['period'] <= subset_end)].drop_duplicates(subset = 'period')
        return df1
    elif place in normal:
        print('place is normal')
        df1 = df[(df['period'] >= normal_subset_start) & (df['period'] <= subset_end)].drop_duplicates(subset = 'period')
        return df1
    else:
        print('Incorrect input for Day. Day Floater could not be scheduled. Please check input value')
    return

print(df1)

Intended output would be to return df1 to be used later.

   period
2     2.0
4     3.0
5     4.0
6     5.0
7     7.0
9     8.0

2 Answers

To check whether a value is contained in a list, rather than whether it is equal to the list, use in.

if place in different:

and similarly

elif place in normal:
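
A quick illustration of the difference, as a minimal sketch using the normal list and place value from the question (the variable names equal_to_list and in_list are only for illustration):

```python
normal = ['a', 'b']
place = 'a'

# == compares the string with the whole list, so it is False here
equal_to_list = (place == normal)

# in checks whether the string is one of the list's items
in_list = (place in normal)

print(equal_to_list, in_list)
```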

EDIT:

Here is how it should look if you make it a function. Basically, you just need to do a def my_function_name(arguments): sort of thing, then indent the rest of your code so it belongs to that function. Like this:

import pandas as pd

def get_subset(df, place):
    normal = ['a','b']
    different = ['v','w','x','y','z']

    different_subset_start = 2
    normal_subset_start = 4
    subset_end = 8

    if place in different:
        df1 = df[(df['period'] >= different_subset_start) & (df['period'] <= subset_end)].drop_duplicates(subset = 'period')
    elif place in normal:
        df1 = df[(df['period'] >= normal_subset_start) & (df['period'] <= subset_end)].drop_duplicates(subset = 'period')
    else:
        df1 = None
    return df1

df = pd.DataFrame({
    'period' : [1.0, 1.0, 2.0, 2.0, 3.0, 4.0, 5.0, 7.0, 7.0, 8.0, 9.0],                             
    })

place = 'a'

print(get_subset(df, place))
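
Because the function hands back None for an unrecognised place, the caller probably wants to check the result before using it. A rough sketch of that, assuming a condensed variant of get_subset (the module-level NORMAL and DIFFERENT constants and the subset_end default are my own rearrangement, not something the question requires):

```python
import pandas as pd

NORMAL = ['a', 'b']
DIFFERENT = ['v', 'w', 'x', 'y', 'z']

def get_subset(df, place, subset_end=8):
    # Pick the lower bound from whichever list contains place
    if place in DIFFERENT:
        start = 2
    elif place in NORMAL:
        start = 4
    else:
        return None  # unrecognised place
    mask = (df['period'] >= start) & (df['period'] <= subset_end)
    return df[mask].drop_duplicates(subset='period')

df = pd.DataFrame({
    'period': [1.0, 1.0, 2.0, 2.0, 3.0, 4.0, 5.0, 7.0, 7.0, 8.0, 9.0],
})

df1 = get_subset(df, 'a')
if df1 is None:
    print('place was not recognised')
else:
    # df1 can be used for subsequent tasks from here on
    print(df1)
```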

7 Comments

Thanks @brentertainer. Is it possible to return df1 outside of the if statement?
@jonboy When I read return, I think return -- like what you would write to return a value from a function/method. Would you please clarify what you mean? There is no function/method in your example.
I'm learning as I go, I think I need a function where it returns the subsetted df based on the value in place
Sure, you can do that. You need to be careful to define df1 in every case. If the else condition is reached in your code, df1 will not be defined. So you might do df1 = None there or df1 = pd.DataFrame() to make it an empty DataFrame so at least it's the same type.
I've amended the question. It's still incorrect, but could something like this be amended to work?

Look at for val in df: in your code. This construct is strange, as you never use the val variable (and iterating over a DataFrame only yields its column labels anyway).

Change the last fragment of your code to something like this:

def fn():
    if place in different:
        print('place is different')
        return df[df.period.between(different_subset_start, subset_end)]\
            .drop_duplicates(subset='period')
    elif place in normal:
        print('place is normal')
        return df[df.period.between(normal_subset_start, subset_end)]\
            .drop_duplicates(subset = 'period')
    else:
        print('Incorrect input for place. Please check value')

In your case subset = 'period' is superfluous as period is the only column in your DataFrame.

The last return is also not needed. When execution reaches the end of a function body, the function returns None implicitly.
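
A small sketch of that fall-through behaviour (describe is a made-up function for illustration, not part of your code):

```python
def describe(place):
    if place in ['a', 'b']:
        return 'normal'
    # No else branch and no trailing return: execution falls off the end here

print(describe('a'))  # normal
print(describe('q'))  # None
```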

Yet another detail: If your DataFrame has a single column then maybe a Series would be enough?
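
For instance, a minimal sketch of the same filtering on a Series, assuming the normal bounds from your code (4 to 8) and a variable named period that I made up for the example:

```python
import pandas as pd

period = pd.Series(
    [1.0, 1.0, 2.0, 2.0, 3.0, 4.0, 5.0, 7.0, 7.0, 8.0, 9.0],
    name='period',
)

# between is inclusive on both ends by default
subset = period[period.between(4, 8)].drop_duplicates()

print(subset.tolist())
```

With a Series there is no subset argument to think about, since drop_duplicates works on the values directly.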
