Add values from pandas dataframe to a list

Question

I have a dataframe in which one column holds the GPA of first-year students. I want to loop through this column and append to a list of lists all values that fall within 0.4 units of each other. For example, if I have the values (0.4, 0.6, 0.8, 3, 3.4), then I want my list to be [[0.4, 0.6, 0.8], [3, 3.4]].

This is the code I have tried.

averages = [[] for w in range(len(df['GPA_year1'])//4)]
small = min(df['GPA_year1']) + 0.4
for i in range(len(averages)):
    for y in range(len(df['GPA_year1'])):
        if small - 0.4 <= df['GPA_year1'][y] <= (small + 0.4):
            averages[i].append(df['GPA_year1'][y])
    small = small + 0.4

However, when I run this code in Jupyter Notebook, it seems to run forever, which makes me think that there may be an infinite loop somewhere (?) but I'm not sure where the infinite loop might be.

Here is the dataframe

Do you want to keep the sequence as it is, or do you also want to sort the numbers in GPA_year1?
– Gius, Commented Nov 30, 2019 at 18:45

Valdi_Bo · Accepted Answer · 2019-11-30 19:09:41Z

From your expected result I see that:

The first bin contains elements in the range [0.4 - 0.8].
The next bin starts from 3.0.

So you:

Don't want one-side-open bins (the first bin is closed at both sides).
Want neither "empty bins" nor "adjacent ranges" (e.g. [0.4 - 0.8), then [0.8 - 1.2), and so on).

Rather, you want something like this:

Set the upper limit to the lowest element in the source list + 0.4.
Put in the first "bin" elements <= limit (append this list to averages).
Drop these elements from the list.
Repeat the above procedure while the list is not empty.

I also assume that the result should be a plain Python list of lists.

To get this result, try the following code:

averages = []
src = df['GPA_year1'].sort_values()
while not src.empty:
    limit = src.min() + 0.4
    currBin = src[src <= limit]
    averages.append(currBin.to_list())
    src.drop(currBin.index, inplace=True)

This code should run quicker, because:

Due to sort_values() there is no need for the inner loop.
All values for the current bin are selected in a single instruction.
Dropping the "used" values is also done in a single instruction.

For the GPA_year1 column of your DataFrame, this code generates:

[[0.74], [1.95, 2.18, 2.34], [3.23, 3.23, 3.44, 3.49], [3.64, 3.78, 3.82]]
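For readers without the original DataFrame, the same binning rule can also be sketched in plain Python as a single pass over the sorted values. This is only an illustration, not the code from the answer: `bin_by_width` is a hypothetical helper name, and the sample values are an assumption, read back from the output above rather than taken from the actual data.

```python
def bin_by_width(values, width=0.4):
    """Group values so that each bin spans at most `width` above its minimum."""
    bins = []
    limit = None  # upper limit of the current bin
    for v in sorted(values):
        if limit is None or v > limit:
            # v does not fit the current bin: start a new one at v
            bins.append([v])
            limit = v + width
        else:
            bins[-1].append(v)
    return bins

# Assumed sample data, reconstructed from the output shown above.
gpa = [3.82, 3.64, 1.95, 3.44, 2.18, 3.49, 3.78, 3.23, 0.74, 3.23, 2.34]
averages = bin_by_width(gpa)
print(averages)
# [[0.74], [1.95, 2.18, 2.34], [3.23, 3.23, 3.44, 3.49], [3.64, 3.78, 3.82]]
```

With a real DataFrame, the call would presumably be `bin_by_width(df['GPA_year1'].tolist())`.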

One more remark concerning your code:

averages = [[] for w in range(len(df['GPA_year1'])//4)]

looks strange. How do you know that the output list will contain just 4 lists? By coincidence this is the case for your sample data, but consider a case where:

One part of the values is "very bad" (all around some lower limit).
The other part of the values is "very good" (all around some upper limit).

Then the number of "bins" will be just 2 (not 4).
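A small sketch of this situation, using made-up values (hypothetical data, not from the question): the number of lists preallocated by `len(...)//4` has no fixed relation to the number of bins the data actually needs.

```python
# Hypothetical data: six low and six high GPAs (not from the question).
gpa = [0.9, 1.0, 1.1, 1.0, 0.95, 1.2, 3.7, 3.8, 3.9, 3.75, 3.85, 4.0]

# Number of sublists the question's code would allocate up front.
preallocated = len(gpa) // 4

# Build the bins on demand instead: a new bin starts whenever a value
# exceeds the current bin's minimum + 0.4.
bins = []
limit = None
for v in sorted(gpa):
    if limit is None or v > limit:
        bins.append([v])
        limit = v + 0.4
    else:
        bins[-1].append(v)

print(preallocated, len(bins))
# 3 2
```

Starting from an empty list and appending a bin only when one is needed avoids having to guess the count in advance.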

Gius · Accepted Answer · 2019-11-30 20:07:23Z


This is my approach, assuming df is your DataFrame:

GPA_year1 = df['GPA_year1'].tolist()
GPA_year1 = [3.82, 3.64, 1.95, 3.44, 2.18, 3.49, 3.78, 3.23, 0.74, 3.23, 0.74, 3.23, 2.34]

Sort the list:

GPA_year1.sort()

Initialize the averages list with the first element:

averages = [[GPA_year1[0]]]

Loop through your list, comparing each value with the previous one:

for x, y in zip(GPA_year1, GPA_year1[1:]):
    if y - x <= 0.4:
        averages[-1].append(y)
    else:
        averages.append([y])  # gap is larger than 0.4: start a new sublist
print(averages)
# [[0.74, 0.74], [1.95, 2.18, 2.34], [3.23, 3.23, 3.23, 3.44, 3.49, 3.64, 3.78, 3.82]]
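The same loop can be wrapped in a reusable function and checked against the example from the question. This is a minimal sketch; `group_by_gap` and `max_gap` are hypothetical names. Note that it compares each value only with its immediate predecessor, so a group can span more than 0.4 overall, as the last sublist above (3.23 to 3.82) shows.

```python
def group_by_gap(values, max_gap=0.4):
    """Group sorted values; a new group starts when the gap to the
    previous value exceeds `max_gap`."""
    ordered = sorted(values)
    if not ordered:
        return []
    groups = [[ordered[0]]]
    for prev, curr in zip(ordered, ordered[1:]):
        if curr - prev <= max_gap:
            groups[-1].append(curr)   # close enough: same group
        else:
            groups.append([curr])     # gap too large: new group
    return groups

# Example values from the question.
print(group_by_gap([0.4, 0.6, 0.8, 3, 3.4]))
# [[0.4, 0.6, 0.8], [3, 3.4]]
```

If every group must stay within 0.4 of its smallest value, the comparison would have to be made against the first element of the current group rather than the previous value.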

edited Nov 30, 2019 at 20:07

answered Nov 30, 2019 at 19:51

Gius
