Splitting data in a Python DataFrame and getting the array values automatically

Question

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

data = pd.read_csv('D:\ history/segment.csv')
data = pd.DataFrame(data)
data = data.sort_values(['Prob_score'], ascending=[False])

one = len(data)
actualpaid_overall = len(data.loc[data['paidstatus'] == 1])

data_split = np.array_split(data, 10)

data1 = data_split[0]
actualpaid_ten = len(data1.loc[data1['paidstatus'] == 1])
percent_ten = actualpaid_ten/actualpaid_overall

data2 = data_split[1]
actualpaid_twenty = len(data2.loc[data2['paidstatus'] == 1])
percent_twenty = (actualpaid_twenty/actualpaid_overall) +  percent_ten

data3 = data_split[2]
actualpaid_thirty = len(data3.loc[data3['paidstatus'] == 1])
percent_thirty = (actualpaid_thirty/actualpaid_overall) +  percent_twenty

data4 = data_split[3]
actualpaid_forty = len(data4.loc[data4['paidstatus'] == 1])
percent_forty = (actualpaid_forty/actualpaid_overall) +  percent_thirty

data5 = data_split[4]
actualpaid_fifty = len(data5.loc[data5['paidstatus'] == 1])
percent_fifty = (actualpaid_fifty/actualpaid_overall) +  percent_forty

data6 = data_split[5]
actualpaid_sixty = len(data6.loc[data6['paidstatus'] == 1])
percent_sixty = (actualpaid_sixty/actualpaid_overall) +  percent_fifty

data7 = data_split[6]
actualpaid_seventy = len(data7.loc[data7['paidstatus'] == 1])
percent_seventy = (actualpaid_seventy/actualpaid_overall) + percent_sixty

data8 = data_split[7]
actualpaid_eighty = len(data8.loc[data8['paidstatus'] == 1])
percent_eighty = (actualpaid_eighty/actualpaid_overall) +  percent_seventy

data9 = data_split[8]
actualpaid_ninenty = len(data9.loc[data9['paidstatus'] == 1])
percent_ninenty = (actualpaid_ninenty/actualpaid_overall) +  percent_eighty

data10 = data_split[9]
actualpaid_hundred = len(data10.loc[data10['paidstatus'] == 1])
percent_hundred = (actualpaid_hundred/actualpaid_overall) +  percent_ninenty

array_x = [10,20,30,40,50,60,70,80,90,100]
array_y = [ percent_ten, percent_twenty, percent_thirty, percent_forty,percent_fifty, percent_sixty, percent_seventy, percent_eighty, percent_ninenty, percent_hundred]

plt.xlabel(' Base')
plt.ylabel(' percent')
ax = plt.plot(array_x,array_y)
plt.minorticks_on()
plt.grid(which='major', linestyle='-', linewidth=0.5, color='0.1')
plt.grid( which='both', axis = 'both',  linewidth=0.5,color='0.75')

The above is my Python code. I have split my DataFrame into 10 equal sections and plotted the graph, but I'm not satisfied with it. I have two concerns:

1. In the line array_x = [10,20,30,40,50,60,70,80,90,100] I entered the x values manually. Is there a way to generate them automatically? Since I used split(data,10), it should produce 10 array values.
2. The whole data1, data2, data3, ... data10 block is repeated again and again. Is there a way to write this as a function or a loop?

Any help with code will be appreciated. Thanks.

np.array_split(data, 5): suppose I have 100 rows in my DataFrame; it splits them into 5 equal sets with 20 rows in each. Now my array_x should be array_x = [1,2,3,4,5].
– Yadhu, Commented Feb 1, 2019 at 6:44
If I use np.cumsum I will get array_x as [20,40,60,80,100], right? But I don't need the sum, I need [1,2,3,4,5]. If I give split(data,5), array_x should automatically be assigned as array_x = [1,2,3,4,5]. If I give split(data,10), array_x should be array_x = [1,2,3,4,5,6,7,8,9,10].
– Yadhu, Commented Feb 1, 2019 at 6:59
Oops, sorry, you are right: it should be [10,20,30,40,50,60,70,80,90,100], not [1,2,3,4,5,6,7,8,9,10].
– Yadhu, Commented Feb 1, 2019 at 7:04
Actually, my logic is this: in 10% of the overall data I need to find how many of the customers are paid ones, in 20% how many are paid, and so on.
– Yadhu, Commented Feb 1, 2019 at 7:06
Let's assume I have 1000 customers in my DataFrame. I need to split them into 10 so that each split will have 100 customers. I need to plot this in a graph with the splits (10%, 20%, up to 100%, since I gave split(10)) on the x-axis and the sum of paid customers on the y-axis. I have no problem with the y-axis, as the code that you gave was correct and works fine.
– Yadhu, Commented Feb 1, 2019 at 7:10

jezrael · Accepted Answer · 2019-02-01 07:07:46Z

I believe you need a list comprehension. For the count, a simpler way is possible: take the sum of the boolean mask, where True values are processed like 1. Then convert the lists to numpy arrays and use numpy.cumsum:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Raw string so the backslash in the path is kept literally
data = pd.read_csv(r'D:\ history/segment.csv')
data = data.sort_values('Prob_score', ascending=False)

one = len(data)
# Summing a boolean mask counts the True values
actualpaid_overall = (data['paidstatus'] == 1).sum()

data_split = np.array_split(data, 10)

# Rows per split, and each split's share of all paid rows
x = [len(part) for part in data_split]
y = [(part['paidstatus'] == 1).sum()/actualpaid_overall for part in data_split]

# Running totals give the cumulative x and y values
array_x = np.cumsum(np.array(x))
array_y = np.cumsum(np.array(y))

plt.xlabel(' Base')
plt.ylabel(' percent')
ax = plt.plot(array_x,array_y)
plt.minorticks_on()
plt.grid(which='major', linestyle='-', linewidth=0.5, color='0.1')
plt.grid( which='both', axis = 'both',  linewidth=0.5,color='0.75')

Sample:

np.random.seed(2019)
N = 1000
data = pd.DataFrame({'paidstatus':np.random.randint(3, size=N),
                     'Prob_score':np.random.randint(100, size=N)})
#print (data)

data = data.sort_values(['Prob_score'], ascending=[False])

actualpaid_overall = (data['paidstatus'] == 1).sum()

data_split = np.array_split(data, 10)

x = [len(x) for x in data_split]
y = [(x['paidstatus'] == 1).sum()/actualpaid_overall for x in data_split]

array_x = np.cumsum(np.array(x))
array_y = np.cumsum(np.array(y))

print (array_x)
[ 100  200  300  400  500  600  700  800  900 1000]

print (array_y)
[0.09118541 0.18844985 0.27963526 0.38601824 0.49848024 0.61702128
 0.72036474 0.81155015 0.9331307  1.        ]
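
The same steps can be wrapped in a function so the number of splits becomes a parameter and the x values come out as percentages of the base (10, 20, ... 100), which is what the question's comments ask for. This is only a minimal sketch, not part of the original answer: the function name cumulative_paid_share and the demo data are hypothetical, and it splits the boolean mask as a plain NumPy array rather than the DataFrame itself. It assumes at least one row has paidstatus == 1, otherwise the division fails.

```python
import numpy as np
import pandas as pd


def cumulative_paid_share(data, n_splits=10):
    """Return (x, y): cumulative % of rows and cumulative share of paid rows."""
    # Highest scores first, as in the question.
    ordered = data.sort_values('Prob_score', ascending=False)
    # Boolean mask as a NumPy array; True counts as 1 when summed.
    paid = (ordered['paidstatus'] == 1).to_numpy()
    chunks = np.array_split(paid, n_splits)
    sizes = np.array([len(chunk) for chunk in chunks])
    paid_counts = np.array([chunk.sum() for chunk in chunks])
    # Cumulative rows as a percentage of all rows.
    x = np.cumsum(sizes) * 100 / len(paid)
    # Cumulative paid rows as a fraction of all paid rows.
    y = np.cumsum(paid_counts) / paid.sum()
    return x, y


# Hypothetical demo data with the same column names as the question.
rng = np.random.default_rng(0)
demo = pd.DataFrame({'paidstatus': rng.integers(0, 3, size=1000),
                     'Prob_score': rng.integers(0, 100, size=1000)})

x10, y10 = cumulative_paid_share(demo, 10)
x5, y5 = cumulative_paid_share(demo, 5)
print(x10)
print(x5)
```

The returned arrays can be passed straight to plt.plot(x, y) in place of array_x and array_y.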

edited Feb 1, 2019 at 7:07

answered Feb 1, 2019 at 6:32


1 Comment

Yadhu Over a year ago

Not this code; the previous one: array_x = np.arange(10) + 1
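
For reference, the np.arange expression from this comment can be tied to the number of splits instead of a hard-coded 10. A small sketch, assuming the splits are of equal size; n_splits is a hypothetical stand-in for len(data_split):

```python
import numpy as np

n_splits = 10  # hypothetical; in the answer this would be len(data_split)

# Split labels 1..n, as in the comment.
labels = np.arange(n_splits) + 1
# The same labels expressed as a percentage of the base.
percent = labels * 100 / n_splits

print(labels)
print(percent)
```

When the row count does not divide evenly, the cumulative lengths from np.cumsum describe the real split boundaries more accurately than these evenly spaced labels.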
