import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

data = pd.read_csv('D:\ history/segment.csv')
data = pd.DataFrame(data)
data = data.sort_values(['Prob_score'], ascending=[False])

one = len(data)
actualpaid_overall = len(data.loc[data['paidstatus'] == 1])

data_split = np.array_split(data, 10)

data1 = data_split[0]
actualpaid_ten = len(data1.loc[data1['paidstatus'] == 1])
percent_ten = actualpaid_ten/actualpaid_overall

data2 = data_split[1]
actualpaid_twenty = len(data2.loc[data2['paidstatus'] == 1])
percent_twenty = (actualpaid_twenty/actualpaid_overall) +  percent_ten

data3 = data_split[2]
actualpaid_thirty = len(data3.loc[data3['paidstatus'] == 1])
percent_thirty = (actualpaid_thirty/actualpaid_overall) +  percent_twenty

data4 = data_split[3]
actualpaid_forty = len(data4.loc[data4['paidstatus'] == 1])
percent_forty = (actualpaid_forty/actualpaid_overall) +  percent_thirty

data5 = data_split[4]
actualpaid_fifty = len(data5.loc[data5['paidstatus'] == 1])
percent_fifty = (actualpaid_fifty/actualpaid_overall) +  percent_forty

data6 = data_split[5]
actualpaid_sixty = len(data6.loc[data6['paidstatus'] == 1])
percent_sixty = (actualpaid_sixty/actualpaid_overall) +  percent_fifty

data7 = data_split[6]
actualpaid_seventy = len(data7.loc[data7['paidstatus'] == 1])
percent_seventy = (actualpaid_seventy/actualpaid_overall) + percent_sixty

data8 = data_split[7]
actualpaid_eighty = len(data8.loc[data8['paidstatus'] == 1])
percent_eighty = (actualpaid_eighty/actualpaid_overall) +  percent_seventy

data9 = data_split[8]
actualpaid_ninenty = len(data9.loc[data9['paidstatus'] == 1])
percent_ninenty = (actualpaid_ninenty/actualpaid_overall) +  percent_eighty

data10 = data_split[9]
actualpaid_hundred = len(data10.loc[data10['paidstatus'] == 1])
percent_hundred = (actualpaid_hundred/actualpaid_overall) +  percent_ninenty

array_x = [10,20,30,40,50,60,70,80,90,100]
array_y = [ percent_ten, percent_twenty, percent_thirty, percent_forty,percent_fifty, percent_sixty, percent_seventy, percent_eighty, percent_ninenty, percent_hundred]

plt.xlabel(' Base')
plt.ylabel(' percent')
ax = plt.plot(array_x,array_y)
plt.minorticks_on()
plt.grid(which='major', linestyle='-', linewidth=0.5, color='0.1')
plt.grid( which='both', axis = 'both',  linewidth=0.5,color='0.75')

The above is my Python code. I have split my DataFrame into 10 equal sections and plotted the graph, but I'm not satisfied with it. I have two concerns:

  1. In the line array_x = [10,20,30,40,50,60,70,80,90,100] I entered the x values manually. Is there a way to generate them automatically? Since I call np.array_split(data, 10), it should produce 10 values.

  2. The data1, data2, ..., data10 blocks repeat the same logic again and again. Is there a way to write this as a function or a loop?

Any help with code would be appreciated. Thanks.

  • With np.array_split(data, 5), suppose I have 100 rows in my DataFrame: it splits them into 5 sets of 20 rows each. Now my array_x should be array_x = [1,2,3,4,5]. Commented Feb 1, 2019 at 6:44
  • If I use np.cumsum I will get array_x as [20,40,60,80,100], right? But I don't need the sum, I need [1,2,3,4,5]. If I call split(data, 5), array_x should automatically be [1,2,3,4,5]. If I call split(data, 10), array_x should be [1,2,3,4,5,6,7,8,9,10]. Commented Feb 1, 2019 at 6:59
  • Oops, sorry, you are right: it should be [10,20,30,40,50,60,70,80,90,100], not [1,2,3,4,5,6,7,8,9,10]. Commented Feb 1, 2019 at 7:04
  • My logic is this: in the top 10% of the overall data I need to find how many customers are paid ones, then how many are paid in the top 20%, and so on. Commented Feb 1, 2019 at 7:06
  • Let's assume I have 1000 customers in my DataFrame. I need to split them into 10 so that each split has 100 customers. I need to plot this with the splits (10%, 20%, up to 100%, since I used 10 splits) on the x-axis and the cumulative share of paid customers on the y-axis. I have no problem with the y-axis, as the code you gave is correct and works fine. Commented Feb 1, 2019 at 7:10

1 Answer

I believe you need a list comprehension. For the count there is a simpler way: take the sum of a boolean mask, since True values are treated as 1. Then convert the lists to numpy arrays and use numpy.cumsum:

# raw string so the backslash in the path is not read as an escape sequence
data = pd.read_csv(r'D:\ history/segment.csv')
data = data.sort_values('Prob_score', ascending=False)

one = len(data)
actualpaid_overall = (data['paidstatus'] == 1).sum()

data_split = np.array_split(data, 10)

x = [len(x) for x in data_split]
y = [(x['paidstatus'] == 1).sum()/actualpaid_overall for x in data_split]

array_x = np.cumsum(np.array(x))
array_y = np.cumsum(np.array(y))

plt.xlabel(' Base')
plt.ylabel(' percent')
ax = plt.plot(array_x,array_y)
plt.minorticks_on()
plt.grid(which='major', linestyle='-', linewidth=0.5, color='0.1')
plt.grid( which='both', axis = 'both',  linewidth=0.5,color='0.75')

Sample:

np.random.seed(2019)
N = 1000
data = pd.DataFrame({'paidstatus':np.random.randint(3, size=N),
                     'Prob_score':np.random.randint(100, size=N)})
#print (data)

data = data.sort_values(['Prob_score'], ascending=[False])

actualpaid_overall = (data['paidstatus'] == 1).sum()

data_split = np.array_split(data, 10)

x = [len(x) for x in data_split]
y = [(x['paidstatus'] == 1).sum()/actualpaid_overall for x in data_split]

array_x = np.cumsum(np.array(x))
array_y = np.cumsum(np.array(y))

print (array_x)
[ 100  200  300  400  500  600  700  800  900 1000]

print (array_y)
[0.09118541 0.18844985 0.27963526 0.38601824 0.49848024 0.61702128
 0.72036474 0.81155015 0.9331307  1.        ]
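
If the x-axis should show the percentage of the base (10, 20, ..., 100) rather than cumulative row counts, as the comments on the question suggest, the same idea can be wrapped in a function. This is only a sketch under some assumptions: the name cumulative_gains and its parameters are made up for illustration, it splits the boolean mask instead of the DataFrame itself, and it assumes at least one row has paidstatus == 1.

```python
import numpy as np
import pandas as pd


def cumulative_gains(df, n_splits=10, score_col="Prob_score", paid_col="paidstatus"):
    """Return (percent of base, cumulative share of paid rows) for each split.

    Hypothetical helper, not part of pandas or numpy.
    """
    # Highest scores first, as in the original code.
    ordered = df.sort_values(score_col, ascending=False)
    # Boolean mask of paid rows; True counts as 1 when summed.
    paid = (ordered[paid_col] == 1).to_numpy()
    # Split the mask into n_splits nearly equal chunks.
    chunks = np.array_split(paid, n_splits)
    sizes = np.array([len(chunk) for chunk in chunks])
    paid_counts = np.array([chunk.sum() for chunk in chunks])
    # x: cumulative rows as a percentage of all rows.
    x_percent = 100 * np.cumsum(sizes) / len(ordered)
    # y: cumulative paid rows as a share of all paid rows (assumes paid.sum() > 0).
    y_share = np.cumsum(paid_counts) / paid.sum()
    return x_percent, y_share


# Same kind of random sample data as above.
np.random.seed(2019)
N = 1000
data = pd.DataFrame({"paidstatus": np.random.randint(3, size=N),
                     "Prob_score": np.random.randint(100, size=N)})

x_percent, y_share = cumulative_gains(data, n_splits=10)
```

The two returned arrays can be passed to plt.plot(x_percent, y_share) in place of array_x and array_y. Changing n_splits changes the number of points without any other edits.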

1 Comment

Not this code; I meant the previous one: array_x = np.arange(10) + 1
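
For reference, the np.arange variant mentioned in this comment can be tied to the number of splits instead of a hard-coded 10. A minimal sketch, where n_splits is a hypothetical variable standing in for the value passed to np.array_split; the percentage scaling is exact only when the splits are of equal size:

```python
import numpy as np

n_splits = 10  # hypothetical: same value as passed to np.array_split

# Split index 1..n_splits
split_index = np.arange(n_splits) + 1
# Same positions expressed as a percentage of the base: 10, 20, ..., 100
percent_of_base = split_index * 100 / n_splits
```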
