I am trying to replicate a plot in Figure 8 of this "https://agupubs.onlinelibrary.wiley.com/doi/10.1029/2012JD017979" article using precipitation data.
Basically, it shows the distribution of (in my case) precipitation where the y-axis (oftentimes in distribution plots, the density) is the normalized volume, allowing different precipitation products to be comparable in one plot.
I have tried the code below using Scipy's stats.binned_statistic function and numpy:
compute mean of values in each bin, bin edges, and bin number
#NB: sim_id = simulated, obs_id = observed
im_bin_means, sim_bin_edges, sim_binnumber = stats.binned_statistic(x = sim_id,values = sim_id,statistic='mean',bins=2000,range=(0.01,120))
sim_bin_cnt, sim_bin_edges, sim_binnumber = stats.binned_statistic(x = sim_id,values = sim_id,statistic='count',bins=2000,range=(0.01,120))
obs_bin_means, obs_bin_edges, obs_binnumber = stats.binned_statistic(x = obs_id,values = obs_id,statistic='mean',bins=2000,range=(0.01,120))
obs_bin_cnt, obs_bin_edges, obs_binnumber = stats.binned_statistic(x = obs_id,values = obs_id,statistic='count',bins=2000,range=(0.01,120))
compute a customized distribution as
sim_dist = (sim_bin_cnt*sim_bin_means)/np.nansum(sim_bin_cnt)
obs_dist = (obs_bin_cnt*obs_bin_means)/np.nansum(obs_bin_cnt)
the plot
f,ax = plt.subplots()
ax.plot(sim_bin_edges[:-1],sim_dist,c='r',label='SIM',lw=2,ls='-')
ax.plot(obs_bin_edges[:-1],obs_dist,c='k',label='OBS',lw=2,ls='--')
ax.set_xscale('log')
ax.legend(loc = 'best',frameon=False)
The problem is that the area under each curve should integrate to 1; i.e., np.nansum(sim_dist * np.diff(sim_bin_edges)) should be = 1. But this is not the case with my current method.
I would be glad if I could get help with an elegant method that leads to a more desirable answer. Thanks in advance

matplotlib.pyplot.histwithdensity=True? You give it the raw data, and it produces a normalized histogram.