
I have a problem related to curve_fit in Python. Specifically, I am trying to model double-exponential data.

The fits look good by eye, but when I inspect the parameters and their errors I am shocked: the errors are huge (a few orders of magnitude greater than the parameters themselves). On top of that, I have to provide very specific initial guesses, otherwise the fit is useless. Do you have any tips?

The X-axis is time and the Y-axis is intensity. The method works fine for the blue plot, and the estimated parameters there are more or less reasonable. I run into problems when I try to fit the orange data. Here are the datasets:

Time [-0.5 -0.4 -0.3 -0.2 -0.1 0. 0.06 0.07 0.08 0.09 0.1 0.11 0.12 0.13 0.14 0.15 0.16 0.17 0.18 0.19 0.2 0.21 0.22 0.23 0.24 0.25 0.27 0.3 0.33 0.36 0.39 0.42 0.45 0.48 0.51 0.54 0.57 0.6 0.63 0.66 0.69 0.72 0.75 0.78 0.81 0.84 0.87 0.9 0.93 0.96 0.99 1.02 1.05 1.08 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2. ]

Gain (blue) [0. 0.08362 0.05064 0.0498 0.03706667 0.27215333 1.47517 1.62150333 1.72074667 1.85051667 1.81913 1.89994 1.96591667 1.98604667 2.08784333 2.07969333 2.09007 2.06054333 2.1477 2.08343 2.08434 2.02111 1.96873 1.96056333 1.89549667 1.82184333 1.70469 1.47502667 1.30257667 1.09063333 0.95496333 0.87091333 0.73064667 0.59721333 0.53186333 0.47225 0.41566667 0.36025 0.31752 0.30499667 0.26868 0.21355667 0.19169 0.17645 0.14872667 0.10706667 0.12724667 0.1013 0.08720333 0.09477333 0.09134 0.10958 0.08022667 0.09363 0.08076667 0.09006 0.09902 0.07093333 0.09918 0.10192 0.09432333 0.09079333 0.10425 0.09839667]

Loss - orange [ 0. -0.12552667 -0.15764 -0.19444 -0.26223 -0.35566333 -1.2022 -1.30988667 -1.44820667 -1.49741333 -1.66169333 -1.72277667 -1.81840667 -1.92772333 -1.97253333 -2.10094667 -2.21817667 -2.24875333 -2.43420667 -2.47216 -2.50555667 -2.62232333 -2.65796667 -2.68747333 -2.69612667 -2.71442 -2.59840333 -2.56071667 -2.44539333 -2.28093667 -2.23737 -2.09984333 -2.07465667 -2.04232667 -1.95166667 -1.93045333 -1.89963333 -1.92825667 -1.98861667 -1.93057667 -2.00662333 -1.91338667 -1.92764 -2.03428 -2.04880667 -1.98233667 -1.90731333 -2.00939667 -1.92832 -1.93799 -1.89133 -1.90807 -1.97912 -1.99034667 -1.95203333 -1.85398667 -1.86426667 -1.85729667 -1.86018667 -1.92405 -1.88595667 -1.88160667 -1.82243333 -1.87084667]

How do I deal with double-exponential fitting?

Plots: https://ibb.co/mCs39VP
Model: a*e^(-x*b) + c*e^(-x*d) + e

Parameters (for the lower fit): [-1622.6092323, -15.038877, 1713.45499496, -15.33871455, -1.92259733]

Errors: [4.73773551e+06, 4.17637511e+02, 4.73736168e+06, 4.37257698e+02, 1.20927662e-02]


Here is my code for the orange data fit:


import numpy as np
import matplotlib.pyplot as plt
import scipy.optimize as opt

def exp(x, a, b, c, d, e):
    return a*np.exp(b*x) + c*np.exp(d*x) + e

# x_l, y_l: the Time and Loss arrays given above
# Drop the first 25 points; the fit only works when it starts near the dip
n = 25
y_l = y_l[n:]
x_l = x_l[n:]

# Both decay rates are bounded to be negative
par, cov = opt.curve_fit(
    exp, x_l, y_l,
    p0=[-5, -7, 30, -20, -1.9],
    bounds=((-np.inf, -np.inf, -np.inf, -np.inf, -np.inf),
            (np.inf, 0, np.inf, 0, np.inf)),
    maxfev=5000,
)
print(par)

parerr = np.sqrt(np.diag(cov))
print("Errors", parerr)

mx_l = np.linspace(np.min(x_l), np.max(x_l), 100)
my_l = exp(mx_l, *par)
plt.plot(mx_l, my_l, label="fit", color="blue")
plt.show()
  • The red and blue curves only start at the peak of the dots rather than right at the beginning of the dots; is the issue maybe to do with the curve not existing for the earlier parts? Are you only comparing against data points that are actually being estimated? And how are you calculating the error: is it absolute error or MSE (mean squared error)? There is a lot you can and should include in this question to make it more likely (and possible) to answer. Commented Dec 15, 2023 at 23:50
  • Welcome to Stack Overflow. For us to help you, please provide a minimal reproducible example. We should be able to copy and paste a contiguous block of your code, execute that file, and reproduce your problem along with tracing output for the problem points. This allows testing suggestions against your data and desired output. Commented Dec 16, 2023 at 0:06
  • Could you post the dataset as text rather than a screenshot? Copy and paste it, or provide a link. Commented Dec 16, 2023 at 9:27
  • @jlandercy I added datasets and part of the code. Commented Dec 17, 2023 at 8:45
  • @Random Davis The method works well for the blue data. The issue with the orange data is that I can only fit it when it starts from data index 15-20. The errors are calculated as parerr=np.sqrt(np.diag(cov)). Commented Dec 17, 2023 at 8:48

2 Answers


TL;DR

You are facing at least three issues:

  • Your complete dataset does not fit your model (the points with x <= 0);
  • The parameters you found do not define a convergent model (increasing exponentials), so they are not a solution for the dataset you showed;
  • You need to help the solver discriminate between the two exponential effects, because your model is redundant and that confuses the solver during optimization (see the sketch right after this list).
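
As a hedged illustration of the first and third points applied to your own data (a sketch only; t0 and x2 stand for the Time and Loss arrays reproduced further down in this answer, and the initial guesses are illustrative):

import numpy as np
from scipy import optimize

def model(x, a, b, c, d, e):
    return a * np.exp(-b * x) + c * np.exp(-d * x) + e

# First point: keep only the decaying part of the signal (x > 0).
mask = t0 >= 0.06                     # t0, x2: Time and Loss arrays below

# Third point: opposite-sign amplitudes and well-separated decay rates
# help the solver discriminate between the two exponential terms.
popt, pcov = optimize.curve_fit(
    model, t0[mask], x2[mask],
    p0=[1., 10., -1., 1., -2.],
    maxfev=10000,
)
print(popt)
print(np.sqrt(np.diag(pcov)))         # check the uncertainties are now sane

Whether this converges to a meaningful optimum still depends on the guesses, which is exactly the third point above.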

MCVE

Let's build an MCVE to highlight the challenges of your fitting problem.

import numpy as np
import matplotlib.pyplot as plt
from scipy import optimize
from sklearn.metrics import r2_score

For the model you referenced, we have the following function:

def model(x, a, b, c, d, e):
    return a * np.exp(-b * x) + c * np.exp(-d * x) + e

This is not trivial to fit because it is redundant (there are two identical exponential terms in it), meaning the solver gets confused when performing gradient descent on the loss from the default unit starting point.
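
To see the redundancy concretely: swapping the pair (a, b) with (c, d) produces exactly the same curve, so the loss surface has two symmetric, equally good minima (and a flat valley wherever b ≈ d). A quick check:

x = np.linspace(-0.5, 2.0, 35)
y1 = model(x, 1.7, 1.1, -0.4, 4.2, 1.0)
y2 = model(x, -0.4, 4.2, 1.7, 1.1, 1.0)
assert np.allclose(y1, y2)   # the two parameter sets are indistinguishable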

We choose parameters that generate data similar to your dataset:

p0 = np.array([1.7, 1.1, -0.4, 4.2, 1.])

Notice that your fitted parameter set does not define a convergent function, because with it both exponentials are increasing. Such a solution certainly cannot describe your data, which is why its uncertainties are huge:

#p0 = np.array([-1622.6092323, -15.038877, 1713.45499496, -15.33871455, -1.92259733])

Now we generate a noisy synthetic dataset:

np.random.seed(12345)
x = np.linspace(-0.5, 2.0, 35)
y = model(x, *p0)
s = 0.01 * np.ones_like(y)
n = s * np.random.normal(size=y.size)
yexp = y + n

Here is the key to the optimization: we tweak a single initial parameter to be negative, in order to help the solver discriminate between the two exponentials and optimize each one against the other:

popt, pcov = optimize.curve_fit(
    model, x, yexp, sigma=s,
    p0=[1, 1, -1, 1, 1],    # Hint: discriminate both effect
    absolute_sigma=True
)    
# array([ 1.70657559,  1.04436978, -0.36944364,  4.29809242,  0.96715622])

spopt = np.sqrt(np.diag(pcov))
# array([0.01738214, 0.03964849, 0.02884002, 0.09962603, 0.01952317])

We can see that in this case the convergence is excellent: we recover the parameters with small uncertainties.
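
To make that quantitative, the relative standard errors are at the level of a few percent, and the true parameters lie within about two standard errors of the estimates:

print(spopt / np.abs(popt))    # roughly 0.01 to 0.08
print((popt - p0) / spopt)     # all within about two standard errors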

yhat = model(x, *popt)
score = r2_score(yexp, yhat)  # 0.9993365605205807

The goodness of fit is also satisfactory:

[Figure: fitted model overlaid on the synthetic dataset]
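
For reference, this figure can be reproduced with a plain matplotlib comparison of the noisy data and the fitted curve:

fig, axe = plt.subplots()
axe.errorbar(x, yexp, yerr=s, fmt=".", label="Data")
axe.plot(x, yhat, label="Fit")
axe.legend()
axe.grid()
plt.show()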

Other models

Increase the order of one of the exponentials

Another model of interest for the points x >= 0, with only 4 parameters, could be the following:

def model(x, a, b, c, d):
    return a * x * np.exp(-b * x) + c *(1. - np.exp(-d * x))

This gives a reasonable fit for the gain curve while having no trouble converging:

t0 = np.array([-0.5, -0.4, -0.3, -0.2, -0.1, 0., 0.06, 0.07, 0.08, 0.09, 0.1, 0.11, 0.12, 0.13, 0.14, 0.15, 0.16, 0.17, 0.18, 0.19, 0.2, 0.21, 0.22, 0.23, 0.24, 0.25, 0.27, 0.3, 0.33, 0.36, 0.39, 0.42, 0.45, 0.48, 0.51, 0.54, 0.57, 0.6, 0.63, 0.66, 0.69, 0.72, 0.75, 0.78, 0.81, 0.84, 0.87, 0.9, 0.93, 0.96, 0.99, 1.02, 1.05, 1.08, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.])
x1 = np.array([0., 0.08362, 0.05064, 0.0498, 0.03706667, 0.27215333, 1.47517, 1.62150333, 1.72074667, 1.85051667, 1.81913, 1.89994, 1.96591667, 1.98604667, 2.08784333, 2.07969333, 2.09007, 2.06054333, 2.1477, 2.08343, 2.08434, 2.02111, 1.96873, 1.96056333, 1.89549667, 1.82184333, 1.70469, 1.47502667, 1.30257667, 1.09063333, 0.95496333, 0.87091333, 0.73064667, 0.59721333, 0.53186333, 0.47225, 0.41566667, 0.36025, 0.31752, 0.30499667, 0.26868, 0.21355667, 0.19169, 0.17645, 0.14872667, 0.10706667, 0.12724667, 0.1013, 0.08720333, 0.09477333, 0.09134, 0.10958, 0.08022667, 0.09363, 0.08076667, 0.09006, 0.09902, 0.07093333, 0.09918, 0.10192, 0.09432333, 0.09079333, 0.10425, 0.09839667])
x2 = np.array([0., -0.12552667, -0.15764, -0.19444, -0.26223, -0.35566333, -1.2022, -1.30988667, -1.44820667, -1.49741333, -1.66169333, -1.72277667, -1.81840667, -1.92772333, -1.97253333, -2.10094667, -2.21817667, -2.24875333, -2.43420667, -2.47216, -2.50555667, -2.62232333, -2.65796667, -2.68747333, -2.69612667, -2.71442, -2.59840333, -2.56071667, -2.44539333, -2.28093667, -2.23737, -2.09984333, -2.07465667, -2.04232667, -1.95166667, -1.93045333, -1.89963333, -1.92825667, -1.98861667, -1.93057667, -2.00662333, -1.91338667, -1.92764, -2.03428, -2.04880667, -1.98233667, -1.90731333, -2.00939667, -1.92832, -1.93799, -1.89133, -1.90807, -1.97912, -1.99034667, -1.95203333, -1.85398667, -1.86426667, -1.85729667, -1.86018667, -1.92405, -1.88595667, -1.88160667, -1.82243333, -1.87084667])

q = (t0 >= 0.)
t0 = t0[q]
x1 = x1[q]
x2 = x2[q]

popt1, pcov1 = optimize.curve_fit(model, t0, x1)
# (array([ 3.85903575e+01,  6.84925174e+00,  2.42361762e-02, -9.12847012e-01]),
#  array([[0.44760132, 0.05386743, 0.01048778, 0.19496296],
#         [0.05386743, 0.00822338, 0.00224154, 0.04191002],
#         [0.01048778, 0.00224154, 0.00249973, 0.04831067],
#         [0.19496296, 0.04191002, 0.04831067, 0.9471787 ]]))
# R2 = 0.9932075193845648

popt2, pcov2 = optimize.curve_fit(model, t0, x2)
# (array([-23.77125047,   5.21925912,   1.95317437,   2.20996689]),
#  array([[ 0.90925921,  0.00741302, -0.04303857,  0.25044186],
#         [ 0.00741302,  0.14738189, -0.02429899,  0.22796133],
#         [-0.04303857, -0.02429899,  0.00861674, -0.05510877],
#         [ 0.25044186,  0.22796133, -0.05510877,  0.43753774]]))
# R2 = 0.8745479038673547
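
The R2 values quoted in the comments above follow directly from each fit's predictions:

print(r2_score(x1, model(t0, *popt1)))   # ~0.993 (gain)
print(r2_score(x2, model(t0, *popt2)))   # ~0.875 (loss)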

[Figure: 4-parameter model fitted to the gain and loss data]

But there is a clear lack of fit for the loss curve. On the other hand, this model converges seamlessly, because it combines two orthogonal functions instead of two copies of the same function.

Sum of logistic functions

@JJacquelin's proposal (see the other answer) clearly offers a better fit for the complete dataset:

def model(x, a, b, c, p, r, q, s):
    return a + b /(1 + np.exp(p * (x - r))) + c / (1 + np.exp(q * (x - s)))

It only requires an educated guess to discriminate both effects (it is again a sum of two identical functions):

popt1, pcov1 = optimize.curve_fit(model, t0, x1, p0=[1, 1, 1, 1, 1, -1, 1])
# R2 = 0.9965718405681585
popt2, pcov2 = optimize.curve_fit(model, t0, x2, p0=[-1, 1, 5, 1, 1, -1, 1])
# R2 = 0.9905364976009451

[Figure: two-logistic model fitted to the gain and loss data]


1 Comment

Bravo for drawing both complete curves on the same graph! 

A model equation made of a sum of two exponentials isn't compatible with the expected shape of the curve (except if one uses partial fits, i.e. a piecewise approach).

One can instead propose combinations of other kinds of functions, such as Gaussian, logistic, etc.

For example, with the sum of two logistic functions:

"Gain" curve :

[Figure: two-logistic fit of the "Gain" curve]

"Loss" curve :

[Figure: two-logistic fit of the "Loss" curve]
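
No code accompanies this answer, but a minimal sketch of such a fit, mirroring the two-logistic model already written out in the first answer, could look as follows (the initial guesses are illustrative, and t and y stand for the Time and Gain arrays from the question):

import numpy as np
from scipy import optimize

def two_logistics(x, a, b, c, p, r, q, s):
    # Baseline a plus two logistic steps with their own slopes and centers
    return a + b / (1. + np.exp(p * (x - r))) + c / (1. + np.exp(q * (x - s)))

# t, y: the Time and Gain arrays from the question (not repeated here)
popt, pcov = optimize.curve_fit(
    two_logistics, t, y,
    p0=[1, 1, 1, 1, 1, -1, 1],   # educated guess to separate the two steps
    maxfev=10000,
)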
