I'm plotting errors obtained for various mesh sizes to study the order of convergence of the model. As shown in the two plots below the "good regime" is not always present, either because the mesh is too coarse (Fig. 1) or either we reach floating point precision (Fig. 2).
So computing the linear regression on the whole data would not yield correct results.
One solution would be to manually remove the undesired points and then compute the linear regression (with sklearn.linear_model for instance) but as I have about 144 data frames to deal with, it will take a lot of time!
I did not find such a feature in sklearn.linear_model. Does it exist? Or is there a module that does it?
I also imagined a workaround consisting of computing all the slopes for consecutive points, and keeping for the actual linear regression all the values that are « the most frequent » but I did not manage to implement this.
On the two figures, I underlined the points I'd like to keep for the linear regression in red.
Here is the code I used to get the plot:
import plotly.graph_objects as go
from io import StringIO
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
import plotly.express as px
df = pd.read_csv(StringIO("""hsize,errorl2
0.1,0.0126488001839
0.05,0.0130209787153
0.025,0.0111789092046
0.0125,0.00766945668181
0.00625,0.0012502455118
0.003125,0.000268314690697
"""))
def linear_regression_log(x, y): # Compute the linear regression for log-log data, but here all the points are used in the computation
keep = ~np.isnan(y) # in my case, some data are missing
log_x = np.log(np.array(x[keep]))
log_y = np.log(np.array(y[keep]))
model = LinearRegression()
model.fit(log_x.reshape(-1, 1), log_y)
slope = model.coef_[0]
intercept = model.intercept_
return slope, intercept
# Sample data
x = df['hsize']
y = df['errorl2']
a, _ = linear_regression_log(x, y)
fig = go.Figure()
fig.add_trace(go.Scatter(x=x, y=y, mode='markers+lines'))
fig.update_layout(title=f"Slope: {a:.2f}")
fig.update_xaxes(type="log")
fig.update_yaxes(type="log")
fig.show()
And the data I plotted:
- Fig. 1:
hsize,errorl2
0.1,0.0126488001839
0.05,0.0130209787153
0.025,0.0111789092046
0.0125,0.00766945668181
0.00625,0.0012502455118
0.003125,0.000268314690697
- Fig. 2:
hsize,errorl2
0.1,0.000713407986653
0.05,1.60793872143e-06
0.025,6.20078712336e-11
0.0125,2.99238475669e-13
0.00625,1.1731644955e-13
0.003125,1.88186766825e-13

