Function with two variables - Polynomial fit Python

Question

I have various pump performance data, and I am trying to fit curves as a combination of them.

In below image, different curves are at different pump speeds (RPM), and the Y axis is Pressure head, and X axis is flow rate.

I am trying to figure out the following functions:

pressure = func (flowrate, rpm)
flowrate = func (pressure, rpm)
rpm = func (flowrate, pressure)

Usually the pump curves follow some form of affinity laws, as

flow rate is proportional to rpm.
pressure is proportional to square of rpm.

However in reality - they do not, and there will be a difference between experimental data & theoretical data. So that is why I am looking to fit a function that looks like this: (open to other suggestions) - like 3rd degree polynomial as well.

Y = a*x1**2 + b*x1+ c*x2**2 + d*x2 + e*x1*x2 + f

However, I am getting the wrong results with scipy - curve_fit. I am not in favor of employing sklearn regression as we would not be able to figure out the coefficients.

Is there a way one can employ this in Python?

See my code below so far:

from numpy import genfromtxt
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
import pandas as pd
# df = pd.read_excel('./nominal_data/pump_curve.xlsx', sheet_name =0)
df = pd.DataFrame({
    'Flow(lpm)': [128.0846942, 105.7579874, 95.11146262, 69.57902038, 53.25344504, 35.47260492, np.nan, 131.96, 110.57, 91.32, 73.02, 53.9, 20.41, np.nan, 116.06, 99.46, 83.7, 68.84, 54.47, 20.98, np.nan, 103.0, 87.6, 73.6, 57.8, 44.0, 19.49, np.nan, 86.2, 73.1, 56.0, 42.6, 28.1, 16.33, np.nan, 56.2, 47.1, 38.6, 30.9, 24.0, 12.3],
    'Speed Feedback (RPM)': [5204.0, 5208.0, 5206.0, 5206.0, 5176.0, 5175.0, np.nan, 4710.72, 4706.4, 4714.93, 4687.11, 4691.0, 4602.0, np.nan, 4103.21, 4115.26, 4147.8, 4148.14, 4141.09, 4124.72, np.nan, 3675.89, 3657.88, 3673.73, 3671.41, 3675.27, 3664.88, np.nan, 3118.66, 3186.23, 3106.92, 3107.19, 3114.69, 3090.08, np.nan, 2077.44, 2073.23, 2062.01, 2069.37, 2068.02, 2067.91],
    'dP (PSI)': [16.5, 25.34, 28.78, 35.45, 37.86, 38.87, np.nan, 8.85, 17.01, 23.42, 27.48, 30.5, 32.4, np.nan, 6.69, 11.84, 17.24, 20.16, 22.64, 25.81, np.nan, 5.2, 9.6, 13.2, 16.3, 18.1, 20.38, np.nan, 3.7, 6.5, 10.0, 12.1, 13.5, 14.54, np.nan, 1.2, 2.7, 3.7, 4.7, 5.2, 6.3]
})
flow_lpm = df['Flow(lpm)'].to_numpy()
rpm = df['Speed Feedback (RPM)'].to_numpy()
dP = df['dP (PSI)'].to_numpy()
X = (flow_lpm, rpm)
y = dP

def f(X, a, b, c, d, e, f):
    x_1, x_2 = X
    return a*x_1**2 + b*x_1 + c*x_2**2 + d*x_2 + e*x_1*x_2 + f
param, param_cov = curve_fit(f, X, y)

# Test a random data point. 
flow_test = 54
rpm_test = 5195

dp_test = param[0]*flow_test**2 + param[1]*flow_test + param[2]*rpm_test**2 + param[3]**rpm_test + param[4]*flow_test*rpm_test + param[5]
print (dp_test)

I could have multiple curves of Pressure = function (flow rate) at multiple RPMs like the figure shows. For example there is a 3rd degree curve for a 5100 RPM curve. However it only works for a given RPM. The curve is different for a 2070 RPM situation.

Seeking suggestions on proper curve fitting methodology. Open to deleting end points, to ensure the curve fit works in the middle range.

datafile_imagefile

Why don't you fit a quadratic like P = aQ^2 + bQ +c (where P is pressure or head and Q is flow rate) for each rotation rate N. Then you can separately fit coefficients a, b and c as functions of N. BTW, your understanding of the affinity laws for speed-scaling is not entirely accurate. — lastchance
– lastchance, Commented Mar 6 at 22:52
I added the data from the link into the post, using df.to_dict(orient='list'). This is makes the post more useful to future users in case the link breaks at some point. — Nick ODell
– Nick ODell, Commented Mar 7 at 17:09
"I am not in favor of employing sklearn regression as we would not be able to figure out the coefficients." You can get those coefficients using LinearRegression.coef_: scikit-learn.org/stable/modules/generated/… — Nick ODell
– Nick ODell, Commented Mar 7 at 17:22

jlandercy · Accepted Answer · 2025-03-08 08:12:15Z

With the following dataset you provided:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import optimize

data = pd.DataFrame({
    'Flow(lpm)': [128.0846942, 105.7579874, 95.11146262, 69.57902038, 53.25344504, 35.47260492, np.nan, 131.96, 110.57, 91.32, 73.02, 53.9, 20.41, np.nan, 116.06, 99.46, 83.7, 68.84, 54.47, 20.98, np.nan, 103.0, 87.6, 73.6, 57.8, 44.0, 19.49, np.nan, 86.2, 73.1, 56.0, 42.6, 28.1, 16.33, np.nan, 56.2, 47.1, 38.6, 30.9, 24.0, 12.3],
    'Speed Feedback (RPM)': [5204.0, 5208.0, 5206.0, 5206.0, 5176.0, 5175.0, np.nan, 4710.72, 4706.4, 4714.93, 4687.11, 4691.0, 4602.0, np.nan, 4103.21, 4115.26, 4147.8, 4148.14, 4141.09, 4124.72, np.nan, 3675.89, 3657.88, 3673.73, 3671.41, 3675.27, 3664.88, np.nan, 3118.66, 3186.23, 3106.92, 3107.19, 3114.69, 3090.08, np.nan, 2077.44, 2073.23, 2062.01, 2069.37, 2068.02, 2067.91],
     'dP (PSI)': [16.5, 25.34, 28.78, 35.45, 37.86, 38.87, np.nan, 8.85, 17.01, 23.42, 27.48, 30.5, 32.4, np.nan, 6.69, 11.84, 17.24, 20.16, 22.64, 25.81, np.nan, 5.2, 9.6, 13.2, 16.3, 18.1, 20.38, np.nan, 3.7, 6.5, 10.0, 12.1, 13.5, 14.54, np.nan, 1.2, 2.7, 3.7, 4.7, 5.2, 6.3]
})

Remove nans:

data = data.dropna()

Define your model as follow:

def model(x, a, b, c, d, e, f):
    return a * x[:,0] ** 2 + b * x[:,0] + c * x[:,1] ** 2 + d * x[:,1] + e * x[:,0] * x[:,1] + f

Then optimize:

popt, pcov = optimize.curve_fit(model, data.iloc[:,:2].values, data.iloc[:,2].values)

Now we can regress the model on a finer grid:

x0lin = np.linspace(data.iloc[:,0].min(), data.iloc[:,0].max(), 200)
x1lin = np.linspace(data.iloc[:,1].min(), data.iloc[:,1].max(), 200)
X0, X1 = np.meshgrid(x0lin, x1lin)

Z = model(np.array([X0.ravel(), X1.ravel()]).T, *popt).reshape(X0.shape)

It renders as follow:

fig, axe = plt.subplots(subplot_kw={"projection": "3d"})
axe.scatter(*data.values.T)
axe.plot_surface(X0, X1, Z, cmap="viridis", alpha=0.35)

Michael Hodel · Accepted Answer · 2025-03-06 22:58:20Z

-1

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

df = pd.read_excel('pump_curve.xlsx', sheet_name =0)
columns = ['Flow(lpm)', 'Speed Feedback (RPM)', 'dP (PSI)']
df = df[columns].dropna()
columns2 = ['lpm', 'rpm', 'psi']
df = df.rename(columns=dict(zip(columns, columns2)))

def f(X, a, b, c, d, e, f):
    x_1, x_2 = X
    return a*x_1**2 + b*x_1 + c*x_2**2 + d*x_2 + e*x_1*x_2 + f

fig, ax = plt.subplots(1, 3, figsize=(15, 5))
for idx, name in enumerate(columns2):
    y = df[name].to_numpy()
    name1, name2 = columns2[:idx]+columns2[idx+1:]
    X = (df[name1].to_numpy(), df[name2].to_numpy())
    param, param_cov = curve_fit(f, X, y)
    ax[idx].plot(f(X, *param), label='prediction');
    ax[idx].plot(y, label='data');
    ax[idx].legend();
    ax[idx].set_title(f'{name} = f({name1}, {name2})');
plt.show();

answered Mar 6 at 22:58

Michael Hodel

3,0381 gold badge8 silver badges12 bronze badges

1 Comment

Reinderien Mar 7 at 4:10

Please add an explanation (and read meta.stackoverflow.com/a/405940/313768 ).

Collectives™ on Stack Overflow

Function with two variables - Polynomial fit Python

2 Answers 2

Comments

1 Comment

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

Comments

1 Comment

Your Answer

Sign up or log in

Post as a guest

Related