I have various pump performance data, and I am trying to fit curves as a combination of them.
In below image, different curves are at different pump speeds (RPM), and the Y axis is Pressure head, and X axis is flow rate.
I am trying to figure out the following functions:
pressure = func (flowrate, rpm)
flowrate = func (pressure, rpm)
rpm = func (flowrate, pressure)
Usually the pump curves follow some form of affinity laws, as
- flow rate is proportional to rpm.
- pressure is proportional to square of rpm.
However in reality - they do not, and there will be a difference between experimental data & theoretical data. So that is why I am looking to fit a function that looks like this: (open to other suggestions) - like 3rd degree polynomial as well.
Y = a*x1**2 + b*x1+ c*x2**2 + d*x2 + e*x1*x2 + f
However, I am getting the wrong results with scipy - curve_fit. I am not in favor of employing sklearn regression as we would not be able to figure out the coefficients.
Is there a way one can employ this in Python?
See my code below so far:
from numpy import genfromtxt
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
import pandas as pd
# df = pd.read_excel('./nominal_data/pump_curve.xlsx', sheet_name =0)
df = pd.DataFrame({
'Flow(lpm)': [128.0846942, 105.7579874, 95.11146262, 69.57902038, 53.25344504, 35.47260492, np.nan, 131.96, 110.57, 91.32, 73.02, 53.9, 20.41, np.nan, 116.06, 99.46, 83.7, 68.84, 54.47, 20.98, np.nan, 103.0, 87.6, 73.6, 57.8, 44.0, 19.49, np.nan, 86.2, 73.1, 56.0, 42.6, 28.1, 16.33, np.nan, 56.2, 47.1, 38.6, 30.9, 24.0, 12.3],
'Speed Feedback (RPM)': [5204.0, 5208.0, 5206.0, 5206.0, 5176.0, 5175.0, np.nan, 4710.72, 4706.4, 4714.93, 4687.11, 4691.0, 4602.0, np.nan, 4103.21, 4115.26, 4147.8, 4148.14, 4141.09, 4124.72, np.nan, 3675.89, 3657.88, 3673.73, 3671.41, 3675.27, 3664.88, np.nan, 3118.66, 3186.23, 3106.92, 3107.19, 3114.69, 3090.08, np.nan, 2077.44, 2073.23, 2062.01, 2069.37, 2068.02, 2067.91],
'dP (PSI)': [16.5, 25.34, 28.78, 35.45, 37.86, 38.87, np.nan, 8.85, 17.01, 23.42, 27.48, 30.5, 32.4, np.nan, 6.69, 11.84, 17.24, 20.16, 22.64, 25.81, np.nan, 5.2, 9.6, 13.2, 16.3, 18.1, 20.38, np.nan, 3.7, 6.5, 10.0, 12.1, 13.5, 14.54, np.nan, 1.2, 2.7, 3.7, 4.7, 5.2, 6.3]
})
flow_lpm = df['Flow(lpm)'].to_numpy()
rpm = df['Speed Feedback (RPM)'].to_numpy()
dP = df['dP (PSI)'].to_numpy()
X = (flow_lpm, rpm)
y = dP
def f(X, a, b, c, d, e, f):
x_1, x_2 = X
return a*x_1**2 + b*x_1 + c*x_2**2 + d*x_2 + e*x_1*x_2 + f
param, param_cov = curve_fit(f, X, y)
# Test a random data point.
flow_test = 54
rpm_test = 5195
dp_test = param[0]*flow_test**2 + param[1]*flow_test + param[2]*rpm_test**2 + param[3]**rpm_test + param[4]*flow_test*rpm_test + param[5]
print (dp_test)
I could have multiple curves of Pressure = function (flow rate) at multiple RPMs like the figure shows. For example there is a 3rd degree curve for a 5100 RPM curve. However it only works for a given RPM. The curve is different for a 2070 RPM situation.
Seeking suggestions on proper curve fitting methodology. Open to deleting end points, to ensure the curve fit works in the middle range.


df.to_dict(orient='list'). This is makes the post more useful to future users in case the link breaks at some point.