Smoothening of data in excel based on certain rules

Question

I have the following example data:

The figures depend on 3 parameters: X, Y and Rank. From these we make 4 buckets: High X High Y, Low X Low Y, High X Low Y and Low X High Y. We then put the values in each bucket and classify them by Rank. For example, -22.32 is the average of all the records in the Low X, Low Y bucket with Rank 1, -8.67 is the average for Rank 2, and so on.

We have the following rules for the ideal distribution:

For the same Rank :

Low X High Y Figure should be greater than Low X Low Y Figure
High X High Y Figure should be greater than High X Low Y Figure
High X Low Y Figure should be greater than Low X Low Y Figure
High X High Y Figure should be greater than High X High Y Figure

And for the same bucket: the High Rank Figure should be greater than the Low Rank Figure.

As you can see, the data conforms fairly well, but there are 2 outliers: High X, High Y, Rank 2 (marked in red) and Low X, Rank 3 (marked in orange).

If, say, -4.27 becomes -0.4 and -1.51 becomes -1.8 (or -1.79 becomes -1.50), the data will conform to all the rules.

My question: is there an automated way to do this in Excel? I have tried polynomial regression, but it doesn't work because it changes a lot of values, especially at the extremes. I have no issue with the size of the change at each step, as long as the above rules are satisfied for all records. I can't change these values manually because the data is huge and the process has to be repeatable.

How are "Low X Low Y", etc. related to X and Y? It is currently very difficult to see precisely what problem you are trying to solve without some more information.
– DMM, Commented Apr 24 at 4:44
Thank you for your response. I have tried to explain the data more now. Please let me know if it's still not clear.
– Jack Mank, Commented Apr 24 at 12:05
@JackMank Q1: Are your constraints LXLY<LXHY, LXLY<HXLY, LXHY<HXHY, HXLY<HXHY? It appears as though LXLY should have the lowest average, HXHY the greatest, and LXHY and HXLY should be in the middle. Is there any constraint on the relationship between LXHY and HXLY? Q2: The data also suggests you want LXHY<HXLY, but that may just be a quirk of a limited dataset. Q3: Is a pair X,Y allocated to a bucket on the basis of thresholds (say Xt and Yt), so that the allocation is determined by whether X and Y are above or below Xt and Yt respectively? Q4: Is the problem that of setting thresholds?
– DMM, Commented Apr 25 at 17:17
@JackMank Q5: Are the averages the averages of all the X,Y pairs allocated to a bucket?
– DMM, Commented Apr 25 at 17:18

Reinderien · Accepted Answer · 2025-05-13 00:09:58Z

I think you need to graduate from Excel. It technically has linear programming but I trust that less than writing out code to accomplish the same task.

When you say "greater than", how much greater than? Constraint solvers essentially always talk in greater-or-equal, not greater-than; so either use greater-or-equal or choose some epsilon minimum distance between the two constraint terms.
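As a minimal sketch of the epsilon idea (plain Python, independent of any solver; the helper name `strictly_greater` is hypothetical), a strict "greater than" can be approximated as greater-or-equal with a margin. The two figures are the Rank 3 values for High X High Y and High X Low Y from the question:

```python
def strictly_greater(a: float, b: float, epsilon: float) -> bool:
    """Approximate the strict rule a > b as a >= b + epsilon."""
    return a >= b + epsilon


hxhy_rank3, hxly_rank3 = -0.34, -0.64

# The gap between the two figures is 0.30, so the rule holds for a small
# margin but fails once epsilon exceeds that gap.
print(strictly_greater(hxhy_rank3, hxly_rank3, epsilon=0.1))  # True
print(strictly_greater(hxhy_rank3, hxly_rank3, epsilon=0.5))  # False
```

A larger epsilon makes more figures count as violations, so it is probably worth choosing it relative to the precision of the data.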

You can frame this as a mixed-integer linear programming problem that attempts to mutate the figure matrix until the constraints are obeyed while minimizing the number of mutations necessary:

import pandas as pd
import pulp

df = pd.DataFrame({
    'rank': range(1, 6),
    'lxly': (-22.32, -8.67, -1.51, 1.31, 5.47),
    'lxhy': (-20.18, -3.87, -1.79, 2.31, 6.35),
    'hxly': ( -9.79, -6.30, -0.64, 3.41, 6.98),
    'hxhy': ( -0.73, -4.27, -0.34, 4.24, 7.35),
}).set_index('rank')
df.columns.name = 'bucket'
long = df.stack().rename('figure').to_frame()

# 1 if the figure changes, 0 if the figure stays the same
long['change'] = pulp.LpVariable.matrix(
    name='change', indices=long.index, cat=pulp.LpBinary,
)
# what will be used instead of the original figure
long['replacement'] = pulp.LpVariable.matrix(
    name='replacement', indices=long.index, cat=pulp.LpContinuous,
)
replacements = long['replacement'].unstack('bucket')

# Minimize the number of outlier changes needed
prob = pulp.LpProblem(name='smoothing', sense=pulp.LpMinimize)
prob.setObjective(pulp.lpSum(long['change']))

# Big-M room for change on the figure
M = 2*long['figure'].abs().max()

for (rank, bucket), row in long.iterrows():
    # The replacement can only be used for the figure if change=1
    prob.addConstraint(
        name=f'lower_r{rank}_{bucket}',
        constraint=row['replacement'] >= row['figure'] - M*row['change'],
    )
    prob.addConstraint(
        name=f'upper_r{rank}_{bucket}',
        constraint=row['replacement'] <= row['figure'] + M*row['change'],
    )

# Minimum change for "greater than"
epsilon = 0.1

for rank, row in replacements.iterrows():
    # OP original bucket pair constraints
    prob.addConstraint(
        name=f'r{rank}_pair_a', constraint=row['lxhy'] >= row['lxly'] + epsilon,
    )
    prob.addConstraint(
        name=f'r{rank}_pair_b', constraint=row['hxhy'] >= row['hxly'] + epsilon,
    )
    prob.addConstraint(
        name=f'r{rank}_pair_c', constraint=row['hxly'] >= row['lxly'] + epsilon,
    )
    # constraint D is nonsense

for bucket, col in replacements.items():
    for i_start in range(len(col) - 1):
        select = slice(i_start, i_start + 2)
        first, second = col.iloc[select]
        rank_first, rank_second = col.index[select]
        prob.addConstraint(
            name=f'{bucket}_r_{rank_first}_{rank_second}',
            constraint=first + epsilon <= second,
        )

print(prob)
prob.solve()
assert prob.status == pulp.LpStatusOptimal

long['change'] = long['change'].apply(pulp.value)
long['replacement'] = long['replacement'].apply(pulp.value)
print('Outlier changes:')
print(long.loc[long['change'] > 0.5, ['figure', 'replacement']])

(It's also possible to, in one line, import that data frame from Excel.)

The output is:

Outlier changes:
             figure  replacement
rank bucket                     
1    hxhy     -0.73        -9.69
3    lxly     -1.51        -8.57
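
To sanity-check a result like this without rerunning the solver, a minimal sketch (plain Python; the helper `violations` and the tolerance `TOL` are illustrative assumptions, not part of PuLP) can list which rules a figure table breaks. It checks pair constraints A to C and the rank ordering, skips constraint D for the same reason as the model, and allows a small floating-point tolerance because the replacements sit exactly on the epsilon boundary:

```python
EPSILON = 0.1  # same minimum gap as in the model above
TOL = 1e-9     # assumed floating-point tolerance for boundary cases

figures = {
    'lxly': [-22.32, -8.67, -1.51, 1.31, 5.47],
    'lxhy': [-20.18, -3.87, -1.79, 2.31, 6.35],
    'hxly': [-9.79, -6.30, -0.64, 3.41, 6.98],
    'hxhy': [-0.73, -4.27, -0.34, 4.24, 7.35],
}
# (greater bucket, lesser bucket) for pair constraints A, B and C
PAIRS = [('lxhy', 'lxly'), ('hxhy', 'hxly'), ('hxly', 'lxly')]


def violations(table, epsilon=EPSILON, tol=TOL):
    """Return a list of human-readable descriptions of broken rules."""
    found = []
    n_ranks = len(next(iter(table.values())))
    # Bucket-pair rules, checked rank by rank
    for i in range(n_ranks):
        for greater, lesser in PAIRS:
            if table[greater][i] < table[lesser][i] + epsilon - tol:
                found.append(f'rank {i + 1}: {greater} vs {lesser}')
    # Within each bucket, every rank must exceed the previous one
    for bucket, column in table.items():
        for i in range(n_ranks - 1):
            if column[i + 1] < column[i] + epsilon - tol:
                found.append(f'{bucket}: rank {i + 1} vs rank {i + 2}')
    return found


print(violations(figures))

# Apply the two replacements reported by the solver (rounded as printed).
fixed = {bucket: list(column) for bucket, column in figures.items()}
fixed['hxhy'][0] = -9.69
fixed['lxly'][2] = -8.57
print(violations(fixed))
```

For the original figures this reports the rank 3 lxhy/lxly pair and the hxhy rank 1/rank 2 ordering; for the adjusted table the list is empty.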
