Postgres has a great RETURNING clause for INSERT, DELETE and UPDATE...and it's made me a bit greedy. In a few cases, what I'd like to get is not only the current value, but the previous value:
UPDATE analytic_productivity
SET points = 1000
WHERE points > 1000
RETURNING id, points, OLD.points;
I don't believe there's any way to access previous values outside of the lifespan and context of a trigger. So, I'll guess what I'd like isn't possible as such. If that's right, can anyone suggest an alternative? I'm overwriting outliers with some set values, and would like to record the modified values in another table. This is why I don't know the current value in advance. This is a rare (and clearly suspect) operation, and I don't want to record the change on normal inserts and updates.
As an alternative, I'm thinking that I can select the outliers, revise them, and then write back the modifications. So, do most of the work on the client side with a couple of requests to Postgres. If so, can someone suggest the right locking level to apply between my initial SELECT and my following UPDATE? I believe that the FOR UPDATE lock is right.
Any suggestions on a smart way to capture previous values, during an update, without a trigger would be great to hear about.
Follow-up
Thanks to comments here, I experimented a bit and came up with a solution that works in my case. To make my objectives clearer:
- I've got a table named
outlier_rulethat defines values that are too high for a specific column. - The goal is to loop over the table, and apply the rules to set outliers to a fixed value.
- Stomping on outliers like this is...questionable. There must be leaks in the app's UI that allow for unreasonable values. To help track these down, I'm recording the large values in a table named
outlier_change. - I'd like to push this behavior into server-side function so that any of our servers, regardless of their codebase version, can invoke the current logic.
- The client servers compose and send an email with a result summary, when outliers are found and corrected.
So, a server-side function to do everything, log some data, and return a result. I've got that working, but it's got the smell of You Don't Know What You're Doing So Just Keep Adding Code Until it Works. I've at least got a better handle on using FORMAT and think I understand now that a single function can do many things, and that you can choose what to return with the RETURN clause. For reference, the various bits of code:
CREATE TABLE IF NOT EXISTS data.outlier_rule (
id uuid NOT NULL DEFAULT extensions.gen_random_uuid(),
schema_name text NOT NULL DEFAULT NULL,
table_name text NOT NULL DEFAULT NULL,
column_name text NOT NULL DEFAULT NULL,
threshold integer,
set_to integer,
CONSTRAINT outlier_rule_id_pkey
PRIMARY KEY (schema_name,table_name,column_name)
);
For tracking the modifications, I've got a second table named outlier_change:
------------------------------
-- Table
------------------------------
DROP TABLE IF EXISTS data.outlier_change CASCADE;
CREATE TABLE IF NOT EXISTS data.outlier_change (
id uuid NOT NULL DEFAULT NULL,
outlier_rule_id uuid NOT NULL DEFAULT NULL,
value_was integer NOT NULL DEFAULT NULL,
set_to integer NOT NULL DEFAULT NULL,
change_count integer NOT NULL DEFAULT 0,
last_changed_dts timestamptz NOT NULL DEFAULT NOW(),
CONSTRAINT outlier_change_id_pkey
PRIMARY KEY (id,outlier_rule_id)
);
ALTER TABLE data.outlier_change OWNER TO user_change_structure;
------------------------------
-- Trigger Function
------------------------------
CREATE OR REPLACE FUNCTION data.on_outlier_change_upsert()
RETURNS pg_catalog.trigger AS $BODY$
BEGIN
NEW.last_changed_dts := NOW();
NEW.change_count := OLD.change_count + 1;
RETURN NEW; -- important!
END;
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100;
------------------------------
-- Trigger
------------------------------
CREATE TRIGGER outlier_change_upsert BEFORE INSERT OR UPDATE ON data.outlier_change
FOR EACH ROW
EXECUTE PROCEDURE data.on_outlier_change_upsert();
DROP FUNCTION IF EXISTS data.outlier_fix ();
CREATE OR REPLACE FUNCTION data.outlier_fix ()
RETURNS TABLE (
schema_name text,
table_name text,
column_name text,
id uuid,
value_was integer,
set_to integer,
change_count integer
)
AS $$
DECLARE
rule record;
now_ timestamptz = NOW();
BEGIN
FOR rule IN SELECT * FROM data.outlier_rule LOOP
EXECUTE FORMAT (
'INSERT INTO outlier_change (
outlier_rule_id,
set_to,
id,
value_was)
SELECT %6$L,
%5$s,
%2$I.id,
%2$I.%3$I
FROM %1$I.%2$I
WHERE %3$I > %4$s
ON CONFLICT(id,outlier_rule_id) DO UPDATE SET
value_was = EXCLUDED.value_was,
set_to = EXCLUDED.set_to
RETURNING outlier_rule_id,
id,
value_was,
set_to
change_count;
UPDATE %1$I.%2$I
SET %3$I = %5$s
WHERE %3$I > %4$s;',
rule.schema_name,
rule.table_name,
rule.column_name,
rule.threshold,
rule.set_to,
rule.id);
END LOOP;
RETURN QUERY EXECUTE ('
SELECT outlier_rule.schema_name,
outlier_rule.table_name,
outlier_rule.column_name,
outlier_change.id,
outlier_change.value_was,
outlier_change.set_to,
outlier_change.change_count
FROM outlier_change
JOIN outlier_rule ON (outlier_rule.id = outlier_change.outlier_rule_id)
WHERE last_changed_dts = $1')
USING now_;
END;
$$ LANGUAGE plpgsql;
ALTER FUNCTION data.outlier_fix() OWNER TO user_bender;
INSERT INTO another_table SELECT * FROM analytic_productivity WHERE points > 1000;, then run yourUPDATEstatement without caring forOLDvalues. Sure, you'd need to duplicate the outlier condition, but I doubt that matters.