0

In my Postgres 13 DB I want to update a column mother_tongues_id in my table based on name columns.

Names are like "Le Minh Thien" and I want to check if those contain words like 'le' or 'lee' then set nationality to 'Vietnam'. The words check table is some thing like this:

language noun
vietnam +lee+le+long+la+
chinese +lu+zhu+

Here is my UPDATE code:

UPDATE resume
SET mother_tongues_id = (
    SELECT id FROM mother_tongues
    WHERE check_language(resume.id) = id
    limit 1);
WHERE mother_tongues_id IS NULL;

Here is my check_language() function:

CREATE OR REPLACE FUNCTION check_language(rid integer) RETURNS integer AS
$BODY$
    DECLARE i int = (SELECT (LENGTH(CONCAT(first_name, ' ',last_name)) - LENGTH(replace(CONCAT(first_name, ' ',last_name), ' ', ''))) FROM resume where id = rid);
    DECLARE sname varchar = '';
    DECLARE sid int = 0
    LOOP
        SET i = i - 1;
        SET sname = SELECT split_part(SELECT CONCAT(first_name, ' ',last_name) FROM resume  where id = rid, ' ', i)
        if select id from mother_tongues WHERE noun ILIKE '%+sname+%' then SET sid = id;
        end if;
        IF i = 1 THEN
            EXIT;
        END IF;
    END LOOP;
    if sid > 0 then RETURN sid;
    else RETURN NULL;
    end if;
END;

$BODY$ 
LANGUAGE plpgsql;

I try to create a function in Postgres with loop to split the name and check the words, however always error.

2 Answers 2

4

Your function is extremely inefficient, errors aside.
You certainly don't need a function for this to begin with. A plain UPDATE with a correlated subquery matching with the array overlap operator && does the job:

UPDATE resume
SET    mother_tongues_id = (
    SELECT m.id
    FROM   mother_tongues m
    WHERE  string_to_array(lower(concat_ws(' ', r.first_name, r.last_name)), ' ')
       &&  string_to_array(trim(m.noun, '+'), '+')
    LIMIT  1
    )
WHERE  mother_tongues_id IS NULL;

Of course, this is just putting lipstick on a pig. You should really start by fixing your DB schema. As a minimum, convert mother_tongues.noun to text[], so we don't have to do it on the fly. And add a GIN index on that column. See:

Or properly normalize the design.

More fundamentally, deciding nationality based on typical names is a dubious approach. Besides being hardly valid, there can be any number of conflicts, like multiple matches. Currently, LIMIT 1 simply takes an arbitrary match in this case.

If you really need to loop in a PL/pgSQL function (which you don't in this case), read here:

Sign up to request clarification or add additional context in comments.

Comments

0

You have some syntax errors, here is a fix, have a try again.

CREATE OR REPLACE FUNCTION check_language(rid integer) RETURNS integer AS
$BODY$
    DECLARE 
        i int = (SELECT (LENGTH(CONCAT(first_name, ' ',last_name)) - LENGTH(replace(CONCAT(first_name, ' ',last_name), ' ', ''))) FROM resume where id = rid);
        sname varchar(255) = '';
        sid int = 0;
    BEGIN
        LOOP
            SET i = i - 1;
            SET sname = (SELECT split_part(CONCAT(first_name, ' ',last_name), ' ', i) FROM resume  where id = rid);
            if select id from mother_tongues WHERE noun ILIKE '%+sname+%' then 
                SET sid = id;
            end if;
            IF i = 1 THEN
                EXIT;
            END IF;
        END LOOP;
        if sid > 0 then 
            RETURN sid;
        else 
            RETURN NULL;
        end if;
    END;
$BODY$ 
LANGUAGE plpgsql;

UPDATE resume
SET mother_tongues_id = (
    SELECT id FROM mother_tongues
    WHERE check_language(resume.id) = id
    limit 1
)
WHERE mother_tongues_id IS NULL;

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.