
I am currently working on a task that will migrate data from one PostgreSQL database to another PostgreSQL database. One field's data needs to be split into three columns (e.g. father_name needs to be split into f_name, f_middle_name, f_last_name). I searched the net and I think I can use string_to_array for this task. Now my problem is how to assign the elements of the array to the fields of the destination DB (the destination DB has f_name, f_middle_name, f_last_name while the source DB only has a father_name field).

    cur_t.execute("""
        SELECT TRANSLATE(studentnumber, '- ', ''), string_to_array(father_name, ' ')
        FROM source_table  -- placeholder; the source table name isn't shown here
    """)
    for row in cur_t:
        cur_p.execute("""INSERT INTO "a_recipient" (student_id, f_name, f_middle_name, f_last_name)
            VALUES (%s, %s, %s, %s)""", (row[0], row[1][0], row[1][1], row[1][2]))

I just don't know how to access the array elements by index and assign them as values for the destination fields.

References: string_to_array

Any suggestions?

  • string_to_array is great for inline SQL use - you don't need to select into Python and then insert the result from it. Commented May 3, 2018 at 7:26
  • Hi, do you think it's okay to use string_to_array for this task or can you recommend something else? Commented May 3, 2018 at 7:27

2 Answers


While it is possible to turn an array into a set of columns, you won't have a fixed set of columns. For example, if you split father_name into three pieces that's fine for John Wilkes Booth, but what about Yarrow Hock? Or Beyoncé? Or Bernal Diaz Del Castillo? You need something more intelligent than just splitting on whitespace.

While you could write something in PostgreSQL, probably as a stored procedure, it's easier, though slower, to do the data transforms in Python. Since you have to run the data through Python anyway (or do something complicated to link the two databases), and since this is (hopefully) a one-time thing, performance isn't critical.

I'm not very good at Python, but it would be something like this.

cur_t.execute("""SELECT studentnumber, father_name FROM something""")

for row in cur_t:
    father = parse_name(row['father_name'])
    student_id = fix_studentnumber(row['studentnumber'])

    cur_p.execute("""
        INSERT INTO "a_recipient" (student_id, f_name, f_middle_name, f_last_name)
        VALUES (%s, %s, %s, %s)
        """, (student_id, father['first'], father['middle'], father['last'])
    )

Then you'd write parse_name and fix_studentnumber and any other necessary functions to clean up the data in Python. And you can unit test them.
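
For example, a minimal sketch of what those two helpers could look like, assuming a naive whitespace split (per the caveats above, real names will need smarter rules):

def parse_name(full_name):
    # Naive whitespace split: first word -> first name, last word -> last name,
    # anything in between -> middle name. Missing pieces become empty strings.
    parts = (full_name or "").split()
    return {
        'first': parts[0] if parts else '',
        'middle': " ".join(parts[1:-1]) if len(parts) > 2 else '',
        'last': parts[-1] if len(parts) > 1 else '',
    }

def fix_studentnumber(studentnumber):
    # Same cleanup the question did in SQL with TRANSLATE(studentnumber, '- ', '').
    return studentnumber.replace("-", "").replace(" ", "")

With this, parse_name('Yarrow Hock') returns an empty middle name instead of raising an IndexError the way row[1][2] would, which is exactly the kind of case a blind three-way split mishandles.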

Note: because accessing columns by number (i.e. row[5]) is difficult to read and maintain, you'll probably want to use conn_t.cursor(cursor_factory=psycopg2.extras.DictCursor) so you can access columns by name, as I have above.
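
For completeness, the connection setup for that would look roughly like this (the connection strings here are placeholders for your real source and destination databases):

import psycopg2
import psycopg2.extras

conn_t = psycopg2.connect("dbname=source_db")  # source (placeholder DSN)
conn_p = psycopg2.connect("dbname=dest_db")    # destination (placeholder DSN)

# DictCursor lets you write row['father_name'] instead of row[1].
cur_t = conn_t.cursor(cursor_factory=psycopg2.extras.DictCursor)
cur_p = conn_p.cursor()

# ...run the loop above, then make the inserts permanent:
conn_p.commit()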

3 Comments

Noted, thanks for this. This needs to import extras from psycopg2, right? I will study this.
A question about the father and student_id variables inside the loop: do I have to do this for all the fields to clean up the data? Also, where'd you get parse_name and fix_studentnumber? Sorry
Got it, my bad. Long day.

Why not do it directly in SQL:

vao@so=# create table so12(a text, b text, c text);
CREATE TABLE
vao@so=# with a(i) as (values('1,2,5'))
, s as (select string_to_array(i,',') ar from a)
insert into so12 select ar[1],ar[2],ar[3] from s;
INSERT 0 1
vao@so=# select * from so12;
┌───┬───┬───┐
│ a │ b │ c │
├───┼───┼───┤
│ 1 │ 2 │ 5 │
└───┴───┴───┘
(1 row)

Update: I missed the point that this happens across several databases, so you would need to use dblink or create a postgres_fdw foreign table. Both would still be much faster than selecting into an array and then looping through rows with an insert into .. values(..) statement.
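
For instance, a rough sketch of the dblink variant, driven from Python only to issue a single statement (the DSNs, the source table name and the space delimiter are assumptions; with postgres_fdw you would query a foreign table instead of calling dblink()):

import psycopg2

# Connect to the destination database and let Postgres pull the rows from the
# source database via dblink, split the name and insert, all in one statement.
with psycopg2.connect("dbname=dest_db") as conn_p:
    with conn_p.cursor() as cur_p:
        cur_p.execute("CREATE EXTENSION IF NOT EXISTS dblink")
        cur_p.execute("""
            INSERT INTO "a_recipient" (student_id, f_name, f_middle_name, f_last_name)
            SELECT TRANSLATE(studentnumber, '- ', ''), parts[1], parts[2], parts[3]
            FROM (SELECT studentnumber, string_to_array(father_name, ' ') AS parts
                  FROM dblink('dbname=source_db',
                              'SELECT studentnumber, father_name FROM source_table')
                       AS t(studentnumber text, father_name text)) s
        """)

The data never round-trips through Python this way, which is why it beats row-by-row insert into .. values(..).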

4 Comments

I have 32 columns in this table. We're planning to automate this since we might have other tasks like this in the future.
still - selecting into Python and then inserting from it does not make much sense, I'd say
@VaoTsun Note they're selecting from one database connection and inserting to another.
@Schwern yes - thank you for your point. I'm still sure insert values will be slower
