When I run COPY my_table FROM 'my_table_source.csv' WITH HEADER CSV; is it possible to extract each row's line number within the CSV file and store it in my target table? I have flat files that arrive from external sources and are loaded into multiple databases, and being able to trace a row back to its position in the source file would be useful during occasional audits down the road. Thanks.

  • By row number, do you mean the line number in the file? In other words, it is not actually a number in the row, just the position in the file? Commented Nov 24, 2020 at 22:43
  • @AdrianKlaver, yes. That's exactly what I mean. Commented Nov 24, 2020 at 22:45
  • Add column names and omit the name of the serial? Commented Nov 25, 2020 at 0:41
  • @nclu. Then the only way I can think of to do this is to use the PROGRAM feature of COPY and run the file through a program that adds a row number to each line. That would also require adding a field to the table. This would be a variation of the idea from @wildplasser, which uses a SERIAL type field. In either case I'm not sure how reliable this is: it would take just one change in the file to break the link between the stored number and the line. It seems better to have some set of fields per row that constitutes a primary key. Commented Nov 25, 2020 at 15:40
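
The PROGRAM idea from the last comment could be sketched roughly as follows. This is only an illustration under assumptions: the columns line_no, col_a and col_b and the path /path/to/my_table_source.csv are hypothetical, awk has to be installed on the database server, and the role running COPY has to be a superuser or a member of pg_execute_server_program. It also assumes that no quoted CSV field contains an embedded newline, since awk counts physical lines.

```sql
-- Hypothetical target table: line_no is the extra column holding the file position.
CREATE TABLE my_table (
        line_no integer not null
        , col_a text
        , col_b text
        );

-- awk prefixes every line with its line number (NR) and a comma.
-- The header line becomes line 1 and is skipped by HEADER, so data rows start at 2.
COPY my_table(line_no, col_a, col_b)
FROM PROGRAM 'awk ''{ print NR "," $0 }'' /path/to/my_table_source.csv'
WITH (FORMAT csv, HEADER);
```

The stored number is then the physical line in the file as it was at load time. As the comment warns, it stops matching as soon as the file is edited.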

1 Answer

Add a serial column to the table, list the column names in the COPY, and leave the serial column out of that list:


CREATE TEMP TABLE passwords (
        seq serial not null PRIMARY KEY
        , name text
        , passwd text
        , uid integer not null
        , gid integer not null
        , gcos text
        , home text
        , shell text
        );
-- seq is left out of the column list, so each row takes the next sequence value as it is loaded
COPY passwords(name,passwd,uid,gid,gcos,home,shell)
FROM '/etc/passwd' WITH csv DELIMITER ':' ;

SELECT * FROM passwords
WHERE seq < 10
        ;

Output:


CREATE TABLE
COPY 48
 seq |  name  | passwd | uid |  gid  |  gcos  |      home      |       shell       
-----+--------+--------+-----+-------+--------+----------------+-------------------
   1 | root   | x      |   0 |     0 | root   | /root          | /bin/bash
   2 | daemon | x      |   1 |     1 | daemon | /usr/sbin      | /usr/sbin/nologin
   3 | bin    | x      |   2 |     2 | bin    | /bin           | /usr/sbin/nologin
   4 | sys    | x      |   3 |     3 | sys    | /dev           | /usr/sbin/nologin
   5 | sync   | x      |   4 | 65534 | sync   | /bin           | /bin/sync
   6 | games  | x      |   5 |    60 | games  | /usr/games     | /usr/sbin/nologin
   7 | man    | x      |   6 |    12 | man    | /var/cache/man | /usr/sbin/nologin
   8 | lp     | x      |   7 |     7 | lp     | /var/spool/lpd | /usr/sbin/nologin
   9 | mail   | x      |   8 |     8 | mail   | /var/mail      | /usr/sbin/nologin
(9 rows)

4 Comments

The documentation on the serial type says "Because ... serial ... are implemented using sequences, there may be ... gaps in the sequence of values ..., even if no rows are ever deleted." Hence, this behavior does not seem to be guaranteed (even though I never saw a deviation). Is there a way to be sure that the row numbers are gapless?
Short answer: no. If you want a gapless enumeration, you could use row_number() over (ORDER BY seq) AS rn
Seems you're right. :-) Another quote of the fine documentation: "Because nextval and setval calls are never rolled back, sequence objects cannot be used if “gapless” assignment of sequence numbers is needed. It is possible to build gapless assignment by using exclusive locking of a table containing a counter; but this solution is much more expensive than sequence objects, especially if many transactions need sequence numbers concurrently."
... and using the less expensive row_number() seems sound since "with a cache setting of one it is safe to assume that nextval values are generated sequentially" (referring to the cache option of the implicit CREATE SEQUENCE statement).
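
A minimal sketch of the row_number() suggestion, run against the passwords table from the answer. It assumes the rows were loaded by a single COPY, so that seq increases in file order. The alias rn comes from the comment above; the choice of columns is arbitrary.

```sql
-- Derive a gapless position at query time instead of trusting seq itself.
SELECT row_number() OVER (ORDER BY seq) AS rn
        , seq
        , name
        , uid
FROM passwords
ORDER BY seq
        ;
```

Since /etc/passwd has no header line, rn here would correspond to the line number in the file. For a file loaded with HEADER, the file line would be rn + 1.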
