When I run COPY my_table FROM 'my_table_source.csv' WITH HEADER CSV; is it possible to capture each row's line number within the CSV file and store it in my target table? I have flat files coming from external sources and going into multiple databases, and being able to trace rows back to their source lines would be useful during the occasional audits done down the road. Thanks.
1 Answer
List the column names in the COPY statement and leave out the serial column, so that it is filled from its sequence:
CREATE TEMP TABLE passwords (
seq serial not null PRIMARY KEY
, name text
, passwd text
, uid integer not null
, gid integer not null
, gcos text
, home text
, shell text
);
COPY passwords(name,passwd,uid,gid,gcos,home,shell)
FROM '/etc/passwd' WITH csv DELIMITER ':' ;
SELECT * FROM passwords
WHERE seq < 10
;
Output:
CREATE TABLE
COPY 48
seq | name | passwd | uid | gid | gcos | home | shell
-----+--------+--------+-----+-------+--------+----------------+-------------------
1 | root | x | 0 | 0 | root | /root | /bin/bash
2 | daemon | x | 1 | 1 | daemon | /usr/sbin | /usr/sbin/nologin
3 | bin | x | 2 | 2 | bin | /bin | /usr/sbin/nologin
4 | sys | x | 3 | 3 | sys | /dev | /usr/sbin/nologin
5 | sync | x | 4 | 65534 | sync | /bin | /bin/sync
6 | games | x | 5 | 60 | games | /usr/games | /usr/sbin/nologin
7 | man | x | 6 | 12 | man | /var/cache/man | /usr/sbin/nologin
8 | lp | x | 7 | 7 | lp | /var/spool/lpd | /usr/sbin/nologin
9 | mail | x | 8 | 8 | mail | /var/mail | /usr/sbin/nologin
(9 rows)
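Applied to the CSV file from the question, the same idea might look like the sketch below. The column names col_a and col_b and the file path are placeholders, not taken from the question. It also assumes a freshly created table, a single COPY, and no quoted fields containing line breaks; only under those assumptions does seq + 1 match the line number in a file with a one-line header.

```sql
-- Hypothetical target table; col_a and col_b stand in for the real CSV columns.
CREATE TABLE my_table (
    seq   serial PRIMARY KEY,  -- filled from the sequence, not from the file
    col_a text,
    col_b text
);

-- List only the columns present in the file so that seq takes its default.
COPY my_table (col_a, col_b)
FROM '/path/to/my_table_source.csv'
WITH (FORMAT csv, HEADER);

-- The header occupies file line 1, so the first data row is file line 2.
SELECT seq + 1 AS file_line, col_a, col_b
FROM my_table
ORDER BY seq;
```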
4 Comments
cstork
The documentation on the serial type says "Because ... serial ... are implemented using sequences, there may be ... gaps in the sequence of values ..., even if no rows are ever deleted." Hence, this behavior does not seem to be guaranteed (even though I never saw a deviation). Is there a way to be sure that the row numbers are gapless?
wildplasser
Short answer: no. If you want a gapless enumeration, you could use row_number() OVER (ORDER BY seq) AS rn
cstork
Seems you're right. :-) Another quote from the fine documentation: "Because nextval and setval calls are never rolled back, sequence objects cannot be used if “gapless” assignment of sequence numbers is needed. It is possible to build gapless assignment by using exclusive locking of a table containing a counter; but this solution is much more expensive than sequence objects, especially if many transactions need sequence numbers concurrently."
cstork
... and using the less expensive row_number() seems sound since "with a cache setting of one it is safe to assume that nextval values are generated sequentially" (referring to the cache option of the implicit CREATE SEQUENCE statement).
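The row_number() suggestion from this comment thread could be written roughly as follows, using the passwords table from the answer. This is only a sketch: rn closes any gaps left in seq, but it reflects the file order only if seq itself was assigned in file order.

```sql
-- Gapless enumeration derived from the (possibly gappy) serial column.
SELECT row_number() OVER (ORDER BY seq) AS rn,
       seq,
       name,
       uid
FROM passwords
ORDER BY seq;
```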
... the PROGRAM feature of COPY and run the file through a program that adds a row number to each line. That would also require adding a field to the table. This would be a variation of the idea from @wildplasser, which uses a SERIAL type field. In either case I'm not sure how reliable this is. It would take just one change in the file to break the relationship. It seems better to have some set of fields that constitutes a primary key.
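A minimal sketch of the PROGRAM variant, under several assumptions: awk is available on the database server, the file lives on the server at the placeholder path shown, the table has an extra integer column (called line_no here, a made-up name), and the role running COPY is allowed to execute server-side programs. Because awk numbers physical lines, quoted fields containing line breaks would break the correspondence.

```sql
-- awk prefixes every physical line with its line number (NR) and a comma.
-- The single quotes around the awk script are doubled inside the SQL string.
COPY my_table (line_no, col_a, col_b)
FROM PROGRAM 'awk ''{ print NR "," $0 }'' /path/to/my_table_source.csv'
WITH (FORMAT csv, HEADER);

-- The numbered header line is skipped by HEADER, so the first data row
-- arrives with line_no = 2, its actual line number in the file.
SELECT line_no, col_a, col_b
FROM my_table
ORDER BY line_no;
```

Unlike the serial approach, the number here comes from the file itself, so it does not depend on sequence behavior or on the table being empty before the load.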