7

I am trying to insert records into postgres DB, and its taking about 3 hours while it takes 40seconds using python psycopg2 and cursor.copy_from method

What is wrong with my code, using clojure.java.jdbc/db-do-prepared also takes about 3 hours too. Please help!

File size is 175M and it has 409,854 records

(defn-
  str<->int [str]
  (let [n (read-string str)]
    (if (integer? n) n)))

(with-open [file (reader "/path/to/foo.txt")]
    (try
      (doseq [v (clojure-csv.core/parse-csv file)]

        (clojure.java.jdbc/insert! db  :records 
                      nil
                      [(v 0) (v 1) (v 2) (str<->int (v 3))]))
      (println "Records inserted successfully")
      (Exception e
        (println (.getNextException e) e))))

2 Answers 2

3

It's probably due to not using batching in your Clojure version. You insert rows one by one each triggering the commit.

If you want to do it in Clojure than you need to partition rows from CSV files and insert! each chunk as one batch commit. You need to use the last arity version accepting multiple col-val-vecs. Sample code (not checked, just to show the idea):

(defn row->col-spec [row]
  [(v 0) (v 1) (v 2) (str<->int (v 3))])

(with-open [csv-file (reader "/path/to/foo.txt")]
  (try
    (->> csv-file
         (clojure-csv.core/parse-csv)
         (map row->col-spec)
         (partition 50)
         (map (fn [batch] clojure.java.jdbc/insert! db :records ["col1" "col2" "col3" "col4"] batch))
         (dorun))
    (catch Exception e
      (println e))))

If you don't have to do it in Clojure then using psql's COPY command seems to be the easiest and fastest option:

COPY records FROM '/path/to/foo.txt' WITH (FORMAT csv, DELIMITER ',',  NULL 'NULL');
Sign up to request clarification or add additional context in comments.

8 Comments

I'm using Clojure version 1.8.0, can you share an example of how it can be done in clojure
I have added a code example in Clojure - test it as I didn't run it.
Well, I used the PSQL Copy command you suggested, didn't even need to cast each field, this is indeed easier and faster. Thanks
What if I want to run the copy command as \copy, how will I go about it in clojure considering you can't do "\copy" when preparing the query string
@JobSam \copy is not an SQL command, it's a psql "meta-command" and it's not available via JDBC.
|
1

After 4 years, decided to come back to this problem and share a guide to the solution, I am sure this will help someone get started.

You can take a look at clojure.java.jdbc/insert-multi! and edit appropriately to suite the column types in your database

(let [from "/path/to/foo.txt"
      to "/path/to/temp/foo.txt"]
  (with-open [reader (io/reader from)
              writer (io/writer to)]
    (doall
      (->> (csv/read-csv reader)
           ;(drop 1)   ;if theres header
           (map #(list (nth % 0 nil) (nth % 2 nil)  (nth % 3 nil)))
           (csv/write-csv writer))))
  (let [fstream (slurp to)
        streamarray (map #(str/split % #",")
                         (str/split-lines fstream))]
    (clojure.java.jdbc/insert-multi! pg-db              ;connection or {:datasource hk-cp}
                                     :tbl_cdrs_da                 ;table name
                                     [:origin_node_type :origin_transaction_id :da_ua_id] ;colums
                                     streamarray)))               ;array

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.