
I have a PostgreSQL table with ~50 million rows, and I want to write Go code that selects ~1 million rows from it and processes them in an efficient way.

Previously I used Node.js with the NPM module pg-query-stream to generate a readable stream of the records found, so I could process them like any other readable object stream.

Here is the simplified code I used to process the data:


const pg = require('pg');
const QueryStream = require('pg-query-stream');

 
//pipe 1,000,000 rows to stdout without blowing up your memory usage
pg.connect((err, client, done) => {
  if (err) throw err;
  const query = new QueryStream('SELECT * FROM generate_series(0, $1) num', [1000000]);
  const stream = client.query(query);
  //release the client when the stream is finished
  stream.on('end', done);
  stream.on('data', function(data) { 
    stream.pause();
    funcDoSomethingWithDataAsync(data, function(error) {
      if(error) throw error;
      stream.resume();
    });
  });
})

How can I emulate a readable stream of database records in Go? Does sql.Scanner in Go work with streaming query results the way the Node.js module does?

I already have optimized queries that work fine; I just want to stream the query results into Go, like it's done in the Node.js library.

  • Are you performing computations in the iterations? How long does iterating over 1 million rows take you? I am having the same issue, and iterating over 10,000 rows is taking like 20 minutes to complete... trying to reduce that. Commented Jan 27, 2022 at 5:52
  • An even better way to stream lots of records in Node.js: pg-iterator. Commented Nov 23, 2022 at 11:14

2 Answers


Yes, it works very much the same: execute the query and iterate through the results. Here's a simple example using lib/pq, a Postgres driver for database/sql.

Make the Query and then iterate through the Rows.

rows, err := db.Query(`SELECT * FROM generate_series(0, $1) num`, 1000000)
if err != nil {
    panic(err)
}
defer rows.Close()

for rows.Next() {
    var num int

    // Scan copies the current row into num; the driver fetches rows
    // as the result set is iterated, not all at once.
    if err := rows.Scan(&num); err != nil {
        panic(err)
    }

    fmt.Println(num)
}

// rows.Next returns false on error as well as at the end of the
// result set, so check rows.Err() after the loop.
if err := rows.Err(); err != nil {
    panic(err)
}
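If each row needs the kind of asynchronous handling that stream.pause()/stream.resume() provided in the Node.js version, a rough sketch is to call your handler directly inside the loop (processRow here is a hypothetical stand-in for your own function). With lib/pq, further rows are only read off the connection as rows.Next is called, so a slow handler simply slows down consumption and no explicit pause/resume is needed:

for rows.Next() {
    var num int
    if err := rows.Scan(&num); err != nil {
        panic(err)
    }

    // processRow is a placeholder for your own per-row logic.
    // Because the driver only reads further rows as Next() is called,
    // a slow handler here acts as natural backpressure.
    if err := processRow(num); err != nil {
        panic(err)
    }
}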

5 Comments

As far as I understand how github.com/lib/pq works, it saves the whole query result in RAM and then parses it with rows.Next, so it seems to consume a lot of RAM, unlike the streaming Node.js library. I'm grateful for the answer, but I cannot accept it as the answer yet.
No, it has a buffer to fetch some of the results and avoid doing a network call for each row. Try it with various sizes of series and check your memory usage; it should remain constant.
I'll try your code with a profiler and provide the results here; it will save me a lot of effort if it works like this.
@vodolaz095 Stick a sleep in the loop, run it with 10 vs 10000000 numbers, and compare the memory size.
Hey @vodolaz095 what's your conclusion?

I observed the memory usage of a Go program which uses lib/pq:

  • When the result set returned is 0 or 1 rows, memory used is around 15 MB.
  • When the result set has 50K rows (each of 200 bytes, amounting to 10 MB) and I put a sleep while looping through the results, the memory increase was only 2 MB. Even with multiple such heavy requests, memory rose by only a few MB.

So Schwern is correct that pq maintains a buffer for reading results.
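For anyone who wants to reproduce this kind of measurement, here is a rough sketch (not from the original answer; the connection string is a placeholder) that samples runtime.MemStats while scanning a large generate_series result:

package main

import (
    "database/sql"
    "fmt"
    "log"
    "runtime"

    _ "github.com/lib/pq" // registers the "postgres" driver
)

func main() {
    // Placeholder DSN; adjust for your environment.
    db, err := sql.Open("postgres", "dbname=test sslmode=disable")
    if err != nil {
        log.Fatal(err)
    }
    defer db.Close()

    rows, err := db.Query(`SELECT * FROM generate_series(0, $1) num`, 1000000)
    if err != nil {
        log.Fatal(err)
    }
    defer rows.Close()

    var m runtime.MemStats
    count := 0
    for rows.Next() {
        var num int
        if err := rows.Scan(&num); err != nil {
            log.Fatal(err)
        }
        count++
        if count%100000 == 0 {
            // Sample the allocated heap every 100k rows; it should stay
            // roughly flat if rows are streamed rather than buffered.
            runtime.ReadMemStats(&m)
            fmt.Printf("rows=%d heap=%d KB\n", count, m.HeapAlloc/1024)
        }
    }
    if err := rows.Err(); err != nil {
        log.Fatal(err)
    }
}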

