
I am facing a complex challenge with an RDS PostgreSQL instance and am almost out of ideas on how to handle it. I am launching an app (React + Go + PostgreSQL) for which I expect around 250-300 users simultaneously making the same API GET request for as long as they wish to use it.

It is a questionnaire-style app: a user retrieves one question from the database and answers it, the server saves the answer in the DB, and the user presses next to fetch the next question. I tested my API endpoint with k6 using 500 virtual users for 2 minutes, and the database returns dial: i/o timeout, or sometimes connection rejected, usually once it reaches around 6000 requests; I get roughly a 93% success rate. I tried to fine-tune the RDS instance with the tcp_keepalives parameters, but without any luck; I still cannot get 100% of the requests to pass. I also tried increasing the storage from the 20 GB minimum to 100 GB and switching from the free db.t3.micro to a db.t3.medium instance.

Any hint would be much appreciated. It should be possible for a normal Go server with PostgreSQL to handle these requests concurrently, shouldn't it? It is just a regular SELECT * FROM x WHERE y statement.

EDIT (CODE SAMPLE): I use a dependency injection pattern, so I have only one instance of the DB that is passed to all the other repositories, including the API package. The DB repo looks like this:

func NewRepository() (DBRepository, error) {
  dbname := getenv("POSTGRES_DATABASE", "")
  username := getenv("POSTGRES_ROOT_USERNAME", "")
  password := getenv("POSTGRES_ROOT_PASSWORD", "")
  host := getenv("POSTGRES_HOST", "")
  port := getenv("POSTGRES_PORT", "")

  dsn := fmt.Sprintf("host=%s user=%s password=%s"+
    " dbname=%s port=%s sslmode=disable"+
    " TimeZone=Europe/Bucharest",
    host, username, password, dbname, port)

  db, err := gorm.Open(postgres.Open(dsn), &gorm.Config{})
  if err != nil {
    return nil, err
  }

  if err := db.AutoMigrate(
    // migrated tables go here
  ); err != nil {
    return nil, err
  }

  return &dbRepository{
    db: db,
  }, nil
}

Currently the RDS parameters for TCP keepalive are:

tcp_keepalives_count = 30
tcp_keepalives_idle = 1000
tcp_keepalives_interval = 1000

I also tried with different values.

The query I am doing is a simple .Find() from the gorm package, but this does not seem to be the issue, since the request blocks on the very first query/connection to the DB. There are two queries executed in the endpoint I am testing, but it gets stuck on the first. If more info is needed I will update; this issue is getting very frustrating.

My k6 test is the following:

import http from 'k6/http';
import { check } from 'k6';
import { sleep } from 'k6';

export const options = {
  insecureSkipTLSVerify: true,
  stages: [
    { target: 225, duration: '2m' },
  ],
};

const access_tokens = []

let random_token = access_tokens[Math.floor(Math.random()*access_tokens.length)];

const params = {
    headers: {'Authorization': `Bearer ${random_token}`}
};

export default function () {
  let res =  http.get('endpoint here', params);
  check(res, {'Message': (r)=> r.status === 202});  
  sleep(1);
}

The DB tables are indexed, and the queries were checked with EXPLAIN.

  • Check the logs and monitor the number of open connections as the test runs (here and perhaps limit this). My guess would be that the issue is in your code (for example, not closing result sets). For a real answer provide a minimal, reproducible example. Commented Jul 26, 2022 at 22:06
  • It needs to support 300, and it craps out at 500. What is the problem? This sounds like victory. Commented Jul 27, 2022 at 2:15
  • Hi @Brits, I updated the post and I hope it includes enough information now. Looking forward to hearing from you! Commented Jul 27, 2022 at 8:15
  • "it gets blocked whenever hits the first query/connection with the db" - please expand on this (I thought your API worked OK until under heavy load?). Please try to produce a working example (connect to the database, listen for HTTP connections and run your handler - should be pretty short). It may also be worth halving the target and doubling the duration (to confirm whether it is the load or the number of queries run). Sorry - can't really say any more with the info available (it's possible you are hitting a database limit; we have insufficient info to comment). Commented Jul 27, 2022 at 9:06
