
We have some tough performance issues that we feel shouldn't exist, but we don't see any way of solving them.

We have a Google Cloud MySQL database with 4 vCPUs and 16 GB of RAM. The full instance holds around 50 GB (on SSD), but our main database only about 15 GB.

We get extreme slowdowns on most pages of our app in the middle of the day, when it is most heavily used, with some pages taking 40 s (or more) to load.

We've extracted the queries, and by themselves they take 'only' around 500 ms, but as soon as I throw concurrent queries at the database (using JMeter), they balloon: 5 concurrent queries get us to a 3 s average, 10 concurrent to 7 s, 20 to 30 s...
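For reference, the shape of this measurement can be reproduced outside JMeter. Below is a minimal sketch, assuming Python, with sqlite3 and a toy `invoice` table standing in for the real MySQL instance (the real schema and credentials aren't part of the question); only the harness shape matters, not the absolute numbers.

```python
import os
import sqlite3
import statistics
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor

def run_query(db_path):
    # Each worker opens its own connection, like an app-server thread would.
    con = sqlite3.connect(db_path)
    t0 = time.perf_counter()
    con.execute("SELECT COUNT(*) FROM invoice WHERE customer_id = 20").fetchone()
    con.close()
    return time.perf_counter() - t0

def avg_latency(db_path, concurrency, reps=5):
    # Fire `concurrency * reps` queries with `concurrency` parallel workers
    # and report the mean per-query latency in seconds.
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(lambda _: run_query(db_path),
                                  range(concurrency * reps)))
    return statistics.mean(latencies)

# Toy table standing in for the real 4M-row `invoice` table.
db = os.path.join(tempfile.mkdtemp(), "demo.db")
con = sqlite3.connect(db)
con.execute("CREATE TABLE invoice (id INTEGER PRIMARY KEY, customer_id INT)")
con.executemany("INSERT INTO invoice (customer_id) VALUES (?)",
                [(i % 50,) for i in range(10_000)])
con.commit()
con.close()

for c in (1, 5, 10):
    print(f"concurrency={c}: avg {avg_latency(db, c) * 1000:.2f} ms per query")
```

If per-query latency grows roughly linearly with the number of concurrent clients, the workload is saturating a shared resource (CPU, I/O, or buffer pool) rather than hitting lock contention on specific rows.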

We've tried all kinds of optimizations, including indexes (on every column separately, plus a composite over every column of the query). In the end I managed to cut the time by two, then by three, but that just pushes the problem further out.

We haven't increased the database's resources further because we ran the queries on a less powerful database and only saw a linear degradation, so we expect that doubling the resources would just double the gain. That still doesn't feel like enough (on top of being more expensive).

Here is an example of one of the problematic queries (a count, used for pagination). This is a simplified version, optimized as much as we could, and it still takes around 150 ms, which is not low enough to keep it from slowing down under concurrent load.

Each filter in the WHERE clause is a user input and can't be precomputed. As I said, we have indexes on issue_date, customer_id, status, workflow, and a composite on customer_id + status + workflow + issue_date. issue_date is a DATE, customer_id a BIGINT, status and workflow are INTEGERs.

select count(i1_0.id) 
from invoice i1_0 
where i1_0.issue_date BETWEEN '2025-01-01' and '2025-07-30' 
and (i1_0.customer_id = 20 
and (i1_0.status, i1_0.workflow) in ((1,1), (2,1), (3,1), (4,1), (5,1), (6,1), (7,1))
);

That specific query matches around 70k rows out of roughly 4M without any filter.

We've tried experimenting with read locks but saw no improvement.

We don't have a DB admin here, so it's possible we're missing something obvious regarding the query, the resources, the locks...

Does anyone have any idea?

Update 1: here is the EXPLAIN for that query:

|id |select_type|table|partitions|type |possible_keys                                                                                            |key       |key_len|ref|rows   |filtered|Extra                   |
|---|-----------|-----|----------|-----|---------------------------------------------------------------------------------------------------------|----------|-------|---|-------|--------|------------------------|
|1  |SIMPLE     |i1_0 |          |range|invoice_issue_date_IDX,invoice_customer_id_IDX,invoice_workflow_num_IDX,invoice_status_num_IDX,composite_idx|composite_idx|23     |   |145 523|100     |Using where; Using index|

Update 2: I've upgraded the database to 8 CPUs / 32 GB of RAM and tried again. The query alone doesn't go faster (no surprise), but concurrent queries are obviously better: I had to push to 100 concurrent queries to reach a 1.5 s average, and 200 for a 3 s average. It's clearly better, but it doesn't feel like a real solution?

Update 3: the full table definition. I've removed the indexes because, as stated above, I've tried all the different kinds, and while there were some improvements, nothing solved the core problem. (Which, following @Shadow's advice, is probably somewhere else.)

CREATE TABLE `invoice` (
  `id` bigint NOT NULL AUTO_INCREMENT,
  `invoice_type_code` varchar(10) NOT NULL,
  `issue_date` date DEFAULT NULL,
  `category_code` varchar(3) DEFAULT NULL,
  `document_currency_code` varchar(3) DEFAULT NULL,
  `tax_exclusive_amount` decimal(19,6) DEFAULT NULL,
  `tax_inclusive_amount` decimal(19,6) DEFAULT NULL,
  `payable_amount` decimal(19,6) DEFAULT NULL,
  `payment_means_code` varchar(11) DEFAULT NULL,
  `datecreated` timestamp NULL DEFAULT CURRENT_TIMESTAMP,
  `dateupdated` timestamp NULL DEFAULT NULL,
  `tax_amount` decimal(19,6) DEFAULT NULL,
  `percent` decimal(19,1) DEFAULT NULL,
  `supplier_id` bigint DEFAULT NULL,
  `customer_id` bigint DEFAULT NULL,
  `downloaded` bit(1) DEFAULT NULL,
  `status_date` datetime DEFAULT NULL,
  `submission_type` varchar(100) DEFAULT NULL,
  `tax_type` varchar(100) DEFAULT NULL,
  `market_number` varchar(100) DEFAULT NULL,
  `obligation_number` varchar(50) DEFAULT NULL,
  `comment_status` varchar(2000) DEFAULT NULL,
  `invoice_number` varchar(20) NOT NULL,
  `flux_id` bigint DEFAULT NULL,
  `supplier_service` varchar(100) DEFAULT NULL,
  `customer_service` varchar(100) DEFAULT NULL,
  `metadata` longtext NOT NULL,
  `due_date` date DEFAULT NULL,
  `iban` varchar(34) DEFAULT NULL,
  `department_id` bigint DEFAULT NULL,
  `origin_number` varchar(20) DEFAULT NULL,
  `supplier_auxiliary_code` varchar(100) DEFAULT NULL,
  `supplier_type` varchar(100) DEFAULT NULL,
  `supplier_type_code` varchar(100) DEFAULT NULL,
  `accounting_date` date DEFAULT NULL,
  `custom_info` varchar(500) DEFAULT NULL,
  `payment_date` date DEFAULT NULL,
  `invoice_format` enum() NOT NULL,
  `portal` enum() DEFAULT NULL,
  `is_subscriber_deposit` bit(1) DEFAULT b,
  `owner_type` enum() DEFAULT NULL,
  `from_emission` bit(1) DEFAULT b,
  `extracted_from_document` varchar(255) DEFAULT NULL,
  `invoice_number_back_03_25` varchar(20) DEFAULT NULL,
  `document_type` bigint NOT NULL,
  `purchase_accounting_downloaded` bit(1) DEFAULT b,
  `status` int DEFAULT NULL,
  `workflow` int DEFAULT NULL,
  PRIMARY KEY (`id`),
  CONSTRAINT `fk_invoice_to_customer` FOREIGN KEY (`customer_id`) REFERENCES `structure` (`id`) ON DELETE RESTRICT ON UPDATE RESTRICT,
  CONSTRAINT `fk_invoice_to_flux` FOREIGN KEY (`flux_id`) REFERENCES `invoice_flux` (`id`) ON DELETE RESTRICT ON UPDATE RESTRICT,
  CONSTRAINT `fk_invoice_to_supplier` FOREIGN KEY (`supplier_id`) REFERENCES `structure` (`id`) ON DELETE RESTRICT ON UPDATE RESTRICT
) ENGINE=InnoDB AUTO_INCREMENT=5104843 DEFAULT CHARSET=utf8mb3

Update 4: Answers to Rick James

Result of SHOW TABLE STATUS LIKE 'invoice':

|Name   |Engine|Version|Row_format|Rows     |Avg_row_length|Data_length  |Max_data_length|Index_length |Data_free|Auto_increment|Create_time        |Update_time        |Check_time|Collation         |Checksum|Create_options|Comment                                             |
|-------|------|-------|----------|---------|--------------|-------------|---------------|-------------|---------|--------------|-------------------|-------------------|----------|------------------|--------|--------------|----------------------------------------------------|
|invoice|InnoDB|10     |Dynamic   |4 295 224    |319           |1 370 488 832    |0              |3 908 190 208 |4 194 304|5 109 810        |2025-09-02 02:42:28|2025-09-02 02:54:21|          |utf8mb3_general_ci|        |              |

The metadata column contains strings between 40 and 170 bytes long, and is very often NULL.

On the environment where I run my tests:

  • innodb_buffer_pool_size: 4 563 402 752
  • RAM: 8 GB (but I raised it to 32 GB during my test, before lowering it back)
  • Volume on the SSD: 37 GB (shared with other databases)
  • Volume in the DB: 16 GB
  • Volume on the table: 4 GB

On the actual production environment:

  • innodb_buffer_pool_size: 11 811 160 064
  • RAM: 16 GB
  • Total volume on the SSD: 50 GB (shared with other databases)
  • Volume in the DB: 16 GB
  • Volume on the table: 4 GB

As for the last test, EXPLAIN shows the count query prefers this index:

customer_id, workflow_num, status_num, issue_date

  • Can you please run EXPLAIN ANALYZE on the query and share the results, so we know whether the indexes are used? Commented Aug 29 at 8:19
  • I will, but since I've tried all kinds of different permutations and nothing panned out, I didn't feel it was relevant. Commented Aug 29 at 8:27
  • What are typical values for this count and the count of the full table? The use of the index might actually be making the query slower if it still has to select a large portion of the table. I'm just speculating, but the slowdown might be caused by a reduction in available memory/cache per query when you submit multiple queries at once. Have you tried giving mysql more memory? Commented Aug 29 at 8:52
  • I've updated the question; for that specific query I get 70k results out of 4M rows. We've tried 8 GB and 16 GB of memory and the losses/gains were proportional (x2 slower/faster). We haven't tried to push it further yet. Commented Aug 29 at 9:18
  • Unfortunately, this seems to be an operational issue, not a programming one based on the information you provide. This question is better suited for the dba.stackexchange.com sister site of SO. Commented Aug 29 at 9:46

2 Answers


Create an index on (customer_id, status, workflow, issue_date, id) if it doesn't exist (the order of the first three columns may be permuted), then test the following query:

SELECT SUM(amount) 
FROM (
    SELECT COUNT(id)
    FROM invoice
    WHERE issue_date BETWEEN '2025-01-01' and '2025-07-30' 
      and customer_id = 20 
      and status = 1
      AND workflow = 1
  UNION ALL
    SELECT COUNT(id)
    FROM invoice
    WHERE issue_date BETWEEN '2025-01-01' and '2025-07-30' 
      and customer_id = 20 
      and status = 2
      AND workflow = 1
  UNION ALL
.......
  UNION ALL
    SELECT COUNT(id)
    FROM invoice
    WHERE issue_date BETWEEN '2025-01-01' and '2025-07-30' 
      and customer_id = 20 
      and status = 7
      AND workflow = 1
  ) AS subquery (amount)

If id is the primary key, then remove it from the index and use COUNT(*).


3 Comments

Your query idea brought a real performance gain (x2), so thank you. It didn't solve the core scaling issue though, so I'll probably accept your answer in a few days but move my question to dba.
Could you explain where you think the gain comes from?
Each separate subquery can use the index fully (3 + 1 columns), whereas the initial query can use only 2 + 1 columns of the index.

What is the value of innodb_buffer_pool_size? (It should be about 70% of RAM.) How much RAM do you have? What is the typical size of metadata?

Please provide SHOW TABLE STATUS LIKE 'invoice'; and try this query:

    SELECT COUNT(*)
    FROM invoice
    WHERE issue_date BETWEEN '2025-01-01' AND '2025-07-30'
      AND customer_id = 20
      AND status BETWEEN 1 AND 7
      AND workflow = 1

Together with

    INDEX(customer_id, workflow, status, issue_date)
    INDEX(customer_id, workflow, issue_date, status)
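As a quick way to see whether such an index can serve the whole query without touching the table (an index-only, "covering" plan), here is a sketch using sqlite3's EXPLAIN QUERY PLAN as a stand-in; MySQL's EXPLAIN would show "Using index" in the Extra column for the analogous situation.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE invoice (id INTEGER PRIMARY KEY, customer_id INT, "
            "workflow INT, status INT, issue_date TEXT)")
# Index holding every column the query touches, equality columns first.
con.execute("CREATE INDEX idx_cwsd ON invoice "
            "(customer_id, workflow, status, issue_date)")

plan = con.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT COUNT(*) FROM invoice "
    "WHERE issue_date BETWEEN '2025-01-01' AND '2025-07-30' "
    "AND customer_id = 20 AND status BETWEEN 1 AND 7 AND workflow = 1"
).fetchall()

# The last tuple field is the human-readable plan step.
detail = " ".join(row[3] for row in plan)
print(detail)
```

When the plan says the index is a covering index, the count is answered from the (much smaller) index B-tree alone, which reduces I/O and helps the whole working set fit in the buffer pool.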

4 Comments

I've updated my question with answers to your questions/remarks =)
The buffer_pool sizes look good. The EXPLAIN probably says "Using index", which implies that the INDEX's BTree was used without touching the data's BTree. This should decrease the I/O, possibly leading to the queries being run entirely in the buffer_pool -- hence a good speedup.
Following this link dba.stackexchange.com/questions/27328/… and the command to check the RIBPS, I got 38 RIBPS for prod and 22 for test, which is much higher than what we currently have on both. Could that be the bottleneck?
I've also found your answer there, which I'm reading/processing.
