I'm trying to optimize the following query:
SELECT a2 AS 'b_actual_pair',
a1 AS 'c_actual_date',
a3 AS 'd_actual_value',
b1 AS 'e_1m_date',
b3 AS 'f_1m_value',
c1 AS 'g_2m_date',
c3 AS 'h_2m_value',
d1 AS 'i_3m_date',
d3 AS 'j_3m_value',
e1 AS 'k_4m_date',
e3 AS 'l_4m_value',
f1 AS 'm_5m_date',
f3 AS 'n_5m_value'
FROM (SELECT crd.b_date AS 'a1',
crd.c_pair AS 'a2',
crd.d_value AS 'a3'
FROM item_raw_data crd
WHERE crd.a_unique_id > ( (SELECT crd.a_unique_id
FROM item_raw_data crd
ORDER BY crd.a_unique_id DESC
LIMIT 0, 1) - ((SELECT
Count(DISTINCT c_pair)
FROM item_raw_data)) )
ORDER BY crd.b_date DESC) a,
(SELECT crd.b_date AS 'b1',
crd.c_pair AS 'b2',
crd.d_value AS 'b3'
FROM item_raw_data crd
WHERE crd.b_date < ( (SELECT crd.b_date
FROM item_raw_data crd
ORDER BY crd.b_date DESC
LIMIT 0, 1) - INTERVAL 1 minute )
AND crd.a_unique_id > ( (SELECT Max(x.a_unique_id)
FROM (SELECT crd.a_unique_id,
crd.b_date,
crd.c_pair,
crd.d_value
FROM item_raw_data crd
WHERE crd.b_date < (
(SELECT crd.b_date
FROM item_raw_data
crd
ORDER BY crd.b_date
DESC
LIMIT 0, 1) -
INTERVAL 1 minute )
ORDER BY crd.b_date DESC) x) -
((SELECT Count(DISTINCT c_pair)
FROM item_raw_data)) )
ORDER BY crd.b_date DESC) b,
(SELECT crd.b_date AS 'c1',
crd.c_pair AS 'c2',
crd.d_value AS 'c3'
FROM item_raw_data crd
WHERE crd.b_date < ( (SELECT crd.b_date
FROM item_raw_data crd
ORDER BY crd.b_date DESC
LIMIT 0, 1) - INTERVAL 2 minute )
AND crd.a_unique_id > ( (SELECT Max(x.a_unique_id)
FROM (SELECT crd.a_unique_id,
crd.b_date,
crd.c_pair,
crd.d_value
FROM item_raw_data crd
WHERE crd.b_date < (
(SELECT crd.b_date
FROM item_raw_data
crd
ORDER BY crd.b_date
DESC
LIMIT 0, 1) -
INTERVAL 2 minute )
ORDER BY crd.b_date DESC) x) -
((SELECT Count(DISTINCT c_pair)
FROM item_raw_data)) )
ORDER BY crd.b_date DESC) c,
(SELECT crd.b_date AS 'd1',
crd.c_pair AS 'd2',
crd.d_value AS 'd3'
FROM item_raw_data crd
WHERE crd.b_date < ( (SELECT crd.b_date
FROM item_raw_data crd
ORDER BY crd.b_date DESC
LIMIT 0, 1) - INTERVAL 3 minute )
AND crd.a_unique_id > ( (SELECT Max(x.a_unique_id)
FROM (SELECT crd.a_unique_id,
crd.b_date,
crd.c_pair,
crd.d_value
FROM item_raw_data crd
WHERE crd.b_date < (
(SELECT crd.b_date
FROM item_raw_data
crd
ORDER BY crd.b_date
DESC
LIMIT 0, 1) -
INTERVAL 3 minute )
ORDER BY crd.b_date DESC) x) -
((SELECT Count(DISTINCT c_pair)
FROM item_raw_data)) )
ORDER BY crd.b_date DESC) d,
(SELECT crd.b_date AS 'e1',
crd.c_pair AS 'e2',
crd.d_value AS 'e3'
FROM item_raw_data crd
WHERE crd.b_date < ( (SELECT crd.b_date
FROM item_raw_data crd
ORDER BY crd.b_date DESC
LIMIT 0, 1) - INTERVAL 4 minute )
AND crd.a_unique_id > ( (SELECT Max(x.a_unique_id)
FROM (SELECT crd.a_unique_id,
crd.b_date,
crd.c_pair,
crd.d_value
FROM item_raw_data crd
WHERE crd.b_date < (
(SELECT crd.b_date
FROM item_raw_data
crd
ORDER BY crd.b_date
DESC
LIMIT 0, 1) -
INTERVAL 4 minute )
ORDER BY crd.b_date DESC) x) -
((SELECT Count(DISTINCT c_pair)
FROM item_raw_data)) )
ORDER BY crd.b_date DESC) e,
(SELECT crd.b_date AS 'f1',
crd.c_pair AS 'f2',
crd.d_value AS 'f3'
FROM item_raw_data crd
WHERE crd.b_date < ( (SELECT crd.b_date
FROM item_raw_data crd
ORDER BY crd.b_date DESC
LIMIT 0, 1) - INTERVAL 5 minute )
AND crd.a_unique_id > ( (SELECT Max(x.a_unique_id)
FROM (SELECT crd.a_unique_id,
crd.b_date,
crd.c_pair,
crd.d_value
FROM item_raw_data crd
WHERE crd.b_date < (
(SELECT crd.b_date
FROM item_raw_data
crd
ORDER BY crd.b_date
DESC
LIMIT 0, 1) -
INTERVAL 5 minute )
ORDER BY crd.b_date DESC) x) -
((SELECT Count(DISTINCT c_pair)
FROM item_raw_data)) )
ORDER BY crd.b_date DESC) f
WHERE
a.a2 = b.b2
and
b.b2 = c.c2
and
c.c2 = d.d2
and
d.d2 = e.e2
and
e.e2 = f.f2
The output of this query is the following:
1. The data behind items_raw_data is the following:
-every 5 seconds 110 items are inserted in the database with their current prices
-actual_pair or c_pair is a reference to a main table which contains the full item description, not of much relevance
-luckily the 110 rows are inserted in 2 seconds, leaving a gap of 3 seconds. This makes it easier to build the queries.
2. The aim of this query is to use this data to do a chart with live prices but we need to further expand the query to (example) 10 minutes, 15 minutes, 1 hour, 2 hours etc... So you will be able to see an items price with its value now and few moments back in time.
3. The problem is that this query takes 2.5 seconds to run with 450.000 (few hours data present and we need up to 1 week of data) total data rows and 6 set of data (actual, 1m, 2m, 3m, 4m, 5m).
What have we tried so far:
Using tables as Memory instead of INNO reduced the query times from 2.5 seconds to 2 seconds. The system has 64gb ECC RAM and a 12 core CPU and an NVMe drive, hardware shouldn't be a problem.
Finding all this data for each item individually yields worse results than all the items together.
Doing exactly the same in server-side language (Java) through threaded code is slower also.
Tried using Inner Join's instead of Where, similar result.
An easier to view query:
SELECT crd.b_date AS 'a1',
crd.c_pair AS 'a2',
crd.d_value AS 'a3'
FROM items_raw_data crd
WHERE crd.a_unique_id > ( (SELECT crd.a_unique_id
FROM items_raw_data crd
ORDER BY crd.a_unique_id DESC
LIMIT 0, 1) - ((SELECT Count(DISTINCT c_pair)
FROM items_raw_data)) )
ORDER BY crd.b_date DESC
And the results:
This is for the actual prices only.
Table description:
Update 1
Added explain:
Update 2
Here are the links to download the database, with the query also (own hosting server):
Item raw data SQL: https://cloud.technorah.com/index.php/s/sR3mdK2Oos2EbC3
SQL Query: https://cloud.technorah.com/index.php/s/bdndmLGAUfpduif
Update 3
Using @hunteke query gave a result of 4.7 seconds, which was very strange because the query and the advices seem logical.
Using @hunteke's tips we changed the following:
SELECT a_unique_id FROM item_raw_data ORDER BY a_unique_id DESC LIMIT 0, 1to
SELECT MAX(a_unique_id) FROM item_raw_data
this lowered the query time from 2.8 seconds to 2.7 seconds. Adding USE INDEX(primary) on main queries further improved the time from 2.7 seconds to 2.6 seconds.
Update 4
We failed at a basic task, using Timestamp instead of int(11) - primary key in the order by. The latest inserted date is also the latest unique id inserted. So changing from ORDER BY crd.b_date to ORDER BY crd.a_unique_id lowered the query by more than 1 SECOND, from 2.6 to 1.3, almost half.
So the actual query looks like this, completly reworked. The execution time changed from 1.3 seconds to 0.55s
SELECT *
FROM
(SELECT sub.a_unique_id AS 'a0',
sub.b_date AS 'a1',
sub.c_pair AS 'a2',
sub.d_value AS 'a3'
FROM (SELECT *
FROM item_raw_data) sub,
(SELECT crd.a_unique_id AS 'max_id',
crd.b_date AS 'xdate'
FROM item_raw_data crd
ORDER BY crd.a_unique_id DESC
LIMIT 0, 1) aux
WHERE sub.b_date <= aux.xdate
AND sub.a_unique_id > ( aux.max_id - (SELECT
Count(DISTINCT c_pair) AS
max_rows
FROM item_raw_data)
)
ORDER BY sub.a_unique_id DESC) a,
(SELECT sub.a_unique_id AS 'b0',
sub.b_date AS 'b1',
sub.c_pair AS 'b2',
sub.d_value AS 'b3'
FROM (SELECT *
FROM item_raw_data) sub,
(SELECT crd.a_unique_id AS 'max_id',
crd.b_date AS 'xdate'
FROM item_raw_data crd
WHERE crd.b_date < (SELECT ( crdx.b_date - INTERVAL 1 minute )
FROM item_raw_data crdx
ORDER BY crdx.a_unique_id DESC
LIMIT 0, 1)
ORDER BY crd.a_unique_id DESC
LIMIT 0, 1) aux
WHERE sub.b_date <= aux.xdate
AND sub.a_unique_id > ( aux.max_id - (SELECT
Count(DISTINCT c_pair) AS
max_rows
FROM item_raw_data)
)
ORDER BY sub.a_unique_id DESC) b,
(SELECT sub.a_unique_id AS 'c0',
sub.b_date AS 'c1',
sub.c_pair AS 'c2',
sub.d_value AS 'c3'
FROM (SELECT *
FROM item_raw_data) sub,
(SELECT crd.a_unique_id AS 'max_id',
crd.b_date AS 'xdate'
FROM item_raw_data crd
WHERE crd.b_date < (SELECT ( crdx.b_date - INTERVAL 2 minute )
FROM item_raw_data crdx
ORDER BY crdx.a_unique_id DESC
LIMIT 0, 1)
ORDER BY crd.a_unique_id DESC
LIMIT 0, 1) aux
WHERE sub.b_date <= aux.xdate
AND sub.a_unique_id > ( aux.max_id - (SELECT
Count(DISTINCT c_pair) AS
max_rows
FROM item_raw_data)
)
ORDER BY sub.a_unique_id DESC) c,
(SELECT sub.a_unique_id AS 'd0',
sub.b_date AS 'd1',
sub.c_pair AS 'd2',
sub.d_value AS 'd3'
FROM (SELECT *
FROM item_raw_data) sub,
(SELECT crd.a_unique_id AS 'max_id',
crd.b_date AS 'xdate'
FROM item_raw_data crd
WHERE crd.b_date < (SELECT ( crdx.b_date - INTERVAL 3 minute )
FROM item_raw_data crdx
ORDER BY crdx.a_unique_id DESC
LIMIT 0, 1)
ORDER BY crd.a_unique_id DESC
LIMIT 0, 1) aux
WHERE sub.b_date <= aux.xdate
AND sub.a_unique_id > ( aux.max_id - (SELECT
Count(DISTINCT c_pair) AS
max_rows
FROM item_raw_data)
)
ORDER BY sub.a_unique_id DESC) d,
(SELECT sub.a_unique_id AS 'e0',
sub.b_date AS 'e1',
sub.c_pair AS 'e2',
sub.d_value AS 'e3'
FROM (SELECT *
FROM item_raw_data) sub,
(SELECT crd.a_unique_id AS 'max_id',
crd.b_date AS 'xdate'
FROM item_raw_data crd
WHERE crd.b_date < (SELECT ( crdx.b_date - INTERVAL 4 minute )
FROM item_raw_data crdx
ORDER BY crdx.a_unique_id DESC
LIMIT 0, 1)
ORDER BY crd.a_unique_id DESC
LIMIT 0, 1) aux
WHERE sub.b_date <= aux.xdate
AND sub.a_unique_id > ( aux.max_id - (SELECT
Count(DISTINCT c_pair) AS
max_rows
FROM item_raw_data)
)
ORDER BY sub.a_unique_id DESC) e,
(SELECT sub.a_unique_id AS 'f0',
sub.b_date AS 'f1',
sub.c_pair AS 'f2',
sub.d_value AS 'f3'
FROM (SELECT *
FROM item_raw_data) sub,
(SELECT crd.a_unique_id AS 'max_id',
crd.b_date AS 'xdate'
FROM item_raw_data crd
WHERE crd.b_date < (SELECT ( crdx.b_date - INTERVAL 5 minute )
FROM item_raw_data crdx
ORDER BY crdx.a_unique_id DESC
LIMIT 0, 1)
ORDER BY crd.a_unique_id DESC
LIMIT 0, 1) aux
WHERE sub.b_date <= aux.xdate
AND sub.a_unique_id > ( aux.max_id - (SELECT
Count(DISTINCT c_pair) AS
max_rows
FROM item_raw_data)
)
ORDER BY sub.a_unique_id DESC) f
WHERE
a.a2 = b.b2
AND
b.b2 = c.c2
AND
c.c2 = d.d2
AND
d.d2 = e.e2
AND
e.e2 = f.f2
While this is good, lowering from 1.3s almost to 0.55s, we can now use this but we are still looking for further improvements, not only to yield a better result, but an more in-depth understanding of optimization on such big queries and MySQL. Will keep updating the query execution time as table grows.




WHEREclause, but they both appear to be non correlated, and so should not be performance killers.