Stable15 dsa fix #163

queenofpigeons · 2023-04-27T10:30:49Z

Fix memory allocation for query texts to avoid segfault when out of memory

statement timeout AQO adds one more timeout right before this. If that timeout expires, AQO walks across the PlanState tree and learns on partially executed nodes.

TODO:
1. We should somehow remember that partial knowledge isn't real, and use it only before the first successful execution.
2. We can distinguish already finished nodes from partially finished nodes. For nodes that really had time to finish execution we should store cardinality "AS IS". Otherwise we should use some extrapolation formula.
3. Maybe we shouldn't change instrumentation during the partial walk?
4. Think about parallel workers.
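The walk over a partially executed plan, and the finished/partial distinction from the TODO list, can be sketched roughly as follows. This is an illustration in Python, not AQO's C code: `PlanNode`, `collect_samples` and the `extrapolate` callback are hypothetical names, and the commit message leaves the extrapolation formula open, so the usage below plugs in a placeholder rule.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class PlanNode:
    """Hypothetical stand-in for a PlanState node with instrumentation."""
    name: str
    rows_seen: float   # tuples emitted before the timeout fired
    finished: bool     # True if the node ran to completion
    children: List["PlanNode"] = field(default_factory=list)


def collect_samples(
    node: PlanNode,
    extrapolate: Callable[[PlanNode], float],
) -> List[Tuple[str, float, bool]]:
    """Walk the tree bottom-up and return (name, cardinality, finished) per node."""
    samples: List[Tuple[str, float, bool]] = []
    for child in node.children:
        samples.extend(collect_samples(child, extrapolate))
    # Finished nodes: store the cardinality "as is".
    # Partially executed nodes: defer to the caller's extrapolation rule.
    rows = node.rows_seen if node.finished else extrapolate(node)
    samples.append((node.name, rows, node.finished))
    return samples


plan = PlanNode("HashJoin", 120, False, [
    PlanNode("SeqScan a", 1000, True),
    PlanNode("SeqScan b", 300, False),
])
# Placeholder rule (an assumption): take the observed row count as the estimate.
samples = collect_samples(plan, lambda n: n.rows_seen)
```

Keeping the `finished` flag in every sample lets a caller treat partial knowledge as provisional, which is the concern raised in the first TODO item.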

In an unusual configuration (mode = disabled, forced stat gathering = 'on') AQO can be disabled for a query while a previously cached plan still contains some AQO preferences. Even so, we should ignore the query at the end of execution.

Using such a context, we should remember the risks:
* Recursion in AQO hooks can induce an accidental memory context reset.
* System routines that we call from the extension could require longer-lived memory contexts on the outside than ours.

Move GUCs that can be changed at runtime from the global regression test conf to the first executed test, 'aqo_disabled.sql'. There we set these values with ALTER SYSTEM/pg_reload_conf() and use them during the tests. Also, we call aqo_reset() at the start of each test. And a bit more:
1. Avoid showing the number of records in the AQO ML storage: it can depend on optimizer settings and is quite unstable (in progress).
2. Use column aliases in query output to avoid instability in the naming of anonymous columns.

…tallcheck over an instance in different modes.
- run JOB benchmark [1] on a self-hosted runner.

Utility scripts are stored in the .github folder. The branch name is the key that defines the name of the suitable PostgreSQL core branch: use the "stable[XX]" phrase in the name of the git branch to trigger compiling and launching this commit with the REL_[XX]_STABLE branch of the core. If the branch name doesn't contain such a phrase, the master branch is used.

TODO:
=====
1. Add a 'long' JOB test (parallel strategy disabled).
2. Add a JOB test that would be executed up to full convergence of learning on each query.
3. Add installchecks that reuse an existing database with the AQO extension installed (sanity checks will definitely be broken, but still).
4. Additional queries [2] can be a marker for successful learning.

[1] https://github.com/danolivo/jo-bench
[2] https://github.com/RyanMarcus/imdb_pg_dataset
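The branch-name rule described in this commit message ("stable[XX]" selects REL_[XX]_STABLE, anything else falls back to master) can be illustrated with a small sketch. It is not the script from the .github folder: the function name `core_branch_for` is hypothetical, and it assumes that [XX] is a run of digits directly after the lowercase word "stable".

```python
import re


def core_branch_for(git_branch: str) -> str:
    """Map an AQO branch name to a PostgreSQL core branch (illustrative only)."""
    # Assumption: the version is the run of digits right after "stable".
    match = re.search(r"stable(\d+)", git_branch)
    if match:
        return f"REL_{match.group(1)}_STABLE"
    # No "stable[XX]" phrase in the name: use the master branch of the core.
    return "master"
```

Under these assumptions, the branch of this pull request, `stable15-dsa-fix`, would be built against `REL_15_STABLE`.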

@Anisimov-ds

different libraries. To avoid such a problem in the future, refactor the AQO interfaces: declare all hooks as static, reduce the number of exported functions, and introduce the concept of a *_init() function for a module that needs some actions in the _PG_init() routine. Reviewed by: @Anisimov-ds

It is mostly caused by the desire to reduce the number of failures of the 001_pgbench.pl test on Windows OSes (they are related to the speed of file descriptor allocation in the test, where several threads concurrently CREATE/DROP extensions). Also, the aqo_CVE-2020-14350 test is corrected.

…lel_workers test: EXPLAIN of Partial Aggregate sometimes showed 0 rows instead of 1. It is a race: parallel workers ran when the main process had already read all underlying tuples. Use EXPLAIN without ANALYZE to avoid such a problem. As far as I can see, we don't lose anything important.

@Alena0704

Alena Rybakina and others added 30 commits March 15, 2022 10:28

Edit documentation for installing aqo extension

548faf8

Correct automatic CI-test in aqo master version

c3c09d7

Move master branch on the stable15 branch. Also adjusted CI test for …

492c082

…stable15.

Remove duplicating definition of prev_create_plan_hook in aqo.c

a0c9c05

Start of massive cherry-pick from stable13.

e03b311

Fix core patch according to 23e7b38. fix problem with test in unsupported

PGPRO-6403: fix conf.add so PostgreSQL installchecks pass with aqo lo…

0237b19

…aded

Clear AQO_cache_mem_ctx memory context.

68902ac

Remove an ignored node detection feature.

67acb9b

Bugfix. When recursing into a subquery, we must use subroot instead of root to

43bf4e5

translate relids in this subtree.

Fix print_node_explain. Avoid situation where an AQO node isn't initi…

26e74af

…alized.

Bugfix. Do not try to open an AQO heap relation if an index does not …

1491169

…exist.

Bugfixes:

dddd851

1. Increase stability of the pgbench test. 2. Open subsidiary AQO relations more carefully.

Parameterize 001_pgbench.pl: allow to define a number of transactions…

ae1e19e

…, clients and threads from the environment.
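Reading test parameters from the environment with fallbacks, as this commit describes for 001_pgbench.pl, might look like the sketch below. The real test is a Perl script; this Python version only illustrates the pattern. The variable names `TRANSACTIONS`, `CLIENTS` and `THREADS` and the default values are assumptions, not taken from the commit.

```python
import os
from typing import Dict, Mapping, Optional

# Assumed defaults, used when a variable is not set in the environment.
DEFAULTS = {"TRANSACTIONS": 1000, "CLIENTS": 10, "THREADS": 10}


def pgbench_params(env: Optional[Mapping[str, str]] = None) -> Dict[str, int]:
    """Return pgbench settings, letting environment variables override defaults."""
    env = os.environ if env is None else env
    return {name: int(env.get(name, default)) for name, default in DEFAULTS.items()}
```

A CI job could then raise concurrency by exporting, for example, `CLIENTS=50` before running the test, without editing the test itself.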

Update c-cpp.yml

d66ee77

Change CI to drastically increase concurrency among pgbench clients

Bugfix. close heap relation in the case of races between backend and

d92d5ee

'DROP EXTENSION aqo'.

Bugfix. Fix omissions related to shifting from 32-bit query hash to 6…

78208e7

…4-bit hash

Bugfix: we can't use C++ reserved words as identifiers for shared var…

64536b6

…iables or routines. (Includes modified core patch).

Bugfix. Normalize cardinality error.

967011c

Resolve a problem with gathering of instrumentation data

844e100

on a partially executed query plan. Fix some issues.

An iteration of the code improvement.

7010e42

Hide the AQO Statement Timeout feature under a GUC.

5220b8a

Use aqo.learn_statement_timeout to enable this feature. One more function here is to clean up this cache and memory context.

Distinguish finished and running plan nodes.

d27d079

Add reliability factor (rfactor) into interface of learning procedures.

58182d0

Introduce AQO v.1.4. Add reliability field into the aqo_data table.

3b79fa0

Add reliability into the ML model.

b556d96

Add basic code for support of DSM cache.

9ad3490

Cumulative commit on the 'learn on statement timeout' feature.

b5a56c3

Now it works quite stable, merge it into master branch.

Add tests for the 'Learn after a query interruption by timeout' feat…

f2dc710

…ure. Fix the bug with a falsely finished node. Add some DEBUG messages, just for convenience.

Move AQO from a relid based approach to a relation name based approach.

f71b87c

This allows us to reuse ML data on a different instance and to learn on temporary tables.

Andrey Kazarinov and others added 27 commits January 30, 2023 13:31

[PGPRO-7366] add function which shows memory usage

6cf69af

The function memctx_htab_sizes outputs the allocated and used sizes of AQO's memory contexts and hash tables.

Collect some artifacts of CI tests - initial commit

490954a

Remove regression tests on smart statement timeout.

c17a948

We should rethink the testing principles for time-dependent features to make the tests more stable.

Increase stability of the look_a_like test: clear learning data before

830aa98

the test.

Bugfix. Initialization of kNN data structure was omitted in one newly

4db4d4d

added case.

Rewrite update_functions.sql to avoid dependency on internal logic of…

c7f1857

… the optimizer, which can vary between versions of the PG core.

Arrange extension with subtle changes in the optimizer

c3567f7

Improvement. Clean a list of deactivated queries during the call of

7f54694

the aqo_reset() routine: we want to clean all the AQO internal state on reset.

Generalize basic CI script

3961540

reviewed-by: a.rybakina

Improvement of time-dependent test statement_timeout.

96616fd

Remember, on the ancient machines of the buildfarm any query can run longer than the timeout. So, RESET this GUC whenever it isn't really needed for a test query.

Improve basic CI and installcheck CI code.

bf6ad8e

CI Refactoring: Unify code of all three CI workflows

107a016

Bugfix. Switch off quickly all AQO features if queryId is disabled.

5437c07

One installcheck test was added to the GitHub Actions workflow. Reviewed by: @Anisimov-ds

Enhancement. Report if someone external inserted a hook into the chai…

ee4ffc5

…n of AQO prediction hooks. It isn't a strict rule, but we should know about that.

Fix. Conventional use of hooks.

271b0da

Skip 'DROP EXTENSION' test in 001_pgbench.pl because of instability o…

3770160

…n Windows

Bugfix. Correct use of a routine for join counting.

d2f713a

Add the routine for safe update.

6c85f0a

Reviewed by: @Alena0704

Add small bugfixes and refactoring.

44f8cb6

Reviewed by: @Alena0704

Fix dsa_allocate for aqo_qtext_store to avoid segfault when out of me…

103138c

…mory

queenofpigeons requested a review from Alena0704 April 27, 2023 10:30

queenofpigeons closed this Apr 27, 2023

queenofpigeons deleted the stable15-dsa-fix branch April 27, 2023 10:40

6 participants
