Skip to content

Conversation

@queenofpigeons
Copy link
Contributor

Fix memory allocation for query texts to avoid segfault when out of memory

Alena Rybakina and others added 30 commits March 15, 2022 10:28
Fix core patch according to 23e7b38.
fix problem with test in unsupported
1. Increase stability of the pgbench test.
2. Open subsidiary AQO relations more carefully.
Change CI to drastically increase concurrency among pgbench clients
…iables or

routines.
(Includes modified core patch).
statement timeout AQO add one more timeout right before this.
If timeout is expired, AQO walks across the PlanState tree and learn
on partially executed nodes.

TODO:
1. We should somehow remember, that partial knowledge isn't real and use it
only before first successful execution.
2. We can distinguish already finished nodes and partially finished nodes. For
nodes, which really have time to finish execution we should store cardinality
"AS IS". In other situation we should use some extrapolation formula.
3. Maybe we shouldn't change instrumentation during partial walk?
4. Think about parallel workers.
on a partially executed query plan. Fix some issues.
Use aqo.learn_statement_timeout to enable this feature. On more function here
is to do cleanup on this cache and memory context.
Now it works quite stable, merge it into master branch.
…ure. Fix the bug with false finished node. Add some DEBUG messages. Just for conveniency.
It allows us to reuse ML data at different instance and learn on temporary
tables.
Andrey Kazarinov and others added 27 commits January 30, 2023 13:31
function memctx_htab_sizes outputs allocated sizes and used sizes of aqo's memory contexts and hash tables
Should rethink test principles of time-dependendent features to make it more
stable.
… the

optimizer which can vary on version of PG core.
In an extravagant situation:
(mode=disabled, forced stat gathering = 'on')
we can get into a situation when AQO is disabled
for a query, but previously cached plan contains some AQO preferences. Even so,
we should ignore the query at the end of execution.
the aqo_reset() routine: we want to clean all the AQO internal state on reset.
reviewed-by: a.rybakina
Using such a context we should remember about the risks:
* Recursion in AQO hooks can induce accidential memory context reset.
* System routines which we call from the extension, could require more long-
lived memory contexts on the outside than our.
Move GUCs, which can be changed in runtime, from global regression tests conf
to first executed test 'aqo_disabled.sql'. There we set these values by ALTER
SYSTEM/pg_reload_conf() and use them during the test.
Also, we call aqo_reset() at the start of each test.

And a bit more:
1. Avoid to show a number of records in AQO ML storage - it can
depend on optimizer settings and quite unstable (in progress).
2. Use aliases query in output to avoid unstability of naming of anonymous
columns.
…tallcheck over an instance in different modes. - run JOB benchmark [1] on a self hosted runner.

Utility scripts stores in the .github folder.

Branch name is a key to define the name of suitable PostgreSQL core branch:
use "stable[XX]" phrase in the name of git branch to trigger compiling and
launch of this commit with REL_[XX]_STABLE branch of the core.
If the branch name doesn't contain such a phrase, use master branch.

TODO:
=====

1. Add 'long' JOB test (parallel strategy disabled).
2. Add JOB test which would be executed up to full convergency of learning on
   each query.
3. Add installchecks with reusage of existed database and the AQO extension
   installed (sanity checks will be definitely broken but still).
4. Additional queries [2] can be a marker for successful learning.

[1] https://github.com/danolivo/jo-bench
[2] https://github.com/RyanMarcus/imdb_pg_dataset
Remember, each query can be executed longer than the timeout on an ancient
machines of buildfarm. So, RESET this GUC each time when it isn't really
needed for a test query.
different libraries.

To avoid such a problem in future, refactor AQO interfaces: declare all
hooks as static, reduce number of exporting functions and introduce
concept of *_init() function for a module that needs some actions in the
PG_init() routine.

Reviewed by: @Anisimov-ds
One installcheck test was added into the github actions workflow.

Reviewed by: @Anisimov-ds
…n of AQO

prediction hooks. It isn't a strict rule, but we should know about that.
It mostly caused by desire of reducing number of failures 001_pgbench.pl test
on WINDOWS OSes (it is related to speed of file descriptor allocations in
the test, where we CREATE/DROP extensions competitively by several threads.

Also, the aqo_CVE-2020-14350 test is corrected.
…lel_workers

test: EXPLAIN of Partial Aggregate sometimes showed 0 rows instead 1.
It is a race: parallel workers ran when main process have read all
underlying tuples.

Use explain without analyze to avoid such a problem. As I see, we don't lose
anything important.
@queenofpigeons queenofpigeons deleted the stable15-dsa-fix branch April 27, 2023 10:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants