-1

I have a parsed representation of an SQL query in the form of an abstract syntax tree. I'd like to convert it into an intermediate logical tree representation, which can in turn be expanded into candidate physical operation trees by the optimizer.

Following various resources discussing the topic (CMU lecture, Database Management System, Dayal's 1987 paper on nested subqueries), I managed to translate most of the query into an intermediate algebraic form (e.g. π_a(R⋈S)) but I get stuck at translating common table expressions and lateral joins.

  • I can't find a way to include CTEs in the algebraic tree without inlining all references to them. While running down the algebraic tree, the optimizer would come across the same expanded CTE, which seems inefficient.
  • By definition, subqueries in lateral joins can reference columns provided by preceding FROM items. When using a relational algebra tree representation, FROM items preceding the subquery are parent nodes, but join nodes can only reference columns output by child nodes. For instance:
SELECT *
FROM
  (SELECT 1 id) A,
  (
    (SELECT 2) B
    JOIN LATERAL (SELECT A.id) C ON TRUE
  ) D;
    NAT. JOIN
        ├── A
        └── LATERAL JOIN (D)
            ├── B
            └── C <- should be able to reference A

Can these constraints be expressed in terms of algebraic operations or do they typically need bespoke data structures? For example, separate subtrees for CTEs instead of a single tree for the whole logical plan - but then, how can the optimizer optimize the query globally instead of each subtree individually?

More generally, are there better intermediate representations than algebraic trees for the purpose of query optimization?

7
  • Please ask 1 specific researched non-duplicate question. How to Ask Help center This more or less asks for chapters of a textbook on introduction to formal semantics & a bespoke tutorial, a lot. How are you 1st stuck answering? PS A relational algebra is some values & operators; so nested calls form a functional querying programming language, but an algebra has nothing to say about how an object or meta language it is used in deals with definition expressions like a WITH. Your goal doesn't add anything to the question you ask, so it doesn't belong, unless you are trying to ask a different question. Commented Feb 22, 2024 at 1:46
  • Googling/researching 'sql formal semantics' will give you decades of papers formally describing SQL. PS SQL tables hold bags not sets of rows, so only restricted use of it is relational. So a relational algebra is not very helpful although it is relevant. PS There are many relational algebras. They differ in operators & even what a relation is. What are you using? (Beware, algebra-inspired/embedding query languages are often called, but are not, algebras.) Again, if you're not mentioning one the question is overly abstract. Nb Codd's papers' algebras are ill-defined. Commented Feb 22, 2024 at 1:56
  • Re relational querying. Re SQL querying. Querying with NULL though not duplicates. PS What is the difference between a LATERAL JOIN and a subquery in PostgreSQL? How to do the equivalent of a lateral join in MySQL 5.7? PS LATERAL JOIN is documented, how are you stuck describing it with as much algebra and/or calculus as you can? Commented Feb 22, 2024 at 2:24
  • @philipxy thanks for taking the time. I already have a parsed representation of an SQL query. I am now looking for a way to express that parsed query into an intermediate representation made of logical operations. The goal of this intermediate representation is to later be expanded into a search space by the optimizer to select a good physical plan. Following Database Management System's and Dayal's 1987 paper, I managed to translate most of the query, expect forCTEs an Commented Feb 22, 2024 at 2:28
  • Please clarify etc via edits, not comments. (But: It is not clear what exactly your comment has to do with whatever question you are trying to ask.) Please read the links in my comments. Please act on the feedback. PS Again: CTEs have nothing to do with relational algebra per se. Again: LATERAL JOIN has a definition; "how are you stuck". Commented Feb 22, 2024 at 2:31

1 Answer 1

1

While algebraic trees are quite common as intermediate representations, they may not capture all the complexities of modern SQL queries. Using more flexible representations such as DAGs, augmenting trees with annotations, and maintaining logical/physical separation can help address the challenges posed e.g. by CTEs and lateral joins.

For instance:

  • with Directed Acyclic Graphs: each node in the DAG represents an operation, and edges represent data flow. This allows for more complex relationships between operations, such as lateral joins referencing columns from preceding FROM items.
  • Relational Operator Trees with Annotations: Augmenting the relational operator tree with annotations or metadata can help capture additional information needed for optimization, such as context for lateral joins, bindings, or the presence of CTEs.

At to inlining CTEs vs. using subtrees: inlining CTEs into the algebraic tree can lead to redundancy and inefficiency, as you've rightly pointed out. However, in some cases, inlining might be the simplest approach for optimization. For more complex queries or for optimization across multiple queries, maintaining a separate representation for CTEs might be beneficial. This could involve storing the CTEs separately and referencing them as needed in the main query plan.

For further reading on the topic and in addition to the sources you already mentioned, I would recommend "System R: Relational Approach to Database Management" by Chamberlin, D.D., et al. which discusses the System R project and its internals, "Volcano: An Extensible and Parallel Query Evaluation System" by Graefe, G., as well as Postgres' parser and optimiser implementations.

Sign up to request clarification or add additional context in comments.

1 Comment

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.