add some documentation for RelCommon and saved computations
EpsilonPrime committed Sep 25, 2024
1 parent 129f985 commit 55cba8b
Showing 2 changed files with 30 additions and 1 deletion.
3 changes: 2 additions & 1 deletion site/docs/relations/_config
@@ -1,6 +1,7 @@
arrange:
- basics.md
- common_fields.md
- logical_relations.md
- physical_relations.md
- user_defined_relations.md
- embedded_relations.md
- embedded_relations.md
28 changes: 28 additions & 0 deletions site/docs/relations/common_fields.md
@@ -0,0 +1,28 @@
# Common Fields

Every relation contains a common section containing optional hints and emit behavior.


## Emit

A relation with a direct emit kind outputs its columns as-is, without reordering or selection. A relation that specifies an emit output mapping can output its columns in any order and may omit some of them.

???+ note "Relation Output"

* By default, a relation's output is the list of all of its input columns plus any generated columns. One notable exception is aggregations, which output only new columns.
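The two emit kinds described above can be sketched in a few lines of Python. This is only an illustration, not the Substrait API: the helper `apply_emit` and the column names are hypothetical, and columns are modeled as plain strings.

```python
from typing import List, Optional, Sequence


def apply_emit(columns: Sequence[str],
               output_mapping: Optional[Sequence[int]] = None) -> List[str]:
    """Hypothetical helper: apply an emit kind to a relation's default output.

    output_mapping=None models a direct emit (no reordering, no selection).
    Otherwise each entry is an index into the default output columns.
    """
    if output_mapping is None:
        return list(columns)
    return [columns[i] for i in output_mapping]


# Default output: all input columns plus any generated columns.
input_columns = ["a", "b"]
generated_columns = ["a_plus_b"]
default_output = input_columns + generated_columns

direct = apply_emit(default_output)            # every column, original order
remapped = apply_emit(default_output, [2, 0])  # reordered, and "b" is left out
```

Here `direct` keeps all three columns in place, while `remapped` both reorders the columns and drops one of them.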


## Hints

Hints provide information that can improve performance but cannot be used to control a relation's behavior. Table statistics, runtime constraints, name hints, and saved computations all fall into this category.

???+ note "Hint Design"

* If a hint is not present or has incorrect data, the consumer should still be able to arrive at the correct result.
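As a rough illustration of this design rule, the Python sketch below uses row-count hints only to decide which side of a hash join to build on. The function `hash_join` and its `left_rows_hint` / `right_rows_hint` parameters are hypothetical and are not part of any specification; the point is that missing or wrong hints affect only the work done, never the rows returned.

```python
from typing import Dict, List, Optional, Tuple

Row = Tuple[int, str]  # (join key, payload)


def hash_join(left: List[Row], right: List[Row],
              left_rows_hint: Optional[int] = None,
              right_rows_hint: Optional[int] = None) -> List[Tuple[int, str, str]]:
    """Inner equi-join returning (key, left_payload, right_payload) rows."""
    # The hints only pick the build side; with no hints we build on the right.
    build_left = (left_rows_hint is not None
                  and right_rows_hint is not None
                  and left_rows_hint < right_rows_hint)
    build, probe = (left, right) if build_left else (right, left)

    table: Dict[int, List[str]] = {}
    for key, payload in build:
        table.setdefault(key, []).append(payload)

    out = []
    for key, payload in probe:
        for match in table.get(key, []):
            # Keep the column order stable regardless of the build side.
            out.append((key, match, payload) if build_left else (key, payload, match))
    return sorted(out)  # sorted so results are comparable across strategies


left = [(1, "a"), (2, "b")]
right = [(1, "x"), (1, "y"), (3, "z")]

no_hint = hash_join(left, right)
good_hint = hash_join(left, right, left_rows_hint=2, right_rows_hint=3)
wrong_hint = hash_join(left, right, left_rows_hint=1000, right_rows_hint=1)
```

All three calls return the same rows; only the choice of build side differs.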


### Saved Computations

Computations can be used to save a data structure for use elsewhere in the plan. For instance, suppose we have a plan with a HashEquiJoin and an AggregateDistinct operation. The HashEquiJoin could save its hash table as saved computation id #1, and the AggregateDistinct could read in computation id #1.

Now let's try a more complicated example. We have a relation that constructs two hash tables, and we'd like one of them to still go to our aggregate relation but the other to go elsewhere. We can use the computation number to select which data structure goes where. For instance, computation #1 could be hash table number 1 and computation #2 could be hash table number 2. The receiving entity just needs to know which of its own data structures should receive that computation. So if it has 5 hash table data structures, the LoadedComputation record needs to point to the number of the one where the incoming data is intended to go.
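The routing described above can be sketched in Python as follows. This is a toy model under stated assumptions, not the Substrait message definitions: the `ComputationStore` class, the field names `computation_id` and `target_slot`, and the sample hash tables are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class SavedComputation:
    computation_id: int  # hypothetical field: id the producer saves under


@dataclass
class LoadedComputation:
    computation_id: int  # hypothetical field: which saved computation to read
    target_slot: int     # hypothetical field: which consumer structure receives it


class ComputationStore:
    """Toy registry mapping computation ids to saved data structures."""

    def __init__(self) -> None:
        self._saved: Dict[int, Any] = {}

    def save(self, record: SavedComputation, structure: Any) -> None:
        self._saved[record.computation_id] = structure

    def load(self, record: LoadedComputation) -> Optional[Any]:
        # A missing computation is not an error: saved computations are
        # hints, so the consumer can fall back to building its own structure.
        return self._saved.get(record.computation_id)


store = ComputationStore()

# The producing relation builds two hash tables and saves each under its own id.
hash_table_1 = {1: ["row0", "row2"], 2: ["row1"]}
hash_table_2 = {"east": ["row0"], "west": ["row1", "row2"]}
store.save(SavedComputation(computation_id=1), hash_table_1)
store.save(SavedComputation(computation_id=2), hash_table_2)

# The aggregate has 5 hash table slots and wants computation #1 in slot 3.
aggregate_slots: List[Optional[Any]] = [None] * 5
record = LoadedComputation(computation_id=1, target_slot=3)
table = store.load(record)
if table is not None:
    aggregate_slots[record.target_slot] = table

# Some other relation picks up computation #2 in its only slot.
other_slots: List[Optional[Any]] = [None]
other_record = LoadedComputation(computation_id=2, target_slot=0)
other_slots[other_record.target_slot] = store.load(other_record)
```

The computation id selects which saved structure is read, and the slot number selects where it lands in the consumer; the two numbers are independent of each other.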
