feat(hogql): automatic person table joins #14286
…into hogql-symbol-resolution
12 files now, 12 files on Monday 🤣
def clickhouse_table(self):
    # This is a bit of a hack to make sure person.properties.x works
    return "events"
def join_with_max_person_distinct_id_table(from_table: str, to_table: str, requested_fields: List[str]):
it's maybe too much refactoring for now but this method and join_with_persons_table are almost identical and feel like they'd change together... but they're also almost next to each other in a single file so 🤷‍♀️
This will change even more when I add support for querying "is person in cohort", which needs a similar join. So I'd punt on refactoring this file much further for now. The repetitive patterns and abstractions will emerge as soon as we add a few more joins.
posthog/hogql/printer.py
Outdated
elif isinstance(symbol.table, ast.SelectQuerySymbol) or isinstance(symbol.table, ast.SelectQueryAliasSymbol):
    field_sql = self._print_identifier(symbol.name)
    if isinstance(symbol.table, ast.SelectQueryAliasSymbol) or symbol_with_name_in_scope != symbol:
        field_sql = f"{self.visit(symbol.table)}.{field_sql}"

# :KLUDGE: Legacy person properties handling. Only used within non-HogQL queries, such as insights.
if self.context.legacy_person_property_handling and field_sql == "events__pdi__person.properties":
something like within_non_hogql_query is shorter than legacy_person_property_handling and carries more information. Then the comment can be deleted
This is a much better name 👍
📸 UI snapshots have been updated. 1 snapshot change in total: 0 added, 1 modified, 0 deleted.
Triggered by this commit.
Looks quite right to me
I don't have comments though, as I haven't really worked with this code and so at the increased level of complexity I might be missing something
Changes
- LazyTable database type and a corresponding LazyTableSymbol symbol.
- events.pdi and events.pdi.person joins using lazy tables, including materialised columns where applicable.
- VirtualTable database type and a corresponding VirtualTableSymbol symbol.
- poe as a virtual table with a structure resembling the persons table.
- FieldTraverser class and symbol. This lets us override "events.person" so that it resolves to "events.pdi.person".

Here's a table where I'm comparing the "email" property from PoE and from PDI.

It does not yet implement a mechanism for swapping between PoE on and off as the default behaviour for events.person, but locks it to be events.pdi.person (PoE off) because that's where we are. I'd like to implement a proper swap in another PR, as that requires creating a new virtual database table for each team, and other fun.

Also out of scope: we will currently copy the entire properties field between shards. It's easy to fix (fields are already lazily added after all), but I'd like to keep this PR from totally exploding.

How did you test this code?
Added tests
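
For readers who haven't worked with this code, here is a rough, self-contained sketch of the idea behind the lazy tables and the field traverser described under Changes. It is not the code from this PR: the Resolver class, the join_pdi and join_person helpers, the SQL they emit, and the person_distinct_ids and persons table names are all assumptions made up for illustration. Only the names LazyTable and FieldTraverser, the (from_table, to_table, requested_fields) shape of a join function, and the events__pdi__person alias come from the diff above.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Same argument shape as join_with_max_person_distinct_id_table in the diff.
JoinFunction = Callable[[str, str, List[str]], str]


@dataclass
class LazyTable:
    """Illustrative stand-in: a table that is joined only if one of its fields is used."""
    fields: Dict[str, object]
    join_function: JoinFunction


@dataclass
class FieldTraverser:
    """Illustrative stand-in: redirects a field name to another chain of fields."""
    chain: Tuple[str, ...]


def join_pdi(from_table: str, to_table: str, requested_fields: List[str]) -> str:
    # Hypothetical SQL; the real join is more involved.
    columns = ", ".join(dict.fromkeys(["distinct_id", "person_id", *requested_fields]))
    return (f"JOIN (SELECT {columns} FROM person_distinct_ids) AS {to_table} "
            f"ON {from_table}.distinct_id = {to_table}.distinct_id")


def join_person(from_table: str, to_table: str, requested_fields: List[str]) -> str:
    # Only the requested columns are selected, so unused fields are never copied.
    columns = ", ".join(dict.fromkeys(["id", *requested_fields]))
    return (f"JOIN (SELECT {columns} FROM persons) AS {to_table} "
            f"ON {from_table}.person_id = {to_table}.id")


PERSON = LazyTable({"id": "column", "properties": "column"}, join_person)
PDI = LazyTable({"distinct_id": "column", "person_id": "column", "person": PERSON}, join_pdi)
EVENTS: Dict[str, object] = {
    "event": "column",
    "distinct_id": "column",
    "pdi": PDI,
    # "events.person" == "events.pdi.person" (PoE off)
    "person": FieldTraverser(("pdi", "person")),
}


class Resolver:
    """Hypothetical helper: resolves field chains and records the joins they need."""

    def __init__(self, alias: str, fields: Dict[str, object]):
        self.alias = alias
        self.fields = fields
        # to_alias -> (from_alias, lazy table, requested fields), in join order
        self.joins: Dict[str, Tuple[str, LazyTable, List[str]]] = {}

    def resolve(self, chain: List[str]) -> str:
        alias, fields, todo = self.alias, self.fields, list(chain)
        while todo:
            name = todo.pop(0)
            target = fields[name]
            if isinstance(target, FieldTraverser):
                # Substitute the traverser's chain and keep walking from the same table.
                todo = list(target.chain) + todo
            elif isinstance(target, LazyTable):
                to_alias = f"{alias}__{name}"
                self.joins.setdefault(to_alias, (alias, target, []))
                alias, fields = to_alias, target.fields
            else:
                # Plain column: remember it if it lives on a lazily joined table.
                if alias in self.joins and name not in self.joins[alias][2]:
                    self.joins[alias][2].append(name)
                return f"{alias}.{name}"
        raise ValueError(f"{'.'.join(chain)} names a table, not a column")

    def join_sql(self) -> List[str]:
        return [table.join_function(from_alias, to_alias, requested)
                for to_alias, (from_alias, table, requested) in self.joins.items()]
```

In this sketch, resolving ["event"] adds no joins at all, while resolving ["person", "properties"] goes through the traverser, registers the events__pdi and events__pdi__person joins, and asks the persons join for the properties column only. That is the same "fields are lazily added" behaviour the description relies on, just in toy form.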