Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[v1] Add simplified AST classes and visitors #1579

Open
wants to merge 4 commits into
base: v1
Choose a base branch
from
Open

Conversation

alancai98
Copy link
Member

@alancai98 alancai98 commented Sep 13, 2024

Relevant Issues

  • None

Description

  • Adds simplified AST classes and visitors written in Java
  • Also unnests existing AST classes to 1 level of nesting maximum (e.g. will have foo.bar)
  • Subsequent PRs will
    • Add docs for new classes
    • Add equals/hashcode impls
    • Integrate w/ existing v1 code
    • Add factory functions + builders
  • Note: DDL has also been removed for now. DDL will be added back at some point before the v1 release to show we can add new features without breaking existing code.

Other Information

  • Updated Unreleased Section in CHANGELOG: [NO]

    • No -- v1 branch
  • Any backward-incompatible changes? [NO]

  • Any new external dependencies? [NO]

  • Do your changes comply with the Contributing Guidelines
    and Code Style Guidelines? [YES]

License Information

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@alancai98 alancai98 self-assigned this Sep 13, 2024
Copy link

github-actions bot commented Sep 13, 2024

CROSS-ENGINE Conformance Report ❌

BASE (LEGACY-FAA0149) TARGET (EVAL-FAA0149) +/-
% Passing 89.67% 94.20% 4.53% ✅
Passing 5287 5554 267 ✅
Failing 609 75 -534 ✅
Ignored 0 267 267 🔶
Total Tests 5896 5896 0 ✅

Testing Details

  • Base Commit: faa0149
  • Base Engine: LEGACY
  • Target Commit: faa0149
  • Target Engine: EVAL

Result Details

  • ❌ REGRESSION DETECTED. See Now Failing Tests. ❌
  • Passing in both: 5019
  • Failing in both: 74
  • PASSING in BASE but now FAILING in TARGET: 1
  • FAILING in BASE but now PASSING in TARGET: 535

Now Failing Tests ❌

The following 1 test(s) were previously PASSING in BASE but are now FAILING in TARGET:

Click here to see
  1. MYSQL_SELECT_29, compileOption: LEGACY

Now Passing Tests

535 test(s) were previously failing in BASE (LEGACY-FAA0149) but now pass in TARGET (EVAL-FAA0149). Before merging, confirm they are intended to pass.

The complete list can be found in GitHub CI summary, either from Step Summary or in the Artifact.

CROSS-COMMIT-LEGACY Conformance Report ✅

BASE (LEGACY-7A07D25) TARGET (LEGACY-FAA0149) +/-
% Passing 89.67% 89.67% 0.00% ✅
Passing 5287 5287 0 ✅
Failing 609 609 0 ✅
Ignored 0 0 0 ✅
Total Tests 5896 5896 0 ✅

Testing Details

  • Base Commit: 7a07d25
  • Base Engine: LEGACY
  • Target Commit: faa0149
  • Target Engine: LEGACY

Result Details

  • Passing in both: 5287
  • Failing in both: 609
  • PASSING in BASE but now FAILING in TARGET: 0
  • FAILING in BASE but now PASSING in TARGET: 0

CROSS-COMMIT-EVAL Conformance Report ✅

BASE (EVAL-7A07D25) TARGET (EVAL-FAA0149) +/-
% Passing 94.20% 94.20% 0.00% ✅
Passing 5554 5554 0 ✅
Failing 75 75 0 ✅
Ignored 267 267 0 ✅
Total Tests 5896 5896 0 ✅

Testing Details

  • Base Commit: 7a07d25
  • Base Engine: EVAL
  • Target Commit: faa0149
  • Target Engine: EVAL

Result Details

  • Passing in both: 5554
  • Failing in both: 75
  • PASSING in BASE but now FAILING in TARGET: 0
  • FAILING in BASE but now PASSING in TARGET: 0

Base automatically changed from v1-upgrade-1.9 to v1 September 16, 2024 23:41
@alancai98 alancai98 marked this pull request as draft September 17, 2024 23:49
@alancai98
Copy link
Member Author

Converting to a draft. From offline discussion, will look into

  • rewriting using simple Java classes
  • see the viability of using Lombok w/ the Java classes
  • unnest some AST nodes

@alancai98 alancai98 marked this pull request as ready for review September 23, 2024 23:48

R visitIdentifier(Identifier node, C ctx);

R visitIdentifierSymbol(Identifier.Symbol node, C ctx);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(self-review) I tried to get rid of excessive nesting of classes from the original sprout-generated AST code. Commit 1a4e2de (#1579) shows the change to use at most 1 nested class in the Kotlin code before translating the classes and interfaces to Java.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I've already brought this up in person, I'm adding some context and links here.

Thinking about the backwards/forwards compatibility, I think it would be a better approach to use integers rather than enums. Some Java libraries have done this, and I've also seen a similar approach from protobuf. Here's an interesting issue that discusses the downsides of using closed enums.

We could potentially have a model that looks something like this:

public interface IdentifierUnqualified {
    ...
    CaseSensitivity getCase();
}

public interface Enum {
    int getValue();
}

public interface CaseSensitivity implements Enum {
    static const int UNKNOWN = 0;
    static const int SENSITIVE = 1;
    static const int INSENSITIVE = 2;
}

OR, we could just use the integers directly without the Enum. Though, the enum does provide the possibility for better readability (toString overriding):

public interface IdentifierUnqualified {
    ...
    /**
     * Valid returns are specified as constants in: `CaseSensitivity.CASE_*`
     */
    int getCase();
}

public class CaseSensitivity {
    static const int CASE_UNKNOWN = 0;
    static const int CASE_SENSITIVE = 1;
    static const int CASE_INSENSITIVE = 2;
}

This modeling enforces that Java users use the default branch for switch statements, protecting us from backward compatibility issues. Also, the UNKNOWN case helps us in forward compatibility scenarios for deserialization. This is an approach that protobuf takes.

Our internal implementations, following the first model from above, could look like:

// Internal impl file
val id = idUnqualified("some_table", CaseSensitivity.SENSITIVE)
when (id.getCase().getValue()) {
    CaseSensitivity.SENSITIVE -> TODO()
    CaseSensitivity.INSENSITIVE -> TODO()
    else -> error("Unsupported. Or we can also convert to the UNKNOWN variant if need be.")
}

fun idUnqualified(name: String, case: Int): IdentifierUnqualified {
    return IdentifierUnqualifiedImpl(name, case)
}

/**
* TODO docs, equals, hashcode
*/
public static class Symbol extends Identifier {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

<identifier chain> ::=
  <identifier> [ { <period> <identifier> }... ]

<identifier> ::=
  <actual identifier>

<actual identifier> ::=
  <regular identifier>
  | <delimited identifier>

IMO these two variants should be Identifier and IdentifierChain.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SQL:1999 also has <local or schema qualified name>, though, it doesn't fit our use-case particularly well, since we don't have the same two-level catalog structure. Could be of use to adopt this name. Identifier and IdentifierChain.

Comment on lines +31 to +32
@NotNull
public CaseSensitivity caseSensitivity;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
@NotNull
public CaseSensitivity caseSensitivity;
public boolean isDelimited();

/**
* TODO docs, equals, hashcode
*/
public class ExprVar extends Expr {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
public class ExprVar extends Expr {
// or VarRef, VarReference, Ref, BindingRef, etc.
public class Reference extends Expr {

I like this modeling as it provides a distinction between AS <identifier> vs referencing fields/globals/etc. In SQL:1999, this would be a <column reference> or <table reference>, though, PartiQL is a bit nuanced, since we allow for both everywhere an expression is allowed.

Anyways, the use of reference aligns with SQL.

<select list> ::=
  <asterisk>
  | <select sublist> [ { <comma> <select sublist> }... ]

<select sublist> ::=
  <derived column>
  | ...

<derived column> ::=
  <value expression> [ <as clause> ]

<as clause> ::= [ AS ] <column name>

<nonparenthesized value expression primary> ::=
  <unsigned value specification>
  | <column reference>
  | ...

<column reference> ::=
  <basic identifier chain>
  | MODULE <period> <qualified identifier> <period> <column name>

Comment on lines +16 to +17
@NotNull
public Identifier identifier;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
@NotNull
public Identifier identifier;
@NotNull
public IdentifierChain identifier;

This would more closely align with the EBNF. Essentially all areas (except for projection aliases, AS <identifier>) uses <identifier chain> or <local or schema qualified name>. That being said, the chain may only have the root.

Up to you on this. It could help our modeling since we would never have to switch on the variants. You could have two distinct classes that don't derive from the same abstract class.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want to follow SQL more closely.

/**
 * <element reference> ::= <array value expression> <left bracket or trigraph> <numeric value expression> <right bracket or trigraph>
 * Example: some_array_ref[2 + 2], some_ref['hello'], some_ref[CAST('' || 'a' AS STRING)]
 */
class ElementReference extends Expr {
    @NotNull
    Expr root;

    @NotNull
    Expr element;
}

/**
 * <field reference> ::= <value expression primary> <period> <field name>
 */
class FieldReference {
    @NotNull
    Expr root;

    @NotNull
    Identifier field;
}

/**
 * This is used as part of the projection list in SQL, but it directly uses <qualified asterisk>.
 * <all fields reference> ::= <value expression primary> <period> <asterisk>
 */
class AllFieldsReference {
  @NotNull
  Expr root;
}


/**
 * Below is not specified by SQL:1999.
 * <all elements reference> ::= <value expression primary> <left bracket or trigraph> <asterisk> <right bracket or trigraph>
 */
class AllElementsReference {
  @NotNull
  Expr root;
}

Anyway, you could scrap the use of ExprPath and ExprPathStep, and builders would help with code-readability.

// SELECT * FROM t WHERE t['a'].b[0] > 2
val query = QueryBuilder()
    .selectStar()
    .from("t")
    .where(ExprBuilder().id("t").elementRef("a").fieldRef("b").elementRef(0).gt(2))
    .build()

Comment on lines +24 to +25
@NotNull
public final Boolean not;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
@NotNull
public final Boolean not;
public final boolean not;

Comment on lines +15 to +16
@NotNull
public IonElement value;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might actually be a good opportunity to just pass this as a string since this is just the AST. This could remove the only 3rd party dependency we have. Since it wouldn't be in the public API for the modeling, we could shadow IonReader in the evaluator to allow our customers to use whatever version of Ion they want in their own libraries.

Comment on lines +10 to +15
/**
* TODO docs, equals, hashcode
*/
public class TypeCustom extends Type {
@NotNull
public String name;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/**
* TODO docs, equals, hashcode
*/
public class TypeCustom extends Type {
@NotNull
public String name;
/**
* TODO docs, equals, hashcode
* <data type> ::=
* <predefined type>
* | <row type>
* | <user-defined type>
* | ...
* <user-defined type> ::= <user-defined type name>
* <user-defined type name> ::= <schema qualified type name>
*/
public class TypeUserDefined extends Type {
@NotNull
public IdentifierChain name;

/**
* TODO docs, equals, hashcode
*/
public class TypeFloat32 extends Type {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
public class TypeFloat32 extends Type {
public class TypeReal extends Type {

/**
* TODO docs, equals, hashcode
*/
public class TypeFloat64 extends Type {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
public class TypeFloat64 extends Type {
public class TypeDoublePrecision extends Type {

/**
* TODO docs, equals, hashcode
*/
public class TypeVarchar extends Type {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just pointing out the VARCHAR(l) and CHARACTER VARYING(l) are represented by TypeVarchar -- but INT4, INTEGER, and INT are represented by two different types: TypeInt and TypeInt4.

/**
* TODO docs, equals, hashcode
*/
public class TypeByteString extends Type {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neither SQL nor Ion has BYTE. We can remove this for now.

/**
* TODO docs, equals, hashcode
*/
public class TypeAny extends Type {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd defer to @jpschorr, but this may be a good opportunity for us to use the TypeDynamic naming since these changes are breaking anyways.

@NotNull
public Expr from;

@NotNull
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
@NotNull
@Nullable

public Expr value;

@NotNull
public Expr nullifier;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need to update this, but do you know the history of the name nullifier? I've always been confused by it:

NULLIF (V1 , V2 ) is equivalent to the following :
CASE WHEN V1=V2 THEN NULL ELSE V1 END

I would personally just use lhs and rhs. I might just be missing something.

Comment on lines +14 to +16
public class ExprLit extends Expr {
@NotNull
public PartiQLValue value;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've discussed this earlier, but would you like to create the literal AST representations as part of this PR? AKA:

public class LiteralNumericExact extends Expr { ... }
public class LiteralNumericExactPeriod extends Expr { ... }
public class LiteralNumericApprox extends Expr { ... }

For:

<unsigned numeric literal> ::=
  <exact numeric literal>
  | <approximate numeric literal>;

<exact numeric literal> ::=
  <unsigned integer> [ <period> [ <unsigned integer> ] ]
  | <period> <unsigned integer>;

<approximate numeric literal> ::= <mantissa> E <exponent>

Comment on lines +14 to +16
* TODO docs, equals, hashcode
*/
public class ExprIsType extends Expr {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this?

/**
* TODO docs, equals, hashcode
*/
public class ExprDateDiff extends Expr {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really hope we don't need to add this to the AST. Same goes for DATE_ADD. I wonder if there's a way we can rewrite this to be +/- for datetimes and intervals.

/**
* TODO docs, equals, hashcode
*/
public class ExprCollection extends Expr {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So that we don't have to switch on the type, we could just create a BagConstructor and ArrayConstructor?

Copy link
Member

@johnedquinn johnedquinn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this! I've gone through most of the files, though several remain. I can re-review once some modifications have been made.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants