Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Structural ADTs: first step #1770

Merged
merged 21 commits into from
Jan 30, 2024
Merged

Structural ADTs: first step #1770

merged 21 commits into from
Jan 30, 2024

Conversation

yannham
Copy link
Member

@yannham yannham commented Jan 22, 2024

MVP for adding structural ADTs to Nickel (see #1448), see what works and what breaks, and iterate from there.

Content

This PR adds some preliminary elements of structural algebraic data types, although some functionalities are still missing. The goal is to produce a reviewable chunk of work, that doesn't impact the normal usage of the language, and doesn't commit permanently when bikshedding needs to be involved. More specifically, this PR:

  • Adds a new EnumVariant(Ident, RichTerm) node to the AST, which represent an applied enum tag (ie an ADT with one argument). Extend the existing code to support it/fix compiler errors.
  • Adds a temporary, intentionally ugly (but easy to add and non-ambiguous) syntax for experimentation: 'Foo...(arg) (yes, it's a literal ellipsis, not the expression of some kind of repetition)
  • Extend enum row types to take an additional optional argument. Current enum tags correspond to a row EnumRow { id, typ: None}, and an applied tag corresponds to EnumRow {id, typ: Some(ty_arg)
  • Add the most likely final (because non ambiguous and uncontroversial) syntax for those new rows, which is just a tag applied to another type. For example, now [| 'Foo, 'Bar Number, 'Baz {foo : String}, 'Motto (Dyn -> Dyn) |] is a valid type.
  • Extend the typechecker to handle those new rows and the new AST node

I've also expanded/reworded some of the existing in-code documentation: going through our own comments a few month later is a good exercise to see what is useful, and what is missing, when the technical aspects aren't so fresh anymore 🙂

The bulk of this PR, beside adding new definitions and fixing compiler errors, is to extend row types and adapt the typechecker to handle the corresponding new cases. In particular, a lot of cases were previously simpler for enum rows (as compared to record rows) because those couldn't contain any other type inside, beside an enum row unification variable in tail position. Now, as with record rows, enum rows might contain normal types inside, which thus must be walked/folded over/variable-level updated, etc. It actually makes the handling of

Missing features

The following important aspects are missing for the moment, and will be covered in follow-up PRs:

  • extracting data/matching an ADT. There's currently no way to destructure or unwrap an enum variant (pedantically, there's no elimination form). In general, I think this might be time to allow destructuring patterns in match expressions - put differently, allow the same patterns for destructuring and match expressions. If the latter proves to be too big and to interfere with getting ADTs down the finish line, we can add ad-hoc pattern matching to match expression just for ADTs, but I suspect that wouldn't be much easier than to go for the general approach directly.
  • add code to handle enum variants in the implementation of ==: currently 'Foo..(1) == 'Foo(1) is false.
  • use the normal application syntax instead of the temporary ..( ). Distinguishing between zero-ary enum tags and 1-ary enum tags would be done at parsing time alone.
  • add relevant user documentation

Maintainability shortcomings

In order to keep the diff reasonable, some visible shortcomings aren't addressed right now but should be explored nonetheless:

  • after implementing variable levels Introduce unification variable levels and fix unsound generalization #1372, on of the follow-up was:

    look for code deduplication opportunities. It seems at least some type definitions and generic functions could be shared between record rows variables and enum rows variables.
    This seems even truer after this PR, which makes enum rows and record rows closer in behavior. A lot of the implementation of missing functions for enum rows felt like copy pasting the one for record rows and changing a few names (the biggest difference is the Option in enum rows).

  • related to the previous point (but still a thing on its own), I wonder if the case of an unapplied enum tag could be better represented morally as an argument of type Unit. Ie, [| 'Foo |] would be surface syntax for [| 'Foo Unit |] and 'Foo alone would be surface syntax for 'Foo (). The type Unit doesn't exist currently, but could be added even only internally, as a special value, if we don't want to create confusion around two different ways of writing the same thing ('Foo == 'Foo ()). This might make some of the special casing because of the optional type argument of enum rows go away, and we could use the exact same definition for record rows and enum rows.
  • In a similar spirit, should we have only one ast node for enums, whether they're applied to something or not? The proposal just above would say yes, with the argument being () : Unit for unapplied enum tags. Another solution, if we don't go the unit route, is to have one AST node with an optional argument. It's hard to say currently if that would be a win or not.
  • some type errors around row constraints are fishy. It seems their parameters aren't used consistently across different parts of the typechecker; however this is unrelated to this PR but was just noticed during this work.

@github-actions github-actions bot temporarily deployed to pull request January 22, 2024 19:23 Inactive
@github-actions github-actions bot temporarily deployed to pull request January 23, 2024 18:30 Inactive
@github-actions github-actions bot temporarily deployed to pull request January 24, 2024 14:31 Inactive
@github-actions github-actions bot temporarily deployed to pull request January 24, 2024 19:23 Inactive
This commit is a very first step toward adding structural Algebraic Data
Types to Nickel. This commit adds a new node in the AST for an enum tag
with an argument (a variant), and fill the blanks/compiler errors from
there.

A temporary syntax, choosen mostly to be obscure, ugly, easy to add and
to not conflict with existing syntax has been added for further testing
and experimentation: 'SomeTag..(argument).
This commit extend enum row types to accept a variant, that is to handle
types such as `[| 'Foo Number, 'Bar {foo : String} |]`, which are
basically structural ADTs.

The bulk of the work consist to extend most function operating on enum
rows to now take into account the fact that enum rows can have types
inside, much as record rows. That is, the code handling enum rows and
record rows is much more similar than before; the main difference being
that for now, enum rows have an optional type argument (the variant,
which can be there or not), while a record row always have a type
assignment.

At this point, no elimination forms for ADTs have been introduced yet
(that is, you can't match on/unwrap ADTs). This part is intentionally on
the side for the time being.
@github-actions github-actions bot temporarily deployed to pull request January 25, 2024 12:15 Inactive
Add missing surface syntax to write enum rows with a variant (an
argument). Fix some pretty printing shortcomings as well.
Extend various typechecker internal error types with new cases for
augmented (with variants) enum rows
@github-actions github-actions bot temporarily deployed to pull request January 25, 2024 12:38 Inactive
@github-actions github-actions bot temporarily deployed to pull request January 25, 2024 14:21 Inactive
@yannham yannham marked this pull request as ready for review January 25, 2024 16:05
core/src/typecheck/eq.rs Outdated Show resolved Hide resolved
core/src/typecheck/mod.rs Outdated Show resolved Hide resolved
core/src/typecheck/unif.rs Outdated Show resolved Hide resolved
@github-actions github-actions bot temporarily deployed to pull request January 25, 2024 16:30 Inactive
@github-actions github-actions bot temporarily deployed to pull request January 25, 2024 17:44 Inactive
@yannham yannham changed the title [WIP] Structural ADTs experiment Structural ADTs: first step Jan 25, 2024
@github-actions github-actions bot temporarily deployed to pull request January 25, 2024 19:35 Inactive
Copy link
Member

@jneem jneem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exciting! I'd vote for a single ast node with a unit type, especially if that would make it easier to deduplicate the rrows/erows stuff.

core/src/parser/grammar.lalrpop Outdated Show resolved Hide resolved
core/src/parser/grammar.lalrpop Show resolved Hide resolved
core/src/term/mod.rs Show resolved Hide resolved
core/src/typ.rs Outdated Show resolved Hide resolved
@github-actions github-actions bot temporarily deployed to pull request January 29, 2024 09:17 Inactive
core/src/typecheck/unif.rs Outdated Show resolved Hide resolved
core/src/typecheck/unif.rs Show resolved Hide resolved
core/src/typ.rs Show resolved Hide resolved
@github-actions github-actions bot temporarily deployed to pull request January 30, 2024 09:41 Inactive
@yannham yannham added this pull request to the merge queue Jan 30, 2024
Merged via the queue into master with commit bbc56d0 Jan 30, 2024
5 checks passed
@yannham yannham deleted the feat/adt-first-step branch January 30, 2024 16:14
This was referenced Feb 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants