lang: Add modern type unification implementation

This adds a modern type unification algorithm, which drastically improves performance, particularly for bigger programs.

This required a change to the AST to add TypeCheck methods (for Stmt) and Infer/Check methods (for Expr). This also changed how the functions express their invariants, and as a result this was changed as well.

This greatly improves the way we express these invariants, and as a result it makes adding new polymorphic functions significantly easier.

This also makes error output for the user a lot better in pretty much all scenarios.

The one downside of this patch is that a good chunk of it is merged in this giant single commit since it was hard to do it step-wise. That's not the end of the world.

This couldn't be done without the guidance of Sam who helped me in explaining, debugging, and writing all the sneaky algorithmic parts and much more. Thanks again Sam!

Co-authored-by: Samuel Gélineau <[email protected]>
purpleidea · Jul 1, 2024 · d06f6a0
1 parent 4e18c9c
commit d06f6a0

Showing 97 changed files with 3,509 additions and 11,005 deletions.
diff --git a/docs/function-guide.md b/docs/function-guide.md
@@ -41,7 +41,7 @@ To implement a function, you'll need to create a file that imports the
 [`lang/funcs/simple/`](https://github.com/purpleidea/mgmt/tree/master/lang/funcs/simple/)
 module. It should probably get created in the correct directory inside of:
 [`lang/core/`](https://github.com/purpleidea/mgmt/tree/master/lang/core/). The
-function should be implemented as a `FuncValue` in our type system. It is then
+function should be implemented as a `simple.Scaffold` in our API. It is then
 registered with the engine during `init()`. An example explains it best:
 
 ### Example
@@ -50,6 +50,7 @@ registered with the engine during `init()`. An example explains it best:
 package simple
 
 import (
+	"context"
 	"fmt"
 
 	"github.com/purpleidea/mgmt/lang/funcs/simple"
@@ -59,9 +60,10 @@ import (
 // you must register your functions in init when the program starts up
 func init() {
 	// Example function that squares an int and prints out answer as an str.
-	simple.ModuleRegister(ModuleName, "talkingsquare", &types.FuncValue{
+
+	simple.ModuleRegister(ModuleName, "talkingsquare", &simple.Scaffold{
 		T: types.NewType("func(int) str"), // declare the signature
-		V: func(input []types.Value) (types.Value, error) {
+		F: func(ctx context.Context, input []types.Value) (types.Value, error) {
 			i := input[0].Int() // get first arg as an int64
 			// must return the above specified value
 			return &types.StrValue{
@@ -87,109 +89,41 @@ mgmt engine to shutdown. It should be seen as the equivalent to calling a
 Ideally, your functions should never need to error. You should never cause a
 real `panic()`, since this could have negative consequences to the system.
 
-## Simple Polymorphic Function API
-
-Most functions should be implemented using the simple function API. If they need
-to have multiple polymorphic forms under the same name, then you can use this
-API. This is useful for situations when it would be unhelpful to name the
-functions differently, or when the number of possible signatures for the
-function would be infinite.
-
-The canonical example of this is the `len` function which returns the number of
-elements in either a `list` or a `map`. Since lists and maps are two different
-types, you can see that polymorphism is more convenient than requiring a
-`listlen` and `maplen` function. Nevertheless, it is also required because a
-`list of int` is a different type than a `list of str`, which is a different
-type than a `list of list of str` and so on. As you can see the number of
-possible input types for such a `len` function is infinite.
-
-Another downside to implementing your functions with this API is that they will
-*not* be made available for use inside templates. This is a limitation of the
-`golang` template library. In the future if this limitation proves to be
-significantly annoying, we might consider writing our own template library.
-
-As with the simple, non-polymorphic API, you can only implement [pure](https://en.wikipedia.org/wiki/Pure_function)
-functions, without writing too much boilerplate code. They will be automatically
-re-evaluated as needed when their input values change.
-
-To implement a function, you'll need to create a file that imports the
-[`lang/funcs/simplepoly/`](https://github.com/purpleidea/mgmt/tree/master/lang/funcs/simplepoly/)
-module. It should probably get created in the correct directory inside of:
-[`lang/core/`](https://github.com/purpleidea/mgmt/tree/master/lang/core/). The
-function should be implemented as a list of `FuncValue`'s in our type system. It
-is then registered with the engine during `init()`. You may also use the
-`variant` type in your type definitions. This special type will never be seen
-inside a running program, and will get converted to a concrete type if a
-suitable match to this signature can be found. Be warned that signatures which
-contain too many variants, or which are very general, might be hard for the
-compiler to match, and ambiguous type graphs make for user compiler errors. The
-top-level type must still be a function type, it may only contain variants as
-part of its signature. It is probably more difficult to unify a function if its
-return type is a variant, as opposed to if one of its args was.
-
-An example explains it best:
-
 ### Example
 
 ```golang
+package simple
+
 import (
+	"context"
 	"fmt"
 
-	"github.com/purpleidea/mgmt/lang/funcs/simplepoly"
+	"github.com/purpleidea/mgmt/lang/funcs/simple"
 	"github.com/purpleidea/mgmt/lang/types"
 )
 
 func init() {
-	// You may use the simplepoly.ModuleRegister method to register your
-	// function if it's in a module, as seen in the simple function example.
-	simplepoly.Register("len", []*types.FuncValue{
-		{
-			T: types.NewType("func([]variant) int"),
-			V: Len,
-		},
-		{
-			T: types.NewType("func({variant: variant}) int"),
-			V: Len,
-		},
+	// This is the actual definition of the `len` function.
+	simple.Register("len", &simple.Scaffold{
+		T: types.NewType("func(?1) int"), // contains a unification var
+		C: simple.TypeMatch([]string{     // match on any of these sigs
+			"func(str) int",
+			"func([]?1) int",
+			"func(map{?1: ?2}) int",
+		}),
+		// The implementation is left as an exercise for the reader.
+		F: Len,
 	})
 }
-
-// Len returns the number of elements in a list or the number of key pairs in a
-// map. It can operate on either of these types.
-func Len(input []types.Value) (types.Value, error) {
-	var length int
-	switch k := input[0].Type().Kind; k {
-	case types.KindList:
-		length = len(input[0].List())
-	case types.KindMap:
-		length = len(input[0].Map())
-
-	default:
-		return nil, fmt.Errorf("unsupported kind: %+v", k)
-	}
-
-	return &types.IntValue{
-		V: int64(length),
-	}, nil
-}
 ```
 
-This simple polymorphic function can accept an infinite number of signatures, of
-which there are two basic forms. Both forms return an `int` as is seen above.
-The first form takes a `[]variant` which means a `list` of `variant`'s, which
-means that it can be a list of any type, since `variant` itself is not a
-concrete type. The second form accepts a `{variant: variant}`, which means that
-it accepts any form of `map` as input.
-
-The implementation for both of these forms is the same: it is handled by the
-same `Len` function which is clever enough to be able to deal with any of the
-type signatures possible from those two patterns.
+## Simple Polymorphic Function API
 
-At compile time, if your `mcl` code type checks correctly, a concrete type will
-be known for each and every usage of the `len` function, and specific values
-will be passed in for this code to compute the length of. As usual, make sure to
-only write safe code that will not panic! A panic is a bug. If you really cannot
-continue, then you must return an error.
+Most functions should be implemented using the simple function API. If they need
+to have multiple polymorphic forms under the same name, with each resultant type
+match needing to be paired to a different implementation, then you can use this
+API. This is useful for situations when the functions differ in output type
+only.
 
 ## Function API
 
@@ -358,23 +292,6 @@ We don't expect this functionality to be particularly useful or common, as it's
 probably easier and preferable to simply import common golang library code into
 multiple different functions instead.
 
-## Polymorphic Function API
-
-The polymorphic function API is an API that lets you implement functions which
-do not necessarily have a single static function signature. After compile time,
-all functions must have a static function signature. We also know that there
-might be different ways you would want to call `printf`, such as:
-`printf("the %s is %d", "answer", 42)` or `printf("3 * 2 = %d", 3 * 2)`. Since
-you couldn't implement the infinite number of possible signatures, this API lets
-you write code which can be coerced into different forms. This makes
-implementing what would appear to be generic or polymorphic, instead of
-something that is actually static and that still has the static type safety
-properties that were guaranteed by the mgmt language.
-
-Since this is an advanced topic, it is not described in full at this time. For
-more information please have a look at the source code comments, some of the
-existing implementations, and ask around in the community.
-
 ## Frequently asked questions
 
 (Send your questions as a patch to this FAQ! I'll review it, merge it, and
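
The new `len` example above leaves the `F: Len` implementation as an exercise. A minimal sketch of how one implementation could serve all three matched signatures follows. It is self-contained, so `Kind` and `Value` here are hypothetical stand-ins for the real `types` package, and the return type is simplified to a plain `int64`. Whether the real `len` counts bytes or characters of a `str` is not stated in the diff; byte length is an assumption.

```go
package main

import (
	"context"
	"fmt"
)

// Kind and Value are hypothetical stand-ins for the mgmt `types` package,
// defined locally so that this sketch compiles on its own.
type Kind int

const (
	KindStr Kind = iota
	KindList
	KindMap
)

type Value struct {
	Kind Kind
	Str  string
	List []Value
	Map  map[string]Value
}

// Len follows the shape of the new `F` field (a context plus the inputs).
// By the time it runs, unification has already picked one concrete
// signature, so it only needs to dispatch on the kind it actually received.
func Len(ctx context.Context, input []Value) (int64, error) {
	if len(input) != 1 {
		return 0, fmt.Errorf("expected 1 argument, got %d", len(input))
	}
	switch v := input[0]; v.Kind {
	case KindStr:
		return int64(len(v.Str)), nil // byte length (an assumption)
	case KindList:
		return int64(len(v.List)), nil
	case KindMap:
		return int64(len(v.Map)), nil
	default:
		// Return an error instead of panicking, as the guide recommends.
		return 0, fmt.Errorf("unsupported kind: %d", v.Kind)
	}
}

func main() {
	list := Value{Kind: KindList, List: []Value{
		{Kind: KindStr, Str: "a"},
		{Kind: KindStr, Str: "b"},
	}}
	n, err := Len(context.Background(), []Value{list})
	fmt.Println(n, err) // prints "2 <nil>"
}
```

The `default` branch should be unreachable if type checking succeeded, but returning an error there keeps the function safe if it is ever called with an unexpected value.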

diff --git a/docs/language-guide.md b/docs/language-guide.md
@@ -639,23 +639,27 @@ so that each `Expr` node in the AST knows what to expect. Type annotation is
 allowed in situations when you want to explicitly specify a type, or when the
 compiler cannot deduce it, however, most of it can usually be inferred.
 
-For type inferrence to work, each node in the AST implements a `Unify` method
-which is able to return a list of invariants that must hold true. This starts at
-the top most AST node, and gets called through to it's children to assemble a
-giant list of invariants. The invariants can take different forms. They can
-specify that a particular expression must have a particular type, or they can
-specify that two expressions must have the same types. More complex invariants
-allow you to specify relationships between different types and expressions.
-Furthermore, invariants can allow you to specify that only one invariant out of
-a set must hold true.
+For type inference to work, each `Stmt` node in the AST implements a `TypeCheck`
+method which is able to return a list of invariants that must hold true. This
+starts at the topmost AST node, and gets called through to its children to
+assemble a giant list of invariants. The invariants all have the same form. They
+specify that a particular expression corresponds to two particular types which
+may both contain unification variables.
+
+Each `Expr` node in the AST implements an `Infer` and `Check` method. The
+`Infer` method returns the type of that node along with a list of invariants as
+described above. Unification variables can of course be used throughout. The
+`Check` method always uses a generic check implementation and generally doesn't
+need to be implemented by the user.
 
 Once the list of invariants has been collected, they are run through an
 invariant solver. The solver can return either return successfully or with an
-error. If the solver returns successfully, it means that it has found a trivial
+error. If the solver returns successfully, it means that it has found a single
 mapping between every expression and it's corresponding type. At this point it
 is a simple task to run `SetType` on every expression so that the types are
-known. If the solver returns in error, it is usually due to one of two
-possibilities:
+known. During this stage, each `SetType` method verifies that it's a compatible
+type that it can use. If either that method or the solver returns an error, it
+is usually due to one of two possibilities:
 
 1. Ambiguity
 
@@ -675,8 +679,8 @@ possibilities:
 	always happens if the user has made a type error in their program.
 
 Only one solver currently exists, but it is possible to easily plug in an
-alternate implementation if someone more skilled in the art of solver design
-would like to propose a more logical or performant variant.
+alternate implementation if someone wants to experiment with the art of solver
+design and would like to propose a more logical or performant variant.
 
 #### Function graph generation
 
@@ -717,8 +721,9 @@ If you'd like to create a built-in, core function, you'll need to implement the
 function API interface named `Func`. It can be found in
 [lang/interfaces/func.go](https://github.com/purpleidea/mgmt/tree/master/lang/interfaces/func.go).
 Your function must have a specific type. For example, a simple math function
-might have a signature of `func(x int, y int) int`. As you can see, all the
-types are known _before_ compile time.
+might have a signature of `func(x int, y int) int`. The simple functions have
+their types known _before_ compile time. You may also include unification
+variables in the function signature as long as the top-level type is a function.
 
 A separate discussion on this matter can be found in the [function guide](function-guide.md).
 
@@ -746,6 +751,12 @@ added in the future. This method is usually called before any other, and should
 not depend on any other method being called first. Other methods must not depend
 on this method being called first.
 
+If you use any unification variables in the function signature, then your
+function will *not* be made available for use inside templates. This is a
+limitation of the `golang` templating library. In the future if this limitation
+proves to be significantly annoying, we might consider writing our own template
+library.
+
 #### Example
 
 ```golang
@@ -756,6 +767,18 @@ func (obj *FooFunc) Info() *interfaces.Info {
 }
 ```
 
+#### Example
+
+This example contains unification variables.
+
+```golang
+func (obj *FooFunc) Info() *interfaces.Info {
+	return &interfaces.Info{
+		Sig: types.NewType("func(a ?1, b ?2, foo [?3]) ?1"),
+	}
+}
+```
+
 ### Init
 
 ```golang
@@ -818,43 +841,46 @@ Please see the example functions in
 [lang/core/](https://github.com/purpleidea/mgmt/tree/master/lang/core/).
 ```
 
-### Polymorphic Function API
+### BuildableFunc Function API
 
-For some functions, it might be helpful to be able to implement a function once,
-but to have multiple polymorphic variants that can be chosen at compile time.
-For this more advanced topic, you will need to use the
-[Polymorphic Function API](#polymorphic-function-api). This will help with code
-reuse when you have a small, finite number of possible type signatures, and also
-for more complicated cases where you might have an infinite number of possible
-type signatures. (eg: `[]str`, or `[][]str`, or `[][][]str`, etc...)
+For some functions, it might be helpful to have a function which needs a "build"
+step which is run after type unification. This step can be used to build the
+function using the determined type, but it may also just be used for checking
+that unification picked a valid solution.
 
 Suppose you want to implement a function which can assume different type
 signatures. The mgmt language does not support polymorphic types-- you must use
 static types throughout the language, however, it is legal to implement a
 function which can take different specific type signatures based on how it is
 used. For example, you might wish to add a math function which could take the
-form of `func(x int, x int) int` or `func(x float, x float) float` depending on
-the input values. You might also want to implement a function which takes an
-arbitrary number of input arguments (the number must be statically fixed at the
-compile time of your program though) and which returns a string.
-
-The `PolyFunc` interface adds additional methods which you must implement to
-satisfy such a function implementation. If you'd like to implement such a
-function, then please notify the project authors, and they will expand this
-section with a longer description of the process.
-
-#### Examples
-
-What follows are a few examples that might help you understand some of the
-language details.
-
-##### Example Foo
-
-TODO: please add an example here!
-
-##### Example Bar
-
-TODO: please add an example here!
+form of `func(x int, y int) int` or `func(x float, y float) float` depending on
+the input values. For this case you could use a signature containing unification
+variables, eg: `func(x ?1, y ?1) ?1`. At the end the buildable function would
+need to check that it received a `?1` type of either `int` or `float`, since
+this function might not support doing math on strings. Remember that type
+unification can only return zero or one solutions; it's not possible to return
+more than one, which is why this secondary validation step is a brilliant way to
+filter out invalid solutions without needing to encode them as algebraic
+conditions during the solver state, which would otherwise make it exponential.
+
+### InferableFunc Function API
+
+You might also want to implement a function which takes an arbitrary number of
+input arguments (the number must be statically fixed at the compile time of your
+program though) and which returns a string or something else.
+
+The `InferableFunc` interface adds an additional `FuncInfer` method which you
+must implement to satisfy such a function implementation. This lets you
+dynamically generate a type signature (including unification variables) and a
+list of invariants before running the type unification solver. It takes as input
+a list of the statically known input types and input values (if any), as well
+as the number of input arguments specified. This is usually enough information
+to generate a fixed type signature of a fixed size.
+
+Using this API should generally be pretty rare, but it is how certain special
+functions such as `fmt.printf` are built. If you'd like to implement such a
+function, then please notify the project authors as we're curious about your
+use case.
 
 ## Frequently asked questions
 

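The language guide changes above describe signatures with unification variables (such as `func(x ?1, y ?1) ?1`) and a "build" step that validates the solution after unification. The toy sketch below illustrates that idea with a textbook first-order unifier. It is not the solver added by this commit: `Type`, `Subst`, `Unify`, and `CheckMath` are hypothetical names invented for illustration, and the real implementation lives in the mgmt source tree.

```go
package main

import (
	"errors"
	"fmt"
)

// Type is a hypothetical, minimal type representation: either a unification
// variable (Var > 0, written ?1, ?2, ...) or a constructor such as "int",
// or "func" with its argument types followed by its return type.
type Type struct {
	Var  int
	Name string
	Args []*Type
}

func tvar(n int) *Type                     { return &Type{Var: n} }
func con(name string, args ...*Type) *Type { return &Type{Name: name, Args: args} }

// Subst maps unification variable numbers to the types they stand for.
type Subst map[int]*Type

// resolve follows bindings until it reaches an unbound variable or a
// constructor.
func (s Subst) resolve(t *Type) *Type {
	for t.Var != 0 {
		b, ok := s[t.Var]
		if !ok {
			return t
		}
		t = b
	}
	return t
}

// occurs reports whether variable v appears inside t (the occurs check).
func (s Subst) occurs(v int, t *Type) bool {
	t = s.resolve(t)
	if t.Var != 0 {
		return t.Var == v
	}
	for _, a := range t.Args {
		if s.occurs(v, a) {
			return true
		}
	}
	return false
}

// Unify extends s so that a and b become equal, or returns an error.
func (s Subst) Unify(a, b *Type) error {
	a, b = s.resolve(a), s.resolve(b)
	switch {
	case a.Var != 0 && a.Var == b.Var:
		return nil
	case a.Var != 0:
		if s.occurs(a.Var, b) {
			return errors.New("infinite type")
		}
		s[a.Var] = b
		return nil
	case b.Var != 0:
		return s.Unify(b, a)
	case a.Name != b.Name || len(a.Args) != len(b.Args):
		return fmt.Errorf("type mismatch: %s vs %s", a.Name, b.Name)
	}
	for i := range a.Args {
		if err := s.Unify(a.Args[i], b.Args[i]); err != nil {
			return err
		}
	}
	return nil
}

// CheckMath plays the role of the post-unification "build" step: it accepts
// the solved type only if it is int or float.
func CheckMath(s Subst, t *Type) error {
	switch r := s.resolve(t); {
	case r.Var != 0:
		return errors.New("ambiguous: type variable was never solved")
	case r.Name == "int" || r.Name == "float":
		return nil
	default:
		return fmt.Errorf("unsupported type: %s", r.Name)
	}
}

func main() {
	// The signature func(x ?1, y ?1) ?1 used at a call site with two ints.
	sig := con("func", tvar(1), tvar(1), tvar(1))
	s := Subst{}
	err := s.Unify(sig, con("func", con("int"), con("int"), tvar(2)))
	fmt.Println(err, s.resolve(tvar(2)).Name, CheckMath(s, tvar(1)))
	// prints "<nil> int <nil>"
}
```

Calling the same signature with two `str` arguments also unifies, because `?1` can stand for any type. It is then `CheckMath` that rejects the solution, which mirrors the text's point that the secondary validation filters out invalid solutions without adding alternatives to the solver.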
diff --git a/engine/util/util.go b/engine/util/util.go
@@ -278,6 +278,7 @@ func LangFieldNameToStructFieldName(kind string) (map[string]string, error) {
 
 // LangFieldNameToStructType returns the mapping from lang (AST) field names,
 // and the expected type in our type system for each.
+// XXX: Should this return unification variables instead of variant types?
 func LangFieldNameToStructType(kind string) (map[string]*types.Type, error) {
 	res, err := engine.NewResource(kind)
 	if err != nil {