Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Relay] Expand type unification and other utilities #2189

Merged
merged 51 commits into from
Jan 15, 2019

Conversation

slyubomirsky
Copy link
Contributor

@slyubomirsky slyubomirsky commented Nov 29, 2018

This change adds greater support for type unification to Relay. In particular, it allows for handling cases that require recursing down the type ASTs. E.g., Unify(Tuple(?a, ?b), Tuple(Tensor(s, bt), Tensor(s, bt))) should be able to conclude that the unified type is Tuple(Tensor(s, bt), Tensor(s, bt)) and that Unify(Tuple(?a, ?b), Tuple(?c, ?d)) implies ?a = ?c and ?b = ?d.

The main concern is ensuring that if we have a type variable nested inside another type, then an update to that type variable should result in re-running all relations where that type variable appears, including those where that type variable is nested inside another type. That was the main source of complexity in these changes.

(Note to reviewers: It would be very good to ensure that this unifier does not unnecessarily add relations to the queue and thus make the solver do unnecessary work. I do not believe that it does, though.)

Please review as to whether this is the direction we want to take, @jroesch, @tqchen, and others

Outline of Relay's Type Checking and Inference Implementation

For the benefit of reviewers and future work on Relay's type system implementation, I will give a high-level outline of how type checking and inference work in Relay.

Types are actually inferred during the process of type checking, which runs in two stages:

  1. Constraint generation
  2. Constraint solving

Explanation of Constraints on Types

By "constraints," we mean constraints on the possible types permitted for a given term. For example, if a variable v appears in a tuple projection v.1, we would like to be able to express the constraint that v must have a tuple type with at least two members. Later, in the constraint solving phase, we would fill in the specific tuple type that v must have.

Handling type inference requires filling in types for terms whose types have been omitted, as in functions where parameter or return types have been omitted or let bindings where the variable is not annotated with a type. In these cases, those terms are given incomplete types (known as type variables or type holes in the programming languages literature), to be filled in during the constraint solving phase.

Unification

The most common constraint employed is the main subject of this PR, namely unification: the constraint that two types must be structurally equal. (For example, unifying two tuple types requires ensuring that the corresponding members of the tuples can be unified.) Very many type-checking rules require unification: for example, type-checking a function call requires ensuring that the types of the arguments to the call match the types expected for function arguments, so the types of the call arguments will be unified with the function's expected argument types.

Incomplete types require special handling during unification, which is where most of the complexity in the implementation here emerges. Namely, if an incomplete type is unified with another incomplete type, that means both those incomplete types must eventually resolve to the same type. So, if incomplete type ?a and ?b are unified and then ?b is later unified with a type Tensor[(10, 10), float32], all occurrences of both ?a and ?b must be equal to Tensor[(10, 10), float32].

Unification for incomplete types is thus handled via a union-find (UF) data structure, where incomplete types are nodes in the UF tree. For unifying incomplete types with concrete types, we must also keep a mapping of incomplete types to concrete types: if an incomplete type is unified with a concrete type, we say it resolves to that concrete type.

There is one more complication, which is that recursive unification is not permitted. E.g., Unify(?a, (?a, ?b)) is invalid because there is no concrete type that is a tuple containing itself as one member. Thus, we must check for such recursion when unifying incomplete types and reject it (this is called an "occurs check" in the PL literature, since it checks whether a type variable occurs inside another type).

We can thus outline the algorithm for unifying two types t1 and t2 (Unify(t1, t2)) as follows:

  1. If t1 and t2 are both incomplete types, find r1 and r2, their respective UF roots. If r1 is the same as r2, there is no more work to do. Otherwise, we may unify r1 and r2 in the UF tree. If one of r1 and r2 has a resolved (concrete) type and the other doesn't, set the resolved type of the new UF root to that resolved type; if both have a resolved type (call them c1 and c2), set the resolved type of the new UF root to Unify(c1, c2). Return the resolved type of the new UF root (or the UF root itself if there was no resolved type).
  2. If one of t1 and t2 is an incomplete type and the other is concrete (WLOG, assume that t1 is incomplete and t2 is concrete), obtain the UF root of t1, r1. Next perform an occurs check in t2 for r1 (the occurs check simply traverses t2's AST and for each incomplete type encountered, checks whether its UF root is equal to r1); reject if there is an occurrence. If r1 does not have a resolved type, set its resolved type to t2 and return t2. If r1 has a resolved type c1, set r1's resolved type to Unify(c1, t2) and return the final resolved type.
  3. If neither t1 nor t2 is an incomplete type, unify them by recursing down their ASTs. If t1 and t2 fail to match at any point of their AST (e.g., if t1 is a tuple type and t2 is a function type), reject the unification. For types that do not contain nested types (e.g., tensor types), reject if they are not equal and return the type if they are equal. Otherwise, unify all subtrees of the two types and return the unified type.

Because union-find with path compression is very asymptotically efficient, unification takes time approximately linear in the size of the type ASTs being unified.

Explicit Type Relations

The other constraints typically generated in Relay are explicit type relations on functions. Type relations are typically associated with Relay operators (see the developer tutorial on adding operators, https://docs.tvm.ai/dev/relay_add_op.html, for some discussion). In their current implementation, they take the argument and return types to a Relay function and define a procedure (i.e., a function in C++) that checks if the constraint holds. The procedures may perform arbitrary checks and can update any shape variables or incomplete types within the types passed to them.

For example, the Identity relation simply unifies the types passed to it and is satisfied if unifying the types succeeds. The Broadcast relation is satisfied if all the types passed to it are tuple types and the shape of the function return type is the broadcast of the argument types' shapes.

Note that because it is possible for type relations to be given incomplete types or types that contain nested incomplete types as arguments, the queue of type relations must be appropriately updated as those incomplete types are unified and resolved. In order to track these updates, each type in the program keeps a list of all the type relations in which it appears as an argument (we can call this list a "trigger list"), which is also propagated to every sub-tree of that type's AST. When two types are unified, the final unified type will combine the trigger lists of the two argument types and any unresolved relation in the two trigger lists will be put on the queue of relations, as there is now new information that may allow that relation to be resolved.

For example, if we have a tuple type (?a, ?b) that is given as an argument to relation R1, then R1 is in the trigger lists of (?a, ?b), ?a, and ?b. If ?a is unified with another type and R1 is not yet solved, R1 will be put back on the queue of relations because updating ?a give new information.

When adding a type relation, it takes time linear in the size of each argument type AST to propagate that type relation to the trigger list of each argument type. When unifying two types, it takes time linear in the size of the types' trigger lists to iterate through the trigger lists and determine whether a relation needs to be put back on the queue. While the linear performance times here can quickly add up, the number of explicit relations in any given program will typically be quite small and will only apply to a few of the types in the program, so this is not expected to be a threat to performance.

Constraint Generation

The constraint generation phase of type checking is implemented by recursing down the AST of a program and adding new type constraints based on the type-checking rules for each AST node. Without listing out the rules for each node, this procedure consists primarily of the following:

  • Keep a map of terms to types.
  • For local variables bound in a let expression or function argument, if the type annotation is omitted, set it to a fresh incomplete type. Similarly, if a return type to a function is not annotated, treat it as having an incomplete type for the return type.
  • For operators and functions that feature explicit type relations, add the relation to a queue for the constraint solving phase.
  • Unify types of terms everywhere they are expected to match. For example, if there is a call to a function in the program, the type of the call should be unified with the return type of the function and the type of each call argument should be unified with the corresponding function parameter types.

This phase of type checking should not do very much for each AST node besides adding type relations to a queue or adding more unification constraints and should only visit each AST node once, so it should be approximately linear in the size of the program AST.

Constraint Solving

The second phase of type checking requires solving constraints. In the case of unification constraints, solving is simple: for each type t, resolve it by recursing down its AST and replacing each incomplete type with the concrete resolved type of its root in the UF tree.

Explicit type relations currently take somewhat of an ad hoc approach: There is a queue of relations kept from the generation phase. For each relation in the queue, call its updater function. If the relation holds and all types passed to the relation are fully concrete, dequeue the relation; it is solved. If a relation returns false, there is a type checking error. Otherwise, if incomplete types remain, add the relation back to the queue. Keep iterating through the queue until all relations are solved or a fixpoint is reached (i.e., no types are changing). A fixpoint may either mean there is an error (impossible constraint to satisfy) or that there is not enough information given (more annotations may be necessary).

Note that resolution of incomplete types should be performed after solving all type relations, since the relations may update incomplete types (i.e., add more unification constraints).

The complexity of this phase is difficult to classify, since relations may perform arbitrary operations. It is certainly possible to imagine constructing very entangled queues of interdependent relations that would have poor performance, but in practice, the relations used for most operators in Relay have simple constraints that do not take many iterations of the queue to fully resolve.

Remaining Issue

Bound type variables don't fit into the above story. That will take some additional design work (see below discussion); we will address it later.

@jroesch
Copy link
Member

jroesch commented Nov 29, 2018

I will take a review pass after I get back from MSR tomorrow.

@slyubomirsky slyubomirsky changed the title [Relay][WIP] Expand unification in type solver [Relay] Expand unification in type solver Nov 30, 2018
@slyubomirsky
Copy link
Contributor Author

slyubomirsky commented Nov 30, 2018

Had the below case fail when I tried it. This appears to be happening because the identity relations between t4 and tup1, t5 and tup2, and t4 and t5 are marked as "resolved" in the queue (because the identity relation function always returns true) even though tup1 and tup2 contain nested incomplete types.

There are two ways I can think of right now to fix this bug:

  1. During unification, have a child node mark the parent's relations as unresolved if the child is an IncompleteType and enqueue them again (probably will lead to unnecessarily running relations more than once, so hopefully there's a way to be smarter about it).
  2. Change the identity relation to return false if there are nested incomplete types (not sure this is correct)

Edit: Figured out another potential issue. In the case of Unify(?a, Tuple(?b, ?c)), ?b and ?c are not being marked as children of ?a, i.e., updates to ?b and ?c are not causing updates to ?a's relations. I will investigate this too.

Edit2: Additionally, if we have a type Tuple(?a, ?b) and update ?a and ?b, the solver's Resolve() will still resolve Tuple(?a, ?b) as Tuple(?a, ?b), so perhaps Resolve() needs to have a visitor too.

def test_backward_solving_after_child_update():
    solver = make_solver()

    tensor1 = relay.ty.TensorType((10, 20), "float32")
    tensor2 = relay.ty.TensorType((10, 1, 1), "float32")

    t1 = relay.ty.IncompleteType()
    t2 = relay.ty.IncompleteType()
    t3 = relay.ty.IncompleteType()

    tup1 = relay.ty.TupleType([t1, t2])
    tup2 = relay.ty.TupleType([t1, t3])

    tup_concrete = relay.ty.TupleType([tensor1, tensor2])

    t4 = solver.gen_type("Identity", [tup1])
    t5 = solver.gen_type("Identity", [tup2])

    solver.gen_type("Identity", [t4], out=t5)
    assert solver.Solve()
    assert solver.Resolve(t5) == tup1 or solver.Resolve(t5) == tup2

    # updating the variables *inside* tup1 and tup2 should update t4 and t5
    solver.gen_type("Identity", [t1], out=tensor1)
    solver.gen_type("Identity", [t2], out=tensor2)
    assert solver.Solve()
    assert solver.Resolve(t3) == tensor2
    assert solver.Resolve(t5) == tup_concrete

@slyubomirsky
Copy link
Contributor Author

Introducing a visitor for resolving solved the problems I had in #2189 (comment) without needing to implement any of the other measures, but I would appreciate discussion on whether that's a good approach to resolution or if there need to be larger rewrites of TypeSolver to accommodate nested type vars.

@jroesch
Copy link
Member

jroesch commented Dec 1, 2018

@slyubomirsky you definitely need to recursively resolve, the boolean should only indicate if progress has been made on the relation not whether it is fully resolved. The main places that should have been changed are (1) add recursive unifier, (2) make resolver recursive, and (3) update solver to handle new relations.

@slyubomirsky
Copy link
Contributor Author

slyubomirsky commented Dec 2, 2018

@jroesch could you please clarify what (3) means in your last comment? I'm not sure what specifically is missing from here; an example test case (even a sketch) could also be helpful

@slyubomirsky
Copy link
Contributor Author

I think I understand now what might be missing (please correct me if I'm wrong, @jroesch): Adding a relation to the solver via AddConstraint should similarly propagate down to nested types, which requires another visitor. I'll add that soon.

I don't know of a non-unification test case where this applies, however; any suggestions?

… is added, not during unification. Resolve before running a relation
@slyubomirsky
Copy link
Contributor Author

slyubomirsky commented Dec 3, 2018

My most recent commit should propagate recursively, as described, as well as resolve before running a relation (potentially filling in solved nested type vars). Please review @jroesch, @MarisaKirisame. I think we might do well to have more complex test cases, so if any has been giving you problems, I would appreciate a suggestion.

(Not sure what the issue is with the CI, does not happen to me locally.)

@jroesch
Copy link
Member

jroesch commented Dec 5, 2018

I think this looks like it solves the appropriate cases, @MarisaKirisame can you send Steven the broken tests we had committed?

@MarisaKirisame
Copy link
Contributor

@jroesch done privatedly

@slyubomirsky
Copy link
Contributor Author

@jroesch @MarisaKirisame I think the cases you showed me will require changes to type inference, rather than the solver. I will handle them in this PR.

@slyubomirsky
Copy link
Contributor Author

I thought the above problem was addressed by the addition of the propagator, which ensures that all nested types list the relations from their parent types. I will see if there is a need to have further recursive merging and if I can add more test cases for such recursive updates.

@slyubomirsky
Copy link
Contributor Author

Another option would be calling the propagator in every call to Unify, but that seems unnecessary so long as AddConstraint() is called whenever a type relation is encountered.

@tqchen
Copy link
Member

tqchen commented Dec 27, 2018

The current propagator logic only solves the problem when the initial relation contains recursive ones, like R1((?c1, ?c2), ?a), but did not resolve the problem when an existing IncompleteType get unified to a tuple which also contains IncompleteType.

For example. initial condition: R2(?d, ?a), then we call MergeFromTo((?c1, ?c2), ?d)

@slyubomirsky
Copy link
Contributor Author

Ah, I see the issue now, very good catch. I'll work on a solution over the next day.

@slyubomirsky
Copy link
Contributor Author

slyubomirsky commented Dec 28, 2018

I have recursive merging in, but I think each merge also needs to propagate in order for the rel lists to be correctly updated (since it is always true that if a type has nested types, updates to the nested types should cause updates to relations for the parent). I am getting some weird problems with circular links being formed, so I think I will try using different data structures to both eliminate the issue of circular links and make it easier to keep the trigger lists up to date (a lot of wacky visitors going around).

(This will probably lead to some inefficiencies, but I'd rather have things be obviously correct and ready for us to fix later than subtly flawed.)

@tqchen tqchen self-assigned this Jan 8, 2019
@tqchen
Copy link
Member

tqchen commented Jan 11, 2019

I think we will need to revisit this mainly for performance reasons (the use of unodered_set on Node), but since this appears to implement features we need correctly, I can move on to merge this in. @jroesch can you take another close look and https://docs.tvm.ai/contribute/code_review.html#approve-and-request-changes-explicitly

@tqchen
Copy link
Member

tqchen commented Jan 11, 2019

@junrushao1994 please help to take a look as well if you have time

@junrushao
Copy link
Member

@tqchen Will do tomorrow night

*
* \param expr the expression.
*
* \return List of free vars, in the PostDFS order in the expression.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like it should be "List of all vars", not only free vars?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That it is; I'll correct it. Thanks.

Copy link
Member

@jroesch jroesch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are probably a few more changes we want to make in the future, but overalls good for now.

@tqchen
Copy link
Member

tqchen commented Jan 15, 2019

Copy link
Member

@junrushao junrushao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feel like it is pretty good for now

@tqchen tqchen merged commit e0a20ad into apache:master Jan 15, 2019
@tqchen
Copy link
Member

tqchen commented Jan 15, 2019

Thanks to @slyubomirsky @jroesch @junrushao1994 , this is now merged

@tqchen
Copy link
Member

tqchen commented Jan 15, 2019

This PR is reverted due to some additional problems revealed by with the recent added AD, please send in another PR and fix the issues.

wweic pushed a commit to neo-ai/tvm that referenced this pull request Feb 20, 2019
wweic pushed a commit to neo-ai/tvm that referenced this pull request Feb 20, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants