Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for syntactic simplifications #4579

Open
PetarMax opened this issue Aug 8, 2024 · 21 comments
Open

Support for syntactic simplifications #4579

PetarMax opened this issue Aug 8, 2024 · 21 comments
Assignees

Comments

@PetarMax
Copy link
Contributor

PetarMax commented Aug 8, 2024

To improve the simplification mechanism of the backend, we require the addition of a syntactic attribute in simplifications, which takes a list of positive integers. These integers would then indicate to the backend that the corresponding clauses in the requires should be checked only syntactically. Two examples of this would be:

rule X <=Int Y => X <Int Y
  requires notBool (X ==Int Y)
  [simplification, syntactic(1)]

and

rule X <=Int Y => true 
  requires X <=Int Z
   andBool Z <=Int Y
  [simplification, concrete(Y, Z), syntactic(1)]

Note that, in the second example, there is a symbolic variable Z introduced in the first clause that is not present in the LHS. This appears to be supported by the front-end if the corresponding module is marked as [symbolic]. Optionally, we would like such variables to be supported also if the clause in they first appear is marked as syntactic.

@Scott-Guest
Copy link
Contributor

@PetarMax Not familiar here - what explicitly counts as a clause / how is the numbering of clauses precisely defined?

@PetarMax
Copy link
Contributor Author

PetarMax commented Aug 8, 2024

I am thinking of simplifications as follows:

rule LHS => RHS
  requires C1 andBool C2 andBool ... andBool CN
  [simplification]

where the Ci are the clauses, separated by andBool, and numbered in order in which they appear in the simplification.

@geo2a
Copy link
Contributor

geo2a commented Aug 14, 2024

I've implemented a prototype of this feature in Booster (runtimeverification/haskell-backend#4022). I've manually modified a definition.kore file and added syntactic{}("1") to the attributes of a simplification.

I have one question for @Scott-Guest: do you know if the K compiler preserves the requires clause as-is when compiling surface K to kore? This is important, as we are relying on the ordering of conjuncts.

And another question for @PetarMax: what behavior you expecting when there are more than one syntactic clause? I.e. syntactic(1, 2). I'd expect we process them from left to right, i.e. if the first one binds all variables from the second, than the second will have no effect and the result will be equivalent to syntactic(1). Does that seem reasonable?

@PetarMax
Copy link
Contributor Author

Left to right, yes please.

@Scott-Guest
Copy link
Contributor

Scott-Guest commented Aug 14, 2024

@geo2a: do you know if the K compiler preserves the requires clause as-is when compiling surface K to kore?

Hmm, that's a good point - regardless of the current situation, it's very fragile to rely on requires clauses never being manipulated, and I think we should consider an alternative syntax which doesn't require this assumption.

At a minimum, we already do constant folding which could collapse the number of clauses.

I'm not that familiar with the symbolic backends, so I don't know if this is feasible, but what about adding a special symbol? Something like

syntax Bool ::= syntactic(Bool) [symbol(syntactic)]

to mark

rule X <=Int Y => true 
  requires syntactic(X <=Int Z)
   andBool Z <=Int Y
  [simplification, concrete(Y, Z)]

@PetarMax thoughts?

@PetarMax
Copy link
Contributor Author

PetarMax commented Aug 14, 2024

As somebody who writes rules, I would expect to be able to refer to the specific clauses of the rule, in the specific order in which I write them. Before you start processing the clauses, you should be able to tag them and then process them tagged. The rules marked for syntactic matching will not disappear if written correctly by the user, and if written incorrectly... 🤷‍♂️ . For example, starting from:

rule X <=Int Y => true 
  requires X <=Int Z andBool T <=Int W andBool W <=Int T
  [simplification, concrete(Y, Z), syntactic(1, 3)]

you should be able to tag the first and third clause with an attribute of some kind indicating ordering in syntactic matching, say

{ X <=Int Z; syntactic: 1 }
{ T <=Int W }
{ W <=Int T; syntactic: 2 } <--- note that here is a 2, indicating that this clause is the second to be matched

and then work with them as such, permute, transmute, whatever you like. The backend would not have to receive them in order or think about order. It's not clear to me how your syntax would preserve the ordering in which the syntactic matching is to happen - perhaps we can work on that?

@Scott-Guest
Copy link
Contributor

Scott-Guest commented Aug 14, 2024

@PetarMax

The rules marked for syntactic matching will not disappear if written correctly by the user, and if written incorrectly... 🤷‍♂️

My concern is that, as this attribute was originally proposed, "correctly" here means "in a way which won't trigger certain compiler optimizations / manipulations". Just on principle, we shouldn't be exposing compiler implementation details to the user - it can be finicky to reason about even in the relatively simple case of constant folding, and prone to breakage if the user does refactoring or if compiler internals change.

Before you start processing the clauses, you should be able to tag them and then process them tagged.

I definitely agree. The finickiness I mention above is solved by recording the information directly onto the expression itself, rather than non-locally as external meta-data that needs to be kept in-sync if the compiler manipulates the expression.

However, KORE only supports attributes on sentences / declarations, not on sub-terms. Regardless of the difficulty of adding tagging infrastructure for sub-terms to the front-end (which AFAICT would require auditing the whole frontend pipeline), at the end of the day, our only option is to either change KORE or emit an actual symbol similar to what I proposed.

We then have a few options:

  1. The user directly writes this tagging symbol in the source code
  2. The user writes a syntactic(_) attribute on the rule which refers to clauses in that rule. The compiler internally places tagging symbols on the referred-to terms, then we either
    a) Use these tagging symbols to reconstruct the syntactic(_) attribute before re-emitting to KORE
    b) Just have the backends directly consume these tagging symbols

2b) seems preferable over 2a). Otherwise, the backend needlessly has to care about ordering / do work to reconstruct info we already had in the frontend.

I'm less certain about 1) versus 2b), and would be interested in your thoughts.

It's not clear to me how your syntax would preserve the ordering in which the syntactic matching is to happen - perhaps we can work on that?

Ah I missed this part. We could update the syntax to also include an Int argument indicating the ordering (or priority) for syntactic matching:

syntax Bool ::= syntactic(Bool, Int) [symbol(syntactic)]

Alternatively, we could even more broadly add syntax to name a sub-term like

syntax Bool ::= named(Bool, String) [symbol(named)]

where relevant attributes can then refer to names

rule X <=Int Y => true 
  requires named(X <=Int Z, foo) andBool T <=Int W andBool named(W <=Int T, bar)
  [simplification, concrete(Y, Z), syntactic(foo, bar)]

Do you agree with my assessment? What're your thoughts on 1) vs 2b)?

@PetarMax
Copy link
Contributor Author

Yes, I agree. I would be happy with syntactic(Bool, Int), it's quite flexible. The String version, I think, loses the ordering again, unless it'd be lexicographic?

@Scott-Guest
Copy link
Contributor

Scott-Guest commented Aug 14, 2024

The idea with the String version is that

  • named(Bool, String) just provides a name to a sub-term. Any attribute could then use this name to refer to that sub-term, and this isn't specific to anything about syntactic(_).
  • syntactic(_) would take a list of names for sub-terms. Just like with syntactic(1, 2), the order in the attribute specifies the ordering between the clauses - the only difference is we refer to those terms by name rather than number.

@PetarMax
Copy link
Contributor Author

Yes, ok, understood. Do we have use cases other than syntactic? Could other current features be re-expressed like this? If not, then I'm more inclined to just go for syntactic.

@Scott-Guest
Copy link
Contributor

Looking through the existing attributes, it doesn't appear broadly applicable - let's go with syntactic(Bool, Int) then.

@PetarMax
Copy link
Contributor Author

Yes please, thank you! While we-re doing this, could we please also add another simplification attribute, no-smt?

rule LHS => RHS requires ... ensures ... [simplification, no-smt]

Its meaning in the backnd would be that during its application we should not use the SMT solver.

@geo2a
Copy link
Contributor

geo2a commented Aug 15, 2024

Thanks for the in-depth analysis @Scott-Guest and @PetarMax!

We've discussed the proposal in the Haskell backend daily meeting with @jberthold and @goodlyrottenapple.

Our consensus is that using inner syntax to drive a backend feature does not align well with K's design.
With making a symbol like named or syntactic special, we are blocking semantic writers from using this symbol for their own purposes. I'd argue that such special casing is unacceptable for a semantics framework.

Instead, the information regarding special treatment of parts of inner syntax should flow to the backend via attributes. This is already done for many cases such as the symbol attribute, the concrete and symolic attributes etc. The marking of clauses as syntactic should be no different.

What happens inside the compiler is of course another story: we can have any sort of special-casing there, and we likely already do. With that in mind, one of the approaches @Scott-Guest suggests:

The user writes a syntactic() attribute on the rule which refers to clauses in that rule. The compiler internally places tagging symbols on the referred-to terms, then we either
a) Use these tagging symbols to reconstruct the syntactic(
) attribute before re-emitting to KORE

sounds suitable.

@Scott-Guest what do you think? I'm happy to set-up a meeting to discuss this in more depth.

@Scott-Guest
Copy link
Contributor

Scott-Guest commented Aug 16, 2024

@geo2a I don't really agree, but I'm happy to concede to the consensus.

I'll open an initial PR adding syntactic to the attribute whitelist + easing the restriction on unbound variables so you can use it immediately. Just be careful to check the output KORE and confirm the requires clause doesn't get changed by the compiler.

I'll add the tagging under the hood to make it more robust in a follow-up PR.

@Scott-Guest Scott-Guest self-assigned this Aug 16, 2024
@geo2a
Copy link
Contributor

geo2a commented Aug 19, 2024

Thank you @Scott-Guest!

easing the restriction on unbound variables

I don't think there's anything to do here, as rules like this:

rule X <=Int Y => true 
  requires X <=Int Z
   andBool Z <=Int Y
  [simplification]

are currently accepted and compiled as expected by kompile --backend haskell.

@PetarMax
Copy link
Contributor Author

Are they accepted even if the module that contains them is not marked as [symbolic]? I think that simplificatios are, in principle, also given to the LLVM backend, unless the module that contains them is marked like that.

@Scott-Guest
Copy link
Contributor

Scott-Guest commented Aug 21, 2024

I have the PR just adding syntactic to the whitelist (#4597), but need clarity regarding what we want to do for unbound variables.

Presumably, the LLVM backend can't handle these syntactic simplifications, so wouldn't we want to only allow this in symbolic modules as is already the case? Why would we drop that restriction?

@PetarMax
Copy link
Contributor Author

Yes, you are right, they belong only in [symbolic], and that already works. It would appear that we don't need to do anything for unbound variables, right, @geo2a?

@geo2a
Copy link
Contributor

geo2a commented Aug 21, 2024

No, I do not think we need anything.

rv-jenkins pushed a commit that referenced this issue Aug 22, 2024
Part of #4579

Adds the `syntactic` attribute to the whitelist, and checks that it is
only applied to simplification rules.

Notably, we still need to:
- Check that the arguments actually refer to existing clauses
- Tag the underlying clauses and re-construct the attribute at the end
of the compilation pipeline because the `requires` clauses may be
manipulated during compilation
@palinatolmach
Copy link
Contributor

@geo2a @PetarMax @Scott-Guest can we close this issue now that #4597 has been merged, or is there anything else to it?

@Scott-Guest
Copy link
Contributor

@palinatolmach In the frontend, we still need to add tagging to track if the compiler modifies the term and update the attribute accordingly. I likely won't have time to get to this for awhile, so we can move this to the backlog.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants