Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Invalid UTF #2620

Closed
sorawee opened this issue Aug 22, 2022 · 2 comments
Closed

Invalid UTF #2620

sorawee opened this issue Aug 22, 2022 · 2 comments
Labels
part: verifier Translation from Dafny to Boogie (translator)

Comments

@sorawee
Copy link
Contributor

sorawee commented Aug 22, 2022

From https://mnaoumov.wordpress.com/2014/06/14/stripping-invalid-characters-from-utf-16-strings/:

"\ud800b" is invalid because "here \ud800 is lead surrogate but it is followed by b letter which is not a low surrogate."

But

method Main() {
  print "\ud800b";
}

works fine, and outputs �b

Ideally, this should result in a verification error.

Reported by @jswrenn

@prvshah51 prvshah51 added kind: bug Crashes, unsoundness, incorrect output, etc. If possible, add a `part:` label part: verifier Translation from Dafny to Boogie (translator) labels Aug 23, 2022
@cpitclaudel
Copy link
Member

Dafny strings are not valid unicode strings, currently, so I would say "not a bug"; we are in the process of revamping those strings in dafny-lang/rfcs#13

@prvshah51 prvshah51 removed the kind: bug Crashes, unsoundness, incorrect output, etc. If possible, add a `part:` label label Aug 23, 2022
@robin-aws
Copy link
Member

Hi Sorawee! The new --unicode-char option, which will be on by default in Dafny 4.0, mostly addresses this gap. When it is enabled, the \uXXXX style of unicode escapes that identifies UTF-16 code points is no longer accepted, in favor of the \U{X...X} style that identifies Unicode scalar values instead. That means it's no longer possible to specify invalid strings like the one in the example. See https://dafny.org/latest/DafnyRef/DafnyRef#sec-characters for further details.

Even better, Dafny supports arbitrary Unicode characters directly in string literals now, so you no longer have to use Unicode escapes for such characters.

Closing this as I believe we've done everything we're planning to do in this area, but feel free to reopen!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
part: verifier Translation from Dafny to Boogie (translator)
Projects
None yet
Development

No branches or pull requests

4 participants