Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tests for nqp::coerce_si #759

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from
Draft

Tests for nqp::coerce_si #759

wants to merge 1 commit into from

Conversation

nwc10
Copy link
Contributor

@nwc10 nwc10 commented Jan 29, 2022

coerce_si is not sufficiently tested. This draft PR asks what we should test (and hence specify)

Currently with MoarVM master we see this:

Iteration past end of grapheme iterator
   at -e:1  (<ephemeral file>:<mainline>)
 from gen/moar/stage2/NQPHLL.nqp:1949  (/home/nick/Perl/nqp/NQPHLL.moarvm:eval)
 from gen/moar/stage2/NQPHLL.nqp:2059  (/home/nick/Perl/nqp/NQPHLL.moarvm:)
 from gen/moar/stage2/NQPHLL.nqp:2114  (/home/nick/Perl/nqp/NQPHLL.moarvm:command_eval)
 from gen/moar/stage2/NQPHLL.nqp:2039  (/home/nick/Perl/nqp/NQPHLL.moarvm:command_line)
 from gen/moar/stage2/NQP.nqp:4111  (nqp.moarvm:MAIN)
 from gen/moar/stage2/NQP.nqp:1  (nqp.moarvm:<mainline>)
 from <unknown>:1  (nqp.moarvm:<main>)
 from <unknown>:1  (nqp.moarvm:<entry>)

which is a change in behaviour from all releases. This is caused by MoarVM/MoarVM@b80996f (and will soon be fixed)

However, it's unclear what the desired behaviour is for corner cases. The tests here all pass once MoarVM is fixed. However, they would fail on released nqp

MoarVM used to use strtoll, which returns 0 on parsing failures. However, the call MVM_string_ascii_encode_any would throw an exception for any character outside of 0-127, hence the tests here fail like this:

ok 205 - coerce_si(chr 127) is 0
Error encoding ASCII string: could not encode codepoint 128
   at /home/nick/test/109-coercions.t:106  (<ephemeral file>:<mainline>)
 from gen/moar/stage2/NQPHLL.nqp:1949  (/home/nick/Perl/nqp/NQPHLL.moarvm:eval)
 from gen/moar/stage2/NQPHLL.nqp:2154  (/home/nick/Perl/nqp/NQPHLL.moarvm:evalfiles)
 from gen/moar/stage2/NQPHLL.nqp:2114  (/home/nick/Perl/nqp/NQPHLL.moarvm:command_eval)
 from gen/moar/stage2/NQPHLL.nqp:2039  (/home/nick/Perl/nqp/NQPHLL.moarvm:command_line)
 from gen/moar/stage2/NQP.nqp:4111  (nqp.moarvm:MAIN)
 from gen/moar/stage2/NQP.nqp:1  (nqp.moarvm:<mainline>)
 from <unknown>:1  (nqp.moarvm:<main>)
 from <unknown>:1  (nqp.moarvm:<entry>)

JVM is much stricter, failing at the first new test:

ok 28 - int32 to str conversion
java.lang.NumberFormatException: For input string: "    "
  in <mainline> (/home/nick/test/109-coercions.t:109)
  in eval (gen/jvm/stage2/NQPHLL.nqp:1217)
  in evalfiles (gen/jvm/stage2/NQPHLL.nqp:1461)
  in command_eval (gen/jvm/stage2/NQPHLL.nqp:1351)
  in command_line (gen/jvm/stage2/NQPHLL.nqp:1310)
  in MAIN (gen/jvm/stage2/NQP.nqp:4191)
  in <mainline> (gen/jvm/stage2/NQP.nqp:4187)
  in <anon> (gen/jvm/stage2/NQP.nqp)

@MasterDuke17 reports that the relevant conversion API we use is very exacting:

The characters in the string must all be decimal digits except that the first character may be an ASCII minus sign '-' (\u002D') to indicate a negative value or an ASCII plus sign '+' ('\u002B') to indicate a positive value.

Hence any spaces are right out:

$ ./nqp-j -e 'nqp::say(nqp::coerce_si(" 42"));'
java.lang.NumberFormatException: For input string: " 42"
  in <mainline> (-e:1)
  in eval (gen/jvm/stage2/NQPHLL.nqp:1217)
  in <anon> (gen/jvm/stage2/NQPHLL.nqp:1329)
  in command_eval (gen/jvm/stage2/NQPHLL.nqp:1326)
  in command_line (gen/jvm/stage2/NQPHLL.nqp:1310)
  in MAIN (gen/jvm/stage2/NQP.nqp:4191)
  in <mainline> (gen/jvm/stage2/NQP.nqp:4187)
  in <anon> (gen/jvm/stage2/NQP.nqp)

Hence the question is - what should the spec be?

I'm happy to recode either MoarVM or the Java code (or both) to be consistent, but I'd like to have a more tightly defined spec of how much slop we're allowed in the number, and for which cases we throw an exception, which cases we return 0, which leading characters we can ignore, and which cases we just ignore "trailing garbage"

@BloomingAzaleas
Copy link

"Hence the question is - what should the spec be?

I'm happy to recode either MoarVM or the Java code (or both) to be consistent, but I'd like to have a more tightly defined spec of how much slop we're allowed in the number, and for which cases we throw an exception, which cases we return 0, which leading characters we can ignore, and which cases we just ignore 'trailing garbage'"

I suggest a way to think about this should be in terms of layers based on the observation that the lower the layer, the more difficult and hidden it is to diagnose the sloppiness of a spec and the more widespread and opaque the symptoms. The lowest layer here should have the tightest spec, but part of that spec should be a statement or contract expectation on the next layer(s) up the expected usage call stack. I have only a vague grasp of the stack here but it seems to be approximately raku on nqp on moar/jvm, the latter being the bytecode engines at the lowest level. In this case of string to numeric conversion, likely a hot code path for almost everything, I suggest the bytecode engines should only ever have to deal with 'well-formed' string input with the goal of absolutely minimizing code complexity (meaning detecting and dealing with featuritus cases) in the bytecode engines. NQP (or higher?), the next level up in abstraction, would then offer coerce_possibly_sloppy_numeric_strings_xx () using code at a higher level of abstration (and thus presumably more transparent and diagnosable) to deal with any featuritus complexity you might want to offer.

In this model, you deliberately consider complexity elements and trade-off allocate them to appropriate layers in the stack. Your complete solution would then be (possibly) mods to the bytecode engines paired with (definitely) contract mods to NQP.

Regards.

@coke coke changed the base branch from master to main April 19, 2023 13:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants