Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parser considers 0 in "$foo[0]" to be a string instead of an int #325

Closed
jesseschalken opened this issue Dec 7, 2016 · 8 comments
Closed

Comments

@jesseschalken
Copy link
Contributor

$ ./bin/php-parse '<?php "$foo[0]";'
====> Code <?php "$foo[0]";
==> Node dump:
array(
    0: Scalar_Encapsed(
        parts: array(
            0: Expr_ArrayDimFetch(
                var: Expr_Variable(
                    name: foo
                )
                dim: Scalar_String(
                    value: 0
                )
            )
        )
    )
)

I expected Scalar_LNumber instead of Scalar_String.

PHP apparently considers it an integer (but not if it's not intish):

<?php

class Foo implements ArrayAccess {
  public function offsetGet($x) {
    var_dump($x);
  }
  public function offsetExists($x) {
    var_dump($x);
  }
  public function offsetSet($x, $y) {
    var_dump($x);
  }
  public function offsetUnset($x) {
    var_dump($x);
  }
}

$foo = new Foo();

echo "$foo[0]";
echo "$foo[1]";
echo "$foo[00]";
int(0)
int(1)
string(2) "00"

https://3v4l.org/qU9E9

@martinsik
Copy link

Isn't this left like this intentionally?

See comment in https://github.com/nikic/PHP-Parser/blob/master/grammar/php5.y#L647

@nikic
Copy link
Owner

nikic commented Dec 7, 2016

Testing using ArrayAccess doesn't really tell you how PHP "originally" interpreted it, because constant integer string indexes are normalized to integer indexes during compilation, see for example https://3v4l.org/PFZWp (this uses three strings, and yet two of them dump as integers).

However, I do agree that we should be representing this as integers, consistent with how the php ast represents this: http://lxr.php.net/xref/PHP-MASTER/Zend/zend_language_scanner.l#1719

@nikic
Copy link
Owner

nikic commented Dec 7, 2016

@martinsik "Alternative syntax" refers to the $foo{$bar} array access syntax, which was not supported in some places in PHP 5. That's what the comment is about.

nikic added a commit that referenced this issue Dec 7, 2016
@nikic
Copy link
Owner

nikic commented Dec 7, 2016

Implemented in 70319e2.

@nikic nikic closed this as completed Dec 7, 2016
jesseschalken added a commit to jesseschalken/normalize-php that referenced this issue Dec 8, 2016
@jesseschalken
Copy link
Contributor Author

@nikic Gah, I thought automatic intish string => int conversion was only a thing that happened with arrays, not all ArrayAccess. That's annoying.

Thanks for the prompt fix, it works now. :)

@nikic
Copy link
Owner

nikic commented Dec 11, 2016

I'm having second thoughts about this change due to c1e0bab. PHP 7.1 adds support for "$a[-1]". Interpreting the offset as a number (if possible) causes some unpleasantness in this case.

Particularly concerning is that the offset might be PHP_INT_MIN. We will interpret it as such in this situation, because the offset is interpreted under symtable numeric string rules (as of php/php-src@2c70581), however outside of this context this would instead be interpreted as -(PHP_INT_MAX + 1), which is a double. Because the pretty printer normalizes from "$a[-1]" to "{$a[-1]}", the result is no longer semantically equivalent in this case.

This would require special handling at some point. If an LNumber has value PHP_INT_MIN I might emit -PHP_INT_MAX - 1 in the pretty printer. I guess this would make sense even independently of this issue, because someone could create a PHP_INT_MIN LNumber manually (even if the parser, prior to this change, would never create one by itself).

@jesseschalken
Copy link
Contributor Author

@nikic That makes sense to me. I would expect pretty printing new LNumber(\PHP_INT_MIN) to yield an expression that evaluates to an integer, even if that means emitting -9223372036854775807 - 1. Unfortunately that means the syntax tree doesn't precisely round-trip through the pretty printer, but that's a better problem to have if the code is still semantically equivalent.

@nikic
Copy link
Owner

nikic commented Dec 11, 2016

I've fixed the pretty-printing issue (along with a couple other things) in 58970e2. Still not sure whether the string->number conversion here is worthwhile.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants