Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lexer: Object literal with inconsistent indentation parsed as multiple objects #2998

Closed
loveencounterflow opened this issue May 24, 2013 · 19 comments
Labels

Comments

@loveencounterflow
Copy link

today i wrote

known_colors =
   45:              'cyan'
   52:              'sepia'
  100:              'olive'
  124:              'red'

in an effort to neatly align those figures. of course, this is isn't syntactically valid, as it doesn't keep indentation... BUT it didn't generate an error, either, so my script silently failed. so i compiled it, and this is what came out:

  known_colors = {
    45: 'cyan',
    52: 'sepia'
  };

  ({
    100: 'olive',
    124: 'red'
  });

could the result be meaningful in any real-life situation? it seems rather meaningless to me. i think the above code should throw a syntax error.

@vendethiel
Copy link
Collaborator

Bare objects are allowed in JS too (although as expression), may be completion value, or may be the result of a temporary comment.
The real problem is Coffee's tolerance to whitespace, but then again, I don't think we're gonna change that now..

@connec
Copy link
Collaborator

connec commented May 24, 2013

As a side-note, you can achieve the same result by using 045 and 052.

I lied, CoffeeScript tries to parse it as an un-prefixed octal (which seems kind of weird... surely if there is no prefix (0oXX) then it is not an octal...).

@loveencounterflow
Copy link
Author

not really, if you need numerical keys converted to strings to correspond to integer numbers... but it's not about numbers here, it's about inadvertently adding or missing that single space which leads to a drastically different construct.

curiously, when you erase that top assignment line, the resulting javascript will only retain the top part of the object literal, and the bottom part will be swallowed silently. what on earth can justify this behavior?

@connec edit: i'd almost said so much but then i thought nah CS will put those into quotes... but no.–those leading zeroes are outlawed because they used to be used for decades, but are now widely regarded bad practice.

@jashkenas
Copy link
Owner

That is pretty damn terrible.

@jashkenas jashkenas reopened this May 24, 2013
@vendethiel
Copy link
Collaborator

But should be resolved with #2985

@epidemian
Copy link
Contributor

what on earth can justify this behavior?

I think that behaviour is the same one reported in #2981. Pretty bad indeed.

@loveencounterflow
Copy link
Author

over @ #2981 someone says that initial indentation is ok as long as it's consistent... well, many years spent with Python's somewhat complex indentation rules tell me it's all about simplicity. i would be fine if CS would only allow files that start w/ zero indentation, all indentation must be done using two spaces, tabs are disallowed, you can only go one step deeper than the previous line, and only where explicitly licensed by the construct on that previous line... something like that. in Python they tried to magically make things work for a lot of different typing habits but in the end they wrote too many tools to fix too many issues with too many indentation styles. plus early on someone had to implement the rule that one tab equals eight spaces, coz that's what they used to get on teletypes–meaning that whitespace rules are now fundamentally broken for those 90% who set one tab to sth less than eight spaces. it's not worth it.

@goto-bus-stop
Copy link

I believe there've been tons of issues on coffeescript's (overly) permissive indentation handling. Perhaps it's time to force balanced in/outdents? (as in Coco and Livescript?)

$ xclip -o | coco -bc
SyntaxError: unmatched dedent (1 for 3) on line 4

@loveencounterflow
Copy link
Author

mind to elaborate on "balanced"? would that be even one rule stricter than the tightjacketted rules i outlined above?

@vendethiel
Copy link
Collaborator

@renekooi as said before, #2985 should attenuate the problem.
You can see an old discussion about that in #1275.

(as a tab user, I'm obviously strongly about disallowing them, even if I don't use coffee)

@loveencounterflow
Copy link
Author

@Nami-Doc each to their own, but let it be said that differing indentation styles, and especially allowing tabs and spaces in source, are a constant source of frustration when copying code / using editors that use different settings etc. fact is, you cannot have the language correct all that for you unless you require 'magic', so imho it seems better to dumb the language down to make sure it works well without that unhelpful complexity.

@goto-bus-stop
Copy link

mind to elaborate on "balanced"?

CoffeeScript currently does something like this:

if a
    b: 0 # INDENTed 4 spaces
    c: 1
  b() # OUTDENTed 2 spaces, but now top-level! Executed regardless of `a`

You could still do 2 spaces here, 7 spaces there, but things would actually have to be on the same level to be parsed as 'on the same level'

@Nami-Doc, yeah; #1275 was one of them. I don't think #2985 fixes that, though? That's only for outdents that go over top-level

@michaelficarra
Copy link
Collaborator

@renekooi: correct; this is an example of #1275.

@vendethiel
Copy link
Collaborator

@renekooi wont "fix", which is why I said "attenuate".

@loveencounterflow two types of indentation in one file is of course something that should be disallowed imho, as coco currently does.

@loveencounterflow
Copy link
Author

@Nami-Doc @renekooi exactly, two types of in/dedentation within a file should be disallowed; also, CS does not have any provision that would allow semantics of a 'double indentation' (i.e a\n→ba\n→→b) since there is no predefined unit of indentation (i call it 'the dent').

therefore, i believe the 'dent' should be defined by the first line in a file that has some printing material (maybe comments excluded) on it that are preceded by whitespace. (additionally i think the first line of code must not be indented at all). on the following lines, only indentations in units of the dent should be allowed.

with these 2 or 3 rules, @renekooi 's example would not parse–either the indentation of the object literal is too deep, or the indentation of b() is only half a dent, which is forbidden.

@xixixao
Copy link
Contributor

xixixao commented Jan 22, 2014

This is not an example of 1275 or 2981 (2432), 2985 was not merged and the other PR did not fix this. Let's reopen (nasty thing indeed).

@jashkenas jashkenas reopened this Jan 22, 2014
@GeoffreyBooth GeoffreyBooth changed the title bordercase with object literal / indentation Lexer: Object literal with inconsistent indentation parsed as multiple objects May 6, 2017
@GeoffreyBooth GeoffreyBooth added bug and removed change labels May 6, 2017
@GeoffreyBooth
Copy link
Collaborator

As of 2017-05-05 in CS2, the initial example still fails, in my opinion:

var known_colors;

known_colors = {
  45: 'cyan',
  52: 'sepia'
};

({
  100: 'olive',
  124: 'red'
});

At the very least we should throw an “inconsistent indentation” error. Compiling into two objects is worse than outputting invalid JavaScript. At least invalid JS throws an error in the runtime.

@GeoffreyBooth
Copy link
Collaborator

Related to #2667.

@GeoffreyBooth
Copy link
Collaborator

Closing per discussion in #4630. Unfortunately it appears there’s no solution for this that doesn’t cause unwanted side effects that may be breaking changes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

9 participants