Better support for YAML-like syntax #645

Closed
ben-x9 opened this issue Aug 25, 2010 · 34 comments

Comments

@ben-x9

ben-x9 commented Aug 25, 2010

I love the new cleaner syntax made possible by the implicit object-literals in 0.9

Would it be possible to go a step further toward YAML and support implicit arrays, so I can do this:

states =
  step_1:
    '#attach_link'
  step_2: 
    '#attach_form'
    '#file_select'
  step_3: 
    '#attach_form'
    '#file_submit'

Instead of this:

states =
  step_1:
    '#attach_link'
  step_2: [
    '#attach_form'
    '#file_select' ]
  step_3: [
    '#attach_form'
    '#file_submit' ]
@jashkenas
Owner

Not without changing the current semantics, because of the single-element array pathological case:

one = 
  1

Right now that compiles to: one = 1. With your proposed syntax, it would also be ambiguous with one = [1].

The reason implicit objects don't have this problem is that their key/value pairs make it unambiguous what's going on. Closing the ticket.

@ben-x9
Author

ben-x9 commented Aug 28, 2010

Thanks for taking the time to reply. The idea would have been that single elements are assigned as-is, and only multiple elements are assigned as an array.

Is that too far-fetched an idea? It does create a small complication when reading the resulting data; we would have to perform a check to determine whether the data is a single value or an array of values, handling the single value case directly and the array of values through a loop.
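
The check described above could look roughly like this in plain JavaScript. This is only a sketch: `toList` is a hypothetical helper, and the `states` object is hand-written to mimic what the proposed syntax would produce.

```javascript
// Hypothetical helper: treat a lone value as a one-element list.
function toList(value) {
  return Array.isArray(value) ? value : [value];
}

// Hand-written data in the shape the proposal would yield:
// a single entry stays bare, multiple entries become an array.
const states = {
  step_1: '#attach_link',
  step_2: ['#attach_form', '#file_select'],
};

// With the helper, the reader no longer needs two code paths.
for (const step of Object.keys(states)) {
  for (const selector of toList(states[step])) {
    console.log(step, selector);
  }
}
```

Every consumer of the data would need a wrapper of this kind, which is the complication the comment refers to.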

Another idea would be to implement the actual YAML syntax for arrays, explicitly marking array items with a "-":

states =
  step_1:
    - '#attach_link'
  step_2: 
    - '#attach_form'
    - '#file_select' 
  step_3: 
    - '#attach_form'
    - '#file_submit' 

This would make it easier to process the data, being able to choose whether to set the single case as an array of one, or as a directly assigned value (by omitting the dash).

I know the square bracket solution works fine, it just feels a little funny in this case, so I thought I'd put the idea of a syntax change out there.

@jashkenas
Owner

The dash syntax would work, except for (of course) negative numbers:

negatives =
  - 1
  - 2
  - 3

@ben-x9
Author

ben-x9 commented Aug 28, 2010

Interesting, I wonder how YAML handles this case.

I guess if there is a space between the dash and the number then it's a list item, otherwise it's a negative number.

Unfortunately JavaScript permits superfluous whitespace between the minus operator and the number.
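
A quick JavaScript check of that claim (the variable names are just for illustration):

```javascript
// Unary minus tolerates whitespace before its operand,
// so "- 1" and "-1" are the same expression.
const spaced = - 1;
const tight = -1;
console.log(spaced === tight); // true
```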

An asterisk is another commonly recognized symbol for marking list items.

@jashkenas
Owner

Nicely done, I think that asterisks, since they cannot be unary, might just do the trick. Reopening the ticket. To recap:

states =
  step_1:
    * '#attach_link'
  step_2: 
    * '#attach_form'
    * '#file_select' 
  step_3: 
    * '#attach_form'
    * '#file_submit' 

Edit: now you get to the real pros and cons. In a long list of objects, is having to type out 20 asterisks and 20 spaces really better than typing a single pair of brackets?

@ben-x9
Author

ben-x9 commented Aug 28, 2010

I would argue that a combination of nested object-literals and short lists (say, from 2 to 5 elements) is probably the most common use case when structured data is being entered by hand like this. Or, if not, surely at least a very common use case worthy of its own syntax.

Brackets would still be available for cases where longer lists are required. I imagine these cases would be edge cases. And the asterisks really aren't that much of a pain, no more so than the commas that people type for JavaScript arrays -- if long lists were the exception rather than the rule, I'd probably still use asterisks for the sake of readability. Even with long lists, if the list is being appended to one-by-one periodically over a span of time (this commonly happens to me), I find that continually having to move the closing square bracket is more of a pain than adding the extra comma (which would be an asterisk in this case).

@satyr
Collaborator

satyr commented Aug 28, 2010

How about allowing implicit array literals instead?

f k: v, v, v  # f({k: [v, v, v]})
x, y = y, x   # swap

@ben-x9
Author

ben-x9 commented Aug 28, 2010

Not a bad idea, except of course that your first example would be:
f k: v, v, v #f({k: v}, v, v)

@satyr
Collaborator

satyr commented Aug 28, 2010

f k: v, v, v #f({k: v}, v, v)

Yet the current compilation is:

f({
  k: v,
  v: v,
  v: v
});

Adjusting the object literal grammar is a requirement for this. (See #618.)

@rkh

rkh commented Aug 29, 2010

+1 for implicit array literals. Swapping is just a lot sexier.

f k: v, v, v #f({k: v}, v, v)

That makes more sense with the current semantics, I guess, since f v, v, v compiles to f(v, v, v), not f([v, v, v]). Otherwise this would be confusing: f k: v, k: v.

@jashkenas
Owner

I'm going to close this one as a wontfix. There are two proposals in here now, the first:

list = 
  * one
  * two
  * three

Isn't so great because it makes for more typing than a regular array literal at large sizes, not less, and can't be used on a single line, unlike regular array literals ... finally, there's the introduction of yet another meaning for a fairly normal symbol... Instead:

list = [
  one
  two
  three
]

The second proposal is to treat one, two, three as an implicit array. This doesn't work because it gets mixed up with argument and parameter lists, along with shorthand objects... call one, two, three -- is that an array getting passed to a function, or three arguments?

@ben-x9
Author

ben-x9 commented Aug 31, 2010

Yes, you're probably right. This proposal really boils down to a mere style preference, without adding any meaningful gains. Thanks for the consideration.

@ben-x9
Author

ben-x9 commented Aug 31, 2010

There still could be a case for implicit arrays, if they were given a 3rd order preference, behind shorthand objects and argument lists.

There is already a clash between shorthand objects and arguments lists, but shorthand objects were introduced nonetheless. Although it's potentially confusing (see my mistake 5 comments above), I'm sure the result is worth it, and that the right order of preference was made (shorthand object-literals take precedence over argument lists, which means regular arguments need to go before shorthand objects -- in effect this mimics the behavior of Python's keyword arguments.)

In the same way, where one, two, three occurs outside the context of an argument list or shorthand object-literal, it might be valuable to read it as an implicit array, rather than throwing a compiler error.

I think most of the time, the meaning will be clear. Where it is not clear, optional square brackets can be used. Consider nested function calls. Most of the time we can leave the parentheses off. However, sometimes, when it is not clear which function an argument belongs to, we need to add the parentheses:

#Parentheses not required
myFnA argA, myFnB argB #myFnA(argA, myFnB(argB))

#Parentheses not required (but might be a good idea to add them for clarity)
myFnA argA, myFnB argB1, argB2 #myFnA(argA, myFnB(argB1, argB2))

#Parentheses required
myFnA argA1, myFnB(argB), argA2  #myFnA(argA1, myFnB(argB), argA2)

This is not really so different from the concept of leaving off the square brackets most of the time, where the meaning is clear, but adding them in the context of function calls and implicit object-literals.

I don't know if this is ultimately a good idea or not, but it should be worth at least considering.

@ben-x9
Author

ben-x9 commented Aug 31, 2010

Also, correct me if I'm wrong, but isn't it true that there is rarely any reason to manually construct an array of arguments when making a function call?

"call one, two, three -- is that an array getting passed to a function, or three arguments?"

I just don't see anyone ever wanting to do this: call [one, two, three]

People are simply going to pass arguments as a list of arguments, never an array.

Where the number of arguments is undefined, we use splats. Where we've cooked up an array somewhere else that we want to pass as a list of arguments, we use fn.apply(array)
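
In JavaScript terms, the two cases look roughly like this. Note that `apply` takes the `this` value as its first argument, so the array goes second. `sum` and `values` are hypothetical names used only for illustration, and the rest parameter stands in for a CoffeeScript splat.

```javascript
// Hypothetical variadic function: the rest parameter collects
// however many arguments the caller passes.
function sum(...numbers) {
  return numbers.reduce((total, n) => total + n, 0);
}

// Arguments written out as a plain list.
console.log(sum(1, 2, 3)); // 6

// An array built elsewhere, turned into an argument list via apply.
const values = [1, 2, 3];
console.log(sum.apply(null, values)); // 6
```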

So I think the idea of a clash between argument lists and implicit arrays is a non-issue. (And obviously the parameter lists in function definitions are also very clear -- we simply don't use arrays here.)

So the only clash is in the case of implicit object-literals. Here we should use the square brackets. Anywhere else implicit arrays should work.

EDIT: In some cases people will use an array as one of the arguments in a list of arguments, but in that case it's common sense to apply square brackets to separate the array elements from the other arguments. I can't see anyone getting confused over a call to myFunc = (myVar1, myArray, myVar2)->. It's obvious they are going to use square brackets to denote the array elements, otherwise the argument list would just look silly.

And equally, I can't see a reason for anyone to create a function that only takes a single array as an argument, without any other parameters. If you're going to do that, use a splat.

@jashkenas
Owner

Very nicely written up -- thanks for arguing the point. Re-opening the ticket. I'll see if I can find a way to fit it in to the grammar...

@jashkenas
Owner

Because of the extreme ambiguity between these proposed implicit arrays and nearly every other comma-separated language feature, I don't think I'm going to be able to fit 'em in to the grammar. Open season for anyone who wants to try their hand, but closing the ticket again in the meantime...

@pschyska

pschyska commented Dec 2, 2010

I can only second that I really like this syntax:

list = 
  * one
  * two
  * three

In this particular case it's 1 (or 4) characters more to type, but I find it a lot easier on the eyes nonetheless. Also, I'm trying to write a little Ext JS app, which has lots and lots of nested object literals, which have array properties (for example, the "items" property listing all child components of a container), which consist of objects, and so on. When you are in code like that, nested 8 levels deep, all the opening and closing brackets and braces really make your eyes hurt. It's also easy to make mistakes there, and I would love the * form. If you don't like it you can still use the common form :-)

@pschyska

pschyska commented Dec 3, 2010

I've played a bit with the rewriter and found that it should be really easy to parse. Because of the MATH tokens, the items of the array are already grouped into objects. One would need to remove the MATH tokens and wrap the expression in [].
I succeeded in rewriting the following
foo:
  * a: 1
  * b:
    * ba: 2
    * bb: 3
into
foo: [
{a: 1]
{b: [
{ba: 2},
{bb: 3]
]}
]

I did not succeed, however, in rewriting it when it's an assignment:
foo=
  * a: 1

If you decide this syntax would be acceptable, I'd share my code so someone could have a look at it. I've never really done this before, so I think it should be a quick and painless thing to do for someone who has worked with the rewriter before.

Thanks,

Paul

@TrevorBurnham
Collaborator

Wait, do you realize that the output you got from your rewriter doesn't make sense? It starts with foo: [{a: 1]... I believe the output you really want is

foo: [
  {a: 1},
  {b: [
     {ba: 2},
     {bb: 3}
  ]}
]

omitting the opening and closing curly braces, of course.

I'd like to support this feature if a fully operational patch were submitted. You can think of

list
  * 'a'
  * 'b'
  * 'c'

as shorthand for

list
  0: 'a'
  1: 'b'
  2: 'c'

except, of course, that list is then declared as an array and can't contain any entries that aren't prefixed with * (that needs to be a syntax error).
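
The analogy can be seen in plain JavaScript, where array elements are properties keyed by index. A small sketch (the variable names are illustrative):

```javascript
const asArray = ['a', 'b', 'c'];
const asObject = { 0: 'a', 1: 'b', 2: 'c' };

// Both answer the same indexed lookups...
console.log(asArray[1] === asObject[1]); // true

// ...but only the first is an Array, with a length and array methods.
console.log(Array.isArray(asArray), Array.isArray(asObject)); // true false
console.log(asArray.length, asObject.length); // 3 undefined
```

That difference is why the starred form would have to declare an array rather than simply desugar to numbered keys.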

@pschyska

pschyska commented Dec 5, 2010

Yep, I wrote that off the top of my head.
Here is what really happens:
one=
  root:
    * a: 1
    * b:
      * ba: 21
      * bb: 22

gets compiled to
(function() {
  var one;
  one = {
    root: [
      {
        a: 1
      }, {
        b: [
          {
            ba: 21
          }, {
            bb: 22
          }
        ]
      }
    ]
  };
}).call(this);

which is OK, I guess?

I have some questions regarding the nesting level of the expressions. Is it somehow readable from the raw token stream? I don't get what the third element in the tokens means; is it some kind of nesting information? Is it safe to count INDENT and OUTDENT tokens to determine the nesting level? The reason is that I want to preprocess (wrap in [] and remove MATH tokens) only a and b in the above example in the first step.
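
The counting idea can be sketched on a toy token stream. Everything here is an assumption for illustration: the tokens are hand-written `[tag, value]` pairs, `nestingLevels` is a hypothetical helper rather than part of the CoffeeScript rewriter, and the sketch assumes exactly one INDENT or OUTDENT token per level. Real tokens may carry more fields, which the sketch ignores.

```javascript
// Hypothetical token stream for the example above (bb omitted for brevity).
const tokens = [
  ['IDENTIFIER', 'root'], [':', ':'],
  ['INDENT', 2],
  ['MATH', '*'], ['IDENTIFIER', 'a'], [':', ':'], ['NUMBER', '1'],
  ['TERMINATOR', '\n'],
  ['MATH', '*'], ['IDENTIFIER', 'b'], [':', ':'],
  ['INDENT', 2],
  ['MATH', '*'], ['IDENTIFIER', 'ba'], [':', ':'], ['NUMBER', '21'],
  ['OUTDENT', 2],
  ['OUTDENT', 2],
];

// Annotate every token with the nesting depth in effect where it appears.
function nestingLevels(stream) {
  let depth = 0;
  return stream.map(([tag]) => {
    if (tag === 'OUTDENT') depth -= 1; // an OUTDENT already belongs to the outer level
    const level = depth;
    if (tag === 'INDENT') depth += 1;  // tokens after an INDENT are one level deeper
    return level;
  });
}

// Depth of each list marker: the stars before a and b share a level,
// the star before ba sits one level deeper.
const levels = nestingLevels(tokens);
const starDepths = tokens
  .map(([tag, value], i) => (tag === 'MATH' && value === '*' ? levels[i] : null))
  .filter((d) => d !== null);
console.log(starDepths); // [ 1, 1, 2 ]
```

With depths attached, a first pass could restrict itself to the markers at a single depth, which is the kind of staged preprocessing the question describes.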

Anyway, I pushed my work to https://github.com/pschyska/coffee-script if anyone wants to take a look.

Just noticed that the issue was closed, why is that? Did you change your mind?

Thanks,

Paul

@jashkenas
Owner

Nope -- it must have been closed accidentally. Reopening.

@pschyska

pschyska commented Dec 5, 2010

Hi,
I have made noticeable progress with implementing it. You can check the test case to see what's working already:
https://github.com/pschyska/coffee-script/blob/master/test/test_array_shorthand.coffee

I have implemented a syntax check that throws an error when a * is omitted. Also nesting is possible.

I would appreciate comments from you guys if this is acceptable!

Sorry for the mess with the commits :-) However, I just merged from upstream and there shouldn't be any merge conflicts.
The test test_ranges_slices_and_splices.coffee does not run, but I think this is unrelated to my changes.

It has been a very fun experience, I learned a lot about CoffeeScript and a bit about parsers. :-)

Thanks,

Paul

@jashkenas
Owner

Sorry, folks, but the stars are a bit too special-case-y for me. You'd only ever need them with object literals. I know it's not quite as pretty, but i think that explicit braces are the way to go for these cases. Closing the ticket.

@delfick

delfick commented Jan 19, 2011

Can we instead just make the closing bracket optional and use indentation to determine when the list ends?

So for example,

lst = [
    {id:"blah", name:"blah object"}
    {id:"stuff", name:"stuff object"}
]

and

lst = [
    {id:"blah", name:"blah object"}
    {id:"stuff", name:"stuff object"}

would be equivalent

:)

@pschyska

I still think the original proposal is better. I also don't think it's "special-casey": when you look at client-side applications, it's all over the place.
Reducing the syntactic complexity a little is still better than nothing. Having unbalanced brackets is a bad thing though, imho. It would need special handling in the parens-matching rewriting step, and it is also troublesome for editor support.
In this specific case one could remove the [] altogether, and the meaning would still be clear. But this won't work for an array with one object element, though maybe that's not too common.
Jeremy, if you don't like the stars, maybe it could be some other symbol?

@ozra

ozra commented Jan 27, 2011

The ending/trailing brackets, commas, tags (in HTML) etc. have always been the annoying part of any syntax in my opinion - that's why significant space and indentation rules!

I propose explicit starts like the asterisk thing above, with a symbol sequence that makes sense and doesn't create ambiguity problems:

list = [:
    {: id: 1, name: "One liner"
    {:
        id: 2
        name: "Multi liner"
        subList: [:
            'just on array item'
    {:          
        id: 3
        name: "Yet a multi liner"
    {:id: 4
        name: "Yet a multi liner, different formatting"
    {:id: 5
        name: "And a last one before hitting kicksville"
        is: 'last item'

That is:

  • {: means 'begin object, to be closed by DEDENT'
  • [: means 'begin array, to be closed by DEDENT'

This is much better than unbalancing standard brackets. It's explicit and still clean.
The colon mirrors Pythonish block declaration. I first tinkered with {>, [> and {=, [=. Any character that never follows { or [ would be a candidate, I guess.

I like the original proposal too. But would really like the explicit start-obj/arr construct for stacking up hash-objs in a list.

In accordance with the original proposal, an implicit array with explicit objects as items:

list =
    {:id: 1, name: "One liner"
    {:id: 2
        name: "Multi liner"
    {:id: 3
        name: "Yet a multi liner"

Instead of:
list = [
    {id: 1, name: "One liner"}
    {
        id: 2
        name: "Multi liner"
    }
    {
        id: 3
        name: "Yet a multi liner"
    }
]

I like to be able to structure data/blocks in a way that is visually logical at a glance. I don't need infinite implicitness, just like I don't need a headache from ugly trailing brackets.

Agreed, the colon does compete visually with the key:value colon, so maybe another character.

@mikemaccana

What about a simpler syntax, using indentation, that covers the case of a single object?

listofobjectliterals = [
  ,
    foo:'bar'
    baz:'zam'
  ,
    zoo:'zig'
]

The leading comma indicates the next element in the list. Weirdly enough, this is what I tried (and just assumed it would work) because it felt logical to me. What are your thoughts?

@ghost

ghost commented Dec 10, 2011

I kinda think it is doable if you are trying to solve it instead of open, close, open, close...

YAML has the same vision as CoffeeScript but for Markup:

http://nodeca.github.com/js-yaml/
http://nodeca.github.com/js-yaml/#yaml=b2JqZWN0OgogIGNvbG9yczoKICAgIHJlZDogIjkwJSIKICAgIGJsdWU6ICIxMCUiCiAgcGVvcGxlOgogICAgLSAiQW1hbmRhIgogICAgLSAiRXJpYyIKCmFycmF5OgogIC0gbmFtZTogImJsb2NrIgogICAgYWdlOiAyNAogIC0ge25hbWU6ICJpbmxpbmUiLCBhZ2U6IDI0fQ==
http://nodeca.github.com/js-yaml/#yaml=cmVjZWlwdDogICAgIE96LVdhcmUgUHVyY2hhc2UgSW52b2ljZQpkYXRlOiAgICAgICAgMjAwNy0wOC0wNgpjdXN0b21lcjoKICAgIGdpdmVuOiAgIERvcm90aHkKICAgIGZhbWlseTogIEdhbGUKCml0ZW1zOgogICAgLSBwYXJ0X25vOiAgIEE0Nzg2CiAgICAgIGRlc2NyaXA6ICAgV2F0ZXIgQnVja2V0IChGaWxsZWQpCiAgICAgIHByaWNlOiAgICAgMS40NwogICAgICBxdWFudGl0eTogIDQKCiAgICAtIHBhcnRfbm86ICAgRTE2MjgKICAgICAgZGVzY3JpcDogICBIaWdoIEhlZWxlZCAiUnVieSIgU2xpcHBlcnMKICAgICAgc2l6ZTogICAgICA4CiAgICAgIHByaWNlOiAgICAgMTAwLjI3CiAgICAgIHF1YW50aXR5OiAgMQoKYmlsbC10bzogICZpZDAwMQogICAgc3RyZWV0OiB8CiAgICAgICAgICAgIDEyMyBUb3JuYWRvIEFsbGV5CiAgICAgICAgICAgIFN1aXRlIDE2CiAgICBjaXR5OiAgIEVhc3QgQ2VudGVydmlsbGUKICAgIHN0YXRlOiAgS1MKCnNoaXAtdG86ICAqaWQwMDEKCnNwZWNpYWxEZWxpdmVyeTogPgogICAgRm9sbG93IHRoZSBZZWxsb3cgQnJpY2sKICAgIFJvYWQgdG8gdGhlIEVtZXJhbGQgQ2l0eS4KICAgIFBheSBubyBhdHRlbnRpb24gdG8gdGhlCiAgICBtYW4gYmVoaW5kIHRoZSBjdXJ0YWluLg==

I think keeping it close to YAML syntax is a win-win for both CS and YAML. YAML is really beautiful and easy for the eyes just like CS. Also, it's a superset of JSON.

No need to reinvent the wheel. Perhaps YAML has already solved it.

@protometa

What if colons without keys could optionally be used to denote indexed items in arrays/objects?

For example:

arr = 
   : 2
   : -1
   : "str"

or array of objects/arrays:

arr =
   :
      foo: "bar"
      zip: "zap"
   :
      : 1
      : 2
      : 3

It's somewhat intuitive considering how arrays/objects work in javascript. Colons without keys would just be shorthand for assigning indices.

You could also mix indices and keys (maybe?):

obj = 
   foo: "bar"
   key: "val"
   : "indexed prop"

obj[0] is "indexed prop"
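
For comparison, the closest existing JavaScript equivalent of that mixed literal is an ordinary object with a numeric key. This sketch only illustrates how such an object behaves today; it says nothing about how the proposed syntax would actually compile.

```javascript
const obj = { foo: 'bar', key: 'val', 0: 'indexed prop' };

// The indexed entry is reachable by number...
console.log(obj[0]); // indexed prop

// ...but the result is a plain object, not an array.
console.log(Array.isArray(obj)); // false

// Integer-like keys are enumerated first, then string keys in insertion order.
console.log(Object.keys(obj)); // [ '0', 'foo', 'key' ]
```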

I'm very new to coffeescript. If I'm missing something obvious, I apologize. I love coffeescript, but I hate the array brackets. :/

@Soviut

Soviut commented Oct 29, 2012

That actually doesn't seem like a bad syntax.

@protometa

It is an intriguing data format...

@Soviut

Soviut commented Oct 29, 2012

I do agree that something has to be done to represent nested objects and lists better. This feels pretty good because it follows the javascript conventions.

@vendethiel
Collaborator

"Sorry, folks, but the stars are a bit too special-case-y for me. You'd only ever need them with object literals. I know it's not quite as pretty, but i think that explicit braces are the way to go for these cases. Closing the ticket"

See #1872

@protometa

The idea is that colons might be considered less special-case-y than stars. I think I would prefer this syntax over stars or even yaml's solution.

...sorry I meant colons this whole time. edited. ;P

This issue was closed.