Add syntax for tuples and re-enforce homogenous arrays. #154

Closed
wants to merge 1 commit into from

Conversation

mojombo
Member

@mojombo mojombo commented Mar 1, 2013

Hurray for tuples! This PR also brings back homogenous arrays, which I would prefer, and the addition of tuples solves any decent use cases for mixed types.

(1, "tom", "staff")  # Perhaps some UNIX account info.
(2.48, 9.37, 28.81)  # Or maybe x/y/z coordinates.

@BurntSushi
Member

Awesome. 👍

One clarification I'd like to suggest: are tuples smaller than length 2 allowed? I propose that they not be allowed. Tuples of length 1 carry no more information than a bare value, and tuples of length 0 are a bit weird. (A unit type.)

Also, I think it might be worth adding an invalid example based on the size of tuples:

[ (1, 2), (1, 2, 3) ]    # NOPE

Which shows that tuples have length encoded into their type.

@tnm
Contributor

tnm commented Mar 1, 2013

I'd probably not allow tuples of length 1 for sanity reasons, although I suspect implementations would resolve the tuples fine.

That said, incidentally, Python (which has first-class tuples) resolves an (apparently) syntactic 1-tuple (which is in fact, not a tuple) like this —

>>> (1,2).__class__
<type 'tuple'>

>>> (1).__class__
<type 'int'>

>>> ("math").__class__
<type 'str'> 

@rossipedia
Contributor

That's because that's not a tuple actually. In Python (1) is a parenthesized expression evaluating to the expression contained in the parentheses. If you want a single element tuple, you need (1,):

>>> (1).__class__
<type 'int'>
>>> (1,).__class__
<type 'tuple'>

There are advantages to allowing single-length tuples, the same as there are for single-length arrays, mainly that in code you can treat the configuration value as a sequence, regardless of how many elements it has.
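
A minimal Python sketch of that point, using Python's own tuple syntax as a stand-in for the proposed TOML syntax. The `notify_all` helper and the admin names are hypothetical, made up for illustration:

```python
def notify_all(admins):
    """Return one greeting per admin; works for a sequence of any length."""
    return ["hello, " + name for name in admins]

# A one-element tuple and a three-element tuple go through the same code path.
print(notify_all(("tom",)))
print(notify_all(("tom", "ann", "bob")))

# Without the trailing comma, ("tom") is just the string "tom",
# so iterating over it yields single characters instead of one name.
print(notify_all(("tom")))
```

If the value's type changed with its length, the caller would need a special case for the one-element form.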

@tnm
Contributor

tnm commented Mar 1, 2013

@rossipedia — Aye, that's what I was getting at regarding the sanity reasons (clarified the comment).

@rossipedia
Contributor

I'm not sure I understand what sanity reasons you mean. I'd think that requiring the type of a configuration element to change based on how many elements it has would make code working with that configuration element more complicated, and that's a Bad Thing™.

@tnm
Contributor

tnm commented Mar 1, 2013

Hm. Yeah the more I think about it, I suppose I don't see the harm in unit tuples, although I don't really see much usefulness in the config format. We might need to clarify the actual text since we say "They are represented by a comma separated list inside of parentheses."

@BurntSushi
Member

Don't forget about the empty tuple (). Personally, I think it's simpler to just say, "Tuples of length less than 2 are not allowed." But I truthfully don't have a strong opinion either way---a clarification is sufficient IMO.

@rossipedia
Contributor

Well, since this isn't actually an execution language, we don't need to worry about ( and ) denoting expressions; we can just go ahead and reserve them for tuples, with items separated by commas (with whitespace/EOL and a trailing comma allowed, à la arrays).

+1

@pygy
Contributor

pygy commented Mar 2, 2013

Also, I think it might be worth adding an invalid example based on the size of tuples:

[ (1, 2), (1, 2, 3) ]    # NOPE

Which shows that tuples have length encoded into their type.

In some languages, the type of a tuple depends not only on the number of elements, but also on their types. For example, in Julia:

julia> isa((1,"e",3.4), (Int64,ASCIIString,Float64))
true

julia> isa((1,"e",3.4), (Any, Any, Any))
true

julia> isa((1,"e",3.4), Tuple)
true

As you can see, it offers some leeway, and it also does for arrays:

julia> [(1,"e",3.4)]
1-element (Int64,ASCIIString,Float64) Array:
 (1,"e",3.4)

julia> ar = Array((Any,Any,Any),0)
0-element (Any,Any,Any) Array

julia> push!(ar,(2,"ER",4.5))
1-element (Any,Any,Any) Array:
 (2,"ER",4.5)

How strict do you want to be regarding type homogeneity?

@BurntSushi
Member

@pygy Indeed. This is clarified in the commit provided by @mojombo :

[ (1, "red"), (2.1, 5.9) ] # NOPE

See #131 for more details. Maybe the spec should be clearer, but I thought the above example was enough.

How strict do you want to be regarding type homogeneity?

The point is to make arrays completely homogeneous with respect to the type of value they contain. The point of adding tuples is to provide a way to create arrays with non-homogeneous data, i.e., an array of (Int, String). The point of doing all of this is to provide a nice way to write well-typed structured data in TOML that is easily handled by a wide variety of languages, particularly of the strong and static variety.

Note that TOML has no explicit Any type, although a pure polymorphic type is implied with the use of [] with arrays as defined in this PR.

But TOML doesn't have to be strict like this for all types. Of particular note are anonymous hashes in #50.

@pygy
Contributor

pygy commented Mar 4, 2013

Indeed, I had missed that part.

@pygy
Contributor

pygy commented Mar 4, 2013

Should tuples behave like arrays regarding white space and comments?

@BurntSushi
Member

@pygy Absolutely.

@ambv
Contributor

ambv commented Mar 8, 2013

Yay for yet another type in the minimal language spec. </sarcasm>

  1. Basically you want homogeneous lists.
  2. Then you realize there are valid use cases for heterogeneous lists.
  3. You add a new type to a minimal and obvious language. Wat?

Don't add tuples, this complicates things. Instead, minimize. Proclaim that homogeneous lists are something the application should enforce, not TOML. Parsers can implement it as a "strict" variant, whatever.

Tuples complicate things because you have to decide on all sorts of corner cases which will confuse the users:

  1. Is [("a", [1, 2]), ("b", ["c", "d"])] valid?
  2. Is [("a", [1, 2]), ("b", [1])] valid?
  3. What if a user converts a list to a tuple? Havoc and breakage? E.g. explain to people why (1, 2, 3) is incompatible with [1, 2, 3].
  4. Can a tuple hold a tuple? If so, do we check type compatibility recursively or stay flat and ignore what's inside? This is what TOML currently suggests for lists of lists.
  5. I could go on.

And there's this peculiar idea to forbid tuples shorter than 2 elements. This makes auto-generating values tricky and will confuse users. There is also no need for it since parentheses aren't valid in other contexts. And people will hack around it by using lists instead. But they won't be able to since a list is homogeneous. Before long you'll start seeing [[1], ["tom"], ["staff"]] in the wild as a way to circumvent type checks.

I would minimize instead. Keep it simple. And obvious.

@pygy
Contributor

pygy commented Mar 8, 2013

I agree with @ambv on this one.

Config files are unlikely to be ported as-is from app to app (and thus from language to language), and type strictness will confuse the least technically inclined, and annoy others.

@BurntSushi
Member

Don't add tuples, this complicates things. Instead, minimize. Proclaim that homogeneous lists are something the application should enforce, not TOML. Parsers can implement it as a "strict" variant, whatever.

Then those parsers are not compliant with the spec.

Tuples complicate things because you have to decide on all sorts of corner cases which will confuse the users:

None of the things you listed are "corner" cases. 1 is invalid because the types of tuples are different. 2 is valid because the length of a list does not affect its type. I don't understand 3; tuples and arrays are two different kinds of data. The whole point is that they serve two different purposes: arrays for homogeneous data and tuples for heterogeneous data. 4 is clarified in this proposal; tuples are ordered types, which means they are typed by the order of their component types.
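
The rulings on cases 1 and 2 can be sketched as a small type checker. This is an illustration under stated assumptions, not the proposal's reference behaviour: Python lists stand in for TOML arrays, Python tuples for the proposed TOML tuples, and `toml_type` and `is_well_typed` are hypothetical helper names. Empty arrays are not unified with other arrays here, which is a simplification.

```python
def toml_type(value):
    """Return a hashable description of a value's type."""
    if isinstance(value, tuple):
        # A tuple is typed by the ordered types of its components,
        # so its length is part of its type.
        return ("tuple",) + tuple(toml_type(v) for v in value)
    if isinstance(value, list):
        element_types = {toml_type(v) for v in value}
        if len(element_types) > 1:
            raise TypeError("array elements must share one type")
        # Length is not part of an array's type; only the element type is.
        # None marks the element type of an empty array.
        element = element_types.pop() if element_types else None
        return ("array", element)
    return type(value).__name__


def is_well_typed(value):
    """Return True if the value passes the homogeneity check."""
    try:
        toml_type(value)
        return True
    except TypeError:
        return False


# Case 1: the two tuples have different types (array of int vs array of str).
print(is_well_typed([("a", [1, 2]), ("b", ["c", "d"])]))
# Case 2: both tuples are (str, array of int); array length does not matter.
print(is_well_typed([("a", [1, 2]), ("b", [1])]))
```

Under the same rules, `[(1, 2), (1, 2, 3)]` and `[[1], ["tom"], ["staff"]]` are both rejected.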

And there's this peculiar idea to forbid tuples shorter than 2 elements. This makes auto-generating values tricky and will confuse users. There is also no need for it since parentheses aren't valid in other contexts. And people will hack around it by using lists instead.

Tuples shorter than 2 elements don't have to be banned, but I suggested it because they are peculiar things. And there's no reason to hack around such things. Tuples of 1 element have the same utility as just the bare element from the point of view of the type.

Before long you'll start seeing [[1], ["tom"], ["staff"]] in the wild as a way to circumvent type checks.

Did you read this proposal? That is allowed in the spec right now. If this proposal is accepted, then that thing won't be allowed. The whole point of this proposal is to type arrays by the type of value they contain. Similarly for tuples.

You talk about confusing the user, but such things can be avoided by a parser that gives helpful error messages. A parser that doesn't give helpful error messages will confuse the user regardless of the spec.

Moreover, your comment doesn't really address the problem trying to be solved by this proposal: provide a way to write well-typed structured data and to allow static languages to easily play along. Making static languages use parsers that only support fully homogeneous arrays makes them non-compliant with the spec and strictly less useful, since tuples won't exist as a way to make heterogeneous data. This is consistent with one of the objectives of the spec:

TOML should be easy to parse into data structures in a wide variety of languages.

@ambv
Contributor

ambv commented Mar 8, 2013

Then those parsers are not compliant with the spec.

TOML is a configuration file format. Your application will be using it to hold domain-specific information. What I'm saying is that array homogeneity is a domain-specific need. A parser might provide a strict option which enables that. Just as you'd validate whether TCP ports are between 1 and 65535. Such validation doesn't make your application non-compliant with the TOML spec.

The whole point of this proposal is to type arrays by the type of value they contain.

Yes, I get that. The confusion I referred to is user-side, precisely because you need to inform non-programmers that there's a difference between [] and (). This is a valid question to address: "My deployment failed because I used the wrong kind of parentheses. Why couldn't you make it so that there's only one kind?"

Moreover, your comment doesn't really address the problem trying to be solved by this proposal: [...] to allow static languages to easily play along.

I must have failed to read the proposal because there's nothing in it that suggests that. It would also help to explicitly name those languages. Do those languages provide a parser for JSON or XML? If a language can specify a tuple, it can also specify a non-typed array.

@BurntSushi
Member

Yes, I get that. The confusion I referred to is user-side, precisely because you need to inform non-programmers that there's a difference between [] and (). This is a valid question to address: "My deployment failed because I used the wrong kind of parentheses. Why couldn't you make it so that there's only one kind?"

I agree that's a valid question. Truthfully, I don't know whether people will be confused by such things or not. As I said, one hopes that if your application needs to support such users, then it will have appropriate error messages.

I must have failed to read the proposal because there's nothing in it that suggests that. It would also help to explicitly name those languages.

Any static and strongly typed language.

Do those languages provide a parser for JSON or XML? If a language can specify a tuple, it can also specify a non-typed array.

Of course they do. Look at the TOML implementation list right now. There are loads of parsers for strong and static languages. I didn't say that static languages can't handle TOML without this proposal; I said it would be easier.

I will also re-emphasize that it is nice to have well-typed structured data in a configuration file. You may claim that this leads to user confusion, but it can also lead to preventing the user from typing malformed data. (This can be provided by the application, as you say, but I think it is a worthy enough goal for it to be included in the spec itself.)

@dahu

dahu commented Mar 9, 2013

I agree with ambv here. Adding tuples is a sign that the original decision of enforcing homogenous arrays was wrong. I agree that this should be left as an application check after the toml file has been parsed. If your static language has difficulty parsing a mixed type array then it will be equally difficult to parse a tuple. This is a non-argument. Reduce the complexity in this minimal data interchange format by losing the tuple idea and allowing heterogenous arrays.

@BurntSushi
Member

If your static language has difficulty parsing a mixed type array then it will be equally difficult to parse a tuple.

Not at all. Static languages can represent tuples as an appropriate type (like, say, any particular construction of a product type), which is typically distinct from an array. It's not a non-argument, because an assumption of homogeneity or heterogeneity buys you stuff in a static language. It allows the programmer to choose an appropriate data type to represent the TOML data. If TOML only provides heterogeneous arrays, then you never get that homogeneity assumption which restricts your choices in a static language.

The idea here is to push those assumptions into the spec. The result does not benefit dynamic languages, but it does not harm them either. (e.g., A dynamic language with heterogeneous arrays could represent arrays and tuples in TOML in precisely the same way.) The result does benefit static languages. It can also benefit the user by catching malformed data. And as ambv pointed out, it can also detract from the user by having both the [] and () syntactic categories.

@dahu

dahu commented Mar 9, 2013

And my argument is, the decision to represent something as an array should be made by the application. Parse it as your most forgiving type, and allow it to be cast out as the more restrictive type on processing. Your language's toml parser can have convenient methods to make this simple for the app dev. The application says: I want this section of data to be a homogenous array of values, toml-parser. Make it so. The parser slurps up the data permissively and then the non-parsing side of the toml-parser (the rendere... I'm lacking terminology here) re-casts the data as the type requested. My description here is vague and hand-wavy because I haven't fully considered the call interface. However, I still believe it is better to reduce the file-format complexity and make your tools smarter. For languages that natively support mixed type arrays, they have less smarts they need to build into their toml-parsers. I guess you could still implement the type-checking API into all toml-parsers, if you wanted to.

@BurntSushi
Member

My description here is vague and hand-wavy because I haven't fully considered the call interface.

I know how to do what you ask. In fact, I've already done it for the current spec. It works great. (It's a decent demonstration of Go's reflection facilities IMO.)

But this isn't a win-win situation. Allowing the user to enforce homogeneous arrays means they lose out on the ability for mixed data in a well-typed manner. It's all still doable, but like I've said, inconvenient and possibly less safe depending on the language used.

However, I still believe it is better to reduce the file-format complexity and make your tools smarter.

I like simplicity. I've been an advocate for it on this issue tracker. But I also like safety and convenience.

For languages that natively support mixed type arrays, they have less smarts they need to build into their toml-parsers. I guess you could still implement the type-checking API into all toml-parsers, if you wanted to.

The type checking does add a bit more complexity to the parser. But not too much IMO. It took me a couple extra hours to add it in on a separate branch. (But I had these grand plans from the beginning.)

@dahu

dahu commented Mar 9, 2013

You misunderstood one piece of my thinking - I wasn't suggesting that homogeneity be an all or nothing affair. I was suggesting that, per key, the app dev could get the toml-parser to validate that a collection was indeed homogenous and return it in the most efficient data structure (for the language/situation) accordingly.

hand waving ahead:

config = toml.parse(file)
config['ports'].validate_as('int').to_array

or however it is you might achieve that in the real world.
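
One way the hand-waved call could look in runnable Python. This is only a sketch: `validate_as` is a hypothetical free function rather than any real TOML library's API, and the `config` dict stands in for the result of a permissive parse.

```python
def validate_as(values, expected_type):
    """Return values as a list if every element has expected_type, else raise."""
    for index, item in enumerate(values):
        # Exact type match, so True is not accepted as an int.
        if type(item) is not expected_type:
            raise TypeError(
                f"element {index} ({item!r}) is not of type {expected_type.__name__}"
            )
    return list(values)


# Stand-in for permissively parsed data (hypothetical values).
config = {"ports": [8001, 8001, 8002], "mixed": [1, "tom", "staff"]}

# The application asks for homogeneity per key, only where it needs it.
ports = validate_as(config["ports"], int)
print(ports)
```

Calling `validate_as(config["mixed"], int)` raises a `TypeError` naming the first offending element.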

@BurntSushi
Member

You misunderstood one piece of my thinking - I wasn't suggesting that homogeneity be an all or nothing affair. I was suggesting that, per key, the app dev could get the toml-parser to validate that a collection was indeed homogenous and return it in the most efficient data structure (for the language/situation) accordingly.

I didn't misunderstand. That's precisely how my parser operates. :-)

config['ports'].validate_as('int').to_array

This is exactly the kind of thing I'd classify as inconvenient. Instead of type safety being baked into a parser (one-time effort), the type safety has to be redone in every client use of it.

We are having the classic argument of where safety should live. I'm advocating for pushing some of it into the spec. It also makes working in a static language more convenient.

@dahu

dahu commented Mar 9, 2013

But the problem with pushing it into the spec is that it complicates it and therefore the config files written in it. Those files can be touched by non-coders. My thinking is, leave those files as simple and intuitive as possible. Give your toml-parser an API that makes coercion and validation as simple as possible. Your coder knows about these things and is the right person for owning this responsibility.

@BurntSushi
Member

We are in agreement that adding another syntactic category will add complexity. But it must be evaluated as a trade off. The pros are more safety for all parsers, more safety against users typing malformed data and more convenience in static languages. The cons are more complexity in the spec/parser and user confusion.

@dahu

dahu commented Mar 9, 2013

I really don't see the convenience argument. Surely it's okay for the app dev to explicitly enquire/demand of the toml-parser that a set of keys be homogenous. That homogeneity is an aspect of his specific application. Indeed, it is an aspect that may change over time. Version 2.0 might see the need to make some of those collections heterogenous.

I look at this as a pyramid. Parser writers are generally more fastidious than app devs and they in turn more so than users. Put the responsibility of making the parser flexible, smart and a pleasure to use on the parser writers. Put the responsibility of ensuring type correctness and data validation on the app dev. Let the user blunder along with good error messages to guide them to safety and correctness.

@BurntSushi
Member

I really don't see the convenience argument.

Imagine that TOML had only three data types: hashes, arrays and strings. Do you see how it is convenient to add integers, floats, bools and datetimes?

In the same sense, but not the same magnitude, having real arrays and tuples is more convenient. It pushes type information and safety into the spec, and therefore doesn't require the client to have to verify the types themselves.

I look at this as a pyramid. Parser writers are generally more fastidious than app devs and they in turn more so than users. Put the responsibility of making the parser flexible, smart and a pleasure to use on the parser writers. Put the responsibility of ensuring type correctness and data validation on the app dev. Let the user blunder along with good error messages to guide them to safety and correctness.

But you're missing trade-offs. Taken to the extreme, your only primitive data type in TOML would be a string. TOML includes some types because it moves safety and convenience into the parser.

@dahu

dahu commented Mar 9, 2013

I remain unconvinced by this argument, and you're straw-manning by suggesting those extremes. I was not suggesting that we remove types from the toml spec. I was suggesting that we don't add the tuple type. All up, I favour a single heterogenous array type. I think we've articulated our opinions and perspectives well enough here for now. Let's see how it turns out.

@BurntSushi
Member

and you're straw-manning by suggesting those extremes.

No. I'm not misrepresenting your argument. I'm merely trying to show that the decision is about a trade-off of safety, convenience and complexity, and not one of some pyramid of responsibility. Because invariably, the spec takes responsibility for some types. (Implying that app writers don't have full responsibility of types.)

@pygy
Contributor

pygy commented Mar 9, 2013

Edit: This was written offline (I'm on the road), and sent before I could check if it was still relevant,... and I don't have the time to read the whole thread right now. Sorry for the noise if it has been covered meanwhile.

Moreover, your comment doesn't really address the problem trying to be solved by this proposal: provide a way to write well-typed structured data and to allow static languages to easily play along.

On the other hand, enforcing type homogeneity in a dynamic language means
that you have to write a whole type checker.

I will also re-emphasize that it is nice to have well-typed structured data in a configuration file. You may claim that this leads to user confusion, but it can also lead to preventing the user from typing malformed data.

The real solution to this kind of problem is a schema validator. Type
strictness still allows you to fudge the configuration. I've added a lightweight
proposal in #116, assuming no type checking in arrays.

Rather than type strictness, I'd require compliant parsers to support some
form of schema validation (to be defined).

-- Pierre-Yves

@dahu

dahu commented Mar 9, 2013

@pygy, I had considered a schema validator but thought it might be laughed off. I prefer this approach. hetero arrays (and no tuples) with validating schemas (permissive, as described in #116 so that non-specified members are valid by default)

@BurntSushi
Member

@pygy - The type checker is more work, but as I mentioned, I don't think it's much more work. It took me an hour or two to add myself (but I had grand plans from the beginning).

The real solution to this kind of problem is a schema validator. Type
strictness still allows you to fudge the configuration. I've added a lightweight
proposal in #116, assuming no type checking in arrays.

I think schema validation is a great idea. But as I've mentioned, sometimes safety is worth pushing into the spec. Also, enforcing homogeneity in arrays without tuples is much less expressive (which is why tuples go hand-in-hand with homogeneous arrays).

@dahu

dahu commented Mar 10, 2013

With a schema validator, how about we lose arrays altogether and just have tuples?

@ambv
Contributor

ambv commented Mar 10, 2013

@dahu, and call them arrays? Voila! ;-)

@dahu

dahu commented Mar 10, 2013

You got the sarcasm there then. ;-)

@Ghoughpteighbteau

Personally I fall on the side of strongly typed languages. I feel the comparison between tuples and arrays to be laughable.

Here are two things to keep in mind. @dahu is right, there is definitely some enforcement that is the application's responsibility. At the same time, these kinds of explicit structures prevent accidental input errors, and I can back this up with a real world example.

StarSector uses JSON to define its ship files. Here's an example:

  "bounds": [
    -60,26,
    -60,-26,
    -14,-31,
    -2,-45,
    40,-46,
    57,-1,
    59,17,
    47,40,
    35,34,
    -49,35
  ],

Before the tools were developed to place these bounds with a GUI, someone was damn fool enough to do this:

  "bounds": [
    -33.0,
    15.0,
    -15.0,
    -2.0,
    3.0,
    43.0,
    43.0,
    9.0,
    -35.0,
    -34.0,
    -18.0,
    -21.0,
    -34.0,
    -14.0,
    -17.0,
    -29.0,
    -40.0,
    -26.0,
    -26.0,
    15.0,
    19.0,
    23.0,
    15.0,
    13.0,
    -2.0,
    -4.0
  ],

Which of course led to the world's first trans-dimensional clam!

trans-dimensional clam

For the record, this error went undetected for 8 months.

Two insertion errors blew things up in an undetectable way. So why didn't the author of the ship format enforce something like this?:

  "bounds": [
    [-33.0,15.0],
    [-15.0,-2.0],
    [3.0,43.0],
    [43.0,9.0],
    [-35.0,-34.0],
    [-18.0,-21.0],
    [-34.0,-14.0],
    [-17.0,-29.0],
    [-40.0,-26.0],
    [-26.0,15.0],
    [19.0,23.0],
    [15.0,13.0],
    [-2.0,-4.0]
  ],

because he wrote starsector in java of course! (the tragedy keeps on coming right?) His parser simply didn't provide him an easy way to translate those arrays into simple points. So he did what was simplest for him.
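
A small Python sketch of why the flat form hides such mistakes while the nested form exposes them. The helper names are hypothetical, the first coordinates are taken from the example above, and the stray values 99 and 98 are made up:

```python
def pairs_from_flat(values):
    """Group a flat list of numbers into (x, y) points, two at a time."""
    if len(values) % 2 != 0:
        raise ValueError("odd number of coordinates")
    return [(values[i], values[i + 1]) for i in range(0, len(values), 2)]


def pairs_from_nested(rows):
    """Convert explicit [x, y] rows into points, rejecting rows of the wrong size."""
    points = []
    for index, row in enumerate(rows):
        if len(row) != 2:
            raise ValueError(f"row {index} has {len(row)} values, expected 2")
        points.append((row[0], row[1]))
    return points


# Intended data: -60,26  -60,-26  -14,-31
# Two stray values (99 and 98) slipped in. The count is still even,
# so the flat form parses without complaint and yields wrong points.
bad_flat = [-60, 99, 26, -60, -26, 98, -14, -31]
print(pairs_from_flat(bad_flat))

# The same two mistakes in the nested form leave rows with three values,
# which pairs_from_nested rejects with a ValueError.
bad_nested = [[-60, 99, 26], [-60, -26], [98, -14, -31]]
```

A single insertion would be caught by the odd-length check; two insertions cancel out, which matches the failure described above.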

The proposal has its advantages, most notably in how the markup would look:

   bounds = [(-33.0,15.0)
            ,(-15.0,-2.0)
            ,(3.0,43.0)
            ,(43.0,9.0)
            ,(-35.0,-34.0)
            ,(-18.0,-21.0)
            ,(-34.0,-14.0)
            ,(-17.0,-29.0)
            ,(-40.0,-26.0)
            ,(-26.0,15.0)
            ,(19.0,23.0)
            ,(15.0,13.0)
            ,(-2.0,-4.0)
            ]

And how simply it would parse into native structures or POJOs.

@dahu

dahu commented Mar 10, 2013

Interesting example, @Ghoughpteighbteau :-)

However, nothing here convinces me that we can't just keep a single simple syntax in the toml files, and provide the app dev with tools necessary to get where she wants to be.

So, consider the equivalent data structure:

bounds = [[-33.0,15.0]
,[-15.0,-2.0]
,[3.0,43.0]
,[43.0,9.0]
,[-35.0,-34.0]
,[-18.0,-21.0]
,[-34.0,-14.0]
,[-17.0,-29.0]
,[-40.0,-26.0]
,[-26.0,15.0]
,[19.0,23.0]
,[15.0,13.0]
,[-2.0,-4.0]
]

With an appropriate schema validator that ensured it was an array of 2-value arrays of floats. Or however you want to describe that. Imaginably, the schema might even have specs in it to control the casting... No, I don't like that, on reflection. That would not be as language neutral. Issues of casting should reside in the toml-parser for each language. Dynamic languages may not need any casting at all whereas the strongly typed languages might need a bit of a helping hand. So, to my original thinking:

Let's assume Java (but that is not a strength, so I will hand-wave). The toml-parser there would slurp up (internally) the toml file into an array-of-arrays (or whatever best suits Java as the internal representation - perhaps that's a tuple type?) while checking the validating schema for correctness. Then the app-dev says, give me the bounds as an array of Points (or whatever actual type they should be).

I believe it's the job of the toml-parser and the app-dev to cast the parsed form into the required form. At the file-parsing face, the toml-parser for all implementations would look fairly similar. It's only at the app-dev facing side of the toml-parser that extra casting code would be required for strongly typed languages.
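
The validate-then-cast flow described here, sketched in Python rather than Java. Everything named below is an assumption for illustration: `Point`, `bounds_to_points`, and the `parsed` dict, which stands in for the parser's permissive array-of-arrays result.

```python
from typing import NamedTuple


class Point(NamedTuple):
    x: float
    y: float


def bounds_to_points(raw):
    """Check that raw is an array of 2-value arrays of floats, then cast to Points."""
    points = []
    for index, pair in enumerate(raw):
        if not isinstance(pair, list) or len(pair) != 2:
            raise ValueError(f"bounds[{index}] must be a 2-value array")
        if not all(isinstance(v, float) for v in pair):
            raise ValueError(f"bounds[{index}] must contain only floats")
        points.append(Point(pair[0], pair[1]))
    return points


# Stand-in for the permissively parsed file (first three bounds from above).
parsed = {"bounds": [[-33.0, 15.0], [-15.0, -2.0], [3.0, 43.0]]}

# The app dev asks for the bounds as Points at the app-facing side.
points = bounds_to_points(parsed["bounds"])
print(points[0].x, points[0].y)
```

The shape check lives in application-side code here, which is the trade-off under debate: the file syntax stays simple, but each client repeats the check.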

@BurntSushi
Member

With an appropriate schema validator that ensured it was an array of 2-value arrays of floats.

The utility of schema validators is not in question. Everyone knows that a schema can be used to artificially restrict values based on a number of criteria. Integer/float/datetime ranges, enumerations, array lengths, sum types, etc. The list goes on.

Saying that "it can be pushed into a schema validator" isn't relevant here. What's relevant is the balance we all want to strike between complexity, safety and convenience in the specification.

I believe it's the job of the toml-parser and the app-dev to cast the parsed form into the required form. At the file-parsing face, the toml-parser for all implementations would look fairly similar. It's only at the app-dev facing side of the toml-parser that extra casting code would be required for strongly typed languages.

There's more to the story, since you support a variety of types included in the TOML specification, which means that the client of a TOML parser is not completely responsible for casting types. The parser takes some responsibility from the spec.

At this point, the typing implications of this proposal have been made clear. I believe that further discussion should be based on the trade offs that I and others have described. The ones that I can currently think of are:

  1. How much will the user be confused by having both [] and () syntax? (Will users of TOML data be able to distinguish two syntactic categories based on the type of their data?)
  2. How much of a burden is it on parser writers to implement the proposal? (A bit more advanced type checker is required to make sure arrays and tuples are well-typed. This is mostly due to the fact this proposal makes arrays and tuples composite types.)
  3. How much convenience do clients of TOML parsers get by expecting structured well-typed data? (e.g., Avoiding supremely general data types in static languages, since static languages usually do not support native heterogeneous arrays.)
  4. How much benefit do users get by being warned of typing malformed data by definition of TOML as opposed to a particular client deciding to be nice to their user?

Some of these points have already been discussed, and I'm sure there are others I've missed. But I believe an evaluation of these trade offs is the appropriate way to decide whether this proposal should be accepted or not. Therefore, the discussion should be focused on those points. Things like "it can be in the schema validator" are responses to any proposal that adds safety or convenience via types into the specification, including already existing types.

@pygy
Contributor

pygy commented Mar 11, 2013

I just realized that parsers have to handle mixed content in hashes, so having mixed-type arrays isn't much more work.

The point of schema validators is that they allow you to enforce type strictness, if that's your thing, with much more control than what the TOML spec provides or could provide if this proposal was accepted (because you have domain-specific knowledge), and a whole bunch of other things, like value constraints.

@BurntSushi
Member

I just realized that parsers have to handle mixed content in hashes, so having mixed-type arrays isn't much more work.

They are indeed cumbersome in a static language, but it's worth it. Homogeneous maps put severe restrictions on the ability to express data concisely.

@Ghoughpteighbteau

  1. I think there is a percentage that will be confused, but syntax is pretty easy to glean from seeing it used elsewhere. This is only an issue when authoring new configuration files, not editing existing ones.

  2. I don't know. I haven't written one, so I'd love for someone else to fill in.

  3. Can't really say, myself.

  4. Depends on how expressive and explicit it is. The thing about configuration files is that they get manipulated a lot, by non-technical people. Programmers using a toml parser are likely to give generic errors

    (this config file is malformed... somewhere...)

while the toml parsers are likely to give explicit errors accompanied by line numbers

 (Unexpected '[' on line 34, are you missing a comma?.) 

(element 5 '("linear", 50, 57,0)' at line 37 does not match expected pattern (String, Integer, Integer).)  

This is a bit off topic, but I think there is an underlying worry that toml will not be able to express some concepts if arrays are not heterogeneous. I understand that worry, but I think all use-cases for that are covered here: #153

@RichardHightower

#213

I guess I will add my comment here.

Add to table, int, float, date, one more type... tuple.

You have table which is a k/v map.
You have array, in which all items have to be of one type.

Add a list.

(1, "foo", 1979-05-27T07:32:00Z)

Sometimes things make sense in tuples. Disparate types are good for expressing many concepts.

Boon will have the 6th TOML implementation for Java.
I am writing a lot of config files, and I find JSON aggravating.

And I agree YAML has jumped the shark.

Off topic:
Boon will have tuple.
I tend to marshal JSON arrays instead of JSON objects to reduce the footprint of the JSON feed, which matters when you have a 10,000,000 user app.
I can see using toml in places where I normally might use JSON, not just config.

So in short... I agree you need Tuple. I really like that arrays are homogeneous. I also really like that my browser has spell correct because apparently I do not know how to spell homogeneous.

http://rick-hightower.blogspot.com/2014/04/toml-what-if-plist-json-and-windows-ini.html

@mojombo
Member Author

mojombo commented Jul 16, 2014

Closing in favor of Inline Tables. Check them out on #235.


9 participants