# Adding JSON AST Standard

**By: Matthew de Detrich, json4s organisation and Ivan Porto Carrero (original author)**

This proposal is an attempt to add a standard JSON AST library to Scala.

## History

| Date          | Version                  |
|---------------|--------------------------|
| Nov 4th 2015  | Initial Draft            |
| Feb 30th 2016 | Scala Json Module Draft  |
| Apr 18th 2016 | Final Draft of Scala AST |

## EDIT - Apr 18th 2016

A final draft of the Scala JSON AST, located at https://github.com/mdedetrich/scala-json-ast, has been
released. Apart from a larger suite of tests, the major changes are listed below:

- `JNumber` now stores a number as a `String`, which means it has unlimited precision. The
  number is verified at runtime with a regex to make sure it's a valid JSON number. It's up to the user
  to decide how to convert the number (e.g. converting it to a `Double` may lose
  precision).
  - This should also address concerns about certain operations taking too long because the underlying
    type was previously a `BigDecimal`.
  - `equals` and `hashCode` have also been implemented, so that `JNumber("34") == JNumber("34.00")`. Many
    thanks to @IChoran for his help in this regard.
- `scala.json.ast.safe.JValue` has been moved to `scala.json.ast.JValue` and `scala.json.fast.JValue`
  has been moved to `scala.json.unsafe.JValue`. This makes the intent of each library
  clearer.
- Many more tests have been added, including tests for Scala.js.
  Scala.js could also be benchmarked, however I still have to set up a suite for that (it's not trivial,
  especially across different JavaScript platforms).
- Travis has been set up properly for testing.
- Documentation in general has been improved and updated.

## EDIT - Feb 30th 2016

json4s-ast has now been repackaged so that it uses the `scala.json` namespace. It now lives in a different
git repository, located at https://github.com/mdedetrich/scala-json-ast.

## Introduction to json-ast

Scala is currently in a bit of a quagmire when it comes to JSON libraries.
There are roughly 6 competing JSON AST libraries. All of them are slight
variations, and they all attempt to solve 2 problems: speed (often
needed for parsing) and a safe, immutable representation
(often needed for querying). This
[overview](http://manuel.bernhardt.io/2015/11/06/a-quick-tour-of-json-libraries-in-scala/)
shows how absurd the current situation is; the amount of duplication is huge.

Some time ago, casualjim (Ivan Porto Carrero) attempted to
create a JSON AST library (called json4s-ast)
in collaboration with the major Scala web frameworks of
the time, in order to create a unified API. casualjim has since moved on,
however, and the project stagnated.

I have recently revived the project, which is currently hosted at
[json4s-ast](https://github.com/json4s/json4s-ast). Most of the communication
has happened over the Gitter channel, and we have had collaboration
from many of the developers/creators of popular Scala libraries, including:

* Spray
* jawn
* rapture
* Scalatra
* Lift
* Play
* SBT

The problem is felt most by frameworks and libraries that have to deal with JSON.
As a quick example, [slick-pg](https://github.com/tminglei/slick-pg), an extension on top
of Slick for Postgres-specific features, has to provide implementations of the JSON type for each of
the commonly used JSON libraries, instead of just supporting one.

The conversation regarding the collaboration can be read in the Gitter
[history](https://gitter.im/json4s/json4s).

### Examples

Examples can currently be seen at
the json4s-ast [repo](https://github.com/json4s/json4s-ast).

### Counter-Examples

json4s-ast is strictly a library that only provides an AST. It isn't designed to
address parsing, construction or querying; that is up to library/application developers.
The central problem being solved is that, regardless of where you get a `JValue` from, you will
always be able to use that `JValue` with other libraries.

## Drawbacks

JSON is still a rapidly moving field, and there are developments (such as BSON from MongoDB)
which may be incompatible with json4s-ast (this still needs to be verified).

There is currently no clear decision on whether to use `Array` or `js.Array`
for the Scala.js implementation. Using `Array` maintains type equality between Scala and Scala.js, however
it performs worse. Using `js.Array` only maintains source equality between Scala and Scala.js (which
isn't as good as type equality), however it performs better.

The `fast` version is unsafe (this is intentional in the design): it can blow up at
runtime.

Trying to cement a universal type (in this case, essentially the `String` version of JSON)
poses risks if it's not done correctly and/or if it's not what people need or want. We want to avoid
a situation similar to what happened with the Java time libraries (i.e.
[Joda-Time](http://www.joda.org/joda-time/)).

`Vector` may not be the best representation for the `safe` `JArray`, however this is what was decided
by the community.
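
To make the `fast` drawback more concrete: one way an `Array`-based AST can surprise users is that its containers wrap a raw, mutable `Array`. The sketch below is hypothetical (the names `MutabilityDemo`, `FastJArray` and `SafeJArray` are invented for illustration and are not the json4s-ast API); it builds an `Array`-backed and a `Vector`-backed JSON array from the same input and then mutates that input.

```scala
// Hypothetical sketch: contrasts an Array-backed ("fast"-style) JSON array
// with a Vector-backed ("safe"-style) one. Names are illustrative only.
object MutabilityDemo {
  sealed trait JValue
  final case class JString(value: String) extends JValue

  // fast-style: stores the caller's Array directly, without copying it.
  final case class FastJArray(value: Array[JValue]) extends JValue

  // safe-style: stores an immutable Vector.
  final case class SafeJArray(value: Vector[JValue]) extends JValue

  /** Builds both variants from one Array, then mutates that Array. */
  def demo(): (FastJArray, SafeJArray) = {
    val items: Array[JValue] = Array(JString("a"), JString("b"))
    val fastArr = FastJArray(items)          // shares `items`
    val safeArr = SafeJArray(items.toVector) // copies into a Vector
    items(0) = JString("changed")            // e.g. a buffer reused by a parser
    (fastArr, safeArr)
  }
}
```

After `val (f, s) = MutabilityDemo.demo()`, `f.value(0)` is `JString(changed)` while `s.value(0)` is still `JString(a)`. Case-class equality also compares `Array` fields by reference, so two `FastJArray` values with identical contents are not `==`, whereas two `SafeJArray` values are.
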

`safe` is currently very strict about adhering only to the JSON spec, i.e. it uses a `Map` for `JObject`,
which means it has no concept of ordering. Although this is correct as per the JSON [spec](http://www.json.org/),
it does create practical issues. As an example, a past [issue](https://github.com/json4s/json4s-ast/issues/8)
required ordering on a `JObject` so that a checksum could be created deterministically (in order to verify whether two JSON
objects are equal). With a `Map` representation for `JObject`, this would not have been easy to solve.
It was solved by providing the alternate `fast` API (which uses an `Array`, and so has a concept of ordering); however, issues
like this can still crop up.

## Alternatives

There are a huge number of alternative JSON AST libraries, including:

* [spray-json](https://github.com/spray/spray-json)
* [json4s old version](https://github.com/json4s/json4s)
* [jawn](https://github.com/non/jawn)
* [lift-json](https://github.com/lift/lift/tree/master/framework/lift-base/lift-json/)
* [argonaut](https://github.com/argonaut-io/argonaut)
* [rojoma](https://github.com/rjmac/rojoma-json)

## Design

For more in-depth reasoning behind the design goals, you can read the
[README.md](https://github.com/json4s/json4s-ast/blob/master/README.md)
on the project site.

The general consensus was that it isn't really feasible to create a single unified API
that makes everyone happy, because people have different goals for a JSON AST library. Some wanted
the highest performance possible, and others wanted a library that was correct and safe.
Some people also wanted features that aren't strictly part of the JSON spec but are incredibly useful
(e.g. ordering for a JSON object), and some people wanted to minimize memory usage
(important in big data).

For this reason, the current design of the JSON AST has 2 implementations, one called
`fast` and the other called `safe`.
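
The two-implementation split can be sketched roughly as follows. This is a hypothetical, simplified model rather than the json4s-ast source: the object names `safe`, `fast` and `Convert`, the `JField` wrapper, and storing numbers as `String` in both variants are assumptions for illustration. `safe` uses `Vector` and `Map`, `fast` uses `Array`, and `toSafe`/`toFast` convert between them.

```scala
// Hypothetical sketch of the two AST variants; names are illustrative only.
object safe {
  sealed trait JValue
  case object JNull extends JValue
  final case class JString(value: String) extends JValue
  final case class JNumber(value: String) extends JValue // number kept as text
  final case class JBoolean(value: Boolean) extends JValue
  final case class JArray(value: Vector[JValue]) extends JValue
  // Map: key lookup and order-insensitive equality, but no field ordering.
  final case class JObject(value: Map[String, JValue]) extends JValue
}

object fast {
  sealed trait JValue
  case object JNull extends JValue
  final case class JString(value: String) extends JValue
  final case class JNumber(value: String) extends JValue
  final case class JBoolean(value: Boolean) extends JValue
  final case class JField(field: String, value: JValue)
  final case class JArray(value: Array[JValue]) extends JValue
  // Array of fields: keeps field order (and any duplicate keys).
  final case class JObject(value: Array[JField]) extends JValue
}

object Convert {
  def toFast(v: safe.JValue): fast.JValue = v match {
    case safe.JNull       => fast.JNull
    case safe.JString(s)  => fast.JString(s)
    case safe.JNumber(n)  => fast.JNumber(n)
    case safe.JBoolean(b) => fast.JBoolean(b)
    case safe.JArray(xs)  => fast.JArray(xs.iterator.map(toFast).toArray)
    case safe.JObject(fs) =>
      fast.JObject(fs.iterator.map { case (k, x) => fast.JField(k, toFast(x)) }.toArray)
  }

  def toSafe(v: fast.JValue): safe.JValue = v match {
    case fast.JNull       => safe.JNull
    case fast.JString(s)  => safe.JString(s)
    case fast.JNumber(n)  => safe.JNumber(n)
    case fast.JBoolean(b) => safe.JBoolean(b)
    case fast.JArray(xs)  => safe.JArray(xs.iterator.map(toSafe).toVector)
    case fast.JObject(fs) =>
      // Field order is dropped and, for duplicate keys, the last field wins.
      safe.JObject(fs.iterator.map(f => f.field -> toSafe(f.value)).toMap)
  }
}
```

In this sketch, going from `fast` to `safe` is lossy for objects: two `fast` objects with the same fields in a different order become equal `safe` objects, which is why the ordering use case described above needed the `fast` representation.
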
`fast` is designed to be as fast as possible, and hence
uses mutable data structures (such as `Array` for JSON arrays/objects) and a `String`
for JSON numbers to avoid a runtime penalty. `safe`, on the other hand, is an always correct
representation of JSON. It is also safe with regard to the performance
characteristics of lookups: a JSON array is represented as a `Vector`, which provides effectively
constant-time indexed lookup, and a JSON object is represented as a `Map`, which also provides effectively
constant-time lookup of keys, as well as `Map` equality regardless of ordering.

There are also conversion functions between the two libraries, allowing you to
easily go from `safe` to `fast` and vice versa.

An easy way to think about the difference between `safe` and `fast` is that `safe` is your
`String`, whereas `fast` is your `Array[Byte]`. Both are equally important, and both are
needed for different reasons.

### Footprint

The footprint of json4s-ast was intentionally designed to be as small as possible. The current
spec is trivial in design (and only a page long for each implementation, e.g. the
[safe implementation for the JVM](https://github.com/json4s/json4s-ast/blob/master/jvm/src/main/scala/org/json4s/ast/safe/JValue.scala)).

### Scala.js Support

The current version of json4s-ast supports Scala.js, and also
exposes constructors for use from JavaScript. There are some slight differences
(such as using `js.Array` instead of `Array` for performance reasons; note that
this isn't final).

### Implementation

There are good reasons either way as to whether the JSON AST should be included in the
`stdlib` or shipped as a separately supported module. The current json4s-ast has been deliberately
designed to have a strict implementation that should almost never change, which
supports adding it to the `stdlib`.
It also has (strictly) zero dependencies, and the
implementation is really trivial.

The only open implementation issue is how to treat Scala.js support. Ideally,
the Scala.js part would move into the actual [Scala.js](https://github.com/scala-js/scala-js)
implementation, and the official Scala `stdlib` would only hold the
JVM implementation.

The project is currently a SNAPSHOT, for various reasons:

* We are looking for a way to benchmark on Scala.js (to help decide whether to
  use `js.Array` or `Array` for the Scala.js implementation). Note that this isn't related, at all,
  to the JVM implementation.
* We need to do some benchmarking to verify that the current conversion methods (`toSafe` and `toFast`)
  are the fastest possible implementations (they currently use builders).
* We need more feedback from the community that it's correct.

Regarding timeframe, since no official Scala JSON standard currently exists, it can
be added at any time. The library is written in idiomatic Scala, so it's unlikely to break
(even between major Scala releases). The current json4s-ast implementation supports Scala
2.10 and 2.11, and there should be no problems in supporting 2.12.

The current list of contributors/developers can
be seen [here](https://github.com/json4s/json4s-ast/blob/master/build.sbt#L41-L115) and the
implementation can be seen [here](https://github.com/json4s/json4s-ast).

### Unresolved questions

As per the current [issues](https://github.com/json4s/json4s-ast/issues), there are some
minor questions about whether we should provide converters for common types (specifically
numbers). These currently exist; however, there is an argument for removing them from
the current implementation to reduce its footprint.
There are also open questions about the `js.Array`/`Array` problem for Scala.js.
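
To illustrate what the number-converter question is about, here is a hypothetical sketch of a number type along the lines of the April 2016 draft: stored as a `String`, validated against the JSON number grammar, with `equals`/`hashCode` treating `34` and `34.00` as equal. It is not the library's code; the converter names `toBigDecimal` and `toDouble` are assumptions, chosen to show that one conversion is exact while the other can silently lose precision.

```scala
import java.math.{BigDecimal => JBigDecimal}

/** Hypothetical string-backed JSON number; not the library's actual code. */
final class JNumber private (val value: String) {
  // Exact conversion: every digit of the original text is kept.
  def toBigDecimal: BigDecimal = BigDecimal(value)

  // Lossy conversion: values beyond Double precision are rounded.
  def toDouble: Double = value.toDouble

  // Numeric equality: "34", "34.00" and "3.4e1" denote the same number.
  override def equals(other: Any): Boolean = other match {
    case that: JNumber =>
      new JBigDecimal(value).compareTo(new JBigDecimal(that.value)) == 0
    case _ => false
  }

  // Must agree with equals, so hash the canonical form without trailing zeros.
  override def hashCode(): Int =
    new JBigDecimal(value).stripTrailingZeros().hashCode()

  override def toString(): String = s"JNumber($value)"
}

object JNumber {
  // Grammar of a JSON number (RFC 8259, section 6).
  private val Grammar = """-?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+-]?\d+)?""".r

  def apply(value: String): JNumber = {
    require(Grammar.pattern.matcher(value).matches(), s"not a valid JSON number: $value")
    new JNumber(value)
  }
}
```

For example, `JNumber("9007199254740993").toDouble` rounds to 9007199254740992 (2^53), while `toBigDecimal` keeps every digit, and `JNumber("01")` is rejected at construction time.
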

Since [scala-offheap](https://github.com/densh/scala-offheap) just came out, there is a
question of whether the `fast` implementation should use scala-offheap (which would improve
performance). However, this would break the current strict zero-dependency claim, and
scala-offheap is fairly new (there are also questions about whether it will keep working in
future versions of the JVM).

For `fast`, should we make the internal `Array` private (i.e. impossible to modify once you construct
the `JArray`/`JObject`)? `Array` itself is mutable, and `fast` is designed to provide speed, but there are
good arguments for making it private.

## References - Quotes from [Gitter](https://gitter.im/json4s/json4s)

Note that these quotes are meant to demonstrate adoption/collaboration. If I have quoted you out of context, or if you want a quote
to be removed/clarified, please let me know.

@jroper (Play)
> Let me just say - from my point of view, if json4s AST gets released as it is currently in the reboot branch, I would be happy. and, if it gets released with some of the changes that are being mentioned (not just my own), I would be happy. We would seek to adopt it in Play (the question would be when, not if). Right now I think this is about iterating over an optimal solution to try and find a minimum that satisfies as many people as possible - ie, we’re close enough, let’s see if we can get closer.

@farmdawgnation (Lift)
> Anyway, @mdedetrich, thanks for the work so far on this. I think there’s a lot of value in unifying AST’s. Can’t guarantee we’ll adopt it for lift-json, but it would be nice to know as a Scala developer that I could use any of the Big JSON Libs and they can interoperate without too much work.
> The current status quo has (unfortunately) bit me in the butt a few times.

@rossabaker (json4s/http4s)
> Two ASTs sounds complicated, but I guess it makes nobody happy to have a lack of safety paired with middling performance.

> I think there's a lot of room for disagreements and innovation in codecs and traversals and DSLs and which sum types get returned. That's why there are so many JSON projects today.
> There seems to be a pretty near consensus on how to map the JSON AST to Scala case classes. If this minimal AST is of any use to getting projects to standardize on that before adding their own library around it, cool.
> If not, the proposal is at least a much better foundation for a json4s reboot.

@jrudolph (Spray/Akka)
> @non yes, of course, it makes sense to keep additional features per library. AFAIU the goal is to find a common denominator between json libraries that is more structured than "just Scala". On top of that diversity has its advantages.

@non (jawn)
> @jrudolph so -- just to be clear, jawn will definitely support parsing to this AST, but i'm not necessarily committing to removing my own mutable AST.

@sirthias (Spray)
> I imagine that once we have this common AST and the AST really is minimal in what it provides (as it should be) end-users will turn to convenience tools that provide XPath-like querying, lenses, (de)serialisation, etc.

> Well, @mdedetrich took this on. And @non, @eed3si9n, @bryce-anderson, myself and others merely chipped in.

> Gentlemen,
> I’ve compiled a quick benchmark to compare the new AST proposals with what we currently have in spray-json and other JSON libs.
> The interesting questions for me are: Does an Array-based “basic” AST really yield better parsing performance over what we have now? How does it compare to a Vector-based “basic” AST?

@propensive (Rapture)
> Again, I don't have much to constrain me (and hence to contribute to the discussion). However the AST ends up, I'll have a typeclass layer, and that should be quite trivial to implement.
>
> Agreed. If the AST changes more frequently than that, it undermines some of its purpose anyway...

> It wouldn't be a terrible thing to separate the JSON4S AST code into a minimal library. But there's more of a community/political job of getting other libraries to start using it...

@eed3si9n (SBT)
> @rossabaker yes. I was able to pull out json4s-core dependency after some copy-pasta from ParserUtil. I'd like to have binary-compatible long term AST project that encodes version number in the package name.

> it's the number 1 concern for me. that's why I use jawn
> but in the case of sbt, i don't think we deal with numbers much

> I'd actually say port all that good stuff from Yoshida-san into the new json4s-ast project and start over

> right. lean AST jar that's versioned should allow sbt plugins to use whatever version of json4s

## References

1. [Existing project (json4s-ast)][1]
2. [Issues][2]
3. [Gitter Channel][3]
4. [Casualjim][4]
5. [json4s organisation][5]

[1]: https://github.com/json4s/json4s-ast
[2]: https://github.com/json4s/json4s-ast/issues "Issues"
[3]: https://gitter.im/json4s/json4s
[4]: https://github.com/casualjim
[5]: https://github.com/json4s