[Task] Improve datatype mapping from RDF/XSD to JSON #175
WG discussion 2022-09-08: Topic large values for unbounded numeric values:
Topic INF/-INF:
Resolution: Postpone decision for now, gather more input
In admin-shell-io/aas-specs#236 the statement is that Number is not a valid mapping for xs:long either, and that it should be mapped to string as well.
See here for arguments from https://github.com/mristin on why to use ONLY string in JSON serializations: https://github.com/admin-shell-io/aas-specs/blob/9c342305f04ecb35baa050a17cad6928ba0ba519/schemas/json/README.md
For xsd:base64Binary please add that it is not just mapped to string but additionally encoded with base64.
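To illustrate the base64 remark, here is a minimal sketch of how an xsd:base64Binary value could travel as a JSON string. The function names `encode_base64_binary` and `decode_base64_binary` are hypothetical and not part of any specification; only Python's standard `base64` module is used.

```python
import base64


def encode_base64_binary(raw: bytes) -> str:
    """Encode raw bytes as the base64 text that would go into the JSON string.

    Hypothetical helper name, chosen for illustration only.
    """
    return base64.b64encode(raw).decode("ascii")


def decode_base64_binary(text: str) -> bytes:
    """Decode the JSON string back to bytes, rejecting non-base64 characters."""
    return base64.b64decode(text, validate=True)


encoded = encode_base64_binary(b"hello")
round_tripped = decode_base64_binary(encoded)
```

The point is that the JSON string carries the base64 text, not the raw bytes, so the receiver has to decode it before use.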
@atextor (I'm the author of the comments on admin-shell-io/aas-specs#236.) I'm sharing here a couple of more edge cases that you might want to consider. There must be more, but those were apparent when I looked into the XSD specification.
A couple more remarks:
A note about JSON with regard to ECMA: In Section 6 "Numbers" of RFC 8259, they are not specific about the number limits:
They do note, however, that IEEE 754 is widespread and should be considered:
If you read from JSON, then you have to be careful! From Section 6 "Numbers" of RFC 8259:
As far as I know, you cannot easily check whether you lose precision when you parse a number from a string. For example, C# and Python give you infinity if the number is too large for an IEEE 754 double, but you obtain only a rounded number if it is too precise. Example in Python:
>>> import json
>>> json.loads(
...     '1e123456789123456789123456789123456789123456789'
... )
inf
Note that a large integer literal, in contrast, is parsed exactly into Python's arbitrary-precision int:
>>> json.loads(
...     '123456789123456789123456789123456789123456789'
... )
123456789123456789123456789123456789123456789
The precision is silently lost in Python (notice the loss of precision after roughly 17 significant digits):
>>> import json
>>> "{:.50f}".format(
...     json.loads("0.123456789123456789123456789123456789123456789"))
'0.12345678912345678379658409085095627233386039733887'
# No exception
This is expected, as an IEEE 754 double can only represent roughly 17 significant decimal digits. However, you are not notified by the library that your JSON had a higher precision. Here is a similar example with large numbers (notice again the loss of precision roughly after the 17th digit):
>>> "{:.50f}".format(
...     json.loads(
...         "123456789123456789123456789123456789123456789"
...     )
... )
'123456789123456789439311560846449175093575680.00000000000000000000000000000000000000000000000000'
You have to check the behavior language by language and library by library.
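For Python specifically, one way to at least detect such a loss is to keep the decimal literal exact and compare it with its float round trip. The following is only a sketch under that assumption: the helper names `loads_exact` and `loses_precision` are made up for illustration, and other languages and libraries will need their own approach.

```python
import json
from decimal import Decimal


def loads_exact(text: str):
    """Parse JSON, keeping numbers with a fraction or exponent as exact Decimals.

    json.loads accepts a parse_float hook; integers are already parsed
    into arbitrary-precision ints and need no special handling.
    """
    return json.loads(text, parse_float=Decimal)


def loses_precision(literal: str) -> bool:
    """Return True if the number literal does not survive a float round trip.

    Hypothetical helper: the literal is converted to a float, the float is
    printed with its shortest repr, and that repr is compared numerically
    with the exact decimal value of the literal.
    """
    exact = Decimal(literal)
    round_tripped = Decimal(repr(float(literal)))
    return round_tripped != exact


payload = loads_exact('{"x": 0.123456789123456789123456789}')
```

With this, `loses_precision("0.1")` is false because the shortest float representation prints back as `0.1`, while a literal with 27 fractional digits or a 45-digit integer is reported as lossy.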
@atextor I just remembered one more point against Namely, when you use reflection-based JSON libraries to parse the data, you can not distinguish between properties whose values were set to While in many cases this does not matter, there are scenarios where you want to distinguish the two. Here's an admittedly constructed example so you get an idea. In some situations the measurement was not performed at all (e.g., the sensor was turned off -> the property not defined). In other situations the sensor was turned on, but could not precisely measure (the property set to I don't know if this projects well to your use case, but it is definitely a good test to check if there are no such distinctions in the semantics in your model. Of course, you can always use additional properties, but having a |
Hi @mristin, thank you for your valuable input! I think the most important question needs to be clarified first: In short, the payloads should consider the following:
That being said, we should of course try to cover as much ground as possible. In particular, we must make sure to not have undefined behaviour (i.e., if we do end up with different value ranges compared to XSD, they must be properly documented), but IMO it's not our duty to solve JSON's shortcomings at all costs. This might warrant some more discussion.
Is there an agreed-upon solution for this other than "always send this particular numeric property as a string"?
I think we must not confuse a value with its lexical representation. The values of xsd:int "1" or xsd:int "01" are identical and can both be represented in JSON by
Birgit and you are correct and this is an important omission; xsd:long should be handled in the same way as xsd:integer etc.
If the model specifies xsd:float as a type, this defines the contract: The Aspect sending data should also adhere to the contract and should not send data with an allegedly higher precision. Neither sender (Aspect) nor receiver (client) should assume anything else.
This is easily solved on the model level and needs no solution on the level of data serialization. You even mentioned it yourself: If the range specified in the model is "up to 1000°C" then every value larger than that, including INF, is by definition an error. This means that the Aspect must not send such an error because it would otherwise violate the contract. If it still does, it is undefined behaviour and the client program should definitely not ring the alarm bell.
This should be addressed (see my question above) but not because of the lexical representations.
Yes, there is no standard serialization, but we could define it like this for Aspect payloads. This is what I meant with "cover as much ground as possible": We can easily define
Since one of the main places where Aspect data is consumed is in web apps, we can certainly not ignore JavaScript/ECMA script. Known limitations of the JSON spec itself, but also of well-known languages and frameworks that will likely be used in consuming the data (this includes Python of course), should be taken into account if it is reasonable.
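As a concrete illustration of the ECMAScript limitation: JavaScript numbers are IEEE 754 doubles, so integers are only exact up to 2^53 - 1 (`Number.MAX_SAFE_INTEGER`). The sketch below reproduces the effect with Python floats, which use the same double format; the helper name `is_safe_for_json_number` is hypothetical.

```python
# Largest integer that an IEEE 754 double represents without gaps;
# this is the value of Number.MAX_SAFE_INTEGER in ECMAScript.
MAX_SAFE_INTEGER = 2**53 - 1

# Largest values of two XSD types, for comparison.
XSD_INT_MAX = 2**31 - 1
XSD_LONG_MAX = 2**63 - 1


def is_safe_for_json_number(value: int) -> bool:
    """Return True if a double-based JSON consumer can hold the integer exactly.

    Hypothetical helper for illustration only.
    """
    return abs(value) <= MAX_SAFE_INTEGER


# Beyond the safe range, neighbouring integers collapse into the same double.
collapsed = float(2**53) == float(2**53 + 1)
```

Here `is_safe_for_json_number(XSD_INT_MAX)` holds, whereas `XSD_LONG_MAX` falls outside the safe range, which is the reason given above for treating xsd:long like the unbounded types.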
Yes, this differs from language to language; this is to be expected with IEEE 754. This is why you would use xsd:decimal instead of xsd:float/xsd:double if you require such precision; in the sds-sdk where Java code is generated for properties with such types, the type
This is why the Aspect payload mapping forbids
This could (and should!) also be cleanly modeled in the Aspect model instead of trying to cram everything into the JSON serialization, for example using an Entity in combination with a corresponding enumeration of the possible states:
:sensorValue a bamm:Property ;
bamm:characteristic [
a bamm:SingleEntity ;
bamm:dataType :SensorValue ;
] .
:SensorValue a bamm:Entity ;
bamm:properties ( :sensorState [ bamm:property :sensorMeasurement ; bamm:optional true ] ) .
:sensorState a bamm:Property ;
bamm:characteristic [
a bamm-c:Enumeration ;
bamm:dataType :SensorState ;
bamm-c:values ( :SensorStateOffline :SensorStateMeasurementFailure :SensorStateMeasurementSuccess ) ;
] .
:SensorState a bamm:Entity ;
bamm:properties ( :stateCode [ bamm:property :stateDescription ; bamm:notInPayload true ] ) .
:stateCode a bamm:Property ;
bamm:characteristic [
a bamm-c:Code ;
bamm:dataType xsd:string ;
] .
:stateDescription a bamm:Property ;
bamm:characteristic bamm-c:Text .
:SensorStateOffline a :SensorState ;
:stateCode "OFFLINE" ;
:stateDescription "The sensor is offline" .
:SensorStateMeasurementFailure a :SensorState ;
:stateCode "FAILURE" ;
:stateDescription "The sensor is online, but reading a measurement failed" .
:SensorStateMeasurementSuccess a :SensorState ;
:stateCode "SUCCESS" ;
:stateDescription "The sensor is online and reading a measurement succeeded" .
:sensorMeasurement a bamm:Property ;
bamm:dataType xsd:float ;
bamm:characteristic [
a bamm-c:Measurement ;
bamm-c:unit unit:degreeCelsius ;
] .
And then the valid payloads would look like the following:
{
  "sensorValue": {
    "sensorState": { "stateCode": "OFFLINE" }
  }
}
or
{
  "sensorValue": {
    "sensorState": { "stateCode": "FAILURE" }
  }
}
or
{
  "sensorValue": {
    "sensorState": { "stateCode": "SUCCESS" },
    "sensorMeasurement": "23.5"
  }
}
Making intention and context semantics explicit is better than encoding as
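On the consuming side, a client could then branch on the explicit state instead of guessing from a missing or special value. This is a minimal sketch assuming exactly the payload shape shown above; the function name `read_sensor_value` is hypothetical.

```python
import json
from typing import Optional, Tuple


def read_sensor_value(payload_text: str) -> Tuple[str, Optional[float]]:
    """Return (stateCode, measurement) from a payload shaped like the examples.

    The measurement is only read when the state says it is present.
    Hypothetical helper; the key names come from the example model above.
    """
    sensor_value = json.loads(payload_text)["sensorValue"]
    state_code = sensor_value["sensorState"]["stateCode"]
    if state_code != "SUCCESS":
        # OFFLINE and FAILURE remain distinguishable through state_code.
        return state_code, None
    return state_code, float(sensor_value["sensorMeasurement"])


offline = read_sensor_value(
    '{"sensorValue": {"sensorState": {"stateCode": "OFFLINE"}}}'
)
success = read_sensor_value(
    '{"sensorValue": {"sensorState": {"stateCode": "SUCCESS"},'
    ' "sensorMeasurement": "23.5"}}'
)
```

The "sensor was off" and "sensor could not measure" cases stay distinct without relying on how a JSON library treats absent properties.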
@atextor thanks for the replies! Just a small clarification: I didn't mean to ignore JavaScript in general; just that it is a poor choice for a north star of the design. Your solution needs to live at the intersection of all major languages. If you only considered JavaScript, you would miss a lot of intricacies and issues with other languages.
With all that said, I'd advise you to take a step back and think about the data types you use in your model in a more holistic manner. They are the very foundation, so the typing system needs to be a solid one. Instead of making a Frankenstein where you insist on XML data types, but then introduce tons of unnecessary leaky abstractions depending on which serialization is used, I'd recommend determining the core set of primitive types that you need, and supporting only that set. That way you will avoid the problems with leaky abstractions, but also keep round-trips sane, which is super important for testing (fuzzing) and correctness (static and runtime invariants). The core set of types would also be much easier to digest and reason about, avoiding confusion for the reader.
P.S. sorry, I forgot to clarify the point related to floats and doubles. Please take into account that the chain The problem is exacerbated in the chain |
Is your task related to a problem? Please describe.
The section Payloads of the specification texts describes how values with the datatypes defined in an Aspect Model are to be serialized in JSON in an Aspect's payload. The mapping generally works as follows: xsd:boolean is turned into a JSON boolean, numeric types are turned into JSON numbers and everything else is turned into a JSON string.
This mapping, however, can be problematic due to a loss of information. In particular, large values for unbounded numeric types such as xsd:decimal, xsd:integer and xsd:positiveInteger can be problematic. Although JSON by definition does not limit the length of numbers (see the "number" production rule in the JSON grammar in the spec), effective use is constrained by the limits imposed by ECMAScript. This is also referred to in the Data type mappings subsection of SAMM's Payload section.
Furthermore, the Payload mapping must specify how the "special" numeric values INF, -INF and NaN are handled, which are valid for xsd:float and xsd:double but cannot be represented in JSON.
Describe the solution you'd like
The Payload mapping should be changed so that the JSON data type used to represent values of the unbounded XSD types is string instead of number. This way any large number can be represented without loss of information, while keeping the "principle of least surprise" for users by sticking to native number types for xsd:int, xsd:long and xsd:short, and to the native boolean type for xsd:boolean.
Regarding "special" numeric values, the JSON number type should be kept for xsd:float and xsd:double values so as not to add major inconveniences (and inconsistency with existing REST APIs) just to handle these rarely occurring corner cases. I would propose to define null to stand for NaN (as is customary in purely JSON-based applications) and to not try to fix JSON's shortcomings regarding infinity values (i.e. define that those values cannot be represented in an Aspect payload).
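The proposal can be sketched as a small mapping function. This is an illustration of the rules as worded in this issue description, not an implementation from any SDK: the function name `to_json_value` and the type-name sets are assumptions, and xsd:long is kept as a native number here even though the discussion in the comments suggests treating it as a string as well.

```python
import json
import math

# Type groups as named in the proposal text (assumed, non-exhaustive sets).
UNBOUNDED_TYPES = {"xsd:decimal", "xsd:integer", "xsd:positiveInteger"}
NATIVE_INT_TYPES = {"xsd:int", "xsd:long", "xsd:short"}
FLOAT_TYPES = {"xsd:float", "xsd:double"}


def to_json_value(xsd_type: str, value):
    """Map a value to its JSON representation following the proposed rules."""
    if xsd_type == "xsd:boolean":
        return bool(value)
    if xsd_type in UNBOUNDED_TYPES:
        # Unbounded numeric types become strings to avoid information loss.
        return str(value)
    if xsd_type in NATIVE_INT_TYPES:
        return int(value)
    if xsd_type in FLOAT_TYPES:
        number = float(value)
        if math.isnan(number):
            return None  # serialized as JSON null
        if math.isinf(number):
            raise ValueError("infinity cannot be represented in an Aspect payload")
        return number
    # Everything else is serialized as a JSON string.
    return str(value)


serialized = json.dumps({
    "big": to_json_value("xsd:integer", 10**30),
    "reading": to_json_value("xsd:double", float("nan")),
})
```

Raising an error for infinity is one possible reading of "cannot be represented"; a real mapping might instead reject such values during model validation.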