-
Notifications
You must be signed in to change notification settings - Fork 95
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: resolve XSD aliasing and parse missed elements #643
Conversation
…nto xml_namespaces
The original script wrapped every element inside a choice block; however - as this is only a single element - the choice block has essentially no effect.
Unfortunately, we didn't have the resources to review this PR in October or November and it looks like we we may not be able to in the coming couple of months. My apologies for the delay, but I wanted to let you know it's still in our radar. |
@FirefoxMetzger |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was also working on a Python script for the conversion and took the opportunity to review your code, as it already familiarized me with SDF -> XSD conversion, but your approach is more promising than my simple Ruby -> Python conversion. 😃
cmd_arg_parser.add_argument( | ||
"-i", | ||
"--in", | ||
dest="source", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe sdf_file
would be more suitable as variable name? In the case of source
it is not clear whether it is a file, a path or a directory name.
raise RuntimeError(f"Unknown simple type: {in_type}") from None | ||
|
||
|
||
# collect existing namespaces |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better put all declarations above and the instruction code into a
if __name__ == "__main__":
cmd_arg_parser = argparse.ArgumentParser(
description="Create an XML schema from a SDFormat file."
)
...
xsd_schema = setup_schema(used_ns, use_default_ns=False)
xsd_schema.append(element)
with open(out_dir / (source.stem + ".xsd"), "wb") as out_file:
ElementTree.ElementTree(xsd_schema).write(out_file, encoding="UTF-8", xml_declaration=True)
as this will increase the readability (and allows to reuse your classes).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Bi0T1N Hmm, I think this is mainly a stylistic choice.
The if __name__ == "__main__"
guard is useful if your have double-barreled files that - in parallel - function as both a script and as a module. Here, the purpose is to be a script only (there is no intention towards reusing the classes/functions elsewhere), so I chose not to use the guard. In my experience, that leads to cleaner scripts, because you can write (and debug) them sequentially rather than having to jump around the file, because classes live at the top and execution logic at the bottom.
el.set("type", f"{other_file.stem}:{name}Type") | ||
|
||
# TODO: remove this line | ||
declared.setdefault("imports", set()).add(other_file.stem) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove it? declared
variable isn't used. 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It used to be the variable in which the imported namespaces were collected, but that changed with a refactor. This comment is mainly a reminder to clean up once there is general approval on the approach this script takes and we are moving towards an actual merge.
"*": ("0", "unbounded"), | ||
"-1": ("0", "0"), | ||
} | ||
min_occurs, max_occurs = required_codes[self.element.attrib["required"]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This already shows up above so make it a function to reduce code duplication
else: | ||
# XSD 1.1 allows maxOccurs="unbound" within xs:all, so | ||
# we could drop this entire else block if we could upgrade | ||
# to XSD 1.1. (xmllint only supports XSD 1.0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couldn't find an issue for it on GitLab, just an older issue in the closed bug tracker. So maybe report it and hope for the best?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't an issue with XSD but rather an issue with sdformat. XSD v1.0 (the schema version used here) is limited in that it doesn't allow us to represent the true capabilities of SDF.
In SDF, child elements may appear 0, 1, or infinitely many times depending on the element and - this is what kills XSD 1.0 - may do so in any order. In XSD you may choose exactly one element from a set of elements (xs:choice
), you may specify a block in which elements may occur in any order and with a specified (and finite) min/max count for each element (xs:all
), or you may specify a block in which elements must occur in the given order but may do so the given (potentially infinite) number of times. Neither of these choices fits.
The way the ruby generator solved this is by aggregating all valid children into a xs:choice
element which may repeat 0 to inf many times. This works for elements with infinite counts but doesn't enforce maxOccurs for elements that may only occur once. If we, thus, validate SDF files with the resulting schema, it may produce false negatives by saying that a SDF is valid even though a unique element occurs twice.
The way the commented out code solves this is by grouping elements into a group with finite count (<=1
) and a group with infinite count. It then inserts the xs:choice
block discussed above (for infinite-count elements only) between every two fine-count elements. It then does this for every permutation of finite-count elements (because in SDF elements may occur in any order, but we need to validate that unique elements don't reoccur). This will result in XSD that doesn't produce above false negatives, but it comes at the cost of massive schemata (GBs in magnitude).
So for now I just do the same here as the ruby script (accept that the result produces false negatives), because we can't change how SDF works, and we also can't bump the XSD version without prior feedback from one of the maintainers.
In XSD 1.1 this is less of a problem because above mentioned xs:all
block allows elements with infinite counts. This means that we can simply use an xs:all
block instead of a xs:choice
block and be happy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the detailed explanation. 👍
My comment was just about xmllint only supports XSD 1.0
which is probably a reason why we can't simply use XSD 1.1 as there is no tool for linting. An option might be to only use XSD 1.1 if there is no other choice (= you need XSD 1.1 to represent it properly) and otherwise stick with XSD 1.0 - I don't know if mixing the two is even supported.
|
||
def setup_schema(used_ns: list, use_default_ns: bool = True) -> ElementTree.Element: | ||
xsd_schema = ElementTree.Element(_to_qname("xs:schema")) | ||
xsd_schema.set("version", "1.1") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be "1.0"
? From above:
# XSD 1.1 allows maxOccurs="unbound" within xs:all, so
# we could drop this entire else block if we could upgrade
# to XSD 1.1. (xmllint only supports XSD 1.0)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure but I guess <?xml version='1.0' encoding='UTF-8'?>
is what you meant with your comment by not supporting XSD 1.1 and thus we use 1.0
?
However, the file generated from sensor.sdf
contains
<?xml version='1.0' encoding='UTF-8'?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" version="1.1" ...
but wouldn't it make more sense that the version number is the version of the SDF file (directory)? So that following command would set version to 1.8
:
python3 tools/xmlschema.py -i sensor.sdf -o /ws/sdformat/test_output -s /ws/sdformat/sdf/1.8
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be "1.0"?
You are right; it should be 1.0 here.
but wouldn't it make more sense that the version number is the version of the SDF file (directory)?
No, because its references to different things. The first line (the XML header) specifies the version of the XML in which the schema file is written. The second line (the XSD namespace) specified the version of XSD to use when interpreting this schema file.
To specify the SDFormat version to use when validating one would (traditionally) set the version in the namespace (here is an example). I recall that there was an issue in this repo where this feature was requested before, and I am in favor of doing so. This would, however, need prior consent from one of the maintainers, and - because this is a new feature instead of a bugfix - I left it out of this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue you mentioned might be #545 - something which should be fixed as all validation is done with the files on the website that might not represent the SDF version of your file.
xsd_schema = setup_schema(used_ns, use_default_ns=False) | ||
xsd_schema.append(element) | ||
with open(out_dir / (source.stem + ".xsd"), "wb") as out_file: | ||
ElementTree.ElementTree(xsd_schema).write(out_file, encoding="UTF-8", xml_declaration=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The biggest problem I see is that the generated files are not readable by a human because the formatting is bad.
However, I also started to work on a Python version but gave it up after I found your PR because your approach is much better than mine.
To fix the problem with the unformatted output I used
from xml.dom import minidom
with open(out_dir / (source.stem + "TypeF.xsd"), "w") as out_file:
xmlstr = minidom.parseString(ElementTree.tostring(xsd_schema, xml_declaration=True, encoding='utf-8')).toprettyxml(indent=" ", encoding='utf-8').decode("utf-8")
# replacing of < and > is needed for <![CDATA[text]]> but you don't use it
#xmlstr = xmlstr.replace("<", "<")
#xmlstr = xmlstr.replace(">", ">")
# fixing text like 'The "ray", "gpu_ray", and "gps" types are...' after formatting
xmlstr = xmlstr.replace(""", '"')
out_file.write(xmlstr)
to format the XML of Python's xml.etree.ElementTree
. As mentioned in the formatting part your code doesn't use <![CDATA[text]]>
while the current files created with the Ruby script do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm also not sure if <xs:documentation xml:lang="en">
is required instead of <xs:documentation>
since the language will always be English?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The biggest problem I see is that the generated files are not readable by a human because the formatting is bad.
Yes, we can definitely make the files more pretty. Once this gets picked up by one of the maintainers, I will have a closer look into that.
As mentioned in the formatting part your code doesn't use while the current files created with the Ruby script do.
Yes, CDATA
is one way of escaping the comments. It has been a while since I've written the initial code, but I remember looking into it and deciding that we can also not have it here ... I have forgotten the reason for this, though, and I could have been wrong.
I'm also not sure if <xs:documentation xml:lang="en"> is required instead of xs:documentation since the language will always be English?
Both options should work fine so I'm happy either way.
Thanks for the review @Bi0T1N . I won't get around to looking at this for another couple of days, because I have a deadline coming up and am tied up elsewhere. That said, I have a more up-to-date version of this script that lives here. I never pushed the changes upstream, because it uses features from a more recent version of python (dataclasses), and schema generation currently doesn't seem to have high priority here (which is completely fine by me). You might also be interested in the generated schemata (currently up to v1.8). |
Hi @FirefoxMetzger , unfortunately we haven't had time to go over this PR yet. Edifice ( |
Co-authored-by: Bi0T1N <[email protected]>
@chapulina Sure that can be done; however, I don't feel very motivated to add more work to this PR just for it to keep sitting around waiting for a reviewer. Once someone from the team has the time to review and we can resolve the action items that need a decision for this PR I'm happy to pick it up again. For reference these are the open points:
|
That's very fair 👍🏽 We'll let you know when we can put some time into this. Thanks for the patience |
If we found:
There is a recent-ish list here of options: |
Maybe. #1377 is also another direction--basically rename the tags that are causing problems. |
🦟 Bug fix
Fixes #642 #637 #633 #632
Summary
This PR adds a new generator for XSD from the current SDFormat. It is currently set up to convert SDFormat v1.8, but can easily be backported to the other versions by updating the corresponding CMakeLists.
It fixes multiple issues with the old parser:
types:type
intypes.xsd
from double to string and add the respective restriction (see: naming conflicts in generated schema files (xsd) #632)It also handles nested models and nested model states. This was apparently a blocking issue in previous attempts to refactor this script. I didn't encounter this problem directly; however, I had to guess the desired meaning of the
ref=X
attribute used in this case. I left a comment in the appropriate place.On a high level, the generator is based around named types. It creates two files for each filename.sdf: one filename.xsd and one filenameType.xsd (the actual workhorse), e.g.
model.xsd
andmodelType.xsd
. FilenameType.xsd holds the complex type used by the root element of filename.sdf and filename.xsd is a shim to help parsers to make sense of partial sdf.Checklist
codecheck
passed (See contributing)Note to maintainers: Remember to use Squash-Merge