Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add parser and serializer for the OWL2 Functional-style syntax #68

Merged
merged 8 commits into from
Apr 19, 2024

Conversation

althonos
Copy link
Collaborator

@althonos althonos commented Mar 1, 2024

This still needs some work, but I have most of the types in place.

  • The grammars are in a separate folder because if I get to add the Manchester parser as well, some of them can be shared between the two parsers.
  • The writer will return with an error when trying to serialize an ontology that contains more than one OntologyID component, and will not serialize DocIRI components.

To-do list:

  • Some things need to be updated in the curie crate first so that I can handle the case of default prefixes properly, I will open a PR there.
  • The round-tripping tests fail at the moment because some files contain both a prefix and a default prefix for the same IRI, I need to edit the files first.
  • Figure out what to do with empty prefixes: the way the syntax allows CURIEs and the way the curie crate treat default namespace is a bit misleading so I need to figure out how to reconcile the two. -- Abbreviated IRIs with empty prefixes are now using Curie(Some(""), local) rather than Curie(None, local); the default prefix is not used anywhere.
  • Make the parser support iteration rather than reading the full ontology, requires a bit more work because pest can only lex complete strings.

@phillord
Copy link
Owner

phillord commented Mar 5, 2024

Did you make the test files in src/ont via hand?

I realise that setting up tawny.owl and bubo for the purpose is a bit of a PITA (which is why I commited all the src/ont files to the repo!), but it is a good way of getting all the OWL files in different formats.

@althonos
Copy link
Collaborator Author

althonos commented Mar 5, 2024

I got them with robot and then formatted them by hand, yes...

@phillord
Copy link
Owner

phillord commented Mar 5, 2024

Ah, okay, generated from the RDF format or equivalent?

My inclination would be to add owl-function to the Makefile in bubo -- not because it like my code better than Robot, but it does give a single source of truth for the OWL files. But, you will lose the hand formatting, alas.

What do you think? Happy to add support for this myself.

@phillord
Copy link
Owner

phillord commented Mar 5, 2024

Hmmm. Interestingly, some of the OWL test files do not have a bubo equivalent (for example, two-class-with-some) and are not used anywhere in code (except in your OFN parser). And there are a few manual files (family and family-other) which really should be in the "manual" directory because I have generated them in an entirely different way.

It looks like I need a bit of a clean up of the test files.

@phillord
Copy link
Owner

phillord commented Mar 5, 2024

Ah, figured it out. I remember getting bored with the one-class.clj naming scheme that I introduced at some point. So, we have files like one-or.owl and or.owl which are largely identical (semantically anyway). Both what have been produced by what is now or.clj, but are a bit different because they will have used different versions of the OWL API or tawny bubo.

What is the easiest way forward? I could add a fix to devel and you could pull it across to this branch? Or just fix it here?

@althonos
Copy link
Collaborator Author

althonos commented Mar 5, 2024

Yes I generated them from the OWL/XML sources, IIRC. I'm happy if you want to generate them with anything else, you can do so in devel and I'll pull them here (I'll have to do rebase anyway to include the changes from #67 which I think should be merged first).

@phillord
Copy link
Owner

phillord commented Mar 7, 2024

I have checked now. There are actually quite a few files in OWL that do not exist as bubo files. I think that this is a bit of a mess and needs cleaning. But I don't think you need to wait for me to do that. So, feel free to merge, and I will fix the test files after.

@filippodebortoli filippodebortoli added this to the 1.0 milestone Mar 25, 2024
@althonos althonos force-pushed the ofn-parser branch 2 times, most recently from 5edc0e6 to f63d789 Compare March 28, 2024 15:27
@althonos althonos marked this pull request as ready for review March 28, 2024 15:48
@althonos
Copy link
Collaborator Author

Made a new branch from scratch based on the devel, now should compile without issue. I'll make an iterative parser when I get the time but I'd rather have this merged now to avoid another rebase hell 😅

@phillord
Copy link
Owner

Looks good.

Could you replace the test macros with the test-generator macro? It's should be quite straight-forward and will ensure that all the test files are being tested. This was why I caused you all the rebase pain -- might as well use it.

@althonos
Copy link
Collaborator Author

Done, sorry for the delay 👍

@phillord
Copy link
Owner

In your tests, how are you checking to see whether the document has been parsed semantically correctly? As far as I can see the reader tests look to see whether the reader works, and the writer checks that the roundtrip is semantically equivalent to the input. But I think that any paired errors would not be picked up.

The RDF reader compares to the XML reader, while the XML reader does hard-coded checks against known semantics (so effectively the RDF reader does as well). Could we add that as well for ofn?

@althonos
Copy link
Collaborator Author

I'm testing against the XML reader as well now. The only thing I'm not testing is the prefix mapping, because it seems to behave differently (but also in the bubo output) with regards to the # separator.

@phillord
Copy link
Owner

I think that this is fine. The prefix mapping is a bit erratic at the moment -- not helped by the inability to get this information out of quick_xml. Feel free to merge!

I am afraid this is going to break when I finished SWRL support -- I might need some help with fixing this.

@althonos
Copy link
Collaborator Author

Great, thanks! I'll have to come back to this as well to write a line-based parser when I have the time. Currently the parser can parse SWRL syntax but just ignores it, this will only need a FromPair implementation for the different SWRL types.

@althonos althonos merged commit bfa4943 into phillord:devel Apr 19, 2024
1 check passed
@phillord
Copy link
Owner

I haven't quite sorted out all the types for SWRL yet, but I think that they are close now. Still deciding what to do with Variable. Once that is done, hopefully, it should be plain sailing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants