
RDF Parser considered harmful (or at least the iteration part is really ugly) #98

Open
phillord opened this issue May 31, 2024 · 2 comments · Fixed by #99 · May be fixed by #102
Labels
API This concerns horned as a library enhancement New feature or request refactor

Comments

@phillord
Owner

The RDF parser makes multiple passes through both the bnode and the simple triples, and each time
it does so the code is repeated -- something like this:

for (k, v) in std::mem::take(&mut self.bnode) {
    match v.as_slice() {
        [[_, Term::RDF(VRDF::First), val],
         [_, Term::RDF(VRDF::Rest), Term::BNode(bnode_id)],
         // Some sequences have a Type List, some do not
         ..
        ] => {
            let some_seq = self.bnode_seq.remove(bnode_id);
            if let Some(mut seq) = some_seq {
                seq.push(val.clone());
                self.bnode_seq.insert(k.clone(), seq);
                extended = true;
            } else {
                self.bnode.insert(k, v);
            }
        }
        _ => {
            self.bnode.insert(k, v);
        }
    };
}

I think we can replace this whole thing with a partition_map (or a
map followed by a partition, which would avoid bringing in itertools as a dependency).
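The "map followed by a partition" route can be done with std alone. A minimal sketch, assuming nothing about horned-owl's types: map each item into a `Result` (Ok = handled, Err = leftover), then partition on `Result::is_ok`; the `split` function and the even/odd rule here are purely illustrative stand-ins.

```rust
// Sketch only: emulating partition_map with std, via map-into-Result
// followed by partition on is_ok. Evens stand in for "parsed" items,
// odds for leftover triples.
fn split(items: Vec<i32>) -> (Vec<i32>, Vec<i32>) {
    let (ok, err): (Vec<_>, Vec<_>) = items
        .into_iter()
        .map(|n| if n % 2 == 0 { Ok(n) } else { Err(n) })
        .partition(Result::is_ok);
    (
        ok.into_iter().map(Result::unwrap).collect(),
        err.into_iter().map(Result::unwrap_err).collect(),
    )
}

fn main() {
    println!("{:?}", split((1..=6).collect()));
}
```

The cost relative to itertools' partition_map is the intermediate `Result` wrapping and the second pass to unwrap; the benefit is no extra dependency.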

Partition map (from itertools) looks roughly like this:

partition_map(impl Iterator<Item = A>, impl FnMut(A) -> Either<L, R>) -> (collection of L, collection of R)

L would be the thing we produce (Atom, Axiom, whatever),
R would be Result<A>.

L would then be added to the output,
R would be added back to whatever field of self it came from in the first place.

We would then have:

let (axiom, triples) = triples.partition_map(
   |t|

   match &t {
      [blah, blah, blah] => {
          // some_ok! would need to return Left or Right(Err), which should be okay
          some_ok!(Left(Axiom(blah, blah, blah)))
      }
      _ => {
          // Can we take the t here?
          Ok(Right(&t))
      }
   }
)

self.axioms.extend(axiom);
// We would have to clone the triple here because we only have a reference
self.triples.extend(triples.map(|e| e?.clone()));

// or, if we want to capture errors rather than crap out (which is something
// we can't do at the moment):
let (triples, errors) = self.triples.partition(is_okay);
self.triples.extend(triples);
self.errors.extend(errors);

which is rather neater than what we have at the moment
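To make the shape concrete: the repeated loops above all follow a drain/partition/re-insert pattern, which std's `partition` can express in one pass. A hedged sketch, with a made-up `one_pass` function and a "positive value" test standing in for the real triple pattern match:

```rust
use std::collections::HashMap;

// Sketch of the drain/partition/re-insert shape: take the map, split the
// entries into ones consumable now and ones to keep for a later pass.
// The predicate here is a placeholder for the real slice-pattern match.
fn one_pass(pending: &mut HashMap<String, i32>, consumed: &mut Vec<i32>) {
    let (ready, keep): (Vec<_>, Vec<_>) = std::mem::take(pending)
        .into_iter()
        .partition(|(_, v)| *v > 0);
    consumed.extend(ready.into_iter().map(|(_, v)| v));
    pending.extend(keep); // leftovers go back for the next pass
}

fn main() {
    let mut pending: HashMap<String, i32> =
        [("a".into(), 1), ("b".into(), -2), ("c".into(), 3)]
            .into_iter()
            .collect();
    let mut consumed = Vec::new();
    one_pass(&mut pending, &mut consumed);
    consumed.sort();
    println!("{:?} {}", consumed, pending.len());
}
```

The side-effecting inserts into `self.bnode_seq` would still have to happen somewhere, so the mapping closure is unlikely to be pure in practice.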

@filippodebortoli
Collaborator

To get myself acquainted with the rdf::reader module, I am currently refactoring a few bits that could be replaced by e.g. deriving traits or using methods already implemented in vocab (see #97 and #99).

I agree with your observation.
I am also wondering if the code could be refactored to exploit the fact that the triple parser is streaming: keep a HashMap of triple sets indexed by subject, and deserialize whenever a set contains, e.g., all the triples necessary to instantiate a declaration.
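A minimal sketch of that bucketing idea, with an invented `stream_buckets` function and a single `rdf:type owl:Class` triple standing in for a real "complete declaration" test (none of this is horned-owl code):

```rust
use std::collections::HashMap;

type Triple = (String, String, String);

// Sketch: bucket streamed triples by subject, and flush a bucket as soon
// as it is "complete". Completeness here is faked as "contains an
// rdf:type owl:Class triple"; a real parser would match a full pattern.
fn stream_buckets(stream: Vec<Triple>) -> (Vec<Triple>, HashMap<String, Vec<Triple>>) {
    let mut by_subject: HashMap<String, Vec<Triple>> = HashMap::new();
    let mut declarations = Vec::new();
    for (s, p, o) in stream {
        let key = s.clone();
        by_subject.entry(key.clone()).or_default().push((s, p, o));
        let complete = by_subject[&key]
            .iter()
            .any(|(_, p, o)| p == "rdf:type" && o == "owl:Class");
        if complete {
            declarations.extend(by_subject.remove(&key).unwrap());
        }
    }
    (declarations, by_subject)
}

fn main() {
    let stream: Vec<Triple> = vec![
        ("s1".into(), "rdf:type".into(), "owl:Class".into()),
        ("s2".into(), "rdfs:label".into(), "B".into()),
    ];
    let (decls, pending) = stream_buckets(stream);
    println!("{} {}", decls.len(), pending.len());
}
```

The open question, raised below, is how often a bucket can actually be judged complete before the whole stream has been read.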

@phillord
Owner Author

phillord commented Jun 4, 2024

https://github.com/phillord/horned-owl/tree/feature/rdf-with-partition-cleanup

I've started looking at this. There are definitely some cleanups that I can do. The idea of partition_map may or may not work; so far it makes some things simpler, but a lot of loops modify other data structures by side-effect. I will rewrite a few more functions so I'm clear whether it makes sense or not.

The problem with streaming is that there are many places where you have to parse the whole stream first in order to interpret later triples. This is, for example, why I parse all declarations (in the entire import closure) first: many of the axiom patterns depend on knowing, for example, the kind of a property. So I suspect that streaming RDF is not going to bring a lot of benefits.
