Feature: Append new submission data by chunks to the triple store #122
This is an optimization PR.
Currently, in the parsing process, after the RDF generation step, we do a "delete and append" to the triple store.
In the append triples step, we transform the XRDF to Turtle in a temporary file, then send a single POST request to the triple store with the Turtle file as the request body.
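For illustration only, here is a rough Ruby sketch of what this single-request append looks like conceptually; the method name, endpoint, and `graph` parameter are assumptions for the example, not the actual Goo/triple store API:

```ruby
require 'net/http'
require 'uri'

# Hypothetical sketch: append a Turtle file to a named graph in one request.
def append_turtle_single_request(turtle_path, graph_uri, store_url)
  uri = URI("#{store_url}?graph=#{URI.encode_www_form_component(graph_uri)}")
  request = Net::HTTP::Post.new(uri)
  request['Content-Type'] = 'text/turtle'
  # The whole Turtle file is read into memory and sent as one request body.
  request.body = File.read(turtle_path)
  Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }
end
```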
The issue is that for a big file (>= 1 GB), as in our use case here (ontoportal-lirmm/ontologies_linked_data#15), submitting the whole file content in a single HTTP request is not efficient.
The PR changes the `append_triples_no_bnodes` function to do the append in chunks of 500,000 lines (triples) per request; see the sketch below.
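As a rough illustration, here is a minimal Ruby sketch of the chunked approach, assuming each line of the Turtle file is a self-contained triple (no prefix directives that a later chunk would depend on). The method name and endpoint are hypothetical, not the actual implementation:

```ruby
require 'net/http'
require 'uri'

CHUNK_SIZE = 500_000 # lines (triples) per request

# Hypothetical sketch: stream the Turtle file line by line (never fully
# loaded in memory) and post it in CHUNK_SIZE-line batches.
def append_turtle_in_chunks(turtle_path, graph_uri, store_url)
  uri = URI("#{store_url}?graph=#{URI.encode_www_form_component(graph_uri)}")
  Net::HTTP.start(uri.host, uri.port) do |http|
    File.foreach(turtle_path).each_slice(CHUNK_SIZE) do |lines|
      request = Net::HTTP::Post.new(uri)
      request['Content-Type'] = 'text/turtle'
      request.body = lines.join
      http.request(request)
    end
  end
end
```

Because `File.foreach` with `each_slice` only ever holds one chunk of lines in memory, memory usage stays bounded by the chunk size instead of growing with the total file size.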
We tested with the TAXREF-LD use case. Before the change, the whole Turtle file was sent in a single request, so memory usage grew with the file size. After the change, it worked, and we obtained the following benchmark:
- Objects freed: 572924847
- Time: 734.6 seconds
- Memory usage: 618.36 MB (before, memory usage depended on, and was equal to, the size of the Turtle version of the file; now it will never exceed 700 MB)
Reference: https://tjay.dev/howto-working-efficiently-with-large-files-in-ruby/