What XML parsing library to use for parsing OVAL files #7108
Comments
For potentially big datasets like this, I would prefer streaming: every DOM-based solution is bound to explode sooner or later, since in the end it is just a matter of the number of objects. Even though StAX is not as convenient as the alternatives, I would go with it as a tradeoff between the plain (efficient) SAX and the easy (memory-hungry) DOM APIs. I'm not sure we need to care about execution time, but memory consumption can bite us hard.
@cbosdo Thank you for your valuable input. I'm now convinced that StAX is the way to go because, as you said, with DOM it comes down to the number of objects: it's no longer a question of whether it will explode, but when. I will keep the issue open for a few more days in case someone has more input. As for execution time, parsing a 250 MB OVAL file takes around 6 seconds on my computer. Assuming the OVAL data is synced once a day, I think that's negligible. That figure is for the DOM parser, though; StAX should be slightly faster.
Decision: Use StAX because it provides a middle ground between the easy-to-use but memory-hungry DOM APIs and the efficient but more complicated SAX.
Question
I'm working on the OVAL consumption GSoC project and I need to select a library to parse OVAL files (which are essentially XML files). I wrote an ADR to compare the XML parsing libraries used in Uyuni and based on the result of the comparison, I find JAXB to be the most suitable for the project use case. However, I would like to hear your input on JAXB and whether you think I should use another library.
My biggest concern with JAXB is memory consumption. With StAX parsers, we can read one XML element at a time and write it to the database directly; with JAXB, we have to wait for the whole file to be parsed before we can store anything in the database. Keep in mind that the memory allocated by JAXB will be released once the parsed object is written to the database.
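To make the streaming idea concrete, here is a minimal sketch using the JDK's built-in StAX API (`javax.xml.stream`). It is only an illustration, not the project's implementation: the class name `OvalStaxSketch`, the method `streamDefinitionIds`, and the `sink` callback (standing in for a database write) are hypothetical, and it assumes we only care about the `id` attribute of each `<definition>` element.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical sketch: names are illustrative, not taken from the project.
final class OvalStaxSketch {

    /**
     * Streams the XML and passes the id of every <definition> element to
     * the sink as soon as it is read. Returns the number of definitions seen.
     * The sink stands in for "write it to the database directly".
     */
    static int streamDefinitionIds(Reader xml, Consumer<String> sink)
            throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Disable DTDs and external entities; a common hardening step.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);

        XMLStreamReader reader = factory.createXMLStreamReader(xml);
        int count = 0;
        try {
            // Pull one event at a time; no tree of the whole file is built.
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "definition".equals(reader.getLocalName())) {
                    sink.accept(reader.getAttributeValue(null, "id"));
                    count++;
                }
            }
        } finally {
            reader.close();
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        // Tiny made-up input; a real OVAL file is namespaced and far larger.
        String xml = "<oval_definitions><definitions>"
                + "<definition id=\"def-1\"/><definition id=\"def-2\"/>"
                + "</definitions></oval_definitions>";
        int n = streamDefinitionIds(new StringReader(xml), System.out::println);
        System.out.println("definitions seen: " + n);
    }
}
```

Because the reader only ever holds the current event, memory use depends on what the sink retains rather than on the size of the file.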
openSUSE OVAL files are around 250 MB at most. I created a PoC app that parses a 284 MB OVAL file with JAXB, and the app used 517 MB of heap memory at peak.
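How the peak figure was measured isn't stated here. As one possible way to get a comparable number when trying out different parsers, the sketch below reads the peak usage of the JVM's heap memory pools through the standard `java.lang.management` API. The class `PeakHeapSketch` and its methods are hypothetical names for illustration only.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

// Hypothetical helper; not how the PoC necessarily measured memory.
final class PeakHeapSketch {

    /** Clears the recorded peaks so a later reading covers only new work. */
    static void resetPeaks() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            pool.resetPeakUsage();
        }
    }

    /**
     * Sums the peak "used" bytes of all heap pools. The pools may peak at
     * different moments, so treat the sum as an upper bound on peak heap.
     */
    static long peakHeapBytes() {
        long total = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP) {
                MemoryUsage peak = pool.getPeakUsage();
                if (peak != null) {
                    total += peak.getUsed();
                }
            }
        }
        return total;
    }

    public static void main(String[] args) {
        resetPeaks();
        // Placeholder for the parsing work under test.
        byte[] work = new byte[32 * 1024 * 1024];
        work[0] = 1;
        System.out.println("peak heap (MB): " + peakHeapBytes() / (1024 * 1024));
    }
}
```

Calling `resetPeaks()` just before parsing and `peakHeapBytes()` just after gives a rough figure for that run; a profiler would give a more precise picture.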