What XML parsing library to use for parsing OVAL files #7108

HoussemNasri · 2023-06-07T14:17:42Z

Question

I'm working on the OVAL consumption GSoC project, and I need to select a library to parse OVAL files (which are essentially XML files). I wrote an ADR comparing the XML parsing libraries used in Uyuni and, based on that comparison, I find JAXB to be the most suitable for the project's use case. However, I would like to hear your input on JAXB and whether you think I should use another library.

My biggest concern with JAXB is memory consumption. With a StAX parser, we can read one XML element at a time and write it to the database directly. With JAXB, we have to wait until the whole file has been parsed before we can store anything in the database. Keep in mind that the memory allocated by JAXB will be released once the parsed objects are written to the database.
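To make the "read one element at a time and write it to the database" idea concrete, here is a minimal StAX sketch using only the JDK's `javax.xml.stream` API. The class `OvalStreamingSketch`, the simplified `Definition` record, and the batching `Consumer` (standing in for a database writer) are all hypothetical; real OVAL definitions carry far more data (criteria, references, affected platforms). The point is that at most one batch of parsed objects is alive at any time, so peak memory is bounded by the batch size rather than by the file size.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class OvalStreamingSketch {

    /** Hypothetical, heavily simplified view of an OVAL definition. */
    public record Definition(String id, String defClass, String title) { }

    /**
     * Pulls events from the stream and hands definitions to the sink in batches of
     * batchSize, so at most one batch of parsed objects is alive at any time.
     * Returns the total number of definitions read.
     */
    public static int parse(InputStream in, int batchSize, Consumer<List<Definition>> sink)
            throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newFactory();
        // Untrusted input: disable DTD processing and external entities.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
        XMLStreamReader reader = factory.createXMLStreamReader(in);
        List<Definition> batch = new ArrayList<>(batchSize);
        int total = 0;
        String id = null;
        String defClass = null;
        String title = null;
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String name = reader.getLocalName();
                    if ("definition".equals(name)) {
                        id = reader.getAttributeValue(null, "id");
                        defClass = reader.getAttributeValue(null, "class");
                        title = null;
                    } else if ("title".equals(name) && id != null) {
                        title = reader.getElementText(); // leaves the reader on </title>
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && "definition".equals(reader.getLocalName())) {
                    batch.add(new Definition(id, defClass, title));
                    total++;
                    id = null;
                    if (batch.size() == batchSize) {
                        sink.accept(List.copyOf(batch)); // e.g. one DB transaction per batch
                        batch.clear();
                    }
                }
            }
            if (!batch.isEmpty()) {
                sink.accept(List.copyOf(batch)); // flush the last, partial batch
            }
        } finally {
            reader.close(); // does not close the underlying InputStream
        }
        return total;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<oval_definitions xmlns=\"http://oval.mitre.org/XMLSchema/oval-definitions-5\">"
                + "<definitions>"
                + "<definition id=\"oval:example:def:1\" class=\"vulnerability\" version=\"1\">"
                + "<metadata><title>Example 1</title></metadata></definition>"
                + "<definition id=\"oval:example:def:2\" class=\"patch\" version=\"1\">"
                + "<metadata><title>Example 2</title></metadata></definition>"
                + "</definitions></oval_definitions>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        // Stand-in for a database writer: just print each batch.
        int total = parse(in, 1, batch -> System.out.println("would insert: " + batch));
        System.out.println("definitions read: " + total);
    }
}
```

In a real importer the batch size would be tuned so that each batch maps to one reasonably sized database transaction.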

openSUSE OVAL files are around 250 MB at most. I created a PoC app that parses a 284 MB OVAL file with JAXB, and the app used 517 MB of heap memory at peak.
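The thread doesn't say how the PoC measured its peak heap, so the following is only one possible way to get a comparable number from inside the JVM, using the standard `java.lang.management` API (`MemoryPoolMXBean.resetPeakUsage()` and `getPeakUsage()`). The class `PeakHeapSketch` is hypothetical, and the task passed to `measurePeak` is a stand-in that allocates memory instead of parsing. Because each heap pool tracks its own peak, summing them gives an upper bound rather than the exact simultaneous peak.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

public class PeakHeapSketch {

    /** Resets the peak-usage counter of every heap memory pool. */
    public static void resetHeapPeaks() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP && pool.isValid()) {
                pool.resetPeakUsage();
            }
        }
    }

    /**
     * Sums the per-pool peak "used" bytes since the last reset. Pools can peak at
     * different moments, so this is an upper bound on the true simultaneous peak.
     */
    public static long heapPeakBytes() {
        long total = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue;
            }
            MemoryUsage peak = pool.getPeakUsage(); // null if the pool is no longer valid
            if (peak != null) {
                total += peak.getUsed();
            }
        }
        return total;
    }

    /** Runs the task and returns the approximate peak heap usage observed meanwhile. */
    public static long measurePeak(Runnable task) {
        System.gc(); // only a hint; the JVM is free to ignore it
        resetHeapPeaks();
        task.run();
        return heapPeakBytes();
    }

    public static void main(String[] args) {
        long peak = measurePeak(() -> {
            // Stand-in for "parse the whole file into objects": keep 64 MiB reachable.
            byte[][] chunks = new byte[64][];
            for (int i = 0; i < chunks.length; i++) {
                chunks[i] = new byte[1024 * 1024];
            }
            System.out.println("held " + chunks.length + " MiB");
        });
        System.out.printf("approximate peak heap: %.0f MiB%n", peak / (1024.0 * 1024.0));
    }
}
```

External tools (for example a profiler attached to the PoC) would give a more detailed picture; this sketch only makes the in-process measurement reproducible.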

admd · 2023-06-09T09:48:34Z

@rjmateus @cbosdo @mackdk can you please help Houssem make this decision here?

cbosdo · 2023-06-13T13:06:10Z

For potentially big datasets like this, I would prefer using streams: all DOM-based solutions are bound to explode sooner or later, since in the end it's just a matter of the number of objects. Even though StAX is not as convenient as the other options, I would go with it, as it is a tradeoff between the plain (efficient) SAX and the easy (hungry) DOM APIs. I'm not sure we need to care about execution time, but memory consumption can bite us hard.
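The convenience difference between SAX and StAX is mostly about who drives the parse. The sketch below (hypothetical class `PushVsPullSketch`, JDK APIs only, XXE hardening omitted for brevity) counts OVAL `definition` elements both ways. With SAX the parser pushes events into callbacks, so state has to live in fields or mutable holders. With StAX the caller pulls events in its own loop, keeps state in ordinary local variables, and can stop early just by returning. Neither builds a tree, so both keep memory flat, unlike DOM.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class PushVsPullSketch {

    /** SAX (push): the parser drives and calls back; state must live outside the loop. */
    public static int countWithSax(InputStream in) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // needed for localName to be populated
        int[] count = {0}; // mutable holder, because the callback cannot return a value
        factory.newSAXParser().parse(in, new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if ("definition".equals(localName)) {
                    count[0]++;
                }
            }
        });
        return count[0];
    }

    /** StAX (pull): the caller drives the loop and keeps state in ordinary locals. */
    public static int countWithStax(InputStream in) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newFactory().createXMLStreamReader(in);
        int count = 0;
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "definition".equals(reader.getLocalName())) {
                    count++;
                }
            }
        } finally {
            reader.close();
        }
        return count;
    }

    /** Pulling also makes it trivial to stop early, without throwing from a callback. */
    public static String firstDefinitionId(InputStream in) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newFactory().createXMLStreamReader(in);
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "definition".equals(reader.getLocalName())) {
                    return reader.getAttributeValue(null, "id");
                }
            }
            return null;
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = "<oval_definitions xmlns=\"http://oval.mitre.org/XMLSchema/oval-definitions-5\">"
                + "<definitions><definition id=\"oval:example:def:1\"/>"
                + "<definition id=\"oval:example:def:2\"/></definitions></oval_definitions>";
        byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
        System.out.println("SAX:  " + countWithSax(new ByteArrayInputStream(bytes)));
        System.out.println("StAX: " + countWithStax(new ByteArrayInputStream(bytes)));
        System.out.println("first: " + firstDefinitionId(new ByteArrayInputStream(bytes)));
    }
}
```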

HoussemNasri · 2023-06-15T20:10:46Z

@cbosdo Thank you for your valuable input. I'm now convinced we should use StAX because, as you said, with DOM it's a matter of the number of objects: it's no longer a question of whether it will explode, but when. I will keep the issue open for a few more days in case someone has more input.

About the execution time: it takes around 6 seconds on my computer to parse a 250 MB OVAL file. Assuming the OVAL data will be synced once a day, I think that's negligible. That measurement is for the DOM parser, though; StAX should be slightly faster.
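For scale, 250 MB in about 6 seconds is roughly 42 MB/s, so a once-a-day sync spends only seconds on parsing. Whether StAX is actually faster than DOM on a given machine can be checked with a rough harness like the one below. The class `DomVsStaxTiming` and its synthetic OVAL-shaped input are made up for illustration; it does one warm-up run and one timed run per parser, which is far from a rigorous benchmark (JMH would be the proper tool), and the DOM run needs enough heap to hold the whole tree.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.Callable;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import org.w3c.dom.Document;

public class DomVsStaxTiming {

    /** Builds a synthetic, OVAL-shaped document with n definitions (made-up content). */
    public static byte[] syntheticOval(int n) {
        StringBuilder sb = new StringBuilder("<oval_definitions><definitions>");
        for (int i = 0; i < n; i++) {
            sb.append("<definition id=\"oval:example:def:").append(i)
              .append("\" class=\"vulnerability\" version=\"1\"><metadata><title>Example ")
              .append(i).append("</title></metadata></definition>");
        }
        sb.append("</definitions></oval_definitions>");
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    /** DOM: builds the whole tree in memory, then queries it. */
    public static int countWithDom(byte[] doc) throws Exception {
        Document tree = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(doc));
        return tree.getElementsByTagName("definition").getLength();
    }

    /** StAX: a single forward pass, no tree. */
    public static int countWithStax(byte[] doc) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newFactory()
                .createXMLStreamReader(new ByteArrayInputStream(doc));
        int count = 0;
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "definition".equals(reader.getLocalName())) {
                    count++;
                }
            }
        } finally {
            reader.close();
        }
        return count;
    }

    /** Wall-clock seconds for one run, after one warm-up run (not a rigorous benchmark). */
    public static double seconds(Callable<Integer> task) throws Exception {
        task.call();
        long start = System.nanoTime();
        task.call();
        return (System.nanoTime() - start) / 1e9;
    }

    public static void main(String[] args) throws Exception {
        byte[] doc = syntheticOval(50_000);
        double mb = doc.length / 1e6;
        double dom = seconds(() -> countWithDom(doc));
        double stax = seconds(() -> countWithStax(doc));
        System.out.printf("%.1f MB: DOM %.2f s (%.0f MB/s), StAX %.2f s (%.0f MB/s)%n",
                mb, dom, mb / dom, stax, mb / stax);
    }
}
```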

HoussemNasri · 2023-06-20T16:42:56Z

Decision: Use StAX because it provides a middle ground between the memory-hungry but easy-to-use DOM APIs and the efficient but complicated SAX API.

HoussemNasri added the question Further information is requested label Jun 7, 2023

HoussemNasri closed this as completed Jun 20, 2023

HoussemNasri mentioned this issue Jul 1, 2023: [GSoC23] - A - Implement OVAL parser #7227 (Closed)

rjmateus mentioned this issue Oct 24, 2023: [GSOC23] - A - Implement a fully functional CVE auditing feature based on OVAL data #7466 (Merged)

admd mentioned this issue Feb 20, 2024: Remove simple-xml dependecy in favor of JAXB #7855 (Draft)
