Xlsx Reader Optionally Ignore Rows With No Cells #4035
Merged
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Fix #3982. A number of issues submitted about Xlsx read performance have a common theme, namely that row 1,048,576 and a few rows before it are defined in the worksheet Xml with no cells attached to them. These might be the work of a third party product. While these extraneous rows do not cause any problems for the cells that are actually used on the worksheet, they can lead to excessive memory use. This PR provides an option for the application to ignore rows with no cells when loading.
Recent changes to the load logic had already made a significant difference to memory consumption and load time. For the spreadsheet attached to issue 3982, which had caused out-of-memory errors on the user's system, peak memory usage was already reduced to 40-odd MB. With the new option, this is drastically reduced again, to just over 9MB. Specifying the new option is very easy:
Note that there are cases where you might not want this (non-default) behavior. For example, if you set a row height on a row with no cells, the height would be lost with this option. Unfortunately, the extraneous row definitions in the problematic spreadsheets claim to have a custom height, so I can't just use "no custom row styles" as an additional filter.
This is:
Checklist:
Why this change is needed?
Provide an explanation of why this change is needed, with links to any Issues (if appropriate).
If this is a bugfix or a new feature, and there are no existing Issues, then please also create an issue that will make it easier to track progress with this PR.