Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Completely rewrite excel sheet parsing #88

Merged
merged 58 commits into from
Oct 3, 2024
Merged

Conversation

WorkingRobot
Copy link
Contributor

@WorkingRobot WorkingRobot commented Aug 12, 2024

Currently a draft; I've added documentation for all the public facing API changes. Lumina.Excel.GeneratedSheets is still TBD on this, but I'm planning on working with @lmcintyre to upstream ExdSheets changes to EXDSchema and GeneratedSheets.

Given the nature of this rewrite, this involved breaking changes and some possible drawbacks:

  • Now that Lumina uses FrozenDictionaries, I've migrated to only supporting .NET 8 (though since it's an LTS release, I hope it's not a big deal) and dropping support for 6 and 7.

Right now, I'm unsure of a few things:

  • What types/functions should be marked as a purely implementation detail and hidden from Intellisense/autocomplete? Likely candidates that I think should be hidden are ExcelPage and ExcelModule.GetSheetGeneric.
  • LazyRow was renamed to RowRef to better express its new functionality, particularly since RowRef doesn't lazily create a new row object. Following that, I'm unsure of what to name LazyCollection. It's used to reference a struct within a row and it's behavior isn't necessarily "lazy" either. It's more of a "FunctorCollection", but that would be really confusing for the end user.

@WorkingRobot
Copy link
Contributor Author

If you're looking for a hands-on example of this, I already use this in the wild with Craftimizer via the ExdSheets nuget package

@NotAdam
Copy link
Owner

NotAdam commented Aug 16, 2024

I left most of the 'internal' apis public on purpose so people could use them for more interesting things as they need/want, but obviously that's within reason - if it's purely internal and there's not a case where it would actually work to be public it probably shouldn't be, but the idea generally has been to keep implementation details public and (re)usable so that subsets or otherwise can be used ad-hoc. I get this is a non-answer, but sort of the methodology behind why things are the way they are.

RowRef change is fine to me given the change in behaviour. Does Row(Ref)Collection work? I suck at naming things though so ¯_(ツ)_/¯

Soreepeong and others added 10 commits August 16, 2024 22:36
* `IExcelRow.RowId` and `IExcelRow.SubrowId` exist to implement `ICollection<T>.Contains`.
  * `System.Collections.Generic.EnumerableHelpers.ToArray` and alike in LINQ has an optimization for `ICollection<T>`. As it requires `Contains` to be implemented, exposing `RowId` and `SubrowId` in a generic way will make it possible to implement that in O(1).
* `ExcelSheet` constructor now preallocates lookup lookup tables.
  * `.exh` file comes with information on how many rows are there, so we know the exact number of items that needs to be allocated.
  * Using an array directly bypassing list wrappers may provide an additional speed boost.
  * In case `.exh` file contains a wrong information on number of rows, which is an unlikely case, `Array.Resize` is used to reallocate the array.
* `ExcelSheet.UnsafeCreateRow/Subrow/At` has been added.
  * These functions assume that boundary checks are done by callers.
  * As enumerators always work inside the boundary, especially when the collection is immutable, `IEnumerator{T}.Current` can skip boundary checks.
* `DefaultExcelSheet<T>` and `SubrowsExcelSheet<T>` has been added.
  * As sheets are usually not meaningful without knowing what is in it in the first place, it would be better to specialize for each variants.
  * This effectively hides subrow operations from sheets of default variants.
  * This removes `HasSubrows` check from getter functions.
* Added `SubrowsExcelSheet.Try/GetRow/OrDefault` variants that returns `SubrowCollection<T>` instead.
  * This makes it convenient to iterate over subrows under one row ID.
  * This makes it faster to access multiple subrows under one row ID, as lookup operation is done on obtaining the collection. Once the collection is constructed, accessing subrows is an O(1) operation.
* `ExcelModule.GetSheet` uses static lambda in place of `SheetCache.GetOrAdd`.
  * This will avoid heap allocation if a corresponding sheet is already loaded.
* Named value tuples in `ExcelSheet` are replaced with `record struct`.
  * This reduces the size of each lookup element from 16 bytes to 12 bytes.
Most of sheets do not have large gaps across items. That fact can be
used to make a lookup array instead of lookup dictionary, which will
even further reduce the time spent translating row ID to row index.
* `MethodInfo.Invoke` throws `TargetInvocationException` if the method
  throws an exception; changed to handle that.
* Added comments for some functions.
Making `IEnumerator<T>.Current` evaluate on demand can let an invalid
value get passed to UnsafeCreate functions. Creating them on `MoveNext`
will guarantee that UnsafeCreate functions are called only from the
context where the preconditions are met.
Reformat code, documentation/enumerator correctness fixes
@NotAdam
Copy link
Owner

NotAdam commented Sep 20, 2024

thanks - looks good! do you want me to fix the conflicts or would you like to?

@WorkingRobot
Copy link
Contributor Author

I'd appreciate it if you could, ty :)

@NotAdam
Copy link
Owner

NotAdam commented Sep 20, 2024

no worries, will sort when home

@NotAdam NotAdam merged commit ee723ff into NotAdam:master Oct 3, 2024
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants