Completely rewrite excel sheet parsing #88

WorkingRobot · 2024-08-12T08:53:06Z

Currently a draft; I've added documentation for all the public-facing API changes. Lumina.Excel.GeneratedSheets is still TBD on this, but I'm planning on working with @lmcintyre to upstream ExdSheets changes to EXDSchema and GeneratedSheets.

Given the nature of this rewrite, this involved breaking changes and some possible drawbacks:

Now that Lumina uses FrozenDictionaries, I've migrated to only supporting .NET 8 (though since it's an LTS release, I hope it's not a big deal) and dropping support for 6 and 7.
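
For readers unfamiliar with the type behind the .NET 8 requirement, here is a minimal sketch of `FrozenDictionary` from `System.Collections.Frozen`. The sheet names and values are made up for illustration, and this is not how Lumina itself builds its tables.

```csharp
using System;
using System.Collections.Frozen;
using System.Collections.Generic;

// Hypothetical name-to-index table; the entries are illustrative only.
var mutable = new Dictionary<string, int>
{
    ["Item"] = 0,
    ["Action"] = 1,
};

// Freeze once after construction. The result is immutable and tuned for reads,
// which suits data that is loaded once and then only queried.
FrozenDictionary<string, int> frozen =
    mutable.ToFrozenDictionary(StringComparer.OrdinalIgnoreCase);

var found = frozen.TryGetValue("item", out var index);
Console.WriteLine($"{found} {index}");
```

`System.Collections.Frozen` ships with .NET 8, which is why the older targets had to go.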

Right now, I'm unsure of a few things:

What types/functions should be marked as a purely implementation detail and hidden from Intellisense/autocomplete? Likely candidates that I think should be hidden are ExcelPage and ExcelModule.GetSheetGeneric.
LazyRow was renamed to RowRef to better express its new functionality, particularly since RowRef doesn't lazily create a new row object. Following that, I'm unsure of what to name LazyCollection. It's used to reference a struct within a row, and its behavior isn't necessarily "lazy" either. It's more of a "FunctorCollection", but that would be really confusing for the end user.

WorkingRobot · 2024-08-12T08:55:39Z

If you're looking for a hands-on example of this, I already use this in the wild with Craftimizer via the ExdSheets NuGet package.

NotAdam · 2024-08-16T02:12:31Z

I left most of the 'internal' apis public on purpose so people could use them for more interesting things as they need/want, but obviously that's within reason - if it's purely internal and there's not a case where it would actually work to be public it probably shouldn't be, but the idea generally has been to keep implementation details public and (re)usable so that subsets or otherwise can be used ad-hoc. I get this is a non-answer, but sort of the methodology behind why things are the way they are.

RowRef change is fine to me given the change in behaviour. Does Row(Ref)Collection work? I suck at naming things though so ¯_(ツ)_/¯

* `IExcelRow.RowId` and `IExcelRow.SubrowId` exist to implement `ICollection<T>.Contains`.
  * `System.Collections.Generic.EnumerableHelpers.ToArray` and the like in LINQ have an optimization for `ICollection<T>`. As it requires `Contains` to be implemented, exposing `RowId` and `SubrowId` in a generic way makes it possible to implement that in O(1).
* The `ExcelSheet` constructor now preallocates lookup tables.
  * The `.exh` file comes with information on how many rows there are, so we know the exact number of items that need to be allocated.
  * Using an array directly, bypassing list wrappers, may provide an additional speed boost.
  * In the unlikely case that the `.exh` file contains a wrong row count, `Array.Resize` is used to reallocate the array.
* `ExcelSheet.UnsafeCreateRow/Subrow/At` have been added.
  * These functions assume that boundary checks are done by callers.
  * As enumerators always work inside the boundary, especially when the collection is immutable, `IEnumerator{T}.Current` can skip boundary checks.
* `DefaultExcelSheet<T>` and `SubrowsExcelSheet<T>` have been added.
  * As sheets are usually not meaningful without knowing what is in them in the first place, it is better to specialize for each variant.
  * This effectively hides subrow operations from sheets of the default variant.
  * This removes the `HasSubrows` check from getter functions.
* Added `SubrowsExcelSheet.Try/GetRow/OrDefault` variants that return `SubrowCollection<T>` instead.
  * This makes it convenient to iterate over subrows under one row ID.
  * This makes it faster to access multiple subrows under one row ID, as the lookup is done when obtaining the collection. Once the collection is constructed, accessing subrows is an O(1) operation.
* `ExcelModule.GetSheet` uses a static lambda in place of `SheetCache.GetOrAdd`.
  * This will avoid heap allocation if a corresponding sheet is already loaded.
* Named value tuples in `ExcelSheet` are replaced with `record struct`.
  * This reduces the size of each lookup element from 16 bytes to 12 bytes.
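
To illustrate the static-lambda point, here is a rough sketch of one way a cache lookup can avoid allocating a closure on every call. It assumes the cache behaves like a `ConcurrentDictionary`; `SheetLoader` and its members are hypothetical stand-ins, not Lumina types.

```csharp
using System;
using System.Collections.Concurrent;

var cache = new ConcurrentDictionary<string, object>();
var loader = new SheetLoader();

// A static lambda cannot capture locals, so no closure object is created per call.
// The state it needs is passed through the factoryArgument parameter instead.
var first = cache.GetOrAdd("Item", static (name, l) => l.Load(name), loader);
var second = cache.GetOrAdd("Item", static (name, l) => l.Load(name), loader);

Console.WriteLine(ReferenceEquals(first, second));
Console.WriteLine(loader.LoadCount);

// Hypothetical stand-in for whatever actually constructs a sheet.
sealed class SheetLoader
{
    public int LoadCount { get; private set; }

    public object Load(string name)
    {
        LoadCount++;
        return new object();
    }
}
```

On the second call the key is already present, so the factory is not invoked and nothing new is allocated for it.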

Making `IEnumerator<T>.Current` evaluate on demand can let an invalid value get passed to UnsafeCreate functions. Creating them on `MoveNext` will guarantee that UnsafeCreate functions are called only from the context where the preconditions are met.
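
A minimal sketch of that pattern, with the value built inside `MoveNext` after the bounds check has passed. `RowEnumerator` and `UnsafeCreate` here are hypothetical and only mimic the shape of the idea; they are not the types from this PR.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

var seen = new List<string>();
using (var rows = new RowEnumerator(new uint[] { 10, 11, 12 }))
{
    while (rows.MoveNext())
        seen.Add(rows.Current);
}
Console.WriteLine(string.Join(", ", seen));

// Hypothetical enumerator: Current only ever returns a value built in MoveNext.
sealed class RowEnumerator : IEnumerator<string>
{
    private readonly uint[] _rowIds;
    private int _index = -1;
    private string _current = "";

    public RowEnumerator(uint[] rowIds) => _rowIds = rowIds;

    public string Current => _current;
    object IEnumerator.Current => _current;

    public bool MoveNext()
    {
        var next = _index + 1;
        if (next >= _rowIds.Length)
            return false;

        _index = next;
        // The bounds check above is the precondition the unchecked helper relies on.
        _current = UnsafeCreate(_rowIds, next);
        return true;
    }

    public void Reset()
    {
        _index = -1;
        _current = "";
    }

    public void Dispose() { }

    // Stand-in for an unchecked row factory; assumes index is already validated.
    private static string UnsafeCreate(uint[] ids, int index) => $"row {ids[index]}";
}
```

Because `Current` never computes anything, reading it before the first `MoveNext` or after the last one cannot reach the unchecked helper with an out-of-range index.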

NotAdam · 2024-09-20T04:22:27Z

thanks - looks good! do you want me to fix the conflicts or would you like to?

WorkingRobot · 2024-09-20T04:23:55Z

I'd appreciate it if you could, ty :)

NotAdam · 2024-09-20T05:20:46Z

no worries, will sort when home

WorkingRobot added 3 commits August 11, 2024 22:56

Update dependencies and .net version to 8.0

faa1585

Add new rewritten excel row types

de6a8c3

Fix compile errors

66a4d96

Soreepeong and others added 2 commits August 12, 2024 22:26

Minor doc fixes

64efaf8

Allow direct indexing of rows and subrows

27bb2e2

WorkingRobot force-pushed the master branch from 4d27978 to 27bb2e2 Compare August 13, 2024 01:37

WorkingRobot added 12 commits August 13, 2024 21:43

Fix iteration bug

0fe084e

docs changes, add more properties to IExcelSheet

96dceeb

Add subrow indexer to ExcelSheet

604b7c9

Rewrite RSV resolution

f688b05

RowLookup performance change

6fb7725

Rename rsv folder

db7e8b2

Revert removal of docs file

1acb0d6

Change API to be more C# like

339aea8

Add ExcelPage documentation

271cfde

Publicize column information

a092322

Formatting changes

c9b9bb1

Refactor RSV resolution

fc7fb2e

Use a plain delegate instead and resolve by default

Soreepeong and others added 10 commits August 16, 2024 22:36

Add row index lookup array

79c90a5

Most sheets do not have large gaps between row IDs. That fact can be used to build a lookup array instead of a lookup dictionary, which further reduces the time spent translating a row ID to a row index.
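
The idea can be sketched roughly as follows. `RowIndexLookup` is a hypothetical name, the input is assumed to be sorted ascending, and a real implementation would presumably keep a dictionary fallback for sheets whose IDs are too sparse for an array.

```csharp
using System;

uint[] rowIds = { 100, 101, 103, 104 };
var lookup = new RowIndexLookup(rowIds);

var found = lookup.TryGetIndex(103, out var index);   // hit: third entry, index 2
var missing = lookup.TryGetIndex(102, out _);         // miss: 102 is a gap
Console.WriteLine($"{found} {index} {missing}");

// Hypothetical dense lookup: one array slot per ID between the smallest and largest ID.
sealed class RowIndexLookup
{
    private readonly uint _minId;
    private readonly int[] _indexById;

    // Assumes sortedRowIds is sorted ascending and the ID range fits in an int.
    public RowIndexLookup(uint[] sortedRowIds)
    {
        if (sortedRowIds.Length == 0)
        {
            _indexById = Array.Empty<int>();
            return;
        }

        _minId = sortedRowIds[0];
        var maxId = sortedRowIds[^1];
        _indexById = new int[checked((int)(maxId - _minId + 1))];
        Array.Fill(_indexById, -1); // -1 marks an ID that has no row

        for (var i = 0; i < sortedRowIds.Length; i++)
            _indexById[sortedRowIds[i] - _minId] = i;
    }

    public bool TryGetIndex(uint rowId, out int index)
    {
        // Unsigned subtraction wraps for IDs below _minId, so one comparison
        // rejects both too-small and too-large IDs.
        var offset = unchecked(rowId - _minId);
        if (offset >= (uint)_indexById.Length)
        {
            index = -1;
            return false;
        }

        index = _indexById[offset];
        return index >= 0;
    }
}
```

The trade-off is memory: the array has one slot per ID in the range, so it only pays off when gaps are small.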

Merge pull request #1 from Soreepeong/wrl2

754bf9a

Suggestions on Excel

Additional changes

105fd19

Reformat code

0956b1f

Correctness and documents

5af0b02

* `MethodInfo.Invoke` throws `TargetInvocationException` if the method throws an exception; changed to handle that.
* Added comments for some functions.
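
The commit does not say how the exception is handled, so the following is only one common approach: unwrap the inner exception and rethrow it with its original stack trace. `InvokeUnwrapped` and `Demo` are hypothetical names for this sketch.

```csharp
using System;
using System.Reflection;
using System.Runtime.ExceptionServices;

MethodInfo method = typeof(Demo).GetMethod(nameof(Demo.Fail))!;

string caught;
try
{
    InvokeUnwrapped(method, null, null);
    caught = "none";
}
catch (InvalidOperationException e)
{
    // The caller sees the original exception, not the reflection wrapper.
    caught = e.Message;
}
Console.WriteLine(caught);

// Hypothetical helper: invoke via reflection, surfacing the callee's own exception.
static object? InvokeUnwrapped(MethodInfo m, object? target, object?[]? args)
{
    try
    {
        return m.Invoke(target, args);
    }
    catch (TargetInvocationException e) when (e.InnerException is not null)
    {
        // Capture/Throw preserves the inner exception's original stack trace.
        ExceptionDispatchInfo.Capture(e.InnerException).Throw();
        throw; // Not reached; keeps the compiler's flow analysis satisfied.
    }
}

static class Demo
{
    public static void Fail() => throw new InvalidOperationException("boom");
}
```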

Extra format

2ad083e

Remove unnecessary code

2a3cee2

Merge pull request #2 from Soreepeong/wrl2

924b733

Reformat code, documentation/enumerator correctness fixes

WorkingRobot and others added 7 commits August 19, 2024 02:06

Make sheet name optional

0b3082f

Use constructors directly on Subrow/ExcelSheet, specialize exception

73a0c6f

types

Merge pull request #4 from Soreepeong/wrl2

4f57f67

Use constructors directly on Subrow/ExcelSheet, specialize exception

Formatting changes

b07aa32

Create exceptions namespace

030bfd3

Fix compiler error & exception handling

e3a39d8

Extra exception docs formatting

b70f89e

WorkingRobot force-pushed the master branch from 4e09b8f to b70f89e Compare August 19, 2024 19:58

WorkingRobot added 14 commits August 23, 2024 01:12

Additional sheet changes (ty kizer)

0a13740

Refactor SheetNames property

130f036

Fix invalid cast

78a00f0

Add rowref benchmarks

ddce871

More benchmarks

6a28cd2

Benchmark whitespace

46dbd7b

RowRef performance updates

a562e31

Fix RowRef.GetValueOrDefault behavior

4c00073

Increase generic RowRef resolution speed through interval trees

1de6d40

Add RowRefIntervalTree tests

651599d

Fix IntervalTree construction

6b55c21

Change hashing to be consistent with Lumina.Excel

2c844d9

Add custom language support to RowRefs

9b0aab3

Formatting changes

843f976

Merge branch 'master' into master

9088fe3

NotAdam approved these changes Sep 28, 2024

Fix compiler errors

2ce372a

WorkingRobot force-pushed the master branch from c3f3869 to 2ce372a Compare September 29, 2024 23:15

NotAdam merged commit ee723ff into NotAdam:master Oct 3, 2024
2 checks passed
