Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Suggestions on Excel #1

Merged
merged 2 commits into from
Aug 17, 2024
Merged

Suggestions on Excel #1

merged 2 commits into from
Aug 17, 2024

Conversation

Soreepeong
Copy link

  • IExcelRow.RowId and IExcelRow.SubrowId exist to implement ICollection<T>.Contains.
    • System.Collections.Generic.EnumerableHelpers.ToArray and alike in LINQ has an optimization for ICollection<T>. As it requires Contains to be implemented, exposing RowId and SubrowId in a generic way will make it possible to implement that in O(1).
  • ExcelSheet constructor now preallocates lookup lookup tables.
    • .exh file comes with information on how many rows are there, so we know the exact number of items that needs to be allocated.
    • Using an array directly bypassing list wrappers may provide an additional speed boost.
    • In case .exh file contains a wrong information on number of rows, which is an unlikely case, Array.Resize is used to reallocate the array.
  • ExcelSheet.UnsafeCreateRow/Subrow/At has been added.
    • These functions assume that boundary checks are done by callers.
    • As enumerators always work inside the boundary, especially when the collection is immutable, IEnumerator{T}.Current can skip boundary checks.
  • DefaultExcelSheet<T> and SubrowsExcelSheet<T> has been added.
    • As sheets are usually not meaningful without knowing what is in it in the first place, it would be better to specialize for each variants.
    • This effectively hides subrow operations from sheets of default variants.
    • This removes HasSubrows check from getter functions.
  • Added SubrowsExcelSheet.Try/GetRow/OrDefault variants that returns SubrowCollection<T> instead.
    • This makes it convenient to iterate over subrows under one row ID.
    • This makes it faster to access multiple subrows under one row ID, as lookup operation is done on obtaining the collection. Once the collection is constructed, accessing subrows is an O(1) operation.
  • ExcelModule.GetSheet uses static lambda in place of SheetCache.GetOrAdd.
    • This will avoid heap allocation if a corresponding sheet is already loaded.
  • Named value tuples in ExcelSheet are replaced with record struct.
    • This reduces the size of each lookup element from 16 bytes to 12 bytes.

Benchmark

Before After
image image

Test Code

* `IExcelRow.RowId` and `IExcelRow.SubrowId` exist to implement `ICollection<T>.Contains`.
  * `System.Collections.Generic.EnumerableHelpers.ToArray` and alike in LINQ has an optimization for `ICollection<T>`. As it requires `Contains` to be implemented, exposing `RowId` and `SubrowId` in a generic way will make it possible to implement that in O(1).
* `ExcelSheet` constructor now preallocates lookup lookup tables.
  * `.exh` file comes with information on how many rows are there, so we know the exact number of items that needs to be allocated.
  * Using an array directly bypassing list wrappers may provide an additional speed boost.
  * In case `.exh` file contains a wrong information on number of rows, which is an unlikely case, `Array.Resize` is used to reallocate the array.
* `ExcelSheet.UnsafeCreateRow/Subrow/At` has been added.
  * These functions assume that boundary checks are done by callers.
  * As enumerators always work inside the boundary, especially when the collection is immutable, `IEnumerator{T}.Current` can skip boundary checks.
* `DefaultExcelSheet<T>` and `SubrowsExcelSheet<T>` has been added.
  * As sheets are usually not meaningful without knowing what is in it in the first place, it would be better to specialize for each variants.
  * This effectively hides subrow operations from sheets of default variants.
  * This removes `HasSubrows` check from getter functions.
* Added `SubrowsExcelSheet.Try/GetRow/OrDefault` variants that returns `SubrowCollection<T>` instead.
  * This makes it convenient to iterate over subrows under one row ID.
  * This makes it faster to access multiple subrows under one row ID, as lookup operation is done on obtaining the collection. Once the collection is constructed, accessing subrows is an O(1) operation.
* `ExcelModule.GetSheet` uses static lambda in place of `SheetCache.GetOrAdd`.
  * This will avoid heap allocation if a corresponding sheet is already loaded.
* Named value tuples in `ExcelSheet` are replaced with `record struct`.
  * This reduces the size of each lookup element from 16 bytes to 12 bytes.
Most of sheets do not have large gaps across items. That fact can be
used to make a lookup array instead of lookup dictionary, which will
even further reduce the time spent translating row ID to row index.
@Soreepeong
Copy link
Author

Soreepeong commented Aug 16, 2024

Added lookup array to use instead of dictionary, if the wasted memory for lookup array items that resolve to nonexistence are not too huge.

image

Code used to evaluate wasted memory usage, to be run as a test.
    [RequiresGameInstallationFact]
    public void TestAllSheets()
    {
        System.Threading.Thread.CurrentThread.Priority = System.Threading.ThreadPriority.Highest;
        var gameData = new GameData( @"C:\Program Files (x86)\SquareEnix\FINAL FANTASY XIV - A Realm Reborn\game\sqpack" );

        var gaps = new List< uint >();
        foreach( var sheetName in gameData.Excel.SheetNames )
        {
            if( gameData.GetFile< ExcelHeaderFile >( $"exd/{sheetName}.exh" ) is not { } headerFile )
                continue;
            var lang = headerFile.Languages.Contains( Language.English ) ? Language.English : Language.None;
            switch( headerFile.Header.Variant )
            {
                case ExcelVariant.Default:
                {
                    var sheet = (DefaultExcelSheet<Addon>) ExcelSheet.From< Addon >( gameData.Excel, lang, sheetName );
                    gaps.Add( sheet.Count == 0 ? 0 : sheet.GetRowAt( sheet.Count - 1 ).RowId - sheet.GetRowAt( 0 ).RowId + 1 - (uint)sheet.Count );
                    break;
                }
                case ExcelVariant.Subrows:
                {
                    var sheet = (SubrowsExcelSheet<QuestLinkMarker>) ExcelSheet.From< QuestLinkMarker >( gameData.Excel, lang, sheetName );
                    gaps.Add( sheet.Count == 0 ? 0 : sheet.GetRowAt( sheet.Count - 1 ).RowId - sheet.GetRowAt( 0 ).RowId + 1 - (uint)sheet.Count );
                    break;
                }
            }
        }

        gaps.Sort();
        var countAcc = 0;
        var wasteAcc = 0;
        var test = string.Join(
            "\n",
            gaps
                .GroupBy( static x => x / 1024, static x => x )
                .Select( x =>
                    $"{( x.Key + 1 ) * 1024,8}, {( countAcc += x.Count() ) * 100f / gaps.Count,6:00.00}%, {( wasteAcc += x.Sum( static y => (int) y ) * 4 ) / 1024,5}KB" ) );
    }

@WorkingRobot WorkingRobot merged commit 754bf9a into WorkingRobot:master Aug 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants