ListWorksheetInfo/Names for Html/Csv/Slk #3709

Merged
merged 5 commits into from
Sep 8, 2023

@oleibman (Collaborator) commented Sep 6, 2023

Fix #3706. ListWorksheetInfo is implemented for all Readers except Html. For most (not all), ListWorksheetInfo is more efficient than reading the spreadsheet. I can't think of a way to make that so for Html, but that shouldn't be a reason to leave it unimplemented.

ListWorksheetNames is not implemented for Html, Csv, or Slk. It isn't terribly useful for those formats, but that isn't a reason to omit it. The requester's use case consists of using IOFactory to create a reader for a file of unknown format and determining the first sheet name. That seems legitimate, but it is currently not possible without extra user code if the file is Html, Csv, or Slk; this PR will make it possible.
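The use case described above might look roughly like the following sketch. It assumes a Composer install of PhpSpreadsheet with the autoloader at `vendor/autoload.php`; the helper name `firstSheetName` is hypothetical and not part of the library.

```php
<?php

// Assumption: PhpSpreadsheet was installed with Composer next to this script.
require __DIR__ . '/vendor/autoload.php';

use PhpOffice\PhpSpreadsheet\IOFactory;

/**
 * Hypothetical helper: return the first worksheet name of a file whose
 * format is not known in advance, or null if the reader reports no sheets.
 */
function firstSheetName(string $path): ?string
{
    // IOFactory chooses a reader (Xlsx, Xls, Ods, Csv, Html, Slk, ...) for the file.
    $reader = IOFactory::createReaderForFile($path);

    // With this PR the Html, Csv, and Slk readers provide this method too,
    // so no format-specific branching is needed here.
    $names = $reader->listWorksheetNames($path);

    return $names[0] ?? null;
}
```

For single-sheet formats such as Csv, the returned list should contain exactly one name.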

When Excel opens a Slk or Csv file, the sheet name is based on the file name. PhpSpreadsheet does this for Slk, but it uses a default name for Csv. I do not want to introduce a breaking change to that behavior, so I have added a new boolean property `sheetNameIsFileName`, with a setter, to the Csv Reader. The requester actually mentioned that possibility in our discussion, although it is not essential to the request.
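A minimal sketch of opting in to the Excel-like naming for Csv follows. It assumes the setter is named `setSheetNameIsFileName`, following the property name given above, and that Composer's autoloader is available; the helper `loadCsvNamedAfterFile` is hypothetical.

```php
<?php

// Assumption: PhpSpreadsheet was installed with Composer next to this script.
require __DIR__ . '/vendor/autoload.php';

use PhpOffice\PhpSpreadsheet\Reader\Csv;

/**
 * Hypothetical helper: load a Csv file and return the title of its only sheet,
 * asking the reader to derive that title from the file name.
 */
function loadCsvNamedAfterFile(string $path): string
{
    $reader = new Csv();

    // Assumed setter name for the new sheetNameIsFileName property.
    // Without this call the reader keeps its default sheet name.
    $reader->setSheetNameIsFileName(true);

    $spreadsheet = $reader->load($path);

    return $spreadsheet->getActiveSheet()->getTitle();
}
```

Because the property defaults to off, existing callers that rely on the default Csv sheet name are unaffected.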

As an adjunct to the issue, the requester wishes to use the worksheet name in `setLoadSheetsOnly`. The call is already accepted for Html, Csv, and Slk, but those readers ignore the property. I do not see a reason to change that behavior. This treatment is now explicitly noted in the documentation for property `loadSheetsOnly`.

There had been no tests for what happens when `loadSheetsOnly` is specified but no sheets match the criteria, for the formats where this makes sense (Xlsx, Xls, Ods, Gnumeric, Xml). The behavior was not consistent: some formats threw an Exception while others continued with a single empty worksheet. All of these readers attempt to set the active sheet, and they will now all throw identical Exceptions when they attempt to do so in this situation. Tests are added for each.
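Calling code could guard against the no-match case along these lines. This is only a sketch: the helper `loadNamedSheets` is hypothetical, and it catches the library's base exception class because the exact exception type is not stated here.

```php
<?php

// Assumption: PhpSpreadsheet was installed with Composer next to this script.
require __DIR__ . '/vendor/autoload.php';

use PhpOffice\PhpSpreadsheet\Exception as SpreadsheetException;
use PhpOffice\PhpSpreadsheet\IOFactory;
use PhpOffice\PhpSpreadsheet\Spreadsheet;

/**
 * Hypothetical helper: load only the named sheets, returning null when the
 * reader throws (for example because none of the names match).
 *
 * @param string[] $sheetNames
 */
function loadNamedSheets(string $path, array $sheetNames): ?Spreadsheet
{
    $reader = IOFactory::createReaderForFile($path);
    $reader->setLoadSheetsOnly($sheetNames);

    try {
        return $reader->load($path);
    } catch (SpreadsheetException $e) {
        // Base class of the library's exceptions, including reader exceptions.
        return null;
    }
}
```

For Html, Csv, and Slk the property is ignored, so this helper would simply return the single sheet regardless of the names passed.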

There also had been no tests for `loadSheetsOnly` returning more than one sheet. One is added.

This is:

  • a bugfix
  • a new feature
  • refactoring
  • additional unit tests

Checklist:

  • Changes are covered by unit tests
    • Changes are covered by existing unit tests
    • New unit tests have been added
  • Code style is respected
  • Commit message explains why the change is made (see https://github.com/erlang/otp/wiki/Writing-good-commit-messages)
  • CHANGELOG.md contains a short summary of the change and a link to the pull request if applicable
  • Documentation is updated as necessary

@oleibman (Collaborator, Author) commented Sep 6, 2023

No concern with Scrutinizer "complexity" message.

Add strict types to this new test, consistent with work being done in PR PHPOffice#3718.
@oleibman oleibman merged commit 0d1c9e4 into PHPOffice:master Sep 8, 2023
10 checks passed
Comment on lines +420 to +421
/** @var bool */
private $activeSheetSet = false;
Member


Please refrain from introducing non-native typing. Now that we require PHP 8.0 we should be able to natively type almost everything, especially properties.
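For illustration, a natively typed version of such a property might look like the sketch below. The class `ActiveSheetTracker` and its methods are hypothetical stand-ins, not the class touched by this PR.

```php
<?php

declare(strict_types=1);

// Hypothetical class used only to illustrate native property typing.
final class ActiveSheetTracker
{
    // Native type: the engine enforces bool, so no "@var bool" docblock is needed.
    private bool $activeSheetSet = false;

    public function markActiveSheetSet(): void
    {
        $this->activeSheetSet = true;
    }

    public function isActiveSheetSet(): bool
    {
        return $this->activeSheetSet;
    }
}
```

Typed properties have been available since PHP 7.4, so they fit within the PHP 8.0 minimum mentioned above.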

Collaborator Author


Will do.

@oleibman oleibman deleted the issue3706 branch November 13, 2023 15:37
Labels
None yet
Development

Successfully merging this pull request may close these issues.

Create interface for support of multiple worksheets per file
2 participants