Skip to content

Commit

Permalink
ListWorksheetInfo/Names for Html/Csv/Slk (#3709)
Browse files Browse the repository at this point in the history
* ListWorksheetInfo/Names for Html/Csv/Slk

Fix #3706. ListWorksheetInfo is implemented for all Readers except Html. For most (not all), ListWorksheetInfo is more efficient than reading the spreadsheet. I can't think of a way to make that so for Html, but that shouldn't be a reason to leave it unimplemented.

ListWorksheetNames is not implemented for Html, Csv, or Slk. It isn't terribly useful for those formats, but that isn't a reason to omit it. The requester's use case consists of using IOFactory to create a reader for a file of unknown format and determining the first sheet name. That seems legitimate, but it is currently not possible without extra user code if the file is Html, Csv, or Slk; this PR will make it possible.

When Excel opens a Slk or Csv file, the sheet name is based on the file name. PhpSpreadsheet does this for Slk, but it uses a default name for Csv. I am not interested in creating a break for that behavior, but I have added a new boolean property `sheetNameIsFileName` with a setter to Csv Reader. The requester actually mentioned that possibility in our discussion, although it is not essential to the request.

As an adjunct to the issue, the requester wishes to use the worksheet name in `setLoadSheetsOnly`. That is already possible for Html, Csv, and Slk, but that particular property is ignored for those formats. I do not see a reason to change that behavior. This treatment is now explicitly noted in the documentation for property `loadSheetsOnly`.

There had been no tests for what happens when `loadSheetsOnly` is specified but no sheets match the criteria for the formats for which this makes sense (Xlsx, Xls, Ods, Gnumeric, Xml). The behavior was not consistent - some formats threw an Exception while others continued with a single empty worksheet. All cases attempt to set the active sheet, and they will now all throw identical Exceptions when they attempt to do so in this situation. Tests are added for each.

There also had been no tests for `loadSheetsOnly` returning more than one sheet. One is added.

* Update LoadSheetsOnlyTest.php

Add strict types to this new test, consistent with work being done in PR #3718.

* Update LoadSheetsOnlyTest.php

Add strict types to this new test, consistent with work being done in PR #3718.
  • Loading branch information
oleibman authored Sep 8, 2023
1 parent bd633b1 commit 0d1c9e4
Show file tree
Hide file tree
Showing 20 changed files with 369 additions and 25 deletions.
36 changes: 36 additions & 0 deletions src/PhpSpreadsheet/Reader/BaseReader.php
Original file line number Diff line number Diff line change
Expand Up @@ -39,6 +39,7 @@ abstract class BaseReader implements IReader
/**
* Restrict which sheets should be loaded?
* This property holds an array of worksheet names to be loaded. If null, then all worksheets will be loaded.
* This property is ignored for Csv, Html, and Slk.
*
* @var null|string[]
*/
Expand Down Expand Up @@ -203,4 +204,39 @@ protected function openFile(string $filename): void

$this->fileHandle = $fileHandle;
}

/**
* Return worksheet info (Name, Last Column Letter, Last Column Index, Total Rows, Total Columns).
*
* @param string $filename
*
* @return array
*/
public function listWorksheetInfo($filename)
{
throw new PhpSpreadsheetException('Reader classes must implement their own listWorksheetInfo() method');
}

/**
* Returns names of the worksheets from a file,
* possibly without parsing the whole file to a Spreadsheet object.
* Readers will often have a more efficient method with which
* they can override this method.
*
* @param string $filename
*
* @return array
*/
public function listWorksheetNames($filename)
{
$returnArray = [];
$info = $this->listWorksheetInfo($filename);
foreach ($info as $infoArray) {
if (isset($infoArray['worksheetName'])) {
$returnArray[] = $infoArray['worksheetName'];
}
}

return $returnArray;
}
}
20 changes: 19 additions & 1 deletion src/PhpSpreadsheet/Reader/Csv.php
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,7 @@
use PhpOffice\PhpSpreadsheet\Shared\StringHelper;
use PhpOffice\PhpSpreadsheet\Spreadsheet;
use PhpOffice\PhpSpreadsheet\Style\NumberFormat;
use PhpOffice\PhpSpreadsheet\Worksheet\Worksheet;

class Csv extends BaseReader
{
Expand Down Expand Up @@ -106,6 +107,9 @@ class Csv extends BaseReader
/** @var bool */
private $preserveNullString = false;

/** @var bool */
private $sheetNameIsFileName = false;

/**
* Create a new CSV Reader instance.
*/
Expand Down Expand Up @@ -220,8 +224,12 @@ protected function inferSeparator(): void

/**
* Return worksheet info (Name, Last Column Letter, Last Column Index, Total Rows, Total Columns).
*
* @param string $filename
*
* @return array
*/
public function listWorksheetInfo(string $filename): array
public function listWorksheetInfo($filename)
{
// Open file
$this->openFileOrMemory($filename);
Expand Down Expand Up @@ -381,6 +389,9 @@ private function loadStringOrFile(string $filename, Spreadsheet $spreadsheet, bo
$spreadsheet->createSheet();
}
$sheet = $spreadsheet->setActiveSheetIndex($this->sheetIndex);
if ($this->sheetNameIsFileName) {
$sheet->setTitle(substr(basename($filename, '.csv'), 0, Worksheet::SHEET_TITLE_MAXIMUM_LENGTH));
}

// Set our starting row based on whether we're in contiguous mode or not
$currentRow = 1;
Expand Down Expand Up @@ -643,4 +654,11 @@ public function getPreserveNullString(): bool
{
return $this->preserveNullString;
}

public function setSheetNameIsFileName(bool $sheetNameIsFileName): self
{
$this->sheetNameIsFileName = $sheetNameIsFileName;

return $this;
}
}
25 changes: 25 additions & 0 deletions src/PhpSpreadsheet/Reader/Html.php
Original file line number Diff line number Diff line change
Expand Up @@ -1183,4 +1183,29 @@ private function setBorderStyle(Style $cellStyle, $styleValue, $type): void
],
]);
}

/**
* Return worksheet info (Name, Last Column Letter, Last Column Index, Total Rows, Total Columns).
*
* @param string $filename
*
* @return array
*/
public function listWorksheetInfo($filename)
{
$info = [];
$spreadsheet = new Spreadsheet();
$this->loadIntoExisting($filename, $spreadsheet);
foreach ($spreadsheet->getAllSheets() as $sheet) {
$newEntry = ['worksheetName' => $sheet->getTitle()];
$newEntry['lastColumnLetter'] = $sheet->getHighestDataColumn();
$newEntry['lastColumnIndex'] = Coordinate::columnIndexFromString($sheet->getHighestDataColumn()) - 1;
$newEntry['totalRows'] = $sheet->getHighestDataRow();
$newEntry['totalColumns'] = $newEntry['lastColumnIndex'] + 1;
$info[] = $newEntry;
}
$spreadsheet->disconnectWorksheets();

return $info;
}
}
5 changes: 2 additions & 3 deletions src/PhpSpreadsheet/Reader/Ods.php
Original file line number Diff line number Diff line change
Expand Up @@ -240,6 +240,7 @@ protected function loadSpreadsheetFromFile(string $filename): Spreadsheet
{
// Create new Spreadsheet
$spreadsheet = new Spreadsheet();
$spreadsheet->removeSheetByIndex(0);

// Load into this instance
return $this->loadIntoExisting($filename, $spreadsheet);
Expand Down Expand Up @@ -345,9 +346,7 @@ public function loadIntoExisting($filename, Spreadsheet $spreadsheet)
$worksheetStyleName = $worksheetDataSet->getAttributeNS($tableNs, 'style-name');

// Create sheet
if ($worksheetID > 0) {
$spreadsheet->createSheet(); // First sheet is added by default
}
$spreadsheet->createSheet();
$spreadsheet->setActiveSheetIndex($worksheetID);

if ($worksheetName || is_numeric($worksheetName)) {
Expand Down
2 changes: 1 addition & 1 deletion src/PhpSpreadsheet/Reader/Slk.php
Original file line number Diff line number Diff line change
Expand Up @@ -168,7 +168,7 @@ public function listWorksheetInfo($filename)

break;
case 'Y':
$rowIndex = substr($rowDatum, 1);
$rowIndex = (int) substr($rowDatum, 1);

break;
}
Expand Down
8 changes: 8 additions & 0 deletions src/PhpSpreadsheet/Reader/Xls.php
Original file line number Diff line number Diff line change
Expand Up @@ -417,6 +417,9 @@ class Xls extends BaseReader
*/
private $baseCell;

/** @var bool */
private $activeSheetSet = false;

/**
* Create a new Xls Reader instance.
*/
Expand Down Expand Up @@ -829,6 +832,7 @@ protected function loadSpreadsheetFromFile(string $filename): Spreadsheet
}

// Parse the individual sheets
$this->activeSheetSet = false;
foreach ($this->sheets as $sheet) {
if ($sheet['sheetType'] != 0x00) {
// 0x00: Worksheet, 0x02: Chart, 0x06: Visual Basic module
Expand Down Expand Up @@ -1240,6 +1244,9 @@ protected function loadSpreadsheetFromFile(string $filename): Spreadsheet
}
}
}
if ($this->activeSheetSet === false) {
$this->spreadsheet->setActiveSheetIndex(0);
}

// add the named ranges (defined names)
foreach ($this->definedname as $definedName) {
Expand Down Expand Up @@ -4401,6 +4408,7 @@ private function readWindow2(): void
$isActive = (bool) ((0x0400 & $options) >> 10);
if ($isActive) {
$this->spreadsheet->setActiveSheetIndex($this->spreadsheet->getIndex($this->phpSheet));
$this->activeSheetSet = true;
}

// bit: 11; mask: 0x0800; 0 = normal view, 1 = page break view
Expand Down
3 changes: 0 additions & 3 deletions src/PhpSpreadsheet/Reader/Xlsx/WorkbookView.php
Original file line number Diff line number Diff line change
Expand Up @@ -22,9 +22,6 @@ public function __construct(Spreadsheet $spreadsheet)
*/
public function viewSettings(SimpleXMLElement $xmlWorkbook, $mainNS, array $mapSheetId, bool $readDataOnly): void
{
if ($this->spreadsheet->getSheetCount() == 0) {
$this->spreadsheet->createSheet();
}
// Default active sheet index to the first loaded worksheet from the file
$this->spreadsheet->setActiveSheetIndex(0);

Expand Down
5 changes: 0 additions & 5 deletions tests/PhpSpreadsheetTests/Reader/BaseNoLoad.php
Original file line number Diff line number Diff line change
Expand Up @@ -10,9 +10,4 @@ public function canRead(string $filename): bool
{
return $filename !== '';
}

public function loadxxx(string $filename): void
{
$this->loadSpreadsheetFromFile($filename);
}
}
10 changes: 9 additions & 1 deletion tests/PhpSpreadsheetTests/Reader/BaseNoLoadTest.php
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,14 @@ public function testBaseNoLoad(): void
$this->expectException(SpreadsheetException::class);
$this->expectExceptionMessage('Reader classes must implement their own loadSpreadsheetFromFile() method');
$reader = new BaseNoLoad();
$reader->loadxxx('unknown.file');
$reader->load('unknown.file');
}

public function testBaseNoLoadInfo(): void
{
$this->expectException(SpreadsheetException::class);
$this->expectExceptionMessage('Reader classes must implement their own listWorksheetInfo() method');
$reader = new BaseNoLoad();
$reader->listWorksheetInfo('unknown.file');
}
}
8 changes: 8 additions & 0 deletions tests/PhpSpreadsheetTests/Reader/Csv/CsvCallbackTest.php
Original file line number Diff line number Diff line change
Expand Up @@ -30,12 +30,14 @@ public function testCallbackDoNothing(): void
$spreadsheet = $reader->load($filename);
$sheet = $spreadsheet->getActiveSheet();
self::assertEquals('Å', $sheet->getCell('A1')->getValue());
$spreadsheet->disconnectWorksheets();
}

public function callbackSetFallbackEncoding(Csv $reader): void
{
$reader->setFallbackEncoding('ISO-8859-2');
$reader->setInputEncoding(Csv::GUESS_ENCODING);
$reader->setSheetNameIsFileName(true);
$reader->setEscapeCharacter('');
}

Expand All @@ -48,6 +50,7 @@ public function testFallbackEncodingDefltIso2(): void
$sheet = $spreadsheet->getActiveSheet();
self::assertEquals('premičre', $sheet->getCell('A1')->getValue());
self::assertEquals('sixičme', $sheet->getCell('C2')->getValue());
$spreadsheet->disconnectWorksheets();
}

public function testIOFactory(): void
Expand All @@ -58,6 +61,7 @@ public function testIOFactory(): void
$sheet = $spreadsheet->getActiveSheet();
self::assertEquals('premičre', $sheet->getCell('A1')->getValue());
self::assertEquals('sixičme', $sheet->getCell('C2')->getValue());
$spreadsheet->disconnectWorksheets();
}

public function testNonFallbackEncoding(): void
Expand All @@ -69,6 +73,7 @@ public function testNonFallbackEncoding(): void
$sheet = $spreadsheet->getActiveSheet();
self::assertEquals('première', $sheet->getCell('A1')->getValue());
self::assertEquals('sixième', $sheet->getCell('C2')->getValue());
$spreadsheet->disconnectWorksheets();
}

public function testDefaultEscape(): void
Expand All @@ -79,6 +84,7 @@ public function testDefaultEscape(): void
$sheet = $spreadsheet->getActiveSheet();
// this is not how Excel views the file
self::assertEquals('a\"hello', $sheet->getCell('A1')->getValue());
$spreadsheet->disconnectWorksheets();
}

public function testBetterEscape(): void
Expand All @@ -89,5 +95,7 @@ public function testBetterEscape(): void
$sheet = $spreadsheet->getActiveSheet();
// this is how Excel views the file
self::assertEquals('a\"hello;hello;hello;\"', $sheet->getCell('A1')->getValue());
self::assertSame('escape', $sheet->getTitle(), 'callback set sheet title to use file name rather than default');
$spreadsheet->disconnectWorksheets();
}
}
15 changes: 10 additions & 5 deletions tests/PhpSpreadsheetTests/Reader/Csv/CsvEncodingTest.php
Original file line number Diff line number Diff line change
Expand Up @@ -33,11 +33,13 @@ public function testWorkSheetInfo($filename, $encoding): void
$reader = new Csv();
$reader->setInputEncoding($encoding);
$info = $reader->listWorksheetInfo($filename);
self::assertEquals('Worksheet', $info[0]['worksheetName']);
self::assertEquals('B', $info[0]['lastColumnLetter']);
self::assertEquals(1, $info[0]['lastColumnIndex']);
self::assertEquals(2, $info[0]['totalRows']);
self::assertEquals(2, $info[0]['totalColumns']);
self::assertCount(1, $info);
self::assertSame('Worksheet', $info[0]['worksheetName']);
self::assertSame('B', $info[0]['lastColumnLetter']);
self::assertSame(1, $info[0]['lastColumnIndex']);
self::assertSame(2, $info[0]['totalRows']);
self::assertSame(2, $info[0]['totalColumns']);
self::assertSame(['Worksheet'], $reader->listWorksheetNames($filename));
}

public static function providerEncodings(): array
Expand Down Expand Up @@ -78,6 +80,9 @@ public function testSurrogate(): void
$filename = 'tests/data/Reader/CSV/premiere.utf16le.csv';
$reader = new Csv();
$reader->setInputEncoding(Csv::guessEncoding($filename));
$names = $reader->listWorksheetNames($filename);
// Following ignored, just make sure it's executable.
$reader->setLoadSheetsOnly([$names[0]]);
$spreadsheet = $reader->load($filename);
$sheet = $spreadsheet->getActiveSheet();
self::assertEquals('𐐀', $sheet->getCell('A3')->getValue());
Expand Down
11 changes: 11 additions & 0 deletions tests/PhpSpreadsheetTests/Reader/Gnumeric/GnumericLoadTest.php
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@
namespace PhpOffice\PhpSpreadsheetTests\Reader\Gnumeric;

use DateTimeZone;
use PhpOffice\PhpSpreadsheet\Exception as PhpSpreadsheetException;
use PhpOffice\PhpSpreadsheet\Reader\Exception as ReaderException;
use PhpOffice\PhpSpreadsheet\Reader\Gnumeric;
use PhpOffice\PhpSpreadsheet\Shared\Date;
Expand Down Expand Up @@ -170,6 +171,16 @@ public function testLoadSelectedSheets(): void
$spreadsheet->disconnectWorksheets();
}

public function testLoadNoSelectedSheets(): void
{
$this->expectException(PhpSpreadsheetException::class);
$this->expectExceptionMessage('You tried to set a sheet active by the out of bounds index');
$filename = 'samples/templates/GnumericTest.gnumeric';
$reader = new Gnumeric();
$reader->setLoadSheetsOnly(['Unknown Sheet', 'xReport Data']);
$reader->load($filename);
}

public function testLoadNotGnumeric(): void
{
$this->expectException(ReaderException::class);
Expand Down
26 changes: 26 additions & 0 deletions tests/PhpSpreadsheetTests/Reader/Html/Issue2942Test.php
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,7 @@ public function testLoadFromString(): void
$spreadsheet = $reader->loadFromString($content);
$sheet = $spreadsheet->getActiveSheet();
self::assertSame('éàâèî', $sheet->getCell('A1')->getValue());
$spreadsheet->disconnectWorksheets();
}

public function testLoadFromFile(): void
Expand All @@ -32,5 +33,30 @@ public function testLoadFromFile(): void
self::assertSame('അആ', $sheet->getCell('B3')->getValue());
self::assertSame('กขฃ', $sheet->getCell('C3')->getValue());
self::assertSame('✀✐✠', $sheet->getCell('D3')->getValue());
$spreadsheet->disconnectWorksheets();
}

public function testInfo(): void
{
$file = 'tests/data/Reader/HTML/utf8chars.charset.html';
$reader = new Html();
$info = $reader->listWorksheetInfo($file);
self::assertCount(1, $info);
$info0 = $info[0];
self::assertSame('Test Utf-8 characters voilà', $info0['worksheetName']);
self::assertSame('D', $info0['lastColumnLetter']);
self::assertSame(3, $info0['lastColumnIndex']);
self::assertSame(7, $info0['totalRows']);
self::assertSame(4, $info0['totalColumns']);
$names = $reader->listWorksheetNames($file);
self::assertCount(1, $names);
self::assertSame('Test Utf-8 characters voilà', $names[0]);

// Following ignored, just make sure it's executable.
$reader->setLoadSheetsOnly([$names[0]]);
$spreadsheet = $reader->load($file);
$sheet = $spreadsheet->getActiveSheet();
self::assertSame('✀✐✠', $sheet->getCell('D3')->getValue());
$spreadsheet->disconnectWorksheets();
}
}
Loading

0 comments on commit 0d1c9e4

Please sign in to comment.