HyperlinkBase Property, and Html Handling of Properties (#3589)
* HyperlinkBase Property, and Html Handling of Properties

Fix #3573. The original issue concerned a lack of support for Document Properties in Xml spreadsheets. Most of the properties mentioned there were, in fact, already supported, but the investigation revealed some gaps in Html coverage.

HyperlinkBase is the one property mentioned in the issue that was not supported for Xml, nor indeed for any other format. All the other document properties are 'meta', but HyperlinkBase is functional: if you supply a relative address for a link, Excel will use HyperlinkBase, when it is set, to convert that address to an absolute one. (The default is the directory where the spreadsheet is located.) Here's a summary of how this PR handles the property for the various formats:
- Support is added for Xlsx read and write.
- Support is added for Xml read (there is no Xml writer). Ironically, Excel messes up this processing when reading an Xml spreadsheet; however, PhpSpreadsheet will get it right.
- Xls supports HyperlinkBase, but I have no idea how to read or write this property. For now, when writing hyperlinked cells, PhpSpreadsheet will instead convert any relative address it can detect to an absolute reference by prefixing it with HyperlinkBase. In a similar vein, Xls supports custom properties, but PhpSpreadsheet does not know how to read or write those either.
- Gnumeric has no equivalent property, so nothing needs to be done to its reader. Since there is no Gnumeric writer, there is nothing to change on the writing side either.
- Ods has no equivalent property, so nothing needs to be done to its reader. The Ods writer does not have any special logic for hyperlinks, so, at least for now, it will remain unchanged.
- Csv has no equivalent property, so nothing needs to be done to its reader. The Csv writer does not have any special logic for hyperlinks, so, at least for now, it will remain unchanged.
- Html allows for an equivalent `base` tag in the head section. Support for this is added to Html reader and writer.
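To make the resolution rule described above concrete, here is a minimal standalone PHP sketch under a simplified model: scheme-qualified URLs, drive-letter paths and rooted paths are left alone, and anything else is joined to HyperlinkBase or, when that is empty, to the spreadsheet's directory. The function `resolveLink()` and its parameters are hypothetical (not PhpSpreadsheet API), and Excel's real rules may differ in detail.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not PhpSpreadsheet API): resolve a link target as described
 * above. $hyperlinkBase plays the role of the HyperlinkBase property; when it is
 * empty, the directory containing the spreadsheet is used instead.
 */
function resolveLink(string $url, string $hyperlinkBase, string $spreadsheetPath): string
{
    // "scheme:" prefixes (http:, mailto:, ...), drive letters (C:) and rooted paths
    // (/... or \...) are treated as already absolute and returned unchanged.
    if (preg_match('~^[A-Za-z][A-Za-z0-9+.-]*:|^[/\\\\]~', $url) === 1) {
        return $url;
    }
    $base = $hyperlinkBase !== '' ? $hyperlinkBase : dirname($spreadsheetPath);

    // Treat the base as a directory and join the two parts with exactly one separator.
    return rtrim($base, '/\\') . '/' . $url;
}

echo resolveLink('report.pdf', 'https://example.com/docs/', '/home/u/book.xlsx'), PHP_EOL;
// https://example.com/docs/report.pdf
echo resolveLink('data.csv', '', '/home/u/book.xlsx'), PHP_EOL;
// /home/u/data.csv
echo resolveLink('mailto:a@example.com', 'https://example.com/', '/home/u/book.xlsx'), PHP_EOL;
// mailto:a@example.com
```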

Html Writer was only handling 8 of the 11 'core' properties. Support is added for `created`, `modified`, and `lastModifiedBy`. Custom properties were not supported at all, and now are.

Html Reader did not support any properties. It will now support all of them.

* Scrutinizer

Remove one dead reference.
oleibman authored Jun 3, 2023
1 parent 3aab263 commit a0a9b2b
Showing 9 changed files with 445 additions and 10 deletions.
src/PhpSpreadsheet/Document/Properties.php: 14 changes (14 additions, 0 deletions)

@@ -107,6 +107,8 @@ class Properties
      */
     private $customProperties = [];
 
+    private string $hyperlinkBase = '';
+
     /**
      * Create a new Document Properties instance.
      */
@@ -534,4 +536,16 @@ public static function convertPropertyType(string $propertyType): string
     {
         return self::PROPERTY_TYPE_ARRAY[$propertyType] ?? self::PROPERTY_TYPE_UNKNOWN;
     }
+
+    public function getHyperlinkBase(): string
+    {
+        return $this->hyperlinkBase;
+    }
+
+    public function setHyperlinkBase(string $hyperlinkBase): self
+    {
+        $this->hyperlinkBase = $hyperlinkBase;
+
+        return $this;
+    }
 }

src/PhpSpreadsheet/Reader/Html.php: 89 changes (88 additions, 1 deletion)

@@ -8,6 +8,7 @@
 use DOMText;
 use PhpOffice\PhpSpreadsheet\Cell\Coordinate;
 use PhpOffice\PhpSpreadsheet\Cell\DataType;
+use PhpOffice\PhpSpreadsheet\Document\Properties;
 use PhpOffice\PhpSpreadsheet\Helper\Dimension as CssDimension;
 use PhpOffice\PhpSpreadsheet\Reader\Security\XmlScanner;
 use PhpOffice\PhpSpreadsheet\Spreadsheet;
@@ -685,10 +686,94 @@ public function loadIntoExisting($filename, Spreadsheet $spreadsheet)
         if ($loaded === false) {
             throw new Exception('Failed to load ' . $filename . ' as a DOM Document', 0, $e ?? null);
         }
+        self::loadProperties($dom, $spreadsheet);
 
         return $this->loadDocument($dom, $spreadsheet);
     }
 
+    private static function loadProperties(DOMDocument $dom, Spreadsheet $spreadsheet): void
+    {
+        $properties = $spreadsheet->getProperties();
+        foreach ($dom->getElementsByTagName('meta') as $meta) {
+            $metaContent = (string) $meta->getAttribute('content');
+            if ($metaContent !== '') {
+                $metaName = (string) $meta->getAttribute('name');
+                switch ($metaName) {
+                    case 'author':
+                        $properties->setCreator($metaContent);
+
+                        break;
+                    case 'category':
+                        $properties->setCategory($metaContent);
+
+                        break;
+                    case 'company':
+                        $properties->setCompany($metaContent);
+
+                        break;
+                    case 'created':
+                        $properties->setCreated($metaContent);
+
+                        break;
+                    case 'description':
+                        $properties->setDescription($metaContent);
+
+                        break;
+                    case 'keywords':
+                        $properties->setKeywords($metaContent);
+
+                        break;
+                    case 'lastModifiedBy':
+                        $properties->setLastModifiedBy($metaContent);
+
+                        break;
+                    case 'manager':
+                        $properties->setManager($metaContent);
+
+                        break;
+                    case 'modified':
+                        $properties->setModified($metaContent);
+
+                        break;
+                    case 'subject':
+                        $properties->setSubject($metaContent);
+
+                        break;
+                    case 'title':
+                        $properties->setTitle($metaContent);
+
+                        break;
+                    default:
+                        if (preg_match('/^custom[.](bool|date|float|int|string)[.](.+)$/', $metaName, $matches) === 1) {
+                            switch ($matches[1]) {
+                                case 'bool':
+                                    $properties->setCustomProperty($matches[2], (bool) $metaContent, Properties::PROPERTY_TYPE_BOOLEAN);
+
+                                    break;
+                                case 'float':
+                                    $properties->setCustomProperty($matches[2], (float) $metaContent, Properties::PROPERTY_TYPE_FLOAT);
+
+                                    break;
+                                case 'int':
+                                    $properties->setCustomProperty($matches[2], (int) $metaContent, Properties::PROPERTY_TYPE_INTEGER);
+
+                                    break;
+                                case 'date':
+                                    $properties->setCustomProperty($matches[2], $metaContent, Properties::PROPERTY_TYPE_DATE);
+
+                                    break;
+                                default: // string
+                                    $properties->setCustomProperty($matches[2], $metaContent, Properties::PROPERTY_TYPE_STRING);
+                            }
+                        }
+                }
+            }
+        }
+        if (!empty($dom->baseURI)) {
+            $properties->setHyperlinkBase($dom->baseURI);
+        }
+    }
+
     private static function replaceNonAscii(array $matches): string
     {
         return '&#' . mb_ord($matches[0], 'UTF-8') . ';';
@@ -719,8 +804,10 @@ public function loadFromString($content, ?Spreadsheet $spreadsheet = null): Spre
         if ($loaded === false) {
             throw new Exception('Failed to load content as a DOM Document', 0, $e ?? null);
         }
+        $spreadsheet = $spreadsheet ?? new Spreadsheet();
+        self::loadProperties($dom, $spreadsheet);
 
-        return $this->loadDocument($dom, $spreadsheet ?? new Spreadsheet());
+        return $this->loadDocument($dom, $spreadsheet);
     }
 
     /**

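The custom-property branch of `loadProperties()` relies on a naming convention, `custom.<type>.<name>`, that the Html writer in this commit also uses. The standalone sketch below isolates that parsing step so it can be tried without a DOM. `parseCustomMeta()` and its `[name, value, type]` return shape are hypothetical; the real reader passes `Properties::PROPERTY_TYPE_*` constants to `setCustomProperty()` instead of returning anything. It uses `match`, so it assumes PHP 8.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical stand-alone version of the custom-property branch of the Html
 * reader's loadProperties(): parse "custom.<type>.<name>" and cast the content.
 * Returns [name, value, type], or null when the meta name is not a custom property.
 */
function parseCustomMeta(string $metaName, string $metaContent): ?array
{
    if (preg_match('/^custom[.](bool|date|float|int|string)[.](.+)$/', $metaName, $matches) !== 1) {
        return null;
    }
    [, $type, $name] = $matches;
    $value = match ($type) {
        'bool' => (bool) $metaContent, // the Html writer emits '1' or '0'
        'float' => (float) $metaContent,
        'int' => (int) $metaContent,
        default => $metaContent, // 'date' and 'string' are passed through as text
    };

    return [$name, $value, $type];
}

$metaTags = [
    'author' => 'Someone',
    'custom.int.Revision' => '42',
    'custom.bool.Approved' => '0',
    'custom.string.a.b' => 'x',
];
foreach ($metaTags as $name => $content) {
    $parsed = parseCustomMeta($name, $content);
    if ($parsed !== null) {
        echo $parsed[0], ' => ', var_export($parsed[1], true), PHP_EOL;
    }
}
// Revision => 42
// Approved => false
// a.b => 'x'
```

Because `(.+)` is greedy, everything after the second dot becomes the property name, so names that themselves contain dots survive the round trip.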
src/PhpSpreadsheet/Reader/Xlsx/Properties.php: 3 changes (3 additions, 0 deletions)

@@ -73,6 +73,9 @@ public function readExtendedProperties(string $propertyData): void
             if (isset($xmlCore->Manager)) {
                 $this->docProps->setManager((string) $xmlCore->Manager);
             }
+            if (isset($xmlCore->HyperlinkBase)) {
+                $this->docProps->setHyperlinkBase((string) $xmlCore->HyperlinkBase);
+            }
         }
     }
 

src/PhpSpreadsheet/Reader/Xml/Properties.php: 19 changes (11 additions, 8 deletions)

@@ -92,6 +92,10 @@ protected function processStandardProperty(
             case 'Manager':
                 $docProps->setManager($stringValue);
 
                 break;
+            case 'HyperlinkBase':
+                $docProps->setHyperlinkBase($stringValue);
+
+                break;
             case 'Keywords':
                 $docProps->setKeywords($stringValue);
@@ -110,17 +114,10 @@ protected function processCustomProperty(
         ?SimpleXMLElement $propertyValue,
         SimpleXMLElement $propertyAttributes
     ): void {
-        $propertyType = DocumentProperties::PROPERTY_TYPE_UNKNOWN;
-
         switch ((string) $propertyAttributes) {
-            case 'string':
-                $propertyType = DocumentProperties::PROPERTY_TYPE_STRING;
-                $propertyValue = trim((string) $propertyValue);
-
-                break;
             case 'boolean':
                 $propertyType = DocumentProperties::PROPERTY_TYPE_BOOLEAN;
-                $propertyValue = (bool) $propertyValue;
+                $propertyValue = (bool) (string) $propertyValue;
 
                 break;
             case 'integer':
@@ -134,9 +131,15 @@ protected function processCustomProperty(
 
                 break;
             case 'dateTime.tz':
+            case 'dateTime.iso8601tz':
                 $propertyType = DocumentProperties::PROPERTY_TYPE_DATE;
                 $propertyValue = trim((string) $propertyValue);
 
                 break;
+            default:
+                $propertyType = DocumentProperties::PROPERTY_TYPE_STRING;
+                $propertyValue = trim((string) $propertyValue);
+
+                break;
         }
 

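Two behavioural changes hide in this hunk: unrecognised `dt:dt` types now fall back to string instead of "unknown", and booleans are cast through `(string)` first. The second matters because casting a SimpleXMLElement to bool only yields false for an empty element without attributes, so a value element containing `0` used to read as true. The sketch below models the visible branches on plain strings. `convertXmlCustomProperty()` and its type labels are hypothetical, and the integer/float branches, which are collapsed in the diff, are intentionally left out (in this sketch they would fall through to the string default).

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical sketch (not PhpSpreadsheet API) of the visible branches of
 * processCustomProperty(): map a SpreadsheetML dt:dt type name and the element's
 * text to [value, typeLabel]. Labels are placeholders for the real constants.
 */
function convertXmlCustomProperty(string $dtType, string $text): array
{
    switch ($dtType) {
        case 'boolean':
            // Cast the text, not an element object: (bool) '0' is false.
            return [(bool) $text, 'boolean'];
        case 'dateTime.tz':
        case 'dateTime.iso8601tz':
            return [trim($text), 'date'];
        default:
            // 'string' and any type this sketch does not model end up as strings.
            return [trim($text), 'string'];
    }
}

var_dump(convertXmlCustomProperty('boolean', '0')[0]);
// bool(false)
var_dump(convertXmlCustomProperty('dateTime.iso8601tz', ' 2023-06-03T12:00:00Z ')[0]);
// string(20) "2023-06-03T12:00:00Z"
```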
src/PhpSpreadsheet/Writer/Html.php: 42 changes (41 additions, 1 deletion)

@@ -7,9 +7,11 @@
 use PhpOffice\PhpSpreadsheet\Cell\Cell;
 use PhpOffice\PhpSpreadsheet\Cell\Coordinate;
 use PhpOffice\PhpSpreadsheet\Chart\Chart;
+use PhpOffice\PhpSpreadsheet\Document\Properties;
 use PhpOffice\PhpSpreadsheet\RichText\RichText;
 use PhpOffice\PhpSpreadsheet\RichText\Run;
 use PhpOffice\PhpSpreadsheet\Settings;
+use PhpOffice\PhpSpreadsheet\Shared\Date;
 use PhpOffice\PhpSpreadsheet\Shared\Drawing as SharedDrawing;
 use PhpOffice\PhpSpreadsheet\Shared\File;
 use PhpOffice\PhpSpreadsheet\Shared\Font as SharedFont;
@@ -342,13 +344,21 @@ public function writeAllSheets()
 
     private static function generateMeta(?string $val, string $desc): string
     {
-        return $val
+        return ($val || $val === '0')
             ? (' <meta name="' . $desc . '" content="' . htmlspecialchars($val, Settings::htmlEntityFlags()) . '" />' . PHP_EOL)
             : '';
     }
 
     public const BODY_LINE = ' <body>' . PHP_EOL;
 
+    private const CUSTOM_TO_META = [
+        Properties::PROPERTY_TYPE_BOOLEAN => 'bool',
+        Properties::PROPERTY_TYPE_DATE => 'date',
+        Properties::PROPERTY_TYPE_FLOAT => 'float',
+        Properties::PROPERTY_TYPE_INTEGER => 'int',
+        Properties::PROPERTY_TYPE_STRING => 'string',
+    ];
+
     /**
      * Generate HTML header.
      *
@@ -374,6 +384,36 @@ public function generateHTMLHeader($includeStyles = false)
         $html .= self::generateMeta($properties->getCategory(), 'category');
         $html .= self::generateMeta($properties->getCompany(), 'company');
         $html .= self::generateMeta($properties->getManager(), 'manager');
+        $html .= self::generateMeta($properties->getLastModifiedBy(), 'lastModifiedBy');
+        $date = Date::dateTimeFromTimestamp((string) $properties->getCreated());
+        $date->setTimeZone(Date::getDefaultOrLocalTimeZone());
+        $html .= self::generateMeta($date->format(DATE_W3C), 'created');
+        $date = Date::dateTimeFromTimestamp((string) $properties->getModified());
+        $date->setTimeZone(Date::getDefaultOrLocalTimeZone());
+        $html .= self::generateMeta($date->format(DATE_W3C), 'modified');
+
+        $customProperties = $properties->getCustomProperties();
+        foreach ($customProperties as $customProperty) {
+            $propertyValue = $properties->getCustomPropertyValue($customProperty);
+            $propertyType = $properties->getCustomPropertyType($customProperty);
+            $propertyQualifier = self::CUSTOM_TO_META[$propertyType] ?? null;
+            if ($propertyQualifier !== null) {
+                if ($propertyType === Properties::PROPERTY_TYPE_BOOLEAN) {
+                    $propertyValue = $propertyValue ? '1' : '0';
+                } elseif ($propertyType === Properties::PROPERTY_TYPE_DATE) {
+                    $date = Date::dateTimeFromTimestamp((string) $propertyValue);
+                    $date->setTimeZone(Date::getDefaultOrLocalTimeZone());
+                    $propertyValue = $date->format(DATE_W3C);
+                } else {
+                    $propertyValue = (string) $propertyValue;
+                }
+                $html .= self::generateMeta($propertyValue, "custom.$propertyQualifier.$customProperty");
+            }
+        }
+
+        if (!empty($properties->getHyperlinkBase())) {
+            $html .= ' <base href="' . $properties->getHyperlinkBase() . '" />' . PHP_EOL;
+        }
+
         $html .= $includeStyles ? $this->generateStyles(true) : $this->generatePageDeclarations(true);
 

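The `($val || $val === '0')` change matters because `'0'` is the only non-empty string PHP treats as false, and the new code writes boolean custom properties as `'1'` or `'0'`. Without it, every false boolean (and any value that stringifies to `0`) would be silently dropped. The standalone sketch below reproduces that interaction. `metaTag()`, `customMetaTag()`, the type labels, and the use of UTC for dates are assumptions for illustration; the committed code works with `Properties::PROPERTY_TYPE_*` constants and converts dates to the default or local time zone. Unlike the committed `generateMeta()`, the sketch also escapes the `name` attribute.

```php
<?php

declare(strict_types=1);

/** Hypothetical type label => qualifier map, mirroring CUSTOM_TO_META above. */
const CUSTOM_QUALIFIERS = [
    'boolean' => 'bool',
    'date' => 'date',
    'float' => 'float',
    'integer' => 'int',
    'string' => 'string',
];

/** Emit a <meta> tag unless the value is null or ''; same test as ($val || $val === '0'). */
function metaTag(?string $val, string $name): string
{
    if ($val === null || $val === '') {
        return '';
    }

    return '<meta name="' . htmlspecialchars($name, ENT_QUOTES) . '" content="'
        . htmlspecialchars($val, ENT_QUOTES) . '" />' . PHP_EOL;
}

/** @param mixed $value bool, int, float, string, or a Unix timestamp for 'date' */
function customMetaTag(string $name, $value, string $type): string
{
    $qualifier = CUSTOM_QUALIFIERS[$type] ?? null;
    if ($qualifier === null) {
        return ''; // types without a qualifier are skipped, as in the writer
    }
    if ($type === 'boolean') {
        $value = $value ? '1' : '0';
    } elseif ($type === 'date') {
        $value = gmdate(DATE_W3C, (int) $value); // assumption: timestamp rendered in UTC
    } else {
        $value = (string) $value;
    }

    return metaTag($value, "custom.$qualifier.$name");
}

echo customMetaTag('Approved', false, 'boolean');
// <meta name="custom.bool.Approved" content="0" />
echo customMetaTag('Revision', 42, 'integer');
// <meta name="custom.int.Revision" content="42" />
```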
src/PhpSpreadsheet/Writer/Xls/Worksheet.php: 7 changes (7 additions, 0 deletions)

@@ -503,6 +503,8 @@ public function close(): void
         $this->writeMergedCells();
 
         // Hyperlinks
+        $phpParent = $phpSheet->getParent();
+        $hyperlinkbase = ($phpParent === null) ? '' : $phpParent->getProperties()->getHyperlinkBase();
         foreach ($phpSheet->getHyperLinkCollection() as $coordinate => $hyperlink) {
             [$column, $row] = Coordinate::indexesFromString($coordinate);
 
@@ -513,6 +515,11 @@ public function close(): void
                 $url = str_replace('sheet://', 'internal:', $url);
             } elseif (preg_match('/^(http:|https:|ftp:|mailto:)/', $url)) {
                 // URL
+            } elseif (!empty($hyperlinkbase) && preg_match('~^([A-Za-z]:)?[/\\\\]~', $url) !== 1) {
+                $url = "$hyperlinkbase$url";
+                if (preg_match('/^(http:|https:|ftp:|mailto:)/', $url) !== 1) {
+                    $url = 'external:' . $url;
+                }
             } else {
                 // external (local file)
                 $url = 'external:' . $url;

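As a rough model of the new Xls branch, the sketch below pulls the URL rewriting out of `close()` into a pure function so its cases are easy to try. `classifyXlsHyperlink()` is hypothetical, and the `sheet://` prefix test is an assumption because that condition is collapsed in the diff; the remaining branches follow the visible code.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical pure-function model of the hyperlink rewriting in the Xls writer's
 * close() method. Returns the rewritten URL.
 */
function classifyXlsHyperlink(string $url, string $hyperlinkBase): string
{
    if (strpos($url, 'sheet://') === 0) {
        // Assumed condition: the real test is collapsed in the diff.
        return str_replace('sheet://', 'internal:', $url);
    }
    if (preg_match('/^(http:|https:|ftp:|mailto:)/', $url) === 1) {
        return $url; // already a web or mail URL
    }
    // Relative address: no "drive letter + separator" and no leading / or \.
    if (!empty($hyperlinkBase) && preg_match('~^([A-Za-z]:)?[/\\\\]~', $url) !== 1) {
        $url = "$hyperlinkBase$url"; // plain concatenation, exactly as in the diff

        return preg_match('/^(http:|https:|ftp:|mailto:)/', $url) === 1 ? $url : 'external:' . $url;
    }

    return 'external:' . $url; // local file
}

echo classifyXlsHyperlink('report.pdf', 'https://example.com/docs/'), PHP_EOL;
// https://example.com/docs/report.pdf
echo classifyXlsHyperlink('report.pdf', 'C:\\docs\\'), PHP_EOL;
// external:C:\docs\report.pdf
```

Because the committed code joins the two strings with `"$hyperlinkbase$url"`, HyperlinkBase needs its trailing separator: a base of `https://example.com/docs` would produce `https://example.com/docsreport.pdf`. The sketch keeps that behaviour rather than guessing at a fix.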
src/PhpSpreadsheet/Writer/Xlsx/DocProps.php: 3 changes (3 additions, 0 deletions)

@@ -93,6 +93,9 @@ public function writeDocPropsApp(Spreadsheet $spreadsheet)
         // SharedDoc
         $objWriter->writeElement('SharedDoc', 'false');
 
+        // HyperlinkBase
+        $objWriter->writeElement('HyperlinkBase', $spreadsheet->getProperties()->getHyperlinkBase());
+
         // HyperlinksChanged
         $objWriter->writeElement('HyperlinksChanged', 'false');
 