[BUGFIX] Handle if some tags are upper-case and strict-comparison
The method getTagContent in class HtmlContentExtractor should also
work for tags that are not lower-case.

Resolves: #3940
thomashohn authored and dkd-friedrich committed Feb 23, 2024
1 parent 9cc8ff0 commit 6254379
Showing 1 changed file with 2 additions and 1 deletion.
3 changes: 2 additions & 1 deletion Classes/HtmlContentExtractor.php
@@ -224,7 +224,8 @@ public function getTagContent(): array
         foreach ($matches[1] as $key => $tag) {
             // We don't want to index links auto-generated by the url filter.
             $pattern = '@(?:http://|https://|ftp://|mailto:|smb://|afp://|file://|gopher://|news://|ssl://|sslv2://|sslv3://|tls://|tcp://|udp://|www\.)[a-zA-Z0-9]+@';
-            if ($tag != 'a' || !preg_match($pattern, $matches[2][$key])) {
+            $tag = strtolower((string)$tag);
+            if ($tag !== 'a' || !preg_match($pattern, $matches[2][$key])) {
                 $fieldName = $this->tagToFieldMapping[$tag];
                 $hasContentForFieldName = empty($result[$fieldName]);
                 $separator = ($hasContentForFieldName) ? '' : ' ';
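
To illustrate why the tag is normalized before the comparison, here is a minimal standalone sketch of the loop above. It is not the class's actual code: the function name collectTagContent, the TAG_TO_FIELD mapping with its field names, and the shortened URL pattern are all assumptions made for this example. Without the strtolower() call, an upper-case 'A' would not equal 'a', so the URL filter check would be skipped and the mapping lookup would hit a missing key.

```php
<?php
declare(strict_types=1);

// Hypothetical mapping for illustration only; the real mapping lives in
// HtmlContentExtractor::$tagToFieldMapping and may use different names.
const TAG_TO_FIELD = ['h1' => 'tagsH1', 'a' => 'tagsA'];

/**
 * Simplified stand-in for the loop in getTagContent().
 * $tags and $contents play the role of $matches[1] and $matches[2].
 */
function collectTagContent(array $tags, array $contents): array
{
    // Shortened version of the URL pattern from the diff (assumption).
    $pattern = '@(?:http://|https://|www\.)[a-zA-Z0-9]+@';
    $result = [];
    foreach ($tags as $key => $tag) {
        // Normalize first so 'A' and 'a' are treated the same way.
        $tag = strtolower((string)$tag);
        // Skip anchors whose content looks like an auto-generated URL.
        if ($tag !== 'a' || !preg_match($pattern, $contents[$key])) {
            $fieldName = TAG_TO_FIELD[$tag];
            $separator = empty($result[$fieldName]) ? '' : ' ';
            $result[$fieldName] = ($result[$fieldName] ?? '') . $separator . trim($contents[$key]);
        }
    }
    return $result;
}

$result = collectTagContent(
    ['H1', 'A', 'a'],
    ['Title', 'https://example.com', 'Read more']
);
```

In this sketch the upper-case H1 is mapped to the same field as h1, the upper-case A containing a URL is filtered out, and the plain-text anchor is kept.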
