Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Xls::loadSpreadsheetFromFile: trim(): Passing null to parameter #1 ($string) of type string is deprecated #3935

Closed
1 task done
rx80 opened this issue Mar 6, 2024 · 11 comments · Fixed by #3942
Closed
1 task done

Comments

@rx80
Copy link

rx80 commented Mar 6, 2024

This is:

- [x] a bug report
- [x] a feature request
- [ ] **not** a usage question (ask them on https://stackoverflow.com/questions/tagged/phpspreadsheet or https://gitter.im/PHPOffice/PhpSpreadsheet)

What is the expected behavior?

No error, loading should be allowed so data can still be extracted.

What is the current behavior?

PHP error: trim(): Passing null to parameter #1 ($string) of type string is deprecated

What are the steps to reproduce?

I have received a xls file which i can not share, and trying to remove confidential data and then saving corrects the error.
The error occurs here:

$sheetName = trim($explodes[0], "'");

The contents of $this->definedname are:

[
  0 => [
    'isBuiltInName' => 1,
    'name' => '
',
    'formula' => '',
    'scope' => 1,
  ],
  1 => [
    'isBuiltInName' => 0,
    'name' => '_xlfn.IFERROR',
    'formula' => '#NAME?',
    'scope' => 0,
  ],
  2 => [
    'isBuiltInName' => 1,
    'name' => '',
    'formula' => '',
    'scope' => 1,
  ],
]

because 'forumla' is empty, the call to Worksheet::extractSheetTitle returns [null, null].
Simply skipping over the offending entry allows the xls to load and display contents:

if ($explodes[0] === null) continue;

However, i don't know if this can have other unintended consequences.

What features do you think are causing the issue

  • Reader

Does an issue affect all spreadsheet file formats? If not, which formats are affected?

Xls

Which versions of PhpSpreadsheet and PHP are affected?

This is in PhpSpreadsheet main branch, PHP 8.3.3

@oleibman
Copy link
Collaborator

oleibman commented Mar 7, 2024

It is difficult to make a fix without a test case. But let's see if we can get more information. The following code in Reader/Xlsx function readDefinedName catches my eye.

            try {
                $formula = $this->getFormulaFromStructure($formulaStructure);
            } catch (PhpSpreadsheetException) {
                $formula = '';
            }

Can you add some debugging code to see if we catch an error here (that seems like one reasonable way for formula to get its null-string value)? And, if you can, can you also capture the values of $formulaStructure (probably converted via bin2hex), $isBuiltInName, $string['value'], and $scope? My sense is that it seems logical to set $isBuiltInName to 0 (it presumably starts life as 32) when we set $formula to null-string, thereby possibly avoiding the later exception, and maybe that is sufficient to fix your problem, so you may as well try that too.

@rx80
Copy link
Author

rx80 commented Mar 7, 2024

@oleibman Of course, here are the values you requested:

[
    [exception] => 'Unrecognized token 3D in formula'
    [bin2hex($recordData)] => '210000010b000000010000000000000d3d0000dd000000dd000000'
    [bin2hex($formulaStructure)] => '0b003d0000dd000000dd000000'
    [$isBuiltInName] => 1
    [bin2hex($string["value"])] => '0d'
    [$string["size"]] => 2
    [$scope] => 1,
]

... edited to add full $recordData

@oleibman
Copy link
Collaborator

oleibman commented Mar 8, 2024

Okay, it looks like my theory was right. Now, if you add $isBuiltInName = 0; to the catch, are you able to run without error? If so, that is probably sufficient. But let's see if we can figure out what the missing token represents. If you just load (with the additional statement in the catch) and save the spreadsheet to a different file name, can you compare the defined names in the original and the "copy" and see which ones are missing? Perhaps this will enable us to come up with a more useful solution (and a test case).

@rx80
Copy link
Author

rx80 commented Mar 8, 2024

Changing the section of the code to

  try {
      $formula = $this->getFormulaFromStructure($formulaStructure);
  } catch (PhpSpreadsheetException) {
      $formula = '';
      $isBuiltInName = 0;
  }

Gets rid of the deprecation warning, as you suggested.

With that change i did:

$reader = IOFactory::createReader(IOFactory::identify($file));
$xls = $reader->load($file);
$writer = new \PhpOffice\PhpSpreadsheet\Writer\Xls($xls);
$writer->save($file . '.new.xls');

To dump the defined names on reload, i added in Reader\Xls on line 1100:

var_dump($this->definedname);

Which now dumps an empty array: array(0) { }

@oleibman
Copy link
Collaborator

oleibman commented Mar 9, 2024

When you open up the original Xls spreadsheet in Excel, what defined names exist (Formulas ribbon, Name Manager)? (If you use an app other than Excel, I imagine there's still a way to get that information.) I assume that when you open the copy, you'll see that there are no defined names, but please confirm that as well.

@rx80
Copy link
Author

rx80 commented Mar 9, 2024

I use LibreOffice, but i imagine the original was created with Excel.
Opening it in LibreOffice i see 2 items in "Range names":

  • Excel_BuiltIn_Print_Area
  • Excel_BuiltIn__FilterDatabase

Here is a screenshot from LibreOffice's DevTools: https://imgur.com/a/van323Z
There are loads of properties on those objects, so i don't know which ones could be helpful, but it is no problem for me to look at more.

@oleibman
Copy link
Collaborator

oleibman commented Mar 9, 2024

Thank you for the information. Disappointingly, I was able to create an Xls spreadsheet which, when opened with LibreOffice, shows Excel_Builtin_FilterDatabase and Excel_Builtin_Print_Area, but PhpSpreadsheet has no problem reading that file. So I will go ahead with the fix you have tested. I would prefer to have a test file, but ... Expect a PR in a day or two.

oleibman added a commit to oleibman/PhpSpreadsheet that referenced this issue Mar 9, 2024
Fix PHPOffice#3935. Xls Reader cannot parse user's spreadsheet, failing on a token of 3d. I believe that, according to https://msopenspecs.azureedge.net/files/MS-XLS/%5bMS-XLS%5d.pdf, this represents a PtgAreaErr3d, i.e. an invalid reference. User cannot provide spreadsheet, but was quite forthcoming in providing additional debugging information. I am usually reluctant to make changes without a test case, however, in this case, the action being taken (treat "builtin" defined name as "not builtin" when it cannot be parsed) makes sense, and it satisfies the user's processing.
@rx80
Copy link
Author

rx80 commented Mar 10, 2024

I will try and get a buggy xls file with redacted cells, but at the moment i do not have one. Opening and saving the buggy file always removes the problems. Maybe i need to get my hands on the exact Excel version that produced that thing.

If you want me to show any other info from the LibreOffice DevTools in that file, let me know. There's tons of other properties and objects, but i see no "Export" to export the whole Named Ranges section.

@oleibman
Copy link
Collaborator

Thanks, I will keep the ticket open for about a week before installing. Better with a test file, but I'll make sure it goes in regardless.

@rx80
Copy link
Author

rx80 commented Mar 12, 2024

I tested with #3942
Can confirm it fixes the issue.
If you want, you can close this issue.
I will work on providing you with a test xls. What is the proper way to submit a test xls, in case i get it after you close this issue?
Do i just open a new issue and reference this issue?

@oleibman
Copy link
Collaborator

Thank you for the confirmation. Yes, opening a new issue referencing this one would be fine.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging a pull request may close this issue.

2 participants