Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cells with the quote prefix style are treated as quote prefixed in PHPSpreadsheet but as formuals in Excel, causing #VALUE issues when referenced #3495

Closed
1 of 8 tasks
fdjohnston opened this issue Mar 30, 2023 · 17 comments · Fixed by #3497
Labels
calculation engine reader/xlsx Reader for MS OfficeOpenXML-format (xlsx) spreadsheet files style writer/xlsx Writer for MS OfficeOpenXML-format (xlsx) spreadsheet files

Comments

@fdjohnston
Copy link
Contributor

This is:

- [x] a bug report
- [ ] a feature request
- [ ] **not** a usage question (ask them on https://stackoverflow.com/questions/tagged/phpspreadsheet or https://gitter.im/PHPOffice/PhpSpreadsheet)

I suspect this is related to #3335 which I raised previously.

What is the expected behavior?

A cell that contains a formula that is not quote prefixed, but have a quote prefixed style applied to them should be treated as a formulae by the formula engine.

What is the current behavior?

The cell is treated as being quote prefixed by the formula engine, resulting in #VALUE errors when the cell is reference in another formula.

What are the steps to reproduce?

Please provide a Minimal, Complete, and Verifiable example of code that exhibits the issue without relying on an external Excel file or a web server:

<?php

require __DIR__ . '/vendor/autoload.php';

$spreadsheet = new Spreadsheet();
$sheet1 = $spreadsheet->getActiveSheet();
$sheet1->getStyle('A1:B1')->setQuotePrefix(true);
$sheet1->getCell('A1')->setValueExplicit(17,  \PhpOffice\PhpSpreadsheet\Cell\DataType::TYPE_NUMERIC);
$sheet1->getCell('B1')->setValueExplicit('=A1+10',  \PhpOffice\PhpSpreadsheet\Cell\DataType::TYPE_FORMULA);
$sheet1->getCell('B2')->setValue('=B1+5');
var_dump($sheet1->getCell('B2')->getCalculatedValue());
die();

I know this seems like a contrived example, but Excel treats this differently.

I've attached an Excel file that illustrates how this situation can arise:
Test.xlsx

In this file, I created a quote prefixed value in A4. I then used the Format Painter tool to apply the format to A1:B1. I then selected A1:B1 and set their formats to "Number".
If Test.xlsx is opened using PHPSpreadsheet and the value of B2 is calculated, we get #VALUE rather than 32.

What features do you think are causing the issue

  • Reader
  • Writer
  • Styles
  • Data Validations
  • Formula Calculations
  • Charts
  • AutoFilter
  • Form Elements

The formula calculation engine relies on the style of the cell to determine if it is quote prefixed or not:

//  Quote-Prefixed cell values cannot be formulae, but are treated as strings
if ($cell !== null && $ignoreQuotePrefix === false && $cell->getStyle()->getQuotePrefix() === true) {
	return self::wrapResult((string) $formula);
}

However, it seems possible to assign a quote prefixed style to a cell, but still have Excel treat it as a number/formula.

I can make the error go away by editing the first line of the try in calculateCellValue (Calculation.php) to use the $ignoreQuotePrefix param that was added in #3336:

try {
	$result = self::unwrapResult($this->_calculateFormulaValue($cell->getValue(), $cell->getCoordinate(), $cell, true));
	if ($this->spreadsheet === null) {
		throw new Exception('null spreadsheet in calculateCellValue');
	}

However, I'm not sure that's the right fix as it may cause other issues with actual quote prefixed values.

I suspect the actual fix will have to look at the contents of the cell to determine if it is, in fact, quote prefixed, despite what the associated style is telling us.

Does an issue affect all spreadsheet file formats? If not, which formats are affected?

XLSX

Which versions of PhpSpreadsheet and PHP are affected?

1.28 & master

@MarkBaker MarkBaker added reader/xlsx Reader for MS OfficeOpenXML-format (xlsx) spreadsheet files writer/xlsx Writer for MS OfficeOpenXML-format (xlsx) spreadsheet files calculation engine style labels Mar 30, 2023
@MarkBaker
Copy link
Member

This feels like a complete contradiction. If a cell is quote prefixed, then it should be treated as a value and never as a formula. If I manually quote prefix B2, then it's behaviour is as expected. If I look at your sample in Excel, I can't see any indication to suggest that cell B2 should be quote prefixed; only if I look in the file itself, when it has an additional attribute applyNumberFormat that is ignored by PhpSpreadsheet.
So the rule should be that the value is treated as a formula unless it is set with quotePrefix true; but only if applyNumberFormat is false, when it is still treated as a formula.

We don't currently read/store the applyNumberFormat style attribute: to implement it, we'd need to modify the style classes to store that attribute value (defaulting to false); modify the Reader to store it when loading a file; modify the Writers to save it; and modify the Calculation Engine code to check whether it was set (alongside the existing check for quotePrefix).
We'd also need to check how this worked with Xls, Ods and Gnumeric formats as well; to see if there is an equivalent property or attribute.

That's not going to be a quick solution, and won't be high priority for what seems very much an extreme edge case.

@fdjohnston
Copy link
Contributor Author

Completely agree this makes no sense. There is absolutely no visual indication in Excel that this situation exists. It took me a few hours to get to the bottom of what's going on.

I wasn't aware of the applyNumberFormat attribute; I figured this was another case of Excel doing something "magic" (similar to the weird currency and percentage issues we fixed previously), seeing there wasn't actually a quote in the cell and treating it as a formula.

Out of curiosity, are you aware of the reason that PHPSpreadsheet ignores the applyNumberFormat attribute? It appears in the DocumentFormat.OpenXml.Spreadsheet/CellFormat spec along with quotePrefix, so presumably other spreadsheet software that implements that spec should support it? Not trying to be critical, just curious if there is some context or history there.

This is definitely an edge case. I'm working with some very old files that started their lives back in the late 90s/early 2000s as XLS files and have been worked on by many different people over the years. I can't imagine all the odd formatting they have been subjected to in that time.

I can fix the issue on my end by adjusting the formatting in the file, but I thought this was worth capturing since it is an inconsistency with how Excel works.

Since I think it's an interesting challenge I may put some time into a PR if you're open to it.

@MarkBaker
Copy link
Member

Completely agree this makes no sense. There is absolutely no visual indication in Excel that this situation exists. It took me a few hours to get to the bottom of what's going on.

I wasn't aware of the applyNumberFormat attribute; I figured this was another case of Excel doing something "magic" (similar to the weird currency and percentage issues we fixed previously), seeing there wasn't actually a quote in the cell and treating it as a formula.

At least the currency/percentages was documented behaviour that we simply hadn't previously implemented; but I can't find any documentation that explains this behaviour with applyNumberFormat and quotePrefix.
Even MS don't seem to know about it; I found a few references in MS support/help where the support engineers themselves couldn't really explain.

Out of curiosity, are you aware of the reason that PHPSpreadsheet ignores the applyNumberFormat attribute? It appears in the DocumentFormat.OpenXml.Spreadsheet/CellFormat spec along with quotePrefix, so presumably other spreadsheet software that implements that spec should support it? Not trying to be critical, just curious if there is some context or history there.

It ignores it because I never wrote it: this is the first case where I've ever found it used; I've never seen a file with the attribute value before. Priority when writing PhpSpreadsheet was always given to functionality/features that a significant number of end-user developers actually used, encountered or wanted in real spreadsheets; that could easily be replicated; that added value; and that were achievable without a partial implementation crashing the library. When you work a full 40-hr week, then a further 50+ hours on PhpOffice, you quickly learn to prioritise.

Something like Pivot Tables has the demand and value, but is very difficult to implement, and breaks the library if not fully implemented, so a medium priority. I'd love to work on the new DataTypes for Excel, but high complexity and so far no demand, so lowest priority.

I wouldn't guarantee that other libraries do support it; I've just taken a quick look at OpenPyXl and can't find it there (although OpenPyXl was originally based on PhpExcel); nor in Apache POI. But it is at least partly supported in other Office GUIs like OpenOffice.

This is definitely an edge case. I'm working with some very old files that started their lives back in the late 90s/early 2000s as XLS files and have been worked on by many different people over the years. I can't imagine all the odd formatting they have been subjected to in that time.

I can fix the issue on my end by adjusting the formatting in the file, but I thought this was worth capturing since it is an inconsistency with how Excel works.

Since I think it's an interesting challenge I may put some time into a PR if you're open to it.

If you like; though I'd recommend several PRs for the different aspects (storing against the Style and Xlsx Reader as one PR, then changes to the Calc Engine, and a third PR for the Xlsx Writer). Any further PRs for other file formats (OASIS, Gnumeric, SpreadsheetML) will require reading specs and experimentation to see if it's supported in those formats.

@fdjohnston
Copy link
Contributor Author

At least the currency/percentages was documented behaviour that we simply hadn't previously implemented; but I can't find any documentation that explains this behaviour with applyNumberFormat and quotePrefix. Even MS don't seem to know about it; I found a few references in MS support/help where the support engineers themselves couldn't really explain.

Agreed. This is a very odd one, and I can't find any documentation on how that attribute is supposed to behave.

It ignores it because I never wrote it: this is the first case where I've ever found it used; I've never seen a file with the attribute value before. Priority when writing PhpSpreadsheet was always given to functionality/features that a significant number of end-user developers actually used, encountered or wanted in real spreadsheets; that could easily be replicated; that added value; and that were achievable without a partial implementation crashing the library. When you work a full 40-hr week, then a further 50+ hours on PhpOffice, you quickly learn to prioritise.

Completely understand. I was curious if there was some technical reason why (e.g. extremely complex to implement, etc), but totally understand the need to prioritize and focus on high demand/high return features.

I wouldn't guarantee that other libraries do support it; I've just taken a quick look at OpenPyXl and can't find it there (although OpenPyXl was originally based on PhpExcel); nor in Apache POI. But it is at least partly supported in other Office GUIs like OpenOffice.

I figured if it was part of the spec, other software like Apache or Libre Office would include it... but perhaps their engineers have done a better job than the Excel folks and haven't created this odd situation.

If you like; though I'd recommend several PRs for the different aspects (storing against the Style and Xlsx Reader as one PR, then changes to the Calc Engine, and a third PR for the Xlsx Writer). Any further PRs for other file formats (OASIS, Gnumeric,

Sounds good. I'm probably missing something, but I've made some good progress on this already and am just writing some tests. Should be able to push something out in a day or so.

SpreadsheetML) will require reading specs and experimentation to see if it's supported in those formats.

Interestingly, quotePrefix doesn't appear in the readers or writers for the other formats. I haven't done any looking into their specs, but I was expecting to see it somewhere other than in XLSX. If quotePrefix is confined to the XLSX reader and writer, perhaps it is reasonable to assume that we can do the same for applyNumberFormat?

@MarkBaker
Copy link
Member

I figured if it was part of the spec, other software like Apache or Libre Office would include it... but perhaps their engineers have done a better job than the Excel folks and haven't created this odd situation.

Apache POI reads/writes all attributes as "generic", and provides getters/setters; but using it would be "caveat emptor"; it isn't used anywhere in their code logic, so it's entirely dependent on the end-user developer understanding how to use it.
I'll need to do tests with Open/LibreOffice to see how their calculation engine handles it.

Interestingly, quotePrefix doesn't appear in the readers or writers for the other formats. I haven't done any looking into their specs, but I was expecting to see it somewhere other than in XLSX. If quotePrefix is confined to the XLSX reader and writer, perhaps it is reasonable to assume that we can do the same for applyNumberFormat?

I don't think I ever checked if it was used by other Readers/Writers; in most cases we implement for Xlsx first, then we might add it for other formats later. Selected Worksheet and Cells were only added for the Ods Reader/Writer last year, even though they've always been there since day 1 for Xlsx/Xlsx.

@oleibman
Copy link
Collaborator

I'm coming late to the party, but I don't think applyNumberFormat is the cause here. I created the attached spreadsheet by entering a quote-prefixed value in B1. Then I overwrote it with a non-quote-prefixed formula. The style assigned to the cell still specifies quotePrefix=1 but it omits applyNumberFormat. Nevertheless the cell is evaluated as a formula when Excel opens it.
issue.3495b.xlsx

@fdjohnston
Copy link
Contributor Author

Great catch @oleibman. So in that case, could the solution be as simple as:

//  Quote-Prefixed cell values cannot be formulae, but are treated as strings
if ($cell !== null && $ignoreQuotePrefix === false && substr($formula, 0, 1) === "'") {
    return self::wrapResult((string) $formula);
}

Maybe Excel is ignoring the quotePrefix attribute altogether and looking for an actual quote in the cell, similar to how it looks in strings for $ and %?

@MarkBaker
Copy link
Member

Maybe Excel is ignoring the quotePrefix attribute altogether and looking for an actual quote in the cell, similar to how it looks in strings for $ and %?

No, the quote prefix isn't part of the cell value, but a style setting that tells Excel that the value should be treated as a string and not as a formula; that much can be verified, and PhpSpreadsheet's check is correct... as far as it goes. We need to identify what the exception is to that rule.

I'm trying to remember the case I'd identified where $ignoreQuotePrefix was required; it was something to do with Defined Names, which did require an evaluation of the formula even if the cell was quote prefixed... ah! Issue #3355, it was a "workround" flag.

@fdjohnston
Copy link
Contributor Author

I think you mean #3335

In that case it was named cells being referenced on other sheets, though I suspect the root cause is likely the same as this case.
The workaround flag was added because the named cell didn't have a style, so the formula engine was falling back on the style for A1 which has previously been quotedPrefixed and then cleared out.

No, the quote prefix isn't part of the cell value, but a style setting that tells Excel that the value should be treated as a string and not as a formula; that much can be verified, and PhpSpreadsheet's check is correct... as far as it goes. We need to identify what the exception is to that rule.

You're right. After adding a quote prefixed value to @oleibman 's example we get:
issue.3495c.xlsx

<sheetData>
	<row r="1" spans="1:2" x14ac:dyDescent="0.25">
		<c r="A1">
			<v>1</v>
		</c>
		<c r="B1" s="1">
			<f>2+3</f>
			<v>5</v>
		</c>
	</row>
	<row r="2" spans="1:2" x14ac:dyDescent="0.25">
		<c r="A2" s="1" t="s">
			<v>0</v>
		</c>
	</row>
</sheetData>

It looks like A2, the cell with the quote prefixed string in it, has a type of string. B1 and A2 share the same style, which is quote prefixed.
Could it be the presence of the string type? Or the lack of it on B1 and the existence of a formula?

@MarkBaker
Copy link
Member

It looks like A2, the cell with the quote prefixed string in it, has a type of string. B1 and A2 share the same style, which is quote prefixed.
Could it be the presence of the string type? Or the lack of it on B1 and the existence of a formula?

I don't think so; I still think that it's a style over-ride: the other style difference that I see in your file is numFmtId="2" in the style settings for B1 in your file

@oleibman
Copy link
Collaborator

I'm guessing, but I think it might be reasonable in Reader/Xlsx.php to set the cell's quotePrefix style to false inside castToFormula. In the cases where quotePrefix should be applied, I think there will not be a f tag, so no invocation of castToFormula. But in the cases that we're looking at, there is an f tag, so it will be invoked. If it works, I don't think this will break anything in any meaningful way.

@fdjohnston
Copy link
Contributor Author

I don't think so; I still think that it's a style over-ride: the other style difference that I see in your file is numFmtId="2" in the style settings for B1 in your file

I can see that in my original file, Test.xlsx, but in my most recent file, issue.3495c.xlsx, based on @oleibman's much simpler file, all I can find is numFmtId="0" everywhere.

Interestingly there is an entry in calcChain.xml that references B1. I don't know what the purpose of that file is, though.

I'm guessing, but I think it might be reasonable in Reader/Xlsx.php to set the cell's quotePrefix style to false inside castToFormula. In the cases where quotePrefix should be applied, I think there will not be a f tag, so no invocation of castToFormula. But in the cases that we're looking at, there is an f tag, so it will be invoked. If it works, I don't think this will break anything in any meaningful way.

Makes sense to me. Other than the entry in calcChain.xml, the presence of the f tag for B1 appears to be the only difference.

@MarkBaker
Copy link
Member

Interestingly there is an entry in calcChain.xml that references B1. I don't know what the purpose of that file is, though.

We don't load the calcChain because it's a big memory drain, and doesn't really add value to PhpSpreadsheet. It's a grid of calculation dependencies, that allows Excel to see that if a change is made to the value in cell X1, then it needs to update the calculated values for cells Y1 and X2, which in turn require updates to calculated cells Z1 and X3, etc.

Excel doesn't need it either, because it can always recreate it on opening a file

oleibman added a commit to oleibman/PhpSpreadsheet that referenced this issue Apr 1, 2023
Fix PHPOffice#3495. This seems to be a bug in Excel, one which it manages to cover up but PhpSpreadsheet is affected. User enters a formula preceded by an apostrophe into a cell. Excel turns on `quotePrefix` style and stores the data as a string rather than a formula. User now enters a formula not preceded by an apostrophe into the same cell. Excel stores it is a formula but does not turn `quotePrefix` off. When the spreadsheet is saved, the cell's style specifies `quotePrefix`, but the cell's content indicates it's a formula. Till now, PhpSpreadsheet sees that quotePrefix is set, and therefore treats the cell's contents as a string rather than a formula. This PR will change that behavior so that quotePrefix is automatically turned off when Xlsx Reader sees that the cell indicates that it is a formula.
@oleibman
Copy link
Collaborator

oleibman commented Apr 1, 2023

Please test with 3497 if possible.

@fdjohnston
Copy link
Contributor Author

Please test with 3497 if possible.

Will do today. Thanks for this! I'll let you know how it goes.

@fdjohnston
Copy link
Contributor Author

Please test with 3497 if possible.

Just tested your PR and it fixes the issue in both my original (very old) spreadsheet and in my sample file. I agree with your remarks on in the PR that this does appear to be a bug in Excel.
Glad you joined the conversation. It seems we were going down a bit of a rabbit hole. Thanks so much for taking the time to look at this.

@oleibman
Copy link
Collaborator

oleibman commented Apr 3, 2023

Thank you for confirming. I will merge this in a day or two.

oleibman added a commit that referenced this issue Apr 4, 2023
Fix #3495. This seems to be a bug in Excel, one which it manages to cover up but PhpSpreadsheet is affected. User enters a formula preceded by an apostrophe into a cell. Excel turns on `quotePrefix` style and stores the data as a string rather than a formula. User now enters a formula not preceded by an apostrophe into the same cell. Excel stores it is a formula but does not turn `quotePrefix` off. When the spreadsheet is saved, the cell's style specifies `quotePrefix`, but the cell's content indicates it's a formula. Till now, PhpSpreadsheet sees that quotePrefix is set, and therefore treats the cell's contents as a string rather than a formula. This PR will change that behavior so that quotePrefix is automatically turned off when Xlsx Reader sees that the cell indicates that it is a formula.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
calculation engine reader/xlsx Reader for MS OfficeOpenXML-format (xlsx) spreadsheet files style writer/xlsx Writer for MS OfficeOpenXML-format (xlsx) spreadsheet files
Development

Successfully merging a pull request may close this issue.

3 participants