Cells with the quote prefix style are treated as quote prefixed in PHPSpreadsheet but as formulas in Excel, causing #VALUE issues when referenced #3495
This feels like a complete contradiction. If a cell is quote-prefixed, then it should be treated as a value and never as a formula. If I manually quote-prefix B2, then its behaviour is as expected. If I look at your sample in Excel, I can't see any indication that cell B2 should be quote-prefixed; I only see it when I look in the file itself, where it has an additional attribute … We don't currently read/store the … That's not going to be a quick solution, and it won't be a high priority for what seems very much an extreme edge case.
Completely agree this makes no sense. There is absolutely no visual indication in Excel that this situation exists. It took me a few hours to get to the bottom of what's going on. I wasn't aware of the … Out of curiosity, are you aware of the reason that PHPSpreadsheet ignores the …? This is definitely an edge case. I'm working with some very old files that started their lives back in the late 90s/early 2000s as XLS files and have been worked on by many different people over the years. I can't imagine all the odd formatting they have been subjected to in that time. I can fix the issue on my end by adjusting the formatting in the file, but I thought this was worth capturing, since it is an inconsistency with how Excel works. Since I think it's an interesting challenge, I may put some time into a PR if you're open to it.
At least the currency/percentages case was documented behaviour that we simply hadn't previously implemented; but I can't find any documentation that explains this behaviour with …
It ignores it because I never wrote it: this is the first case where I've ever found it used; I've never seen a file with the attribute value before. Priority when writing PhpSpreadsheet was always given to functionality/features that a significant number of end-user developers actually used, encountered or wanted in real spreadsheets; that could easily be replicated; that added value; and that were achievable without a partial implementation crashing the library. When you work a full 40-hr week, then a further 50+ hours on PhpOffice, you quickly learn to prioritise. Something like Pivot Tables has the demand and value, but is very difficult to implement, and breaks the library if not fully implemented, so a medium priority. I'd love to work on the new DataTypes for Excel, but high complexity and so far no demand, so lowest priority. I wouldn't guarantee that other libraries do support it; I've just taken a quick look at OpenPyXl and can't find it there (although OpenPyXl was originally based on PhpExcel); nor in Apache POI. But it is at least partly supported in other Office GUIs like OpenOffice.
If you like, though I'd recommend several PRs for the different aspects: storing against the Style and the Xlsx Reader as one PR, then the changes to the Calc Engine, and a third PR for the Xlsx Writer. Any further PRs for other file formats (OASIS, Gnumeric, SpreadsheetML) will require reading specs and experimentation to see whether it's supported in those formats.
Agreed. This is a very odd one, and I can't find any documentation on how that attribute is supposed to behave.
Completely understand. I was curious if there was some technical reason why (e.g. extremely complex to implement, etc), but totally understand the need to prioritize and focus on high demand/high return features.
I figured if it was part of the spec, other software like Apache or LibreOffice would include it... but perhaps their engineers have done a better job than the Excel folks and haven't created this odd situation.
Sounds good. I'm probably missing something, but I've made some good progress on this already and am just writing some tests. Should be able to push something out in a day or so.
Interestingly, …
Apache POI reads/writes all attributes as "generic", and provides getters/setters; but using it would be "caveat emptor"; it isn't used anywhere in their code logic, so it's entirely dependent on the end-user developer understanding how to use it.
I don't think I ever checked if it was used by other Readers/Writers; in most cases we implement for Xlsx first, then we might add it for other formats later. Selected Worksheet and Cells were only added for the Ods Reader/Writer last year, even though they've always been there since day 1 for Xlsx/Xlsx.
I'm coming late to the party, but I don't think applyNumberFormat is the cause here. I created the attached spreadsheet by entering a quote-prefixed value in B1. Then I overwrote it with a non-quote-prefixed formula. The style assigned to the cell still specifies …
Great catch @oleibman. So in that case, could the solution be as simple as:

    // Quote-Prefixed cell values cannot be formulae, but are treated as strings
    if ($cell !== null && $ignoreQuotePrefix === false && substr($formula, 0, 1) === "'") {
        return self::wrapResult((string) $formula);
    }

Maybe Excel is ignoring the …
No, the quote prefix isn't part of the cell value, but a style setting that tells Excel that the value should be treated as a string and not as a formula; that much can be verified, and PhpSpreadsheet's check is correct... as far as it goes. We need to identify what the exception to that rule is. I'm trying to remember the case I'd identified where …
I think you mean #3335. In that case it was named cells being referenced on other sheets, though I suspect the root cause is likely the same as in this case.
You're right. After adding a quote-prefixed value to @oleibman's example, we get:

    <sheetData>
      <row r="1" spans="1:2" x14ac:dyDescent="0.25">
        <c r="A1">
          <v>1</v>
        </c>
        <c r="B1" s="1">
          <f>2+3</f>
          <v>5</v>
        </c>
      </row>
      <row r="2" spans="1:2" x14ac:dyDescent="0.25">
        <c r="A2" s="1" t="s">
          <v>0</v>
        </c>
      </row>
    </sheetData>

It looks like A2, the cell with the quote-prefixed string in it, has a type of string. B1 and A2 share the same style, which is quote-prefixed.
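As a rough diagnostic for files like these, the style flag and the cell content can be cross-checked directly in the package XML. The PHP sketch below is an illustration under stated assumptions, not PhpSpreadsheet code: both function names are hypothetical, it expects the raw text of `xl/styles.xml` and of one worksheet part (unzipping the .xlsx is not shown), and it relies only on the SimpleXML extension. It flags any cell that has an `<f>` element while its `s` index points at a `cellXfs` entry with `quotePrefix` set.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: zero-based indexes of cellXfs/xf entries with quotePrefix on.
 * Assumes $stylesXml is the raw text of xl/styles.xml from an unzipped .xlsx.
 */
function quotePrefixedStyleIndexes(string $stylesXml): array
{
    $styles = new SimpleXMLElement($stylesXml);
    $indexes = [];
    $position = 0;
    // A cell's s="N" attribute is a zero-based index into <cellXfs>.
    foreach ($styles->cellXfs->xf as $xf) {
        // OOXML booleans may be written as "1" or "true".
        if (in_array((string) $xf['quotePrefix'], ['1', 'true'], true)) {
            $indexes[] = $position;
        }
        ++$position;
    }
    return $indexes;
}

/**
 * Hypothetical helper: references of cells that hold a formula (<f>) even
 * though their style index is one of the quote-prefixed styles.
 */
function formulaCellsWithQuotePrefix(string $sheetXml, array $quotePrefixedStyles): array
{
    $sheet = new SimpleXMLElement($sheetXml);
    $conflicts = [];
    foreach ($sheet->sheetData->row as $row) {
        foreach ($row->c as $cell) {
            // Cells without an s attribute use the default style 0.
            $styleIndex = (int) ($cell['s'] ?? 0);
            if (isset($cell->f) && in_array($styleIndex, $quotePrefixedStyles, true)) {
                $conflicts[] = (string) $cell['r'];
            }
        }
    }
    return $conflicts;
}
```

Applied to the sheet XML quoted above, with style 1 as the quote-prefixed style, it would report B1 but not A2, since A2 holds a shared string rather than a formula.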
I don't think so; I still think that it's a style override: the other style difference that I see in your file is …
I'm guessing, but I think it might be reasonable in Reader/Xlsx.php to set the cell's quotePrefix style to false inside …
I can see that in my original file, Test.xlsx, but in my most recent file, issue.3495c.xlsx, based on @oleibman's much simpler file, all I can find is … Interestingly, there is an entry in …
Makes sense to me. Other than the entry in …
We don't load the calcChain because it's a big memory drain and doesn't really add value to PhpSpreadsheet. It's a grid of calculation dependencies that allows Excel to see that if a change is made to the value in cell X1, then it needs to update the calculated values for cells Y1 and X2, which in turn require updates to calculated cells Z1 and X3, etc. Excel doesn't need it either, because it can always recreate it when opening a file.
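To illustrate why such a chain can always be rebuilt from the formulas themselves, here is a minimal PHP sketch. It is a hypothetical function, not how Excel or PhpSpreadsheet actually does it: it derives a recalculation order purely from which cells each formula reads, using a depth-first topological sort, and it does not handle circular references.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: order formula cells so that each one comes after the
 * cells it reads. $precedents maps a formula cell to its inputs, e.g.
 * ['Y1' => ['X1']]. Circular references are NOT detected in this sketch.
 */
function recalculationOrder(array $precedents): array
{
    $order = [];
    $visited = [];

    $visit = function (string $cell) use (&$visit, &$order, &$visited, $precedents): void {
        if (isset($visited[$cell])) {
            return;
        }
        $visited[$cell] = true;
        // Calculate everything this cell reads before the cell itself.
        foreach ($precedents[$cell] ?? [] as $input) {
            $visit((string) $input);
        }
        // Only formula cells need recalculating; plain values are just inputs.
        if (isset($precedents[$cell])) {
            $order[] = $cell;
        }
    };

    foreach (array_keys($precedents) as $cell) {
        $visit((string) $cell);
    }
    return $order;
}
```

With the X1/Y1/X2/Z1/X3 example from the comment, `recalculationOrder(['Z1' => ['Y1'], 'X3' => ['X2'], 'Y1' => ['X1'], 'X2' => ['X1']])` returns `['Y1', 'Z1', 'X2', 'X3']`: every formula cell comes after the cells it depends on.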
Fix PHPOffice#3495. This seems to be a bug in Excel, one which Excel manages to cover up but which affects PhpSpreadsheet. The user enters a formula preceded by an apostrophe into a cell; Excel turns on the `quotePrefix` style and stores the data as a string rather than a formula. The user then enters a formula not preceded by an apostrophe into the same cell; Excel stores it as a formula but does not turn `quotePrefix` off. When the spreadsheet is saved, the cell's style specifies `quotePrefix`, but the cell's content indicates that it's a formula. Until now, PhpSpreadsheet has seen that `quotePrefix` is set and therefore treated the cell's contents as a string rather than a formula. This PR will change that behavior so that `quotePrefix` is automatically turned off when the Xlsx Reader sees that the cell contains a formula.
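The rule described in that PR can be expressed as a small decision function. This is a hypothetical PHP sketch of the idea, not the code merged into the Xlsx Reader: it takes one raw `<c>` element plus the `quotePrefix` flag of each `cellXfs` entry, and reports whether the cell should still be treated as quote-prefixed.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper, not PhpSpreadsheet API: decide whether a cell should be
 * treated as quote-prefixed. A stored formula (<f>) overrides the quotePrefix
 * flag that Excel may have left behind on the cell's style.
 *
 * @param bool[] $xfQuotePrefix quotePrefix flag for each cellXfs/xf, by index
 */
function effectiveQuotePrefix(SimpleXMLElement $cell, array $xfQuotePrefix): bool
{
    // Cells without an s attribute use the default style 0.
    $styleIndex = (int) ($cell['s'] ?? 0);
    $styleSaysQuoted = $xfQuotePrefix[$styleIndex] ?? false;

    // The cell content says "formula", so ignore the stale style flag.
    if (isset($cell->f)) {
        return false;
    }
    return $styleSaysQuoted;
}
```

For the cells in the earlier XML sample, with style 1 quote-prefixed, B1 (a formula) yields false, A2 (a shared string) yields true, and A1 (default style) yields false.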
Please test with 3497 if possible.
Will do today. Thanks for this! I'll let you know how it goes.
Just tested your PR and it fixes the issue in both my original (very old) spreadsheet and in my sample file. I agree with your remarks in the PR that this does appear to be a bug in Excel.
Thank you for confirming. I will merge this in a day or two.
I suspect this is related to #3335 which I raised previously.
What is the expected behavior?
A cell that contains a formula that is not quote-prefixed, but that has a quote-prefixed style applied to it, should be treated as a formula by the formula engine.
What is the current behavior?
The cell is treated as being quote-prefixed by the formula engine, resulting in #VALUE errors when the cell is referenced in another formula.
What are the steps to reproduce?
Please provide a Minimal, Complete, and Verifiable example of code that exhibits the issue without relying on an external Excel file or a web server:
I know this seems like a contrived example, but Excel treats this differently.
I've attached an Excel file that illustrates how this situation can arise:
Test.xlsx
In this file, I created a quote prefixed value in A4. I then used the Format Painter tool to apply the format to A1:B1. I then selected A1:B1 and set their formats to "Number".
If Test.xlsx is opened using PHPSpreadsheet and the value of B2 is calculated, we get #VALUE rather than 32.
What features do you think are causing the issue?
The formula calculation engine relies on the style of the cell to determine if it is quote prefixed or not:
However, it seems possible to assign a quote prefixed style to a cell, but still have Excel treat it as a number/formula.
I can make the error go away by editing the first line of the `try` block in `calculateCellValue` (Calculation.php) to use the `$ignoreQuotePrefix` param that was added in #3336. However, I'm not sure that's the right fix, as it may cause other issues with actual quote-prefixed values.
I suspect the actual fix will have to look at the contents of the cell to determine whether it is, in fact, quote-prefixed, despite what the associated style is telling us.
Does an issue affect all spreadsheet file formats? If not, which formats are affected?
XLSX
Which versions of PhpSpreadsheet and PHP are affected?
1.28 & master