Skip to content

Architecture of PicoXLSX (OOXML for spreadsheets)

Raphael Stoeckli edited this page Aug 25, 2018 · 10 revisions

Introduction

The architecture of PicoXLSX is based on the OOXML specifications for spreadsheets (spreadsheetML).
You can find the full documentation at: http://www.ecma-international.org/publications/standards/Ecma-376.htm An overview is available at: http://www.ecma-international.org/news/TC45_current_work/OpenXML%20White%20Paper.pdf
Also a good resource is the unofficial site: http://officeopenxml.com/

What is an XLSX file?

XLSX is the successor of the XLS format from Microsoft. XLS is a proprietary and binary format, whereas XLSX is based on XML and officially standardized by ECMA.
However, an XLSX file is not just one renamed XML file. It is a collection of several XML files, compiled by a packer. Simply said, an XLSX file is a renamed zip file that contains several XML files (and other formats in case of images or macros) and is organized in a specific folder structure. To explore an XLSX file, you simply can rename the XLSX file to the file extension .zip (test.xlsx → test.zip). Now you can unzip the file and browse the previously compiled (by the packer) XML files. Don't forget to make a backup of your file if it is important. Alternatively, you can use a tool like Media-Extractor to explore the content without tampering of the file.

What is the content of an XLSX file?

PicoXLSX adds only the mandatory minimum of XML files to the XLSX file. If you check a file generated by Microsoft Excel, you will find several additional files. Not mentioned here are the .rels files (maintaining relations between XML files) and the [Content_Types] file (summary of the content and its types). Furthermore, other files like .emf images or .bin files in case of macros may be embedded by Excel.

The mandatory files for a valid XLSX file are:

  • workbook.xml (in folder \xl): This file defines some global parameters for the whole workbook.
  • styles.xml (in folder \xl): This file defines all styles within the worksheets.
  • sheet[n].xml (in folder \xl\worksheets): Per worksheet is one single XML file defined. In each file are all cells and the contents defined.
  • sharedStrings.xml (in folder \xl): This file contains all strings of all worksheets in a list. The values in the worksheets are replaced with the corresponding ID of the string value in this document.
  • app.xml (in folder \docProps): This file defines application specific meta-data. This file is only mandatory if meta-data is used.
  • core.xml (in folder \docProps): This file defines document specific meta-data like keywords, creator or creation date/time. This file defines application specific meta-data. This file is only mandatory if meta-data is used.

Link between PicoXLSX and the OOXML standard

PicoXLSX uses the nomenclature of OOXML (most times) and defines classes based on the entities of OOXML.
Defined classes are:

  • Cell: Class representing a cell of a worksheet
  • Cell.Address: Class representing a cell address (e.g. "A20")
  • Cell.Range: Class representing a cell range (e.g. "C2:R150")
  • Cell.BasicFormulas (not in OOXML specs): Factory class for the most important Excel formulas (min, max, round, ceil, floor, median, average, sum and vlookup)
  • Column: Class representing a column of a worksheet
  • Metadata (not in OOXML specs): Class representing the meta data of a workbook
  • Style: Class representing a Style with sub classes within a style sheet. An instance of this class is only a container for the different sub-classes. These sub-classes contain the actual styling information.
  • Style.BasicStyles (not in OOXML specs): Factory class with the most important predefined styles
  • Style.Border: Class representing a Border entry. The Border entry is used to define frames and cell borders
  • Style.CellXf: Class representing an XF entry. The XF entry is used to make reference to other style instances like Border or Fill and for the positioning of the cell content
  • Style.Fill: Class representing a Fill (background) entry. The Fill entry is used to define background colors and fill patterns
  • Style.Font: Class representing a Font entry. The Font entry is used to define text formatting
  • StyleManager (not in OOXML specs): Class to handle all aspects of styles. Since the styling of Excel, the several aspects (e.g. fonts or fills) respectively, can become quite complicated, the class manages the styles and their subsets automatically.
  • Workbook: Class representing a workbook
  • Workbook.Shortener (not in OOXML specs): Class to ease the use of PicoXLSX4
  • Worksheet: Class representing the style sheet of a workbook
  • LowLevel (not in OOXML specs): Class for the whole low level handling like assembling the XML files and packing

Exceptions are (not defined in the specifications but used to handle errors):

  • FormatException: Class for exceptions regarding format error incidents
  • IOException: Class for exceptions regarding stream or save error incidents
  • RangeException: Class for exceptions regarding out-of-range incidents
  • StyleException: Class for exceptions regarding styles
  • WorksheetException: Class for exceptions regarding worksheets