Skip to content
This repository has been archived by the owner on May 26, 2022. It is now read-only.

ODS Reader yields different results #184

Closed
madflow opened this issue Mar 13, 2016 · 7 comments
Closed

ODS Reader yields different results #184

madflow opened this issue Mar 13, 2016 · 7 comments

Comments

@madflow
Copy link
Contributor

madflow commented Mar 13, 2016

Hi there!

When playing with the Reader part of the library I came across the artifact, that the ODS reader treats input differently than its peers XLSX and CSV.

  1. 0 (Zeros) are treated as missing values - in a Spreadsheet these are perfectly valid values (PHP empty() in use...) I would argue, that even "empty" values are perfectly valid and should yield empty arrays in the row iterator (Like in CSV and XLSX).

Example:

use Box\Spout\Writer\WriterFactory;
use Box\Spout\Reader\ReaderFactory;
use Box\Spout\Common\Type;


$data = [
    ['A','B','C'],
    [1, 2, 3],
    [0, 0, 0]
];

// Write Data
$writerXlsx = WriterFactory::create(Type::XLSX);
$writerXlsx->openToFile('test.xlsx');
$writerXlsx->addRows($data);
$writerXlsx->close();

$writerOds = WriterFactory::create(Type::ODS);
$writerOds->openToFile('test.ods');
$writerOds->addRows($data);
$writerOds->close();

$writerCsv = WriterFactory::create(Type::CSV);
$writerCsv->openToFile('test.csv');
$writerCsv->addRows($data);
$writerCsv->close();

// Read files and dump the result
$result = [];
$reader = ReaderFactory::create(Type::XLSX);

$reader->open('test.xlsx');
foreach ($reader->getSheetIterator() as $sheet) {
    foreach ($sheet->getRowIterator() as $row) {
        $result['xlsx'][] = $row;
    }
}
$reader->close();

$reader = ReaderFactory::create(Type::ODS);
$reader->open('test.ods');
foreach ($reader->getSheetIterator() as $sheet) {
    foreach ($sheet->getRowIterator() as $row) {
        $result['ods'][] = $row;
    }
}
$reader->close();

$reader = ReaderFactory::create(Type::CSV);
$reader->open('test.csv');
foreach ($reader->getSheetIterator() as $sheet) {
    foreach ($sheet->getRowIterator() as $row) {
        $result['csv'][] = $row;
    }
}
$reader->close();

print_r($result);

Array
(
    [xlsx] => Array
        (
            [0] => Array
                (
                    [0] => A
                    [1] => B
                    [2] => C
                )

            [1] => Array
                (
                    [0] => 1
                    [1] => 2
                    [2] => 3
                )

            [2] => Array
                (
                    [0] => 0
                    [1] => 0
                    [2] => 0
                )

        )

    [ods] => Array
        (
            [0] => Array
                (
                    [0] => A
                    [1] => B
                    [2] => C
                )

            [1] => Array
                (
                    [0] => 1
                    [1] => 2
                    [2] => 3
                )

        )

    [csv] => Array
        (
            [0] => Array
                (
                    [0] => A
                    [1] => B
                    [2] => C
                )

            [1] => Array
                (
                    [0] => 1
                    [1] => 2
                    [2] => 3
                )

            [2] => Array
                (
                    [0] => 0
                    [1] => 0
                    [2] => 0
                )

        )

)
  1. The calculated values to show in a row are computed differently. ODS creates arrays with unequal length.

Example:

$data = [
    ['A','B','C'],
    [0, '', ''],
    [1, 1, '']
];

// Code

Array
(
    [xlsx] => Array
        (
            [0] => Array
                (
                    [0] => A
                    [1] => B
                    [2] => C
                )

            [1] => Array
                (
                    [0] => 0
                    [1] => 
                    [2] => 
                )

            [2] => Array
                (
                    [0] => 1
                    [1] => 1
                    [2] => 
                )

        )

    [ods] => Array
        (
            [0] => Array
                (
                    [0] => A
                    [1] => B
                    [2] => C
                )

            [1] => Array
                (
                    [0] => 0
                )

            [2] => Array
                (
                    [0] => 1
                    [1] => 1
                    [2] => 
                )

        )

    [csv] => Array
        (
            [0] => Array
                (
                    [0] => A
                    [1] => B
                    [2] => C
                )

            [1] => Array
                (
                    [0] => 0
                    [1] => 
                    [2] => 
                )

            [2] => Array
                (
                    [0] => 1
                    [1] => 1
                    [2] => 
                )

        )

)

Maybe this is wanted behavior: Then I would argue to change it and introduce a legacy flag for the old behavior to avoid a regression.

If this is a known thing and ODS needs some love - then this could be a PR.

Thanks!

@adrilo
Copy link
Collaborator

adrilo commented Mar 14, 2016

Hi @madflow, thanks for filing this issue.

This is definitely not the correct behavior. All readers should behave consistently.
If you have time and are interested to help, feel free to submit a PR. I'd be happy to guide you and get it approved

@madflow
Copy link
Contributor Author

madflow commented Mar 15, 2016

I am interested to help - of course time and knowledge about the spreadsheet internals are an issue. I will have a look at the tests first. I reckon there should be a set of shared tests that all readers should equally pass.

@adrilo
Copy link
Collaborator

adrilo commented Mar 15, 2016

Yes! Tests are in the "tests" directory (obviously). ODS files are composed by a bunch of XML files, zipped together. One way to get familiar with the ODS format is to create an ODS file with fake data, unzip the *.ods file and look at the extracted files. They all contain some pieces of information you have to merge together to get the actual data.

The test files used in the tests are located in "tests/resources". What I recommend you to do is create a test file with zeroes and empty strings, add a test with the expected behavior (it should fail) and then update the ODS Reader code to fix the issue. Once the issue is fixed, the test should now pass.

madflow pushed a commit to madflow/spout that referenced this issue Mar 16, 2016
madflow pushed a commit to madflow/spout that referenced this issue Mar 16, 2016
@adrilo
Copy link
Collaborator

adrilo commented Mar 18, 2016

Hey @madflow, your changes look good. Feel free to submit a pull request. I'll be happy to merge it

@madflow
Copy link
Contributor Author

madflow commented Mar 18, 2016

The current commits only fix bug number 1 (Zeros treated as missing values). The second Bug has not been fixed yet in my branch (tests are failing like they are supposed to) ;)

I can create a "Work in Progress" PR if you like.

@adrilo
Copy link
Collaborator

adrilo commented Mar 18, 2016

No need for a WIP PR. Just submit the PR when you are done :)

madflow pushed a commit to madflow/spout that referenced this issue Mar 19, 2016
madflow pushed a commit to madflow/spout that referenced this issue Mar 19, 2016
adrilo added a commit that referenced this issue Mar 21, 2016
@adrilo
Copy link
Collaborator

adrilo commented Mar 21, 2016

Fixed as part of #189

@adrilo adrilo closed this as completed Mar 21, 2016
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

2 participants