Add writers for xdr and ascii files #40

Merged: 124 commits, Jun 20, 2024


@trossi (Contributor) commented Mar 4, 2024

References to issues or other PRs

Closes #31.

Describe the proposed changes

This PR adds basic writing functionality.

Demo (demo.py):

import numpy as np
from rdata.io import write
from rdata.conversion import convert_to_r_data

matrix = np.pi * np.arange(12.).reshape(3, 4)
data = [matrix, 'hello', dict(my_key='my_value'), None, 1, 2.2, 3.3+4.4j]
print(data)

r_data = convert_to_r_data(data)
write("demo.rds", r_data)

Output:

$ python3 demo.py 
[array([[ 0.        ,  3.14159265,  6.28318531,  9.42477796],
       [12.56637061, 15.70796327, 18.84955592, 21.99114858],
       [25.13274123, 28.27433388, 31.41592654, 34.55751919]]), 'hello', {'my_key': 'my_value'}, None, 1, 2.2, (3.3+4.4j)]

$ Rscript -e "readRDS('demo.rds')"
[[1]]
         [,1]      [,2]      [,3]      [,4]
[1,]  0.00000  3.141593  6.283185  9.424778
[2,] 12.56637 15.707963 18.849556 21.991149
[3,] 25.13274 28.274334 31.415927 34.557519

[[2]]
[1] "hello"

[[3]]
[[3]]$my_key
[1] "my_value"


[[4]]
NULL

[[5]]
[1] 1

[[6]]
[1] 2.2

[[7]]
[1] 3.3+4.4i
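
A quick way to check which of the two formats in the PR title (xdr or ascii) a written file uses is to look at its leading bytes. The snippet below is only a sketch and not part of this PR: `detect_r_format` is a hypothetical helper, and it assumes the bytes have already been decompressed. It relies on the markers R writes at the start of a serialization stream (`X\n` or `A\n` for RDS, `RDX2\n`/`RDX3\n` or `RDA2\n`/`RDA3\n` for RDA).

```python
def detect_r_format(data: bytes) -> tuple[str, str]:
    """Return (container, format) for an uncompressed R file.

    Hypothetical helper for illustration; not part of the rdata API.
    """
    rda_prefixes = {b"RDX": "xdr", b"RDA": "ascii"}
    rds_prefixes = {b"X\n": "xdr", b"A\n": "ascii"}

    # RDA files (written by save()) start with e.g. b"RDX3\n".
    head = data[:5]
    if (
        head[:3] in rda_prefixes
        and head[3:4] in (b"2", b"3")
        and head[4:5] == b"\n"
    ):
        return ("rda", rda_prefixes[head[:3]])

    # RDS files (written by saveRDS()) start directly with the format marker.
    if data[:2] in rds_prefixes:
        return ("rds", rds_prefixes[data[:2]])

    raise ValueError("unrecognized or still-compressed R file")
```

If the file is compressed, this raises `ValueError`, so decompress first.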

Additional information

Testing

Two tests looping over all test files in the repository have been added:

  1. rdata/tests/test_write.py::test_write: Test that read-parse-write yields bit-for-bit identical files (ignoring compression).
  2. rdata/tests/test_write.py::test_convert_to_r: Test that R-to-Python-to-R conversion yields an unchanged RData object.
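
The "omitting compression" part of the first test can be sketched as follows. This is an illustration using only the standard library, not the code in test_write.py: `strip_compression` and `same_payload` are hypothetical names. It assumes the file is either uncompressed or compressed with gzip, bzip2 or xz, and detects each by its magic number.

```python
import bz2
import gzip
import lzma


def strip_compression(data: bytes) -> bytes:
    """Return the decompressed payload, or the data itself if uncompressed."""
    if data.startswith(b"\x1f\x8b"):  # gzip magic number
        return gzip.decompress(data)
    if data.startswith(b"BZh"):  # bzip2 magic number
        return bz2.decompress(data)
    if data.startswith(b"\xfd7zXZ\x00"):  # xz magic number
        return lzma.decompress(data)
    return data


def same_payload(original: bytes, rewritten: bytes) -> bool:
    """Compare two files byte for byte after removing any compression."""
    return strip_compression(original) == strip_compression(rewritten)
```

Comparing decompressed payloads avoids spurious differences: two compressors, or the same compressor with different settings, can produce different bytes for identical content.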

The test output indicates which features are implemented. XFAIL means that writing or converting the given type has not been implemented in this PR. SKIPPED means that the R-to-Python-to-R conversion may be ambiguous (the Python object does not carry enough information to reconstruct the original R object).
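
The mapping from test behaviour to these labels can be sketched in plain Python. This is a simplified illustration, not the actual test code: `AmbiguousConversion` and `classify` are hypothetical names, and it assumes that an unimplemented type surfaces as `NotImplementedError`. In a pytest suite the same outcomes would typically be reported with `pytest.xfail` and `pytest.skip`.

```python
from typing import Callable


class AmbiguousConversion(Exception):
    """Hypothetical marker: the Python object cannot identify the R original."""


def classify(roundtrip: Callable[[], bool]) -> str:
    """Run one round-trip check and return the label it would be reported under."""
    try:
        unchanged = roundtrip()
    except NotImplementedError:
        # The type is not supported by the writer/converter yet.
        return "XFAIL"
    except AmbiguousConversion:
        # Not enough information to rebuild the original R object.
        return "SKIPPED"
    return "PASSED" if unchanged else "FAILED"


def _unsupported() -> bool:
    raise NotImplementedError("RObjectType.ALTREP")


def _ambiguous() -> bool:
    raise AmbiguousConversion("ambiguous R->py->R transformation")


results = {
    "supported": classify(lambda: True),
    "unsupported": classify(_unsupported),
    "ambiguous": classify(_ambiguous),
}
```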

Test output:
rdata/tests/test_write.py::test_write[test_altrep_compact_intseq.rda] XFAIL (RObjectType.ALTREP)
rdata/tests/test_write.py::test_write[test_altrep_compact_intseq_asymmetric.rda] XFAIL (RObjectType.ALTREP)
rdata/tests/test_write.py::test_write[test_altrep_compact_realseq.rda] XFAIL (RObjectType.ALTREP)
rdata/tests/test_write.py::test_write[test_altrep_compact_realseq_asymmetric.rda] XFAIL (RObjectType.ALTREP)
rdata/tests/test_write.py::test_write[test_altrep_deferred_string.rda] XFAIL (RObjectType.ALTREP)
rdata/tests/test_write.py::test_write[test_altrep_wrap_logical.rda] XFAIL (RObjectType.ALTREP)
rdata/tests/test_write.py::test_write[test_altrep_wrap_real.rda] XFAIL (RObjectType.ALTREP)
rdata/tests/test_write.py::test_write[test_altrep_wrap_string.rda] XFAIL (RObjectType.ALTREP)
rdata/tests/test_write.py::test_write[test_ascii_v2.rda] PASSED
rdata/tests/test_write.py::test_write[test_ascii_v2.rds] PASSED
rdata/tests/test_write.py::test_write[test_ascii_v3.rda] PASSED
rdata/tests/test_write.py::test_write[test_ascii_v3.rds] PASSED
rdata/tests/test_write.py::test_write[test_ascii_win_v2.rda] PASSED
rdata/tests/test_write.py::test_write[test_ascii_win_v2.rds] PASSED
rdata/tests/test_write.py::test_write[test_ascii_win_v3.rda] PASSED
rdata/tests/test_write.py::test_write[test_ascii_win_v3.rds] PASSED
rdata/tests/test_write.py::test_write[test_builtin.rda] PASSED
rdata/tests/test_write.py::test_write[test_complex.rda] PASSED
rdata/tests/test_write.py::test_write[test_dataframe.rda] XFAIL (RObjectType.REF)
rdata/tests/test_write.py::test_write[test_dataframe.rds] XFAIL (RObjectType.REF)
rdata/tests/test_write.py::test_write[test_dataframe_rownames.rda] XFAIL (RObjectType.REF)
rdata/tests/test_write.py::test_write[test_dataframe_v3.rda] XFAIL (RObjectType.REF)
rdata/tests/test_write.py::test_write[test_dataframe_v3.rds] XFAIL (RObjectType.REF)
rdata/tests/test_write.py::test_write[test_empty_function.rda] XFAIL (RObjectType.CLO)
rdata/tests/test_write.py::test_write[test_empty_function_uncompiled.rda] XFAIL (RObjectType.CLO)
rdata/tests/test_write.py::test_write[test_empty_str.rda] PASSED
rdata/tests/test_write.py::test_write[test_emptyenv.rda] XFAIL (RObjectType.EMPTYENV)
rdata/tests/test_write.py::test_write[test_encodings.rda] PASSED
rdata/tests/test_write.py::test_write[test_encodings_v3.rda] PASSED
rdata/tests/test_write.py::test_write[test_environment.rda] XFAIL (RObjectType.ENV)
rdata/tests/test_write.py::test_write[test_expression.rda] PASSED
rdata/tests/test_write.py::test_write[test_file.rda] XFAIL (RObjectType.EXTPTR)
rdata/tests/test_write.py::test_write[test_full_named_matrix.rda] PASSED
rdata/tests/test_write.py::test_write[test_full_named_matrix.rds] PASSED
rdata/tests/test_write.py::test_write[test_function.rda] XFAIL (RObjectType.CLO)
rdata/tests/test_write.py::test_write[test_function_arg.rda] XFAIL (RObjectType.CLO)
rdata/tests/test_write.py::test_write[test_half_named_matrix.rda] PASSED
rdata/tests/test_write.py::test_write[test_list.rda] PASSED
rdata/tests/test_write.py::test_write[test_list_attrs.rda] PASSED
rdata/tests/test_write.py::test_write[test_logical.rda] PASSED
rdata/tests/test_write.py::test_write[test_matrix.rda] PASSED
rdata/tests/test_write.py::test_write[test_minimal_function.rda] XFAIL (RObjectType.CLO)
rdata/tests/test_write.py::test_write[test_minimal_function_uncompiled.rda] XFAIL (RObjectType.CLO)
rdata/tests/test_write.py::test_write[test_na_string.rda] PASSED
rdata/tests/test_write.py::test_write[test_named_matrix.rda] PASSED
rdata/tests/test_write.py::test_write[test_nullable_int.rda] PASSED
rdata/tests/test_write.py::test_write[test_nullable_logical.rda] PASSED
rdata/tests/test_write.py::test_write[test_s4.rda] XFAIL (RObjectType.S4)
rdata/tests/test_write.py::test_write[test_ts.rda] PASSED
rdata/tests/test_write.py::test_write[test_vector.rda] PASSED
rdata/tests/test_write.py::test_convert_to_r[test_altrep_compact_intseq.rda] SKIPPED (Type RObjectType.ALTREP not implemented)
rdata/tests/test_write.py::test_convert_to_r[test_altrep_compact_intseq_asymmetric.rda] SKIPPED (Type RObjectType.ALTREP not implemented)
rdata/tests/test_write.py::test_convert_to_r[test_altrep_compact_realseq.rda] SKIPPED (Type RObjectType.ALTREP not implemented)
rdata/tests/test_write.py::test_convert_to_r[test_altrep_compact_realseq_asymmetric.rda] SKIPPED (Type RObjectType.ALTREP not implemented)
rdata/tests/test_write.py::test_convert_to_r[test_altrep_deferred_string.rda] SKIPPED (Type RObjectType.ALTREP not implemented)
rdata/tests/test_write.py::test_convert_to_r[test_altrep_wrap_logical.rda] SKIPPED (Type RObjectType.ALTREP not implemented)
rdata/tests/test_write.py::test_convert_to_r[test_altrep_wrap_real.rda] SKIPPED (Type RObjectType.ALTREP not implemented)
rdata/tests/test_write.py::test_convert_to_r[test_altrep_wrap_string.rda] SKIPPED (Type RObjectType.ALTREP not implemented)
rdata/tests/test_write.py::test_convert_to_r[test_ascii_v2.rda] PASSED
rdata/tests/test_write.py::test_convert_to_r[test_ascii_v2.rds] PASSED
rdata/tests/test_write.py::test_convert_to_r[test_ascii_v3.rda] PASSED
rdata/tests/test_write.py::test_convert_to_r[test_ascii_v3.rds] PASSED
rdata/tests/test_write.py::test_convert_to_r[test_ascii_win_v2.rda] PASSED
rdata/tests/test_write.py::test_convert_to_r[test_ascii_win_v2.rds] PASSED
rdata/tests/test_write.py::test_convert_to_r[test_ascii_win_v3.rda] PASSED
rdata/tests/test_write.py::test_convert_to_r[test_ascii_win_v3.rds] PASSED
rdata/tests/test_write.py::test_convert_to_r[test_builtin.rda] XFAIL (<class 'rdata.conversion._conversion.RBuiltin'>)
rdata/tests/test_write.py::test_convert_to_r[test_complex.rda] PASSED
rdata/tests/test_write.py::test_convert_to_r[test_dataframe.rda] XFAIL (<class 'pandas.core.frame.DataFrame'>)
rdata/tests/test_write.py::test_convert_to_r[test_dataframe.rds] XFAIL (<class 'pandas.core.frame.DataFrame'>)
rdata/tests/test_write.py::test_convert_to_r[test_dataframe_rownames.rda] XFAIL (<class 'pandas.core.frame.DataFrame'>)
rdata/tests/test_write.py::test_convert_to_r[test_dataframe_v3.rda] XFAIL (<class 'pandas.core.frame.DataFrame'>)
rdata/tests/test_write.py::test_convert_to_r[test_dataframe_v3.rds] XFAIL (<class 'pandas.core.frame.DataFrame'>)
rdata/tests/test_write.py::test_convert_to_r[test_empty_function.rda] XFAIL (<class 'rdata.conversion._conversion.RFunction'>)
rdata/tests/test_write.py::test_convert_to_r[test_empty_function_uncompiled.rda] XFAIL (<class 'rdata.conversion._conversion.RFunction'>)
rdata/tests/test_write.py::test_convert_to_r[test_empty_str.rda] PASSED
rdata/tests/test_write.py::test_convert_to_r[test_emptyenv.rda] XFAIL (<class 'rdata.conversion._conversion.REnvironment'>)
rdata/tests/test_write.py::test_convert_to_r[test_encodings.rda] SKIPPED (ambiguous R->py->R transformation)
rdata/tests/test_write.py::test_convert_to_r[test_encodings_v3.rda] SKIPPED (ambiguous R->py->R transformation)
rdata/tests/test_write.py::test_convert_to_r[test_environment.rda] XFAIL (<class 'rdata.conversion._conversion.REnvironment'>)
rdata/tests/test_write.py::test_convert_to_r[test_expression.rda] PASSED
rdata/tests/test_write.py::test_convert_to_r[test_file.rda] SKIPPED (ambiguous R->py->R transformation)
rdata/tests/test_write.py::test_convert_to_r[test_full_named_matrix.rda] XFAIL (<class 'xarray.core.dataarray.DataArray'>)
rdata/tests/test_write.py::test_convert_to_r[test_full_named_matrix.rds] XFAIL (<class 'xarray.core.dataarray.DataArray'>)
rdata/tests/test_write.py::test_convert_to_r[test_function.rda] XFAIL (<class 'rdata.conversion._conversion.RFunction'>)
rdata/tests/test_write.py::test_convert_to_r[test_function_arg.rda] XFAIL (<class 'rdata.conversion._conversion.RFunction'>)
rdata/tests/test_write.py::test_convert_to_r[test_half_named_matrix.rda] XFAIL (<class 'xarray.core.dataarray.DataArray'>)
rdata/tests/test_write.py::test_convert_to_r[test_list.rda] PASSED
rdata/tests/test_write.py::test_convert_to_r[test_list_attrs.rda] SKIPPED (ambiguous R->py->R transformation)
rdata/tests/test_write.py::test_convert_to_r[test_logical.rda] PASSED
rdata/tests/test_write.py::test_convert_to_r[test_matrix.rda] PASSED
rdata/tests/test_write.py::test_convert_to_r[test_minimal_function.rda] XFAIL (<class 'rdata.conversion._conversion.RFunction'>)
rdata/tests/test_write.py::test_convert_to_r[test_minimal_function_uncompiled.rda] XFAIL (<class 'rdata.conversion._conversion.RFunction'>)
rdata/tests/test_write.py::test_convert_to_r[test_na_string.rda] PASSED
rdata/tests/test_write.py::test_convert_to_r[test_named_matrix.rda] XFAIL (<class 'xarray.core.dataarray.DataArray'>)
rdata/tests/test_write.py::test_convert_to_r[test_nullable_int.rda] PASSED
rdata/tests/test_write.py::test_convert_to_r[test_nullable_logical.rda] PASSED
rdata/tests/test_write.py::test_convert_to_r[test_s4.rda] XFAIL (<class 'types.SimpleNamespace'>)
rdata/tests/test_write.py::test_convert_to_r[test_ts.rda] XFAIL (<class 'pandas.core.series.Series'>)
rdata/tests/test_write.py::test_convert_to_r[test_vector.rda] PASSED

These tests do not cover all the functionality of the writer, so further tests still need to be added.

Documentation

This PR does not yet include documentation in the README or at https://rdata.readthedocs.io.

Checklist before requesting a review

  • I have performed a self-review of my code
  • The code conforms to the style used in this package (checked with Ruff)
  • The code is fully documented and typed (type-checked with Mypy)
  • I have added thorough tests for the new/changed functionality

@trossi (Contributor, Author) left a comment

@vnmabus Thank you for the review! Very useful comments that improved the code. I have addressed the comments and included a few extra tests too.

rdata/_write.py (outdated, resolved)
rdata/unparser/__init__.py (outdated, resolved)
rdata/_write.py (outdated, resolved)
rdata/_write.py (outdated, resolved)
rdata/conversion/to_r.py (outdated, resolved)
rdata/unparser/_ascii.py (outdated, resolved)
rdata/unparser/_unparser.py (outdated, resolved)
rdata/unparser/_unparser.py (outdated, resolved)
rdata/unparser/_unparser.py (resolved)
rdata/unparser/_unparser.py (resolved)

@vnmabus (Owner) left a comment

It is almost ready. Some small comments and questions.

rdata/tests/test_write.py (resolved)
rdata/tests/test_write.py (resolved)
rdata/conversion/to_r.py (outdated, resolved)
rdata/tests/test_write.py (outdated, resolved)
rdata/tests/test_write.py (outdated, resolved)
rdata/unparser/_ascii.py (outdated, resolved)
rdata/unparser/_unparser.py (outdated, resolved)
rdata/unparser/_unparser.py (resolved)
rdata/unparser/_unparser.py (resolved)
rdata/conversion/to_r.py (outdated, resolved)

@trossi (Contributor, Author) left a comment

@vnmabus Thank you for the review! I replied to the comments and also fixed a NumPy 2.0 compatibility issue.

rdata/tests/test_write.py (resolved)
rdata/tests/test_write.py (outdated, resolved)
rdata/unparser/__init__.py (resolved)
rdata/conversion/to_r.py (outdated, resolved)

@vnmabus (Owner) left a comment

LGTM. I think this can be merged now. If we have future improvements, they can be done in a different PR.

About the JOSS paper: there is an open discussion in pyOpenSci/software-submission#144. To be fast-tracked, a package usually must not change between pyOpenSci acceptance and JOSS submission, and we have made several changes to this package in between. My hope is that we can at least submit it through the normal submission procedure (without fast-tracking), but we need to wait for answers, as this situation is highly unusual.

@vnmabus vnmabus merged commit fcf0591 into vnmabus:develop Jun 20, 2024
15 checks passed
@trossi (Contributor, Author) commented Jun 24, 2024

@vnmabus Thank you for the thorough review and the update regarding JOSS!


Successfully merging this pull request may close these issues.

New features: faster xdr reader, ascii file reader, basic writers
3 participants