Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrade WebAssembly build to Arrow 1.0.1 #1207

Merged
merged 1 commit into from
Oct 4, 2020
Merged

Upgrade WebAssembly build to Arrow 1.0.1 #1207

merged 1 commit into from
Oct 4, 2020

Conversation

texodus
Copy link
Member

@texodus texodus commented Sep 25, 2020

Upgrades Apache Arrow to v1.0.1 for all targets except Python 2.7. For the WebAssembly/Emscripten build, CSV loading and writing (via to_csv()) has been rewritten to use Arrow over C++, resulting in a ~10x load performance increase, depending on whether you provide a schema and what data types your columns are.

For Python 2.7, Arrow requirements are pyarrow<2 only, which is 0.16.1. I've done some limited testing to ensure this is cross-compatible with Arrow Table data generated or read-from 1.0.1, but this is something we should probably keep an eye on; especially regarding the JupyterLab plugin @finos/perspective-jupyterlab, whose client/server distributed mode would use engines compiled to both versions simultaneously when Python 2.7 runs the kernel.

Screen Shot 2020-09-24 at 2 30 43 AM

@texodus texodus force-pushed the csv-arrow branch 8 times, most recently from 50d9237 to 229f764 Compare September 26, 2020 22:01
@texodus texodus marked this pull request as ready for review September 26, 2020 22:21
if(${CMAKE_SYSTEM_NAME} MATCHES "Windows")
# windows its just "arrow.dll"
set(PYTHON_PYARROW_PYTHON_SHARED_LIBRARY "arrow_python")
set(PYTHON_PYARROW_ARROW_SHARED_LIBRARY "arrow")
set(PYTHON_PYARROW_LIBRARIES ${PYTHON_PYARROW_PYTHON_SHARED_LIBRARY} ${PYTHON_PYARROW_ARROW_SHARED_LIBRARY})
elseif (CMAKE_SYSTEM_NAME MATCHES "Darwin")
# Link against pre-built libarrow on MacOS
set(PYTHON_PYARROW_PYTHON_SHARED_LIBRARY ${PYTHON_PYARROW_LIBRARY_DIR}/${CMAKE_SHARED_LIBRARY_PREFIX}arrow_python.${PYARROW_VERSION_MINOR}.dylib)
set(PYTHON_PYARROW_ARROW_SHARED_LIBRARY ${PYTHON_PYARROW_LIBRARY_DIR}/${CMAKE_SHARED_LIBRARY_PREFIX}arrow.${PYARROW_VERSION_MINOR}.dylib)
set(PYTHON_PYARROW_PYTHON_SHARED_LIBRARY ${PYTHON_PYARROW_LIBRARY_DIR}/${CMAKE_SHARED_LIBRARY_PREFIX}arrow_python.100.dylib)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't hardcode the version, just use "arrow_python.dylib" etc

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are going to have to cope with the conditional naming for these libraries, as they are not disted in this version under this name (and don't seem to exhibit any consistent version convention at all).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's link this into an apache arrow issue, for pyarrow < 1 they had symlinks and we'll need those for some platforms

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wesm this makes linking against these libraries super ugly. Since the version is managed by python anyway (e.g. in the version of the python package), wouldn't it be better to ship just the unversioned binary (e.g. libarrow.dylib)?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See http://arrow.apache.org/docs/python/extending.html#building-extensions-against-pypi-wheels. We had to make changes to prevent shared libraries from being duplicated in the wheels. It might be possible to change things to ship unversioned shared libraries instead, but it will require some surgery on the Cython modules to get them to link with the unversioned libs (because they look for the libraries with the ABI version when they link). Feel free to open a JIRA issue in Apache Arrow with a proposal -- I have definitely hit the limit for how much time I can personally spend on it

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @wesm

@timkpaine I am not particularly bothered by this, it is easy to fix and this file must be manually updated for new Arrow versions regardless.

cmake/modules/FindPyArrow.cmake Show resolved Hide resolved
cmake/modules/FindPyArrow.cmake Show resolved Hide resolved
cmake/modules/FindPyArrow.cmake Show resolved Hide resolved
cmake/modules/FindPyArrow.cmake Show resolved Hide resolved
target_link_libraries(binding psp)
target_compile_options(psp PRIVATE -Wno-deprecated-register)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what are these for?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Silences python2 deprecation warnings which make our build logs un-readable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're doing this on windows too though, are you sure this flag will work there?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know ... our windows azure builds do not work ...

@@ -10,6 +10,7 @@
#include <perspective/emscripten.h>
#include <perspective/arrow_loader.h>
#include <perspective/arrow_writer.h>
#include "arrow/csv/api.h"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

match style with <>

"python-dateutil>=2.8.0",
"six>=1.11.0",
"traitlets>=4.3.2",
]

if sys.version_info.major < 3:
requires.append("backports.shutil-which")
requires.append("pyarrow<2")
else:
requires.append("pyarrow>=1.0.1,<2")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should match pyproject.toml

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This version is conditional on python version - does this not make the looser bounds "pyarrow<2" for pyproject.toml correct? Perspective under this PR will compile with either pyarrow.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pyproject.toml specifies it as pyarrow<2 regardless of python version. So it could satisfy with pyarrow<0.16 even though that won't work.

You have a couple of options, something like:
"pyarrow>0.16.0,<2; python_version < '3'", "pyarrow>1.0.1,<2; python_version >= '3'" would work ok.

Stylistically, every place where we specify requirements should match exactly the behavior of pyproject.toml, otherwise we can very easily start to break our builds and wheels like the state we were in before.

@@ -181,7 +182,7 @@ jobs:
# displayName: "Which python"

# - script: |
# python -m pip install numpy "pyarrow>=0.16.0,<1"
# python -m pip install numpy "pyarrow>=1.0.1,<2"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

match pyproject.toml

@@ -338,8 +344,14 @@ jobs:
displayName: "Which python"

- script: |
python -m pip install delocate wheel numpy "pyarrow>=0.16.0,<1"
displayName: "Python deps"
python -m pip install delocate wheel numpy "pyarrow>=1.0.1,<2"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

match pyproject.toml

@timkpaine
Copy link
Member

@texodus the only issue really is that pyproject.toml has to match wherever you install pyarrow. So if you install <2 for python2, and >1<2 for python3, you need to reflect this in the pyproject.toml file so that the build system matches the setup.py requirements matches what we install in CI

@texodus
Copy link
Member Author

texodus commented Oct 4, 2020

@timkpaine I've moved the pyarrow upgrade to a new PR. What's left in the PR is just the C++ and Javascript changes necessary to upgrade, plus the new CSV parser. Since this was already compilable for ARROW_VERSION_MAJOR < 1, the PR continues to compile fine with perspective-python's pyarrow dependency bounds as they were.

@texodus texodus changed the title Upgrade to Arrow 1.0.1 Upgrade WebAssembly build to Arrow 1.0.1 Oct 4, 2020
@timkpaine
Copy link
Member

@texodus do you want me to make the python changes on that branch or do you want them in a separate PR?

@texodus
Copy link
Member Author

texodus commented Oct 4, 2020

I already made most of them - just the pyproject.toml one is left. I will set you as a reviewer when it is ready to merge.

@texodus texodus merged commit 3926623 into master Oct 4, 2020
@texodus texodus deleted the csv-arrow branch October 5, 2020 02:21
@texodus texodus added enhancement Feature requests or improvements JS labels Oct 7, 2020
@texodus texodus mentioned this pull request Oct 7, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Feature requests or improvements JS
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants