[Python] Make pyarrow installable in Jupyterlite #34996

Open
MarcSkovMadsen opened this issue Apr 10, 2023 · 6 comments

MarcSkovMadsen commented Apr 10, 2023

Describe the enhancement requested

I'm trying to use pyarrow with Panel in Panelite. Panelite is a custom build of Jupyterlite that works with Panel.

The issue is that piplite cannot install pyarrow. It would be great if pyarrow also worked with Pyodide, and therefore in Jupyterlite and Panelite.

Thanks.


import piplite
await piplite.install(['panel', 'xlsxwriter', 'pyarrow'])
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[16], line 2
      1 import piplite
----> 2 await piplite.install(['panel', 'xlsxwriter', 'pyarrow'])

File /lib/python3.10/asyncio/futures.py:284, in Future.__await__(self)
    282 if not self.done():
    283     self._asyncio_future_blocking = True
--> 284     yield self  # This tells Task to wait for completion.
    285 if not self.done():
    286     raise RuntimeError("await wasn't used with future")

File /lib/python3.10/asyncio/tasks.py:304, in Task.__wakeup(self, future)
    302 def __wakeup(self, future):
    303     try:
--> 304         future.result()
    305     except BaseException as exc:
    306         # This may also be a cancellation.
    307         self.__step(exc)

File /lib/python3.10/asyncio/futures.py:201, in Future.result(self)
    199 self.__log_traceback = False
    200 if self._exception is not None:
--> 201     raise self._exception
    202 return self._result

File /lib/python3.10/asyncio/tasks.py:234, in Task.__step(***failed resolving arguments***)
    232         result = coro.send(None)
    233     else:
--> 234         result = coro.throw(exc)
    235 except StopIteration as exc:
    236     if self._must_cancel:
    237         # Task is cancelled right before coro stops.

File /lib/python3.10/site-packages/piplite/piplite.py:102, in _install(requirements, keep_going, deps, credentials, pre)
    100 """Invoke micripip.install with a patch to get data from local indexes"""
    101 with patch("micropip._micropip._get_pypi_json", _get_pypi_json):
--> 102     return await _micropip.install(
    103         requirements=requirements,
    104         keep_going=keep_going,
    105         deps=deps,
    106         credentials=credentials,
    107         pre=pre,
    108     )

File /lib/python3.10/site-packages/micropip/_micropip.py:573, in install(requirements, keep_going, deps, credentials, pre)
    563 wheel_base = Path(getsitepackages()[0])
    565 transaction = Transaction(
    566     ctx=ctx,
    567     ctx_extras=[],
   (...)
    571     fetch_kwargs=fetch_kwargs,
    572 )
--> 573 await transaction.gather_requirements(requirements)
    575 if transaction.failed:
    576     failed_requirements = ", ".join([f"'{req}'" for req in transaction.failed])

File /lib/python3.10/site-packages/micropip/_micropip.py:333, in Transaction.gather_requirements(self, requirements)
    330 for requirement in requirements:
    331     requirement_promises.append(self.add_requirement(requirement))
--> 333 await gather(*requirement_promises)

File /lib/python3.10/asyncio/futures.py:284, in Future.__await__(self)
    282 if not self.done():
    283     self._asyncio_future_blocking = True
--> 284     yield self  # This tells Task to wait for completion.
    285 if not self.done():
    286     raise RuntimeError("await wasn't used with future")

File /lib/python3.10/asyncio/tasks.py:304, in Task.__wakeup(self, future)
    302 def __wakeup(self, future):
    303     try:
--> 304         future.result()
    305     except BaseException as exc:
    306         # This may also be a cancellation.
    307         self.__step(exc)

File /lib/python3.10/asyncio/futures.py:201, in Future.result(self)
    199 self.__log_traceback = False
    200 if self._exception is not None:
--> 201     raise self._exception
    202 return self._result

File /lib/python3.10/asyncio/tasks.py:232, in Task.__step(***failed resolving arguments***)
    228 try:
    229     if exc is None:
    230         # We use the `send` method directly, because coroutines
    231         # don't have `__iter__` and `__next__` methods.
--> 232         result = coro.send(None)
    233     else:
    234         result = coro.throw(exc)

File /lib/python3.10/site-packages/micropip/_micropip.py:340, in Transaction.add_requirement(self, req)
    337     return await self.add_requirement_inner(req)
    339 if not urlparse(req).path.endswith(".whl"):
--> 340     return await self.add_requirement_inner(Requirement(req))
    342 # custom download location
    343 wheel = WheelInfo.from_url(req)

File /lib/python3.10/site-packages/micropip/_micropip.py:435, in Transaction.add_requirement_inner(self, req)
    432 metadata = await _get_pypi_json(req.name, self.fetch_kwargs)
    434 try:
--> 435     wheel = find_wheel(metadata, req)
    436 except ValueError:
    437     self.failed.append(req)

File /lib/python3.10/site-packages/micropip/_micropip.py:303, in find_wheel(metadata, req)
    300     if best_wheel is not None:
    301         return wheel
--> 303 raise ValueError(
    304     f"Can't find a pure Python 3 wheel for '{req}'.\n"
    305     f"See: {FAQ_URLS['cant_find_wheel']}\n"
    306     "You can use `micropip.install(..., keep_going=True)`"
    307     "to get a list of all packages with missing wheels."
    308 )

ValueError: Can't find a pure Python 3 wheel for 'pyarrow'.
See: https://pyodide.org/en/stable/usage/faq.html#micropip-can-t-find-a-pure-python-wheel
You can use `micropip.install(..., keep_going=True)`to get a list of all packages with missing wheels.
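
For readers unfamiliar with the term in the error above: a "pure Python" wheel is one whose filename ends in the tags `py3-none-any` (no compiled ABI, no specific platform). The sketch below is a simplified illustration of that filename convention and is not micropip's actual logic; the helper names and the example filenames are hypothetical and only meant to show why a compiled package such as pyarrow fails this kind of check.

```python
def wheel_tags(filename: str) -> tuple[str, str, str]:
    """Return (python_tag, abi_tag, platform_tag) from a wheel filename.

    Wheel filenames follow the pattern
    {name}-{version}[-{build}]-{python}-{abi}-{platform}.whl
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel filename layout: {filename!r}")
    python_tag, abi_tag, platform_tag = parts[-3:]
    return python_tag, abi_tag, platform_tag


def is_pure_python3_wheel(filename: str) -> bool:
    """True if the tags say: Python 3, no compiled ABI, any platform."""
    python_tag, abi_tag, platform_tag = wheel_tags(filename)
    # The python tag may list several interpreters, e.g. "py2.py3".
    supports_py3 = any(tag.startswith("py3") for tag in python_tag.split("."))
    return supports_py3 and abi_tag == "none" and platform_tag == "any"


# Illustrative filenames (assumed, not taken from this issue):
print(is_pure_python3_wheel("XlsxWriter-3.0.9-py3-none-any.whl"))
print(is_pure_python3_wheel(
    "pyarrow-11.0.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl"
))
```

A wheel with a `cp310` ABI tag and a `manylinux` platform tag contains native code built for Linux, so it cannot be loaded in the browser; that is why a dedicated Emscripten/Pyodide build of pyarrow is needed.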

Component(s)

Python

@MarcSkovMadsen changed the title from "Make pyarrow importable in Jupyterlite" to "Make pyarrow installable in Jupyterlite" on Apr 10, 2023
@MarcSkovMadsen
Author

Related to pyodide/pyodide#2933

@eliasdabbas

+1

pyarrow is becoming more crucial with pandas 2.0 as well. Would be great to have it supported.

@westonpace
Member

I'm not super familiar with pyodide. Is the only solution to have a pure Python wheel? There have been other efforts to make arrow-c++ usable with emscripten. If that were done, would the C++ then be usable with pyodide?

@westonpace changed the title from "Make pyarrow installable in Jupyterlite" to "[Python] Make pyarrow installable in Jupyterlite" on May 8, 2023
@joemarshall
Contributor

I am 90% there on arrow in pyodide. I have a running build of pyarrow here, but it needs a little work to be generally buildable by anyone. And a little more work so that it loads straight into pyodide nicely.

This PR:
#35471
is the bulk of the work, because for pyodide / browsers we require a build of arrow cpp that works okay without threads.

Once that is merged, there are a few minor build file changes to add support and CMake presets for emscripten & pyodide. This isn't that hard because we've done a bunch of work in pyodide-build which simplifies things. Oh, and all of that needs adding to the GitHub CI here, or else people will break it over time.

Then finally, once there is a sensible build process for emscripten pyarrow, it will probably make sense to add a recipe to pyodide so that it is distributed with core pyodide. That is pretty straightforward (I've done a few of them before): it's mostly a matter of pointing it at the git repository tag and telling it the build command to make arrow-cpp and pyarrow.

@elehcimd

Are there any updates on this? Thanks for your incredible work.

@kylebarron
Contributor

It'll be included as an available package in the next pyodide release: pyodide/pyodide#4950. Or you can use wheels from https://github.com/joemarshall/pyarrow-pyodide/releases/tag/0.26.2 if you'd like to try it sooner.
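
For anyone wanting to try a prebuilt wheel before the release, here is a minimal sketch of installing from a direct wheel URL. It relies on the behaviour visible in the traceback earlier in this thread, where micropip treats a requirement whose URL path ends in `.whl` as a direct download. The helper names are hypothetical, the wheel URL is deliberately left as a parameter (pick the actual file from the release page), and the wheel presumably has to match the Pyodide version you are running.

```python
from urllib.parse import urlparse


def is_wheel_url(requirement: str) -> bool:
    """Same test as in the traceback: does the URL path end in ".whl"?"""
    return urlparse(requirement).path.endswith(".whl")


async def install_wheel(wheel_url: str) -> None:
    """Install a wheel from a direct URL (only works inside Pyodide)."""
    if not is_wheel_url(wheel_url):
        raise ValueError(f"not a wheel URL: {wheel_url!r}")
    # micropip is only importable inside a Pyodide/JupyterLite kernel,
    # so the import is deferred until the function is actually awaited.
    import micropip

    await micropip.install(wheel_url)
```

In a JupyterLite cell this would be used as `await install_wheel(url)`, where `url` is the address of the wheel file you chose from the release assets.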
