[BUG] ImportError: lxml.html.clean module is now a separate project lxml_html_clean. #26

Closed
chenrui17 opened this issue Apr 9, 2024 · 1 comment
Labels
bug Something isn't working

Comments

chenrui17 commented Apr 9, 2024

Describe the bug

from nemo_curator.download import download_common_crawl
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/NeMo-Curator/nemo_curator/download/__init__.py", line 16, in <module>
    from .commoncrawl import (
  File "/opt/NeMo-Curator/nemo_curator/download/commoncrawl.py", line 21, in <module>
    import justext
  File "/usr/local/lib/python3.10/dist-packages/justext/__init__.py", line 12, in <module>
    from .core import justext
  File "/usr/local/lib/python3.10/dist-packages/justext/core.py", line 21, in <module>
    from lxml.html.clean import Cleaner
  File "/usr/local/lib/python3.10/dist-packages/lxml/html/clean.py", line 18, in <module>
    raise ImportError(
ImportError: lxml.html.clean module is now a separate project lxml_html_clean.
Install lxml[html_clean] or lxml_html_clean directly.

Steps/Code to reproduce bug

I used the Dockerfile from #25 to build the image and created a container successfully, but when I run from nemo_curator.download import download_common_crawl, I get the error above.

chenrui17 added the bug label Apr 9, 2024
@ryantwolf
Collaborator

Yeah, we saw this too. It's a problem with jusText. I have found a workaround and will open a PR for it later.
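
The workaround itself is not spelled out in this thread. Going only by the error message in the traceback (install lxml[html_clean] or lxml_html_clean), a rough check could look like the sketch below. The function name html_clean_hint and the exact pip command strings are illustrative assumptions, not part of NeMo-Curator or jusText, and this is not necessarily the fix that ends up in the PR.

```python
import importlib.util


def html_clean_hint(find_spec=importlib.util.find_spec):
    """Return an install hint if lxml_html_clean is missing, else None.

    html_clean_hint is a hypothetical helper for illustration only.
    find_spec is injectable so the check can be exercised without
    changing the environment.
    """
    # Looking up a top-level module spec does not import the module,
    # so this is safe to run even when the package is absent.
    if find_spec("lxml_html_clean") is not None:
        return None
    # The two options below are the ones named in the ImportError message.
    return "pip install lxml_html_clean  # or: pip install 'lxml[html_clean]'"


if __name__ == "__main__":
    hint = html_clean_hint()
    print(hint or "lxml_html_clean is installed")
```

If the helper returns a hint, installing one of the two packages inside the container before importing nemo_curator.download should, under that assumption, let the lxml.html.clean import in justext/core.py succeed.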
