Run upstream test suite #55
Needs data downloads and extra requirements; see https://github.com/huggingface/tokenizers/blob/python-v0.13.1/bindings/python/setup.py#L5
Hi! This is the friendly automated conda-forge-linting service. I just wanted to let you know that I linted all conda-recipes in your PR (
Not sure why this is picking up openssl3 already, which depends on arrow being migrated.
…nda-forge-pinning 2022.10.23.10.16.33
…nda-forge-pinning 2022.10.24.01.16.58
@conda-forge/tokenizers PTAL 🙃
Thanks, @h-vetinari! This makes sense to me. One question (inline) to confirm why we're switching from PyPI to GitHub, but the rest looks good to me.
Happy to have you help maintain this package :).
  version: {{ version }}

source:
  url: https://pypi.io/packages/source/{{ name[0] }}/{{ name }}/{{ name }}-{{ version }}.tar.gz
  sha256: 3333d1cee5c8f47c96362ea0abc1f81c77c9b92c6c3d11cbf1d01985f0d5cf1d
  url: https://github.com/huggingface/tokenizers/archive/refs/tags/python-v{{ version }}.tar.gz
Is the goal to switch from PyPI to GitHub so that we can include and run the upstream test suite?
Yes, because the tests are generally not part of the PyPI tarball (I don't specifically remember whether I checked this for this feedstock). I also consider GitHub sources cleaner than PyPI sources, because there's one less layer where things can be spuriously changed.
Yeah, makes sense.
{% set name = "tokenizers" %}
{% set version = "0.13.1" %}

package:
  name: {{ name|lower }}
  name: tokenizers
Can we keep these, please?
I'm not a fan of these, because it's a "fake" variable - it can never be changed. It may help for the initial recipe generation, but serves no purpose afterward, and makes reading the recipe (and copying the source URL, e.g. for calculating a new hash) harder. If you insist, I'll change it, just very reluctantly 🙃
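For readers unfamiliar with the hash step mentioned above, here is a minimal sketch of how a new `sha256` for the recipe could be computed. The URL pattern and version come from the recipe in this PR; the helper names `source_url` and `sha256_of_stream` are hypothetical and only for illustration, and this is not necessarily how the maintainers compute it.

```python
import hashlib
import io

# URL pattern taken from the recipe under review; {version} stands in for the Jinja variable.
URL_TEMPLATE = "https://github.com/huggingface/tokenizers/archive/refs/tags/python-v{version}.tar.gz"


def source_url(version: str) -> str:
    """Render the source URL for a given version (hypothetical helper)."""
    return URL_TEMPLATE.format(version=version)


def sha256_of_stream(stream, chunk_size: int = 1 << 20) -> str:
    """Hash a binary stream in chunks so large tarballs need not fit in memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


# Offline demonstration on in-memory bytes.
print(source_url("0.13.1"))
print(sha256_of_stream(io.BytesIO(b"abc")))

# To hash the real tarball (needs network access):
#   from urllib.request import urlopen
#   with urlopen(source_url("0.13.1")) as response:
#       print(sha256_of_stream(response))
```

With the URL written out in full in the recipe, it can be copied straight into such a step without first resolving `{{ name }}` by hand, which is the readability point made in the comment above.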
  url: https://github.com/huggingface/tokenizers/archive/refs/tags/python-v{{ version }}.tar.gz
  sha256: 41cff8c8c87ba6dfbd9eb1d89b006aabb9c9823ffd09e281d6ddfb9ae695bd1a
  patches:
    - patches/0001-don-t-fork-on-windows.patch  # [win]
Nit:
-    - patches/0001-don-t-fork-on-windows.patch  # [win]
+    - patches/0001-dont-fork-on-windows.patch  # [win]
This is generated from the git commit messages with git format-patch, it makes no sense to police these IMO (though if you want I can change the commit message of the patch).
Not policing it, just recommending what I think is cleaner. But that's fine.
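To illustrate why the apostrophe shows up as a hyphen in the file name, here is a rough approximation of how `git format-patch` derives a patch file name from a commit subject. This is a simplified sketch, not git's actual implementation (which has additional rules, such as length limits), and the commit subject used below is an assumption inferred from the file name.

```python
import re


def approx_patch_filename(number: int, subject: str) -> str:
    """Approximate a format-patch style file name (hypothetical helper)."""
    # Replace each run of characters other than letters, digits, '.' and '_' with one '-'.
    slug = re.sub(r"[^A-Za-z0-9._]+", "-", subject)
    # Drop separators left over at either end.
    slug = slug.strip("-.")
    return f"{number:04d}-{slug}.patch"


# Assumed commit subject; the apostrophe becomes a hyphen.
print(approx_patch_filename(1, "don't fork on windows"))
# Rewording the subject would change the generated name.
print(approx_patch_filename(1, "dont fork on windows"))
```

Under this approximation, getting the suggested file name means rewording the commit message and regenerating the patch, which matches the offer made in the thread.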
    - {{ PYTHON }} -m pip install . -vv

requirements:
  build:
    - python                                 # [build_platform != target_platform]
    - cross-python_{{ target_platform }}     # [build_platform != target_platform]
    - openssl                                # [build_platform != target_platform]
Do we need to pin the version here?
No, that's taken care of by the global pinning. :)
  requires:
    - pip
    - pytest
    - datasets
    - numpy *
I hadn't seen the * notation in a recipe, TIL :).
It keeps conda-smithy from spuriously thinking it needs to compile the recipe against different numpy versions. :)
Neat!
    # adapted from https://github.com/huggingface/tokenizers/blob/master/bindings/python/Makefile
    - mkdir data
    - curl https://norvig.com/big.txt > data/big.txt
    {% set tests_to_skip = "_not_a_real_test" %}
    # windows and expectation of forking -> not gonna happen
    {% set tests_to_skip = tests_to_skip + " or with_parallelism" %}  # [win]
    - pytest -v tests -k "not ({{ tests_to_skip }})"
Nice!
Thanks for the review! :)
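For anyone reading the test section of the recipe later, the skip logic can be sketched in plain Python. The placeholder `_not_a_real_test` is from the recipe; the function name `build_k_expression` and the `is_win` flag (standing in for the `# [win]` selector) are hypothetical and only for illustration.

```python
def build_k_expression(is_win: bool) -> str:
    """Mimic how the recipe assembles the pytest -k filter (hypothetical helper)."""
    # Start from a name that matches no test, so every later addition
    # can be appended uniformly as " or <name>" without special-casing the first one.
    tests_to_skip = "_not_a_real_test"
    if is_win:
        # Stand-in for the "# [win]" selector: the parallelism tests expect forking.
        tests_to_skip = tests_to_skip + " or with_parallelism"
    return f"not ({tests_to_skip})"


# The expression is what the recipe passes as: pytest -v tests -k "<expression>"
print(build_k_expression(is_win=False))
print(build_k_expression(is_win=True))
```

The placeholder keeps the expression valid even when no platform-specific skips apply, since an empty `not ()` would not be a usable `-k` filter.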
@setu4993
Looks great, @h-vetinari! Thanks for contributing and glad to have you maintain this alongside.
Sorry it took a few days.
No worries! :) Thanks for the review & merging!
Just saw that this broke due to a pretty banal
I think I'll fix it directly in #56
Pick up unmerged improvements from #40