Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Run upstream test suite #55

Merged
merged 10 commits into from
Oct 28, 2022
Merged

Run upstream test suite #55

merged 10 commits into from
Oct 28, 2022

Conversation

h-vetinari
Copy link
Member

Pick up unmerged improvements from #40

@conda-forge-linter
Copy link
Contributor

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe) and found it was in an excellent condition.

@h-vetinari
Copy link
Member Author

Not sure why this is picking up openssl3 already, which depends on arrow being migrated.

@h-vetinari
Copy link
Member Author

@conda-forge/tokenizers
I worked a bunch to get python 3.10 to work (i.e. huggingface/tokenizers#956), but it took a while for a new release to materialise. Now that #53 is in, I wanted to pick up the improvements from #40 again, and am willing to add myself as a maintainer (also happy to take care of the OpenSSL 3 migration).

PTAL 🙃

@h-vetinari h-vetinari mentioned this pull request Oct 24, 2022
Copy link
Member

@setu4993 setu4993 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @h-vetinari! This makes sense to me. 1 question (inline) to confirm why we're switching from PyPI to GitHub, but the rest looks good to me.

Happy to have you help maintain this package :).

version: {{ version }}

source:
url: https://pypi.io/packages/source/{{ name[0] }}/{{ name }}/{{ name }}-{{ version }}.tar.gz
sha256: 3333d1cee5c8f47c96362ea0abc1f81c77c9b92c6c3d11cbf1d01985f0d5cf1d
url: https://github.com/huggingface/tokenizers/archive/refs/tags/python-v{{ version }}.tar.gz
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the goal to switch from PyPI to GitHub so that we can include and run the upstream test suite?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, because the tests are generally not part of the tarball (I don't specifically remember if I tested this for this feedstock, but I consider it cleaner than PyPI sources because there's one less layer where stuff can be spuriously changed).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, makes sense.

Comment on lines -1 to +4
{% set name = "tokenizers" %}
{% set version = "0.13.1" %}

package:
name: {{ name|lower }}
name: tokenizers
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we keep these, please?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a fan of these, because it's a "fake" variable - it can never be changed. It may help for the initial recipe generation, but serves no purpose afterward, and makes reading the recipe (and copying the source URL, e.g. for calculating a new hash) harder. If you insist, I'll change it, just very reluctantly 🙃

url: https://github.com/huggingface/tokenizers/archive/refs/tags/python-v{{ version }}.tar.gz
sha256: 41cff8c8c87ba6dfbd9eb1d89b006aabb9c9823ffd09e281d6ddfb9ae695bd1a
patches:
- patches/0001-don-t-fork-on-windows.patch # [win]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit:

Suggested change
- patches/0001-don-t-fork-on-windows.patch # [win]
- patches/0001-dont-fork-on-windows.patch # [win]

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is generated from the git commit messages with git format-patch, it makes no sense to police these IMO (though if you want I can change the commit message of the patch).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not policing it, just recommending what I think is cleaner. But that's fine.

- {{ PYTHON }} -m pip install . -vv

requirements:
build:
- python # [build_platform != target_platform]
- cross-python_{{ target_platform }} # [build_platform != target_platform]
- openssl # [build_platform != target_platform]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to pin the version here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, that's taken care of by the global pinning. :)

requires:
- pip
- pytest
- datasets
- numpy *
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hadn't seen the * notation in a recipe, TIL :).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It keeps conda smithy from spuriously thinking it needs to compile the recipe against different numpy versions. :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neat!

Comment on lines +72 to +78
# adapted from https://github.com/huggingface/tokenizers/blob/master/bindings/python/Makefile
- mkdir data
- curl https://norvig.com/big.txt > data/big.txt
{% set tests_to_skip = "_not_a_real_test" %}
# windows and expectation of forking -> not gonna happen
{% set tests_to_skip = tests_to_skip + " or with_parallelism" %} # [win]
- pytest -v tests -k "not ({{ tests_to_skip }})"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

Copy link
Member Author

@h-vetinari h-vetinari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review! :)

version: {{ version }}

source:
url: https://pypi.io/packages/source/{{ name[0] }}/{{ name }}/{{ name }}-{{ version }}.tar.gz
sha256: 3333d1cee5c8f47c96362ea0abc1f81c77c9b92c6c3d11cbf1d01985f0d5cf1d
url: https://github.com/huggingface/tokenizers/archive/refs/tags/python-v{{ version }}.tar.gz
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, because the tests are generally not part of the tarball (I don't specifically remember if I tested this for this feedstock, but I consider it cleaner than PyPI sources because there's one less layer where stuff can be spuriously changed).

Comment on lines -1 to +4
{% set name = "tokenizers" %}
{% set version = "0.13.1" %}

package:
name: {{ name|lower }}
name: tokenizers
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a fan of these, because it's a "fake" variable - it can never be changed. It may help for the initial recipe generation, but serves no purpose afterward, and makes reading the recipe (and copying the source URL, e.g. for calculating a new hash) harder. If you insist, I'll change it, just very reluctantly 🙃

url: https://github.com/huggingface/tokenizers/archive/refs/tags/python-v{{ version }}.tar.gz
sha256: 41cff8c8c87ba6dfbd9eb1d89b006aabb9c9823ffd09e281d6ddfb9ae695bd1a
patches:
- patches/0001-don-t-fork-on-windows.patch # [win]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is generated from the git commit messages with git format-patch, it makes no sense to police these IMO (though if you want I can change the commit message of the patch).

- {{ PYTHON }} -m pip install . -vv

requirements:
build:
- python # [build_platform != target_platform]
- cross-python_{{ target_platform }} # [build_platform != target_platform]
- openssl # [build_platform != target_platform]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, that's taken care of by the global pinning. :)

requires:
- pip
- pytest
- datasets
- numpy *
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It keeps conda smithy from spuriously thinking it needs to compile the recipe against different numpy versions. :)

@h-vetinari h-vetinari mentioned this pull request Oct 27, 2022
@h-vetinari
Copy link
Member Author

@setu4993
Are you satisfied with my answers or could you give this another look please? :)

Copy link
Member

@setu4993 setu4993 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, @h-vetinari! Thanks for contributing and glad to have you maintain this alongside.

version: {{ version }}

source:
url: https://pypi.io/packages/source/{{ name[0] }}/{{ name }}/{{ name }}-{{ version }}.tar.gz
sha256: 3333d1cee5c8f47c96362ea0abc1f81c77c9b92c6c3d11cbf1d01985f0d5cf1d
url: https://github.com/huggingface/tokenizers/archive/refs/tags/python-v{{ version }}.tar.gz
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, makes sense.

url: https://github.com/huggingface/tokenizers/archive/refs/tags/python-v{{ version }}.tar.gz
sha256: 41cff8c8c87ba6dfbd9eb1d89b006aabb9c9823ffd09e281d6ddfb9ae695bd1a
patches:
- patches/0001-don-t-fork-on-windows.patch # [win]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not policing it, just recommending what I think is cleaner. But that's fine.

requires:
- pip
- pytest
- datasets
- numpy *
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neat!

@setu4993 setu4993 merged commit 628815d into conda-forge:main Oct 28, 2022
@setu4993
Copy link
Member

Sorry it took a few days.

@h-vetinari h-vetinari deleted the tests branch October 28, 2022 03:43
@h-vetinari
Copy link
Member Author

Sorry it took a few days.

No worries! :)

Thanks for the review & merging!

@h-vetinari
Copy link
Member Author

Just saw that this broke due to a pretty banal pip check...

+ pip check
multiprocess 0.70.14 has requirement dill>=0.3.6, but you have dill 0.3.5.1.

I think I'll fix it directly in #56

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants