-
Notifications
You must be signed in to change notification settings - Fork 57
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closes #63 | Create dataloader for MongabayConservation #538
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello @megasiska86, thank you for implementing the dataloader for MongabayConservation.
Tested and works fine! Just some minor fix to the nits, which are provided in the suggestion.
Also, it's okay to remove the README.md as the instruction to call the data will be provided in the SEACrowd catalogue later.
(positive, neutral, negative) based on related topics. | ||
""" | ||
|
||
_HOMEPAGE = "" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_HOMEPAGE = "" | |
_HOMEPAGE = "https://huggingface.co/datasets/Datasaur/mongabay-experiment" |
|
||
_HOMEPAGE = "" | ||
|
||
_LICENSE = "The Unlicense (unlicense)" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_LICENSE = "The Unlicense (unlicense)" | |
_LICENSE = Licenses.UNLICENSE.value |
|
||
from seacrowd.utils import schemas | ||
from seacrowd.utils.configs import SEACrowdConfig | ||
from seacrowd.utils.constants import Tasks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from seacrowd.utils.constants import Tasks | |
from seacrowd.utils.constants import Tasks, Licenses |
Here is my general review of the dataloader:
Here are my comments of the dataset:
tag-classification test result
sentiment-classification test result
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some nitpicks
), | ||
] | ||
|
||
DEFAULT_CONFIG_NAME = f"{_DATASETNAME}_source" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be one of the config name defined previously.
_HOMEPAGE = "" | ||
|
||
_LICENSE = "The Unlicense (unlicense)" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_LOCAL = False | |
In that vein, I also have a question. As the dataset owner and the paper author, is there a reason why you used separate URLs for the subsets (e.g., Mongabay-tags-classification, Mongabay-sentiment-classification, mongabay-experiment) rather than using this unified dataset URL, @megasiska86? cc: @akhdanfadh @jen-santoso |
Thank you for the review.
cc: @akhdanfadh |
Nice concern @holylovenia |
I see. Could you use mongabay_collectionas the URL for this dataloader for simplicity? |
A friendly reminder for @megasiska86 to address the suggestions. 🙏 |
Hi @megasiska86, is there anything we can help you with for the dataloader? |
Okay, will do it in the end of this week. thank you |
Thanks @megasiska86! Feel free to ask @akhdanfadh and @jen-santoso if you have any questions or concerns! |
Hi @megasiska86, I would like to let you know that we plan to finalize the calculation of the open contributions (e.g., dataloader implementations) by 30 May, so it'd be great if we could wrap up the reviewing and merge this PR before then. |
Hi @megasiska86, I would like to let you know that we plan to finalize the calculation of the open contributions (e.g., dataloader implementations) in 31 hours, so it'd be great if we could wrap up the reviewing and merge this PR before then. |
Hi @megasiska86, thank you for contributing to SEACrowd! I would like to let you know that we are still looking forward to completing this PR (and dataloader issues) and maintaining SEACrowd Data Hub. We hope to enable access to as many standardized dataloaders as possible for SEA datasets. Feel free to continue the PR whenever you're available, and if you would like to re-assign this dataloader to someone else, just let us know and we can help. 💪 Thanks again! cc: @akhdanfadh @jen-santoso |
Please name your PR title and the first line of PR message after the issue it will close. You can use the following examples:
Title: Closes #63| Add/Update Dataloader Mongabay
First line PR Message: Closes #63
where you replace the {ISSUE_NUMBER} with the one corresponding to your dataset.
Checkbox
seacrowd/sea_datasets/{my_dataset}/{my_dataset}.py
(please use only lowercase and underscore for dataset folder naming, as mentioned in dataset issue) and its__init__.py
within{my_dataset}
folder._CITATION
,_DATASETNAME
,_DESCRIPTION
,_HOMEPAGE
,_LICENSE
,_LOCAL
,_URLs
,_SUPPORTED_TASKS
,_SOURCE_VERSION
, and_SEACROWD_VERSION
variables._info()
,_split_generators()
and_generate_examples()
in dataloader script.BUILDER_CONFIGS
class attribute is a list with at least oneSEACrowdConfig
for the source schema and one for a seacrowd schema.datasets.load_dataset
function.python -m tests.test_seacrowd seacrowd/sea_datasets/<my_dataset>/<my_dataset>.py
orpython -m tests.test_seacrowd seacrowd/sea_datasets/<my_dataset>/<my_dataset>.py --subset_id {subset_name_without_source_or_seacrowd_suffix}
.