Create dataset loader for MongabayConservationDataset #63

Open
SamuelCahyawijaya opened this issue Nov 19, 2023 · 27 comments · May be fixed by #538
Labels: help wanted, pr-ready

Comments

@SamuelCahyawijaya
Collaborator

SamuelCahyawijaya commented Nov 19, 2023

Dataloader name: mongabay/mongabay.py
DataCatalogue: http://seacrowd.github.io/seacrowd-catalogue/card.html?mongabay

| Dataset | mongabay |
| --- | --- |
| Description | A conservation dataset collected from mongabay.co.id, containing a topic-classification task (multi-label format) and a sentiment-classification task. The dataset covers 31 important topics that are commonly found in Indonesian conservation articles or general news, and each article can belong to more than one topic. After the topics for an article are gathered, the article is classified into one of the author's sentiments (positive, neutral, negative) based on the related topics. |
| Subsets | Multi-label, Sentiment-classification |
| Languages | ind |
| Tasks | Sentiment Analysis, Topic Modeling |
| License | The Unlicense (unlicense) |
| Homepage | https://huggingface.co/datasets/Datasaur/mongabay-experiment |
| HF URL | https://huggingface.co/datasets/Datasaur/mongabay-experiment |
| Paper URL | https://arxiv.org/pdf/2310.11258.pdf |
@SamuelCahyawijaya SamuelCahyawijaya converted this from a draft issue Nov 19, 2023
@elyanah-aco
Collaborator

elyanah-aco commented Nov 19, 2023

#self-assign

@megasiska86

Hi all, sorry for the late response. I think it would be better if I assigned myself to create the dataloader for this dataset.
But it's also okay if @elyanah-aco has already self-assigned to build it. Please let me know if anyone needs my help or an explanation regarding this dataset. Thank you all.

@elyanah-aco elyanah-aco removed their assignment Nov 22, 2023
@elyanah-aco
Collaborator

Hello @megasiska86, no problem if you want to handle the dataloader for this (I haven't started on it anyways). You can assign yourself now

@megasiska86

Thank you, @elyanah-aco. By the way, how do I assign myself here? I can't click the Assignees column.
Screenshot from 2023-11-22 12-36-13

@elyanah-aco
Collaborator

elyanah-aco commented Nov 22, 2023

@megasiska86 Please comment "#self-assign", just like I did in my first comment.

@megasiska86

#self-assign

@holylovenia
Contributor

holylovenia commented Dec 10, 2023

Hi @megasiska86, may I know if you are still working on this issue? Please let @holylovenia @SamuelCahyawijaya @sabilmakbar know if you need any help.

@megasiska86

Yeah, still working on it.

@holylovenia
Contributor

Okay then, @megasiska86. Feel free to let us know if you need any help!


github-actions bot commented Jan 2, 2024

Hi, may I know if you are still working on this issue? Please let @holylovenia @SamuelCahyawijaya @sabilmakbar know if you need any help.

@sabilmakbar
Collaborator

Hi @megasiska86, may I know the progress on this dataloader? Since the expected 2+2 weeks for completing the dataloader have passed, I will clear the assignee if no update is received by Monday 12 PM UTC.

@sabilmakbar sabilmakbar added the help wanted label Jan 15, 2024
@Enliven26
Contributor

#self-assign


Hi @, may I know if you are still working on this issue? Please let @holylovenia @SamuelCahyawijaya @sabilmakbar know if you need any help.

@Enliven26
Contributor

yes

@Enliven26
Contributor

Enliven26 commented Feb 24, 2024

There is a problem with the softlabel column in the training data. For the "TEXT" schema feature, the possible values for the string must be specified. For the validation and test data, these are ["negatif", "positif", "netral"]. However, the softlabel column in the training data is an array of floats, and I currently don't know what rule to use to turn it into one of the 3 possible values. How can I resolve this issue, @sabilmakbar? I might also need help with the citation.

@Enliven26
Contributor

Also, for the multi-label subset, the softlabel in the validation and test data is also an array.
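
One common way to collapse a soft label into a single class is to take the index of the largest value. The sketch below illustrates that idea only; it is not the rule confirmed by the dataset owner or the paper. The function name `softlabel_to_label` is hypothetical, and the label order is an assumption borrowed from the list quoted above, so it would need to be checked against the dataset before use.

```python
from typing import Sequence

# ASSUMPTION: the position of each float in the softlabel array maps to the
# labels in this order. The thread does not confirm the actual order.
ASSUMED_LABEL_ORDER = ["negatif", "positif", "netral"]


def softlabel_to_label(
    softlabel: Sequence[float],
    labels: Sequence[str] = ASSUMED_LABEL_ORDER,
) -> str:
    """Return the label whose soft score is largest (argmax)."""
    if len(softlabel) != len(labels):
        raise ValueError(
            f"expected {len(labels)} scores, got {len(softlabel)}"
        )
    # max() over indices keeps the first index when scores are tied.
    best_index = max(range(len(softlabel)), key=lambda i: softlabel[i])
    return labels[best_index]
```

For example, `softlabel_to_label([0.1, 0.2, 0.7])` returns `"netral"` under the assumed order. The same argmax would not suit the multi-label subset, where several topics can be active at once.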

@holylovenia
Contributor

holylovenia commented Feb 25, 2024

> There is a problem with the softlabel column in the training data. For the "TEXT" schema feature, the possible values for the string must be specified. For the validation and test data, these are ["negatif", "positif", "netral"]. However, the softlabel column in the training data is an array of floats, and I currently don't know what rule to use to turn it into one of the 3 possible values. How can I resolve this issue, @sabilmakbar? I might also need help with the citation.
>
> Also, for the multi-label subset, the softlabel in the validation and test data is also an array.

Hi @megasiska86, could you please help answer @Enliven26's questions as the dataset owner?

I also have a related question, @megasiska86. The huggingface dataset doesn't seem to have the labels for the topic classification task, so where do we get them?

@Enliven26
Contributor

Enliven26 commented Mar 7, 2024

May I get an update on the question? I apologize, as I haven't had time to read the paper to find the rule for converting the softlabel into a single-value label. Also, since the label in the tags classification subset is an array of tags, I think the "TEXT" schema can't be used (?).

@holylovenia
Contributor

> > There is a problem with the softlabel column in the training data. For the "TEXT" schema feature, the possible values for the string must be specified. For the validation and test data, these are ["negatif", "positif", "netral"]. However, the softlabel column in the training data is an array of floats, and I currently don't know what rule to use to turn it into one of the 3 possible values. How can I resolve this issue, @sabilmakbar? I might also need help with the citation.
> >
> > Also, for the multi-label subset, the softlabel in the validation and test data is also an array.
>
> Hi @megasiska86, could you please help answer @Enliven26's questions as the dataset owner?
>
> I also have a related question, @megasiska86. The huggingface dataset doesn't seem to have the labels for the topic classification task, so where do we get them?

Let me try mentioning @megasiska86 again in case she missed it, @Enliven26.

@megasiska86

megasiska86 commented Mar 17, 2024

I apologize for missing this for so long, @holylovenia.
Thank you for your help, @Enliven26.

I've updated the homepage of my dataset.
It can be checked here.

@megasiska86

> May I get an update on the question? I apologize, as I haven't had time to read the paper to find the rule for converting the softlabel into a single-value label. Also, since the label in the tags classification subset is an array of tags, I think the "TEXT" schema can't be used (?).

Okay, I'll try moving it to the text2text schema. Thank you.
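
A rough sketch of what moving an array of tags into a text-to-text style record could look like. Everything here is illustrative: the helper name `tags_to_text2text`, the separator, and the sample tags are hypothetical, and the field names (`id`, `text_1`, `text_2`, `text_1_name`, `text_2_name`) are an assumption about the schema layout that should be checked against the SEACrowd schema definitions rather than taken as given.

```python
from typing import Dict, List


def tags_to_text2text(
    example_id: str,
    text: str,
    tags: List[str],
    sep: str = ", ",
) -> Dict[str, str]:
    """Flatten a list of tags into one target string for a text-to-text record."""
    return {
        "id": example_id,
        "text_1": text,              # source side: the article text
        "text_2": sep.join(tags),    # target side: tags joined into one string
        "text_1_name": "text",       # ASSUMED field names, see the note above
        "text_2_name": "tags",
    }


# Hypothetical usage with made-up tags:
row = tags_to_text2text("0", "Contoh artikel", ["hutan", "satwa"])
```

Joining the tags loses the list structure, so a consumer would need to split on the same separator to recover the individual tags.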

@megasiska86

#self-assign

@megasiska86

Hi @holylovenia,
I've finished the dataloader, but I got this access error when pushing it. May I ask for write access again? Thank you very much.

git@github.com: Permission denied (publickey).
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.

@holylovenia
Contributor

holylovenia commented Mar 18, 2024

> Hi @holylovenia, I've finished the dataloader, but I got this access error when pushing it. May I ask for write access again? Thank you very much.
>
> git@github.com: Permission denied (publickey).
> fatal: Could not read from remote repository.
>
> Please make sure you have the correct access rights
> and the repository exists.

Hi @megasiska86, can you make a pull request instead of trying to push to master? Please take a look at the guide here. Thanks for your contribution, @megasiska86.

On another note, this issue had already been #self-assigned by @Enliven26 when @megasiska86 #self-assigned it. In this case I let it pass because it seems that @Enliven26 hasn't started implementing the dataloader, but next time please only #self-assign dataloader issues with no assignee, @megasiska86.

Sorry for the inconvenience, @Enliven26. 🙏

@megasiska86

> Hi @megasiska86, can you make a pull request instead of trying to push to master? Please take a look at the guide here. Thanks for your contribution, @megasiska86.

Yeah, I tried to push it to a new branch and planned to create a PR from that branch, but I got this permission error. The same permission error also comes up when I try to clone the repository.

> On another note, this issue had already been #self-assigned by @Enliven26 when @megasiska86 #self-assigned it. In this case I let it pass because it seems that @Enliven26 hasn't started implementing the dataloader, but next time please only #self-assign dataloader issues with no assignee, @megasiska86.
>
> Sorry for the inconvenience, @Enliven26. 🙏

Really sorry for the inconvenience 🙏. I tried to #self-assign because I thought it would give me access to push to the branch, but it didn't. I also created the dataloader to solve the issue raised in this comment:

> May I get an update on the question? I apologize, as I haven't had time to read the paper to find the rule for converting the softlabel into a single-value label. Also, since the label in the tags classification subset is an array of tags, I think the "TEXT" schema can't be used (?).

Again, sorry for the inconvenience.

@holylovenia
Contributor

> > Hi @megasiska86, can you make a pull request instead of trying to push to master? Please take a look at the guide here. Thanks for your contribution, @megasiska86.
>
> Yeah, I tried to push it to a new branch and planned to create a PR from that branch, but I got this permission error. The same permission error also comes up when I try to clone the repository.

Have you tried forking (not cloning) the SEACrowd/seacrowd-datahub repo? Here is a detailed guide on how to fork the repo and submit the dataloader.

@megasiska86

> > > Hi @megasiska86, can you make a pull request instead of trying to push to master? Please take a look at the guide here. Thanks for your contribution, @megasiska86.
> >
> > Yeah, I tried to push it to a new branch and planned to create a PR from that branch, but I got this permission error. The same permission error also comes up when I try to clone the repository.
>
> Have you tried forking (not cloning) the SEACrowd/seacrowd-datahub repo? Here is a detailed guide on how to fork the repo and submit the dataloader.

I will try it, thank you 🙏

@megasiska86 megasiska86 linked a pull request Mar 18, 2024 that will close this issue
@sabilmakbar sabilmakbar added the pr-ready label Mar 20, 2024