processing dataset errors #6
Comments
I have the same errors. I am using Amazon Review 2018 and downloaded both the review data and the metadata. The video domain matches the paper's statistics, but the movie domain's numbers are inconsistent: I get 311,143 / 86,678, which is not the same as the paper's 297,498 / 59,944.
@xingjinshuo We used the Luxury_Beauty dataset, not All_Beauty. The preprocessing thresholds vary for each dataset; I believe I did not implement automatic threshold selection based on the dataset. You should check the value of the threshold, which is the minimum number of interactions.
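The threshold described above is typically applied as an iterative k-core filter: users and items with fewer than the minimum number of interactions are dropped, and counts are recomputed until the dataset is stable. A minimal sketch of that idea (the function and variable names here are illustrative, not taken from the repo's `data_preprocess.py`):

```python
from collections import Counter

def kcore_filter(interactions, min_inter=4):
    """Iteratively drop users/items with fewer than `min_inter` interactions.

    `interactions` is a list of (user, item) pairs. Dropping a user can
    push an item below the threshold (and vice versa), so we loop until
    no pair is removed.
    """
    while True:
        user_cnt = Counter(u for u, _ in interactions)
        item_cnt = Counter(i for _, i in interactions)
        kept = [(u, i) for u, i in interactions
                if user_cnt[u] >= min_inter and item_cnt[i] >= min_inter]
        if len(kept) == len(interactions):
            return kept
        interactions = kept
```

Because the loop runs to a fixed point, a slightly different threshold can cascade into a very different final user/item count, which would explain the mismatches with the paper's statistics reported in this thread.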
@ghdtjr I understand, thanks for your answer. Is "Toys" the "Toys and Games" category in the Amazon review dataset?
@xingjinshuo You're right. The "Toys" dataset means "Toys and Games".
Hello, your paper mentions that you select 30K users (the paper writes 3K, which may be a typo), but the whole dataset after 4-core filtering is larger than 30K. Do you select the users randomly, or by some other method? Thank you for your reply!
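If the users are indeed subsampled after 4-core filtering, as the question above asks (the repo does not confirm this), one common approach is a seeded random sample of user IDs. A hedged sketch, with all names hypothetical:

```python
import random

def sample_users(interactions, n_users=30000, seed=0):
    """Keep interactions belonging to a random subset of at most `n_users` users.

    Illustrative only: the paper/repo may use a different selection
    procedure. Sorting before sampling makes the result reproducible
    for a fixed seed regardless of input order of the pairs.
    """
    users = sorted({u for u, _ in interactions})
    rng = random.Random(seed)
    keep = set(rng.sample(users, min(n_users, len(users))))
    return [(u, i) for u, i in interactions if u in keep]
```

Note that sampling users after k-core filtering can leave some items below the core threshold again, so a pipeline may need to re-run the filter afterwards.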
Where can I find the Beauty and Toys datasets? I found All_Beauty and Toys_and_Games at https://datarepo.eng.ucsd.edu/mcauley_group/data/amazon_v2/categoryFiles/, but the number of users and items obtained after processing with data_preprocess.py is inconsistent with the article.