Prepare for cross validation-based benchmarking #60

Merged
merged 19 commits into from
Apr 3, 2022
Conversation

@takuti takuti commented Mar 21, 2022

Review and tweak cross_validation and evaluate for #26

codecov-commenter commented Mar 22, 2022

Codecov Report

Merging #60 (79a757a) into master (3d7ed2e) will increase coverage by 0.09%.
The diff coverage is 98.33%.

@@            Coverage Diff             @@
##           master      #60      +/-   ##
==========================================
+ Coverage   80.14%   80.24%   +0.09%     
==========================================
  Files          26       26              
  Lines         801      815      +14     
==========================================
+ Hits          642      654      +12     
- Misses        159      161       +2     
Impacted Files Coverage Δ
src/metrics/base.jl 0.00% <0.00%> (ø)
src/metrics/ranking.jl 95.08% <94.44%> (-1.15%) ⬇️
src/base_recommender.jl 96.00% <100.00%> (ø)
src/baseline/co_occurrence.jl 100.00% <100.00%> (ø)
src/baseline/item_mean.jl 100.00% <100.00%> (ø)
src/baseline/most_popular.jl 100.00% <100.00%> (ø)
src/baseline/threshold_percentage.jl 100.00% <100.00%> (ø)
src/baseline/user_mean.jl 100.00% <100.00%> (ø)
src/data_accessor.jl 100.00% <100.00%> (ø)
src/evaluation/cross_validation.jl 100.00% <100.00%> (ø)
... and 11 more

Continue to review full report at Codecov.

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3d7ed2e...79a757a.

`evaluate()` unnecessarily made predictions for all user-item pairs.
The comparison must be made only between truth and predictions.
The list of item-score tuples returned from `recommend` is already sorted by
score when a recommender is evaluated by a ranking metric.
`truth` must be a ranked list of observed items for correct evaluation.
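The truth-vs-pred comparison described above can be sketched as follows. This is an illustrative Python sketch, not the package's actual Julia implementation; `recall_at_k` is a hypothetical name, and `pred` is assumed to be already sorted by score as the commit notes:

```python
def recall_at_k(truth, pred, k):
    """Fraction of truly observed items recovered in the top-k predictions."""
    if not truth:
        return 0.0  # empty truth list: nothing to recover
    top_k = pred[:k]  # pred is assumed to be sorted by score already
    hits = sum(1 for item in top_k if item in truth)
    return hits / len(truth)

truth = [3, 1, 7]       # ranked list of observed items
pred = [1, 5, 3, 2, 7]  # item IDs sorted by predicted score
print(recall_at_k(truth, pred, 3))  # 2 of 3 observed items appear in the top-3
```

Because the metric only consults `truth` and the top of `pred`, predicting scores for every user-item pair is wasted work, which is the inefficiency the commit removes from `evaluate()`.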
Cross validation involves some randomness, so it may occasionally return a very
poor or very good result.
Adjust the cross validation test cases to increase the probability of seeing
an empty `truth` list.
If `n` equals the number of samples, `n`-fold CV is the same as
LOOCV.
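The equivalence is easy to see from how fold indices are built. A minimal Python sketch (the real package splits data in Julia; `kfold_indices` is a hypothetical helper):

```python
def kfold_indices(n_samples, n_folds):
    """Split sample indices 0..n_samples-1 into n_folds disjoint test folds."""
    indices = list(range(n_samples))
    fold_size, remainder = divmod(n_samples, n_folds)
    folds, start = [], 0
    for i in range(n_folds):
        # spread the remainder over the first few folds
        size = fold_size + (1 if i < remainder else 0)
        folds.append(indices[start:start + size])
        start += size
    return folds

# With n_folds == n_samples, every test fold holds exactly one sample,
# which is leave-one-out cross validation (LOOCV).
print(kfold_indices(5, 5))  # [[0], [1], [2], [3], [4]]
```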
Top-k recommendation for every single user is costly. It is
recommended to parallelize whenever possible.
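Since each user's top-k list is computed independently, the per-user loop parallelizes trivially. A hedged Python sketch of the idea (the package itself is Julia; `recommend_top_k`, its scoring rule, and `recommend_all` are illustrative placeholders):

```python
from concurrent.futures import ThreadPoolExecutor

def recommend_top_k(user_id, k=10):
    """Placeholder for an expensive per-user top-k recommendation."""
    # hypothetical scoring: score every candidate item, keep the top-k
    scores = [((user_id * item) % 97, item) for item in range(100)]
    return [item for _, item in sorted(scores, reverse=True)[:k]]

def recommend_all(user_ids, k=10, workers=4):
    # each user's top-k is independent, so the loop parallelizes trivially;
    # map preserves the input order of user_ids
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda u: recommend_top_k(u, k), user_ids))
```

A process pool (or, in Julia, threads or `pmap`) would suit CPU-bound scoring better; a thread pool is used here only to keep the sketch self-contained.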
@takuti takuti changed the title Benchmark with all {recommender, metric, dataset} pairs Prepare for cross validation-based benchmarking Apr 3, 2022
by checking the size of test samples
@takuti takuti merged commit 6082408 into master Apr 3, 2022
@takuti takuti deleted the notebook branch April 3, 2022 14:54