Merge pull request #2129 from recommenders-team/staging
Staging to main: Fix to NewsRec, LightFM to extras, issue with scipy
miguelgfierro authored Jul 10, 2024
2 parents 2f1d8ea + 3672c2e commit d333a0d
Showing 15 changed files with 59 additions and 22 deletions.
7 changes: 7 additions & 0 deletions .github/ISSUE_TEMPLATE.md
@@ -23,4 +23,11 @@
 <!--- * The tests for SAR PySpark should pass successfully. -->


+### Willingness to contribute
+<!--- Go over all the following points, and put an `x` in the box that apply. -->
+- [ ] Yes, I can contribute for this issue independently.
+- [ ] Yes, I can contribute for this issue with guidance from Recommenders community.
+- [ ] No, I cannot contribute at this time.
+
+
 ### Other Comments
6 changes: 6 additions & 0 deletions .github/ISSUE_TEMPLATE/bug_report.md
@@ -28,4 +28,10 @@ assignees: ''
 <!--- For example: -->
 <!--- * The tests for SAR PySpark should pass successfully. -->

+### Willingness to contribute
+<!--- Go over all the following points, and put an `x` in the box that apply. -->
+- [ ] Yes, I can contribute for this issue independently.
+- [ ] Yes, I can contribute for this issue with guidance from Recommenders community.
+- [ ] No, I cannot contribute at this time.
+
 ### Other Comments
6 changes: 6 additions & 0 deletions .github/ISSUE_TEMPLATE/feature_request.md
@@ -14,4 +14,10 @@ assignees: ''
 <!--- For example: -->
 <!--- *Adding algorithm xxx will help people understand more about xxx use case scenarios. -->

+### Willingness to contribute
+<!--- Go over all the following points, and put an `x` in the box that apply. -->
+- [ ] Yes, I can contribute for this issue independently.
+- [ ] Yes, I can contribute for this issue with guidance from Recommenders community.
+- [ ] No, I cannot contribute at this time.
+
 ### Other Comments
6 changes: 6 additions & 0 deletions .github/ISSUE_TEMPLATE/general-ask.md
@@ -10,4 +10,10 @@ assignees: ''
 ### Description
 <!--- Describe your general ask in detail -->

+### Willingness to contribute
+<!--- Go over all the following points, and put an `x` in the box that apply. -->
+- [ ] Yes, I can contribute for this issue independently.
+- [ ] Yes, I can contribute for this issue with guidance from Recommenders community.
+- [ ] No, I cannot contribute at this time.
+
 ### Other Comments
2 changes: 1 addition & 1 deletion README.md
@@ -94,7 +94,7 @@ The table below lists the recommendation algorithms currently available in the r
 | LightFM/Factorization Machine | Collaborative Filtering | Factorization Machine algorithm for both implicit and explicit feedbacks. It works in the CPU environment. | [Quick start](examples/02_model_collaborative_filtering/lightfm_deep_dive.ipynb) |
 | LightGBM/Gradient Boosting Tree<sup>*</sup> | Content-Based Filtering | Gradient Boosting Tree algorithm for fast training and low memory usage in content-based problems. It works in the CPU/GPU/PySpark environments. | [Quick start in CPU](examples/00_quick_start/lightgbm_tinycriteo.ipynb) / [Deep dive in PySpark](examples/02_model_content_based_filtering/mmlspark_lightgbm_criteo.ipynb) |
 | LightGCN | Collaborative Filtering | Deep learning algorithm which simplifies the design of GCN for predicting implicit feedback. It works in the CPU/GPU environment. | [Deep dive](examples/02_model_collaborative_filtering/lightgcn_deep_dive.ipynb) |
-| GeoIMC<sup>*</sup> | Collaborative Filtering | Matrix completion algorithm that has into account user and item features using Riemannian conjugate gradients optimization and following a geometric approach. It works in the CPU environment. | [Quick start](examples/00_quick_start/geoimc_movielens.ipynb) |
+| GeoIMC<sup>*</sup> | Collaborative Filtering | Matrix completion algorithm that takes into account user and item features using Riemannian conjugate gradient optimization and follows a geometric approach. It works in the CPU environment. | [Quick start](examples/00_quick_start/geoimc_movielens.ipynb) |
 | GRU | Collaborative Filtering | Sequential-based algorithm that aims to capture both long and short-term user preferences using recurrent neural networks. It works in the CPU/GPU environment. | [Quick start](examples/00_quick_start/sequential_recsys_amazondataset.ipynb) |
 | Multinomial VAE | Collaborative Filtering | Generative model for predicting user/item interactions. It works in the CPU/GPU environment. | [Deep dive](examples/02_model_collaborative_filtering/multi_vae_deep_dive.ipynb) |
 | Neural Recommendation with Long- and Short-term User Representations (LSTUR)<sup>*</sup> | Content-Based Filtering | Neural recommendation algorithm for recommending news articles with long- and short-term user interest modeling. It works in the CPU/GPU environment. | [Quick start](examples/00_quick_start/lstur_MIND.ipynb) |
@@ -22,6 +22,8 @@
 "source": [
 "This notebook explains the concept of a Factorization Machine based model for recommendation, it also outlines the steps to construct a pure matrix factorization and a Factorization Machine using the [LightFM](https://github.com/lyst/lightfm) package. It also demonstrates how to extract both user and item affinity from a fitted model.\n",
 "\n",
+"*NOTE: LightFM is not available in the core package of Recommenders, to run this notebook, install the experimental package with `pip install recommenders[experimental]`.*\n",
+"\n",
 "## 1. Factorization Machine model\n",
 "\n",
 "### 1.1 Background\n",
4 changes: 2 additions & 2 deletions pyproject.toml
@@ -2,12 +2,12 @@
 requires = [
     "setuptools>=52",
     "wheel>=0.36",
-    "numpy>=1.15",
+    "numpy>=1.15,<2",
 ]
 dependencies = [
     "setuptools>=52",
     "wheel>=0.36",
-    "numpy>=1.15",
+    "numpy>=1.15,<2",
 ]
 build-backend = "setuptools.build_meta"

2 changes: 1 addition & 1 deletion recommenders/datasets/movielens.py
@@ -582,7 +582,7 @@ def unique_columns(df, *, columns):
     return not df[columns].duplicated().any()


-class MockMovielensSchema(pa.SchemaModel):
+class MockMovielensSchema(pa.DataFrameModel):
     """
     Mock dataset schema to generate fake data for testing purpose.
     This schema is configured to mimic the Movielens dataset
2 changes: 1 addition & 1 deletion recommenders/datasets/pandas_df_utils.py
@@ -163,7 +163,7 @@ def fit(self, df, col_rating=DEFAULT_RATING_COL):
         types = df.dtypes
         if not all(
             [
-                x == object or np.issubdtype(x, np.integer) or x == np.float
+                x == object or np.issubdtype(x, np.integer) or x == float
                 for x in types
             ]
         ):
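For context on the `np.float` change: that alias was deprecated in NumPy 1.20 and removed in NumPy 1.24, and the builtin `float` compares equal to the `float64` dtype. The following is a minimal sketch of the same dtype check in isolation; the helper name `is_supported_dtype` is hypothetical and is not part of the library.

```python
import numpy as np


def is_supported_dtype(dtype):
    """Return True for object, integer, or float64 dtypes (mirrors the check in the hunk)."""
    # np.dtype("float64") == float is True, so the builtin replaces the removed np.float alias.
    return dtype == object or np.issubdtype(dtype, np.integer) or dtype == float


dtypes = [np.dtype("int64"), np.dtype("float64"), np.dtype("O"), np.dtype("bool")]
results = [is_supported_dtype(d) for d in dtypes]
```

Note that `float32` columns do not compare equal to `float`, which matches the behavior of the old `np.float` comparison.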
4 changes: 2 additions & 2 deletions recommenders/evaluation/python_evaluation.py
@@ -435,9 +435,9 @@ def merge_ranking_true_pred(

     # count the number of hits vs actual relevant items per user
     df_hit_count = pd.merge(
-        df_hit.groupby(col_user, as_index=False)[col_user].agg({"hit": "count"}),
+        df_hit.groupby(col_user, as_index=False)[col_user].agg(hit="count"),
         rating_true_common.groupby(col_user, as_index=False)[col_user].agg(
-            {"actual": "count"}
+            actual="count",
         ),
         on=col_user,
     )
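The keyword form `agg(hit="count")` is pandas named aggregation, where the keyword becomes the output column name. Below is a small sketch on toy data. It is a variant rather than a copy of the hunk: it counts a separate `itemID` column and calls `reset_index()` instead of using `as_index=False`, and the column names and values are made up for illustration.

```python
import pandas as pd

# Toy hit table: one row per (user, recommended item that was relevant).
df_hit = pd.DataFrame(
    {"userID": [1, 1, 2, 3, 3, 3], "itemID": [10, 11, 10, 12, 13, 14]}
)

# Named aggregation: the keyword ("hit") names the resulting column.
df_hit_count = df_hit.groupby("userID")["itemID"].agg(hit="count").reset_index()

hit_by_user = dict(
    zip(df_hit_count["userID"].tolist(), df_hit_count["hit"].tolist())
)
```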
3 changes: 2 additions & 1 deletion recommenders/models/deeprec/DataModel/ImplicitCF.py
@@ -80,6 +80,7 @@ def _data_processing(self, train, test):
         user_idx = df[[self.col_user]].drop_duplicates().reindex()
         user_idx[self.col_user + "_idx"] = np.arange(len(user_idx))
         self.n_users = len(user_idx)
+        self.n_users_in_train = train[self.col_user].nunique()
         self.user_idx = user_idx

         self.user2id = dict(
@@ -210,7 +211,7 @@ def sample_neg(x):
                 if neg_id not in x:
                     return neg_id

-        indices = range(self.n_users)
+        indices = range(self.n_users_in_train)
         if self.n_users < batch_size:
             users = [random.choice(indices) for _ in range(batch_size)]
         else:
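The second hunk restricts the sampled user indices to users that appear in the training set. A rough sketch of that idea using only the standard library follows. It assumes training users occupy the lowest indices (not shown in the diff), and the names `sample_users` and `train_items` are hypothetical.

```python
import random


def sample_users(n_users_in_train, batch_size, rng):
    """Draw a batch of user indices, restricted to users present in the training set."""
    indices = range(n_users_in_train)
    if n_users_in_train < batch_size:
        # Fewer distinct users than the batch size: sample with replacement.
        return [rng.choice(indices) for _ in range(batch_size)]
    # Enough users: sample without replacement.
    return rng.sample(indices, batch_size)


# Assumption: users 0..2 have training interactions; user 3 appears only in the test set.
train_items = {0: {5, 6}, 1: {7}, 2: {5}}
rng = random.Random(0)
small_batch = sample_users(len(train_items), batch_size=2, rng=rng)
large_batch = sample_users(len(train_items), batch_size=8, rng=rng)
```

The sketch compares the batch size against the training-user count, whereas the hunk keeps the comparison against `self.n_users`. If the elided `else` branch samples without replacement, a batch size between the two counts would ask `random.sample` for more items than `indices` holds, which raises `ValueError`.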
4 changes: 4 additions & 0 deletions recommenders/models/newsrec/models/base_model.py
@@ -186,6 +186,8 @@ def fit(
         valid_behaviors_file,
         test_news_file=None,
         test_behaviors_file=None,
+        step_limit=None,
+
     ):
         """Fit the model with train_file. Evaluate the model on valid_file per epoch to observe the training status.
         If test_news_file is not None, evaluate it too.
@@ -212,6 +214,8 @@ def fit(
             )

             for batch_data_input in tqdm_util:
+                if step_limit is not None and step>=step_limit:
+                    break

                 step_result = self.train(batch_data_input)
                 step_data_loss = step_result
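The `step_limit` parameter caps the number of training batches processed, which the smoke tests later in this commit use to keep runs short. A minimal standalone sketch of the pattern is below. It assumes `step` is incremented once per batch (the increment is not visible in the hunk), and the `fit` and `train_step` names here are illustrative stand-ins, not the library's signatures.

```python
def fit(batches, train_step, step_limit=None):
    """Run train_step over batches, stopping once step_limit steps have run."""
    step = 0
    losses = []
    for batch in batches:
        # None means "no cap"; otherwise stop before starting step number step_limit.
        if step_limit is not None and step >= step_limit:
            break
        losses.append(train_step(batch))
        step += 1
    return losses


losses_capped = fit(range(100), train_step=lambda b: b * 0.5, step_limit=10)
losses_full = fit(range(5), train_step=lambda b: b * 0.5)
```

Defaulting to `None` keeps existing callers unchanged, since they never pass the new argument.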
8 changes: 4 additions & 4 deletions setup.py
@@ -28,22 +28,21 @@

 install_requires = [
     "category-encoders>=2.6.0,<3", # requires packaging
-    "cornac>=1.15.2,<2", # requires packaging, tqdm
+    "cornac>=1.15.2,<3", # requires packaging, tqdm
     "hyperopt>=0.2.7,<1",
-    "lightfm>=1.17,<2", # requires requests
     "lightgbm>=4.0.0,<5",
     "locust>=2.12.2,<3", # requires jinja2
     "memory-profiler>=0.61.0,<1",
     "nltk>=3.8.1,<4", # requires tqdm
-    "notebook>=7.0.0,<8", # requires ipykernel, jinja2, jupyter, nbconvert, nbformat, packaging, requests
+    "notebook>=6.5.5,<8", # requires ipykernel, jinja2, jupyter, nbconvert, nbformat, packaging, requests
     "numba>=0.57.0,<1",
     "pandas>2.0.0,<3.0.0", # requires numpy
     "pandera[strategies]>=0.6.5,<0.18;python_version<='3.8'", # For generating fake datasets
     "pandera[strategies]>=0.15.0;python_version>='3.9'",
     "retrying>=1.3.4,<2",
     "scikit-learn>=1.2.0,<2", # requires scipy, and introduce breaking change affects feature_extraction.text.TfidfVectorizer.min_df
     "scikit-surprise>=1.1.3",
-    "scipy>=1.10.1",
+    "scipy>=1.10.1,<=1.13.1", # FIXME: Remove scipy<=1.13.1 once cornac release a version newer than 2.2.1. See #2128
     "seaborn>=0.13.0,<1", # requires matplotlib, packaging
     "transformers>=4.27.0,<5", # requires packaging, pyyaml, requests, tqdm
 ]
@@ -80,6 +79,7 @@
     # nni needs to be upgraded
     "nni==1.5",
     "pymanopt>=0.2.5",
+    "lightfm>=1.17,<2",
 ]

 # The following dependency can be installed as below, however PyPI does not allow direct URLs.
17 changes: 11 additions & 6 deletions tests/ci/azureml_tests/test_groups.py
@@ -47,8 +47,6 @@
         "tests/functional/examples/test_notebooks_python.py::test_geoimc_functional", # 1006.19s
         #
         "tests/functional/examples/test_notebooks_python.py::test_benchmark_movielens_cpu", # 58s
-        #
-        "tests/functional/examples/test_notebooks_python.py::test_lightfm_functional",
     ],
     "group_cpu_003": [ # Total group time: 2253s
         "tests/data_validation/recommenders/datasets/test_criteo.py::test_download_criteo_sample", # 1.05s
@@ -237,10 +235,6 @@
         "tests/unit/recommenders/models/test_geoimc.py::test_imcproblem",
         "tests/unit/recommenders/models/test_geoimc.py::test_inferer_init",
         "tests/unit/recommenders/models/test_geoimc.py::test_inferer_infer",
-        "tests/unit/recommenders/models/test_lightfm_utils.py::test_interactions",
-        "tests/unit/recommenders/models/test_lightfm_utils.py::test_fitting",
-        "tests/unit/recommenders/models/test_lightfm_utils.py::test_sim_users",
-        "tests/unit/recommenders/models/test_lightfm_utils.py::test_sim_items",
         "tests/unit/recommenders/models/test_sar_singlenode.py::test_init",
         "tests/unit/recommenders/models/test_sar_singlenode.py::test_fit",
         "tests/unit/recommenders/models/test_sar_singlenode.py::test_predict",
@@ -453,3 +447,14 @@
         "tests/unit/examples/test_notebooks_gpu.py::test_gpu_vm",
     ],
 }
+
+# Experimental are additional test groups that require to install extra dependencies: pip install .[experimental]
+experimental_test_groups = {
+    "group_cpu_001": [
+        "tests/unit/recommenders/models/test_lightfm_utils.py::test_interactions",
+        "tests/unit/recommenders/models/test_lightfm_utils.py::test_fitting",
+        "tests/unit/recommenders/models/test_lightfm_utils.py::test_sim_users",
+        "tests/unit/recommenders/models/test_lightfm_utils.py::test_sim_items",
+        "tests/functional/examples/test_notebooks_python.py::test_lightfm_functional",
+    ]
+}
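Because `experimental_test_groups` reuses the key `group_cpu_001`, a runner has to choose which dictionary to read rather than merge the two. One possible selection helper is sketched below. The `select_tests` function and the `experimental` flag are hypothetical, the dictionaries are abbreviated, and the diff does not show how the CI scripts actually consume these groups.

```python
# Abbreviated stand-ins for the two dictionaries in test_groups.py.
global_test_groups = {
    "group_cpu_001": [
        "tests/functional/examples/test_notebooks_python.py::test_benchmark_movielens_cpu",
    ],
}
experimental_test_groups = {
    "group_cpu_001": [
        "tests/unit/recommenders/models/test_lightfm_utils.py::test_fitting",
        "tests/functional/examples/test_notebooks_python.py::test_lightfm_functional",
    ],
}


def select_tests(group_name, experimental=False):
    """Return the test IDs for a group from the core or the experimental dictionary."""
    groups = experimental_test_groups if experimental else global_test_groups
    return groups[group_name]


core_tests = select_tests("group_cpu_001")
extra_tests = select_tests("group_cpu_001", experimental=True)
```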
8 changes: 4 additions & 4 deletions tests/smoke/recommenders/recommender/test_newsrec_model.py
@@ -62,7 +62,7 @@ def test_model_nrms(mind_resource_path):
     assert model.run_eval(valid_news_file, valid_behaviors_file) is not None
     assert isinstance(
         model.fit(
-            train_news_file, train_behaviors_file, valid_news_file, valid_behaviors_file
+            train_news_file, train_behaviors_file, valid_news_file, valid_behaviors_file,step_limit=10
         ),
         BaseModel,
     )
@@ -115,7 +115,7 @@ def test_model_naml(mind_resource_path):
     assert model.run_eval(valid_news_file, valid_behaviors_file) is not None
     assert isinstance(
         model.fit(
-            train_news_file, train_behaviors_file, valid_news_file, valid_behaviors_file
+            train_news_file, train_behaviors_file, valid_news_file, valid_behaviors_file,step_limit=10
         ),
         BaseModel,
     )
@@ -166,7 +166,7 @@ def test_model_lstur(mind_resource_path):
     assert model.run_eval(valid_news_file, valid_behaviors_file) is not None
     assert isinstance(
         model.fit(
-            train_news_file, train_behaviors_file, valid_news_file, valid_behaviors_file
+            train_news_file, train_behaviors_file, valid_news_file, valid_behaviors_file,step_limit=10
         ),
         BaseModel,
     )
@@ -217,7 +217,7 @@ def test_model_npa(mind_resource_path):
     assert model.run_eval(valid_news_file, valid_behaviors_file) is not None
     assert isinstance(
         model.fit(
-            train_news_file, train_behaviors_file, valid_news_file, valid_behaviors_file
+            train_news_file, train_behaviors_file, valid_news_file, valid_behaviors_file,step_limit=10
         ),
         BaseModel,
     )
