
Commit

add minor edits to the pretrained embeddings transformer example (#1150)
radekosmulski committed Jun 20, 2023
1 parent 7a0e221 commit c8503ec
Showing 1 changed file with 8 additions and 8 deletions.
@@ -204,13 +204,13 @@
"\n",
"In particular, the `skus` dataset contains the mapping between the `product_sku_hash` (essentially an item id) to the `description_vector` -- an embedding obtained from the description.\n",
"\n",
"This is a piece of information that we would like to use in our model. In order to do so, we need to map the `product_sku_hash` information to an id.\n",
"We would like to enable our model to make use of this piece of information. In order to feed this data to our model, we need to map the `product_sku_hash` to an id.\n",
"\n",
"But we need to make sure that the way we process `skus` and the `train` dataset (event information) is consistent. That the same `product_sku_hash` is mapped to the same id both when processing `skus` and `train`.\n",
"But we need to make sure that the way we process `skus` and the `train` dataset (event information) is consistent, that the same `product_sku_hash` is mapped to the same id both when processing `skus` and `train`.\n",
"\n",
"We do so by defining and fitting a `Categorify` op once and using it to process both the `skus` and the `train` datasets.\n",
"\n",
"Additionally, we apply some further processing to the `train` dataset. We group rows of data by `session_id_hash` so that each training example will contain events from a single customer visit to the online store arranged in chronological order.\n",
"Additionally, we apply some further processing to the `train` dataset. We group rows by `session_id_hash` so that each training example will contain events from a single customer visit to the online store arranged in chronological order.\n",
"\n",
"If you would like to learn more about leveraging `NVTabular` to process tabular data on the GPU using a set of industry standard operators, please consult the examples available [here](https://github.com/NVIDIA-Merlin/NVTabular/tree/main/examples).\n",
"\n",
@@ -262,7 +262,7 @@
"id": "45a4828e",
"metadata": {},
"source": [
"Here are a couple of example rows from `train` transformed."
"Here are a couple of example rows from `train_transformed`."
]
},
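One way to materialize a few of those rows for inspection (assuming `train_transformed` is an `nvt.Dataset`, as in the sketch above):

```python
# Pull a handful of transformed rows into host memory for a quick look.
train_transformed.to_ddf().head()
```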
{
@@ -575,7 +575,7 @@
"source": [
"Let us now export the embedding information to a `numpy` array and write it to disk.\n",
"\n",
"We will later pass this information so that the `Loader` will load the correct emebedding for the product corresponding to the given step of a customer journey.\n",
"We will later pass this information to the `Loader` so that it will load the correct emebedding for the product corresponding to a given step of a customer journey.\n",
"\n",
"The embeddings are linked to the train set using the `product_sku_hash` information."
]
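A hedged sketch of such an export (the cudf-to-host conversion and the exact column handling are assumptions):

```python
import numpy as np

# Bring the transformed skus data into host memory.
skus_df = skus_transformed.to_ddf().compute().to_pandas()

# Column 0: the Categorify-encoded `product_sku_hash` id.
# Remaining columns: the pretrained description embedding.
ids = skus_df["product_sku_hash"].to_numpy().reshape(-1, 1).astype(np.float32)
vectors = np.vstack(skus_df["description_vector"].to_list()).astype(np.float32)
np.save("skus.npy", np.hstack([ids, vectors]))
```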
@@ -599,7 +599,7 @@
"\n",
"The `product_sku_hash` ids have been exported along with the embeddings and are contained in the first column of the output `numpy` array.\n",
"\n",
"Here is the id of the first embedding stored in `skus.npy`."
"Here is the id of the first embedding stored in `skus.npy`:"
]
},
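For example, a quick sanity check (assuming the array layout just described):

```python
import numpy as np

skus = np.load("skus.npy")
print(int(skus[0, 0]))  # id of the first embedding
print(skus.shape)       # (num_products, 1 + embedding_dim)
```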
{
@@ -694,7 +694,7 @@
"\n",
"Depending on the hardware that you will be running this on and the size of the dataset that you will be using, should you run out of GPU memory, you can specify one of the several parameters that can ease the memory load (`npartitions`, `part_size`, or `part_mem_fraction`).\n",
"\n",
"The `BATCH_SIZE` of 16 should work on a broad set of hardware, but if you are training on a lot of data and your hardware permitting, you might want to significantly increase it."
"The `BATCH_SIZE` of 16 should work on a broad set of hardware, but if you are training on a lot of data and your hardware permitting you might want to significantly increase it."
]
},
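A sketch of those knobs together with the batch size (the parquet path and the `128MB` value are assumptions):

```python
import nvtabular as nvt

# Any one of these parameters can ease GPU memory pressure; `part_size` shown.
train_transformed = nvt.Dataset(
    "train_transformed.parquet", engine="parquet", part_size="128MB"
)

BATCH_SIZE = 16  # conservative default; raise it if data volume and GPU memory allow
```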
{
@@ -761,7 +761,7 @@
"id": "4f037d5d",
"metadata": {},
"source": [
"Using the `EmbeddingOperator` object we referenced our `embeddings` and advised the model what to use as a key to look up the information.\n",
"Using the `EmbeddingOperator` object we referenced our `product_embeddings` and insructed the model what to use as a key to look up the information.\n",
"\n",
"Below is an example batch of data that our model will consume."
]
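A hedged sketch of wiring the pretrained vectors into the dataloader (the torch flavor, the `product_sku_hash_list` lookup key, and the variable names are assumptions carried over from the sketches above):

```python
import numpy as np
from merlin.dataloader.torch import Loader  # a TensorFlow Loader also exists
from merlin.dataloader.ops.embeddings import EmbeddingOperator

embeddings = np.load("skus.npy")

loader = Loader(
    train_transformed,
    batch_size=BATCH_SIZE,
    transforms=[
        EmbeddingOperator(
            embeddings[:, 1:],                                 # the vectors
            id_lookup_table=embeddings[:, 0].astype("int64"),  # ids from column 0
            lookup_key="product_sku_hash_list",                # assumed column name
            embedding_name="product_embeddings",
        )
    ],
    shuffle=True,
)

# One batch of the kind the model will consume.
batch = next(iter(loader))
```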
