Feature/remove dates #30

Open · wants to merge 3 commits into main
Changes from 1 commit
20 changes: 10 additions & 10 deletions nlp/README.md
@@ -2,40 +2,40 @@

Current content:

-- [_Multilingual Sentence Embeddings_ (21/01/2021)](2021_01_21_multilingual_sentence_embeddings):
+- [_Multilingual Sentence Embeddings_](multilingual_sentence_embeddings):
Contributor comment: Remove double space between the bullet point and the text (this applies to all list entries except the last).

Gives an overview of various current multilingual sentence embedding techniques and tools, and
how they compare given various sequence lengths.

-- [_Spacy 3.0_ (01/02/2021)](2021_02_01_spacy_3_projects):
+- [_Spacy 3.0_](spacy_3_projects):
Spacy 3.0 has just been released and in this tip, we'll have a look at some of the new features.
We'll be training a German NER model and streamlining the end-to-end pipeline using the brand-new spaCy projects!

-- [_Compact transformers_ (26/02/2021)](2021_02_26_compact_transformers):
+- [_Compact transformers_](compact_transformers):
Bigger isn't always better. In this tip we look at some compact BERT-based models that provide a nice balance
between computational resources and model accuracy.

-- [_Keyword Extraction with pke_ (18/03/2021)](2021_03_18_pke_keyword_extraction):
+- [_Keyword Extraction with pke_](pke_keyword_extraction):
The KEYNG (read *king*) is dead, long live the KEYNG!
In this tip we look at `pke`, an alternative to Gensim for keyword extraction.

-- [_Explainable transformers using SHAP_ (22/04/2021)](2021_04_22_shap_for_huggingface_transformers):
+- [_Explainable transformers using SHAP_](shap_for_huggingface_transformers):
BERT, explain yourself! 📖
Up until recently, language model predictions have lacked transparency. In this tip we look at `SHAP`, a way to explain your latest transformer-based models.

-- [_Transformer-based Data Augmentation_ (18/06/2021)](2021_06_18_data_augmentation):
+- [_Transformer-based Data Augmentation_](data_augmentation):
Ever struggled with having a limited non-English NLP dataset for a project? Fear not, data augmentation to the rescue ⛑️
In this week's tip, we look at backtranslation 🔀 and contextual word embedding insertions as data augmentation techniques for multilingual NLP.

-- [_Long range transformers_ (14/07/2021)](2021_06_29_long_range_transformers):
+- [_Long range transformers_](long_range_transformers):
Beyond and above the 512! 🏅 In this week's tip, we look at novel long range transformer architectures and compare them against the well-known RoBERTa model.

-- [_Neural Keyword Extraction_ (10/09/2021)](2021_09_10_neural_keyword_extraction):
+- [_Neural Keyword Extraction_](neural_keyword_extraction):
Neural Keyword Extraction 🧠
In this week's tip, we look at neural keyword extraction methods and how they compare to classical methods.

-- [_HuggingFace Optimum_ (12/10/2021)](2021_10_12_huggingface_optimum):
+- [_HuggingFace Optimum_](huggingface_optimum):
HuggingFace Optimum Quantization ✂️
In this week's tip, we take a look at the new HuggingFace Optimum package to check out some model quantization techniques.

-- [ _Text Augmentation using large-scale LMs and prompt engineering_ (25/11/2021)](2021_11_25_augmentation_lm):
+- [ _Text Augmentation using large-scale LMs and prompt engineering_](augmentation_lm):
Contributor comment: Remove leading space in link label.

Typically, the more data we have, the better performance we can achieve 🤙. However, it is sometimes difficult and/or expensive to annotate a large amount of training data 😞. In this tip, we leverage three large-scale LMs (GPT-3, GPT-J and GPT-Neo) to generate very realistic samples from a very small dataset.
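To make the backtranslation idea from the _Transformer-based Data Augmentation_ entry above concrete, here is a minimal sketch using 🤗 Transformers translation pipelines. The Helsinki-NLP checkpoints, the pivot language, and the example sentence are illustrative assumptions, not necessarily what the linked notebook uses.

```python
from transformers import pipeline

# Backtranslation sketch: paraphrase a sample by translating EN -> DE -> EN.
# Checkpoints are illustrative; the actual tip may use different models/languages.
to_de = pipeline("translation", model="Helsinki-NLP/opus-mt-en-de")
to_en = pipeline("translation", model="Helsinki-NLP/opus-mt-de-en")

sample = "The staff were friendly and the delivery was surprisingly fast."
german = to_de(sample, max_length=128)[0]["translation_text"]
augmented = to_en(german, max_length=128)[0]["translation_text"]

print(augmented)  # a paraphrased variant that can serve as an extra training sample
```

The round trip through another language usually preserves the label while varying the wording, which is what makes it useful as an augmentation step for small multilingual datasets.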
@@ -5,4 +5,4 @@ Typically, the more data we have, the better performance we can achieve 🤙. Ho
Large-scale language models (LMs) are excellent few-shot learners, allowing them to be controlled via natural text prompts. In this tip, we leverage three large-scale LMs (GPT-3, GPT-J and GPT-Neo) and prompt engineering to generate very realistic samples from a very small dataset. The model takes as input two real samples from our dataset, embeds them in a carefully designed prompt and generates an augmented mixed sample influenced by the sample sentences. We use the [Emotion](https://huggingface.co/datasets/emotion) dataset and distilled BERT pre-trained model and show that this augmentation method boosts the model performance and generates very realistic samples. For more information on text augmentation using large-scale LMs check [GPT3Mix](https://arxiv.org/pdf/2104.08826.pdf).

We recommend opening the notebook in Colab for an interactive, explainable experience and optimal rendering of the visuals 👇:
-[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ml6team/quick-tips/blob/main/nlp/2021_11_25_augmentation_lm/nlp_augmentation_lm.ipynb)
+[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ml6team/quick-tips/blob/main/nlp/augmentation_lm/nlp_augmentation_lm.ipynb)
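
As a rough illustration of the prompt-based augmentation described above, the sketch below embeds two labelled samples in a prompt and lets a freely available GPT-Neo checkpoint continue it. The example sentences, labels, and prompt wording are assumptions for illustration; they are not the prompt used in the notebook.

```python
from transformers import pipeline

# GPT3Mix-style sketch: condition a causal LM on two labelled samples and let it
# generate a new, "mixed" sample. Checkpoint and prompt format are illustrative.
generator = pipeline("text-generation", model="EleutherAI/gpt-neo-1.3B")

# Hypothetical samples in the style of the Emotion dataset.
sample_1 = ("joy", "i feel pretty cheerful about how the day went")
sample_2 = ("sadness", "i feel like i let everyone down again")

prompt = (
    "Each example is a short text and the emotion it expresses.\n"
    f"Text: {sample_1[1]} (Emotion: {sample_1[0]})\n"
    f"Text: {sample_2[1]} (Emotion: {sample_2[0]})\n"
    "Text:"
)

out = generator(prompt, max_new_tokens=40, do_sample=True, temperature=0.9)
# The pipeline returns prompt + continuation; keep only the new part as the sample.
print(out[0]["generated_text"][len(prompt):].strip())
```

The generated texts (with the emotion label the model appends) can then be added to the small training set before fine-tuning the distilled BERT classifier, which is where the performance boost mentioned above comes from.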