Current content:
- Multilingual Sentence Embeddings (21/01/2021): Gives an overview of current multilingual sentence embedding techniques and tools, and how they compare across different sequence lengths. (sketch below)
- spaCy 3.0 (01/02/2021): spaCy 3.0 has just been released and in this tip, we'll have a look at some of the new features. We'll be training a German NER model and streamlining the end-to-end pipeline using the brand new spaCy projects! (sketch below)
- Compact transformers (26/02/2021): Bigger isn't always better. In this tip we look at some compact BERT-based models that provide a nice balance between computational resources and model accuracy. (sketch below)
- Keyword Extraction with pke (18/03/2021): The KEYNG (read: king) is dead, long live the KEYNG! In this tip we look at pke, an alternative to Gensim for keyword extraction. (sketch below)
- Explainable transformers using SHAP (22/04/2021): BERT, explain yourself! 📖 Up until recently, language model predictions have lacked transparency. In this tip we look at SHAP, a way to explain your latest transformer-based models. (sketch below)
- Transformer-based Data Augmentation (18/06/2021): Ever struggled with having a limited non-English NLP dataset for a project? Fear not, data augmentation to the rescue ⛑️ In this week's tip, we look at backtranslation 🔀 and contextual word embedding insertions as data augmentation techniques for multilingual NLP. (sketch below)
- Long range transformers (14/07/2021): Beyond and above the 512! 🏅 In this week's tip, we look at novel long range transformer architectures and compare them against the well-known RoBERTa model. (sketch below)
- Neural Keyword Extraction (10/09/2021): Neural Keyword Extraction 🧠 In this week's tip, we look at neural keyword extraction methods and how they compare to classical methods. (sketch below)
- HuggingFace Optimum (12/10/2021): HuggingFace Optimum Quantization ✂️ In this week's tip, we take a look at the new HuggingFace Optimum package to check out some model quantization techniques. (sketch below)
- Text Augmentation using large-scale LMs and prompt engineering (25/11/2021): Typically, the more data we have, the better performance we can achieve 🤙. However, it is sometimes difficult and/or expensive to annotate a large amount of training data 😞. In this tip, we leverage three large-scale LMs (GPT-3, GPT-J and GPT-Neo) to generate very realistic samples from a very small dataset. (sketch below)
- Gender debiasing of datasets using CDA (25/01/2022): A lot of large language models are trained on web text. However, this means that unintended biases can sneak into your model's behaviour 😞. In this tip, we'll look at how to alleviate this bias using Counterfactual Data Augmentation ⚖️. (sketch below)
- GPT2 Quantization using ONNXRuntime (19/04/2022): Large language models are costly to run. In this notebook, we leverage ONNXRuntime to quantize and run our Dutch GPT2 model more efficiently 💰. (sketch below)
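For the multilingual sentence embeddings tip, a minimal sketch using sentence-transformers; the model name is an assumed example, not necessarily one of the tools benchmarked in the tip.

```python
from sentence_transformers import SentenceTransformer
import numpy as np

# Assumed example checkpoint: a multilingual sentence embedding model.
model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

sentences = ["How do I reset my password?", "Wie setze ich mein Passwort zurück?"]
embeddings = model.encode(sentences, normalize_embeddings=True)

# Cosine similarity of the English/German pair (dot product of normalized vectors).
print(float(np.dot(embeddings[0], embeddings[1])))
```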
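For the spaCy 3.0 tip, a minimal German NER inference sketch with a pretrained pipeline, assuming de_core_news_sm is installed; the tip itself goes further and trains a custom model through spaCy projects.

```python
import spacy

# Assumes: python -m spacy download de_core_news_sm
nlp = spacy.load("de_core_news_sm")

doc = nlp("Angela Merkel besuchte gestern die Universität Gent.")
for ent in doc.ents:
    print(ent.text, ent.label_)  # e.g. person and location/organisation entities
```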
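For the compact transformers tip, a sketch that loads DistilBERT as an assumed example of a compact BERT-based model and counts its parameters.

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "distilbert-base-uncased"  # assumed example of a compact model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Roughly 66M parameters, versus ~110M for bert-base-uncased.
print(sum(p.numel() for p in model.parameters()))
```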
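For the pke tip, the library's standard unsupervised pipeline, shown here with TopicRank as one of the available extractors.

```python
import pke

# Unsupervised keyphrase extraction with pke.
extractor = pke.unsupervised.TopicRank()
extractor.load_document(
    input="Natural language processing makes keyword extraction surprisingly easy.",
    language="en",
)
extractor.candidate_selection()
extractor.candidate_weighting()

for phrase, score in extractor.get_n_best(n=5):
    print(phrase, score)
```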
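For the SHAP tip, a sketch following the pattern shap documents for Hugging Face text pipelines; the sentiment model is whatever the pipeline downloads by default.

```python
import shap
from transformers import pipeline

# Wrap a transformers pipeline and let shap attribute the prediction to tokens.
classifier = pipeline("sentiment-analysis", return_all_scores=True)
explainer = shap.Explainer(classifier)

shap_values = explainer(["This movie was not bad at all, I actually loved it."])
shap.plots.text(shap_values)  # token-level attribution view, best viewed in a notebook
```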
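For the data augmentation tip, a backtranslation sketch with MarianMT checkpoints; English↔German is an assumed language pair chosen to keep the example small.

```python
from transformers import pipeline

# Round-trip translation (en -> de -> en) yields a paraphrased, augmented sample.
to_de = pipeline("translation", model="Helsinki-NLP/opus-mt-en-de")
to_en = pipeline("translation", model="Helsinki-NLP/opus-mt-de-en")

text = "The support team answered my question very quickly."
german = to_de(text)[0]["translation_text"]
augmented = to_en(german)[0]["translation_text"]
print(augmented)
```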
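For the long range transformers tip, a sketch using Longformer as one assumed example of such an architecture, encoding a document well beyond the usual 512-token limit.

```python
from transformers import AutoModel, AutoTokenizer

# Longformer handles sequences up to 4096 tokens via sparse attention.
tokenizer = AutoTokenizer.from_pretrained("allenai/longformer-base-4096")
model = AutoModel.from_pretrained("allenai/longformer-base-4096")

long_text = " ".join(["A very long report paragraph goes here."] * 300)
inputs = tokenizer(long_text, truncation=True, max_length=4096, return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, sequence_length, hidden_size)
```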
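For the neural keyword extraction tip, a KeyBERT sketch; KeyBERT is an assumed stand-in for the neural methods compared in the tip, not necessarily one of them.

```python
from keybert import KeyBERT

doc = "Transformer models have pushed the state of the art in keyword extraction."
kw_model = KeyBERT()  # uses a sentence-transformers backbone under the hood
keywords = kw_model.extract_keywords(doc, keyphrase_ngram_range=(1, 2), top_n=5)
print(keywords)  # list of (keyphrase, similarity score) tuples
```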
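For the HuggingFace Optimum tip, a dynamic quantization sketch with the ONNX Runtime backend; the Optimum API has changed across releases, so the exact class names and arguments below are an assumption based on a recent version.

```python
from optimum.onnxruntime import ORTModelForSequenceClassification, ORTQuantizer
from optimum.onnxruntime.configuration import AutoQuantizationConfig

model_id = "distilbert-base-uncased-finetuned-sst-2-english"

# Export the model to ONNX, then apply dynamic int8 quantization.
ort_model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)
ort_model.save_pretrained("onnx_model")

quantizer = ORTQuantizer.from_pretrained("onnx_model")
qconfig = AutoQuantizationConfig.avx2(is_static=False, per_channel=False)
quantizer.quantize(save_dir="onnx_model_quantized", quantization_config=qconfig)
```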
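For the prompt-based text augmentation tip, a few-shot generation sketch; the smallest GPT-Neo checkpoint is used here as an assumption to keep the example runnable, and GPT-3 would follow the same prompt pattern through the OpenAI API.

```python
from transformers import pipeline

generator = pipeline("text-generation", model="EleutherAI/gpt-neo-125M")

# Few-shot prompt: show a couple of labelled examples, then ask for a new one.
prompt = (
    "Generate a customer review labelled with its sentiment.\n"
    "Review: The battery died after two days. Sentiment: negative\n"
    "Review: Lovely screen and very fast. Sentiment: positive\n"
    "Review:"
)
samples = generator(prompt, max_new_tokens=30, num_return_sequences=3, do_sample=True)
for sample in samples:
    print(sample["generated_text"].split("Review:")[-1].strip())
```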
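For the CDA tip, a plain-Python sketch of counterfactual data augmentation: gendered terms are swapped with their counterparts to create a mirrored training example. The word list is a tiny illustrative subset, not a complete mapping.

```python
import re

# Tiny illustrative subset of gendered word pairs; real CDA uses a much larger list
# and handles ambiguous cases such as "her" -> "him"/"his" more carefully.
PAIRS = {"he": "she", "she": "he", "his": "her", "her": "his",
         "him": "her", "man": "woman", "woman": "man"}

def counterfactual(text: str) -> str:
    """Swap gendered words to create a counterfactual copy of a sentence."""
    def swap(match):
        word = match.group(0)
        repl = PAIRS[word.lower()]
        return repl.capitalize() if word[0].isupper() else repl
    pattern = r"\b(" + "|".join(PAIRS) + r")\b"
    return re.sub(pattern, swap, text, flags=re.IGNORECASE)

print(counterfactual("He thanked his colleague."))  # -> "She thanked her colleague."
```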
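For the GPT2 quantization tip, the ONNX Runtime dynamic quantization call; the file names are placeholders and the export of the Dutch GPT2 model to ONNX is assumed to have happened beforehand.

```python
from onnxruntime.quantization import QuantType, quantize_dynamic

# Dynamic int8 quantization of an already-exported ONNX model.
quantize_dynamic(
    model_input="gpt2.onnx",            # placeholder path to the exported model
    model_output="gpt2-quantized.onnx",
    weight_type=QuantType.QInt8,
)
```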