Releases: DFKI-NLP/thermostat
Importable function for explaining custom datasets; Updated dependencies
A few minor improvements and fixes were needed after over a year of radio silence on my part.
The new version of thermostat-datasets includes:
Importable function for explaining custom datasets
The function explain_custom_data (from thermostat.explain import explain_custom_data) takes a .jsonnet config file and runs the same code that produced the existing Thermostat datasets. It should work with most Hugging Face datasets. In most cases, you need to specify text_field in the dataset section of your config.
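As a minimal sketch, the snippet below writes such a config from Python. It assumes the config is a Jsonnet object with a top-level dataset section. Apart from dataset and text_field, every key name (and the helper write_custom_config itself) is hypothetical, and the way explain_custom_data receives the config path is an assumption, not taken from the library.

```python
import json
from pathlib import Path


def write_custom_config(path, dataset_name, text_field, extra_sections=None):
    """Write a minimal .jsonnet config for explain_custom_data (hypothetical helper).

    Only the "dataset" section and its "text_field" key come from the release
    notes. The "name" key and anything passed in extra_sections are
    placeholders; check the configs in the Thermostat repository for real keys.
    """
    config = {
        "dataset": {
            "name": dataset_name,      # hypothetical key: Hugging Face dataset id
            "text_field": text_field,  # column that holds the input text
        }
    }
    # Model, explainer and other settings would be copied from an existing config.
    config.update(extra_sections or {})
    # Plain JSON is valid Jsonnet, so a Jsonnet parser can read this file.
    Path(path).write_text(json.dumps(config, indent=2))
    return config


# Usage, assuming explain_custom_data takes the config path as its first argument:
# write_custom_config("imdb_custom.jsonnet", "imdb", "text", {...})
# from thermostat.explain import explain_custom_data
# explain_custom_data("imdb_custom.jsonnet")
```

The IMDb dataset on the Hugging Face Hub stores its reviews in a column called text, which is why that value is passed as text_field in the usage comment.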
Updated scikit-learn and numpy dependencies
Thanks to @g8a9 for fixing the issue with scikit-learn!
Give ferret a try if you don't know it yet and are interested in explainability benchmarks! 😄
Inseq library
Lastly, I want to promote Inseq, an exciting new library for interpreting sequence generation models developed by Gabriele Sarti, Ludwig Sickert, Oskar van der Wal, Malvina Nissim, Arianna Bisazza and myself.
Inseq lets you attribute entire datasets and visualize attributions in the form of matrices to explain the behavior of state-of-the-art LLMs and other sequence generation models.
Since Inseq is much more recent and offers more exciting features, I will probably not do much maintenance on Thermostat in the future and will focus on improving Inseq instead.
Inseq will be presented at ACL 2023 alongside my new Saliency Map Verbalization paper. Hoping to see you in Toronto! 🍁
LayerDeepLiftSHAP (lds) & LayerGradientSHAP (lgs) explanations
Version 1.0.2.1 of thermostat-datasets is out now on PyPI!
Thank you very much for the overwhelming response to this project!
Thanks to @aj280192, there are now two new explainers from Captum, LayerDeepLiftShap and LayerGradientShap, which have been applied to all four datasets: IMDb, MNLI, XNLI and AG News.
Unfortunately, XLNet explanations could not be produced due to an issue with Captum, but the four other models (ALBERT, BERT, ELECTRA and RoBERTa) are available for both explainers.
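Captum's GradientShap family approximates SHAP values as the expected value of gradients multiplied by the difference between the input and a randomly drawn baseline; LayerGradientShap computes this with respect to a chosen layer. To make the estimator concrete, here is a hypothetical, dependency-free sketch of plain GradientSHAP on a toy function with a hand-written gradient. It is not Captum's implementation and involves no neural network layer. Each Monte Carlo sample draws a baseline, a random interpolation coefficient and optional Gaussian input noise.

```python
import random


def gradient_shap(grad_f, x, baselines, n_samples=50, stdev=0.0, seed=0):
    """Monte Carlo GradientSHAP estimate for a function with a known gradient.

    For each sample: pick a random baseline b and alpha in [0, 1], add
    Gaussian noise (std = stdev) to x, evaluate the gradient at
    b + alpha * (noisy - b), and multiply it elementwise by (noisy - b).
    The attribution is the average over all samples.
    """
    rng = random.Random(seed)
    totals = [0.0] * len(x)
    for _ in range(n_samples):
        b = rng.choice(baselines)
        alpha = rng.random()
        noisy = [xi + rng.gauss(0.0, stdev) for xi in x]
        point = [bi + alpha * (ni - bi) for bi, ni in zip(b, noisy)]
        g = grad_f(point)
        for i in range(len(x)):
            totals[i] += g[i] * (noisy[i] - b[i])
    return [t / n_samples for t in totals]


# Toy linear model f(x) = w . x, whose gradient is the constant vector w.
w = [2.0, -1.0, 0.5]
attributions = gradient_shap(lambda p: w, [1.0, 2.0, 3.0], [[0.0, 0.0, 0.0]])
```

With stdev=0 and a linear function the gradient is constant, so the result is exactly w_i * (x_i - b_i), here [2.0, -2.0, 1.5], and the attributions sum to f(x) - f(b). For nonlinear models without noise, this completeness property only holds in expectation over baselines and interpolation points.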
A minor issue with the import of tqdm has also been fixed.
EMNLP 2021 System Demonstrations Camera-ready
- All 90 configurations promised in the EMNLP 2021 Demo Track submission are now available
- A pre-print of the paper has been published on arXiv
- A first version of a locally hosted dataset explorer, based on Hugging Face's Streamlit-powered datasets explorer, is available
Note: I've been experiencing issues in Google Colab with the .render() function, which displays heatmaps using displaCy. The next update will include an alternative engine such as ipymarkup.
Initial release
- Thermostat is now installable via PyPI:
pip install thermostat-datasets
- Currently, 86 feature attribution maps are available for download. The largest dataset, IMDb (25k instances), is fully covered: all combinations of five fine-tuned models (ALBERT, BERT, ELECTRA, RoBERTa, XLNet) and five explainers (Gradient x Activation, Integrated Gradients, LIME, Occlusion, Shapley Value Sampling) are available.
- Convenience functions are available, e.g. for indexing and for visualizing attributions as heatmaps using displaCy
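The heatmap idea can be illustrated without displaCy. The following hypothetical sketch scales per-token attribution scores by their largest absolute value and maps them to red (positive) or blue (negative) background opacity in HTML spans. The names normalize and attribution_html are illustrative and not part of Thermostat's API.

```python
import html


def normalize(scores):
    """Scale scores into [-1, 1] by dividing by the largest absolute value."""
    peak = max((abs(s) for s in scores), default=0.0)
    if peak == 0:
        return [0.0 for _ in scores]
    return [s / peak for s in scores]


def attribution_html(tokens, scores):
    """Render tokens as <span>s whose background color encodes attribution.

    Positive scores are tinted red, negative scores blue; the opacity
    grows with the magnitude of the normalized score.
    """
    spans = []
    for token, s in zip(tokens, normalize(scores)):
        rgb = "255, 0, 0" if s >= 0 else "0, 0, 255"
        spans.append(
            f'<span style="background-color: rgba({rgb}, {abs(s):.2f})">'
            f"{html.escape(token)}</span>"
        )
    return " ".join(spans)


snippet = attribution_html(["a", "great", "movie"], [0.1, 2.0, -1.0])
```

Normalizing by the maximum absolute value keeps the sign of each score, so positive and negative evidence stay visually distinct, and the strongest token always gets full opacity.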