This repository contains the data files introduced in our paper "Reconnaissance de défigements dans des tweets en français par similarité d'alignements textuels" (TALN 2023). It provides two files:
- seeds.json: a list of all the expressions used to collect tweets with Twitter's API.
- tweets.json: a list of the ids of all the tweets used in our analysis.
For an up-to-date version of this dataset, see the FrUIT corpus. For an up-to-date version of the scripts used to extract both multiword expressions and unfrozen multiword expressions, see the ASMR repository.
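
The two files can be read with any JSON library. Below is a minimal Python sketch, assuming each file holds a single top-level JSON list as described above; the type of the individual entries (strings, numbers, or objects) is not documented here, so the code does not rely on it. The helper name `load_json_list` is hypothetical and is not part of this repository.

```python
import json
from pathlib import Path


def load_json_list(path):
    """Load a JSON file and check that its top-level value is a list.

    Hypothetical helper: the only assumption is that the file contains
    one top-level JSON list, as the file descriptions suggest.
    """
    with Path(path).open(encoding="utf-8") as handle:
        data = json.load(handle)
    if not isinstance(data, list):
        raise ValueError(
            f"{path}: expected a JSON list, got {type(data).__name__}"
        )
    return data


# Example usage, assuming both files sit in the current directory:
# seeds = load_json_list("seeds.json")
# tweet_ids = load_json_list("tweets.json")
# print(len(seeds), "seed expressions;", len(tweet_ids), "tweet ids")
```

Since tweets.json only lists tweet ids, the tweet text itself is not included and would have to be retrieved separately.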