Terms of Service (ToS) are legal agreements between users and service providers. To use a service, users must accept its terms. However, because ToS documents are verbose and written in opaque legal jargon, users tend to acknowledge them without fully understanding the agreement. As a result, users may sign up to obligations they would not willingly accept, or be exposed to unfair terms and practices. This project aims to make users more informed about the unfairness of clauses in a ToS and to surface the obligations those clauses impose.
The contributions of this project over earlier research are:
- An extensive comparison of Transformer-based embeddings (RoBERTa and XLNet) across various deep learning models.
- Identifying clauses that obligate the user as critical information, in addition to unfair clauses.
Project Demo: link
The ToS dataset was created as part of the CLAUDETTE experimental study.
| Topic | File location in repository |
|---|---|
| Fairness Classification | `src` |
| Obligation Detection | `Obligation_Detection` |
| GRU with RoBERTa Embeddings Model Weights | `model` |
| BERT Double | `fairness_classification/bert_double` |
| Legal BERT | `fairness_classification/legal_bert` |
| Custom Legal BERT | `fairness_classification/custom-legal-bert` |
| SVM Models | `fairness_classification/SVM` |
| Embeddings Generation | `fairness_classification/input_feature_generation` |
| RNN Based Models | `fairness_classification/rnn_models` |
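The RNN-based models consume per-token contextual embeddings (RoBERTa/XLNet), while baselines such as the SVM models typically need one fixed-size vector per clause; mean pooling over token vectors is a common reduction. A minimal standard-library sketch of that step (the names and shapes here are illustrative, not the repository's actual `input_feature_generation` API):

```python
from typing import List

def mean_pool(token_vectors: List[List[float]]) -> List[float]:
    """Average per-token embedding vectors into one clause-level vector."""
    if not token_vectors:
        raise ValueError("clause has no token vectors")
    dim = len(token_vectors[0])
    return [sum(vec[i] for vec in token_vectors) / len(token_vectors)
            for i in range(dim)]

# Toy 3-dimensional "embeddings" for a 2-token clause.
tokens = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]]
print(mean_pool(tokens))  # [2.0, 3.0, 4.0]
```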
Steps to execute
```shell
# install all necessary packages
pip install -r requirements.txt

# execute the fairness classification code (sample clause files are in src/examples)
python3 src/main.py ./../examples/9gag.txt

# execute the obligation detection code
python3 Obligation_Detection/Obligations_v2.py input.txt
```
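To illustrate the kind of clause-level signal obligation detection works with, here is a deliberately simplified, hypothetical detector that flags clauses imposing a duty on the user via modal/performative cues ("must", "agree to", ...). It is a toy stand-in, not the logic of `Obligations_v2.py`:

```python
import re

# Hypothetical cue pattern: "you" followed shortly by an obligating phrase.
OBLIGATION_CUES = re.compile(
    r"\byou\b.{0,40}\b(must|shall|agree to|are required to|are responsible for)\b",
    re.IGNORECASE | re.DOTALL,
)

def is_obligation(clause: str) -> bool:
    """Return True if the clause appears to obligate the user."""
    return bool(OBLIGATION_CUES.search(clause))

clauses = [
    "You must be at least 13 years old to use the Service.",
    "We may update these terms at any time.",
    "You agree to indemnify the company against all claims.",
]
print([is_obligation(c) for c in clauses])  # [True, False, True]
```

A real detector would need syntactic analysis (who is the obligated party?) and handling of negation and exceptions, which simple cue matching misses.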
- When Does Pretraining Help? Assessing Self-Supervised Learning for Law and the CaseHOLD Dataset (GitHub code)
- CLAUDETTE: An Automated Detector of Potentially Unfair Clauses in Online Terms of Service
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
- RoBERTa: A Robustly Optimized BERT Pretraining Approach
- XLNet: Generalized Autoregressive Pretraining for Language Understanding
- Named Entity Recognition on legal text (secondary dataset)
- The Cost of Reading Privacy Policies
- Aditya Ashok Dave ([email protected])
- Akanksha Sanjay Nogaja ([email protected])
- Lavina Lavakumar Agarwal ([email protected])
- Shreya Venkatesh Prabhu ([email protected])
- Sai Sree Yoshitha Akunuri ([email protected])