Sentence-Simplification (Python 3.x implementation)

Course Project for Natural Language Processing (CSE 472)

Team Linguists

Problem Statement

Complex sentences create difficulties for machine translation. It has been observed that whenever an English sentence contains more than two clauses (verbs), its Hindi translation suffers in fluency and faithfulness. This project deals with:

Identifying English sentences with more than two clauses.
Marking the clause boundaries in those sentences.
Suggesting a strategy for breaking sentences with more than two clauses into multiple sentences, each with no more than two clauses.
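The first two tasks can be approximated from the dependency tree alone. The snippet below is a minimal illustration, not the project's code: following the problem statement, it treats each VERB token as one clause, assigns every word to its nearest VERB ancestor, and marks a boundary (`|`) wherever two adjacent words fall in different clauses. The `Token` type, the function names and the hand-written example parse are hypothetical, and clauses headed by copulas (AUX) are not counted.

```python
from typing import Dict, List, NamedTuple


class Token(NamedTuple):
    id: int    # 1-based position in the sentence (CoNLL-U ID column)
    form: str  # surface word (FORM column)
    upos: str  # universal POS tag (UPOS column)
    head: int  # ID of the syntactic head, 0 for the root (HEAD column)


def clause_of(tok: Token, by_id: Dict[int, Token]) -> int:
    """Return the ID of the nearest VERB at or above tok, or 0 if there is none."""
    current = tok
    while True:
        if current.upos == "VERB":
            return current.id
        if current.head == 0:
            return 0
        current = by_id[current.head]


def mark_clause_boundaries(sentence: List[Token]) -> str:
    """Insert '|' wherever two adjacent tokens belong to different clauses."""
    if not sentence:
        return ""
    by_id = {t.id: t for t in sentence}
    labels = [clause_of(t, by_id) for t in sentence]
    parts = [sentence[0].form]
    # prev is the clause of the previous token, label the clause of tok.
    for prev, tok, label in zip(labels, sentence[1:], labels[1:]):
        if label != prev:
            parts.append("|")
        parts.append(tok.form)
    return " ".join(parts)


def clause_count(sentence: List[Token]) -> int:
    """Approximate the number of clauses by the number of VERB tokens."""
    return sum(1 for t in sentence if t.upos == "VERB")


# Hand-written UD-style parse of "I think that he left because she arrived."
EXAMPLE = [
    Token(1, "I", "PRON", 2),
    Token(2, "think", "VERB", 0),
    Token(3, "that", "SCONJ", 5),
    Token(4, "he", "PRON", 5),
    Token(5, "left", "VERB", 2),
    Token(6, "because", "SCONJ", 8),
    Token(7, "she", "PRON", 8),
    Token(8, "arrived", "VERB", 5),
    Token(9, ".", "PUNCT", 2),
]
```

On this example, `mark_clause_boundaries(EXAMPLE)` returns `I think | that he left | because she arrived | .` and `clause_count(EXAMPLE)` is 3, so the sentence would be flagged as having more than two clauses.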

Dataset Statistics

Dataset used: English UD (Universal Dependencies) dataset, in CoNLL-U format
Trees: 12543
Word count: 204607
Token count: 204607
Dependency relations: 378 (341 POS-tag based, 37 (category, value) feature pairs)
Sentences with more than 2 clauses: 3604
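Statistics like these can be recomputed directly from a CoNLL-U file. The sketch below assumes the standard 10-column, tab-separated CoNLL-U layout, counts ordinary word lines (skipping multiword ranges such as `3-4` and empty nodes such as `5.1`), and uses "more than two VERB tokens" as the clause test. That verb proxy is an assumption, so its count need not match the figure reported above; the function names are hypothetical.

```python
from typing import Iterator, List


def read_conllu(text: str) -> Iterator[List[List[str]]]:
    """Yield each sentence as a list of tab-split token rows; comment lines are skipped."""
    sentence: List[List[str]] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            if sentence:
                yield sentence
                sentence = []
        elif line.startswith("#"):
            continue
        else:
            sentence.append(line.split("\t"))
    if sentence:
        yield sentence


def is_word(row: List[str]) -> bool:
    """True for ordinary word lines; ranges (3-4) and empty nodes (5.1) are excluded."""
    return row[0].isdigit()


def treebank_stats(text: str, max_clauses: int = 2) -> dict:
    trees = words = complex_sentences = 0
    for sentence in read_conllu(text):
        trees += 1
        word_rows = [row for row in sentence if is_word(row)]
        words += len(word_rows)
        verbs = sum(1 for row in word_rows if row[3] == "VERB")  # column 4 = UPOS
        if verbs > max_clauses:
            complex_sentences += 1
    return {"trees": trees, "words": words, "complex_sentences": complex_sentences}
```

Typical use would be `treebank_stats(open(path, encoding="utf-8").read())` on the training file.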

Code Components

dep_parser.py: creates the following files:
  A treebank of the sentences with more than two clauses, named clause-treebank.conllu
  A metadata file (.pkl)
complete_sentence.py: splits the sentences by clause and attempts to form complete sentences from the parts.
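To illustrate the kind of output dep_parser.py produces, here is a minimal sketch that filters a CoNLL-U treebank down to sentences with more than two VERB tokens, writes them to clause-treebank.conllu, and pickles a small metadata list. The metadata fields (`index`, `verbs`), the file name `clause-metadata.pkl`, the function names and the verb-based clause test are all hypothetical: the README does not say what the real .pkl file contains, and the real script also takes the Stanford parser folder as input.

```python
import pickle
import re
from pathlib import Path
from typing import List


def split_blocks(text: str) -> List[str]:
    """Split CoNLL-U text into sentence blocks (comment lines plus token lines)."""
    return [b.strip("\n") for b in re.split(r"\n\s*\n", text) if b.strip()]


def verb_count(block: str) -> int:
    """Count VERB tokens on ordinary word lines of one sentence block."""
    count = 0
    for line in block.splitlines():
        if line.startswith("#"):
            continue
        cols = line.split("\t")
        if cols[0].isdigit() and len(cols) > 3 and cols[3] == "VERB":
            count += 1
    return count


def write_clause_treebank(text: str, out_dir: Path, max_clauses: int = 2) -> List[dict]:
    """Keep sentences with more than max_clauses verbs; write treebank and metadata."""
    kept: List[str] = []
    metadata: List[dict] = []
    for index, block in enumerate(split_blocks(text)):
        verbs = verb_count(block)
        if verbs > max_clauses:
            kept.append(block)
            metadata.append({"index": index, "verbs": verbs})
    out_dir.mkdir(parents=True, exist_ok=True)
    # CoNLL-U requires every sentence to be followed by a blank line.
    treebank = "".join(block + "\n\n" for block in kept)
    (out_dir / "clause-treebank.conllu").write_text(treebank, encoding="utf-8")
    with open(out_dir / "clause-metadata.pkl", "wb") as fh:
        pickle.dump(metadata, fh)
    return metadata
```

Keeping the original sentence blocks verbatim (comments included) means the filtered file stays valid CoNLL-U and can be read back with the same tools as the full treebank.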

Running the code

Clone the GitHub repository.
Download the Stanford dependency parser for Python from https://nlp.stanford.edu/software/lex-parser.shtml and unzip it.
First run: python3 dep_parser.py {path to the dataset} {path to the unzipped Stanford parser folder}
Then run: python3 complete_sentence.py
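If you run the two steps often, a small wrapper can chain them. This is a hypothetical helper (`pipeline_commands` and `run_pipeline` are not part of the repository); it assumes it is started from, or pointed at via `cwd`, the folder that contains dep_parser.py and complete_sentence.py, and it uses the current interpreter (`sys.executable`) in place of a literal `python3`.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def pipeline_commands(dataset: str, parser_dir: str) -> List[List[str]]:
    """Build the two commands from the README, in the order they must run."""
    return [
        [sys.executable, "dep_parser.py", dataset, parser_dir],
        [sys.executable, "complete_sentence.py"],
    ]


def run_pipeline(dataset: str, parser_dir: str, cwd: str = ".") -> None:
    """Run both steps; absolute paths keep the arguments valid if cwd differs."""
    parser_path = Path(parser_dir).resolve()
    if not parser_path.is_dir():
        raise FileNotFoundError(f"Stanford parser folder not found: {parser_path}")
    dataset_path = str(Path(dataset).resolve())
    for cmd in pipeline_commands(dataset_path, str(parser_path)):
        # check=True raises CalledProcessError on a non-zero exit,
        # so complete_sentence.py never runs after a failed dep_parser.py.
        subprocess.run(cmd, cwd=cwd, check=True)
```

For example, `run_pipeline("path/to/dataset.conllu", "path/to/stanford-parser")` runs the same two commands listed above, in order.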

Repository: shreyaUp/Sentence-Simplification

Folders and files

Latest commit

History

Repository files navigation

Sentence-Simplification ( Python3.x implementation)

Course Project for Natural Language Processing (CSE 472)

Team Linguists

Problem Statement

Dataset Statistics

Code Components

Running the code

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages