Soobin's Data Science term project comparing the learning curves of Koreans learning English and English speakers learning Korean

Data-Sci-2022/Learning-Curve-Comparison

Final report for this project

Developmental process of L2 - English and Korean

Brief description of the project

This project compares native English speakers learning Korean with native Korean speakers learning English.

The project comprises two parts: an intergroup comparison and an intragroup comparison. The intergroup comparison focuses on the L2 proficiency of the two groups at the same level. The intragroup comparison analyzes how much proficiency increases from one level to the next within each group.
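The two comparisons can be sketched roughly as follows. This is only an illustration, not the analysis used in the final report: it assumes that each text has already been reduced to a `(level, score)` pair, where `score` stands for whatever proficiency measure is chosen, and that the levels of the two corpora can be aligned. The function names and the toy numbers are hypothetical placeholders, not values from either corpus.

```python
from statistics import mean


def mean_by_level(records):
    """Group (level, score) pairs and return {level: mean score}."""
    by_level = {}
    for level, score in records:
        by_level.setdefault(level, []).append(score)
    return {level: mean(scores) for level, scores in sorted(by_level.items())}


def gains_between_levels(level_means):
    """Intragroup comparison: change in mean score between adjacent levels."""
    levels = sorted(level_means)
    return {
        (lower, higher): level_means[higher] - level_means[lower]
        for lower, higher in zip(levels, levels[1:])
    }


def intergroup_gap(means_a, means_b):
    """Intergroup comparison: difference in mean score at each shared level."""
    shared = sorted(set(means_a) & set(means_b))
    return {level: means_a[level] - means_b[level] for level in shared}


# Toy placeholder data (NOT real corpus values): (level, score) pairs.
korean_learners = [(1, 10.0), (1, 20.0), (2, 30.0), (3, 50.0)]
english_learners = [(1, 10.0), (2, 40.0)]

korean_means = mean_by_level(korean_learners)
english_means = mean_by_level(english_learners)

print(gains_between_levels(korean_means))
print(intergroup_gap(korean_means, english_means))
```

Keeping the per-level means as plain dictionaries makes it possible to reuse the same two helper functions for both learner groups, whichever proficiency measure ends up being used.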

Brief description of the data.

Korean Learner Corpus

This corpus comes from Dr. Park's GitHub repository:

https://github.com/jungyeul/korean-learner-corpus/tree/main/data

This corpus contains each participant's unique ID, nationality, and gender, along with the topic of the text, the raw text, POS-tagged morphemes, the participant's Korean proficiency level, and the essay score.

Since the aim of this project is to compare English-speaking participants learning Korean with Korean-speaking participants learning English, the data is filtered by the participants' nationality. Details are given in final_report.md.

PELIC

This is learner corpus data collected at the University of Pittsburgh English Language Institute, and it is accessible through GitHub:

https://github.com/ELI-Data-Mining-Group/PELIC-dataset

It is a longitudinal corpus, which is significant because it gives "greater opportunity for tracking development in a natural classroom setting" (PELIC readme.md, 1. overview).

This corpus contains 1,177 participants, and the total number of tokens is 4,250,703. To match the purpose of this project, only the data from participants whose first language is Korean is selected. Details are given in final_report.md.
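As a rough illustration of this selection step, the sketch below keeps only the rows whose first language is Korean. It is a minimal sketch under stated assumptions, not the code used in the project: the column names `anon_id` and `native_language`, the helper `filter_by_l1`, and the inline sample rows are all hypothetical, and the actual column names should be checked against the PELIC files.

```python
import csv
import io

# Hypothetical sample standing in for a learner metadata file.
# Column names and rows are assumptions for illustration only.
SAMPLE_CSV = """anon_id,native_language
a1,Korean
a2,Arabic
a3,korean
"""


def filter_by_l1(rows, l1, column="native_language"):
    """Return the rows whose first-language column matches `l1`.

    The comparison ignores case and surrounding whitespace.
    """
    target = l1.strip().casefold()
    return [
        row for row in rows
        if (row.get(column) or "").strip().casefold() == target
    ]


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
korean_l1 = filter_by_l1(rows, "Korean")
print([row["anon_id"] for row in korean_l1])
```

The same helper could be pointed at a different column, for example a nationality column in the Korean Learner Corpus, by passing another `column` argument.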
