Fake-News-Detection-Dataset

Baseline experiment results for a Korean fake news detection dataset.

Environments

Python 3.6.10

torch==1.8.0a0+17f8c32
konlpy==0.6.0
einops
gluonnlp==0.10.0
wandb==0.12.18
transformers==4.18.0
git+https://git@github.com/SKTBrain/KoBERT.git@master
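
To confirm that a local environment matches the pins listed above, the versions can be compared programmatically. The following is a minimal sketch, not part of the repository: the helper names `parse_pins`, `installed_versions`, and `find_mismatches` are hypothetical, and the git-based KoBERT entry is skipped because it carries no version pin.

```python
from typing import Dict, Iterable, Optional, Tuple

# Pins copied from the Environments list above; "einops" has no pinned version.
REQUIREMENTS = [
    "torch==1.8.0a0+17f8c32",
    "konlpy==0.6.0",
    "einops",
    "gluonnlp==0.10.0",
    "wandb==0.12.18",
    "transformers==4.18.0",
]


def parse_pins(lines: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map package name -> pinned version (None when no '==' pin is given)."""
    pins = {}
    for line in lines:
        line = line.strip()
        # Skip blanks, comments, and VCS requirements such as "git+https://..."
        if not line or line.startswith("#") or line.startswith("git+"):
            continue
        name, sep, version = line.partition("==")
        pins[name.strip().lower()] = version.strip() if sep else None
    return pins


def installed_versions() -> Dict[str, str]:
    """Return {package name: version} for the current interpreter."""
    try:
        from importlib.metadata import distributions  # Python 3.8+
        return {
            d.metadata["Name"].lower(): d.version
            for d in distributions()
            if d.metadata["Name"]
        }
    except ImportError:
        import pkg_resources  # fallback for older interpreters such as 3.6
        return {d.project_name.lower(): d.version for d in pkg_resources.working_set}


def find_mismatches(
    pins: Dict[str, Optional[str]], installed: Dict[str, str]
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return {name: (expected, actual)} for missing or differently versioned packages."""
    mismatches = {}
    for name, expected in pins.items():
        actual = installed.get(name)
        if actual is None or (expected is not None and actual != expected):
            mismatches[name] = (expected, actual)
    return mismatches
```

Calling `find_mismatches(parse_pins(REQUIREMENTS), installed_versions())` returns an empty dict when every listed package is installed at its pinned version.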

Computer Resources

CPU: i7-9800X
GPU: RTX 2080Ti x 2
RAM: 64GB
SSD: 2TB x 2
OS: Ubuntu 18.04

1. docker image

Pulling the docker image from Docker Hub

docker pull dsbalab/fake_news

Building the docker image from the Dockerfile

Building the docker image also generates the word-embeddings and the checkpoints for Part1 and Part2.

cd ./docker
docker build -t $image_name .
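
The pull and build steps above could also be driven from Python, for example inside a setup script. This is a minimal sketch under the assumption that the `docker` CLI is on `PATH`; the function names and the example tag `fake_news:local` are illustrative and do not come from the repository.

```python
import subprocess
from typing import List, Optional

HUB_IMAGE = "dsbalab/fake_news"  # image name taken from the pull command above


def pull_command(image: str = HUB_IMAGE) -> List[str]:
    """Argument list equivalent to `docker pull <image>`."""
    return ["docker", "pull", image]


def build_command(image_name: str) -> List[str]:
    """Argument list equivalent to `docker build -t <image_name> .`."""
    # "." is the build context, so the command must run inside ./docker
    return ["docker", "build", "-t", image_name, "."]


def run(cmd: List[str], cwd: Optional[str] = None) -> None:
    """Run a command and raise CalledProcessError if it exits non-zero."""
    subprocess.run(cmd, cwd=cwd, check=True)
```

For example, `run(pull_command())` pulls the published image, while `run(build_command("fake_news:local"), cwd="docker")` mirrors the `cd ./docker` and `docker build` steps.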

2. Korean word-embeddings

This project uses Mecab for its Korean word embeddings.

Korean embeddings [ github ]
word-embeddings [ download ]

Directory Tree

Fake-News-Detection-Dataset
.
├── data
│   ├── Part1
│   │   ├── train
│   │   │   ├── Clickbait_Auto
│   │   │   │   ├── EC
│   │   │   │   ├── ET
│   │   │   │   ├── GB
│   │   │   │   ├── IS
│   │   │   │   ├── LC
│   │   │   │   ├── PO
│   │   │   │   └── SO
│   │   │   ├── Clickbait_Direct
│   │   │   └── NonClickbait_Auto
│   │   ├── validation
│   │   └── test
│   └── Part2
│       ├── train
│       ├── validation
│       └── test
├── docker
├── docs
├── LICENSE
├── part1_title
├── part2_context
├── README.md
└── requirements.txt
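
After downloading the data, it can be useful to check that the layout matches the tree above. The sketch below counts files per part, split, and label folder. It assumes only the folder nesting shown in the tree (`data/<part>/<split>/<label>/...`); the function name `count_files` is hypothetical, and every regular file is counted because the file format is not specified here.

```python
from collections import Counter
from pathlib import Path
from typing import Union


def count_files(data_root: Union[str, Path]) -> Counter:
    """Count regular files under data_root, keyed by up to three folder levels.

    A file at Part1/train/Clickbait_Auto/EC/x is counted under
    ("Part1", "train", "Clickbait_Auto"); a file at Part2/train/x is
    counted under ("Part2", "train").
    """
    root = Path(data_root)
    counts = Counter()
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        folders = path.relative_to(root).parts[:-1]  # drop the file name
        if len(folders) < 2:
            continue  # not inside a <part>/<split>/ folder
        counts[folders[:3]] += 1
    return counts
```

Printing `sorted(count_files("./data").items())` gives a quick overview of how many files each folder contains.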

Data

The ./data directory follows the data folder structure shown in the Directory Tree section.

Part 1: Title-Body Consistency [ Part1 ]

Baseline Models

HAND¹
FNDNet²
BERT³

Part 2: Topic Segmentation Detection [ Part2 ]

Baseline Models

BERT⁴
KoBERTSeg⁵

Reference

1. Jeong, H. (2021). Hierarchical Attention Networks for Fake News Detection (Doctoral dissertation, The Florida State University).
2. Kaliyar, R. K., Goswami, A., Narang, P., & Sinha, S. (2020). FNDNet–a deep convolutional neural network for fake news detection. Cognitive Systems Research, 61, 32-44.
3. Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. NAACL-HLT (1), 4171-4186.
4. 전재민, 최우용, 최수정, & 박세영. (2019). BTS: 한국어 BERT 를 사용한 텍스트 세그멘테이션. 한국정보과학회 학술발표논문집, 413-415.
5. 소규성, 이윤승, 정의석, & 강필성. (2022). KoBERTSEG: 한국어 BERT 를 이용한 Local Context 기반 주제 분리 방법론. 대한산업공학회지, 48(2), 235-248.
