Graph Convolutional Network for Bible book classification

Overview

The text-based graph convolutional network (Text GCN) is a recently proposed semi-supervised learning model that predicts the labels of unlabelled text documents from related labelled documents. It embeds the entire corpus into a single graph whose nodes are documents and words. The document-word and word-word edges carry predetermined weights based on the relationship between the two nodes (e.g. TF-IDF). A GCN is then trained on this graph using the document nodes with known labels, and the trained model is used to infer the labels of the unlabelled documents.
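As a rough illustration of the graph construction described above, here is a minimal pure-Python sketch, not the code from this repo. It assumes TF-IDF weights for document-word edges and positive pointwise mutual information (PMI) over sliding windows for word-word edges, as in the Text GCN paper. The function name `build_text_graph`, the `doc_<i>` node naming, the window size and the toy documents are all hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def build_text_graph(docs, window=3):
    """Return {(node_a, node_b): weight} for a toy text graph.

    docs is a list of token lists; document nodes are named "doc_<i>".
    """
    n_docs = len(docs)
    edges = {}

    # Document-word edges: TF-IDF weight (zero-weight edges are skipped).
    doc_freq = Counter(word for doc in docs for word in set(doc))
    for i, doc in enumerate(docs):
        for word, count in Counter(doc).items():
            weight = (count / len(doc)) * math.log(n_docs / doc_freq[word])
            if weight > 0:
                edges[(f"doc_{i}", word)] = weight

    # Word-word edges: PMI estimated from co-occurrence in sliding windows.
    windows = []
    for doc in docs:
        if len(doc) <= window:
            windows.append(set(doc))
        else:
            windows.extend(
                set(doc[j:j + window]) for j in range(len(doc) - window + 1)
            )
    n_windows = len(windows)
    word_count = Counter(w for win in windows for w in win)
    pair_count = Counter(
        pair for win in windows for pair in combinations(sorted(win), 2)
    )
    for (a, b), n_ab in pair_count.items():
        pmi = math.log(n_ab * n_windows / (word_count[a] * word_count[b]))
        if pmi > 0:  # keep only positively associated word pairs
            edges[(a, b)] = pmi
    return edges


# Toy "chapters" (hypothetical data, for illustration only).
docs = [
    ["adam", "eve", "garden", "adam"],
    ["vanity", "wisdom", "vanity"],
    ["adam", "garden", "wisdom"],
]
edges = build_text_graph(docs)
```

The resulting edge dictionary could then be loaded into a networkx graph, for example with `Graph.add_weighted_edges_from`.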

We implement Text GCN here using the Holy Bible as the corpus. The Bible consists of 66 Books (Genesis, Exodus, etc.) and 1189 Chapters. The goal is to train a model that correctly classifies which Book an unlabelled Chapter belongs to, given the labels of the other Chapters. Since the labels of all Chapters are in fact known, we intentionally mask the labels of 10-20% of the Chapters and use those Chapters as the test set to measure accuracy during inference. To do this, the model must distinguish between the contexts associated with the various Books (e.g. the Book of Genesis talks more about Adam and Eve, while the Book of Ecclesiastes talks about the life of King Solomon). The good results of the Text GCN model show that the graph structure captures such context well: the document (Chapter)-word edges encode the context within Chapters, while the word-word edges encode the relative context between Chapters.
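The label masking can be sketched as a simple random split over Chapter indices. This is a minimal sketch under stated assumptions, not the repo's code: the function name `split_labelled_chapters`, the fixed seed and the uniform (non-stratified) sampling are hypothetical, and the actual scripts may select test Chapters differently.

```python
import random


def split_labelled_chapters(n_chapters, test_ratio=0.1, seed=0):
    """Split chapter indices into train (labels kept) and test (labels masked)."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    indices = list(range(n_chapters))
    rng.shuffle(indices)
    n_test = round(n_chapters * test_ratio)
    test_idx = sorted(indices[:n_test])
    train_idx = sorted(indices[n_test:])
    return train_idx, test_idx


# 1189 Chapters, with 10% of the labels hidden for testing.
train_idx, test_idx = split_labelled_chapters(1189, test_ratio=0.1)
```

All Chapters stay in the graph as nodes. Only the labels at `test_idx` are withheld from the training loss and compared against the predictions afterwards.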

Do consider sponsoring to support my work!

Dataset

The Bible text data used here (BBE version) is obtained courtesy of https://github.com/scrollmapper/bible_databases.

Implementation

The implementation follows the Text GCN paper, "Graph Convolutional Networks for Text Classification" (https://arxiv.org/abs/1809.05679).
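For readers unfamiliar with the propagation rule used by GCNs, the sketch below shows one graph-convolution layer, ReLU(Â X W), where Â = D^(-1/2) (A + I) D^(-1/2) is the symmetrically normalized adjacency matrix with self-loops. It is written in plain Python for illustration only; the model in models.py uses PyTorch, and the helper names, the toy adjacency matrix and the weights here are hypothetical.

```python
import math


def normalize_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2 for a square weighted adjacency matrix."""
    n = len(adj)
    # Add self-loops so each node keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def gcn_layer(a_norm, features, weights, activation=True):
    """One graph-convolution layer: A_norm @ X @ W, optionally followed by ReLU."""
    out = matmul(matmul(a_norm, features), weights)
    if activation:
        out = [[max(0.0, v) for v in row] for row in out]
    return out


# Toy 3-node path graph with one-hot (identity) node features.
adj = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
features = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
weights = [[1.0, -1.0], [0.5, 0.5], [-1.0, 1.0]]  # hypothetical 3x2 weights

a_norm = normalize_adjacency(adj)
hidden = gcn_layer(a_norm, features, weights)
```

A two-layer classifier applies a second layer without ReLU to `hidden` and takes a softmax over the outputs of the document nodes.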

For more details on the scripts & implementation, see this article: https://towardsdatascience.com/text-based-graph-convolutional-network-for-semi-supervised-bible-book-classification-c71f6f61ff0f

Requirements

Requirements: Python (3.6+), networkx (2.1), torch (1.0.0), torchvision (0.2.1), standard Python libraries

Contents

generate_train_test_datasets.py - script containing functions to compute the edge weights, and to build and save the graph
models.py - script containing the GCN model
text_GCN.py - main program that builds the dataset and graph, constructs the GCN and trains the model
evaluate_results.py - script that evaluates the results and the misclassified labels
data - folder containing the Bible data (t_bbe.csv)

How to use

To start, clone the repo, then run text_GCN.py (pass -h to list the additional arguments).

Additional resources

Implement GCN (and more) on your own dataset (https://github.com/plkmo/NLP_Toolkit)
