A Tale of Two Linkings: Dynamically Gating between Schema Linking and Structural Linking for Text-to-SQL Parsing
This repo implements the dynamic gating mechanism described in our COLING 2020 paper on top of a graph neural network-based Text-to-SQL parser. The implementation builds on this repository.
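For intuition, the gating idea can be sketched as a learned sigmoid gate that interpolates between a schema-linking feature vector and a structural-linking feature vector. This is a simplified, hypothetical illustration, not the paper's exact formulation; all names and shapes below are made up:

```python
import math

def dynamic_gate(h_schema, h_struct, w, bias=0.0):
    """Illustrative sketch only (not the paper's actual mechanism).

    A sigmoid gate g in (0, 1) decides how much of the schema-linking
    vector vs. the structural-linking vector to keep.
    w: hypothetical gate weights over the concatenation [h_schema; h_struct].
    """
    z = sum(wi * xi for wi, xi in zip(w, h_schema + h_struct)) + bias
    g = 1.0 / (1.0 + math.exp(-z))  # gate value in (0, 1)
    # Interpolate the two linking representations dimension-wise.
    return [g * a + (1.0 - g) * b for a, b in zip(h_schema, h_struct)]
```

With zero gate weights the gate is 0.5 and the two linking vectors are simply averaged; a trained gate would instead lean toward whichever linking signal is more reliable for the current input.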
- Install PyTorch 1.5.0 that fits your CUDA version.
- Install the rest of the required packages:
  ```
  pip install -r requirements.txt
  ```
- Install NLTK punkt:
  ```
  python -c "import nltk; nltk.download('punkt')"
  ```
- Download the dataset from the official Spider dataset website.
- Edit the config file `train_configs/defaults.jsonnet` to update the location of the dataset:
  ```
  local dataset_path = "dataset/";
  ```
- Before preprocessing the dataset, modify two lines in the allennlp library to replace `self._tokenizer` with `_tokenizer`. This change greatly reduces the size of the cached data and the memory usage. Also, change the number of processes in `dataset_readers/spider.py` according to your machine's resources.
Run the following command to train a new model, with or without the dynamic gating mechanism:

```
python run.py [--gated]
```
Loading the dataset for the first time may take a while (a few hours), since the model first loads values from the SQL tables and computes similarity features against the relevant question. The results are then cached for subsequent runs.
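The value-based similarity features mentioned above can be pictured as simple string matching between question tokens and database cell values. The sketch below is a deliberately simplified stand-in; the function and feature names are illustrative, not the repo's actual code:

```python
def value_match_features(question_tokens, column_values):
    """Toy sketch (hypothetical): for each question token, record whether it
    exactly matches, or only partially matches, any cell value of a column."""
    lowered = [str(v).lower() for v in column_values]
    feats = []
    for tok in question_tokens:
        t = tok.lower()
        exact = any(t == v for v in lowered)
        partial = any(t in v for v in lowered) and not exact
        feats.append((exact, partial))
    return feats
```

Computing such features requires scanning every table's values for every question, which is why the first pass over the dataset is slow and why caching the result pays off.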
Run the following command to generate model predictions:

```
python run.py <path> --mode eval
```
The predictions can be further evaluated by the official evaluation scripts of the Spider dataset.
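A typical invocation of the official Spider evaluation script looks roughly like the following; `evaluation.py`, `tables.json`, and the `database/` directory come from the Spider repo and dataset, while the gold and prediction file names here are placeholders:

```shell
# Hedged sketch; adjust the paths to your local Spider checkout and outputs.
python evaluation.py \
    --gold dev_gold.sql \
    --pred predictions.sql \
    --db database/ \
    --table tables.json \
    --etype match
```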
Ablations and alternative-approach studies can be performed with the following command:

```
python run.py <path> --mode train --gated --ablation <study_name>
```
Refer to the AllenNLP documentation for how to use `run.py` for debugging.
```
@inproceedings{chen2020tale,
  title     = {A Tale of Two Linkings: Dynamically Gating between Schema Linking and Structural Linking for Text-to-SQL Parsing},
  author    = {Sanxing Chen and Aidan San and Xiaodong Liu and Yangfeng Ji},
  booktitle = {Proceedings of the 28th International Conference on Computational Linguistics},
  year      = {2020}
}
```