This repository contains the files, directories and data used for our final project for the NLP and advanced machine learning courses at Tel Aviv University, spring semester 2017.
This is a short description of the files and directories. Note that not all of them are listed here. A more elaborate description can be found in the documentation and in the project assignment paper.
Directory containing all files and data for the implementation of the model.
.py files:
- seq2seq.py : the only runnable file in the directory. It contains the TensorFlow implementation of the model architecture, and the functions for training and evaluating the model.
- beam_search.py : our implementation of epsilon-greedy randomized beam search.
- beam_boosting.py : functions used for boosting the baseline performance of the beam search.
- partial_program.py : contains the class PartialProgram that is used to wrap the programs in the beam.
- hyper_params : constants and boolean properties of the model. These can be changed between runs.
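To illustrate the idea behind beam_search.py, here is a minimal sketch of one pruning step of an epsilon-greedy randomized beam search. It is not the code in this repository: the function name `epsilon_greedy_beam_step`, its parameters and the (item, score) candidate format are assumptions made for this example. With probability epsilon a slot in the beam is filled by a random remaining candidate, otherwise by the best-scoring one.

```python
import random

def epsilon_greedy_beam_step(candidates, beam_size, epsilon, rng=None):
    """Pick up to beam_size candidates from a list of (item, score) pairs.

    Hypothetical helper for illustration only. Each beam slot is filled
    with a uniformly random remaining candidate with probability epsilon,
    and with the highest-scoring remaining candidate otherwise.
    """
    rng = rng or random.Random()
    # Sort once so that index 0 is always the best remaining candidate.
    pool = sorted(candidates, key=lambda c: c[1], reverse=True)
    beam = []
    while pool and len(beam) < beam_size:
        if rng.random() < epsilon:
            idx = rng.randrange(len(pool))  # explore: random candidate
        else:
            idx = 0                         # exploit: best candidate
        beam.append(pool.pop(idx))
    return beam

candidates = [("a", 0.1), ("b", 0.9), ("c", 0.5), ("d", 0.3)]
greedy_beam = epsilon_greedy_beam_step(candidates, beam_size=2, epsilon=0.0)
random_beam = epsilon_greedy_beam_step(
    candidates, beam_size=2, epsilon=1.0, rng=random.Random(0)
)
```

With epsilon set to 0 the step reduces to ordinary beam pruning; larger values trade score for diversity in the beam.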
data directories:
- learnedWeightsPreTrain : weights learned from running the pre-training using generated sentences and annotations of certain common patterns.
- learnedWeightsWeaklySupervised : weights learned using the weakly supervised model (learning from denotations). The current weights in the directory are those achieving the best results so far on the dev and test data sets.
- running_logs : directory for saving logs with the results of training or testing runs of the model. It currently contains the per-sentence results of running our best model on the dev and test sets.
- word2vec : word embeddings used by the model and the code used for creating them.
Contains most of the data needed for the project, including the original data set and other data used or generated by us.
- nlvr-data : the original CNLVR data set
- logical forms : data for using the logical forms in the model
- parsed sentences : contains patterns of sentences with their annotations, as well as the dataset for pre-train that was generated based on them.
- sentence processing : data needed for (or acquired through) pre-processing of the sentences.
- data_manager.py : loads the needed data, processes it and returns it as an object that is convenient to work with.
- sentence processing.py : used by the data manager to preprocess the sentences in the data, in order to reduce noise (e.g. caused by spelling errors).
- logical forms.py: the code for the functions that are run when executing a logical form on a structured representation of an image.
- structured_rep.py: classes representing the structured representation of an image in the data set.
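As a rough illustration of what executing a logical form on a structured representation means, the sketch below models an image as a list of boxes of items and a logical form as a function from an image to a truth value. Every name here (`Item`, `all_items`, `filter_color`, `count_at_least`, `execute`) is hypothetical and chosen for this example; the actual classes and functions in structured_rep.py and logical forms.py may look quite different.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass(frozen=True)
class Item:
    """Hypothetical stand-in for one object in an image."""
    color: str
    shape: str

Box = List[Item]
Image = List[Box]

def all_items(image: Image) -> List[Item]:
    # Flatten the boxes of the image into one list of items.
    return [item for box in image for item in box]

def filter_color(items: List[Item], color: str) -> List[Item]:
    return [item for item in items if item.color == color]

def count_at_least(items: List[Item], n: int) -> bool:
    return len(items) >= n

def execute(logical_form: Callable[[Image], bool], image: Image) -> bool:
    # The denotation of a logical form is the truth value it yields on the image.
    return logical_form(image)

image = [
    [Item("yellow", "square")],
    [Item("black", "circle"), Item("yellow", "circle")],
    [],
]

# "There are at least two yellow items."
two_yellow = lambda img: count_at_least(filter_color(all_items(img), "yellow"), 2)
# "There is at least one blue item."
some_blue = lambda img: count_at_least(filter_color(all_items(img), "blue"), 1)

result_yellow = execute(two_yellow, image)
result_blue = execute(some_blue, image)
```

In a weakly supervised setting such as the one described above, the truth value returned by the execution is what gets compared against the label of the sentence and image pair, since no gold logical form is available.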