This repository contains the data and code for our ASE 2022 paper "Code comment generation based on graph neural network enhanced transformer model for code understanding in open-source software ecosystems". We provide the log files from training and evaluating the models in the ours-log-files directory. There you can find the results and examples that we reported in our paper.
You can convert the original dataset to the graph format using the Parsers.
You can also download the processed data from Google Drive.
$ cd scripts/DATASET_NAME
where the choices for DATASET_NAME are ["java", "python"].
To train/evaluate the GTrans model, run:
$ bash GTrans.sh 0 code2jdoc
where 0 is the GPU_ID.
- If GPU_ID is set to -1, CPU will be used.
- If GPU_ID is set to one specific number, only one GPU will be used.
- If GPU_ID is set to multiple numbers (e.g., 0,1,2), then parallel computing will be used.
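The three GPU_ID cases above can be sketched in Python as follows. This is only an illustration of the rules as stated, not the code used by GTrans.sh; the helper name `parse_gpu_ids` and the mode labels are hypothetical.

```python
from typing import List, Tuple


def parse_gpu_ids(gpu_id: str) -> Tuple[str, List[int]]:
    """Hypothetical helper: classify a GPU_ID argument such as "-1", "0" or "0,1,2"."""
    # Split on commas and convert each non-empty piece to an integer.
    ids = [int(part) for part in gpu_id.split(",") if part.strip()]
    if not ids:
        raise ValueError("GPU_ID must not be empty")
    if ids == [-1]:
        # -1 means no GPU: run on the CPU.
        return "cpu", []
    if any(i < 0 for i in ids):
        raise ValueError("-1 cannot be combined with other GPU ids")
    # One id selects a single GPU; several ids select parallel computing.
    mode = "single-gpu" if len(ids) == 1 else "multi-gpu"
    return mode, ids


if __name__ == "__main__":
    for arg in ("-1", "0", "0,1,2"):
        print(arg, "->", parse_gpu_ids(arg))
```

In this sketch, the first element of the returned pair names the mode and the second lists the GPU ids that would be used.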
While training and evaluating the models, a set of files is generated inside the DATASET_NAME-tmp
directory. The files are as follows.
- MODEL_NAME.mdl
  - Model file containing the parameters of the best model.
- MODEL_NAME.mdl.checkpoint
  - A model checkpoint, in case the training needs to be restarted.
- MODEL_NAME.txt
  - Log file for training.
- MODEL_NAME.json
  - The predictions and gold references dumped during validation.
- MODEL_NAME_test.txt
  - Log file for evaluation (greedy).
- MODEL_NAME_test.json
  - The predictions and gold references dumped during evaluation (greedy).
- MODEL_NAME_beam.txt
  - Log file for evaluation (beam).
- MODEL_NAME_beam.json
  - The predictions and gold references dumped during evaluation (beam).
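As a convenience, the file list above can be turned into a small Python sketch that builds the expected paths and reports which ones exist. The helper `expected_outputs`, the `root` parameter, and the assumption that the second argument of GTrans.sh (code2jdoc in the example command) serves as MODEL_NAME are illustrative and not taken from the repository.

```python
from pathlib import Path
from typing import Dict

# Suffixes appended to MODEL_NAME, copied from the list above.
OUTPUT_SUFFIXES = {
    ".mdl": "parameters of the best model",
    ".mdl.checkpoint": "checkpoint for restarting training",
    ".txt": "training log",
    ".json": "validation predictions and gold references",
    "_test.txt": "evaluation log (greedy)",
    "_test.json": "evaluation predictions and gold references (greedy)",
    "_beam.txt": "evaluation log (beam)",
    "_beam.json": "evaluation predictions and gold references (beam)",
}


def expected_outputs(dataset_name: str, model_name: str, root: str = ".") -> Dict[Path, str]:
    """Hypothetical helper: map each expected output path to its description."""
    # Assumption: DATASET_NAME-tmp is located under `root`.
    tmp_dir = Path(root) / f"{dataset_name}-tmp"
    return {
        tmp_dir / f"{model_name}{suffix}": description
        for suffix, description in OUTPUT_SUFFIXES.items()
    }


if __name__ == "__main__":
    # Assumption: "code2jdoc" is the MODEL_NAME used in the example command.
    for path, description in expected_outputs("java", "code2jdoc").items():
        status = "found" if path.exists() else "missing"
        print(f"{status:8} {path}  ({description})")
```

Running the script from the directory that contains java-tmp prints one line per expected file, which gives a quick check of how far a run has progressed.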
We borrowed and modified code from NeuralCodeSum and ggnn.pytorch. We would like to express our gratitude to the authors of these repositories.