MagNet: A Neural Network for Directed Graphs
@article{zhang2021magnet,
title={Magnet: A neural network for directed graphs},
author={Zhang, Xitong and He, Yixuan and Brugnone, Nathan and Perlmutter, Michael and Hirn, Matthew},
journal={Advances in Neural Information Processing Systems},
volume={34},
pages={27003--27015},
year={2021}
}
MagNet is included in the package: PyTorch Geometric Signed Directed
@article{he2022pytorch,
title={{PyTorch Geometric Signed Directed: A Software Package on Graph Neural Networks for Signed and Directed Graphs}},
author={He, Yixuan and Zhang, Xitong and Huang, Junjie and Rozemberczki, Benedek and Cucuringu, Mihai and Reinert, Gesine},
journal={arXiv preprint arXiv:2202.10793},
year={2022}
}
The project has been tested on the following environment specification:
- Ubuntu 18.04.5 LTS (Other x86_64 based Linux distributions should also be fine, such as Fedora 32)
- Nvidia Graphic Card (NVIDIA GeForce RTX 2080 with driver version 440.36, and NVIDIA RTX 8000) and CPU (Intel Core i7-10700 CPU @ 2.90GHz)
- Python 3.6.13 (and Python 3.6.12)
- CUDA 10.2 (and CUDA 9.2)
- Pytorch 1.8.0 (built against CUDA 10.2) and Python 1.6.0 (built against CUDA 9.2)
- Other libraries and python packages (See below)
The implementation of MagNet without baselines: SimpleMagNet
There are two installation methods listed below. Please try to install the packages manually if the first method fails.
You should handle (1),(2) yourself. For (3), (4), (5) and (6), we provide a list of steps to install them.
We provide two examples of envionmental setup, one with CUDA 10.2 and GPU, the other with CPU.
Following steps assume you've done with (1) and (2).
-
Install conda. Both Miniconda and Anaconda are OK.
-
Create an environment and install python packages (GPU):
conda env create -f environment_GPU.yml
- Create an environment and install python packages (CPU):
conda env create -f environment_CPU.yml
The codebase is implemented in Python 3.6.12. package versions used for development are below.
networkx 2.5
numpy 1.19.2
pandas 1.1.4
scipy 1.5.4
argparse 1.1.0
sklearn 0.23.2
stellargraph 1.2.1 (for link direction prediction: conda install -c stellargraph stellargraph)
torch 1.8.0
torch-scatter 2.0.5
torch-geometric 1.6.3 (follow https://pytorch-geometric.readthedocs.io/en/latest/notes/installation.html)
matplotlib 3.3.4 (for generating plots and results)
When installation is done, you could check you enviroment via:
cd execution
bash setup_test.sh
- ./execution/ stores files that can be executed to generate outputs. For vast number of experiments, we use parallel (https://www.gnu.org/software/parallel/, can be downloaded in command line and make it executable via:
wget http://git.savannah.gnu.org/cgit/parallel.git/plain/src/parallel
chmod 755 ./parallel
- ./joblog/ stores job logs from parallel. You might need to create it by
mkdir joblog
- ./Output/ stores raw outputs (ignored by Git) from parallel. You might need to create it by
mkdir Output
-
./dataset/ stores processed data sets.
-
./src/ stores files to train various models, utils and metrics.
-
./result_arrays/ stores results for different data sets. Each data set has a separate subfolder.
-
./result_anlysis/ stores notebooks for generating result plots or tables.
-
./logs/ stores trained models and logs, as well as predicted clusters (optional). When you are in debug mode (see below), your logs will be stored in ./debug_logs/ folder.
MagNet provides the following command line arguments, which can be viewed in the ./src/param_parser.py.
See file ./src/sparse_Magnet.py for example.
--p_q FLOAT Direction strength, from 0.5 to 1. Default is 0.95.
--p_inter FLOAT Inter-cluster edge probabilities. Default is 0.1.
--seed INT Random seed for training testing
split/random graph generation. Default is 0.
See file ./src/sparse_Magnet.py for example.
--epochs INT Number of (maximum) training epochs. Default is 3000.
--q FLOAT q value for the phase matrix. Default is 0.
--new_setting, -NS BOOL Whether not to load best settings. Default is False.
--num_filter INT Numer of filters. Default is 1.
--layer INT Numer of layers. Default is 2.
--lr FLOAT Learning rate. Default is 0.005.
--l2 FLOAT Weight decay (L2 loss on parameters). Default is 5^-4.
--dropout FLOAT Dropout rate (1 - keep probability). Default is 0.0.
--debug, -D BOOL Debug with minimal training setting, not to get results. Default is False.
--dataset STR Data set to consider. Default is 'WebKB/Cornell'.
See file ./src/Edge_sparseMagnet.py for example.
--epochs INT Number of (maximum) training epochs. Default is 1500.
--num_class_link INT Number of classes for link direction prediction(2 or 3). Default is 2.
--task INT Task to conduct: 1 is existence prediction while 2 is
link (direction) prediction, could be 2 or 3 classes. Default is 2.
First, get into the ./execution/ folder:
cd execution
To reproduce synthetic results on node classification on cyclic meta-graph structure.
bash cyclic.sh
To reproduce results on link prediction.
bash link_prediction.sh
Other execution files are similar to run.
Note that if you are operating on CPU, you may delete the commands ``CUDA_VISIBLE_DEVICES=xx". You can also set you own number of parallel jobs, not necessarily following the j numbers in the .sh files.
First, get into the ./src/ folder:
cd src
Then, below are various options to try:
Creating a MagNet model of the default setting.
python ./sparse_Magnet.py
Creating an APPNP model for cyclic meta-graph synthetic data with seed 10 and do not load "best setting".
python ./APPNP.py --dataset syn/cyclic --seed 10 -NS
Creating a model for Telegram data set on DGCN with some custom learning rate and epoch number.
python ./Sym_DiGCN.py --dataset telegram --lr 0.001 --epochs 300
Creating a model on Digraph with inception block for link prediction on a binary classification problem of direction prediction.
python ./Digraph.py --method_name DiG_ib --task 2 --num_class_link 2
cd src/utils
To generate cyclic DSBM graphs:
python generate_cyclic.py
To generate ordered DSBM graphs:
python generate_complete.py
python generate_syn2_3.py
To generate noisy cyclic DSBM graphs:
python generate_fill.py
-
Notations. Sym_DiGCN corresponds to DGCN. Digraph corresponids to two variants, specified by method names "DiG" and "DiG_ib", respectively.
-
When tuning hyperparameters on synthetic data for the node classification task, we need "-NS" to ignore the auto loading of saved dictionaries of best setting. When we are running multiple experiments on different generated networks under the same synthetic setting, we do not put "-NS" into the command but we only need to specify the data we use and the p_q values etc.. The .py files will automatically load the best setting for us. To change the best setting, you could create or update the dictionaries, or simply add "-NS" and specify your setting of choice.