Accelerating Training with Neuron Interaction and Nowcasting Networks

Boris Knyazev, Abhinav Moudgil, Guillaume Lajoie, Eugene Belilovsky, Simon Lacoste-Julien
Neuron interaction and Nowcasting (NiNo) model
We introduce the NiNo model, which predicts future parameters (nowcasting) by learning neuron interactions in vision and language tasks. We feed c past parameter states (c=5 by default) as input to NiNo and nowcast K future states, leveraging the neural graph structure and graph neural networks. For a new optimization task, NiNo is applied rarely over time: only once per 1k steps of Adam (or another base optimizer).
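To make the input/output of nowcasting concrete, here is a toy sketch of the idea only. It replaces NiNo's learned graph neural network with naive linear extrapolation, and the sizes and horizon are illustrative; it is not the model itself.

```python
import torch

# Toy stand-in for nowcasting (illustrative only, NOT the NiNo predictor):
# given c past parameter states, guess a state k steps into the future.
c, k = 5, 40                         # c past states (default 5); k is an illustrative horizon
num_params = 10                      # tiny flattened parameter vector for the example
past = torch.randn(c, num_params)    # c past parameter states, oldest first

# NiNo learns this mapping with a graph neural network over the neural graph;
# here we simply extrapolate the most recent trend k steps ahead.
delta = past[-1] - past[-2]
nowcast = past[-1] + k * delta
print(nowcast.shape)                 # torch.Size([10])
```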
Using NiNo with Adam
Figure: Adam without and with nowcasting using our NiNo model on a language task that NiNo has not seen during its training.
The experiments from our paper can be run using a single GPU with <= 80GB of memory.
- python >= 3.8
- pytorch >= 2.0
- torch_geometric
- transformers
- datasets
- other optional dependencies (networkx, pydot); a quick import check is sketched below
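A minimal sketch for checking that the main dependencies are importable (the version prints are informational only):

```python
import torch
import torch_geometric
import transformers
import datasets

# Versions expected by this repo: python >= 3.8, pytorch >= 2.0
print('torch', torch.__version__)
print('torch_geometric', torch_geometric.__version__)
print('transformers', transformers.__version__)
print('datasets', datasets.__version__)
```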
- Initial code release with a pretrained NiNo model (see the `checkpoints` folder):
  - `nino.pt` - default NiNo model (assumes the GPT2 tokenizer)
  - `nino_no_posw.pt` - NiNo without positional encoding for word embeddings (can be used for arbitrary models and tokenizers, including Llama)
  - `nino_h32.pt` - NiNo with hidden size 32 instead of the default 128
  - `nino_mlp.pt` - WNN+ model (does not use graphs)
  - `nino_towers4.pt` - NiNo with 4 towers in the message passing step for better efficiency
- Neural graphs and evaluation script for convnet tasks.
- Neural graphs and evaluation script for transformer tasks:
  - GPT2
  - BERT (experimental code)
  - Llama (experimental code, see a graph for a smaller variant of `meta-llama/Meta-Llama-3.1-8B` in the `results` folder)
  - Vision Transformer (experimental code)
- Training dataset and training code for NiNo.
We provide the checkpoint for our best-performing NiNo model at `checkpoints/nino.pt`.
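Any of the other checkpoints listed above can be passed in the same way. For example, a sketch using the tokenizer-agnostic variant, mirroring the wrapper call from the training loop below (the model and hyperparameters here are placeholders):

```python
import torch
from transformers import AutoModelForCausalLM
from optim import NiNo

model = AutoModelForCausalLM.from_config(...)  # e.g. a Llama-style model with a non-GPT2 tokenizer
opt = NiNo(base_opt=torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-2),
           ckpt='checkpoints/nino_no_posw.pt',  # NiNo variant without positional encoding for word embeddings
           model=model,
           period=1000,
           max_steps=10000)
```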
Training loop with NiNo for a language model:
```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM
from optim import NiNo

model = AutoModelForCausalLM.from_config(...)  # some model

# NiNo is implemented as a wrapper around the base optimizer
# any optimizer other than Adam should also be possible to use with NiNo
opt = NiNo(base_opt=torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-2),
           ckpt='checkpoints/nino.pt',
           message_passing_device=None,  # can use 'cpu' when NiNo is applied to larger models
           model=model,
           period=1000,
           max_steps=10000)

for step in range(10000):
    if opt.need_grads:  # True/False based on the step number and period
        opt.zero_grad()  # zero out gradients
        data, targets = ...  # get some batch of data
        # base optimizer step (majority of the time)
        outputs = model(data)  # forward pass
        loss = F.cross_entropy(outputs, targets)  # compute some loss
        loss.backward()  # only compute gradients for the base optimizer
    opt.step()  # base_opt step or nowcast params every 1000 steps using NiNo
    ...
```
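As noted in the comment above, the wrapper is not tied to AdamW; other base optimizers should also be possible. A sketch reusing `model` and the imports from the snippet above, with SGD swapped in (the learning rate and momentum are illustrative):

```python
# Same wrapper, different base optimizer (illustrative hyperparameters).
opt = NiNo(base_opt=torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9),
           ckpt='checkpoints/nino.pt',
           model=model,
           period=1000,
           max_steps=10000)
```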
Evaluate on all vision tasks:
```bash
for task in FM-16 C10-16 FM-32 C10-32 C100-32; do
    for seed in $(seq 1000 1000 10000); do
        python train_vision.py --task $task --seed $seed | tee -a results.log
    done
done
```
To evaluate without the NiNo model, run with `--nino_ckpt none`.
You should get results similar to Tables 1 and 2 in the paper.
Use `--verbose 2` for graph visualization and more detailed output.
Single-seed training on the Wiki/3-64 task:
```bash
python train_lm.py --dataset_name wikitext --dataset_config_name wikitext-103-raw-v1 --num_train_epochs 4 --layers 3 --dim 64 --heads 4
```
For LM1B tasks, use `--dataset_name lm1b --dataset_config_name plain_text`.
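For reference, these two flags correspond to Hugging Face datasets; a minimal sketch of loading them directly, outside of `train_lm.py` (whether they download without extra arguments may depend on your `datasets` version):

```python
from datasets import load_dataset

wiki = load_dataset('wikitext', 'wikitext-103-raw-v1')  # Wiki/* tasks
lm1b = load_dataset('lm1b', 'plain_text')               # LM1B tasks
print(wiki)
```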
Pull requests and GitHub issues are welcome. For major changes, please open an issue first to discuss what you would like to change.
MIT, see the LICENSE file.
```bibtex
@misc{knyazev2024accelerating,
  title={Accelerating Training with Neuron Interaction and Nowcasting Networks},
  author={Boris Knyazev and Abhinav Moudgil and Guillaume Lajoie and Eugene Belilovsky and Simon Lacoste-Julien},
  year={2024},
  eprint={2409.04434},
  archivePrefix={arXiv},
  url={https://arxiv.org/abs/2409.04434},
}
```