Date: 2023-07-03
- [Experimental] Added support for GPT-NeoX models.
- [Experimental] Added support for BLOOM models.
- [Prototype] Added support for LLaMA models.
- Added support for more flexible tensor-parallel configurations for GPT2, OPT, and BLOOM. The number of attention heads no longer needs to be evenly divisible by `tp_degree`. (Note: `tp_degree` still needs to satisfy the runtime topology constraint for collective communication (i.e. Allreduce). For more details on supported topologies, see Tensor-parallelism support and https://awsdocs-neuron.readthedocs-hosted.com/en/latest/general/arch/neuron-features/collective-communication.html.) A minimal sketch follows this list.
- Added multi-query / multi-group attention support for GPT2.
- Fixed NaN issues for the GPT2 model.
- Fixed gibberish output for OPT and GPT-NeoX models.
- Resolved an issue where NaN values could be produced when the `context_length` argument was used in GPT2/OPT.
- Known issue: cache reorder support for beam search is missing.
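As a rough illustration of the more flexible tensor-parallel support, the sketch below loads GPT2 (12 attention heads) with `tp_degree=8`, which does not divide the head count evenly. Class and helper names follow the library's public demo scripts; exact signatures may differ between releases.

```python
from transformers import AutoModelForCausalLM
from transformers_neuronx.module import save_pretrained_split
from transformers_neuronx.gpt2.model import GPT2ForSampling

# Split the HuggingFace checkpoint into per-layer files (one-time step).
hf_model = AutoModelForCausalLM.from_pretrained('gpt2')
save_pretrained_split(hf_model, './gpt2-split')

# GPT2 has 12 attention heads; tp_degree=8 does not divide 12 evenly,
# which this release now allows (8 is still a supported collective topology).
neuron_model = GPT2ForSampling.from_pretrained('./gpt2-split', batch_size=1,
                                               tp_degree=8, amp='f16')
neuron_model.to_neuron()  # shard weights across NeuronCores and compile
```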
Date: 2023-06-12
- Added `int8` weight storage for `GPT2` models (see the sketch after this list).
- Improved prompt context encoding performance for `GPT2` models.
- Improved collective communications performance for tp-degrees 4, 8, and 24 on Inf2.
- Improved collective communications performance for tp-degrees 8 and 32 on Trn1.
- Added support for the `--model-type=transformer-inference` compiler flag for optimized decoder-only LLM inference (see the example after this list).
- Added padding to the `GPT-J` `linear` layer to correctly handle odd vocabulary sizes.
- Resolved issues where the HuggingFace `generate` method produced incorrect results when `beam_search` was used.
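A minimal sketch of enabling `int8` weight storage for a GPT2 model. It assumes the `NeuronConfig`/`QuantizationConfig` options and the `'s8'`/`'f16'` dtype names from the library's quantization documentation; treat these names as assumptions for this release.

```python
from transformers_neuronx.config import NeuronConfig, QuantizationConfig
from transformers_neuronx.gpt2.model import GPT2ForSampling

# Store weights in int8 on device and dequantize to fp16 for compute
# (names assumed from the library's quantization documentation).
neuron_config = NeuronConfig(
    quant=QuantizationConfig(quant_dtype='s8', dequant_dtype='f16'),
)
model = GPT2ForSampling.from_pretrained('./gpt2-split', batch_size=1,
                                        tp_degree=2, amp='f16',
                                        neuron_config=neuron_config)
model.to_neuron()
```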
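The `--model-type=transformer-inference` flag is a Neuron compiler option; one common way to pass it is through the `NEURON_CC_FLAGS` environment variable before compilation, as sketched below.

```python
import os

# Ask the Neuron compiler to apply its decoder-only-LLM optimizations.
# This must be set before the model is compiled (i.e. before to_neuron()).
os.environ['NEURON_CC_FLAGS'] = '--model-type=transformer-inference'
```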
Date: 2023-04-28
- Added `transformers-neuronx` artifacts to the PyPI repository.
- Added support for the HuggingFace `generate()` method (a generation sketch follows this list).
- Added support for model serialization, including model saving, loading, and weight swapping (a save/load sketch follows this list).
- Added support for caching compiled artifacts.
- Improved performance by removing unnecessary KV-cache tensor resetting.
- Improved prompt context encoding performance (`OPT`, `GPT2`).
- Incorrect `GPT-J` `amp_callback` import: fixed so that the `GPT-J` demo now imports the correct `amp_callback` function.
- Known issue: incorrect output with HuggingFace `beam_search`. When the HuggingFace `generate` method is configured to use `beam_search`, it can produce incorrect results for certain configurations. It is recommended to use other generation methods such as `sample` or `greedy_search` instead (see the generation sketch after this list).
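A sketch of driving a compiled Neuron model through the HuggingFace `generate()` API. The `HuggingFaceGenerationModelAdapter` wrapper name is taken from the library's generation utilities and should be treated as an assumption for this release; per the known issue above, it uses sampling rather than `beam_search`.

```python
from transformers import AutoConfig, AutoTokenizer
from transformers_neuronx.gpt2.model import GPT2ForSampling
from transformers_neuronx.generation_utils import HuggingFaceGenerationModelAdapter

# Compile the Neuron model, then wrap it so it exposes generate().
neuron_model = GPT2ForSampling.from_pretrained('./gpt2-split', tp_degree=2, amp='f16')
neuron_model.to_neuron()
wrapped = HuggingFaceGenerationModelAdapter(AutoConfig.from_pretrained('gpt2'),
                                            neuron_model)

tokenizer = AutoTokenizer.from_pretrained('gpt2')
input_ids = tokenizer('Hello, my name is', return_tensors='pt').input_ids

# Use sampling (or greedy search) instead of beam_search, which is the
# known-issue configuration called out above.
output = wrapped.generate(input_ids, do_sample=True, top_k=50, max_length=128)
print(tokenizer.decode(output[0]))
```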
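A sketch of the serialization flow. The `save`/`load` method names below are hypothetical, based on the save/load pattern in later documentation; the exact entry points in this release may differ.

```python
from transformers_neuronx.gpt2.model import GPT2ForSampling

# First run: compile the model, then persist the compiled artifacts.
model = GPT2ForSampling.from_pretrained('./gpt2-split', tp_degree=2, amp='f16')
model.to_neuron()
model.save('./gpt2-compiled')  # hypothetical: write compiled artifacts to disk

# Later run: reload the cached artifacts instead of recompiling from scratch.
model = GPT2ForSampling.from_pretrained('./gpt2-split', tp_degree=2, amp='f16')
model.load('./gpt2-compiled')  # hypothetical: reuse the saved artifacts
model.to_neuron()
```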
Date: 2023-02-24
- Added error handling to check if the desired generated sequence length is valid based on the model configuration
- Improved logging:
- Reduced overly verbose compiler messages
- Disabled lazy module warnings
- Updated `src/transformers_neuronx/gptj/demo.py` to correctly use the `amp_callback` function from `transformers_neuronx.gpt2.demo`
- Extended the `gpt_demo.py` `save` function to support GPT-2 and GPT-J configs
Date: 2023-02-08
First release of `transformers-neuronx`, a new library that enables LLM model inference on Inf2 & Trn1 using the Neuron SDK. `transformers-neuronx` contains optimized model implementations that are checkpoint-compatible with HuggingFace Transformers, and currently supports Transformer decoder models like GPT2, GPT-J, and OPT. A minimal usage sketch follows.
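A minimal end-to-end sketch of the sampling flow, with class and method names taken from the library's public demo scripts (exact signatures may vary between releases):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers_neuronx.module import save_pretrained_split
from transformers_neuronx.gpt2.model import GPT2ForSampling

# Split the HuggingFace checkpoint, then load it into the Neuron implementation.
save_pretrained_split(AutoModelForCausalLM.from_pretrained('gpt2'), './gpt2-split')
model = GPT2ForSampling.from_pretrained('./gpt2-split', batch_size=1,
                                        tp_degree=2, amp='f16')
model.to_neuron()  # shard weights and compile for the NeuronCores

# Tokenize a prompt and sample a continuation on device.
tokenizer = AutoTokenizer.from_pretrained('gpt2')
input_ids = tokenizer('Hello, my name is', return_tensors='pt').input_ids
with torch.inference_mode():
    generated = model.sample(input_ids, sequence_length=128)
print(tokenizer.decode(generated[0]))
```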