add gpt_neox support #113

twaka · 2023-10-19T15:47:26Z

Hi, I'm trying to add support for the gpt_neox architecture.

Support for gpt-neox model #41

awq/models/gpt_neox.py ends up almost identical to the bloom implementation, and I did not try to scale the fused query_key_value layer.
I ran eval.py on smaller gpt_neox variants because I don't have enough resources to run the original gpt-neox-20b.
It would be much appreciated if someone could evaluate it with that model.

rinna/bilingual-gpt-neox-4b

before

Task	Version	Metric	Value
wikitext	1	word_perplexity	15.0848
		byte_perplexity	1.6611
		bits_per_byte	0.7321

after

Task	Version	Metric	Value
wikitext	1	word_perplexity	15.8949
		byte_perplexity	1.6774
		bits_per_byte	0.7462

stabilityai/stablelm-base-alpha-7b

before

Task	Version	Metric	Value
wikitext	1	word_perplexity	17.1947
		byte_perplexity	1.7023
		bits_per_byte	0.7674

after

Task	Version	Metric	Value
wikitext	1	word_perplexity	17.6954
		byte_perplexity	1.7114
		bits_per_byte	0.7752
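
To put the tables above in perspective, here is a minimal sketch that computes the relative change in word_perplexity for each model (about 5.4% and 2.9%) and checks that each reported bits_per_byte equals log2 of the reported byte_perplexity. The numbers are copied from the tables. The helper names (`relative_increase`, `summarize`, `bits_per_byte_consistent`) are illustrative and not part of eval.py or this PR.

```python
import math

# (before, after) values copied from the wikitext tables above.
results = {
    "rinna/bilingual-gpt-neox-4b": {
        "word_perplexity": (15.0848, 15.8949),
        "byte_perplexity": (1.6611, 1.6774),
        "bits_per_byte": (0.7321, 0.7462),
    },
    "stabilityai/stablelm-base-alpha-7b": {
        "word_perplexity": (17.1947, 17.6954),
        "byte_perplexity": (1.7023, 1.7114),
        "bits_per_byte": (0.7674, 0.7752),
    },
}


def relative_increase(before: float, after: float) -> float:
    """Return the change from `before` to `after` as a percentage of `before`."""
    return (after - before) / before * 100.0


def summarize(data: dict) -> dict:
    """Map each model name to its word_perplexity increase in percent."""
    return {
        model: relative_increase(*metrics["word_perplexity"])
        for model, metrics in data.items()
    }


def bits_per_byte_consistent(metrics: dict, tol: float = 1e-3) -> bool:
    """Check bits_per_byte == log2(byte_perplexity) for both table columns."""
    return all(
        abs(math.log2(bpp) - bpb) <= tol
        for bpp, bpb in zip(metrics["byte_perplexity"], metrics["bits_per_byte"])
    )


if __name__ == "__main__":
    for model, pct in summarize(results).items():
        ok = bits_per_byte_consistent(results[model])
        print(f"{model}: word_perplexity +{pct:.2f}%, bits_per_byte consistent: {ok}")
```

Relative change is used here because the two models start from different baseline perplexities, so the absolute differences are not directly comparable.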

casper-hansen · 2023-10-20T21:14:50Z

Looks good to me! Thanks for implementing this architecture.

twaka added 2 commits October 17, 2023 13:42

gpt_neox

ad45716

add comments

bcfdeb3

casper-hansen merged commit be39598 into casper-hansen:main Oct 20, 2023

casper-hansen mentioned this pull request Oct 20, 2023

Support StableLM-3B #115

Closed
