add gpt_neox support #113

Merged
merged 2 commits into from
Oct 20, 2023

@twaka (Contributor) commented Oct 19, 2023

Hi, this PR adds support for the gpt_neox architecture.

awq/models/gpt_neox.py ends up almost identical to the bloom code, and I did not try to scale the fused query_key_value.
I ran eval.py on smaller gpt_neox variants because I don't have enough resources to run the original gpt-neox-20b.
I would much appreciate it if someone could evaluate with it.

rinna/bilingual-gpt-neox-4b

before

| Task | Version | Metric | Value | Stderr |
|---|---|---|---|---|
| wikitext | 1 | word_perplexity | 15.0848 | |
| | | byte_perplexity | 1.6611 | |
| | | bits_per_byte | 0.7321 | |

after

| Task | Version | Metric | Value | Stderr |
|---|---|---|---|---|
| wikitext | 1 | word_perplexity | 15.8949 | |
| | | byte_perplexity | 1.6774 | |
| | | bits_per_byte | 0.7462 | |

stabilityai/stablelm-base-alpha-7b

before

Task Version Metric Value Stderr
wikitext 1 word_perplexity 17.1947
byte_perplexity 1.7023
bits_per_byte 0.7674

after

Task Version Metric Value Stderr
wikitext 1 word_perplexity 17.6954
byte_perplexity 1.7114
bits_per_byte 0.7752
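
To make the numbers above easier to compare, here is a small sketch (not part of the PR) that computes the relative change in word_perplexity for each model and checks that bits_per_byte is consistent with log2(byte_perplexity). It assumes "before" and "after" mean before and after AWQ quantization; the `RESULTS` dict and the helper names are made up for illustration, and the values are copied from the tables.

```python
import math

# Values copied from the tables above. The dict layout is illustrative only.
RESULTS = {
    "rinna/bilingual-gpt-neox-4b": {
        "before": {"word_perplexity": 15.0848, "byte_perplexity": 1.6611, "bits_per_byte": 0.7321},
        "after": {"word_perplexity": 15.8949, "byte_perplexity": 1.6774, "bits_per_byte": 0.7462},
    },
    "stabilityai/stablelm-base-alpha-7b": {
        "before": {"word_perplexity": 17.1947, "byte_perplexity": 1.7023, "bits_per_byte": 0.7674},
        "after": {"word_perplexity": 17.6954, "byte_perplexity": 1.7114, "bits_per_byte": 0.7752},
    },
}


def relative_increase_pct(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


def word_perplexity_change(results: dict) -> dict:
    """Map each model name to its word_perplexity change in percent."""
    return {
        model: relative_increase_pct(
            runs["before"]["word_perplexity"], runs["after"]["word_perplexity"]
        )
        for model, runs in results.items()
    }


def bits_per_byte_gap(run: dict) -> float:
    """Absolute gap between the reported bits_per_byte and log2(byte_perplexity)."""
    return abs(math.log2(run["byte_perplexity"]) - run["bits_per_byte"])


if __name__ == "__main__":
    for model, pct in word_perplexity_change(RESULTS).items():
        print(f"{model}: word_perplexity {pct:+.2f}%")
```

With these figures, the word_perplexity increase is roughly 5.4% for rinna/bilingual-gpt-neox-4b and roughly 2.9% for stabilityai/stablelm-base-alpha-7b.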

@casper-hansen (Owner)

Looks good to me! Thanks for implementing this architecture.

@casper-hansen casper-hansen merged commit be39598 into casper-hansen:main Oct 20, 2023