Fix (data): updating wikitext2 data utility #1080

i-colbert · 2024-10-29T20:01:12Z

Reason for this PR

Update the wikitext2 data utility to work within the latest LLM quantization entry point. This version of the wikitext2 data loader uses the whole test dataset without random subsampling, which affords us more consistent benchmarking.
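For readers unfamiliar with the approach, here is a minimal sketch of what "using the whole test dataset without random subsampling" can look like: the token stream is cut into consecutive, non-overlapping windows. This is an illustration only, not the code in this PR. The helper name `sequential_windows`, the plain Python lists, and the choice to drop a trailing partial window are all assumptions.

```python
from typing import List, Sequence


def sequential_windows(token_ids: Sequence[int], seqlen: int) -> List[List[int]]:
    """Split a token stream into consecutive, non-overlapping windows.

    Hypothetical helper for illustration. Trailing tokens that do not fill
    a complete window are dropped (an assumption, not confirmed by the PR).
    """
    if seqlen <= 0:
        raise ValueError("seqlen must be positive")
    n_windows = len(token_ids) // seqlen
    return [list(token_ids[i * seqlen:(i + 1) * seqlen]) for i in range(n_windows)]


if __name__ == "__main__":
    tokens = list(range(10))  # stand-in for tokenized test text
    print(sequential_windows(tokens, seqlen=4))  # [[0, 1, 2, 3], [4, 5, 6, 7]]
```

Because the windows are deterministic, every run evaluates the same sequences, which is what makes benchmark numbers comparable across runs.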

Changes Made in this PR

Integrated the data pre-processing into the get_wikitext2 function.

Testing Summary

N/A

Risk Highlight

N/A

Checklist

Code comments added to any hard-to-understand areas, if applicable.
Changes generate no new warnings.
Updated any relevant tests, if applicable.
No conflicts with destination dev branch.
I reviewed my own code changes.
Initial CI/CD passing.
1+ reviews given, and any review issues addressed and approved.
Post-review full CI/CD passing.

src/brevitas_examples/llm/llm_quant/data.py

Giuseppe5 · 2024-10-29T21:32:58Z

Is this version of wikitext2 used in some paper?
Is there anything we're missing from the old version of wikitext2?
Out of curiosity, if we were to compare with other popular implementations of this code (e.g., AutoGPTQ, I guess), where do we land?

i-colbert · 2024-10-29T22:04:03Z

> Is this version of wikitext2 used in some paper?
> Is there anything we're missing from the old version of wikitext2?
> Out of curiosity, if we were to compare with other popular implementations of this code (e.g., AutoGPTQ, I guess), where do we land?

Yes, this version is modified from the original GPTQ codebase, as attributed in the file header, and is likely used by many works to collect results for their papers. The version in optimum uses random subsampling with replacement, which is useful for prototyping but does not actually calculate the likelihood over the whole test dataset, since sequences can be repeated or not represented at all.
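To make the coverage point concrete, here is a small sketch that measures how much of a token stream is touched when window start positions are drawn independently at random, compared with consecutive windows. It is not the optimum implementation. The helper names, the seed, and the toy sizes are assumptions chosen for illustration.

```python
import random
from typing import List


def sample_window_starts(n_tokens: int, seqlen: int, nsamples: int, seed: int = 0) -> List[int]:
    """Draw window start positions independently, so windows may repeat or overlap."""
    rng = random.Random(seed)
    # randint is inclusive on both ends; the last valid start is n_tokens - seqlen.
    return [rng.randint(0, n_tokens - seqlen) for _ in range(nsamples)]


def coverage(n_tokens: int, seqlen: int, starts: List[int]) -> float:
    """Fraction of token positions covered by at least one window."""
    covered = set()
    for start in starts:
        covered.update(range(start, start + seqlen))
    return len(covered) / n_tokens


if __name__ == "__main__":
    n_tokens, seqlen = 1000, 10
    nsamples = n_tokens // seqlen  # same number of windows in both schemes

    random_starts = sample_window_starts(n_tokens, seqlen, nsamples)
    sequential_starts = [i * seqlen for i in range(nsamples)]

    print("random coverage:    ", coverage(n_tokens, seqlen, random_starts))
    print("sequential coverage:", coverage(n_tokens, seqlen, sequential_starts))
```

With the same number of windows, the consecutive scheme covers every token exactly once, while the random scheme leaves some positions unseen and counts others more than once.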

Giuseppe5 reviewed Oct 29, 2024

src/brevitas_examples/llm/llm_quant/data.py (resolved)

Fix (data): updating wikitext2 data utility

cb84410

i-colbert force-pushed the fix/data_utils branch from 2d0895a to cb84410 on October 30, 2024 15:28

Giuseppe5 merged commit ae3ec68 into Xilinx:dev Oct 30, 2024
23 checks passed

i-colbert deleted the fix/data_utils branch October 30, 2024 16:04
