Develop #7

thomwolf · 2018-11-07T22:34:18Z

Fixing run_squad.py pre-processing bug.

Various clean-ups:

The weight initialization was not optimal: tf.truncated_normal_initializer(stddev=0.02) was translated to weight.data.normal_(0.02) instead of weight.data.normal_(mean=0.0, std=0.02). The single positional argument is taken as the mean, which leaves the standard deviation at its default of 1.0. This likely affected the performance of run_classifier.py as well.
The gradient accumulation loss was not averaged over the accumulation steps, so using accumulation would have required changing the hyper-parameters.
The evaluation was not run under torch.no_grad() and was therefore sub-optimal in terms of speed and memory.
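
The three clean-ups above can be illustrated with a minimal sketch. It is not the code from run_squad.py or run_classifier.py: the toy `nn.Linear` model, the MSE loss, the tensor shapes, and the helper name `accumulated_grad` are all assumptions made for illustration. It also does not reproduce the truncation that tf.truncated_normal_initializer applies; it only shows the mean/std argument mix-up.

```python
import torch
from torch import nn

torch.manual_seed(0)

# 1) Weight initialization: Tensor.normal_(mean=0, std=1) takes the mean first,
#    so normal_(0.02) draws from N(0.02, 1) rather than from N(0, 0.02**2).
wrong = torch.empty(1000, 1000).normal_(0.02)
right = torch.empty(1000, 1000).normal_(mean=0.0, std=0.02)
wrong_std, right_std = wrong.std().item(), right.std().item()


# 2) Gradient accumulation: divide each micro-batch loss by the number of
#    accumulation steps so the summed gradients match one full-batch step.
def accumulated_grad(model, inputs, targets, accumulation_steps):
    """Hypothetical helper: returns the weight gradient after accumulation."""
    model.zero_grad()
    chunks = zip(inputs.chunk(accumulation_steps), targets.chunk(accumulation_steps))
    for x, y in chunks:
        loss = nn.functional.mse_loss(model(x), y)
        (loss / accumulation_steps).backward()  # gradients add up across calls
    return model.weight.grad.clone()


model = nn.Linear(4, 1)
inputs, targets = torch.randn(8, 4), torch.randn(8, 1)
grad_accumulated = accumulated_grad(model, inputs, targets, accumulation_steps=4)
grad_full_batch = accumulated_grad(model, inputs, targets, accumulation_steps=1)

# 3) Evaluation: inside no_grad() no autograd graph is recorded,
#    which saves memory and time during the forward pass.
model.eval()
with torch.no_grad():
    eval_output = model(inputs)
```

With equally sized micro-batches, `grad_accumulated` and `grad_full_batch` agree up to floating-point error, which is why averaging the loss lets the same hyper-parameters be reused with or without accumulation.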


Develop

@thomwolf

* Remove `special_tokens_mask` from inputs in README Co-authored-by: Thomas Wolf @thomwolf
* fix repetition penalty
* soft launch distilroberta
* Add Benchmarks to issue templates
* Benchmarks example script
* Benchmark section added to the documentation
* Fix hanging when loading pretrained models - Fix hanging when loading pretrained models from the cache without having internet access. This is a widespread issue on supercomputers whose internal compute nodes are firewalled.
* gradient norm clipping should be done right before calling the optimiser
* Fix citation
* gradient norm clipping should be done right before calling the optimiser - fixing run_glue and run_ner as well
* Fix huggingface#1597
* Option to benchmark only one of the two libraries
* Fix architectures count
* [CTRL] warn if generation prompt does not start with a control code see also salesforce/ctrl#50
* [RELEASE] DistilRoBERTa
* [release] fix table weirdness
* RoBERTa token classification [WIP] copy paste bert token classification for roberta
* Use roberta model and update doc strings
* Add Roberta to run_ner.py
* Add roberta to doc
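
One reading of the "gradient norm clipping should be done right before calling the optimiser" entries is that, under gradient accumulation, clipping belongs on the fully accumulated gradient rather than after every backward pass. The sketch below illustrates that ordering under stated assumptions: the toy model, the random data, and the values of `accumulation_steps` and `max_grad_norm` are hypothetical and are not taken from run_glue or run_ner.

```python
import torch
from torch import nn

torch.manual_seed(0)
model = nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
batches = [(torch.randn(2, 4), torch.randn(2, 1)) for _ in range(4)]

accumulation_steps = 2  # hypothetical value
max_grad_norm = 1.0     # hypothetical value
optimizer_steps = 0
clipped_norms = []

for step, (x, y) in enumerate(batches):
    loss = nn.functional.mse_loss(model(x), y) / accumulation_steps
    loss.backward()  # gradients accumulate in .grad across micro-batches

    if (step + 1) % accumulation_steps == 0:
        # Clip the accumulated gradient once, immediately before the update.
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
        total_norm = torch.norm(
            torch.stack([p.grad.norm() for p in model.parameters()])
        )
        clipped_norms.append(total_norm.item())
        optimizer.step()
        optimizer.zero_grad()
        optimizer_steps += 1
```

Clipping after each micro-batch would instead rescale partial gradients, so the norm of the gradient actually applied by `optimizer.step()` would no longer be bounded by `max_grad_norm`.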

* Initial commit to get BERT + run_glue.py on TPU
* Add README section for TPU and address comments.
* Cleanup TPU bits from run_glue.py (#3) TPU runner is currently implemented in: https://github.com/pytorch-tpu/transformers/blob/tpu/examples/run_glue_tpu.py. We plan to upstream this directly into `huggingface/transformers` (either `master` or `tpu`) branch once it's been more thoroughly tested.
* Cleanup TPU bits from run_glue.py TPU runner is currently implemented in: https://github.com/pytorch-tpu/transformers/blob/tpu/examples/run_glue_tpu.py. We plan to upstream this directly into `huggingface/transformers` (either `master` or `tpu`) branch once it's been more thoroughly tested.
* No need to call `xm.mark_step()` explicitly (#4) Since for gradient accumulation we're accumulating on batches from a `ParallelLoader` instance, which marks the step itself on next().
* Resolve R/W conflicts from multiprocessing (#5)
* Add XLNet in list of models for `run_glue_tpu.py` (#6)
* Add RoBERTa to list of models in TPU GLUE (#7)
* Add RoBERTa and DistilBert to list of models in TPU GLUE (#8)
* Use barriers to reduce duplicate work/resources (#9)
* Shard eval dataset and aggregate eval metrics (#10)
* Shard eval dataset and aggregate eval metrics Also, instead of calling `eval_loss.item()` every time, do the summation with tensors on device.
* Change defaultdict to float
* Reduce the pred, label tensors instead of metrics As brought up during review, some metrics like f1 cannot be aggregated via averaging. GLUE task metrics depend largely on the dataset, so instead we sync the prediction and label tensors so that the metrics can be computed accurately on those.
* Only use tb_writer from master (#11)
* Apply huggingface black code formatting
* Style
* Remove `--do_lower_case` as example uses cased
* Add option to specify tensorboard logdir This is needed for our testing framework, which checks regressions against key metrics written by the summary writer.
* Using configuration for `xla_device`
* Prefix TPU specific comments.
* num_cores clarification and namespace eval metrics
* Cache features file under `args.cache_dir` Instead of under `args.data_dir`. This is needed as our test infra uses data_dir with a read-only filesystem.
* Rename `run_glue_tpu` to `run_tpu_glue` Co-authored-by: LysandreJik <[email protected]>
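
The remark that metrics like f1 cannot be aggregated via averaging can be checked with a small, framework-free sketch. The two shards and the helper `f1_score` below are made up for illustration (binary labels, positive class 1); they are not the GLUE metric code used by the TPU runner.

```python
def f1_score(preds, labels):
    """Binary F1 for the positive class (1); returns 0.0 when undefined."""
    tp = sum(p == 1 and y == 1 for p, y in zip(preds, labels))
    fp = sum(p == 1 and y == 0 for p, y in zip(preds, labels))
    fn = sum(p == 0 and y == 1 for p, y in zip(preds, labels))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Two hypothetical evaluation shards: (predictions, labels).
shards = [
    ([1, 1, 1, 1], [1, 1, 1, 0]),  # shard F1 = 6/7
    ([1, 0, 0, 0], [0, 1, 0, 0]),  # shard F1 = 0
]

# Averaging per-shard metrics.
averaged_f1 = sum(f1_score(p, y) for p, y in shards) / len(shards)

# Gathering predictions and labels first, then computing the metric once.
all_preds = [p for preds, _ in shards for p in preds]
all_labels = [y for _, labels in shards for y in labels]
global_f1 = f1_score(all_preds, all_labels)

print(f"averaged per-shard F1: {averaged_f1:.4f}")  # 0.4286
print(f"F1 on gathered data:   {global_f1:.4f}")    # 0.6667
```

The two numbers differ because F1 is a ratio of counts, not a per-example mean, which is consistent with the commit's choice to sync the prediction and label tensors before computing the metric.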

Updating GPT2-TF2 Scripts

# This is the 1st commit message:
Update docs/source/ko/tasks/summarization.mdx
Co-authored-by: Wonhyeong Seo <[email protected]>
# This is the commit message huggingface#2:
Update docs/source/ko/tasks/summarization.mdx
Co-authored-by: Wonhyeong Seo <[email protected]>
# This is the commit message huggingface#3:
Update docs/source/ko/tasks/summarization.mdx
Co-authored-by: Wonhyeong Seo <[email protected]>
# This is the commit message huggingface#4:
Update docs/source/ko/tasks/summarization.mdx
Co-authored-by: Wonhyeong Seo <[email protected]>
# This is the commit message huggingface#5:
Update docs/source/ko/tasks/summarization.mdx
Co-authored-by: Wonhyeong Seo <[email protected]>
# This is the commit message huggingface#6:
Update docs/source/ko/tasks/summarization.mdx
Co-authored-by: Wonhyeong Seo <[email protected]>
# This is the commit message huggingface#7:
Update docs/source/ko/tasks/summarization.mdx
Co-authored-by: Wonhyeong Seo <[email protected]>
# This is the commit message huggingface#8:
Update docs/source/ko/tasks/summarization.mdx
Co-authored-by: Wonhyeong Seo <[email protected]>
# This is the commit message huggingface#9:
Update docs/source/ko/tasks/summarization.mdx
Co-authored-by: Wonhyeong Seo <[email protected]>
# This is the commit message huggingface#10:
Update docs/source/ko/tasks/summarization.mdx
Co-authored-by: Wonhyeong Seo <[email protected]>
# This is the commit message huggingface#11:
Update docs/source/ko/tasks/summarization.mdx

goog

* Cohere Model Release (#1) Cohere Model Release
* Remove unnecessary files and code (#2) Some cleanup
* Delete cohere-model directory (#3)
* Make Fix (#5)
* Pr fixes (#6)
* fixes for pr
* pr fixes for the format
* pr fixes for the format
* src/transformers/models/auto/tokenization_auto.py
* Tokenizer test (#8)
* tokenizer test
* format fix
* Adding Docs and other minor changes (#7)
* Add modeling tests (#9)
* Smol Fix (#11)
* tokenization tests are fixed
* format fixes
* fix pr doc tests
* fix pr doc tests
* fix pr doc tests
* fix pr style check
* small changes in cohere.md
* FIX: Address final comments for transformers integration (#13)
* fix modeling final nits and add proper test file
* for now leave empty tests
* add integration test
* push new test
* fix modeling cohere (#14)
* Update chat templates to use the new API (#15)
---------
Co-authored-by: ahmetustun <[email protected]>
Co-authored-by: Younes Belkada <[email protected]>
Co-authored-by: Matt <[email protected]>

thomwolf and others added 3 commits November 7, 2018 22:12

fixing pre-processing bug - averaging loss for gradient accumulation - no_grad on evaluation

6bb7510

cleaning up - speeding up a bit multi-gpu

dbc318a

Merge branch 'master' into develop

efeb6b1

thomwolf merged commit 5c0838d into master Nov 7, 2018

thomwolf deleted the develop branch November 7, 2018 22:51

qwang70 pushed a commit to DRL36/pytorch-pretrained-BERT that referenced this pull request Mar 2, 2019

Merge pull request huggingface#7 from huggingface/develop

c8fe787

Develop

maeotaku mentioned this pull request May 23, 2019

bert->onnx ->caffe2 weird error #633

Closed

HongyanJiao mentioned this pull request Sep 19, 2019

traced_model #1291

Closed

manchandasahil mentioned this pull request Mar 22, 2021

Longformer training : CUDA error: device-side assert triggered #10852

Closed

2 tasks

Guillaume-slize mentioned this pull request May 30, 2021

Encoding/decoding NLP model in tensorflow lite (fine-tuned GPT2) #11947

Closed

rraminen pushed a commit to rraminen/transformers that referenced this pull request Jun 3, 2022

Merge pull request huggingface#7 from ROCmSoftwarePlatform/gpt2-tf2

bd12e8b

Updating GPT2-TF2 Scripts

jlamypoirier pushed a commit to jlamypoirier/transformers that referenced this pull request Apr 4, 2023

Add graphs (huggingface#7)

66ed2bd

jameshennessytempus pushed a commit to jameshennessytempus/transformers that referenced this pull request Jun 1, 2023

Merge pull request huggingface#7 from huggingface/main

9aecbdf

goog

lwmlyy mentioned this pull request Aug 15, 2023

add util for ram efficient loading of model when using fsdp #25107

Merged

1 task

younesbelkada pushed a commit to younesbelkada/transformers that referenced this pull request Mar 14, 2024

Adding Docs and other minor changes (huggingface#7)

cacb8ae
