Swapped to_seq_len/from_seq_len in comment #11

nikitakit · 2018-11-09T06:13:08Z

I'm pretty sure this comment:

https://github.com/huggingface/pytorch-pretrained-BERT/blob/2c5d993ba48841575d9c58f0754bca00b288431c/modeling.py#L339-L343

should instead say:

# Sizes are [batch_size, 1, 1, to_seq_length] 
# So we can broadcast to [batch_size, num_heads, from_seq_length, to_seq_length]

When masking out tokens for attention, it doesn't matter what happens to attention from padding tokens, only that there is no attention to padding tokens.

I don't think the code actually does what the comment currently describes, since that would be a bug in the implementation.
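To illustrate the point, here is a minimal pure-Python sketch for a single head, not the repository's actual code. The function name `masked_attention_weights` and the `-10000.0` additive constant are assumptions made for this example. The mask holds one value per to (key) position and is reused for every from (query) row, which is the effect of broadcasting a `[batch_size, 1, 1, to_seq_length]` tensor.

```python
import math


def masked_attention_weights(scores, attention_mask, neg=-10000.0):
    """Softmax over the to (key) axis with an additive padding mask.

    scores: [from_seq_length][to_seq_length] raw scores for one head.
    attention_mask: [to_seq_length], 1 for real tokens, 0 for padding.
    """
    # One bias value per to (key) position, shared by every from (query) row.
    bias = [(1.0 - m) * neg for m in attention_mask]
    weights = []
    for row in scores:
        shifted = [s + b for s, b in zip(row, bias)]
        peak = max(shifted)  # subtract the max for numerical stability
        exps = [math.exp(x - peak) for x in shifted]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


# Three tokens, the last one is padding.
scores = [[0.0, 0.0, 0.0] for _ in range(3)]
attention_mask = [1, 1, 0]
weights = masked_attention_weights(scores, attention_mask)

for row in weights:
    print(row)
```

Every row, including the one where the padding token is the query, puts essentially zero weight on the padding column. The padding row still produces a valid distribution, but its output is never used, so masking along the from axis is unnecessary.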

thomwolf · 2018-11-09T08:31:25Z

Yes! fixed the comment

thomwolf closed this as completed Nov 9, 2018

maeotaku mentioned this issue May 23, 2019

bert->onnx ->caffe2 weird error #633

Closed

HongyanJiao mentioned this issue Sep 19, 2019

traced_model #1291

Closed

stevezheng23 added a commit to stevezheng23/transformers that referenced this issue Mar 24, 2020

Merge pull request huggingface#11 from stevezheng23/dev/zheng/quac

4906c01

fix issues in new quac-kd runner (cont.)

manchandasahil mentioned this issue Mar 22, 2021

Longformer training : CUDA error: device-side assert triggered #10852

Closed

2 tasks

amathews-amd referenced this issue in ROCm/transformers Aug 6, 2021

Merge pull request #11 from microsoft/jingywa/hfbert-changes

6b9500b

Bert type cast fix

yananchen1989 mentioned this issue Aug 20, 2021

Bugs when fine tuning the gpt2 #12965

Closed

rraminen pushed a commit to rraminen/transformers that referenced this issue Jun 3, 2022

Merge pull request huggingface#11 from ROCmSoftwarePlatform/IFU-maste…

dc78c95

…r-2022-05-05 IFU-master-2022-05-05

jameshennessytempus pushed a commit to jameshennessytempus/transformers that referenced this issue Jun 1, 2023

Merge pull request huggingface#11 from jamesthesnake/pop

180ae81

Pop

lwmlyy mentioned this issue Aug 15, 2023

add util for ram efficient loading of model when using fsdp #25107

Merged

1 task

ocavue pushed a commit to ocavue/transformers that referenced this issue Sep 13, 2023

Update conversion script to copy certain JSON files to destination (h…

1f14ca0

…uggingface#11)

younesbelkada pushed a commit to younesbelkada/transformers that referenced this issue Mar 14, 2024

Smol Fix (huggingface#11)

ef6ed3d

SangbumChoi added a commit to SangbumChoi/transformers that referenced this issue Aug 22, 2024

Merge pull request huggingface#11 from Superb-AI-Suite/suite-tmp

5cc2c05

Suite tmp
