From bdcbc5acb3b4da7b7f00af594a0b4743cef3f59b Mon Sep 17 00:00:00 2001 From: Timo Imhof Date: Thu, 30 Mar 2023 11:41:58 +0200 Subject: [PATCH 01/12] correct spelling error --- adapter_docs/installation.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/adapter_docs/installation.md b/adapter_docs/installation.md index 53ad93aed..da6359e85 100644 --- a/adapter_docs/installation.md +++ b/adapter_docs/installation.md @@ -6,8 +6,8 @@ It currently supports Python 3.8+ and PyTorch 1.12.1+. You will have to [install ```{eval-rst} .. important:: ``adapter-transformers`` is a direct fork of ``transformers``. - This means our package includes all the awesome features of HuggingFace's original package plus the adapter implementation. - As both packages share the same namespace, they ideally should not installed in the same environment. + This means our package includes all the awesome features of HuggingFace's original package, plus the adapter implementation. + As both packages share the same namespace, they ideally should not be installed in the same environment. ``` ## Using pip From 0c50020afab785205b921ba9116a93f4e43194a3 Mon Sep 17 00:00:00 2001 From: Timo Imhof Date: Thu, 30 Mar 2023 11:44:18 +0200 Subject: [PATCH 02/12] correct spelling errors and update code slices to run on any OS --- adapter_docs/quickstart.md | 28 +++++++++++++++++++--------- 1 file changed, 19 insertions(+), 9 deletions(-) diff --git a/adapter_docs/quickstart.md b/adapter_docs/quickstart.md index fb4010dd8..07ddb9cbb 100644 --- a/adapter_docs/quickstart.md +++ b/adapter_docs/quickstart.md @@ -24,9 +24,11 @@ We use BERT in this example, so we first load a pre-trained `BertTokenizer` to e `bert-base-uncased` checkpoint from HuggingFace's Model Hub using the [`BertAdapterModel`](transformers.adapters.BertAdapterModel) class: ```python +import os + import torch from transformers import BertTokenizer -from transformers.adapters import BertAdapterModel +from transformers.adapters import BertAdapterModel, AutoAdapterModel # Load pre-trained BERT tokenizer from Huggingface. tokenizer = BertTokenizer.from_pretrained('bert-base-uncased') @@ -44,16 +46,21 @@ model = BertAdapterModel.from_pretrained('bert-base-uncased') ``` Having loaded the model, we now add a pre-trained task adapter that is useful to our task from AdapterHub. -As we're doing sentiment classification, we use [an adapter trained on the SST-2 dataset](https://adapterhub.ml/adapters/ukp/bert-base-uncased_sentiment_sst-2_pfeiffer/) in this case. +In this case, for sentiment classification, we thus use [an adapter trained on the SST-2 dataset](https://adapterhub.ml/adapters/ukp/bert-base-uncased_sentiment_sst-2_pfeiffer/). The task prediction head loaded together with the adapter gives us a class label for our sentence: ```python # load pre-trained task adapter from Adapter Hub # this method call will also load a pre-trained classification head for the adapter task -adapter_name = model.load_adapter('sst-2@ukp', config='pfeiffer') +# adapter_name = model.load_adapter('sst-2@ukp', config='pfeiffer') +adapter_name = model.load_adapter("AdapterHub/bert-base-uncased-pf-sst2", source="hf") + # activate the adapter we just loaded, so that it is used in every forward pass model.set_active_adapters(adapter_name) +# TODO: remove! But I only found out the name of the adapter like that, shouldn't this be also on the website on how to +# use the model? how should we include how to save the adapters? 
because you need to give the correct name as argument +print(f"model_config_adapters: {model.config.adapters.adapters}") # predict output tensor outputs = model(**input_data) @@ -66,25 +73,28 @@ assert predicted == 1 To save our pre-trained model and adapters, we can easily store and reload them as follows: ```python +# for the sake of this example an example path for loading and storing is given below +example_path = os.path.join(os.getcwd(), "adapter-quickstart") + # save model -model.save_pretrained('./path/to/model/directory/') +model.save_pretrained(example_path) # save adapter -model.save_adapter('./path/to/adapter/directory/', 'sst-2') +model.save_adapter(example_path, 'glue_sst2') # load model -model = AutoAdapterModel.from_pretrained('./path/to/model/directory/') -model.load_adapter('./path/to/adapter/directory/') +model = AutoAdapterModel.from_pretrained(example_path) +model.load_adapter(example_path) ``` Similar to how the weights of the full model are saved, the `save_adapter()` will create a file for saving the adapter weights and a file for saving the adapter configuration in the specified directory. -Finally, if we have finished working with adapters, we can restore the base Transformer in its original form by deactivating and deleting the adapter: +Finally, if we have finished working with adapters, we can restore the base Transformer to its original form by deactivating and deleting the adapter: ```python # deactivate all adapters model.set_active_adapters(None) # delete the added adapter -model.delete_adapter('sst-2') +model.delete_adapter('glue_sst2') ``` ## Quick Tour: Adapter training From 36435348bfca77f6004f81be078ec71a88924d22 Mon Sep 17 00:00:00 2001 From: Timo Imhof Date: Thu, 30 Mar 2023 12:39:45 +0200 Subject: [PATCH 03/12] correct spelling errors reframe unclear passages --- adapter_docs/training.md | 26 +++++++++++++------------- 1 file changed, 13 insertions(+), 13 deletions(-) diff --git a/adapter_docs/training.md b/adapter_docs/training.md index fe75e7697..17e201da3 100644 --- a/adapter_docs/training.md +++ b/adapter_docs/training.md @@ -13,7 +13,7 @@ pip install -r ./examples/pytorch//requirements.txt ## Train a Task Adapter -Training a task adapter module on a dataset only requires minor modifications from training the full model. +Training a task adapter module on a dataset only requires minor modifications from training the whole model. Suppose we have an existing script for training a Transformer model. In the following, we will use HuggingFace's [run_glue.py](https://github.com/Adapter-Hub/adapter-transformers/blob/master/examples/pytorch/text-classification/run_glue.py) example script for training on the GLUE benchmark. We go through all required changes step by step: @@ -43,17 +43,17 @@ model = AutoAdapterModel.from_pretrained( model.add_classification_head(data_args.task_name, num_labels=num_labels) ``` -Note that this change is entirely optional and training will also work with the original model class. -Learn more about the benefits of AdapterModel classes [here](prediction_heads.md) +Note that this change is optional and training will also work with the original model class. +Learn more about the benefits of AdapterModel classes [here](prediction_heads.md). ### Step C - Setup adapter methods ```{eval-rst} .. tip:: - In the following, we show how to setup adapters manually. In most cases, you can use the built-in ``setup_adapter_training()`` method to perform this job automatically. 
Just add a statement similar to this anywhere between model instantiation and training start in your script: ``setup_adapter_training(model, adapter_args, task_name)`` + In the following, we show how to set up adapters manually. In most cases, you can use the built-in ``setup_adapter_training()`` method to perform this job automatically. Just add a statement similar to this anywhere between model instantiation and training start in your script: ``setup_adapter_training(model, adapter_args, task_name)`` ``` -Compared to fine-tuning the full model, there is only this one significant adaptation we have to make: adding an adapter setup and activating it. +Compared to fine-tuning the entire model, we have to make only one significant adaptation: adding an adapter setup and activating it. ```python # task adapter - only add if not existing @@ -69,14 +69,14 @@ model.train_adapter(task_name) ```{eval-rst} .. important:: The most crucial step when training an adapter module is to freeze all weights in the model except for those of the - adapter. In the previous snippet, this is achieved by calling the ``train_adapter()`` method which disables training + adapter. In the previous snippet, this is achieved by calling the ``train_adapter()`` method, which disables training of all weights outside the task adapter. In case you want to unfreeze all model weights later on, you can use ``freeze_model(False)``. ``` Besides this, we only have to make sure that the task adapter and prediction head are activated so that they are used in every forward pass. To specify the adapter modules to use, we can use the `model.set_active_adapters()` method and pass the adapter setup. If you only use a single adapter, you can simply pass the name of the adapter. For more information -on complex setups checkout the [Composition Blocks](https://docs.adapterhub.ml/adapter_composition.html). +on complex setups, checkout the [Composition Blocks](https://docs.adapterhub.ml/adapter_composition.html). ```python model.set_active_adapters(task_name) @@ -88,14 +88,14 @@ Finally, we switch the `Trainer` class built into Transformers for adapter-trans See [below for more information](#adaptertrainer). Technically, this change is not required as no changes to the training loop are required for training adapters. -However, `AdapterTrainer` e.g. provides better support for checkpointing and reloading adapter weights. +However, `AdapterTrainer` e.g., provides better support for checkpointing and reloading adapter weights. ### Step E - Start training The rest of the training procedure does not require any further changes in code. You can find the full version of the modified training script for GLUE at [run_glue.py](https://github.com/Adapter-Hub/adapter-transformers/blob/master/examples/pytorch/text-classification/run_glue.py) in the `examples` folder of our repository. -We also adapted [various other example scripts](https://github.com/Adapter-Hub/adapter-transformers/tree/master/examples/pytorch) (e.g. `run_glue.py`, `run_multiple_choice.py`, `run_squad.py`, ...) to support adapter training. +We also adapted [various other example scripts](https://github.com/Adapter-Hub/adapter-transformers/tree/master/examples/pytorch) (e.g., `run_glue.py`, `run_multiple_choice.py`, `run_squad.py`, ...) to support adapter training. 
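Putting steps B–D together, the pieces fit into a plain training script roughly as follows. This is only a minimal sketch: the checkpoint, the tiny in-memory dataset and the hyperparameters below are illustrative placeholders, and the provided example scripts already handle argument parsing, data loading and evaluation for you.

```python
from datasets import Dataset
from transformers import AutoTokenizer, TrainingArguments
from transformers.adapters import AdapterTrainer, AutoAdapterModel

task_name = "sst2"  # illustrative task/adapter name

# Tiny in-memory dataset so the sketch is self-contained;
# in run_glue.py this is replaced by the actual GLUE data loading.
raw_dataset = Dataset.from_dict({
    "sentence": ["It's also, clearly, great fun.", "Unbearably dull."],
    "label": [1, 0],
})
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
train_dataset = raw_dataset.map(
    lambda sample: tokenizer(sample["sentence"], padding="max_length", truncation=True, max_length=64)
)

# Steps B & C: model with a new adapter and a matching head, base weights frozen
model = AutoAdapterModel.from_pretrained("bert-base-uncased")
model.add_adapter(task_name, config="pfeiffer")
model.add_classification_head(task_name, num_labels=2)
model.train_adapter(task_name)          # freeze everything except the adapter
model.set_active_adapters(task_name)    # use the adapter (and head) in every forward pass

# Step D: AdapterTrainer instead of Trainer
training_args = TrainingArguments(output_dir="./adapter-example", learning_rate=1e-4, num_train_epochs=1)
trainer = AdapterTrainer(model=model, args=training_args, train_dataset=train_dataset, tokenizer=tokenizer)
trainer.train()

model.save_adapter("./adapter-example", task_name)
```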
To start adapter training on a GLUE task, you can run something similar to: @@ -117,7 +117,7 @@ python run_glue.py \ --adapter_config pfeiffer ``` -The important flag here is `--train_adapter` which switches from fine-tuning the full model to training an adapter module for the given GLUE task. +The important flag here is `--train_adapter`, which switches from fine-tuning the entire model to training an adapter module for the given GLUE task. ```{eval-rst} .. tip:: @@ -126,7 +126,7 @@ The important flag here is `--train_adapter` which switches from fine-tuning the ```{eval-rst} .. tip:: - Depending on your data set size you might also need to train longer than usual. To avoid overfitting you can evaluating the adapters after each epoch on the development set and only save the best model. + Depending on your data set size, you might also need to train longer than usual. To avoid overfitting, you can evaluate the adapters after each epoch on the development set and only save the best model. ``` ## Train a Language Adapter @@ -160,12 +160,12 @@ You can adapt this script to train AdapterFusion with different pre-trained adap ```{eval-rst} .. important:: - AdapterFusion on a target task is trained in a second training stage, after independently training adapters on individual tasks. + AdapterFusion on a target task is trained in a second training stage after independently training adapters on individual tasks. When setting up a fusion architecture on your model, make sure to load the pre-trained adapter modules to be fused using ``model.load_adapter()`` before adding a fusion layer. For more on AdapterFusion, also refer to `Pfeiffer et al., 2020 `_. ``` -To start fusion training on SST-2 as target task, you can run something like the following: +To start fusion training on SST-2 as the target task, you can run something like the following: ``` export GLUE_DIR=/path/to/glue From fd1a74b389e32ab5f89aa09af0fff83eac1dee5b Mon Sep 17 00:00:00 2001 From: Timo Imhof Date: Thu, 30 Mar 2023 12:41:22 +0200 Subject: [PATCH 04/12] correct spelling error --- adapter_docs/training.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/adapter_docs/training.md b/adapter_docs/training.md index 17e201da3..165315243 100644 --- a/adapter_docs/training.md +++ b/adapter_docs/training.md @@ -190,7 +190,7 @@ python run_fusion_glue.py \ Similar to the `Trainer` class provided by HuggingFace, adapter-transformers provides an `AdapterTrainer` class. This class is only intended for training adapters. The `Trainer` class should still be used to fully fine-tune models. 
To train adapters with the `AdapterTrainer` -class, simply initialize it the same way you would initialize the `Trainer` class e.g.: +class, simply initialize it the same way you would initialize the `Trainer` class,e.g.: ```python model.add_adapter(task_name) From 790030362d5b31979567ab804a969fd657c9fa60 Mon Sep 17 00:00:00 2001 From: Timo Imhof Date: Thu, 30 Mar 2023 17:27:46 +0200 Subject: [PATCH 05/12] finalize demo code --- adapter_docs/quickstart.md | 33 +++++++++++++++------------------ 1 file changed, 15 insertions(+), 18 deletions(-) diff --git a/adapter_docs/quickstart.md b/adapter_docs/quickstart.md index 07ddb9cbb..2e51bb49e 100644 --- a/adapter_docs/quickstart.md +++ b/adapter_docs/quickstart.md @@ -27,22 +27,22 @@ We use BERT in this example, so we first load a pre-trained `BertTokenizer` to e import os import torch -from transformers import BertTokenizer -from transformers.adapters import BertAdapterModel, AutoAdapterModel +from transformers import AutoTokenizer # TODO: discuss: I find it more convenient to use the Auto class +from transformers.adapters import AutoAdapterModel -# Load pre-trained BERT tokenizer from Huggingface. -tokenizer = BertTokenizer.from_pretrained('bert-base-uncased') +# Load pre-trained BERT tokenizer from Huggingface +tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased') # An input sentence. sentence = "It's also, clearly, great fun." -# Tokenize the input sentence and create a PyTorch input tensor. +# Tokenize the input sentence and create a PyTorch input tensor input_data = tokenizer(sentence, return_tensors="pt") -# Load pre-trained BERT model from HuggingFace Hub. -# The `BertAdapterModel` class is specifically designed for working with adapters. -# It can be used with different prediction heads. -model = BertAdapterModel.from_pretrained('bert-base-uncased') +# Load pre-trained BERT model from HuggingFace Hub +# The `BertAdapterModel` class is specifically designed for working with adapters +# It can be used with different prediction heads +model = AutoAdapterModel.from_pretrained('bert-base-uncased') ``` Having loaded the model, we now add a pre-trained task adapter that is useful to our task from AdapterHub. @@ -52,15 +52,12 @@ The task prediction head loaded together with the adapter gives us a class label ```python # load pre-trained task adapter from Adapter Hub # this method call will also load a pre-trained classification head for the adapter task -# adapter_name = model.load_adapter('sst-2@ukp', config='pfeiffer') -adapter_name = model.load_adapter("AdapterHub/bert-base-uncased-pf-sst2", source="hf") - +# TODO: discuss: When looking for this adapter on the webiste the name is "sentiment/..." I think we should keep names +# consistent because for new people (at least for me) more possibilities for loading the same thing is confusing +adapter_name = model.load_adapter("sentiment/sst-2@ukp", config='pfeiffer') # activate the adapter we just loaded, so that it is used in every forward pass model.set_active_adapters(adapter_name) -# TODO: remove! But I only found out the name of the adapter like that, shouldn't this be also on the website on how to -# use the model? how should we include how to save the adapters? 
because you need to give the correct name as argument -print(f"model_config_adapters: {model.config.adapters.adapters}") # predict output tensor outputs = model(**input_data) @@ -73,13 +70,13 @@ assert predicted == 1 To save our pre-trained model and adapters, we can easily store and reload them as follows: ```python -# for the sake of this example an example path for loading and storing is given below +# for the sake of this demonstration an example path for loading and storing is given below example_path = os.path.join(os.getcwd(), "adapter-quickstart") # save model model.save_pretrained(example_path) # save adapter -model.save_adapter(example_path, 'glue_sst2') +model.save_adapter(example_path, adapter_name) # TODO: discuss: nobody knows where the 'sst-2' comes from # load model model = AutoAdapterModel.from_pretrained(example_path) @@ -94,7 +91,7 @@ Finally, if we have finished working with adapters, we can restore the base Tran # deactivate all adapters model.set_active_adapters(None) # delete the added adapter -model.delete_adapter('glue_sst2') +model.delete_adapter(adapter_name) ``` ## Quick Tour: Adapter training From 317005016b848d6dc128af1e73414a8185f008cd Mon Sep 17 00:00:00 2001 From: Timo Imhof Date: Thu, 30 Mar 2023 17:29:19 +0200 Subject: [PATCH 06/12] modify phrasing, make use of adapter-transformers consistent --- adapter_docs/training.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/adapter_docs/training.md b/adapter_docs/training.md index 165315243..a67b685f5 100644 --- a/adapter_docs/training.md +++ b/adapter_docs/training.md @@ -13,14 +13,14 @@ pip install -r ./examples/pytorch//requirements.txt ## Train a Task Adapter -Training a task adapter module on a dataset only requires minor modifications from training the whole model. +Training a task adapter module on a dataset only requires minor modifications from training a whole model. Suppose we have an existing script for training a Transformer model. In the following, we will use HuggingFace's [run_glue.py](https://github.com/Adapter-Hub/adapter-transformers/blob/master/examples/pytorch/text-classification/run_glue.py) example script for training on the GLUE benchmark. We go through all required changes step by step: ### Step A - Parse `AdapterArguments` -The [`AdapterArguments`](transformers.adapters.training.AdapterArguments) class integrated into adapter-transformers provides a set of command-line options useful for training adapters. +The [`AdapterArguments`](transformers.adapters.training.AdapterArguments) class integrated into `adapter-transformers` provides a set of command-line options useful for training adapters. These include options such as `--train_adapter` for activating adapter training and `--load_adapter` for loading adapters from checkpoints. Thus, the first step of integrating adapters is to add these arguments to the line where `HfArgumentParser` is instantiated: @@ -84,7 +84,7 @@ model.set_active_adapters(task_name) ### Step D - Switch to `AdapterTrainer` class -Finally, we switch the `Trainer` class built into Transformers for adapter-transformers' [`AdapterTrainer`](transformers.adapters.AdapterTrainer) class that is optimized for training adapter methods. +Finally, we exchange the `Trainer` class built into Transformers for `adapter-transformers`' [`AdapterTrainer`](transformers.adapters.AdapterTrainer) class that is optimized for training adapter methods. See [below for more information](#adaptertrainer). 
Technically, this change is not required as no changes to the training loop are required for training adapters.
From 28dfae67e0cdef49739527769d8778e767f77bd3 Mon Sep 17 00:00:00 2001 From: TimoImhof Date: Fri, 31 Mar 2023 09:49:25 +0200 Subject: [PATCH 08/12] integrate Bert specific classes again to demonstrate later that the Auto class can be used instead --- adapter_docs/quickstart.md | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/adapter_docs/quickstart.md b/adapter_docs/quickstart.md index 2e51bb49e..b29074fe8 100644 --- a/adapter_docs/quickstart.md +++ b/adapter_docs/quickstart.md @@ -27,11 +27,11 @@ We use BERT in this example, so we first load a pre-trained `BertTokenizer` to e import os import torch -from transformers import AutoTokenizer # TODO: discuss: I find it more convenient to use the Auto class -from transformers.adapters import AutoAdapterModel +from transformers import BertTokenizer +from transformers.adapters import BertAdapterModel, AutoAdapterModel # Load pre-trained BERT tokenizer from Huggingface -tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased') +tokenizer = BertTokenizer.from_pretrained('bert-base-uncased') # An input sentence. sentence = "It's also, clearly, great fun." @@ -42,7 +42,7 @@ input_data = tokenizer(sentence, return_tensors="pt") # Load pre-trained BERT model from HuggingFace Hub # The `BertAdapterModel` class is specifically designed for working with adapters # It can be used with different prediction heads -model = AutoAdapterModel.from_pretrained('bert-base-uncased') +model = BertAdapterModel.from_pretrained('bert-base-uncased') ``` Having loaded the model, we now add a pre-trained task adapter that is useful to our task from AdapterHub. @@ -79,6 +79,7 @@ model.save_pretrained(example_path) model.save_adapter(example_path, adapter_name) # TODO: discuss: nobody knows where the 'sst-2' comes from # load model +# similar to HuggingFace's AutoModel class, you can also use AutoAdapterModel instead of BertAdapterModel model = AutoAdapterModel.from_pretrained(example_path) model.load_adapter(example_path) ``` From d191d7b4bf29e84074a2c9cf79789c5ce4e98795 Mon Sep 17 00:00:00 2001 From: TimoImhof Date: Fri, 31 Mar 2023 10:12:48 +0200 Subject: [PATCH 09/12] integrate Bert specific classes again to demonstrate later that the Auto class can be used instead, make comments consistent, make HF name consistent --- adapter_docs/quickstart.md | 30 +++++++++++++++--------------- 1 file changed, 15 insertions(+), 15 deletions(-) diff --git a/adapter_docs/quickstart.md b/adapter_docs/quickstart.md index b29074fe8..9845ecba6 100644 --- a/adapter_docs/quickstart.md +++ b/adapter_docs/quickstart.md @@ -10,7 +10,7 @@ storing (`save_adapter()`) and deletion (`delete_adapter()`) are added to the mo .. note:: This document focuses on the adapter-related functionalities added by *adapter-transformers*. For a more general overview of the *transformers* library, visit - `the 'Usage' section in Huggingface's documentation `_. + `the 'Usage' section in HuggingFace's documentation `_. ``` ## Quick Tour: Using a pre-trained adapter for inference @@ -30,10 +30,10 @@ import torch from transformers import BertTokenizer from transformers.adapters import BertAdapterModel, AutoAdapterModel -# Load pre-trained BERT tokenizer from Huggingface +# Load pre-trained BERT tokenizer from HuggingFace tokenizer = BertTokenizer.from_pretrained('bert-base-uncased') -# An input sentence. +# An input sentence sentence = "It's also, clearly, great fun." 
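# (The tokenizer call below returns a dict-like BatchEncoding with `input_ids`,
#  `token_type_ids` and `attention_mask` tensors; it also accepts a list of
#  sentences together with `padding=True` if you want to classify a batch at once.)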
# Tokenize the input sentence and create a PyTorch input tensor @@ -50,19 +50,19 @@ In this case, for sentiment classification, we thus use [an adapter trained on t The task prediction head loaded together with the adapter gives us a class label for our sentence: ```python -# load pre-trained task adapter from Adapter Hub -# this method call will also load a pre-trained classification head for the adapter task +# Load pre-trained task adapter from Adapter Hub +# This method call will also load a pre-trained classification head for the adapter task # TODO: discuss: When looking for this adapter on the webiste the name is "sentiment/..." I think we should keep names # consistent because for new people (at least for me) more possibilities for loading the same thing is confusing adapter_name = model.load_adapter("sentiment/sst-2@ukp", config='pfeiffer') -# activate the adapter we just loaded, so that it is used in every forward pass +# Activate the adapter we just loaded, so that it is used in every forward pass model.set_active_adapters(adapter_name) -# predict output tensor +# Predict output tensor outputs = model(**input_data) -# retrieve the predicted class label +# Retrieve the predicted class label predicted = torch.argmax(outputs[0]).item() assert predicted == 1 ``` @@ -70,16 +70,16 @@ assert predicted == 1 To save our pre-trained model and adapters, we can easily store and reload them as follows: ```python -# for the sake of this demonstration an example path for loading and storing is given below +# For the sake of this demonstration an example path for loading and storing is given below example_path = os.path.join(os.getcwd(), "adapter-quickstart") -# save model +# Save model model.save_pretrained(example_path) -# save adapter +# Save adapter model.save_adapter(example_path, adapter_name) # TODO: discuss: nobody knows where the 'sst-2' comes from -# load model -# similar to HuggingFace's AutoModel class, you can also use AutoAdapterModel instead of BertAdapterModel +# Load model, similar to HuggingFace's AutoModel class, +# you can also use AutoAdapterModel instead of BertAdapterModel model = AutoAdapterModel.from_pretrained(example_path) model.load_adapter(example_path) ``` @@ -89,9 +89,9 @@ Similar to how the weights of the full model are saved, the `save_adapter()` wil Finally, if we have finished working with adapters, we can restore the base Transformer to its original form by deactivating and deleting the adapter: ```python -# deactivate all adapters +# Deactivate all adapters model.set_active_adapters(None) -# delete the added adapter +# Delete the added adapter model.delete_adapter(adapter_name) ``` From 04ec37c706a9e60748388f9b6c6066c172a6f12a Mon Sep 17 00:00:00 2001 From: TimoImhof Date: Fri, 31 Mar 2023 10:21:28 +0200 Subject: [PATCH 10/12] remove TODOs --- adapter_docs/quickstart.md | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/adapter_docs/quickstart.md b/adapter_docs/quickstart.md index 9845ecba6..17f7f0b5c 100644 --- a/adapter_docs/quickstart.md +++ b/adapter_docs/quickstart.md @@ -52,8 +52,6 @@ The task prediction head loaded together with the adapter gives us a class label ```python # Load pre-trained task adapter from Adapter Hub # This method call will also load a pre-trained classification head for the adapter task -# TODO: discuss: When looking for this adapter on the webiste the name is "sentiment/..." 
I think we should keep names -# consistent because for new people (at least for me) more possibilities for loading the same thing is confusing adapter_name = model.load_adapter("sentiment/sst-2@ukp", config='pfeiffer') # Activate the adapter we just loaded, so that it is used in every forward pass @@ -76,7 +74,7 @@ example_path = os.path.join(os.getcwd(), "adapter-quickstart") # Save model model.save_pretrained(example_path) # Save adapter -model.save_adapter(example_path, adapter_name) # TODO: discuss: nobody knows where the 'sst-2' comes from +model.save_adapter(example_path, adapter_name) # Load model, similar to HuggingFace's AutoModel class, # you can also use AutoAdapterModel instead of BertAdapterModel From ca355ae787d94fbd7586f2ef7bd8cb154c16909d Mon Sep 17 00:00:00 2001 From: hSterz Date: Fri, 31 Mar 2023 15:57:23 +0200 Subject: [PATCH 11/12] Update adapter_docs/training.md Co-authored-by: calpt --- adapter_docs/training.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/adapter_docs/training.md b/adapter_docs/training.md index 9e13c86a8..a858e8d26 100644 --- a/adapter_docs/training.md +++ b/adapter_docs/training.md @@ -190,7 +190,7 @@ python run_fusion_glue.py \ Similar to the `Trainer` class provided by HuggingFace, `adapter-transformers` provides an `AdapterTrainer` class. This class is only intended for training adapters. The `Trainer` class should still be used to fully fine-tune models. To train adapters with the `AdapterTrainer` -class, simply initialize it the same way you would initialize the `Trainer` class,e.g.: +class, simply initialize it the same way you would initialize the `Trainer` class, e.g.: ```python model.add_adapter(task_name) From 10c6d4b7303dae0ab65d0016be32bf2da4633fa6 Mon Sep 17 00:00:00 2001 From: hSterz Date: Fri, 31 Mar 2023 15:57:30 +0200 Subject: [PATCH 12/12] Update adapter_docs/training.md MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-authored-by: Leon Engländer <77012866+lenglaender@users.noreply.github.com> --- adapter_docs/training.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/adapter_docs/training.md b/adapter_docs/training.md index a858e8d26..ce4c9aaa7 100644 --- a/adapter_docs/training.md +++ b/adapter_docs/training.md @@ -13,7 +13,7 @@ pip install -r ./examples/pytorch//requirements.txt ## Train a Task Adapter -Training a task adapter module on a dataset only requires minor modifications from training a whole model. +Training a task adapter module on a dataset only requires minor modifications compared to training the entire model. Suppose we have an existing script for training a Transformer model. In the following, we will use HuggingFace's [run_glue.py](https://github.com/Adapter-Hub/adapter-transformers/blob/master/examples/pytorch/text-classification/run_glue.py) example script for training on the GLUE benchmark. We go through all required changes step by step: