Getting Started
===============

Base Setup
----------

.. code-block:: bash

   salloc -w "cn-d[003-004]" --ntasks=1 --gpus-per-task=a100l:8 --exclusive --nodes=1 --cpus-per-task=128 --time=120:00:00 --ntasks-per-node=1 --mem=0

   cd /tmp/
   mkdir milabench
   cd milabench

   git clone https://github.com/mila-iqia/milabench.git

   conda activate base
   python --version
   # Python 3.11.4

   virtualenv ./env
   source ./env/bin/activate

   pip install -e milabench/

   export MILABENCH_WORDIR="$(pwd)"
   export MILABENCH_BASE="$MILABENCH_WORDIR/results"
   export MILABENCH_CONFIG="$MILABENCH_WORDIR/milabench/config/standard.yaml"
   export BENCHMARK_VENV="$MILABENCH_WORDIR/results/venv/torch"

   module load cuda/12.3.2  # <= or set CUDA_HOME to the right spot

   milabench install
   milabench prepare
   milabench run

The current setup runs on 8x A100 SXM4 80GB.
Note that some benchmarks require more than 40GB of VRAM.
One benchmark, rwkv, can be problematic: it requires nvcc, but it can safely be ignored.
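The ``module load`` line above is cluster-specific. If no such module is available, pointing ``CUDA_HOME`` at an existing CUDA toolkit is a rough equivalent; the path below is only an example and depends on where the toolkit lives on your machine.

.. code-block:: bash

   # Hypothetical toolkit location: adjust to your installation.
   export CUDA_HOME=/usr/local/cuda-12.3
   export PATH="$CUDA_HOME/bin:$PATH"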

Recipes
-------

One Env
^^^^^^^

If you are using a container with dependencies such as pytorch already installed,
you can force milabench to use a single environment for everything.

.. code-block:: bash

   milabench --use-current-env install
   milabench --use-current-env prepare
   milabench --use-current-env run --select bert-fp32

Batch Resizer
^^^^^^^^^^^^^

If the GPU you are using has less VRAM, automatic batch resizing can be enabled with the command below.
Note that this will not affect benchmarks that already use a batch size of one, such as opt-6_7b and possibly opt-1_3b.

.. code-block:: bash

   MILABENCH_SIZER_AUTO=True milabench run

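The target capacity can also be set explicitly with ``MILABENCH_SIZER_CAPACITY``, as used in the 40GB example report further down; a minimal sketch:

.. code-block:: bash

   # Resize batches automatically, targeting roughly 40GB of VRAM per GPU.
   MILABENCH_SIZER_AUTO=True MILABENCH_SIZER_CAPACITY=40000Mo milabench run
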
Device Select
^^^^^^^^^^^^^

To run on a subset of GPUs, reduce the number of visible devices. Note that by default
milabench tries to use all the GPUs all the time, which can make a run take a bit longer;
limiting the visible devices to 2 can make experimentation faster.

.. code-block:: bash

   CUDA_VISIBLE_DEVICES=0,1,2,3 milabench run

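Device selection combines with benchmark selection; a sketch for quick iteration on a single GPU, reusing the ``--select`` flag from the One Env recipe above:

.. code-block:: bash

   # Quick single-GPU iteration on one benchmark.
   CUDA_VISIBLE_DEVICES=0 milabench run --select resnet50
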
Update Package
^^^^^^^^^^^^^^

To update pytorch so that it uses a newer version of cuda (milabench creates a separate environment for the benchmarks):

.. code-block:: bash

   # can be executed after `milabench install` at the earliest
   source $BENCHMARK_VENV/bin/activate
   pip install -U torch torchvision torchaudio

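To confirm the update took effect, a small sanity check using standard pytorch attributes (nothing milabench-specific):

.. code-block:: bash

   # Print the torch version and the CUDA version it was built against.
   python -c "import torch; print(torch.__version__, torch.version.cuda)"
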
Arguments
^^^^^^^^^

If environment variables are troublesome, the values can also be passed as arguments.

.. code-block:: bash

   milabench install --base $MILABENCH_BASE --config $MILABENCH_CONFIG
   milabench prepare --base $MILABENCH_BASE --config $MILABENCH_CONFIG
   milabench run --base $MILABENCH_BASE --config $MILABENCH_CONFIG

To help us troubleshoot future issues, you can forward your results directory.
It holds all the benchmark-specific logs and metrics gathered by milabench.

.. code-block:: bash

   zip -r results.zip results

Example Reports
---------------

8x A100 SXM4 80GB
^^^^^^^^^^^^^^^^^

.. code-block:: bash

   milabench run

   =================
   Benchmark results
   =================
   bench | fail | n | perf | sem% | std% | peak_memory | score | weight
   bert-fp16 | 0 | 8 | 154.92 | 0.3% | 4.5% | 28500 | 1240.06 | 0.00
   bert-fp32 | 0 | 8 | 29.55 | 0.0% | 0.5% | 35464 | 236.54 | 0.00
   bert-tf32 | 0 | 8 | 120.02 | 0.3% | 4.9% | 35466 | 960.04 | 0.00
   bert-tf32-fp16 | 0 | 8 | 154.87 | 0.3% | 4.5% | 28500 | 1239.70 | 3.00
   bf16 | 0 | 8 | 293.43 | 0.3% | 7.2% | 5688 | 2363.29 | 0.00
   convnext_large-fp16 | 0 | 8 | 247.31 | 2.4% | 37.6% | 31362 | 1986.27 | 0.00
   convnext_large-fp32 | 0 | 8 | 45.58 | 0.7% | 11.5% | 53482 | 360.90 | 0.00 ** High memory **
   convnext_large-tf32 | 0 | 8 | 117.54 | 1.2% | 18.8% | 53482 | 940.03 | 0.00 ** High memory **
   convnext_large-tf32-fp16 | 0 | 8 | 214.41 | 2.9% | 46.4% | 31362 | 1713.47 | 3.00
   davit_large | 0 | 8 | 308.33 | 0.3% | 7.3% | 37900 | 2475.47 | 1.00
   davit_large-multi | 0 | 1 | 2242.69 | 2.0% | 15.2% | 45610 | 2242.69 | 5.00 ** High memory **
   dlrm | 0 | 1 | 398088.30 | 2.5% | 19.3% | 7030 | 398088.30 | 1.00
   focalnet | 0 | 8 | 391.21 | 0.3% | 6.8% | 29808 | 3143.46 | 2.00
   fp16 | 0 | 8 | 289.62 | 0.2% | 4.8% | 5688 | 2327.60 | 0.00
   fp32 | 0 | 8 | 19.13 | 0.0% | 1.3% | 6066 | 153.20 | 0.00
   llama | 0 | 8 | 496.84 | 4.4% | 79.2% | 32326 | 3778.17 | 1.00
   opt-1_3b | 0 | 1 | 28.23 | 0.1% | 0.4% | 45976 | 28.23 | 5.00 ** High memory **
   opt-6_7b | 0 | 1 | 14.22 | 0.0% | 0.1% | 57196 | 14.22 | 5.00 ** High memory **
   reformer | 0 | 8 | 61.45 | 0.0% | 1.0% | 29304 | 492.17 | 1.00
   regnet_y_128gf | 0 | 8 | 82.25 | 0.3% | 6.8% | 35454 | 658.46 | 2.00
   resnet152 | 0 | 8 | 669.61 | 0.4% | 9.6% | 37878 | 5378.33 | 1.00
   resnet152-multi | 0 | 1 | 5279.39 | 1.2% | 9.2% | 42532 | 5279.39 | 5.00 ** High memory **
   resnet50 | 0 | 8 | 456.63 | 3.0% | 66.1% | 8630 | 3620.48 | 1.00
   rwkv | 8 | 8 | nan | nan% | nan% | 5458 | nan | 1.00
   stargan | 0 | 8 | 34.07 | 2.1% | 45.4% | 41326 | 271.44 | 1.00
   super-slomo | 0 | 8 | 35.55 | 1.4% | 30.7% | 37700 | 285.19 | 1.00
   t5 | 0 | 8 | 47.77 | 0.2% | 4.0% | 39344 | 382.20 | 2.00
   tf32 | 0 | 8 | 147.05 | 0.2% | 4.9% | 6066 | 1181.93 | 0.00
   whisper | 0 | 8 | 145.26 | 2.2% | 48.3% | 40624 | 1160.69 | 1.00

   Scores
   ------
   Failure rate: 4.06% (FAIL)
   Score: 567.57

   Errors
   ------
   8 errors, details in HTML report

4x A100 SXM4 80GB
^^^^^^^^^^^^^^^^^

.. code-block:: bash

   CUDA_VISIBLE_DEVICES=0,1,2,3 milabench run

   =================
   Benchmark results
   =================
   bench | fail | n | perf | sem% | std% | peak_memory | score | weight
   bert-fp16 | 0 | 4 | 154.86 | 0.4% | 4.5% | 28500 | 619.75 | 0.00
   bert-fp32 | 0 | 4 | 29.58 | 0.0% | 0.5% | 35464 | 118.38 | 0.00
   bert-tf32 | 0 | 4 | 119.99 | 0.4% | 4.4% | 35466 | 480.05 | 0.00
   bert-tf32-fp16 | 0 | 4 | 155.04 | 0.4% | 4.6% | 28500 | 620.50 | 3.00
   bf16 | 0 | 4 | 293.40 | 0.3% | 6.6% | 5688 | 1180.12 | 0.00
   convnext_large-fp16 | 0 | 4 | 265.18 | 2.8% | 30.6% | 31362 | 1065.59 | 0.00
   convnext_large-fp32 | 0 | 4 | 46.35 | 1.3% | 14.2% | 53482 | 182.25 | 0.00 ** High memory **
   convnext_large-tf32 | 0 | 4 | 122.58 | 1.4% | 15.9% | 53482 | 490.00 | 0.00 ** High memory **
   convnext_large-tf32-fp16 | 0 | 4 | 295.47 | 2.1% | 22.8% | 31362 | 1191.62 | 3.00
   davit_large | 0 | 4 | 310.47 | 0.4% | 6.5% | 38144 | 1247.04 | 1.00
   davit_large-multi | 0 | 1 | 1183.76 | 1.1% | 8.2% | 45336 | 1183.76 | 5.00 ** High memory **
   dlrm | 0 | 1 | 430871.61 | 2.6% | 20.2% | 7758 | 430871.61 | 1.00
   focalnet | 0 | 4 | 391.96 | 0.4% | 6.4% | 29812 | 1575.26 | 2.00
   fp16 | 0 | 4 | 289.99 | 0.2% | 4.1% | 5688 | 1164.13 | 0.00
   fp32 | 0 | 4 | 19.13 | 0.0% | 0.9% | 6066 | 76.58 | 0.00
   llama | 0 | 4 | 492.72 | 6.2% | 78.3% | 32326 | 1884.58 | 1.00
   opt-1_3b | 0 | 1 | 14.45 | 0.0% | 0.2% | 46016 | 14.45 | 5.00 ** High memory **
   opt-6_7b | 0 | 1 | 5.96 | 0.0% | 0.1% | 75444 | 5.96 | 5.00 ** High memory **
   reformer | 0 | 4 | 61.39 | 0.1% | 1.0% | 29304 | 245.83 | 1.00
   regnet_y_128gf | 0 | 4 | 82.67 | 0.3% | 5.1% | 35454 | 330.98 | 2.00
   resnet152 | 0 | 4 | 672.09 | 0.4% | 6.9% | 39330 | 2694.83 | 1.00
   resnet152-multi | 0 | 1 | 2470.38 | 1.5% | 11.2% | 47288 | 2470.38 | 5.00 ** High memory **
   resnet50 | 0 | 4 | 454.49 | 3.2% | 50.5% | 8630 | 1800.61 | 1.00
   rwkv | 4 | 4 | nan | nan% | nan% | 5458 | nan | 1.00
   stargan | 0 | 4 | 42.30 | 1.9% | 29.9% | 53412 | 169.73 | 1.00 ** High memory **
   super-slomo | 0 | 4 | 40.67 | 0.8% | 13.1% | 37700 | 163.08 | 1.00
   t5 | 0 | 4 | 47.74 | 0.3% | 3.9% | 39344 | 190.95 | 2.00
   tf32 | 0 | 4 | 146.72 | 0.2% | 4.0% | 6066 | 588.99 | 0.00
   whisper | 0 | 4 | 207.47 | 1.0% | 15.4% | 40624 | 832.75 | 1.00

   Scores
   ------
   Failure rate: 3.96% (FAIL)
   Score: 300.23

4x A100 SXM4 80GB limited to 40GB of VRAM
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^


.. code-block:: bash

   CUDA_VISIBLE_DEVICES=0,1,2,3 MILABENCH_SIZER_AUTO=True MILABENCH_SIZER_CAPACITY=40000Mo milabench run

   =================
   Benchmark results
   =================
   fail n perf sem% std% peak_memory score weight
   bert-fp16 0 4 147.52 0.2% 1.9% 41938 588.500016 0.00
   bert-fp32 0 4 29.08 0.9% 10.3% 42138 116.083048 0.00
   bert-tf32 0 4 117.82 0.1% 1.0% 42140 470.743584 0.00
   bert-tf32-fp16 0 4 147.67 0.2% 2.4% 41938 588.804052 3.00
   bf16 0 4 293.92 0.3% 6.0% 5688 1181.627938 0.00
   convnext_large-fp16 0 4 269.92 2.9% 32.5% 42628 1085.129084 0.00
   convnext_large-fp32 0 4 50.31 0.7% 7.8% 42136 199.292499 0.00
   convnext_large-tf32 0 4 136.86 0.5% 5.0% 42138 549.100135 0.00
   convnext_large-tf32-fp16 0 4 266.48 3.1% 33.8% 42628 1071.146282 3.00
   davit_large 0 4 300.29 0.5% 7.7% 41728 1203.538777 1.00
   davit_large-multi 0 1 1171.04 1.2% 9.3% 50030 1171.042025 5.00
   dlrm 0 1 454625.69 2.1% 16.4% 7758 454625.687871 1.00
   focalnet 0 4 391.81 0.3% 5.1% 41802 1569.986673 2.00
   fp16 0 4 289.96 0.2% 3.9% 5688 1163.810339 0.00
   fp32 0 4 19.14 0.0% 0.8% 6066 76.603551 0.00
   llama 0 4 493.43 6.1% 78.2% 32326 1888.979344 1.00
   opt-1_3b 0 1 14.52 0.1% 0.3% 45930 14.518303 5.00
   opt-6_7b 0 1 5.96 0.0% 0.1% 75444 5.955118 5.00 ** High memory **
   reformer 0 4 46.27 0.0% 0.3% 41986 185.104527 1.00
   regnet_y_128gf 0 4 105.08 0.7% 10.8% 42318 421.706539 2.00
   resnet152 0 4 674.90 0.5% 7.3% 43688 2706.277411 1.00
   resnet152-multi 0 1 2350.25 2.2% 16.9% 52338 2350.245540 5.00
   resnet50 0 4 420.09 5.8% 91.1% 42262 1653.944065 1.00
   rwkv 4 4 NaN NaN NaN 5458 NaN 1.00
   stargan 0 4 36.75 1.3% 20.5% 32310 147.651415 1.00
   super-slomo 0 4 41.87 0.8% 12.0% 41986 167.928514 1.00
   t5 0 4 49.55 0.3% 4.5% 41444 198.383370 2.00
   tf32 0 4 146.74 0.2% 3.8% 6066 588.944520 0.00
   whisper 0 4 209.19 0.7% 10.5% 42242 838.753126 1.00

   Scores
   ------
   Failure rate: 4.00% (FAIL)
   Score: 444.18

   Errors
   ------
   4 errors, details in HTML report.

Issues
------

.. code-block:: text

   > Traceback (most recent call last):
   >   File "/gpfs/home3/pmorillas/mila/milabench/milabench/utils.py", line 69, in wrapped
   >     return fn(*args, **kwargs)
   >   File "/gpfs/home3/pmorillas/mila/milabench/milabench/summary.py", line 50, in aggregate
   >     assert config and start and end
   > AssertionError
   > Source: mila_installation/runs/

This indicates that the configuration might be missing or invalid.
It can happen when generating a report from an incomplete run, as either the first metric entry (config) or the last metric entry (end)
might be missing. It can also be the symptom of another problem that caused benchmarks to fail to run successfully.
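For reference, the report that triggers this code path is generated from a run folder; a minimal sketch, assuming your milabench version exposes a ``report`` command with a ``--runs`` argument:

.. code-block:: bash

   # Assumption: `milabench report` accepts --runs pointing at the runs folder.
   milabench report --runs $MILABENCH_BASE/runs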

.. code-block:: text

   > File "/gpfs/home3/pmorillas/mila2/milabench/milabench/cli/run.py", line 82, in cli_run
   >     arch = next(iter(mp.packs.values())).config["system"]["arch"]
   >            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   > StopIteration

This indicates that no benchmarks were found to run: either the configuration was invalid, or the ``--select`` argument filtered out all of them.
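To double-check which names ``--select`` can match, the benchmark entries are the top-level keys of the config file; a rough sketch (the grep pattern is only a heuristic for the layout of ``standard.yaml``):

.. code-block:: bash

   # List top-level keys of the config; benchmark names appear among them.
   grep -E '^[A-Za-z0-9_-]+:' "$MILABENCH_CONFIG"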
