Docs Update #73

Merged (1 commit) on Dec 3, 2019
14 changes: 7 additions & 7 deletions README.md
@@ -22,7 +22,7 @@ There are two ways to use it: Automatic mode and configurable mode.

## Example: Amazon SageMaker Zero-Code-Change
This example uses a zero-script-change experience, where you can use your training script as-is.
-See the [example notebooks](https://link.com) for more details.
+See the [example notebooks](https://github.com/awslabs/amazon-sagemaker-examples/tree/master/sagemaker-debugger) for more details.
```python
import sagemaker
from sagemaker.debugger import rule_configs, Rule, CollectionConfig
```
@@ -88,7 +88,7 @@ print(f"Loss values were {trial.tensor('CrossEntropyLoss:0')}")
Amazon SageMaker Debugger uses a `hook` to store the values of tensors throughout the training process. Another process called a `rule` job
simultaneously monitors and validates these outputs to ensure that training is progressing as expected.
A rule might check for vanishing gradients, exploding tensor values, or poor weight initialization.
-If a rule is triggered, it will raise a CloudWatch event and stop the training job, saving you time
+If a rule is triggered, it will raise a CloudWatch event, saving you time
and money.
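The check a rule performs is conceptually simple: inspect the tensors the hook saved at each step and flag an unhealthy pattern. A minimal plain-Python sketch of a vanishing-gradient style check (the threshold and the step data here are hypothetical, not the actual smdebug rule implementation, which runs as a separate SageMaker job):

```python
# Hypothetical sketch: a vanishing-gradient style check flags a step when
# every saved gradient value is smaller in magnitude than a tiny threshold.

def vanishing_gradient_triggered(gradients, threshold=1e-7):
    """Return True if all gradient values are below `threshold` in magnitude."""
    return all(abs(g) < threshold for g in gradients)

# Toy per-step gradient values a hook might have saved (illustrative only).
steps = {
    0: [0.5, -0.3, 0.1],     # healthy gradients
    1: [1e-8, -2e-9, 3e-8],  # nearly zero: the check should fire
}

for step, grads in sorted(steps.items()):
    if vanishing_gradient_triggered(grads):
        print(f"Rule fired at step {step}")
```

In the real service this evaluation happens in the rule job, which reads the saved tensors from S3 rather than from an in-memory dict.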

Amazon SageMaker Debugger can be used inside or outside of SageMaker. There are three main use cases:
@@ -99,9 +99,9 @@ Amazon SageMaker Debugger can be used inside or outside of SageMaker. There are
The reason for different setups is that SageMaker Zero-Script-Change (via Deep Learning Containers) uses custom framework forks of TensorFlow, PyTorch, MXNet, and XGBoost to save tensors automatically.
These framework forks are not available in custom containers or non-SM environments, so you must modify your training script in these environments.

-See the [SageMaker page](https://github.com/awslabs/sagemaker-debugger/blob/master/docs/sagemaker.md) for details on SageMaker Zero-Code-Change and BYOC experience.\
+See the [SageMaker page](docs/sagemaker.md) for details on SageMaker Zero-Code-Change and BYOC experience.\
See the frameworks pages for details on modifying the training script:
-- [TensorFlow](https://github.com/awslabs/sagemaker-debugger/blob/master/docs/tensorflow.md)
-- [PyTorch](https://github.com/awslabs/sagemaker-debugger/blob/master/docs/pytorch.md)
-- [MXNet](https://github.com/awslabs/sagemaker-debugger/blob/master/docs/mxnet.md)
-- [XGBoost](https://github.com/awslabs/sagemaker-debugger/blob/master/docs/xgboost.md)
+- [TensorFlow](docs/tensorflow.md)
+- [PyTorch](docs/pytorch.md)
+- [MXNet](docs/mxnet.md)
+- [XGBoost](docs/xgboost.md)
18 changes: 8 additions & 10 deletions docs/api.md
@@ -1,7 +1,6 @@

# Common API
These objects exist across all frameworks.
-- [SageMaker Zero-Code-Change vs. Python API](#sagemaker)
- [Creating a Hook](#creating-a-hook)
- [Hook from SageMaker](#hook-from-sagemaker)
- [Hook from Python](#hook-from-python)
@@ -14,8 +13,7 @@ These objects exist across all frameworks.

The imports assume `import smdebug.{tensorflow,pytorch,mxnet,xgboost} as smd`.

-**Hook**: The main interface to use training. This object can be passed as a model hook/callback
-in Tensorflow and Keras. It keeps track of collections and writes output files at each step.
+**Hook**: The main class to pass as a callback object, or to create callback functions. It keeps track of collections and writes output files at each step.
- `hook = smd.Hook(out_dir="/tmp/mnist_job")`

**Mode**: One of "train", "eval", "predict", or "global". Helpful for segmenting data based on the phase
@@ -32,10 +30,10 @@ tensors to include/exclude.
**ReductionConfig**: Allows you to save a reduction, such as 'mean' or 'l1 norm', instead of the full tensor.
- `reduction_config = smd.ReductionConfig(reductions=['min', 'max', 'mean'], norms=['l1'])`
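Saving a reduction means storing a few summary scalars per tensor rather than every element. An illustrative plain-Python sketch of the reductions named in the example config above (`min`, `max`, `mean`, `l1`); this mirrors the idea, not smdebug's actual framework-side implementation:

```python
# Illustrative sketch: reduce a full tensor (here a flat list of floats)
# to the summary scalars a ReductionConfig would save in its place.

def compute_reductions(tensor):
    """Return min/max/mean and the L1 norm of a flat list of floats."""
    return {
        "min": min(tensor),
        "max": max(tensor),
        "mean": sum(tensor) / len(tensor),
        "l1": sum(abs(x) for x in tensor),  # L1 norm: sum of absolute values
    }

summary = compute_reductions([1.0, -2.0, 3.0])
```

For large tensors this trades fidelity for storage: four scalars are kept per step instead of the full value array.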

-**Trial**: The main interface to use when analyzing a completed training job. Access collections and tensors. See [trials documentation](https://link.com).
+**Trial**: The main interface to use when analyzing a completed training job. Access collections and tensors. See [trials documentation](analysis.md).
- `trial = smd.create_trial(out_dir="/tmp/mnist_job")`
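A trial exposes saved tensors by name, with one value per recorded step. A toy stand-in built on plain dicts, showing only the access pattern (class and method names here are invented for illustration; the real `Trial` reads its data from `out_dir`):

```python
# Toy stand-in for the access pattern a Trial provides: tensors addressed
# by name, each with one value per saved training step.

class ToyTrial:
    def __init__(self, data):
        self._data = data  # {tensor_name: {step: value}}

    def tensor_names(self):
        return sorted(self._data)

    def steps(self, name):
        return sorted(self._data[name])

    def value(self, name, step):
        return self._data[name][step]

trial = ToyTrial({"CrossEntropyLoss:0": {0: 2.30, 500: 0.45}})
losses = [trial.value("CrossEntropyLoss:0", s)
          for s in trial.steps("CrossEntropyLoss:0")]
```

The pattern of iterating steps for a named tensor is what analysis code like `trial.tensor('CrossEntropyLoss:0')` in the README example relies on.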

-**Rule**: A condition that will trigger an exception and terminate the training job early, for example a vanishing gradient. See [rules documentation](https://link.com).
+**Rule**: A condition that will trigger an exception, for example a vanishing gradient. See [rules documentation](analysis.md).


---
@@ -44,7 +42,7 @@ tensors to include/exclude.

### Hook from SageMaker
If you create a SageMaker job and specify the hook configuration in the SageMaker Estimator API
-as described in [AWS Docs](https://link.com),
+as described in [AWS Docs](https://docs.aws.amazon.com/sagemaker/latest/dg/train-model.html),
a JSON file will be written automatically. You can create a hook from this file by calling
```python
hook = smd.{hook_class}.create_from_json_file()
```
@@ -53,10 +51,10 @@ with no arguments and then use the hook Python API in your script. `hook_class`
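The mechanism behind `create_from_json_file()` is that the hook's configuration arrives as a JSON file written by SageMaker, and the hook is constructed from that file instead of from keyword arguments. A sketch of that round trip with plain `json`; the field names and file name below are illustrative assumptions, not the exact schema smdebug uses:

```python
# Sketch: write a hook-style JSON config to disk, then reconstruct the
# parameters from it, the way create_from_json_file() conceptually works.
# Field names ("LocalPath", "HookParameters") are illustrative only.
import json
import os
import tempfile

config = {
    "LocalPath": "/opt/ml/output/tensors",
    "HookParameters": {"save_interval": "100"},
}
path = os.path.join(tempfile.mkdtemp(), "debughookconfig.json")
with open(path, "w") as f:
    json.dump(config, f)

# Later, in the training container, the hook reads the file back.
with open(path) as f:
    loaded = json.load(f)
out_dir = loaded["LocalPath"]
save_interval = int(loaded["HookParameters"]["save_interval"])
```

This is why the call takes no arguments in your script: everything it needs is already on disk inside the container.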

### Hook from Python
See the framework-specific pages for more details.
-* [TensorFlow](https://link.com)
-* [PyTorch](https://link.com)
-* [MXNet](https://link.com)
-* [XGBoost](https://link.com)
+* [TensorFlow](tensorflow.md)
+* [PyTorch](pytorch.md)
+* [MXNet](mxnet.md)
+* [XGBoost](xgboost.md)

---

2 changes: 1 addition & 1 deletion docs/mxnet.md
@@ -1,6 +1,6 @@
# MXNet

-SageMaker Zero-Code-Change supported container: MXNet 1.6. See [AWS Docs](https://link.com) for more information.\
+SageMaker Zero-Code-Change supported container: MXNet 1.6. See [AWS Docs](https://docs.aws.amazon.com/sagemaker/latest/dg/train-model.html) for more information.\
Python API supported versions: MXNet 1.4, 1.5, 1.6.

## Contents
6 changes: 3 additions & 3 deletions docs/pytorch.md
@@ -1,6 +1,6 @@
# PyTorch

-SageMaker Zero-Code-Change supported containers: PyTorch 1.3. See [AWS Docs](https://link.com) for more information.\
+SageMaker Zero-Code-Change supported containers: PyTorch 1.3. See [AWS Docs](https://docs.aws.amazon.com/sagemaker/latest/dg/train-model.html) for more information.\
Python API supported versions: 1.2, 1.3.

## Contents
@@ -71,8 +71,8 @@ for (inputs, labels) in trainloader:

## Full API
-See the [Common API](https://link.com) page for details about Collection, SaveConfig, and ReductionConfig.\
-See the [Analysis](https://link.com) page for details about analyzing a training job.
+See the [Common API](api.md) page for details about Collection, SaveConfig, and ReductionConfig.\
+See the [Analysis](analysis.md) page for details about analyzing a training job.

## Hook
9 changes: 5 additions & 4 deletions docs/sagemaker.md
@@ -13,10 +13,10 @@ These framework forks are not available in custom containers or non-SM environments

This configuration is used for both ZCC and BYOC. The only difference is that with a custom container, you modify your training script as well. See the framework pages below for details on how to modify your training script.

-- [TensorFlow](https://github.com/awslabs/sagemaker-debugger/blob/master/docs/tensorflow.md)
-- [PyTorch](https://github.com/awslabs/sagemaker-debugger/blob/master/docs/pytorch.md)
-- [MXNet](https://github.com/awslabs/sagemaker-debugger/blob/master/docs/mxnet.md)
-- [XGBoost](https://github.com/awslabs/sagemaker-debugger/blob/master/docs/xgboost.md)
+- [TensorFlow](tensorflow.md)
+- [PyTorch](pytorch.md)
+- [MXNet](mxnet.md)
+- [XGBoost](xgboost.md)

```python
rule = sagemaker.debugger.Rule.sagemaker(
```
@@ -110,6 +110,7 @@ sagemaker_simple_estimator.fit()

## List of Builtin Rules
The full list of rules is:

| Rule Name | Behavior |
|---|---|
| `vanishing_gradient` | Detects a vanishing gradient. |