PyTorch Profiler — PyTorch Tutorials 2.2.1+cu121 documentation #690

Open
irthomasthomas opened this issue Mar 5, 2024 · 1 comment
Labels
Algorithms Sorting, Learning or Classifying. All algorithms go here. code-generation code generation models and tools like copilot and aider data-validation Validating data structures and formats Software2.0 Software development driven by AI and neural networks.

PyTorch Profiler — PyTorch Tutorials 2.2.1+cu121 documentation

DESCRIPTION:
PyTorch Profiler

This recipe explains how to use PyTorch profiler and measure the time and memory consumption of the model’s operators.

Introduction

PyTorch includes a simple profiler API that is useful when you need to determine the most expensive operators in a model.

In this recipe, we will use a simple ResNet model to demonstrate how to use the profiler to analyze model performance.

Setup

To install torch and torchvision, use the following command:

pip install torch torchvision

Steps

  1. Import all necessary libraries

In this recipe we will use the torch, torchvision.models, and profiler modules:

import torch
import torchvision.models as models
from torch.profiler import profile, record_function, ProfilerActivity

  2. Instantiate a simple ResNet model

Let’s create an instance of a ResNet model and prepare an input for it:

model = models.resnet18()
# A batch of 5 RGB images of 224x224 pixels, in (N, C, H, W) layout
inputs = torch.randn(5, 3, 224, 224)

  3. Using the profiler to analyze execution time

The PyTorch profiler is enabled through a context manager and accepts a number of parameters. Some of the most useful are:

  • activities - a list of activities to profile:
    • ProfilerActivity.CPU - PyTorch operators, TorchScript functions and user-defined code labels (see record_function below);
    • ProfilerActivity.CUDA - on-device CUDA kernels;
  • record_shapes - whether to record shapes of the operator inputs;
  • profile_memory - whether to report the amount of memory consumed by the model’s tensors;
  • use_cuda - whether to measure execution time of CUDA kernels (deprecated since PyTorch 1.8.1; pass ProfilerActivity.CUDA in activities instead).

Note: when using CUDA, the profiler also shows the runtime CUDA events occurring on the host.
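A minimal sketch of how these parameters combine, assuming a single matrix multiply as a stand-in workload instead of the ResNet (the tensor size and row_limit are arbitrary choices). It adds ProfilerActivity.CUDA only when a GPU is present, so the same script also runs on CPU-only machines.

```python
import torch
from torch.profiler import profile, ProfilerActivity

# Always profile host-side (CPU) operators; add device kernels only when a GPU exists.
activities = [ProfilerActivity.CPU]
if torch.cuda.is_available():
    activities.append(ProfilerActivity.CUDA)

# Stand-in workload (an assumption, not from the recipe): one CPU matrix multiply.
x = torch.randn(256, 256)

with profile(activities=activities, record_shapes=True, profile_memory=True) as prof:
    y = x @ x

# Sort rows by the memory each operator allocated itself (excluding its children).
print(prof.key_averages().table(sort_by="self_cpu_memory_usage", row_limit=5))
```

Because profile_memory=True is set, the table gains memory columns next to the timing columns.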

Let’s see how we can use profiler to analyze the execution time:

with profile(activities=[ProfilerActivity.CPU], record_shapes=True) as prof:
    with record_function("model_inference"):
        model(inputs)

Note that we can use the record_function context manager to label arbitrary code ranges with user-provided names (model_inference is used as the label in the example above).
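As a rough illustration, the sketch below labels two code ranges inside a loop. The label names preprocess and compute are hypothetical, and the relu/matmul workload is a toy stand-in for a model. Each label should appear as its own row in key_averages(), with its call count aggregated across iterations.

```python
import torch
from torch.profiler import profile, record_function, ProfilerActivity

x = torch.randn(128, 128)

with profile(activities=[ProfilerActivity.CPU]) as prof:
    for _ in range(2):
        # "preprocess" and "compute" are illustrative labels, not special keywords.
        with record_function("preprocess"):
            y = torch.relu(x)
        with record_function("compute"):
            z = y @ y

# Each label becomes one row, aggregated over both loop iterations.
counts = {evt.key: evt.count for evt in prof.key_averages()}
print(counts["preprocess"], counts["compute"])
```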

The profiler lets you check which operators were called during the execution of a code range wrapped with a profiler context manager. If multiple profiler ranges are active at the same time (e.g. in parallel PyTorch threads), each profiling context manager tracks only the operators of its corresponding range. The profiler also automatically profiles asynchronous tasks launched with torch.jit._fork and, for a backward pass, the backward-pass operators launched by the backward() call.
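A small sketch of the backward-pass behavior described above, assuming a toy weight matrix w and input x rather than the ResNet. Calling backward() inside the profiler range should make autograd operators (whose names contain "Backward") show up alongside the forward operators.

```python
import torch
from torch.profiler import profile, ProfilerActivity

# Toy parameters and inputs, assumed for illustration only.
w = torch.randn(32, 32, requires_grad=True)
x = torch.randn(8, 32)

with profile(activities=[ProfilerActivity.CPU]) as prof:
    loss = (x @ w).sum()
    loss.backward()  # backward operators are recorded inside the same profiler range

names = [evt.key for evt in prof.key_averages()]
# Autograd node names contain "Backward" (for example, the node for the matmul).
backward_rows = [n for n in names if "Backward" in n]
print(backward_rows)
```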

Let’s print out the stats for the execution above:

print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))

The output will look like this (some columns omitted):

# ---------------------------------  ------------  ------------  ------------  ------------
#                              Name      Self CPU     CPU total  CPU time avg    # of Calls
# ---------------------------------  ------------  ------------  ------------  ------------
#                   model_inference       5.509ms      57.503ms      57.503ms             1
#                      aten::conv2d     231.000us      31.931ms       1.597ms            20
#                 aten::convolution     250.000us      31.700ms       1.585ms            20
#                aten::_convolution     336.000us      31.450ms       1.573ms            20
#          aten::mkldnn_convolution      30.838ms      31.114ms       1.556ms            20
#                  aten::batch_norm     211.000us      14.693ms     734.650us            20
#      aten::_batch_norm_impl_index     319.000us      14.482ms     724.100us            20
#           aten::native_batch_norm       9.229ms      14.109ms     705.450us            20
#                        aten::mean     332.000us       2.631ms     125.286us            21
#                      aten::select       1.668ms       2.292ms       8.988us           255
# ---------------------------------  ------------  ------------  ------------  ------------
# Self CPU time total: 57.549ms
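The CPU time avg column is the CPU total divided by the number of calls. The short check below recomputes it for a few rows of the sample table (values converted to microseconds); it only does arithmetic on the printed numbers and does not call the profiler. The results match the table's 1.597ms, 125.286us, and 8.988us up to rounding.

```python
# (CPU total in microseconds, # of calls), copied from the sample table above.
sample = {
    "aten::conv2d": (31_931.0, 20),
    "aten::mean": (2_631.0, 21),
    "aten::select": (2_292.0, 255),
}

# "CPU time avg" is CPU total divided by the number of calls.
avg_us = {name: total / calls for name, (total, calls) in sample.items()}

for name, value in avg_us.items():
    print(f"{name:>14}: {value:9.3f} us")
```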

Here we see that, as expected, most of the time is spent in convolution (specifically in mkldnn_convolution, for PyTorch compiled with MKL-DNN support). Note the difference between self CPU time and total CPU time: operators can call other operators, and self CPU time excludes the time spent in child operator calls, while total CPU time includes it. To sort by self CPU time instead, pass sort_by="self_cpu_time_total" to the table call.
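To make the self/total distinction concrete, the sketch below recomputes the Self CPU column for the convolution call chain in the sample table. It assumes that, within these rows, each operator calls only the next one (aten::conv2d calls aten::convolution, which calls aten::_convolution, which calls aten::mkldnn_convolution). Under that assumption, self time is the operator's total minus its child's total, and the result matches the 231us, 250us, and 336us shown in the table.

```python
# (name, CPU total in microseconds), copied from the sample table, parent first.
chain = [
    ("aten::conv2d", 31_931),
    ("aten::convolution", 31_700),
    ("aten::_convolution", 31_450),
    ("aten::mkldnn_convolution", 31_114),
]

def self_times(chain):
    """Self time = own total minus the total of the single child call.

    Assumes each operator has exactly one child in this chain. The leaf's self
    time cannot be derived this way because its own children are not listed.
    """
    result = {}
    for (name, total), (_, child_total) in zip(chain, chain[1:]):
        result[name] = total - child_total
    return result

print(self_times(chain))
```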

To get finer-grained results that include operator input shapes, pass group_by_input_shape=True (note: this requires running the profiler with record_shapes=True):

print(prof.key_averages(group_by_input_shape=True).table(sort_by="cpu_time_total", row_limit=10))
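A minimal sketch of what group_by_input_shape changes, using two hypothetical tensors of different sizes in place of the ResNet. Without grouping, all four aten::mm calls are merged into one row; with grouping, they split by input shape.

```python
import torch
from torch.profiler import profile, ProfilerActivity

# Hypothetical inputs with two different shapes.
small = torch.randn(16, 16)
large = torch.randn(64, 64)

with profile(activities=[ProfilerActivity.CPU], record_shapes=True) as prof:
    for _ in range(3):
        torch.mm(small, small)
    torch.mm(large, large)

# Without grouping, every aten::mm call collapses into a single row.
merged = [e for e in prof.key_averages() if e.key == "aten::mm"]
# With grouping, calls with different input shapes get separate rows.
by_shape = [e for e in prof.key_averages(group_by_input_shape=True) if e.key == "aten::mm"]

for evt in by_shape:
    print(evt.input_shapes, evt.count)
```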

URL: PyTorch Profiler Recipe

irthomasthomas commented Mar 5, 2024

Related content

#690 - Similarity score: 1.0

#649 - Similarity score: 0.9

#498 - Similarity score: 0.88

#324 - Similarity score: 0.88

#625 - Similarity score: 0.88

#499 - Similarity score: 0.88
