-
Notifications
You must be signed in to change notification settings - Fork 3.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RFC] Structured Error Handling Mechanism #2279
Comments
There is a trade-off between a very detailed error messaging system vs the ease of programming. I agree that we should build a reasonable hierarchy in the python side of the error message. But sometimes adding one error type per function can be overkill In the C++ side, one of the easy ways to solve this problem is to have a specific error message encoding convention by string and allows re-dispatch. Either way, it is good to have a concrete proposal on the proposed error hierarchy and we should do it incrementally. |
Thanks @JennyChou5 for the proposal. I think it is a good step of moving TVM to the product line. What we should do is to define unified error types and encourage contributors to refer to as needed. We also should reasonably categorize the error types so that most of the functions can throw error in known types. In terms of the layers proposed above (model parsing layer, NNVM/Relay pass layer, TVM compute layer, TVM schedule layer), I think we should add TVM runtime layer. Also, I am not sure if TVM compute and schedule should be divided, perhaps combining them and adding codegen layer instead? |
Error Type Categories As proposed, we separate TVM into 4 major layers: framework parsing layer (parse TF, MX, PyTorch, ONNX models into NNVM or relay), nnvm/replay pass layer (infer shape and type, graph-level optimization, e.g. op fusion, precompute), TVM compute/schedule/codegen layer, TVM runtime layer. Layer 1: framework parsing Layer 2: nnvm/relay pass Layer 3: TVM compute/schedule/codegen Layer 4: TVM runtime Coding Format As observed that, some of the functionality APIs in TVM are for conversion between Python and C++. However, we prefer using a unified-formatted code which will be easier to change and maintain over time. We prefer having python API as base and call python functions from C++. For calling python functions from C++, we would like to use methods provided by the C-API (http://docs.python.org/c-api/). This is especially the case for libraries that support so-called “callback” functions. The Python program must pass the Python function object, PyObject. By using PyObject as the basic object for the C-API, we would be able to represent any kind of python basic types (string, float, int,...). Clean Place for Error Handling APIs We would like to create a new folder, names err_handling, under python/tvm. Inside this folder, there would be four files (error types in same layer will be placed in one file): FrameworkParsing_Err.py, Pass_Err.py, CompSche_Err.py, runtune_Err.py. Example 1: inside FrameworkParsing_Err.py
Therefore, in order to use framework parsing error APIs, we could import FrameworkParsing_Err in the beginning of the mxnet.py, tensorflow.py, etc under frontend folder and use the defined error type function for throwing the error messages. Example 2: inside Pass_Err.py
|
Thanks for the RFC. Sorry I was a bit late in recommend due to my latest travels. Here are some comments to the proposal
Perhaps I am being pedantic, but I really want us to build something that is minimum and elegant. Adding layers of abstraction always can solve problems in the short term, but could bring longer-term technical debt, as developers need to spend longer time understand how to use things. Starting simple, and clearly, document the error class and string formatting convention that a developer can copy paste into the codebase would be very useful for such consolidation. |
Would also like to see comments from other community folks @dmlc/tvm-committer |
I created #2838 to provide some minimum scaffolds on the topic. @jroesch and @wweic raise the question about whether should we provide a more attribute aware error constructor. def clear_throw():
raise OpNotImplemented("Operator relu is not implemented in the MXNet fronend")
def _op_not_implemented(op_name):
return OpNotImpelemented("Operator {} is not implemented.").format(op_name)
def wrapper_based():
raise _op_not_implemented("relu") As demonstrated by the above examples. I personally prefer clear_throw style.
|
I agree clear_throw approached is easy to use for developers. However, the detailed error message might not be unified and can be generated error by typo. Vote for the second approach:
This way we can unify the error message and it could be easy for developers/users to follow. Of note, it would be good to make sure the API name is clear and simple for developers to understand (_op_not_implemented means operator not implemented) |
I would like to point out that adopting clear_throw style does not prevent us to get a unified error message. We can encourage the unified format by introducing a standard "error message convention" (see https://github.com/dmlc/tvm/pull/2838/files#diff-c89317db650ce6bced36dadc47755c5fR51) developers encouraged to copy and then edit. Surprisingly, we can go quite far by using copy-editing for such kind of one-liner message and keep things consistent (this is not that different from writing similar docstrings). Secondly, there is always a need for refining the error message. For example, in the case of clear_throw, it says the operator is not supported "in the MXNet frontend". If we want to have a common function (_op_not_implemented):
The plus side is that the code is easier to maintain:
As a middle ground, I think I am fine introducing _op_not_implemented as long as it is in the same file. So it is still very easy to find and allows specific error message crafting to some extent |
Close as the first part scaffolding is in. Let us open new RFCs for new error class proposals |
Problem Statement
TVM is gaining users in the research community and in production deployments. Simplifying the user experience can drive faster adoption, particularly among developers and users unfamiliar with the stack. A critical aspect of UX is better error handling. A structured error handling mechanism would help users of the TVM stack reduce the time to debug and deliver long-term dividends in usability. A structured mechanism must separate error-handling code from regular code and provide the ability to propagate errors up the call stack. Although TVM stack uses built-in python/C++ exception handling, the error mechanism does not provide the ability to group and differentiate error types.
Current TVM Error Handling
The TVM stack currently throws built-in python/C++ exceptions as error messages. Parts of the code throw errors at the condition detecting site collecting information relevant to the error while parts of the code throw errors that are then passed implicitly to “exception” handlers, using the now classic try exception idiom. This exception class is an easy-to-use feature in modern object-oriented programming languages. However, the scheme is insufficient for production use when TVM is incorporated into a larger stack or service.
Proposed Error Handling Mechanism
We proposed a structured error-handling mechanism characterized by the following features:
o Represents all errors as functions or objects
o Allows you to define your own error types or classes
o Allows you to explicitly raise an error
o Allows you to catch particular error types in a particular layer
o Allow you to propagate error from the inner layer to the outer layer
First, we separate TVM into 4 major layers: model parsing layer (parse TF, MX, PyTorch, ONNX models into NNVM graph), NNVM pass layer (infer shape and type, graph-level optimization, e.g. op fusion, precompute), TVM compute layer, TVM schedule layer. For each layer, we catalog errors into pre-defined object and separate them in one util folder. All files will import from that file for using the error-handling related classes.
Example of the Proposed Error Handling Mechanism
For example, in the model parsing layer as parsing different frameworks into NNVM graphs, we can define some error-handling functions in one class. Here, we first categorize errors into four types in this layer: OperatorNotImplemented, OperatorAttributeNotImplemented, OperatorAttributeRequired, OperatorAttributeValueNotValid and provide three definitions for that (see functions: _required_attr, _raise_attr_value_error, _raise_not_supported). For all frame parsing files, such as mxnet.py, tenflow.py, onnx.py, they could import the corresponding error-handling class/functions and utilize them inside it. Of note, the following function are from mxnet.py.
In such case, we provide the unified way to categorize the error types in this layer. Users can add different error types easily and provide the ability to propagate error reporting up the call stack.
Risk and Limitation
We have to define the error-handling templates first in each layer and ask TVM’s contributors to follow. In addition, for the existing codebase, moving to structured error- handling scheme will need some effort.
The text was updated successfully, but these errors were encountered: