-
Notifications
You must be signed in to change notification settings - Fork 100
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Create template classes for model representations to support multiple data types (Part of #196) #198
Conversation
ae0fd7e
to
ab00d7b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just one grammatical suggestion. Otherwise, everything looks great!
There's a few things being passed as pointers. Would it make sense to pass them by reference instead? Or would there be issues with that? |
Cpplint and Google C++ style guide warn against using non-const reference parameters in functions. |
Interesting, I'm curious why Google thinks a potentially null pointer is safer than a reference 😄 |
@jakirkham The idea is to explicitly mark "output" parameters. |
include/treelite/typeinfo.h
Outdated
return "float64"; | ||
default: | ||
throw std::runtime_error("Unrecognized type"); | ||
return ""; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is return
required after throw
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without it, some old compilers like GCC 5 complains about a missing return value.
include/treelite/typeinfo.h
Outdated
* \return Whatever that's returned by the dispatcher | ||
*/ | ||
template <template<class> class Dispatcher, typename ...Args> | ||
inline auto DispatchWithTypeInfo(TypeInfo type, Args&& ...args) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's hard to understand what the intended usage of DispatchWithTypeInfo()
/ DispatchWithModelTypes()
is. Could you add an example (or reference a place in code) demonstrating it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is the idea. Let's say we are given a TypeInfo
variable type
at runtime. We want to call Dispatcher<T>::Dispatch()
method where the template arg T
corresponds to type
, and Dispatcher<T>
is a class that contains a custom function that we'd like to run.
For a concrete example, see https://github.com/dmlc/treelite/pull/198/files#diff-409f38fcc1a201a94e70922bbf7c44a6R777-R788 and https://github.com/dmlc/treelite/pull/196/files#diff-65416f46630c396d5db3df0ac92a0a02R130-R148.
include/treelite/typeinfo.h
Outdated
} else if (std::is_same<T, double>::value) { | ||
return TypeInfo::kFloat64; | ||
} else { | ||
throw std::runtime_error(std::string("Unrecognized Value type") + typeid(T).name()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is runtime type information already used in treelite?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean the built-in RTTI? Yes, since we use dynamic_cast<>
to downcast Model*
into ModelImpl<>*
.
Co-authored-by: William Hicks <[email protected]>
e42c3c4
to
09e908b
Compare
@canonizer Can I get another round of review for this PR? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved, provided that the remaining comments are addressed.
Model(const Model&) = delete; | ||
Model& operator=(const Model&) = delete; | ||
Model(Model&&) = default; | ||
Model& operator=(Model&&) = default; | ||
|
||
template <typename ThresholdType, typename LeafOutputType> | ||
inline static std::unique_ptr<Model> Create(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider moving this function into ModelImpl
, as it depends on ThresholdType
and LeafOutputType
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is actually not possible to move the function, since it needs access the base class's members threshold_type_
and leaf_output_type_
. The members were moved to the base class per your previous suggestion.
return leaf_output_type_; | ||
} | ||
template <typename Func> | ||
inline auto Dispatch(Func func); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these Dispatch()
functions used anywhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it's used in the C code generator (excluded from this PR). See #196 to search for actual uses.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1e1c6e0
to
1306930
Compare
… data types (Part of #196) (#198) * The ModelImpl class (created in #201) becomes the template class ModelImpl<ThresholdType, LeafOutputType>. * Implement template classes ModelImpl<ThresholdType, LeafOutputType> and TreeImpl<ThresholdType, LeafOutputType> that contain the details of the tree ensemble model. The template classes are parameterized by the types of the thresholds and leaf outputs. Currently the following combinations are allowed: | Threshold type | Leaf output type | |----------------|------------------| | float32 | float32 | | float32 | uint32 | | float64 | float64 | | float64 | uint32 | |----------------|------------------| * Revise the zero-copy serialization protocol, to prepend the type information (threshold_type, leaf_output_type) to the serialized types so that the recipient will choose the correct ModelImpl<ThresholdType, LeafOutputType> to deserialize to. * A run-time type dispatching system using the enum type TypeInfo. Users are able to dispatch a correct version of ModelImpl<ThresholdType, LeafOutputType> by specifying a pair of TypeInfo values. We also implement a set of convenient functions, such as InferTypeInfoOf<T> that converts the template arg T into TypeInfo enum. Co-authored-by: William Hicks <[email protected]> Co-authored-by: Andy Adinets <[email protected]>
Addresses #95 and #111. Follow-up to #198, #199, #201 Trying again, since #130 failed. This time, I made the Model class to be polymorphic. This way, the amount of pointer indirection is minimized. Summary: Model is an opaque container that wraps the polymorphic handle ModelImpl<ThresholdType, LeafOutputType>. The handle in turn stores the list of trees Tree<ThresholdType, LeafOutputType>. To unbox the Model container and obtain ModelImpl<ThresholdType, LeafOutputType>, use Model::Dispatch(<lambda expression>). Also, upgrade to C++14 to access the generic lambda feature, which proved to be very useful in the dispatching logic for the polymorphic Model class. * Turn the Model and Tree classes into template classes * Revise the string templates so that correct data types are used in the generated C code * Rewrite the model builder class * Revise the zero-copy serializer * Create an abstract matrix class that supports multiple data types (float32, float64 for now). * Move the DMatrix class to the runtime. * Extend the DMatrix class so that it can hold float32 and float64. * Redesign the C runtime API using the DMatrix class. * Ensure accuracy of scikit-learn models. To achieve the best results, use float32 for the input matrix and float64 for the split thresholds and leaf outputs. * Revise the JVM runtime.
Extracted from #196 for easier reviewing.
(The CI tests in this PR won't pass due to missing parts. See #196 to look up the test results.)
ModelImpl
class (created in Refactor struct Model -> class Model + class ModelImpl (Part of #196) #201) becomes the template classModelImpl<ThresholdType, LeafOutputType>
.ModelImpl<ThresholdType, LeafOutputType>
andTreeImpl<ThresholdType, LeafOutputType>
that contain the details of the tree ensemble model. The template classes are parameterized by the types of the thresholds and leaf outputs. Currently the following combinations are allowed:Revise the zero-copy serialization protocol, to prepend the type information (threshold_type, leaf_output_type) to the serialized types so that the recipient will choose the correct
ModelImpl<ThresholdType, LeafOutputType>
to deserialize to.A run-time type dispatching system using the enum type
TypeInfo
. Users are able to dispatch a correct version ofModelImpl<ThresholdType, LeafOutputType>
by specifying a pair ofTypeInfo
values. We also implement a set of convenient functions, such asInferTypeInfoOf<T>
that converts the template argT
intoTypeInfo
enum.@canonizer @levsnv @jakirkham @wphicks Can you review this PR?