Add example which implements YOLO object detection #576
I have completed the implementation of this example and it is ready for review. Here are the results for the default image (dog.jpg):
I am using ggml to compute the output of all layers except the YOLO layers (you can find the model architecture in yolov3-tiny.cfg). The output of the YOLO layers is computed with the
As you can see, yolov3-tiny is quite fast but not very accurate. However, the same approach can be applied to run inference with more sophisticated YOLO models like v4, v5 and v7.
Cool 😄
Would be nice to add a test in ci/run.sh
result = ggml_sub(ctx, result, ggml_repeat(ctx, layer.rolling_mean, result));
result = ggml_div(ctx, result, ggml_sqrt(ctx, ggml_repeat(ctx, layer.rolling_variance, result)));
result = ggml_mul(ctx, result, ggml_repeat(ctx, layer.scales, result));
These ggml_repeat calls should be avoidable via implicit broadcast. ggml_mul already supports broadcast, so it might be a good idea to add it for ggml_sub and ggml_div in a similar way. For now, we can implement it just on the CPU and GGML_ASSERT on the GPU backends when broadcast is necessary but not implemented yet.
ggml_mul has partial broadcast support: it expects an equal number of elements in the first dimension, which is not the case here. I may try to address this in a follow-up patch.
I have addressed the comments and added a CI test. I also realized that I don't need to create a second computation graph and the code now runs much faster:
@rgerganov @ggerganov Hello guys, I was working on implementing an upscaler in stable-diffusion.cpp, but it requires the LeakyReLU activation function, with a
According to this implementation with
inline static void ggml_vec_leaky_f32 (const int n, float * y, const float * x) { for (int i = 0; i < n; ++i) y[i] = (x[i] > 0.f) ? x[i] : 0.1f*x[i]; }
inline static void ggml_vec_leaky_relu_f32 (const int n, float * y, const float * x, const float ns) { for (int i = 0; i < n; ++i) y[i] = ((x[i] > 0.f) ? x[i] : 0.f) + ns * ((x[i] < 0.0f) ? x[i] : 0.f); }
// ggml_leaky
struct ggml_tensor * ggml_leaky(
struct ggml_context * ctx,
struct ggml_tensor * a) {
return ggml_unary(ctx, a, GGML_UNARY_OP_LEAKY);
}
struct ggml_tensor * ggml_leaky_relu(
struct ggml_context * ctx,
struct ggml_tensor * a, float negative_slope, bool inplace) {
bool is_node = false;
if (!inplace && (a->grad)) {
is_node = true;
}
struct ggml_tensor * result = inplace ? ggml_view_tensor(ctx, a) : ggml_dup_tensor(ctx, a);
ggml_set_op_params_i32(result, 0, (int32_t) GGML_UNARY_OP_LEAKY_RELU);
ggml_set_op_params_i32(result, 1, (int32_t) (negative_slope * 100.0f));
result->op = GGML_OP_UNARY;
result->grad = is_node ? ggml_dup_tensor(ctx, result) : NULL;
result->src[0] = a;
return result;
}
I think the name should be very clear as well. It never occurred to me that ggml_leaky was an activation function. I think it should be renamed to ggml_leaky_relu to emphasize its use.
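For comparison, the two element-wise kernels quoted in this comment can be tried outside ggml. The sketch below copies their arithmetic into stand-alone functions; the names `vec_leaky_f32` and `vec_leaky_relu_f32` are hypothetical stand-ins, and `size_t` is used for the length instead of `int`. With a negative slope of 0.1 the parameterized version should reproduce the fixed-slope one, while other slopes only change the negative branch.

```c
#include <stddef.h>

// Stand-alone copy of the fixed-slope kernel: the slope is hard-coded to 0.1.
static void vec_leaky_f32(size_t n, float *y, const float *x) {
    for (size_t i = 0; i < n; ++i) {
        y[i] = (x[i] > 0.0f) ? x[i] : 0.1f * x[i];
    }
}

// Stand-alone copy of the proposed kernel: the negative slope `ns` is a parameter.
static void vec_leaky_relu_f32(size_t n, float *y, const float *x, float ns) {
    for (size_t i = 0; i < n; ++i) {
        const float pos = (x[i] > 0.0f) ? x[i] : 0.0f; // positive part, passed through
        const float neg = (x[i] < 0.0f) ? x[i] : 0.0f; // negative part, scaled by ns
        y[i] = pos + ns * neg;
    }
}
```

Keeping the slope as an argument lets one kernel serve both darknet-style layers and models that need a different slope.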
In PyTorch, it seems to be called LeakyReLU so I think we should rename
Fixes building for x86 processors missing the F16C feature set. MSVC is not included, because in MSVC F16C is implied with AVX2/AVX512.
This PR implements yolov3-tiny from https://github.com/pjreddie/darknet/. It is still WIP but most of the work is done.
I had to make two changes to ggml for this:
ggml_pool_2d(); this one is a bit weird because it has to support odd number of padding elements; I changed the type of p0 and p1 to float to keep the current semantics and then use p=0.5 when one padding element is needed;
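A small sketch may help show why a fractional padding value works here. It assumes the output size of a pooling dimension is computed as (in + 2*p - k) / s + 1, so p counts padding per side and p=0.5 adds up to a single padding element in total. The function name `pool_output_size` is hypothetical and the formula is an assumption about the arithmetic, not a quote from the ggml source.

```c
#include <stdint.h>

// Hypothetical helper: output length of one pooling dimension.
// Assumed formula: out = (in + 2*p - k) / s + 1, truncated to an integer.
// `p` is the padding per side, so p = 0.5f means one padding element in total.
static int64_t pool_output_size(int64_t in, int k, int s, float p) {
    return (int64_t)(((float)in + 2.0f * p - (float)k) / (float)s + 1.0f);
}
```

As an illustration, under this formula a window of 2 with stride 1 on an input of width 13 yields 12 outputs with p=0, 13 with p=0.5 and 14 with p=1, so only the fractional value preserves the input size.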