Add example which implements YOLO object detection #576

Merged
merged 3 commits into from
Oct 30, 2023

rgerganov
Collaborator

This PR implements yolov3-tiny from https://github.com/pjreddie/darknet/. It is still WIP but most of the work is done.
I had to make two changes to ggml for this:

  • add leaky relu activation
  • add padding support in ggml_pool_2d(); this one is a bit weird because it has to support odd number of padding elements; I changed the type of p0 and p1 to float to keep the current semantics and then use p=0.5 when one padding element is needed;
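To make the fractional padding concrete, here is a minimal sketch of how a pooled output length could be computed with a float padding value. It assumes the usual pooling size formula; the helper name `pool_output_size` is hypothetical and is not taken from the PR.

```c
#include <stdint.h>

/* Hypothetical helper: output length of one pooled dimension.
 * ins: input length, ks: kernel size, s: stride,
 * p:   padding per side; a float so that p = 0.5 stands for
 *      a single padding element in total (2 * 0.5 = 1). */
static int64_t pool_output_size(int64_t ins, int ks, int s, float p) {
    /* The arithmetic is done in float and truncated back to an integer. */
    return (int64_t) ((ins + 2 * p - ks) / s + 1);
}
```

With a 2-wide kernel and stride 1, `pool_output_size(13, 2, 1, 0.5f)` keeps a 13-element dimension at 13, while `pool_output_size(416, 2, 2, 0.0f)` halves 416 to 208.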

@rgerganov
Collaborator Author

I have completed the implementation of this example and it is ready for review. Here are the results for the default image (dog.jpg):

$ ./yolov3-tiny -m yolov3-tiny.gguf -i dog.jpg        
Layer  0 output shape:  416 x 416 x   16 x   1
Layer  1 output shape:  208 x 208 x   16 x   1
Layer  2 output shape:  208 x 208 x   32 x   1
Layer  3 output shape:  104 x 104 x   32 x   1
Layer  4 output shape:  104 x 104 x   64 x   1
Layer  5 output shape:   52 x  52 x   64 x   1
Layer  6 output shape:   52 x  52 x  128 x   1
Layer  7 output shape:   26 x  26 x  128 x   1
Layer  8 output shape:   26 x  26 x  256 x   1
Layer  9 output shape:   13 x  13 x  256 x   1
Layer 10 output shape:   13 x  13 x  512 x   1
Layer 11 output shape:   13 x  13 x  512 x   1
Layer 12 output shape:   13 x  13 x 1024 x   1
Layer 13 output shape:   13 x  13 x  256 x   1
Layer 14 output shape:   13 x  13 x  512 x   1
Layer 15 output shape:   13 x  13 x  255 x   1
Layer 18 output shape:   13 x  13 x  128 x   1
Layer 19 output shape:   26 x  26 x  128 x   1
Layer 20 output shape:   26 x  26 x  384 x   1
Layer 21 output shape:   26 x  26 x  256 x   1
Layer 22 output shape:   26 x  26 x  255 x   1
dog: 57%
car: 52%
truck: 56%
car: 62%
bicycle: 59%
Detected objects saved in 'predictions.jpg' (time: 0.595000 sec.)

predictions

I am using ggml to compute the output of all layers except the YOLO layers (you can find the model architecture in yolov3-tiny.cfg). The output of the YOLO layers is computed with the apply_yolo() function. At the end, detected objects are extracted from the output of the YOLO layers.

As you can see, yolov3-tiny is quite fast but not very accurate. However, the same approach can be applied to run inference with more sophisticated YOLO models like v4, v5 and v7.

@rgerganov rgerganov marked this pull request as ready for review October 17, 2023 12:17
Owner

@ggerganov ggerganov left a comment

Cool 😄

Would be nice to add a test in ci/run.sh

Comment on lines +137 to +139
result = ggml_sub(ctx, result, ggml_repeat(ctx, layer.rolling_mean, result));
result = ggml_div(ctx, result, ggml_sqrt(ctx, ggml_repeat(ctx, layer.rolling_variance, result)));
result = ggml_mul(ctx, result, ggml_repeat(ctx, layer.scales, result));
Owner

These ggml_repeat calls should be avoidable via implicit broadcast. ggml_mul already supports broadcast, so it might be a good idea to add it to ggml_sub and ggml_div in a similar way. For now, we can implement it just on the CPU and GGML_ASSERT on the GPU backends when broadcast is necessary but not implemented yet.

Collaborator Author

ggml_mul has only partial broadcast support: it expects an equal number of elements in the first dimension, which is not the case here. I may try to address this in a follow-up patch.

@rgerganov
Collaborator Author

I have addressed the comments and added a CI test. I also realized that I don't need to create a second computation graph and the code now runs much faster:

./yolov3-tiny -m yolov3-tiny.gguf -i dog.jpg
...
dog: 57%
car: 52%
truck: 56%
car: 62%
bicycle: 59%
Detected objects saved in 'predictions.jpg' (time: 0.360000 sec.)

@ggerganov ggerganov merged commit 05ff36f into ggerganov:master Oct 30, 2023
4 checks passed
@FSSRepo
Collaborator

FSSRepo commented Dec 4, 2023

@rgerganov @ggerganov Hello guys, I was working on implementing an upscaler in stable-diffusion.cpp, but it requires the LeakyReLU activation function, with a negative_slope parameter of 0.2. While reviewing ggml to see if there's a similar function, I came across ggml_leaky, but it doesn't have any other parameter to specify. Upon further inspection, it seems that they function in the same way, with the only difference being that ggml_leaky does not use 'min'.

According to this implementation with negative_slope of 0.1, YOLO-3 uses LeakyReLU. The architecture I am implementing requires specifying a negative slope of 0.2. My question is, should I extend the existing function or create a new one?

inline static void ggml_vec_leaky_f32 (const int n, float * y, const float * x) { for (int i = 0; i < n; ++i) y[i] = (x[i] > 0.f) ? x[i] : 0.1f*x[i]; }
inline static void ggml_vec_leaky_relu_f32 (const int n, float * y, const float * x, const float ns) { for (int i = 0; i < n; ++i) y[i] = ((x[i] > 0.f) ? x[i] : 0.f) + ns * ((x[i] < 0.0f) ? x[i] : 0.f); }

// ggml_leaky

struct ggml_tensor * ggml_leaky(
        struct ggml_context * ctx,
        struct ggml_tensor  * a) {
    return ggml_unary(ctx, a, GGML_UNARY_OP_LEAKY);
}

struct ggml_tensor * ggml_leaky_relu(
        struct ggml_context * ctx,
        struct ggml_tensor  * a, float negative_slope, bool inplace) {
    bool is_node = false;

    if (!inplace && (a->grad)) {
        is_node = true;
    }

    struct ggml_tensor * result = inplace ? ggml_view_tensor(ctx, a) : ggml_dup_tensor(ctx, a);

    ggml_set_op_params_i32(result, 0, (int32_t) GGML_UNARY_OP_LEAKY_RELU);
    ggml_set_op_params_i32(result, 1, (int32_t) (negative_slope * 100.0f));

    result->op   = GGML_OP_UNARY;
    result->grad = is_node ? ggml_dup_tensor(ctx, result) : NULL;
    result->src[0] = a;

    return result;
}

I think the name should be very clear as well. It never occurred to me that ggml_leaky was an activation function. I think it should be renamed to ggml_leaky_relu to emphasize its use.
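The claim that the two vector functions behave the same can be checked with a small standalone sketch. The names below are hypothetical; the fixed variant mirrors the hard-coded 0.1 slope quoted above, and the parametrized variant uses a single branch instead of the two-term sum.

```c
/* Fixed-slope variant, mirroring the 0.1 slope quoted in the thread. */
static void leaky_fixed_f32(int n, float * y, const float * x) {
    for (int i = 0; i < n; ++i) {
        y[i] = (x[i] > 0.0f) ? x[i] : 0.1f * x[i];
    }
}

/* Parametrized variant (hypothetical name) with a caller-chosen slope. */
static void leaky_relu_f32(int n, float * y, const float * x, float negative_slope) {
    for (int i = 0; i < n; ++i) {
        y[i] = (x[i] > 0.0f) ? x[i] : negative_slope * x[i];
    }
}

/* Returns 1 if both variants agree element-wise for slope 0.1, else 0.
 * Handles at most 64 elements to keep the sketch allocation-free. */
static int variants_agree(int n, const float * x) {
    float a[64];
    float b[64];
    if (n > 64) {
        return 0;
    }
    leaky_fixed_f32(n, a, x);
    leaky_relu_f32(n, b, x, 0.1f);
    for (int i = 0; i < n; ++i) {
        if (a[i] != b[i]) {
            return 0;
        }
    }
    return 1;
}
```

With a slope of 0.1 the parametrized version reproduces the existing behaviour, so a single function with a slope argument would cover both YOLO (0.1) and the upscaler (0.2).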

@ggerganov
Owner

In PyTorch, it seems to be called LeakyReLU so I think we should rename ggml_leaky -> ggml_leaky_relu and add float negative_slope argument as you proposed. Probably no need to keep the ggml_leaky overload. Also rename GGML_UNARY_OP_LEAKY -> GGML_UNARY_OP_LEAKY_RELU
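One detail worth noting when adding the float argument: the proposal above stores the slope as an integer percentage (`negative_slope * 100.0f`), which truncates any slope that is not a whole number of hundredths. The sketch below contrasts that with copying the float's bits into the same 32-bit slot. The `fake_tensor` struct and all function names are hypothetical stand-ins, not ggml's actual types or API.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a tensor's integer op-parameter slots. */
typedef struct {
    int32_t op_params[4];
} fake_tensor;

/* Scaled-integer encoding, as in the proposal: lossy for e.g. 0.125. */
static void set_slope_scaled(fake_tensor * t, float ns) {
    t->op_params[1] = (int32_t) (ns * 100.0f);
}

static float get_slope_scaled(const fake_tensor * t) {
    return (float) t->op_params[1] / 100.0f;
}

/* Bit-copy encoding: stores the float unchanged in the 32-bit slot. */
static void set_slope_f32(fake_tensor * t, float ns) {
    memcpy(&t->op_params[1], &ns, sizeof(ns));
}

static float get_slope_f32(const fake_tensor * t) {
    float ns;
    memcpy(&ns, &t->op_params[1], sizeof(ns));
    return ns;
}
```

For a slope of 0.125 the scaled encoding stores 12 and reads back 0.12, whereas the bit copy returns the original value unchanged.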

CCLDArjun pushed a commit to CCLDArjun/ggml that referenced this pull request Dec 18, 2023
Fixes building for x86 processors missing F16C featureset
MSVC not included, as in MSVC F16C is implied with AVX2/AVX512
3 participants