Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crash during mosquitto_kick_client_by_clientid call #3080

Closed
D-r-P-3-p-p-3-r opened this issue Jul 16, 2024 · 19 comments
Closed

Crash during mosquitto_kick_client_by_clientid call #3080

D-r-P-3-p-p-3-r opened this issue Jul 16, 2024 · 19 comments

Comments

@D-r-P-3-p-p-3-r
Copy link

We're experiencing crashes when calling mosquitto_kick_client_by_clientid under Mosquitto 2.14. Because of issue #3056 we're not using a newer version of Mosquitto (which probably does not matter).

Scenario: We are running a self-written Mosquitto Plug-In. This plugin keeps a white list of devices that are allowed to connect to the broker. The white list content is set by an application via an REST API. If a device is removed from the white list and it is currently connected to the broker we kick it via mosquitto_kick_client_by_clientid . This happens in a thread that is started by the plugin (the thread that handles all the plug-in‘s logic). Occasionally during the calls to mosquitto_kick_client_by_clientid the broker crashes.

I suspect it happens when concurrently with the mosquitto_kick_client_by_clientid call an ACL check is performed in the plugin via Mosquitto.
Before entering the plug-in thread we're extracting information from the event_data (e.g. via mosquitto_client_id(), mosquitto_client_username(), mosquitto_client_certificate()) in Mosquittos own thread.

Is it not ok to call mosquitto_kick_client_by_clientid from a thread that is not Mosquitto's?

Or is the problem that mosquitto_kick_client_by_clientid and mosquitto_client_* are called at the same time from different threads?

Or are there other rules I am not aware of?

Unlike the libmosquitto documentation which states the following

libmosquitto provides thread-safe operation, with the exception of mosquitto_lib_init which is not thread safe.

the plug-in documentation does not talk about thread safety at all.

@dkelmatic
Copy link

Same experience here.
We authenticate via OAuth-plugin and at some point I cannot refresh the token anymore, which is fine. I`d just kick the client then and he can login again receiving a new token.
We have a client that sends a burst of messages with a big payload (its pretty much a small csv-file) in an asynchronous manner.
When sending messages and the token fails for whatever reason, I try to kick the client. As many messages are being sent, I see the log message that client xx will be kicked like 20 times until the crash (segfault of mosquitto) happens.

The only Idea I had so far was to slow down the messages what didnt have an effect.
@D-r-P-3-p-p-3-r could you add some more detail in what circumstances the problem occurs? Does it also occur in the circumstance that a client is kicked and tries to reconnect over and over agin?

@D-r-P-3-p-p-3-r
Copy link
Author

@dkelmatic:
I think the problem occurs when we kick a client and at the very same time another client accesses the broker and thus some code is executed in the auth plug-in.
My assumption is that another mosquitto-API is called and that it is not safe to do this at the same time mosquitto_kick_client_by_clientid is called.
This matches to the description of your problem. A lot of access to the broker and then a kick. This increases the chance that the kick happens at the very same time as another operation in the broker.

@Daedaluz
Copy link
Contributor

Daedaluz commented Aug 8, 2024

I think there is a tick event you can register to that runs in the brokers context.
you could probably use that to poll a list of things to do e.g kick clients.

I don't remember if this is available in 2.0.14 though.

@D-r-P-3-p-p-3-r
Copy link
Author

I think there is a tick event you can register to that runs in the brokers context. you could probably use that to poll a list of things to do e.g kick clients.

I don't remember if this is available in 2.0.14 though.

That sounds like a very good idea. I will try this approach. Thanks!

@dkelmatic
Copy link

Hey!
I checked out the tick event. Looks good. What I'm not so sure about is how to implement that.
If I create a global list of clients wo should be kicked on the next tick, how can I be sure to not access the list from multiple threads? It would mean to use semaphores, but how am I going to do that, if I dont wat to compile mosquitto myself?

@Daedaluz
Copy link
Contributor

Why would you need to recompile mosquitto for that?

Just trying to follow your train of thought here, on how you arrive to this.

@D-r-P-3-p-p-3-r
Copy link
Author

D-r-P-3-p-p-3-r commented Aug 14, 2024

I can confirm that "tick" is available.

We use and build our own Mosquitto plugin. Thus, it is no problem to have a thread-safe mechanism that handles the access to the list. I implemented it using C++ sync mechanisms. Testing is still pending.
I assume it's similiar for everyone else using mosquitto_kick_client_by_clientid. Where would you use that function if not in plugin code?

Why would you need to recompile mosquitto for that?

IMO you don't. You compile your plugin. See above.

@dkelmatic
Copy link

dkelmatic commented Aug 14, 2024

Maybe I'm thinking wrong, but when I use a list for the clients to be kicked on the next tick, I would fill the list from within the same thread I try to kick the clients in my current implementation. I would then kick the clients in my tick event, that has been triggered by the mosquitto main loop thread. When adding/deleting the entries from the list, I would probably have to make sure that the two threads are not accessing the list in the same time to avoid another race condition.

Wouldn't I have to set such a semaphore up within the mqoauitto main code then?

@D-r-P-3-p-p-3-r
Copy link
Author

The tick event can be handled in a plugin. You can register a handler function there. No need to modify Mosquitto itself, but you extend it with a plugin.

See https://mosquitto.org/api/files/mosquitto_plugin-h.html

@dkelmatic
Copy link

Yes, I understand the the plugin interface. I'm using an oauth plugin.
My idea would be to create a (global) list and if AuthAclCheck() fails with an invalid token, I'd add the client ID to the list.
In the Tick callback, I would then read the list, kick the client and delete the clientID from the list. But can I be sure that there is no other thread that tries to write another clientID to the kick-list from another thread that runs the AuthAclCheck() function in that moment for another client?

@Daedaluz
Copy link
Contributor

I'm not convinced mosquitto runs ACL checks in parallel with multiple threads, so maybe you don't need the mutex or semaphore.

However, just for thought, why can't the semaphore be next to the list within the plugin?

@dkelmatic
Copy link

Hey @Daedaluz ,
thanks for the inspiration. I will check what happens without any precautions and if it makes problems, I might be able to use some getter and setter functions that also handle the semaphore.
For some reason I thought I would have to place the semaphore initialization within mosquitto, but you're right, it might work initializing it within the plugin.

I will report back whn I have results.

@dkelmatic
Copy link

dkelmatic commented Aug 16, 2024

Hello everyone,
after updating my plugin to the V5 api, I created a linked list with the clientIDs to be kicked with mosquitto_malloc and assigned the userdata pointer to the head.

In the acl event I add clients to the list (if they are not already in there) with help of the received userdata pointer and in the tick event I kick the users.

I didn't find a safe way to reproduce my crashes with the old solution, but until now I didn't experience any crash in the testing environment with the new solution. Fingers crossed 🤞

Did you solve the issue for your plugin? @D-r-P-3-p-p-3-r ?

@D-r-P-3-p-p-3-r
Copy link
Author

@dkelmatic
Yes. The issue for my problem was solved by moving the kick call to the the tick handler.

Thus, this is a good/the solution.

It'd be great if this information would be added to the documentation. It makes totally sense that you have to do it this way. But it is not quite obvious IMO.

@Daedaluz
Copy link
Contributor

Daedaluz commented Oct 1, 2024

can we close? =)

or maybe follow up on DrP3pp3rs suggestion.

@ralight
Copy link
Contributor

ralight commented Oct 1, 2024

I've added a comment to mosquitto_broker.h that should address this.

@D-r-P-3-p-p-3-r
Copy link
Author

I like the explanation. Will it show up in the docs as well?
As far as I am concerned, this issue can be closed.

@Daedaluz
Copy link
Contributor

Daedaluz commented Oct 2, 2024

Yes, good explanation, but I think I would add the other case in here too;
Given @dkelmatic symptoms, I guess he calls the kick function within the acl check callback which also seems to cause issues. Most likely due to repetitive calls to the kick function without checking that it's already called for a specific client or not.

However, I do not know if it shold be possible / safe to kick clients from this specific callback or not. It just made more sense to me to do all the acl checking and then - in a tick callback - kick clients no longer welcome.

@dkelmatic
Copy link

Hello folks,
sorry for the late answer. I didn't have time to get back to the mosquitto problem. I realized that I might have made a(stupid) mistake regarding reserving memory for my kick list. I'm not finished with my tests, as I have trouble to reproduce the issue in my testing environment, but it looks fine so far.

@Daedaluz yes I agree. I most probably tried to kick clients twice by f*cking up my kick list.

@ralight ralight closed this as completed in 9a5c2bc Oct 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants