eCAL 5.7.1 & iceoryx losing messages if having multiple subscribers #92
Sorry, one more important thing to know: I used iceoryx version 0.16.1 in this build.
For iceoryx it normally should make no difference whether it is in one process or many. The question is whether there is a difference between intra- and inter-process communication in eCAL.
Thank you for the bug report. We have not yet tested that setup in all use cases; the iceoryx binding is still experimental. I will reproduce it and try to figure out the problem. You can use the default ecal.ini file if you link against iceoryx, no need to change anything here. Your use case is using "intra process" communication only, even though there are two subscribers in the second process.
Thank you for the replies. Using the standard eCAL layer gives proper results; all subscribers' requests are fulfilled.
So multiple subscribers did not work in the same process (in the case of the iceoryx binding); that seems to be the general issue, right?
Indeed, only the last declared one seems to be in operation.
In the introspection it looks like the other subscribers are not created. If you start RouDi with debug log level
Currently looking into the eCAL iceoryx reader interface, and it's most likely buggy. There seems to be one instance overwriting another. Stupid issue so far ...
The issue is fixed on the current master. @Philip-Kovacs can you please confirm that it works?
Thank you for the quick fix. I upgraded my build to version 5.7.2, with iceoryx 0.17. The multiple sender-receiver sample now runs as expected (every instance receives); however, it prints the following output several times, and many messages are lost:
This issue occurs with iceoryx 0.16.1 as well.
Hi, I fixed another issue with multiple subscribers in the same process in case of the same topic name, reported by @budrus. This will be merged into the master soon. However, that will not change the behaviour mentioned in your last comment. Maybe @budrus can check the iceoryx log messages. How many publishers and subscribers did you run in your multiple send and multiple receive setup? These samples are normally used to check the performance with lots of connections at maximum send speed; they should stress the transport layer to the maximum. Maybe something has to be preconfigured in the iceoryx toml configuration file to handle that many pub/subs?
Yes, maybe you're right, maybe I am overloading the layer. I was running the sample first with the original binaries with 200 publishers, then I reduced this number to 3. The error was still raised for all of PUB 1, 2, and 3. Anyway, I will do some more tests.
I checked my setup on Ubuntu with the iceoryx layer and 10 publications and 10 subscriptions. If I run them at maximum speed I get the same error as you after a few seconds of runtime on the publisher side.
Hello! I've put a delay of 0.1 ... 1 ms after every send action (3 publishers). With the delay, the communication wasn't disrupted, as you wrote. When I shut the receiver side down, there is a burst of this error on the publisher side for a second, then it goes back to a normal idle state.
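For illustration, a throttled publisher along these lines keeps the send rate below what the subscribers can consume. This is a hedged sketch only: the unit name, topic, payload and the 0.5 ms delay are placeholders, and the eCAL 5 string message API is assumed from the shipped samples.

```cpp
#include <chrono>
#include <string>
#include <thread>

#include <ecal/ecal.h>
#include <ecal/msg/string/publisher.h>

int main(int argc, char** argv)
{
  // initialize the eCAL API with a (placeholder) unit name
  eCAL::Initialize(argc, argv, "throttled_snd");

  // string publisher on a placeholder topic
  eCAL::string::CPublisher<std::string> pub("foo");

  while (eCAL::Ok())
  {
    pub.Send("payload");
    // short delay in the 0.1 ... 1 ms range mentioned above, so the
    // iceoryx mempool is not drained faster than subscribers consume
    std::this_thread::sleep_for(std::chrono::microseconds(500));
  }

  eCAL::Finalize();
  return 0;
}
```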
This error comes when the memory pool is running out of chunks. We currently provide a segregated free list approach, where you have a configured number of mempools, each with a chunk size and a number of chunks. Another important point is the queues on the subscriber side. You can configure the queue size for a subscriber via a c'tor.

I guess you now have the problem that you are sending with multiple publishers and as fast as possible. According to your configuration above you have 1000 chunks for 16 KB payload. If your subscribers cannot consume as fast as your publishers provide new data, we start queueing up. When the queue capacity is reached we start dropping the oldest samples. We currently have no interference from subscribers to publishers, i.e. we do not block the publisher if there is no more chunk or a queue overflows; you end up with an error or with losing chunks. Maybe we will provide the possibility to block the publisher until chunks are available or queues have free space in the future, but currently this was not our use case. But I see that for such setups it might be the better option, even if this has negative effects on publisher timings.

So I would assume that your samples start queueing up, and if you have multiple publishers connected to multiple receivers with a queue size of 256 you reach the point where your 1000 samples are not enough. So what to do? The queue size can be reduced; then you may start losing samples earlier, but you also need fewer in total. Or you can increase the number of chunks in your 16 KB mempool. Or do a combination of these two measures. Our philosophy is currently
The sleeps you guys are adding are a workaround to ensure that the publishers are not providing samples faster than the subscribers can consume them. I guess eCAL is using callbacks to consume the samples of iceoryx subscribers. The question for eCAL would be if, and how many, samples shall be queued if new samples arrive faster than they can be consumed.
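For such a stress setup, a hedged sketch of a RouDi mempool configuration, assuming the iceoryx TOML config format described above (file name, sizes and counts here are placeholders, not recommendations), might look like this:

```toml
# placeholder RouDi config sketch (e.g. roudi_config.toml)
[general]
version = 1

[[segment]]

# 16 KB payload chunks: raise the count so that
# (number of publishers) x (subscriber queue depth) still fits
[[segment.mempool]]
size  = 16384
count = 4000

# smaller chunks for small messages (placeholder values)
[[segment.mempool]]
size  = 128
count = 10000
```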
@budrus thank you for that detailed explanation. So from my point of view this issue is fixed, and for specific scenarios iceoryx needs to be configured the right way (as you described).
Hello!
I've built eCAL 5.7.1 with iceoryx with the following cmake command:
```sh
cmake .. -DCMAKE_BUILD_TYPE=Release -DECAL_THIRDPARTY_BUILD_PROTOBUF=ON -DECAL_THIRDPARTY_BUILD_CURL=OFF -DECAL_THIRDPARTY_BUILD_HDF5=ON -DHAS_CAPNPROTO=ON -DBUILD_APPS=OFF -DBUILD_SAMPLES=ON -DBUILD_TIME=ON -DECAL_LAYER_ICEORYX=ON
```
Running `ecal_sample_latency_snd` and `ecal_sample_latency_rec_cb` along with `RouDi` gives fine results; however, the samples running multiple instances of subscribers produce the following output:

publisher:

receiver:
I suppose all subscribers should receive the corresponding message, not just one. I tried to run another sample code:
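A minimal sketch of such a test, assuming the eCAL 5 string message API as used in the shipped samples (the topic names foo1/foo2 follow the description below; exact callback signatures may differ between minor versions):

```cpp
#include <chrono>
#include <functional>
#include <iostream>
#include <string>
#include <thread>

#include <ecal/ecal.h>
#include <ecal/msg/string/publisher.h>
#include <ecal/msg/string/subscriber.h>

// shared callback; std::bind discards any trailing callback arguments
// (time, clock, id) that are not covered by placeholders
void OnReceive(const char* topic_name_, const std::string& msg_)
{
  std::cout << topic_name_ << " : " << msg_ << std::endl;
}

int main(int argc, char** argv)
{
  eCAL::Initialize(argc, argv, "multi_sub_test");

  // two subscribers living in the same process
  eCAL::string::CSubscriber<std::string> sub1("foo1");
  eCAL::string::CSubscriber<std::string> sub2("foo2");
  sub1.AddReceiveCallback(std::bind(OnReceive, std::placeholders::_1, std::placeholders::_2));
  sub2.AddReceiveCallback(std::bind(OnReceive, std::placeholders::_1, std::placeholders::_2));

  // matching publishers in the same process
  eCAL::string::CPublisher<std::string> pub1("foo1");
  eCAL::string::CPublisher<std::string> pub2("foo2");

  while (eCAL::Ok())
  {
    pub1.Send("foo1");
    pub2.Send("foo2");
    std::this_thread::sleep_for(std::chrono::milliseconds(500));
  }

  eCAL::Finalize();
  return 0;
}
```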
In this case, only sub2 is triggered, giving 'foo2' in the output. Swapping the sub1 declaration with sub2, the output changes to 'foo1'. In the case of even more subscribers, only the last declared one triggers.
However, when splitting each subscriber into a different process and running them in parallel, everything works as it is meant to. But I need to have them all in one process.
The ecal.ini file(s) were left as default, having:
Anyway, if I change the ecal.ini content, the situation above still exists.
If so, what am I doing wrong? I would appreciate it if anyone could help me with this issue.