Make the handling of RTI connections more robust #146

Open
cmnrd opened this issue Jan 20, 2023 · 5 comments

Comments

@cmnrd
Contributor

cmnrd commented Jan 20, 2023

There is a long-standing flaw in the RTI/federate mechanism for handling ports. The RTI tries to get a default port, and if it is unavailable, it tries the port number that is one larger, and if that fails, it tries one more, and so on. The federates go through a similar sequence, trying the default port number first and, if that fails, trying the next one up.

However, this really doesn't work. In particular, if you start a federate before the RTI, it skips the default port, and it takes a very long time for it to circle around to try that default port again.
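
The RTI-side probing described above can be sketched roughly as follows. This is an illustration in Python, not the actual RTI code: `bind_with_increment`, `max_tries`, and the value of `DEFAULT_PORT` are hypothetical names and numbers chosen for the example.

```python
import socket

# Hypothetical placeholder; this thread does not state the real default port.
DEFAULT_PORT = 15045


def bind_with_increment(default_port=DEFAULT_PORT, max_tries=16, host="127.0.0.1"):
    """Try default_port, then default_port + 1, and so on; return (socket, port)."""
    for offset in range(max_tries):
        port = default_port + offset
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((host, port))
        except (OSError, OverflowError):
            # Port is taken, still held by the OS, or out of range: try the next one.
            sock.close()
            continue
        sock.listen()
        return sock, port
    raise OSError(f"no free port in [{default_port}, {default_port + max_tries - 1}]")
```

A federate scanning the same range cannot know which port the server ended up on. If it starts first, it moves past the default port before anything is listening there, which is the failure described above.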

The problem this was trying to address is that when a program releases a port, the OS does not make the port available to other programs for some time. There is a good reason for this: the OS wants to prevent a program from grabbing a port and then receiving messages that were intended for a program that has exited. It therefore holds the port long enough that any messages that were in flight die before it releases the port.
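
The hold period described above is usually TCP's TIME_WAIT state. One widely used mitigation, which nobody in this thread proposes and which is shown here only as a sketch, is to set `SO_REUSEADDR` on the listening socket before binding. The helper name `make_listener` is hypothetical, and whether the trade-off is acceptable for the RTI is an open question.

```python
import socket


def make_listener(port, host="127.0.0.1", reuse=True):
    """Create a listening TCP socket, optionally with SO_REUSEADDR set."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    if reuse:
        # Must be set before bind(). On Linux and BSD this permits binding a port
        # whose earlier connections are still in TIME_WAIT. Semantics differ on Windows.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    sock.listen()
    return sock
```

The option does not allow two live listeners on the same address and port on Linux; it only relaxes the check against lingering connections.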

This OS behavior was making CI fail, because CI runs many federated programs in sequence.

I think a better solution is for the RTI to use a fixed port, perhaps optionally specified as a command-line argument (which the federates will also need to be told). Then we just have to figure out how to make CI work (wait long enough between federated tests?).
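
A minimal sketch of that proposal, assuming a hypothetical `--port` flag and a placeholder default value (neither is taken from the real RTI): the server binds exactly the requested port and, if the OS is still holding it, waits and retries instead of moving to a different port.

```python
import argparse
import socket
import time


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Sketch of an RTI-like server")
    # Hypothetical flag name and default value, for illustration only.
    parser.add_argument("--port", type=int, default=15045)
    return parser.parse_args(argv)


def bind_fixed_port(port, retries=5, delay=1.0, host="127.0.0.1"):
    """Bind exactly `port`, waiting and retrying instead of trying another port."""
    for attempt in range(retries):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((host, port))
            sock.listen()
            return sock
        except OSError:
            sock.close()
            if attempt < retries - 1:
                # Give the OS time to release the port before the next attempt.
                time.sleep(delay)
    raise OSError(f"port {port} still unavailable after {retries} attempts")
```

Calling `bind_fixed_port(parse_args().port)` at startup would give federates a single, predictable port to connect to. The `retries` and `delay` values stand in for the "wait long enough" question for CI and would need tuning.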

Originally posted by @edwardalee in lf-lang/lingua-franca#1556 (comment)

Also see the rest of the discussion in lf-lang/lingua-franca#1556

@Jakio815
Collaborator

@edwardalee @cmnrd I don't think this issue is fixed. Were there further discussions about fixing the port?

@edwardalee
Contributor

I think the situation is improved, in that you can now reliably start several RTIs/federations on the same machine. But I think you still have to start the RTI first for the federates to find it in reasonable time. What symptoms are you seeing?

@lhstrh has proposed designing a broker that would have a fixed IP/port and would hand out RTI IP/port addresses (as they say, all problems in CS can be solved with one more level of indirection). This broker would have to be run as a daemon to be effective, like the Mosquitto broker for MQTT. I proposed that this broker could itself be an LF program. It does create the extra hassle of having to set the broker up on any machine on which you would like an RTI to run. Alternatively, I guess we could run a global broker on lf-mac.eecs.berkeley.edu. Another alternative might be to look into using broadcast packets, but this would require relying on lower-level networking APIs.
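
The core of such a broker is a lookup table from some federation identifier to the address of its RTI. The sketch below shows only that table, with no networking. The class name `BrokerRegistry`, its methods, and the idea of keying by a federation name are all assumptions made for illustration; no such design has been settled in this thread.

```python
from dataclasses import dataclass, field


@dataclass
class BrokerRegistry:
    """In-memory table mapping a federation name to its RTI's (host, port)."""

    _rtis: dict = field(default_factory=dict)

    def register(self, federation, host, port):
        # Called by an RTI once it has bound whatever port it could get.
        self._rtis[federation] = (host, port)

    def unregister(self, federation):
        # Called when the RTI shuts down; a missing entry is ignored.
        self._rtis.pop(federation, None)

    def lookup(self, federation):
        # Called by a federate. Returns None if the RTI has not registered yet,
        # so the federate can ask the broker again instead of scanning ports.
        return self._rtis.get(federation)
```

With this indirection, only the broker needs a fixed, well-known port; the RTI can bind any free port and federates no longer depend on start order to find it.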

@Jakio815
Collaborator

I didn't really find a problem. I was curious what happened to the discussion of incrementing the RTI's port.

The design looks interesting. Is it a future plan to implement, or is it on a branch?

@edwardalee
Contributor

AFAIK, nobody has started working on it. If I were to do it, I would try to make an LF program (just for fun... I don't think it really needs LF features).

@lhstrh
Member

lhstrh commented Mar 28, 2024

I haven't really researched the topic, but my off-the-cuff response is we need something along the lines of the following:

  • have a broker listen at a standard port for communication with federates
  • let federates either specify a known broker or discover one
  • discovery could be done using UDP broadcast
  • this means federates should also be listening for responses from brokers that receive their inquiry
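
The points above can be sketched with plain UDP sockets. Everything named here is hypothetical: `DISCOVERY_PORT`, the `QUERY` message, and the reply format (the broker's address as text) are placeholders, not an existing protocol.

```python
import socket
import time

DISCOVERY_PORT = 15100       # hypothetical "standard" broker port
QUERY = b"LF_BROKER_QUERY"   # hypothetical query message


def run_broker(sock, advertised, stop):
    """Answer discovery queries on `sock` until the `stop` event is set."""
    sock.settimeout(0.1)
    while not stop.is_set():
        try:
            data, sender = sock.recvfrom(1024)
        except socket.timeout:
            continue
        if data == QUERY:
            # Reply directly to the federate that sent the inquiry.
            sock.sendto(advertised.encode(), sender)


def discover_brokers(dest=("255.255.255.255", DISCOVERY_PORT), wait=0.5):
    """Send one query and collect every reply that arrives within `wait` seconds."""
    found = set()
    deadline = time.monotonic() + wait
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # Required before sending to a broadcast address.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(QUERY, dest)
        while True:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            sock.settimeout(remaining)
            try:
                reply, _ = sock.recvfrom(1024)
            except socket.timeout:
                break
            found.add(reply.decode())
    return found
```

A federate with a known broker would pass that broker's address as `dest` and skip the broadcast. The function returns a set because more than one broker may answer, and different federates may receive different sets.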

One question that comes to mind is: what do we do if participants of the same federation discover different brokers? I think gossip protocols are typically used to address these kinds of situations.

First and foremost, before doing anything, I would research whether there are existing implementations that are stable, popular, and well maintained. I estimate the likelihood that we need to build something like this from scratch to be near zero. This is obviously a problem that has been solved in a thousand different ways already...
