Another dlt-daemon recreates the FIFO file before exiting with an error #569

Open
jack-liuhang opened this issue Nov 17, 2023 · 8 comments · May be fixed by #664

@jack-liuhang

static int dlt_daemon_init_fifo(DltDaemonLocal *daemon_local)
{
    int ret;
    int fd = -1;
    int fifo_size;

    /* open named pipe(FIFO) to receive DLT messages from users */
    umask(0);

    /* Try to delete existing pipe, ignore result of unlink */
    const char *tmpFifo = daemon_local->flags.daemonFifoName;
    unlink(tmpFifo);

    ret = mkfifo(tmpFifo, S_IRUSR | S_IWUSR | S_IWGRP);

If a dlt-daemon is already running and someone starts a second dlt-daemon, the second one exits with an error because of the IP/port binding conflict.
Before it exits, however, the second daemon has already recreated the FIFO file, so the first daemon can no longer receive any messages from the FIFO because the old file was deleted.
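
The effect described above can be reproduced inside a single process. The following is a minimal sketch, assuming Linux (where a FIFO may be opened with `O_RDWR`) and a caller-supplied scratch path. `simulate_second_daemon_start` is a hypothetical name for illustration and not a dlt-daemon function. It keeps a FIFO open the way a first daemon would, repeats the `unlink` and `mkfifo` sequence from the excerpt, and then compares the inode behind the still-open descriptor with the inode now found at the path.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not part of dlt-daemon.
 * Returns 1 if the first opener's descriptor no longer refers to the file at
 * `path` after a second unlink + mkfifo, 0 if it still does, -1 on error. */
static int simulate_second_daemon_start(const char *path)
{
    struct stat held, current;
    int first_fd;
    int orphaned;

    assert(path != NULL);

    /* "First daemon": create the FIFO and keep it open. */
    unlink(path);
    if (mkfifo(path, S_IRUSR | S_IWUSR) != 0)
        return -1;
    first_fd = open(path, O_RDWR | O_NONBLOCK);
    if (first_fd < 0) {
        unlink(path);
        return -1;
    }

    /* "Second daemon": the same unlink + mkfifo sequence as in the excerpt. */
    unlink(path);
    if (mkfifo(path, S_IRUSR | S_IWUSR) != 0) {
        close(first_fd);
        return -1;
    }

    /* Compare the file behind the old descriptor with the file at the path. */
    if (fstat(first_fd, &held) != 0 || stat(path, &current) != 0) {
        close(first_fd);
        unlink(path);
        return -1;
    }
    orphaned = (held.st_dev != current.st_dev) ||
               (held.st_ino != current.st_ino);

    close(first_fd);
    unlink(path);
    return orphaned;
}
```

Because the first descriptor keeps the old inode alive, the recreated FIFO gets a different inode, and writers that open the path afterwards reach only the new file.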

@minminlittleshrimp
Collaborator

Hello @jack-liuhang
Sorry for the late verification.
Yes, your observation is correct.
My reproduction steps:

  1. Run the daemon on the default port 3190
  2. Run the user application: success
  3. Check FIFO activity using strace: reads and writes are okay
  4. Run another daemon on port 3195 while keeping the old daemon running
  5. strace shows the old daemon still reading from its FIFO, but the user is no longer writing to that old FIFO
  6. Rerun the client on the new port 3195: the FIFO now works for reading and writing

Conclusion: the FIFO has been recreated and is now used by the new daemon

Next actions:

  1. Stop the old daemon
  2. Stop the user application
  3. Rerun the user application. The FIFO has now been removed, which is exactly the issue you described:
$ dlt-example-user hello -n 1000
[20089.689247]~DLT~582893~INFO     ~FIFO /tmp/dlt cannot be opened. Retrying later...

Neither the new daemon nor a new user process can communicate, since the FIFO no longer exists.
=> This is a real bug, and a new mechanism/fix is needed here.
Thanks a lot for the report!
FYI @michael-methner

@minminlittleshrimp
Collaborator

minminlittleshrimp commented Jun 21, 2024

Proof:
When the dlt-daemon on the default port 3190 (pid 9593) is running and we start a second daemon on port 3195 (pid 10392), the old FIFO (fd 3 of pid 9593) is deleted, and a new one is created and linked to fd 3 of pid 10392.
The issue appears when we kill pid 9593: the FIFO is removed as well, so no FIFO is left for pid 10392.

Dr.Mint@:/proc/10392/fd  
$ ll /proc/9593/fd/3
lrwx------ 1 lum3hc lum3hc 64 Jun 21 15:27 /proc/9593/fd/3 -> '/tmp/dlt (deleted)'
Dr.Mint@:/proc/10392/fd  
$ ll 3
lrwx------ 1 lum3hc lum3hc 64 Jun 21 15:34 3 -> /tmp/dlt|
Dr.Mint@:/proc/10392/fd  
$ ll /proc/9593/fd/3
ls: cannot access '/proc/9593/fd/3': No such file or directory
Dr.Mint@:/proc/10392/fd  
$ ll 3
lrwx------ 1 lum3hc lum3hc 64 Jun 21 15:34 3 -> '/tmp/dlt (deleted)'
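
The `(deleted)` marker in the listing above has a programmatic counterpart. As a minimal sketch, assuming a local Linux filesystem where `fstat` reports a link count of zero for an unlinked but still open file, a process could test its own descriptor as shown here. `fd_is_unlinked` and `demo_deleted_fd` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if no directory entry refers to the file behind `fd` any more,
 * 0 if at least one link remains, -1 if fstat fails. */
static int fd_is_unlinked(int fd)
{
    struct stat st;

    if (fstat(fd, &st) != 0)
        return -1;
    return st.st_nlink == 0;
}

/* Demo on a scratch FIFO: the descriptor is linked before unlink() and
 * reported as unlinked afterwards. Returns 1 if that is what was observed. */
static int demo_deleted_fd(const char *path)
{
    int fd, before, after;

    assert(path != NULL);

    unlink(path);
    if (mkfifo(path, S_IRUSR | S_IWUSR) != 0)
        return -1;
    fd = open(path, O_RDWR | O_NONBLOCK); /* O_RDWR on a FIFO: Linux behaviour */
    if (fd < 0) {
        unlink(path);
        return -1;
    }

    before = fd_is_unlinked(fd);
    unlink(path);
    after = fd_is_unlinked(fd);

    close(fd);
    return before == 0 && after == 1;
}
```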

@minminlittleshrimp minminlittleshrimp pinned this issue Jun 21, 2024
@minminlittleshrimp
Collaborator

minminlittleshrimp commented Jun 24, 2024

Hello @duvanan13
Please work on this issue.
Your tasks:

  1. Reproduce the scenario
  2. Propose a fix or a new implementation of the mechanism
  3. Cross-check against the AUTOSAR standard (any violations, harm, vulnerabilities, unit tests?)
  4. Open a PR and have it reviewed

Regards

@minminlittleshrimp
Collaborator

minminlittleshrimp commented Jun 26, 2024

RCA: In DLT init phase 1, the setup of the local FIFO also tries to remove the existing one (unlink), although the fd (fd 3) has not been closed yet. Later, in its exit phase, the old daemon tries to remove the FIFO if it detects a FIFO linked to its fd.
No solution has been proposed yet; please continue the research.
Proposed method: check the modification date of the FIFO
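
One way to keep an exiting daemon from removing a FIFO it no longer owns is sketched below. Note that this swaps in a different technique from the modification-date check proposed above: it compares device and inode numbers of the open descriptor and the path. `unlink_fifo_if_owned` and `demo_exit_cleanup` are hypothetical names, not dlt-daemon functions, and the check is not atomic (the path could be replaced between `stat` and `unlink`), so treat it as an illustration rather than a complete fix.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Removes `path` only if it still names the file that `fd` refers to.
 * Returns 1 if the path was removed, 0 if it was left alone, -1 on error. */
static int unlink_fifo_if_owned(int fd, const char *path)
{
    struct stat held, current;

    assert(path != NULL);

    if (fstat(fd, &held) != 0)
        return -1;
    if (stat(path, &current) != 0)
        return 0; /* path is already gone: nothing to remove */
    if (held.st_dev != current.st_dev || held.st_ino != current.st_ino)
        return 0; /* path now belongs to someone else */
    return unlink(path) == 0 ? 1 : -1;
}

/* Demo: an "old daemon" whose FIFO was replaced must not delete the new one.
 * Returns 1 if the replacement FIFO survived the guarded cleanup. */
static int demo_exit_cleanup(const char *path)
{
    int old_fd, rc, survived;

    unlink(path);
    if (mkfifo(path, S_IRUSR | S_IWUSR) != 0)
        return -1;
    old_fd = open(path, O_RDWR | O_NONBLOCK); /* Linux behaviour for FIFOs */
    if (old_fd < 0) {
        unlink(path);
        return -1;
    }

    /* A second daemon replaces the FIFO. */
    unlink(path);
    if (mkfifo(path, S_IRUSR | S_IWUSR) != 0) {
        close(old_fd);
        return -1;
    }

    rc = unlink_fifo_if_owned(old_fd, path); /* old daemon's exit path */
    survived = (access(path, F_OK) == 0);

    close(old_fd);
    unlink(path);
    return rc == 0 && survived;
}
```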

@jack-liuhang
Author

Thank you for revisiting the question and investing your time in it.

Solving this issue thoroughly is indeed challenging.

Consider not unlinking tmpFifo directly. For instance, quickly check for IP binding conflicts before performing the FIFO unlink operation. Additionally, preventing new daemons from being created on a multi-user machine poses its own challenges.

Best regards,
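
The suggestion to check for a binding conflict before unlinking could look roughly like the sketch below. It assumes a TCP port bound on all IPv4 interfaces, and every function name in it is hypothetical; the real daemon's socket setup may differ. The probe binds and immediately closes a socket, so a small window remains in which another process could take the port before the real bind.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if a TCP socket can be bound to `port` on all interfaces,
 * 0 if bind fails (for example with EADDRINUSE), -1 if no socket was created. */
static int tcp_port_is_free(unsigned short port)
{
    struct sockaddr_in addr;
    int ok;
    int sock = socket(AF_INET, SOCK_STREAM, 0);

    if (sock < 0)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);
    ok = (bind(sock, (struct sockaddr *)&addr, sizeof(addr)) == 0);
    close(sock);
    return ok;
}

/* Hypothetical init order: probe the port first, replace the FIFO only after.
 * Returns 0 on success, -1 if the port is busy or mkfifo fails. */
static int init_fifo_if_port_free(const char *fifo_path, unsigned short port)
{
    assert(fifo_path != NULL);

    if (tcp_port_is_free(port) != 1)
        return -1; /* leave any existing FIFO untouched */
    unlink(fifo_path);
    return mkfifo(fifo_path, S_IRUSR | S_IWUSR | S_IWGRP);
}

/* Demo: while a listener occupies a port, the guarded init must refuse and
 * must not create the FIFO. Returns 1 if that is what was observed. */
static int demo_port_conflict(const char *fifo_path)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int rc, untouched;
    int listener = socket(AF_INET, SOCK_STREAM, 0);

    if (listener < 0)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0; /* let the kernel pick a free port */
    if (bind(listener, (struct sockaddr *)&addr, sizeof(addr)) != 0 ||
        listen(listener, 1) != 0 ||
        getsockname(listener, (struct sockaddr *)&addr, &len) != 0) {
        close(listener);
        return -1;
    }

    unlink(fifo_path); /* start from a state where the path does not exist */
    rc = init_fifo_if_port_free(fifo_path, ntohs(addr.sin_port));
    untouched = (access(fifo_path, F_OK) != 0);

    close(listener);
    return rc == -1 && untouched;
}
```

As the reproduction steps earlier in the thread show, a second daemon started on a different port does not hit a port conflict at all, so a port probe alone would not cover that case.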

@minminlittleshrimp
Collaborator

Hello @jack-liuhang
You're welcome. However, I don't think it is challenging, since the FIFO is not meant to be used by multiple listening daemons like that. Multiple daemons can certainly open the FIFO at the same time, but the data will not be duplicated, which indicates that only one daemon should be on the listening side.
The fix will focus on the current daemon, meaning a new daemon will be blocked and cannot initialize because the FIFO is in use.
If you have any concerns, please raise them for discussion.
Thanks
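
One possible way to detect that a FIFO is already in use is sketched below; it is not necessarily what the actual fix does. It relies on the documented behaviour that opening a FIFO with `O_WRONLY | O_NONBLOCK` fails with `ENXIO` when no process has it open for reading. `fifo_has_reader` and `demo_fifo_probe` are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if some process has the FIFO at `path` open for reading,
 * 0 if there is no reader or the path does not exist,
 * -1 on any other error or if `path` is not a FIFO. */
static int fifo_has_reader(const char *path)
{
    struct stat st;
    int fd;

    assert(path != NULL);

    fd = open(path, O_WRONLY | O_NONBLOCK);
    if (fd < 0)
        return (errno == ENXIO || errno == ENOENT) ? 0 : -1;
    if (fstat(fd, &st) != 0 || !S_ISFIFO(st.st_mode)) {
        close(fd);
        return -1;
    }
    close(fd);
    return 1;
}

/* Demo on a scratch FIFO: no reader, then one reader, then none again.
 * Returns 1 if the probe reported exactly that sequence. */
static int demo_fifo_probe(const char *path)
{
    int before, during, after, reader;

    unlink(path);
    if (mkfifo(path, S_IRUSR | S_IWUSR) != 0)
        return -1;

    before = fifo_has_reader(path);
    reader = open(path, O_RDONLY | O_NONBLOCK); /* stands in for a daemon */
    if (reader < 0) {
        unlink(path);
        return -1;
    }
    during = fifo_has_reader(path);
    close(reader);
    after = fifo_has_reader(path);

    unlink(path);
    return before == 0 && during == 1 && after == 0;
}
```

A new daemon could call such a probe before its `unlink` and refuse to start when it returns 1. One caveat: if the existing reader opened the FIFO read-only, the probe's open and close can show up on the reader as an end-of-file condition.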

@minminlittleshrimp minminlittleshrimp linked a pull request Jul 16, 2024 that will close this issue
@minminlittleshrimp
Collaborator

minminlittleshrimp commented Jul 16, 2024

Hello @jack-liuhang
The fix is available in #664. Could you please check whether it resolves the issue on your side?
As for the mechanism, I will confirm with the other maintainers whether it is appropriate, since it now seems to make sense for the FIFO to be used by only one daemon.
Thank you

@jack-liuhang
Author

Another question may arise from this; I have mentioned it in the MR.
Thanks so much,
