Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Console] fix: console deinit always deadlocks - adds ability to unblock linenoise (try 2) (IDFGH-9188) #10580

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

chipweinberger
Copy link
Contributor

@chipweinberger chipweinberger commented Jan 20, 2023

Previous PR: #9983

I am re-opening this PR to get more traction on it.

This solves: #9974

Motivation:

  • currently, console deinit causes a deadlock
  • without deinit, we can't create consoles "as needed"
  • for example, USB Serial JTAG console should be destroyed when entering USB host mode
  • for example, "web" consoles should be destroyed when the browser window is closed

*
* @param[in] xRingbuffer The ring buffer who's rx will be unblocked.
*/
void vRingbufferUnblockRx(RingbufHandle_t xRingbuffer);
Copy link
Contributor Author

@chipweinberger chipweinberger Jan 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If usb_serial_jtag.c starts using something other than a ringbuffer internally (e.g. a StreamBuffer), we would have to add similar API for unblocking there; and given that StreamBuffer is upstream FreeRTOS code we would most likely want to avoid that.

^ From Ivan.

Can you re-evaluate your position on this? vRingbufferUnblockRx seems like the ideal solution.

If we ever switched to StreamBuffer we could come up with alternative solutions. For example:

  • A) a counting semaphore + StreamBuffer. 'give' the semaphore on new data + then call StreamBuffer with a 0 timeout. We'll then check if there enough data. During deinit, we'd "give" the semaphore another time in order to unblock the read, then return read() with error.
  • B) Use a long timeout with StreamBuffer, perhaps 10 seconds, and poll an exit condition
  • C) Make a Pull Request on FreeRTOS to add the necessary unblock feature to StreamBuffer

Copy link
Contributor Author

@chipweinberger chipweinberger Jan 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, ^^ solution (A), could be used to mitigate the need for vRingbufferUnblockRx in this PR as well.

I think the solution would look like this:

  • normal exclusive mutex. Not entirely necessary. usb_serial_jtag_read_bytes will grab this mutex first. This makes sure the counting semaphore only has a single thread using it at a time. i.e. so the first thread to call read() will always be satisfied first.
  • counting semaphore. usb_serial_jtag_isr_handler_default increments this counting semaphore once each time it appends to the ringbuffer, usb_serial_jtag_read_bytes will 'take' from the semaphore waiting until new bytes are ready (We could also use event_groups instead).
  • ring buffer, usb_serial_jtag_read_bytes calls the ringbuffer with a zero timeout, and will wait on the counting semaphore again until there is enough data.

Of course, I would consider the vRingbufferUnblockRx solution already used in this PR better.

@chipweinberger chipweinberger force-pushed the user/chip/console-unblock-try2 branch 2 times, most recently from bc01e0c to 29fd993 Compare January 20, 2023 00:56
@chipweinberger chipweinberger changed the title [Console] fix console deinit: add ability to unblock linenoise, to fix deadlock [Console] fix console deinit: add ability to unblock linenoise, to fix deadlock (try 2) Jan 20, 2023
@espressif-bot espressif-bot added the Status: Opened Issue is new label Jan 20, 2023
@github-actions github-actions bot changed the title [Console] fix console deinit: add ability to unblock linenoise, to fix deadlock (try 2) [Console] fix console deinit: add ability to unblock linenoise, to fix deadlock (try 2) (IDFGH-9188) Jan 20, 2023
@espressif-bot espressif-bot added Status: Selected for Development Issue is selected for development and removed Status: Opened Issue is new labels Jan 25, 2023
@SoucheSouche
Copy link
Collaborator

Hi @chipweinberger, I started looking into your PR. Can I ask you to rebase and fix the build issues and conflict with upstream?

@chipweinberger
Copy link
Contributor Author

sure, one moment.

@chipweinberger chipweinberger force-pushed the user/chip/console-unblock-try2 branch 2 times, most recently from 0523e98 to 4a8c859 Compare April 18, 2023 06:11
@chipweinberger
Copy link
Contributor Author

chipweinberger commented Apr 18, 2023

@SoucheSouche updated.

@chipweinberger chipweinberger force-pushed the user/chip/console-unblock-try2 branch 3 times, most recently from 55e025a to 10bfe50 Compare April 18, 2023 06:27
configASSERT(pxRingbuffer);

// is the semaphore taken?
if (uxSemaphoreGetCount(rbGET_RX_SEM_HANDLE(pxRingbuffer)) == 0) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rbGET_RX_SEM_HANDLE doesn't seem to be declared anywhere.

Copy link
Contributor Author

@chipweinberger chipweinberger Apr 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code must have changed. @SoucheSouche, I'm pretty behind and stressed about my own work at the moment. If you are able to update it that would be a great help.

edit: I updated the code to remove the semaphore. Threading needs more thought.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure no problem!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should move this from the flags, and into its own variable. That way we are less likely to hit threading problems from multiple flags sharing the same memory location.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^ I'm not familiar enough with ringbuf to make this judgement call.

@CLAassistant
Copy link

CLAassistant commented Apr 20, 2023

CLA assistant check
All committers have signed the CLA.

@espressif-bot espressif-bot added Status: In Progress Work is in progress and removed Status: Selected for Development Issue is selected for development labels Apr 20, 2023
@chipweinberger
Copy link
Contributor Author

@SoucheSouche how are things going?

@SoucheSouche
Copy link
Collaborator

SoucheSouche commented Apr 20, 2023

@chipweinberger, there has been some changes in the ring buffer so I am now looking at your PR again since the semaphores for rx and tx were removed.

But tomorrow is my last working day before my extended leaves so I will most likely not finish up before. Someone from my team will take over the console related PRs.

@chipweinberger
Copy link
Contributor Author

chipweinberger commented Apr 21, 2023

since the semaphores for rx and tx were removed.

I updated the PR and removed the semaphores.

But as I was saying before I think we should move this bit: #define rbBUFFER_UNBLOCK_RX_FLAG ( ( UBaseType_t ) 32 ) to its own variable, to simplify race condition worries.

@SoucheSouche
Copy link
Collaborator

SoucheSouche commented Apr 21, 2023

The other issue is that this comment is still valid and your current solution won't be accepted.

I am implementing an alternative to your changes using xTaskAbortDelay proposed solution.

@chipweinberger
Copy link
Contributor Author

chipweinberger commented Apr 21, 2023

@SoucheSouche , did you understand Ivans comment? "Thinking about this again, select on a pair of file descriptors (stdin and eventfd) is probably the cleanest solution." I'm not sure what that solution would look like.

If you use xTaskAbortDelay, make sure the thread is not blocked in user code. 1. suspend the linenoise thread. 2. check its location.

@chipweinberger
Copy link
Contributor Author

chipweinberger commented Apr 26, 2023

@SoucheSouche another idea - we could send a "enter" key event to vfs buffers to force the thread to exit line noise.

something like:

  1. esp_linenoise_quit //prevent any further linenoise stdin reads or command lines being returned
  2. usb_serial_jtag_write('\n')

@espressif-bot espressif-bot added Status: Reviewing Issue is being reviewed Status: Selected for Development Issue is selected for development and removed Status: In Progress Work is in progress Status: Reviewing Issue is being reviewed labels Oct 16, 2023
@chipweinberger chipweinberger changed the title [Console] fix console deinit: add ability to unblock linenoise, to fix deadlock (try 2) (IDFGH-9188) [Console] fix: console deinit always deadlocks - adds ability to unblock linenoise (try 2) (IDFGH-9188) Dec 17, 2023
Copy link

Warnings
⚠️

Some issues found for the commit messages in this PR:

  • the commit message "[Console] fix console deinit: add ability to unblock linenoise, to fix deadlock":
    • summary looks empty
    • type/action looks empty

Please fix these commit messages - here are some basic tips:

  • follow Conventional Commits style
  • correct format of commit message should be: <type/action>(<scope/component>): <summary>, for example fix(esp32): Fixed startup timeout issue
  • allowed types are: change,ci,docs,feat,fix,refactor,remove,revert
  • sufficiently descriptive message summary should be between 20 to 72 characters and start with upper case letter
  • avoid Jira references in commit messages (unavailable/irrelevant for our customers)

TIP: Install pre-commit hooks and run this check when committing (uses the Conventional Precommit Linter).

⚠️

The source branch "user/chip/console-unblock-try2" incorrect format:

  • contains more than one slash (/). This can cause troubles with git sync.
    Please rename your branch.

👋 Hello chipweinberger, we appreciate your contribution to this project!


📘 Please review the project's Contributions Guide for key guidelines on code, documentation, testing, and more.

🖊️ Please also make sure you have read and signed the Contributor License Agreement for this project.

Click to see more instructions ...


This automated output is generated by the PR linter DangerJS, which checks if your Pull Request meets the project's requirements and helps you fix potential issues.

DangerJS is triggered with each push event to a Pull Request and modify the contents of this comment.

Please consider the following:
- Danger mainly focuses on the PR structure and formatting and can't understand the meaning behind your code or changes.
- Danger is not a substitute for human code reviews; it's still important to request a code review from your colleagues.
- Resolve all warnings (⚠️ ) before requesting a review from human reviewers - they will appreciate it.
- To manually retry these Danger checks, please navigate to the Actions tab and re-run last Danger workflow.

Review and merge process you can expect ...


We do welcome contributions in the form of bug reports, feature requests and pull requests via this public GitHub repository.

This GitHub project is public mirror of our internal git repository

1. An internal issue has been created for the PR, we assign it to the relevant engineer.
2. They review the PR and either approve it or ask you for changes or clarifications.
3. Once the GitHub PR is approved, we synchronize it into our internal git repository.
4. In the internal git repository we do the final review, collect approvals from core owners and make sure all the automated tests are passing.
- At this point we may do some adjustments to the proposed change, or extend it by adding tests or documentation.
5. If the change is approved and passes the tests it is merged into the default branch.
5. On next sync from the internal git repository merged change will appear in this public GitHub repository.

Generated by 🚫 dangerJS against f85938e

@espressif-bot espressif-bot added Status: Opened Issue is new and removed Status: Selected for Development Issue is selected for development labels Jan 22, 2024
@espressif-bot espressif-bot added Status: Selected for Development Issue is selected for development and removed Status: Opened Issue is new labels Apr 18, 2024
@SoucheSouche
Copy link
Collaborator

Hi @chipweinberger,
Sorry that I left this PR not updated for so long. I will get back at it this week. It's been a while so it might take a bit of time to get back into it but I will update you as I go through it.
Hopefully we will finally be able to merge it this time.

@espressif-bot espressif-bot added Status: In Progress Work is in progress and removed Status: Selected for Development Issue is selected for development labels May 10, 2024
@SoucheSouche
Copy link
Collaborator

Hi @chipweinberger, a little update on the PR.
Since it was fairly old, I first had to rebase it on the latest master and realized that your solution was no longer working.
The concept of adding 2 states to the console repl is great and works just fine but the solution that you were using to force linenoise to return didn't seem to work.

I experimented with select as @igrr suggested and ended up finding a solution.
The very drafty logic implemented in linenoiseDumb() and linenoiseRaw() is the following:

[...]
        if (select(in_fd + 1, &read_fds, NULL, &except_fds, NULL) < 0) {
            // select returned with -1, handle the error
        }

        // std_in is either ready to read or an exceptional condition occurred
        if(FD_ISSET(in_fd, &except_fds)) {
            // exceptional condition occurred, return from linenoise back into the while loop
            // of `esp_console_repl_task` where the repl state will be reevaluated
            break;
        }
        else if(FD_ISSET(in_fd, &read_fds)) {
            // do the reading since std_in is ready for reading
        }
        else {
            continue;
        }

I kept uart_unblock_reads and usb_serial_jtag_unblock_reads and I make them fire an "exceptional condition" event that will make the select return and if(FD_ISSET(in_fd, &except_fds)) evaluated to true.

@chipweinberger
Copy link
Contributor Author

seems like a reasonable solution!

impressive :) fun :)

@SoucheSouche
Copy link
Collaborator

I had to implement the select for the usb serial jtag driver so I will have to first merge this added feature before proceeding with the actual deadlock fix.
I will keep you posted.

@chipweinberger
Copy link
Contributor Author

chipweinberger commented May 29, 2024

related: #9964 ([USB Serial JTAG] calling select() & poll() crashes)

@SoucheSouche
Copy link
Collaborator

Hi @chipweinberger,
a little update on the fix.
I am still waiting to merge the select() feature for the usb cdc and usb serial jtag drivers before I can merge the fix for the deadlock.
I will let you know when the select() features have been merged but it might take a little while.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: In Progress Work is in progress
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants