# [EPIC][Error Handling] Test Run Page Error Handling Improvements #2040
## Comments
@olha23 I think the efforts for this issue should be combined with "Improve diagnostics returned to a user when executing a test trigger" (#1788). We also need to consider allowing users to:

CC: @kdhamric
Yes yes. This is our second-highest priority, only behind knocking out the configuration work, which I want the team to swarm on since it is blocking other activity. If we get to a spot where @jorgeepc or @xoscar do not have an area where they can contribute to the config changes, we will want to focus on this.
@kdhamric I added additional mockups for trace and test mode for failed cases. I need your help with the copy there. I think we need to point to some troubleshooting docs for tests, maybe.
I left some notes. Agreed that we need an "I did not get my trace, how do I troubleshoot it?" page. Plan on there being a help link in your "we did not get the trace successfully" message - we will work on adding that next week.
Added some comments. If we are in the clear about the config stuff, I will start working on this Monday morning!
Hello everyone, here's my take on what should be added to the test run page to improve the user experience.

Acceptance Criteria:

AC2

AC3

The idea with this is to give users easier ways to debug what's happening within the system, whether we found a problem or something else is happening. This can also help them tweak their polling settings to get the best results for them. CC: @olha23
On AC2, it would be nice to show progress on gathering info on the trace. Maybe show the '# of spans' received so far? If you see it getting some spans, you know things:
### Technical Details

The main goal of this epic is to provide a better experience for new and familiar Tracetest users, focusing on displaying more information so that users can better understand what the app is doing after running a test.

### Event Log System

Currently, the test run process uses a web-based system to communicate updates to the clients, based on checkpoints we define at key parts of the run. Here we'll leverage that same idea by extending it to provide even more information, moving the checkpoints into an events entity where we can store everything that happens while executing a test run.

Events will have a generic structure that can be used to define a basic event type based on stage and description. Here's a class diagram describing the base event structure and how we can structure the more specific event types:

```mermaid
classDiagram
Event <|-- DataStoreConnection
Event <|-- Polling
Event <|-- Output
Event : string type
Event : enum[trigger, trace, test] stage
Event : string description
Event : date created_at
Event : string test_id
Event : string run_id
class DataStoreConnection{
DataStoreTestConnection info
}
class Polling{
int number_of_spans
int iteration_number
string reason_of_next_iteration
boolean is_complete
}
class Output{
string warning
string error
string output_name
}
```
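For illustration, here's a minimal Go sketch of the base event structure and two of the specific event types from the diagram. The field names come straight from the class diagram above; the package name, struct embedding, and JSON tags are assumptions for the sketch, not the actual Tracetest code:

```go
package events

import "time"

// Stage mirrors the enum from the class diagram: the phase of the
// test run that produced the event.
type Stage string

const (
	StageTrigger Stage = "trigger"
	StageTrace   Stage = "trace"
	StageTest    Stage = "test"
)

// Event is the generic base structure shared by every event type.
type Event struct {
	Type        string    `json:"type"`
	Stage       Stage     `json:"stage"`
	Description string    `json:"description"`
	CreatedAt   time.Time `json:"created_at"`
	TestID      string    `json:"test_id"`
	RunID       string    `json:"run_id"`
}

// Polling embeds Event (mirroring the diagram's inheritance arrow)
// and carries the extra fields for polling iteration events.
type Polling struct {
	Event
	NumberOfSpans         int    `json:"number_of_spans"`
	IterationNumber       int    `json:"iteration_number"`
	ReasonOfNextIteration string `json:"reason_of_next_iteration"`
	IsComplete            bool   `json:"is_complete"`
}

// Output carries the extra fields for output warnings and errors.
type Output struct {
	Event
	Warning    string `json:"warning"`
	Error      string `json:"error"`
	OutputName string `json:"output_name"`
}
```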
The advantage of having a setup like this is that we can create and register different types of events depending on what we need; if we have to add more trigger types or different polling mechanisms, we can add new event types for the new processes. Another key point is that this gives us the full event log for any test run: if we decide to export it or display it fully for users as a text-based log, that can be another way of helping users understand what happened.

### Example DB Structure

HTTP Trigger Unreachable Host Event:

```json
{
"type": "UNREACHABLE_HOST",
"description": "The host (http://localhost:8081) is unreachable",
"stage": "TRIGGER",
"definition": "{}"
}
```

HTTP/gRPC Docker Host Machine Mismatch:

```json
{
"type": "DOCKER_HOST_MISMATCH",
"description": "We identified Tracetest is running in docker compose, to connect to the host machine use the `host.docker.internal` hostname. For more information see https://docs.docker.com/docker-for-mac/networking/#use-cases-and-workarounds",
"stage": "TRIGGER",
"definition": "{}"
}
```

Polling Iteration:

```json
{
"type": "POLLING_ITERATION",
"description": "Polling iteration",
"stage": "TRIGGER",
"definition": "{"number_of_spans": 1, "iteration_number": 2,"reason_of_next_iteration": "","is_complete": true}"
}
```
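To make the mapping concrete, here's a hedged sketch of how a typed event could be flattened into the generic row shape shown above, with the typed fields serialized into the definition column. It reuses the assumed `Polling` type from the earlier sketch; the real persistence layer may differ:

```go
package events

import "encoding/json"

// Row is the generic DB shape from the examples above: the typed part
// of the event is stored in the definition column as a JSON string.
type Row struct {
	Type        string `json:"type"`
	Stage       string `json:"stage"`
	Description string `json:"description"`
	Definition  string `json:"definition"`
}

// NewPollingRow flattens a Polling event into a generic Row, producing
// the same shape as the "Polling Iteration" example above.
func NewPollingRow(p Polling) (Row, error) {
	def, err := json.Marshal(map[string]any{
		"number_of_spans":          p.NumberOfSpans,
		"iteration_number":         p.IterationNumber,
		"reason_of_next_iteration": p.ReasonOfNextIteration,
		"is_complete":              p.IsComplete,
	})
	if err != nil {
		return Row{}, err
	}
	return Row{
		Type:        "POLLING_ITERATION",
		Stage:       "TRACE",
		Description: "Polling iteration",
		Definition:  string(def),
	}, nil
}
```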
### Frontend Rendering Factory

The clients should only render the events that are required, when they're required; the FE should display components based on the selected mode and the test run status (a rough sketch of this dispatch pattern follows the scenarios below).

### Examples

Scenario 1: The HTTP test trigger failed to reach the host because of wrong Docker host usage.

Expected events:
Scenario 2: The HTTP trigger returned a successful response.

Expected events:

Scenario 3: The HTTP trigger returned a successful response.

Expected events:

Scenario 4: The HTTP trigger returned a successful response.

Expected events:
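As a rough illustration of the factory idea, here's the dispatch pattern sketched in Go to stay consistent with the other examples (the real clients would implement this in their own stacks, e.g. TypeScript in the UI). It builds on the assumed `Row` type from the previous sketch, and the event types and messages are illustrative:

```go
package events

import "fmt"

// Renderer turns a stored event row into a user-facing message.
type Renderer func(r Row) string

// registry maps event types to renderers; the factory can grow as new
// event types are added, and unknown types degrade to a generic message.
var registry = map[string]Renderer{
	"POLLING_ITERATION": func(r Row) string {
		return fmt.Sprintf("Gathering trace: %s", r.Description)
	},
	"UNREACHABLE_HOST": func(r Row) string {
		return fmt.Sprintf("Trigger failed: %s", r.Description)
	},
}

// Render selects the output for an event, falling back to the raw
// description when no specific renderer is registered.
func Render(r Row) string {
	if render, ok := registry[r.Type]; ok {
		return render(r)
	}
	return r.Description
}
```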
### Exploratory

Instead of having a "custom" event solution, we could generate an internal otel trace for each test run, which would keep track of every event that happens; clients would then need a way to visualize it based on attributes similar to what we would store in an event, like stage and type (span name). A rough sketch of recording such a span follows.
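To make the idea concrete, recording an event as a span with the OpenTelemetry Go API could look roughly like this (tracer setup and exporter wiring are omitted, and the tracer name and attribute keys are assumptions):

```go
package events

import (
	"context"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
)

// recordPollingSpan records one polling iteration as a span on the run's
// internal trace; the span name plays the role of the event type, and the
// stage and payload fields become span attributes.
func recordPollingSpan(ctx context.Context, spans, iteration int, complete bool) {
	tracer := otel.Tracer("tracetest.run.events")
	_, span := tracer.Start(ctx, "POLLING_ITERATION")
	defer span.End()

	span.SetAttributes(
		attribute.String("stage", "trace"),
		attribute.Int("number_of_spans", spans),
		attribute.Int("iteration_number", iteration),
		attribute.Bool("is_complete", complete),
	)
}
```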
Requirements for this to work:
TODO:
Future troubleshooting features:
Closing in favor of #2331
Across different user sessions, the team has identified multiple areas of opportunity regarding the error handling messaging when executing a test run.
Currently, a test run goes through three significant steps:

1. Executing the trigger.
2. Gathering the trace.
3. Running the test assertions.
Each step has its own set of success and failure scenarios that need to be appropriately displayed to the user.
Today, Tracetest uses only two fields from the test run to validate possible errors:

- `lastErrorState`, which contains the string info for the last known error.
- `state`, which controls the status of the test.

This was a good starting point, but it is no longer sufficient for the clients (CLI/UI) to display enough information for the user to understand how to fix potential problems, nor to provide good feedback on what the server side is executing at any given time.
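For context, the current model boils down to something like the following simplified sketch (these are not the actual Tracetest types):

```go
package events

// TestRunStatus is the single coarse state that drives the clients today.
type TestRunStatus string

// legacyRun sketches the two fields the clients currently rely on: one
// status value plus one string with the last known error. Everything that
// happened in between is lost, which is what the event log described in
// the comments above is meant to address.
type legacyRun struct {
	State          TestRunStatus `json:"state"`
	LastErrorState string        `json:"lastErrorState"`
}
```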
In this case, we have identified a matrix of possible scenarios depending on the test run state, results, and what we should be displaying to the user.
### Test Run Flow Chart

### State Matrix for Test Runs

### Tickets and Tasks

### Follow-up release

### Nice to have

### Mockups
https://www.figma.com/file/LBN4SKVPq3ykegrPKbHT2Y/0.8-0.9-Release-Tracetest?node-id=1994-32394&t=5M47CI4J8VFbgit2-0