
submit onboarding task #393

Open · wants to merge 2 commits into main

Conversation

@YQiu-oo commented Oct 2, 2024

Hi all!
Here is my onboarding task result. You can find testrun.rar on Google Drive: https://drive.google.com/file/d/1GDbRX6s0zrnv_1KY-dZEIJqkZFh_Wl71/view?usp=drive_link.

The alarm explanations are in the summary.md file.

Thanks,
Yukang Qiu

@TZ-zzz (Member) commented Oct 6, 2024

Hi @YQiu-oo, thank you for submitting the onboarding task.

Could you double-check the crd.yaml and operator.yaml you provided to Acto (the ones specified in config.json)? It seems the operator isn't being deployed correctly during the tests, as it consistently reports the following error:

E1002 05:03:19.021840       1 reflector.go:147] k8s.io/[email protected]/tools/cache/reflector.go:229: Failed to watch *v1alpha1.TidbDashboard: failed to list *v1alpha1.TidbDashboard: the server could not find the requested resource (get tidbdashboards.pingcap.com)

One way to ensure the crd.yaml and the operator.yaml are correct would be to deploy the operator manually with the files you provided before running Acto.
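
If it helps, here is a minimal Go sketch (just an illustration, not part of the task) that asks the API server whether the tidbdashboards.pingcap.com resource from the error message is actually being served. The group/version come from the error above; using the default kubeconfig path is an assumption:

```go
// Sketch: check whether the TidbDashboard CRD is served by the API server.
// Assumes the default kubeconfig (~/.kube/config) points at the test cluster.
package main

import (
	"fmt"
	"log"

	"k8s.io/client-go/discovery"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		log.Fatal(err)
	}
	dc, err := discovery.NewDiscoveryClientForConfig(config)
	if err != nil {
		log.Fatal(err)
	}
	// The reflector error is about "get tidbdashboards.pingcap.com", so ask
	// whether that group/version is registered and lists the resource.
	resources, err := dc.ServerResourcesForGroupVersion("pingcap.com/v1alpha1")
	if err != nil {
		log.Fatalf("pingcap.com/v1alpha1 is not served; crd.yaml is likely missing CRDs: %v", err)
	}
	for _, r := range resources.APIResources {
		if r.Name == "tidbdashboards" {
			fmt.Println("tidbdashboards is installed and served")
			return
		}
	}
	fmt.Println("tidbdashboards is missing from pingcap.com/v1alpha1; double-check crd.yaml")
}
```

If this reports the resource as missing, the crd.yaml passed to Acto is probably incomplete.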

@YQiu-oo (Author) commented Oct 7, 2024

Hi, @TZ-zzz! I am working on the issue, but my computer now consistently crashes at the "deploy operator" stage (it previously worked with the same files) and fails to proceed further. I am not sure whether TidbDashboard matters to tidb-operator or to Acto. Will its absence prevent Acto from proceeding, or does Acto need this resource?

@TZ-zzz (Member) commented Oct 7, 2024

@YQiu-oo, yes, this issue is critical: the operator was not working at all, so all the tests Acto ran were essentially no-ops. If you check the operator logs in the testrun directory, you can see that the operator was stuck at the deploying stage and was not taking any actions.

@YQiu-oo (Author) commented Oct 8, 2024

@TZ-zzz, OK, I see, but shouldn't the operator log only be generated after the operator is deployed? In my case, I don't see any operator log while it is stuck at the deploying stage.

@TZ-zzz (Member) commented Oct 8, 2024

@YQiu-oo, you can find the operator logs of each test case in the testrun_.../trial... folders.

@YQiu-oo (Author) commented Oct 10, 2024

@TZ-zzz tidb-operator behaves strangely on my machine, so I switched to MongoDB. Could you take a look at my commit? Here is the link to testrun.rar: https://drive.google.com/file/d/1ZKEs3y4an0kbzpbNdFdRGr62F6JDtl_h/view?usp=sharing

@TZ-zzz (Member) commented Oct 10, 2024

@YQiu-oo, I think the first alarm is due to the previous changes not having been reconciled. The operator seems to be stuck waiting for the pods to be reconciled, so the bug likely originates in an earlier phase, even though it manifests in the step that Acto reports. It might be worth investigating why the previous state hasn't converged.

By the way, spec.statefulSet.meta is mapped to the Kubernetes core StatefulSet resource, so changes to that metadata should trigger a change to the StatefulSet. Labels and annotations are really important, as many operators and controllers manage Kubernetes resources based on this metadata.
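
For example (just a sketch; the namespace and StatefulSet name below are placeholders, not the actual ones from your test run), you can read the StatefulSet back after a step and check whether the labels and annotations from spec.statefulSet.meta actually propagated:

```go
// Sketch: verify that metadata changes on the CR propagated to the StatefulSet.
// "default" and "example-mongodb" are placeholder namespace/name values.
package main

import (
	"context"
	"fmt"
	"log"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		log.Fatal(err)
	}
	client, err := kubernetes.NewForConfig(config)
	if err != nil {
		log.Fatal(err)
	}
	sts, err := client.AppsV1().StatefulSets("default").Get(context.TODO(), "example-mongodb", metav1.GetOptions{})
	if err != nil {
		log.Fatal(err)
	}
	// If the change was reconciled, the new labels/annotations show up here,
	// and controllers selecting on these labels will see the new values.
	fmt.Println("labels:", sts.Labels)
	fmt.Println("annotations:", sts.Annotations)
}
```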

For the second alarm, it's a misoperation rather than a false alarm. The previous step generated some invalid values that left MongoDB's internal state unready, which in turn prevented the operator from acting on the metadata.labels changes.

@TZ-zzz (Member) commented Oct 13, 2024

@YQiu-oo, could you investigate the first alarm again? The root cause is still not entirely clear.

@YQiu-oo (Author) commented Oct 13, 2024

@TZ-zzz I found that the Agent in the Pod repeatedly failed to reach its goal state, which prevented the MongoDB replica set from becoming ready. I then checked the earlier operator logs about the configuration: the configuration was missing the required passwordSecretName, which is necessary for SCRAM authentication. I suspect this mismatched configuration is why the Agent never reaches its goal state (i.e., why the previous changes haven't been reconciled).
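
As a quick sanity check (the namespace and Secret name here are placeholders, not the actual values from the test run), something like this sketch can confirm whether the password Secret that the user's passwordSecretRef points at exists at all:

```go
// Sketch: check that the password Secret needed for SCRAM authentication exists.
// "default" and "example-user-password" are placeholder values.
package main

import (
	"context"
	"fmt"
	"log"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		log.Fatal(err)
	}
	client, err := kubernetes.NewForConfig(config)
	if err != nil {
		log.Fatal(err)
	}
	secret, err := client.CoreV1().Secrets("default").Get(context.TODO(), "example-user-password", metav1.GetOptions{})
	if err != nil {
		// If the Secret is missing, the operator cannot configure SCRAM and the
		// agents never reach the new goal state.
		log.Fatalf("password secret not found: %v", err)
	}
	fmt.Printf("found password secret with %d keys\n", len(secret.Data))
}
```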

@tylergu (Member) commented Oct 13, 2024

@YQiu-oo Nice observation! Are you able to pinpoint the root cause in the mongodb operator which causes it to not reconcile after the system got into an error state?

@YQiu-oo (Author) commented Oct 14, 2024

@tylergu I double-checked the mongodb operator repo. If the agent's last achieved config version is not equal to targetConfigVersion, the operator reports an agent issue and logs The Agent in the Pod '%s' hasn't reached the goal state yet (goal: %d, agent: %s) (https://github.com/mongodb/mongodb-kubernetes-operator/blob/c83d4d487e36c835f022092d516ce622321172b0/pkg/agent/agent_readiness.go#L110).
GetAllDesiredMembersAndArbitersPodState (https://github.com/mongodb/mongodb-kubernetes-operator/blob/c83d4d487e36c835f022092d516ce622321172b0/pkg/agent/agent_readiness.go#L67) is the function that checks and returns the state of all desired pods in the replica set. Since the goal state is never reached, the replica set is not ready, so the operator reconciles again; the replica set is still not ready, so it reconciles again, and so on until it stops.
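
For reference, here is a simplified, self-contained sketch of that check; it is not the operator's actual code, just the shape of the logic that keeps the reconcile loop requeueing:

```go
// Sketch of the goal-state check: field and function names are illustrative,
// not the operator's real types.
package main

import "fmt"

// agentStatus stands in for the agent health status the operator reads from each pod.
type agentStatus struct {
	lastGoalVersionAchieved int64
}

// podReachedGoalState reports whether the agent has caught up with the target
// automation config version; if not, the pod has not reached its goal state.
func podReachedGoalState(status agentStatus, targetConfigVersion int64) bool {
	if status.lastGoalVersionAchieved != targetConfigVersion {
		fmt.Printf("The Agent in the Pod hasn't reached the goal state yet (goal: %d, agent: %d)\n",
			targetConfigVersion, status.lastGoalVersionAchieved)
		return false
	}
	return true
}

func main() {
	// With an invalid config (e.g. the missing passwordSecretName), the agent
	// never reports the new goal version, so every reconcile sees "not ready"
	// and requeues, which is the loop described above.
	for attempt := 1; attempt <= 3; attempt++ {
		ready := podReachedGoalState(agentStatus{lastGoalVersionAchieved: 2}, 3)
		fmt.Printf("reconcile attempt %d: replica set ready = %v\n", attempt, ready)
	}
}
```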
