Azure Shows Successful Badge on Deployment Failure #2998

Open
lmondigo opened this issue Aug 29, 2024 · 11 comments
Labels
more information required: Issue requires more information or a response from the customer
validated: Version information for this issue has been validated

Comments

@lmondigo
lmondigo commented Aug 29, 2024

Summary

On August 20, we noticed that our pipelines were showing as successful, but the deployment status reported an error, which should have failed the pipeline.

Steps To Reproduce

After trying to replicate the issue with similar variables (the latest SF CLI version at that time, which appears to be 2.56.4; the same components being deployed; the same tests being run), the issue no longer seems to persist and can no longer be reproduced.

Command used

sf project deploy start -d delta -o TargetOrg -l RunSpecifiedTests -t $(test_classes) -w 120 -c

Expected result

Failed deployments on pipelines should appear failed.

Actual result

Failed deployments showing up as succeeded.

Additional information

False positive deployment:
image

After replicating and getting the expected behavior:
image

System Information

Operating System
Ubuntu 22.04.4 LTS

JSON

{
  "architecture": "linux-x64",
  "cliVersion": "@salesforce/cli/2.56.4",
  "nodeVersion": "node-v18.20.4",
  "osVersion": "Linux 6.5.0-1025-azure",
  "rootPath": "/usr/local/lib/node_modules/@salesforce/cli",
  "shell": "bash",
  "pluginVersions": [
    "@oclif/plugin-autocomplete 3.2.0 (core)",
    "@oclif/plugin-commands 4.0.11 (core)",
    "@oclif/plugin-help 6.2.8 (core)",
    "@oclif/plugin-not-found 3.2.16 (core)",
    "@oclif/plugin-plugins 5.4.4 (core)",
    "@oclif/plugin-search 1.2.5 (core)",
    "@oclif/plugin-update 4.5.3 (core)",
    "@oclif/plugin-version 2.2.10 (core)",
    "@oclif/plugin-warn-if-update-available 3.1.11 (core)",
    "@oclif/plugin-which 3.2.10 (core)",
    "@salesforce/cli 2.56.4 (core)",
    "apex 3.4.2 (core)",
    "auth 3.6.48 (core)",
    "data 3.6.1 (core)",
    "deploy-retrieve 3.10.0 (core)",
    "info 3.3.29 (core)",
    "limits 3.3.25 (core)",
    "marketplace 1.2.22 (core)",
    "org 4.4.8 (core)",
    "packaging 2.8.0 (core)",
    "schema 3.3.24 (core)",
    "settings 2.3.13 (core)",
    "sobject 1.4.29 (core)",
    "source 3.5.14 (core)",
    "telemetry 3.6.7 (core)",
    "templates 56.3.12 (core)",
    "trust 3.7.23 (core)",
    "user 3.5.25 (core)",
    "sfdx-git-delta 5.42.1 (user) published 2 days ago (Mon Aug 26 2024)"
  ]
}
@lmondigo added the "investigating" label (We're actively investigating this issue) on Aug 29, 2024

Hello @lmondigo 👋 It looks like you didn't include the full Salesforce CLI version information in your issue.
Please provide the output of version --verbose --json for the CLI you're using (sf or sfdx).

A few more things to check:

  • Make sure you've provided detailed steps to reproduce your issue.
    • A repository that clearly demonstrates the bug is ideal.
  • Make sure you've installed the latest version of Salesforce CLI. (docs)
    • Better yet, try the rc or nightly versions. (docs)
  • Try running the doctor command to diagnose common issues.
  • Search GitHub for existing related issues.

Thank you!

@github-actions bot added the "more information required" label (Issue requires more information or a response from the customer) and removed the "investigating" label (We're actively investigating this issue) on Aug 29, 2024

Thank you for filing this issue. We appreciate your feedback and will review the issue as soon as possible. Remember, however, that GitHub isn't a mechanism for receiving support under any agreement or SLA. If you require immediate assistance, contact Salesforce Customer Support.

Hello @lmondigo 👋 None of the versions of sf you shared match the latest release.

Shared: 2.56.4
Latest: 2.56.7

Update to the latest version of Salesforce CLI (docs) and confirm that you're still seeing your issue.
You can also try the rc and nightly releases! (docs)

After updating, share the full output of sf version --verbose --json

@cristiand391
Member

We set the exit code for project deploy start based on the deployment result status here:

https://github.com/salesforcecli/plugin-deploy-retrieve/blob/167dc693d2f06bf2a5e0c2a63e384e14d5dc6c65/src/commands/project/deploy/start.ts#L263

https://github.com/salesforcecli/plugin-deploy-retrieve/blob/167dc693d2f06bf2a5e0c2a63e384e14d5dc6c65/src/utils/deploy.ts#L225

https://github.com/salesforcecli/plugin-deploy-retrieve/blob/167dc693d2f06bf2a5e0c2a63e384e14d5dc6c65/src/utils/errorCodes.ts#L14

That happens after the deployment has finished being processed in your org, and the Status: Failed line suggests that this was the deploy status.

This is where Status: Failed is printed (the Failed string comes from the deploy result payload):
https://github.com/salesforcecli/plugin-deploy-retrieve/blob/167dc693d2f06bf2a5e0c2a63e384e14d5dc6c65/src/utils/progressBar.ts#L52

I tried a few possible scenarios but couldn't get the determineExitCode function to return 0 on a non-successful deploy (using sf v2.57.7).

Does Azure log the exit code of the command somewhere?
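
Whether Azure records the exit code can also be checked from the pipeline side by wrapping the deploy command so that the code is written to the job log. The following is a minimal Python sketch, not part of the CLI: the helper name `run_and_log` is hypothetical, and it assumes Python 3 is available on the build agent.

```python
import subprocess
import sys


def run_and_log(cmd):
    """Run cmd, print its exit code to the job log, and return that code."""
    # stdout/stderr are inherited, so the CLI output still appears in the log.
    completed = subprocess.run(cmd)
    print(f"[exit-code] {cmd[0]} exited with {completed.returncode}", flush=True)
    return completed.returncode
```

A pipeline step could then call `sys.exit(run_and_log(["sf", "project", "deploy", "start", ...]))` with the same arguments it passes today, so the step still fails on a non-zero code and the log shows exactly what the process returned.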

@lmondigo
Author

Thanks for checking @cristiand391

I have checked the JSON response from both jobs via sf project deploy report (see below) and both seem to be identical in terms of status, result.status, and result.success.

Would it be right to assume that status is 0 because the job is completed from the org?
I'm also curious about which field is being returned to the terminal to be parsed by the script.

Failed Job

{
  "status": 0,
  "result": {
    "checkOnly": false,
    "completedDate": "2024-08-28T04:06:20.000Z",
    "createdBy": "0052P000000K5Mw",
    "createdByName": "Script User",
    "createdDate": "2024-08-28T04:06:15.000Z",
    "details": {
      "componentFailures": [...],
      "componentSuccesses": [...],
      "runTestResult": {...}
    },
    "done": true,
    "id": "0Af9p00000xxx",
    "ignoreWarnings": false,
    "lastModifiedDate": "2024-08-28T04:06:20.000Z",
    "numberComponentErrors": 1,
    "numberComponentsDeployed": 0,
    "numberComponentsTotal": 1,
    "numberTestErrors": 0,
    "numberTestsCompleted": 0,
    "numberTestsTotal": 0,
    "rollbackOnError": true,
    "runTestsEnabled": false,
    "startDate": "2024-08-28T04:06:16.000Z",
    "status": "Failed",
    "success": false,
    "files": [...]
  },
  "warnings": []
}

Failed job that showed success

{
  "status": 0,
  "result": {
    "checkOnly": false,
    "completedDate": "2024-08-20T00:52:42.000Z",
    "createdBy": "0052P000000K5Mw",
    "createdByName": "Script User",
    "createdDate": "2024-08-20T00:52:34.000Z",
    "details": {
      "componentFailures": [...],
      "componentSuccesses": [...],
      "runTestResult": {...}
    },
    "done": true,
    "id": "0Af9p00000xxx",
    "ignoreWarnings": false,
    "lastModifiedDate": "2024-08-20T00:52:42.000Z",
    "numberComponentErrors": 25,
    "numberComponentsDeployed": 0,
    "numberComponentsTotal": 25,
    "numberTestErrors": 0,
    "numberTestsCompleted": 0,
    "numberTestsTotal": 0,
    "rollbackOnError": true,
    "runTestsEnabled": true,
    "startDate": "2024-08-20T00:52:34.000Z",
    "status": "Failed",
    "success": false,
    "files": [...]
  },
  "warnings": []
}

@cristiand391
Member

> Would it be right to assume that status is 0 because the job is completed from the org?
> I'm also curious about which field is being returned to the terminal to be parsed by the script.

No, we set the status key in the JSON output of every sf command to the exit code of the process:

Here we set it to whatever process.exitCode is (if it is a number), falling back to 0:
https://github.com/salesforcecli/sf-plugins-core/blob/c61adc2093e035acac06b71322bac9fac13325bb/src/sfCommand.ts#L364

Same with results, but falling back to 1:
https://github.com/salesforcecli/sf-plugins-core/blob/c61adc2093e035acac06b71322bac9fac13325bb/src/errorHandling.ts#L44

In the 2 JSON results you shared above, I see the org is returning "status": "Failed" (which should make the CLI set exit code = 1).

I'll check our telemetry and see if I can find similar scenarios.
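
To make the distinction between the two status fields concrete, here is a minimal Python sketch that separates the top-level `status` (the command's exit code, as described in this comment) from `result.status` and `result.success` (the deploy outcome reported by the org). The function name `summarize` is hypothetical, and the sample is a trimmed copy of the "Failed Job" report shared earlier in the thread.

```python
import json


def summarize(report_json):
    """Split the CLI's JSON output into the exit code and the deploy outcome."""
    payload = json.loads(report_json)
    result = payload.get("result", {})
    return {
        "cli_exit_code": payload.get("status"),   # top-level key: process exit code
        "deploy_status": result.get("status"),    # e.g. "Failed", from the deploy result
        "deploy_success": result.get("success"),  # boolean from the deploy result
    }


# Trimmed version of the "Failed Job" report from this thread.
sample = (
    '{"status": 0, "result": {"status": "Failed", "success": false, '
    '"numberComponentErrors": 1}, "warnings": []}'
)
print(summarize(sample))
# prints {'cli_exit_code': 0, 'deploy_status': 'Failed', 'deploy_success': False}
```

In both reports shared above, the top-level `status` is 0 while `result.status` is "Failed", so a script that needs the deploy outcome would have to read the nested fields rather than the top-level key.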

@lmondigo
Author

lmondigo commented Sep 5, 2024

@cristiand391, it happened again on one of our pipelines, which runs in an older container that still uses sfdx commands (sfdx-cli version v7.194.1). This leads me to believe that the issue is not related to sf versus sfdx or to Azure, but to how the API behaves. Are you aware of any API changes from Salesforce that may cause this issue?

@iowillhoit added the "validated" label (Version information for this issue has been validated) on Sep 9, 2024
@forcedotcom deleted a comment from github-actions bot on Sep 9, 2024
@cristiand391
Member

> it happened again on one of our pipelines, which runs in an older container that still uses sfdx commands (sfdx-cli version v7.194.1). This leads me to believe that the issue is not related to sf versus sfdx or to Azure, but to how the API behaves. Are you aware of any API changes from Salesforce that may cause this issue?

Nope, but for the CLI to exit with 0, the API would have to be returning "status": "Succeeded" for a bad deploy...

If you can share the deploy ID of a deploy that failed but made sf exit with code 0, we could look into what the API returned from our side (CLI telemetry isn't enough to link a deploy to a command execution).

@lmondigo
Author

lmondigo commented Sep 13, 2024

Here are several deployment IDs for deployments that exited with 0 but failed in the org:
Deployment in Org A - 0Af9p00000KNB7OCAX
Deployment in Org A - 0Af9p00000KP7ScCAL
Prod validation in Org B - 0AfOZ000000ZPun0AG (after the false positive, the quick deployment failed with the error INVALID_ID_FIELD: Source validate did not run tests in the org, even though we triggered the tests)

This issue has not received a response in 7 days. It will auto-close in 7 days unless a response is posted.

@cristiand391
Member

Hey @lmondigo, sorry for the late reply.

I can't find 0Af9p00000KNB7OCAX and 0Af9p00000KP7ScCAL in our logs (I checked 2 weeks ago too, but there were some changes to log retention in the last month, so I'm not sure whether those records are gone).

For 0AfOZ000000ZPun0AG I see it failed and numTestsExecuted=0 (so no tests).

I think it'll be tricky to debug this since you mentioned it's intermittent. Some ideas:

  1. Pass --dev-debug to all calls to sf project deploy start ... to get debug logs printed to stdout. You can then share with us the log from a run where a false positive happens (ideally you would open a support case for the CLI team to share this, but if you can't, feel free to post it here or DM it privately).
  2. We could do a prerelease of the deploy commands with some logging instrumentation to help debug further if the logs aren't enough.
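
For the first idea, the debug output has to be kept for every run, because the false positive is intermittent. The following Python sketch shows one way to do that, assuming Python 3 on the agent; the helper name `run_with_debug_log` and the log path are hypothetical, and only the `--dev-debug` flag comes from the suggestion above.

```python
import subprocess


def run_with_debug_log(cmd, log_path):
    """Run cmd with --dev-debug appended, saving stdout and stderr to log_path."""
    with open(log_path, "w", encoding="utf-8") as log:
        completed = subprocess.run(
            [*cmd, "--dev-debug"],
            stdout=log,
            stderr=subprocess.STDOUT,  # keep debug lines and normal output together
        )
    # The caller decides whether to fail the step based on this code.
    return completed.returncode
```

Because the output goes to a file instead of the job log, the file would need to be published as a pipeline artifact so that it is still available when a false positive is noticed later.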

If it's really an MDAPI issue (first API call returns status=successful on a bad deploy -> CLI exits with 0 -> next requests show status=failure) then it might require a support case to get us and the Metadata API team to investigate further.
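
Until the cause is known, a pipeline could guard against this scenario by asking for the deploy report again after the deploy command exits with 0. The sketch below is a hedged illustration, not the CLI's behavior: `fetch_report` and `is_false_positive` are hypothetical helpers, the `--job-id` flag on `sf project deploy report` is assumed to match the installed CLI version, and the check relies only on the `result.success` field seen in the reports shared in this thread.

```python
import json
import subprocess


def fetch_report(job_id):
    """Ask the CLI for the deploy report of job_id and return the parsed JSON."""
    completed = subprocess.run(
        # Assumption: the --job-id flag exists in the installed CLI version.
        ["sf", "project", "deploy", "report", "--job-id", job_id, "--json"],
        capture_output=True,
        text=True,
    )
    return json.loads(completed.stdout)


def is_false_positive(deploy_exit_code, report):
    """True when the deploy command exited 0 but the report says it did not succeed."""
    succeeded = report.get("result", {}).get("success") is True
    return deploy_exit_code == 0 and not succeeded
```

A step could call `is_false_positive(exit_code, fetch_report(deploy_id))` and fail the pipeline when it returns True, which would also leave a record of the deploy ID to share in a support case.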
