Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HostModel: Retry ResourceUpdate on Win32 error #32347

Merged
merged 1 commit into from
Feb 19, 2020

Conversation

swaroop-sridhar
Copy link
Contributor

@swaroop-sridhar swaroop-sridhar commented Feb 15, 2020

This change attempts to fix a non-deterministic customer reported failure.

Several customers have observed failure during resource update when the HostModel updates the AppHost (to transfer resources from the managed app). The failure is not detereminisitc, not reproducible on our machines, and depends on specific computers/software running.
This indicates interference by other software while the HostWriter is updating the AppHost.

The current implementation retries the resource update if an update because the device or drive is locked (say by an antivurus) HRESULT 0x21 and 0x6C. However, the failures reported have errors 0x5 (Access violation) and 0x6# (Open failed). Windows/Defender team said that file-locking with these error-codes is not expected. However, different AVs work differently about examining files.

We believe that the correct fix for this issue is to complete:

However the above is a fairly large change for servicing .net core 3.1.
So, this change implements a simpler fix intended for servicing 3.1 branch: Always retry the resource-update on Win32 error. While this may cause unnecessary retries on legitimate failures (ex: file missing) such cases are rare, because the SDK supplies the apphost, and the HostModel itself creates the file to update.

Fixes #3832

This change attempts to fix a non-deterministic customer reported failure.

Several customers have observed failure during resource update when the HostModel updates the AppHost (to transfer resources from the managed app).
The failure is not detereminisitc, not reproducible on our machines, and depends on specific computers/software running.
This indicates interference by other software while the HostWriter is updating the AppHost.

The current implementation retries the resource update if an update because the device or drive is locked (say by an antivurus) HRESULT 0x21 and 0x6C.
However, the failures reported have errors 0x5 (Access violation) and 0x6# (Open failed).
Windows/Defender team said that file-locking with these error-codes is not expected.
However, different AVs work differently about examining files.

We believe that the correct fix for this issue is to complete:

To implement dotnet#3828 and dotnet#3829
Ship AppHost with an extension/permissions not indicating an executable.
However the above is a fairly large change for servicing .net core 3.1.

So, this change implements a simpler fix intended for servicing 3.1 branch:
Always retry the resource-update on Win32 error, unless the failure is a knwon irrecoverable code
(listed a few error codes relevent to File IO).

This change may cause unnecessary retries on legitimate failures (about 50 seconds).
But such cases are rare, because the SDK supplies the apphost, and the HostModel itself creates the file to update.

Fixes dotnet#3832
@swaroop-sridhar swaroop-sridhar merged commit c4ab2a7 into dotnet:master Feb 19, 2020
swaroop-sridhar added a commit to swaroop-sridhar/core-setup that referenced this pull request Feb 28, 2020
dotnet/runtime#3832

Building a WinExe with resources fails non-deterministically

Several customers have observed failure during resource update when the HostModel updates the AppHost (to transfer resources from the managed app).
The failure is not detereminisitc, not reproducible on our machines, and depends on specific computers/software running.
This indicates interference by other software while the HostWriter is updating the AppHost.

The current implementation retries the resource update if an update because the device or drive is locked (say by an antivurus) HRESULT 0x21 and 0x6C. However, the failures reported have errors 0x5 (Access violation) and 0x6# (Open failed). Windows/Defender team said that file-locking with these error-codes is not expected.
However, different AVs work differently about examining files.

We believe that the correct fix for this issue is to complete:

* To implement dotnet#3828 and dotnet#3829
* Ship AppHost with an extension/permissions not indicating an executable.

However the above is a fairly large change for servicing .net core 3.1.

So, this change implements a simpler fix intended for servicing 3.1 branch: Always retry the resource-update on Win32 error. While this may cause unnecessary retries on legitimate failures (ex: file missing) such cases are rare, because the SDK supplies the apphost, and the HostModel itself creates the file to update.

Low

dotnet/runtime#32347
swaroop-sridhar added a commit to swaroop-sridhar/core-setup that referenced this pull request Feb 28, 2020
dotnet/runtime#3832

Building a WinExe with resources fails non-deterministically

Several customers have observed failure during resource update when the HostModel updates the AppHost (to transfer resources from the managed app).
The failure is not detereminisitc, not reproducible on our machines, and depends on specific computers/software running.
This indicates interference by other software while the HostWriter is updating the AppHost.

The current implementation retries the resource update if an update because the device or drive is locked (say by an antivurus) HRESULT 0x21 and 0x6C. However, the failures reported have errors 0x5 (Access violation) and 0x6# (Open failed). Windows/Defender team said that file-locking with these error-codes is not expected.
However, different AVs work differently about examining files.

We believe that the correct fix for this issue is to complete:

* To implement dotnet#3828 and dotnet#3829
* Ship AppHost with an extension/permissions not indicating an executable.

However the above is a fairly large change for servicing .net core 3.1.

So, this change implements a simpler fix intended for servicing 3.1 branch: Always retry the resource-update on Win32 error. While this may cause unnecessary retries on legitimate failures (ex: file missing) such cases are rare, because the SDK supplies the apphost, and the HostModel itself creates the file to update.

Low

dotnet/runtime#32347
Comment on lines +51 to +75
case 0x00000001: // ERROR_INVALID_FUNCTION
case 0x00000002: // ERROR_FILE_NOT_FOUND
case 0x00000003: // ERROR_PATH_NOT_FOUND
case 0x00000006: // ERROR_INVALID_HANDLE
case 0x00000008: // ERROR_NOT_ENOUGH_MEMORY
case 0x0000000B: // ERROR_BAD_FORMAT
case 0x0000000E: // ERROR_OUTOFMEMORY
case 0x0000000F: // ERROR_INVALID_DRIVE
case 0x00000012: // ERROR_NO_MORE_FILES
case 0x00000035: // ERROR_BAD_NETPATH
case 0x00000057: // ERROR_INVALID_PARAMETER
case 0x00000071: // ERROR_NO_MORE_SEARCH_HANDLES
case 0x00000072: // ERROR_INVALID_TARGET_HANDLE
case 0x00000078: // ERROR_CALL_NOT_IMPLEMENTED
case 0x0000007B: // ERROR_INVALID_NAME
case 0x0000007C: // ERROR_INVALID_LEVEL
case 0x0000007D: // ERROR_NO_VOLUME_LABEL
case 0x0000009A: // ERROR_LABEL_TOO_LONG
case 0x000000A0: // ERROR_BAD_ARGUMENTS
case 0x000000A1: // ERROR_BAD_PATHNAME
case 0x000000CE: // ERROR_FILENAME_EXCED_RANGE
case 0x000000DF: // ERROR_FILE_TOO_LARGE
case 0x000003ED: // ERROR_UNRECOGNIZED_VOLUME
case 0x000003EE: // ERROR_FILE_INVALID
case 0x00000651: // ERROR_DEVICE_REMOVED
Copy link

@brcaswell brcaswell Mar 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you provide some insight on how you came up with this list of codes, and whether or not you anticipate more codes to be added upon further review?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wrote down a list of error code by inspecting the windows error codes, scanning for errors that cannot succeed on retry. The list is conservative; that is, there could be more failures that shouldn't be retried. But I do not anticipate adding more codes to this list. I believe tripping into one of these cases is rare, since the file to be accessed is created by the HostModel library itself.

Anipik pushed a commit to dotnet/core-setup that referenced this pull request Mar 25, 2020
dotnet/runtime#3832

Building a WinExe with resources fails non-deterministically

Several customers have observed failure during resource update when the HostModel updates the AppHost (to transfer resources from the managed app).
The failure is not detereminisitc, not reproducible on our machines, and depends on specific computers/software running.
This indicates interference by other software while the HostWriter is updating the AppHost.

The current implementation retries the resource update if an update because the device or drive is locked (say by an antivurus) HRESULT 0x21 and 0x6C. However, the failures reported have errors 0x5 (Access violation) and 0x6# (Open failed). Windows/Defender team said that file-locking with these error-codes is not expected.
However, different AVs work differently about examining files.

We believe that the correct fix for this issue is to complete:

* To implement #3828 and #3829
* Ship AppHost with an extension/permissions not indicating an executable.

However the above is a fairly large change for servicing .net core 3.1.

So, this change implements a simpler fix intended for servicing 3.1 branch: Always retry the resource-update on Win32 error. While this may cause unnecessary retries on legitimate failures (ex: file missing) such cases are rare, because the SDK supplies the apphost, and the HostModel itself creates the file to update.

Low

dotnet/runtime#32347
@stimpy77
Copy link

stimpy77 commented May 1, 2020

When does this propagate to the wild? I have updated all the NuGet packages and have latest Visual Studio 2019 and it happens to me till I clean and sometimes re-clean and sometimes restart VS, and we also rely on Azure DevOps Pipelines (we haven't switched to .NET Core 3.x in the pipeline but won't till this is fixed).
EDIT: Disabling Trend Micro (antivirus) actually fixed this locally, but it isn't tenable as TM doesn't allow us to wildcard the folders for the exclusion list.

@swaroop-sridhar
Copy link
Contributor Author

@Anipik when is this release expected to be available? Thanks.

@glasssd
Copy link

glasssd commented May 4, 2020

Is there a beta patch available? Disabling virus protection on our on-prem Azure DevOps Agent workspace folder did not help.

@ghost ghost locked as resolved and limited conversation to collaborators Dec 10, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-HostModel Microsoft.NET.HostModel issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

HostWriter: ResourceUpdate fails non-deterministically
6 participants