Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OPC Publisher 2.9.8 can't startup when in docker container when cgroup v2 is used #2261

Closed
morti12 opened this issue Jun 21, 2024 · 13 comments
Closed
Labels
bug Something isn't working
Milestone

Comments

@morti12
Copy link

morti12 commented Jun 21, 2024

Describe the bug

On Linux machines where the cgroup v2 is used an exception gets thrown when the /sys/fs/cgroup/memory.max contains the default value max

Unhandled exception. Autofac.Core.DependencyResolutionException: An exception was thrown while activating Azure.IIoT.OpcUa.Publisher.Services.PublisherDiagnosticCollector -> λ:Microsoft.Extensions.Diagnostics.ResourceMonitoring.IResourceMonitor -> Microsoft.Extensions.Diagnostics.ResourceMonitoring.ResourceMonitorService -> Microsoft.Extensions.Diagnostics.ResourceMonitoring.Linux.LinuxUtilizationProvider.
---> Autofac.Core.DependencyResolutionException: An exception was thrown while invoking the constructor 'Void .ctor(Microsoft.Extensions.Options.IOptions`1[Microsoft.Extensions.Diagnostics.ResourceMonitoring.ResourceMonitoringOptions], Microsoft.Extensions.Diagnostics.ResourceMonitoring.Linux.ILinuxUtilizationParser, System.Diagnostics.Metrics.IMeterFactory, System.TimeProvider)' on type 'LinuxUtilizationProvider'.
---> System.InvalidOperationException: Could not parse '/sys/fs/cgroup/memory.max' content. Expected to find available memory in bytes but got 'max
' instead.
   at Microsoft.Shared.Diagnostics.Throw.InvalidOperationException(String message)
   at Microsoft.Extensions.Diagnostics.ResourceMonitoring.Linux.LinuxUtilizationParserCgroupV2.GetAvailableMemoryInBytes()
   at Microsoft.Extensions.Diagnostics.ResourceMonitoring.Linux.LinuxUtilizationProvider..ctor(IOptions`1 options, ILinuxUtilizationParser parser, IMeterFactory meterFactory, TimeProvider timeProvider)
   at lambda_method223(Closure, Object[])
   at Autofac.Core.Activators.Reflection.BoundConstructor.Instantiate()
   --- End of inner exception stack trace ---
   at Autofac.Core.Activators.Reflection.BoundConstructor.Instantiate()
   at Autofac.Core.Activators.Reflection.ReflectionActivator.<>c__DisplayClass14_0.<UseSingleConstructorActivation>b__0(ResolveRequestContext context, Action`1 next)
   at Autofac.Core.Resolving.Middleware.DelegateMiddleware.Execute(ResolveRequestContext context, Action`1 next)
   at Autofac.Core.Resolving.Pipeline.ResolvePipelineBuilder.<>c__DisplayClass14_0.<BuildPipeline>b__1(ResolveRequestContext context)
   at Autofac.Core.Resolving.Middleware.DisposalTrackingMiddleware.Execute(ResolveRequestContext context, Action`1 next)
   at Autofac.Core.Resolving.Pipeline.ResolvePipelineBuilder.<>c__DisplayClass14_0.<BuildPipeline>b__1(ResolveRequestContext context)
   at Autofac.Core.Resolving.Middleware.ActivatorErrorHandlingMiddleware.Execute(ResolveRequestContext context, Action`1 next)
   --- End of inner exception stack trace ---
   at Autofac.Core.Resolving.Middleware.ActivatorErrorHandlingMiddleware.Execute(ResolveRequestContext context, Action`1 next)
   at Autofac.Core.Resolving.Pipeline.ResolvePipelineBuilder.<>c__DisplayClass14_0.<BuildPipeline>b__1(ResolveRequestContext context)
   at Autofac.Core.Pipeline.ResolvePipeline.Invoke(ResolveRequestContext context)
   at Autofac.Core.Resolving.Middleware.RegistrationPipelineInvokeMiddleware.Execute(ResolveRequestContext context, Action`1 next)
   at Autofac.Core.Resolving.Pipeline.ResolvePipelineBuilder.<>c__DisplayClass14_0.<BuildPipeline>b__1(ResolveRequestContext context)
   at Autofac.Core.Resolving.Middleware.SharingMiddleware.<>c__DisplayClass5_0.<Execute>b__0()
   at Autofac.Core.Lifetime.LifetimeScope.CreateSharedInstance(Guid id, Func`1 creator)
   at Autofac.Core.Lifetime.LifetimeScope.CreateSharedInstance(Guid primaryId, Nullable`1 qualifyingId, Func`1 creator)
   at Autofac.Core.Resolving.Middleware.SharingMiddleware.Execute(ResolveRequestContext context, Action`1 next)
   at Autofac.Core.Resolving.Pipeline.ResolvePipelineBuilder.<>c__DisplayClass14_0.<BuildPipeline>b__1(ResolveRequestContext context)
   at Autofac.Core.Resolving.Pipeline.ResolvePipelineBuilder.<>c__DisplayClass14_0.<BuildPipeline>b__1(ResolveRequestContext context)
   at Autofac.Core.Resolving.Middleware.CircularDependencyDetectorMiddleware.Execute(ResolveRequestContext context, Action`1 next)
   at Autofac.Core.Resolving.Pipeline.ResolvePipelineBuilder.<>c__DisplayClass14_0.<BuildPipeline>b__1(ResolveRequestContext context)
   at Autofac.Extensions.DependencyInjection.KeyedServiceMiddleware.Execute(ResolveRequestContext context, Action`1 next)
   at Autofac.Core.Resolving.Pipeline.ResolvePipelineBuilder.<>c__DisplayClass14_0.<BuildPipeline>b__1(ResolveRequestContext context)
   at Autofac.Core.Pipeline.ResolvePipeline.Invoke(ResolveRequestContext context)
   at Autofac.Core.Resolving.ResolveOperation.GetOrCreateInstance(ISharingLifetimeScope currentOperationScope, ResolveRequest& request)
   at Autofac.Core.Resolving.ResolveOperation.ExecuteOperation(ResolveRequest& request)
   at Autofac.Core.Resolving.ResolveOperation.Execute(ResolveRequest& request)
   at Autofac.Core.Lifetime.LifetimeScope.ResolveComponent(ResolveRequest& request)
   at Autofac.Core.Container.ResolveComponent(ResolveRequest& request)
   at Autofac.Core.Container.Autofac.IComponentContext.ResolveComponent(ResolveRequest& request)
   at Autofac.Builder.StartableManager.StartStartableComponents(IDictionary`2 properties, IComponentContext componentContext)
   at Autofac.ContainerBuilder.Build(ContainerBuildOptions options)
   at Autofac.Extensions.DependencyInjection.AutofacServiceProviderFactory.CreateServiceProvider(ContainerBuilder containerBuilder)
   at Microsoft.Extensions.Hosting.Internal.ServiceFactoryAdapter`1.CreateServiceProvider(Object containerBuilder)
   at Microsoft.Extensions.Hosting.HostBuilder.InitializeServiceProvider()
   at Microsoft.Extensions.Hosting.HostBuilder.Build()
   at Azure.IIoT.OpcUa.Publisher.Module.Program.RunAsync(String[] args, CancellationToken ct) in /__w/1/s/Industrial-IoT/src/Azure.IIoT.OpcUa.Publisher.Module/src/Program.cs:line 71
   at Azure.IIoT.OpcUa.Publisher.Module.Program.Main(String[] args) in /__w/1/s/Industrial-IoT/src/Azure.IIoT.OpcUa.Publisher.Module/src/Program.cs:line 59

This blocks the startup of the OPC Publisher.

We also tried to specify the max memory over the argument --memory=2g for the docker container. This temporally fixes the issue for the memory parsing, but then the cpu.max throws an error.

To Reproduce

  • I can only reproduce it with our host server on RHEL 9.3 in a docker container.
  • Just wanted to start up the OPC Publisher in the docker container with the image mcr.microsoft.com/iotedge/opc-publisher:2.9.8.

Expected behavior

  • No exception is thrown and the Publisher startups up
  • ?When there is an error in the ResourceMonitoring the OPC Publisher still can startup?

Screenshots
If applicable, add screenshots to help explain your problem.

Desktop (please complete the following information):

$ cat /etc/os-release
NAME="Red Hat Enterprise Linux"
VERSION="9.3 (Plow)"
ID="rhel"
ID_LIKE="fedora"
VERSION_ID="9.3"
PLATFORM_ID="platform:el9"
PRETTY_NAME="Red Hat Enterprise Linux 9.3 (Plow)"
ANSI_COLOR="0;31"
LOGO="fedora-logo-icon"
CPE_NAME="cpe:/o:redhat:enterprise_linux:9::baseos"
HOME_URL="[https://www.redhat.com/"](https://www.redhat.com/%22)
DOCUMENTATION_URL="[https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/9"](https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/9%22)
BUG_REPORT_URL="[https://bugzilla.redhat.com/"](https://bugzilla.redhat.com/%22)
 
REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 9"
REDHAT_BUGZILLA_PRODUCT_VERSION=9.3
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="9.3"

Docker

Docker version 26.1.4-1

Additional context
Between OPC publisher versino 2.9.6 and 2.9.8 new metrics (services.AddResourceMonitoring();) were added which is actually nice 😊. This was done over a library called Microsoft.Extensions.Diagnostics.ResourceMonitoring. It seems that there is a bug in version 8.6.0 which can't handle the default value called max.

I created an issue for the bug in the library: dotnet/extensions#5241

@marcschier marcschier added the bug Something isn't working label Jun 21, 2024
@marcschier marcschier added this to the 2.9.9 milestone Jun 21, 2024
@marcschier
Copy link
Collaborator

Looks like workaround is to fallback to 8.4.0. That should be easy until it is fixed in an upcoming release.

@marcschier
Copy link
Collaborator

I reverted the package back, there is a preview build of 2.9.9 with 8.4.0 that has tag 2.9.9-preview2 (mcr.microsoft.com/iotedge/opc-publisher:2.9.9-preview2).

@morti12, it would be fantastic if you could try this (ASAP) and let me know if this fixes your issue? I would like to release 2.9.9 mid this week and this would help decide which build to go with. Otherwise (or there are CVE's introduced by the older version) we will go with 8.6.0 for the 2.9.9 release.

@marcschier marcschier modified the milestones: 2.9.9, 2.9.10 Jun 24, 2024
@marcschier
Copy link
Collaborator

Moving to 2.9.10 but waiting for validation on 2.9.9-preview2 that the issue is worked around.

@morti12
Copy link
Author

morti12 commented Jun 25, 2024

We tried it with the opc-publisher:2.9.9-preview2 image and we still get an error but a different one:

Unhandled exception. Autofac.Core.DependencyResolutionException: An exception was thrown while activating Azure.IIoT.OpcUa.Publisher.Services.PublisherDiagnosticCollector -> λ:Microsoft.Extensions.Diagnostics.ResourceMonitoring.IResourceMonitor -> Microsoft.Extensions.Diagnostics.ResourceMonitoring.ResourceMonitorService -> Microsoft.Extensions.Diagnostics.ResourceMonitoring.Linux.LinuxUtilizationProvider.
---> Autofac.Core.DependencyResolutionException: An exception was thrown while invoking the constructor 'Void .ctor(Microsoft.Extensions.Options.IOptions`1[Microsoft.Extensions.Diagnostics.ResourceMonitoring.ResourceMonitoringOptions], Microsoft.Extensions.Diagnostics.ResourceMonitoring.Linux.LinuxUtilizationParser, System.Diagnostics.Metrics.IMeterFactory, System.TimeProvider)' on type 'LinuxUtilizationProvider'.
---> System.IO.DirectoryNotFoundException: Could not find a part of the path '/sys/fs/cgroup/memory/memory.limit_in_bytes'.

We checked the version 8.4.0 of the Microsoft.Extensions.Diagnostics.ResourceMonitoring and it does not support cgroup v2. This was added in the version 8.5.0.

Would it be an option to just disable the ResourceMonitoring over an command line option until they fix this issue in the ResourceMonitor?

@marcschier
Copy link
Collaborator

That is a bummer. If you are ok with a switch, I will add this to 2.9.9.

@marcschier marcschier modified the milestones: 2.9.10, 2.9.9 Jun 25, 2024
@morti12
Copy link
Author

morti12 commented Jun 25, 2024

This would be really nice 👍 Thanks! I think the fix in the ResourceMonitoring library will take some time.

@marcschier
Copy link
Collaborator

@morti12, could you try out the "2.9.9-preview3" tag of opc-publisher container image, running it with the --dr command line option? It will disable resource monitoring. if it works for you, let me know, we will release 2.9.9 then this week (barring any unforeseen issues).

I will track this again under 2.9.10 and monitor the linked issue.

@marcschier marcschier modified the milestones: 2.9.9, 2.9.10 Jun 25, 2024
@morti12
Copy link
Author

morti12 commented Jun 26, 2024

@marcschier, we checked it and it works. The OPC Publisher 2.9.9-preview3 is starting up 💪 thanks 👍

@marcschier
Copy link
Collaborator

2.9.9 image has been released.

@alxy
Copy link

alxy commented Jul 9, 2024

I have just tried to upgrade as well to 2.9.9-preview3 in a fairly standard Iot edge setup (running on Ubuntu, IotEdge is installed as per documentation), and I got the same problem:

                                   2.9.9.3+926a31bdda (.NET 8.0.6/linux-x64/OPC Stack 1.5.374.70)

Unhandled exception. Autofac.Core.DependencyResolutionException: An exception was thrown while activating Azure.IIoT.OpcUa.Publisher.Services.PublisherDiagnosticCollector -> λ:Microsoft.Extensions.Diagnostics.ResourceMonitoring.IResourceMonitor -> Microsoft.Extensions.Diagnostics.ResourceMonitoring.ResourceMonitorService -> Microsoft.Extensions.Diagnostics.ResourceMonitoring.Linux.LinuxUtilizationProvider.
 ---> Autofac.Core.DependencyResolutionException: An exception was thrown while invoking the constructor 'Void .ctor(Microsoft.Extensions.Options.IOptions`1[Microsoft.Extensions.Diagnostics.ResourceMonitoring.ResourceMonitoringOptions], Microsoft.Extensions.Diagnostics.ResourceMonitoring.Linux.ILinuxUtilizationParser, System.Diagnostics.Metrics.IMeterFactory, System.TimeProvider)' on type 'LinuxUtilizationProvider'.
 ---> System.InvalidOperationException: Could not parse '/sys/fs/cgroup/memory.max' content. Expected to find available memory in bytes but got 'max
' instead.
   at Microsoft.Shared.Diagnostics.Throw.InvalidOperationException(String message)
   at Microsoft.Extensions.Diagnostics.ResourceMonitoring.Linux.LinuxUtilizationParserCgroupV2.GetAvailableMemoryInBytes()
   at Microsoft.Extensions.Diagnostics.ResourceMonitoring.Linux.LinuxUtilizationProvider..ctor(IOptions`1 options, ILinuxUtilizationParser parser, IMeterFactory meterFactory, TimeProvider timeProvider)
   at lambda_method229(Closure, Object[])
   at Autofac.Core.Activators.Reflection.BoundConstructor.Instantiate()
   --- End of inner exception stack trace ---
   at Autofac.Core.Activators.Reflection.BoundConstructor.Instantiate()
   at Autofac.Core.Activators.Reflection.ReflectionActivator.<>c__DisplayClass14_0.<UseSingleConstructorActivation>b__0(ResolveRequestContext context, Action`1 next)
   at Autofac.Core.Resolving.Middleware.DelegateMiddleware.Execute(ResolveRequestContext context, Action`1 next)
   at Autofac.Core.Resolving.Pipeline.ResolvePipelineBuilder.<>c__DisplayClass14_0.<BuildPipeline>b__1(ResolveRequestContext context)
   at Autofac.Core.Resolving.Middleware.DisposalTrackingMiddleware.Execute(ResolveRequestContext context, Action`1 next)
   at Autofac.Core.Resolving.Pipeline.ResolvePipelineBuilder.<>c__DisplayClass14_0.<BuildPipeline>b__1(ResolveRequestContext context)
   at Autofac.Core.Resolving.Middleware.ActivatorErrorHandlingMiddleware.Execute(ResolveRequestContext context, Action`1 next)
   --- End of inner exception stack trace ---
   at Autofac.Core.Resolving.Middleware.ActivatorErrorHandlingMiddleware.Execute(ResolveRequestContext context, Action`1 next)
   at Autofac.Core.Resolving.Pipeline.ResolvePipelineBuilder.<>c__DisplayClass14_0.<BuildPipeline>b__1(ResolveRequestContext context)
   at Autofac.Core.Pipeline.ResolvePipeline.Invoke(ResolveRequestContext context)
   at Autofac.Core.Resolving.Middleware.RegistrationPipelineInvokeMiddleware.Execute(ResolveRequestContext context, Action`1 next)
   at Autofac.Core.Resolving.Pipeline.ResolvePipelineBuilder.<>c__DisplayClass14_0.<BuildPipeline>b__1(ResolveRequestContext context)
   at Autofac.Core.Resolving.Middleware.SharingMiddleware.<>c__DisplayClass5_0.<Execute>b__0()
   at Autofac.Core.Lifetime.LifetimeScope.CreateSharedInstance(Guid id, Func`1 creator)
   at Autofac.Core.Lifetime.LifetimeScope.CreateSharedInstance(Guid primaryId, Nullable`1 qualifyingId, Func`1 creator)
   at Autofac.Core.Resolving.Middleware.SharingMiddleware.Execute(ResolveRequestContext context, Action`1 next)
   at Autofac.Core.Resolving.Pipeline.ResolvePipelineBuilder.<>c__DisplayClass14_0.<BuildPipeline>b__1(ResolveRequestContext context)
   at Autofac.Core.Resolving.Pipeline.ResolvePipelineBuilder.<>c__DisplayClass14_0.<BuildPipeline>b__1(ResolveRequestContext context)
   at Autofac.Core.Resolving.Middleware.CircularDependencyDetectorMiddleware.Execute(ResolveRequestContext context, Action`1 next)
   at Autofac.Core.Resolving.Pipeline.ResolvePipelineBuilder.<>c__DisplayClass14_0.<BuildPipeline>b__1(ResolveRequestContext context)
   at Autofac.Extensions.DependencyInjection.KeyedServiceMiddleware.Execute(ResolveRequestContext context, Action`1 next)
   at Autofac.Core.Resolving.Pipeline.ResolvePipelineBuilder.<>c__DisplayClass14_0.<BuildPipeline>b__1(ResolveRequestContext context)
   at Autofac.Core.Pipeline.ResolvePipeline.Invoke(ResolveRequestContext context)
   at Autofac.Core.Resolving.ResolveOperation.GetOrCreateInstance(ISharingLifetimeScope currentOperationScope, ResolveRequest& request)
   at Autofac.Core.Resolving.ResolveOperation.ExecuteOperation(ResolveRequest& request)
   at Autofac.Core.Resolving.ResolveOperation.Execute(ResolveRequest& request)
   at Autofac.Core.Lifetime.LifetimeScope.ResolveComponent(ResolveRequest& request)
   at Autofac.Core.Container.ResolveComponent(ResolveRequest& request)
   at Autofac.Core.Container.Autofac.IComponentContext.ResolveComponent(ResolveRequest& request)
   at Autofac.Builder.StartableManager.StartStartableComponents(IDictionary`2 properties, IComponentContext componentContext)
   at Autofac.ContainerBuilder.Build(ContainerBuildOptions options)
   at Autofac.Extensions.DependencyInjection.AutofacServiceProviderFactory.CreateServiceProvider(ContainerBuilder containerBuilder)
   at Microsoft.Extensions.Hosting.Internal.ServiceFactoryAdapter`1.CreateServiceProvider(Object containerBuilder)
   at Microsoft.Extensions.Hosting.HostBuilder.InitializeServiceProvider()
   at Microsoft.Extensions.Hosting.HostBuilder.Build()
   at Azure.IIoT.OpcUa.Publisher.Module.Program.RunAsync(String[] args, CancellationToken ct) in /__w/1/s/Industrial-IoT/src/Azure.IIoT.OpcUa.Publisher.Module/src/Program.cs:line 71
   at Azure.IIoT.OpcUa.Publisher.Module.Program.Main(String[] args) in /__w/1/s/Industrial-IoT/src/Azure.IIoT.OpcUa.Publisher.Module/src/Program.cs:line 59

Do I need to do something else to get this release running?

@morti12
Copy link
Author

morti12 commented Jul 9, 2024

In the preview 2.9.9-preview3 @marcschier introduced an cli argument / flag to disable the Resource Monitoring. You just need to add the --dr argument on the startup. Changes can be found here: 926a31b

@marcschier marcschier modified the milestones: 2.9.10, 2.9.11 Jul 27, 2024
@marcschier marcschier modified the milestones: 2.9.11, 2.9.12 Aug 8, 2024
@marcschier
Copy link
Collaborator

Still waiting for official build of the nuget, no updates therefore moving to 2.9.12.

@marcschier marcschier modified the milestones: 2.9.12, 2.9.11 Aug 15, 2024
@marcschier
Copy link
Collaborator

This appears went into 8.8.0 nuget (release/8.8 branch, but there is no tag yet in this repo nor release note). 2.9.11 uses 8.8.0 nuget so will close this in 2.9.11 now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants