This repository has been archived by the owner on May 6, 2020. It is now read-only.

Dockerhub image from linuxserver works with runc but not cc-runtime #986

Open
eadamsintel opened this issue Feb 2, 2018 · 14 comments

Comments

@eadamsintel

eadamsintel commented Feb 2, 2018

When testing a popular Docker Hub image called linuxserver/radarr (10 million pulls), you can't connect to port 7878 from a browser when using cc-runtime, but runc works as expected.

First create a config directory at /config

mkdir /config

Run the container and attempt to go to http://:7878. It works under runc but won't connect under cc-runtime.

docker run -d --runtime=runc --name=radarr -v /config:/config -p 7878:7878 linuxserver/radarr
This works and you can go to http://localhost:7878

docker run -d --runtime=cc-runtime --name=radarr -v /config:/config -p 7878:7878 linuxserver/radarr
This does not work and http://localhost:7878 times out
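
The same symptom can be checked without a browser. The following is only a sketch: `probe_port` is a hypothetical helper name, and the host and port are taken from the commands above. It tells apart a connection that is accepted, one that is refused, and one that times out. A successful connect only shows that something on the host accepted the connection, so it complements an actual HTTP request rather than replacing it.

```python
import socket


def probe_port(host: str, port: int, timeout: float = 3.0) -> str:
    """Attempt a TCP connect and classify the outcome.

    Returns "open", "refused", "timeout", or "error: <details>".
    """
    try:
        # create_connection resolves the host and tries each address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # Nothing is listening on that port.
        return "refused"
    except socket.timeout:
        # No answer within `timeout` seconds.
        return "timeout"
    except OSError as exc:
        # Anything else, e.g. name resolution failure or unreachable network.
        return f"error: {exc}"
```

For example, `probe_port("localhost", 7878)` could be run once with each runtime to compare the results.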

Trying the same thing with an nginx container works fine. The nginx container listens on port 80, but passing in 7878 as the host port still works.

cc-runtime version 3.0.16
runc version 1.0.0-rc4+dev
docker version 17.09.1
Clear Linux version 20650

@grahamwhaley
Contributor

Let's start with an ack - I can re-create the issue here as well.
Now, I wonder where the problem is - is it going to be inside the container/VM, or outside... my gut tells me it will be either:

  • something funky about mapping high number ports inside the container out to the host - maybe our kernel or agent namespacing is 'blocking' them somehow
  • or something slightly 'further out' onto the host side, to do with QEMU/KVM maybe.

I'm going to look into how we check out and track the port mappings both in the container (which might mean we have to enable the VM OS debug shell), and on the host side (which might mean digging into docker namespaces).

@sboeuf @amshinde - any ideas from your side around the agent/networking/port mapping side?

@sboeuf
Contributor

sboeuf commented Feb 22, 2018

@grahamwhaley no idea off the top of my head; this needs further investigation.

@jodh-intel
Contributor

Hi @eadamsintel - please can you:

@grahamwhaley
Contributor

I'm having a peek at this btw...

@grahamwhaley
Contributor

OK, some more info.
I noticed inside the container that with cc we are cycling through pids for

abc 2149 202 0 14:05 ? 00:00:00 mono --debug Radarr.exe -nobrows

whereas we don't with runc.

If you run the docker command with -ti and drop the -d, then you find that for cc we get a repeating

Press enter to exit...

prompt appearing over and over. I suspect therefore that something is upsetting and/or not working for the mono invocation, and it is stuck in a retry loop. Hence, the server is not up, so we cannot connect to the 7878 port. afaict, the port looks mapped on the host side btw - I think this is therefore likely not a portmap issue, but a mono execution issue.

@grahamwhaley
Contributor

Not sure how much this is going to help somebody (I have yet to digest it), but...

  • if you run the container with a bash shell
  • and go down to /var/run/s6/services/radarr
  • copy the run file there to a backup, and then make that run benign with something like a tail -f /dev/null to stop the system trying to restart the broken-ness
  • and then hand run the command:

cd /opt/radarr; mono --debug Radarr.exe --nobrowser -data=/config

Then I end up with:

[Fatal] ConsoleApp: EPIC FAIL!

[v0.2.0.935] NzbDrone.Core.Datastore.CorruptDatabaseException: Database file: /config/nzbdrone.db is corrupt, restore from backup if available. See: https://github.com/Radarr/Radarr/wiki/FAQ#i-am-getting-an-error-database-disk-image-is-malformed ---> System.Data.SQLite.SQLiteException: disk I/O error
disk I/O error
  at System.Data.SQLite.SQLite3.Prepare (System.Data.SQLite.SQLiteConnection cnn, System.String strSql, System.Data.SQLite.SQLiteStatement previous, System.UInt32 timeoutMS, System.String& strRemain) [0x0033c] in <61a20cde294d4a3eb43b9d9f6284613b>:0
  at System.Data.SQLite.SQLiteCommand.BuildNextCommand () [0x000f6] in <61a20cde294d4a3eb43b9d9f6284613b>:0
  at System.Data.SQLite.SQLiteCommand.GetStatement (System.Int32 index) [0x00008] in <61a20cde294d4a3eb43b9d9f6284613b>:0
  at (wrapper remoting-invoke-with-check) System.Data.SQLite.SQLiteCommand.GetStatement(int)
  at System.Data.SQLite.SQLiteDataReader.NextResult () [0x0011e] in <61a20cde294d4a3eb43b9d9f6284613b>:0
  at System.Data.SQLite.SQLiteDataReader..ctor (System.Data.SQLite.SQLiteCommand cmd, System.Data.CommandBehavior behave) [0x00090] in <61a20cde294d4a3eb43b9d9f6284613b>:0
  at (wrapper remoting-invoke-with-check) System.Data.SQLite.SQLiteDataReader..ctor(System.Data.SQLite.SQLiteCommand,System.Data.CommandBehavior)
  at System.Data.SQLite.SQLiteCommand.ExecuteReader (System.Data.CommandBehavior behavior) [0x0000c] in <61a20cde294d4a3eb43b9d9f6284613b>:0
  at System.Data.SQLite.SQLiteCommand.ExecuteNonQuery (System.Data.CommandBehavior behavior) [0x00006] in <61a20cde294d4a3eb43b9d9f6284613b>:0
  at System.Data.SQLite.SQLiteCommand.ExecuteNonQuery () [0x00006] in <61a20cde294d4a3eb43b9d9f6284613b>:0
  at System.Data.SQLite.SQLiteConnection.Open () [0x00959] in <61a20cde294d4a3eb43b9d9f6284613b>:0
  at FluentMigrator.Runner.Processors.GenericProcessorBase.EnsureConnectionIsOpen () [0x0000e] in C:\Users\Mark\Source\Repos\fluentmigrator\src\FluentMigrator.Runner\Processors\GenericProcessorBase.cs:54
  at FluentMigrator.Runner.Processors.SQLite.SQLiteProcessor.Exists (System.String template, System.Object[] args) [0x00000] in C:\Users\Mark\Source\Repos\fluentmigrator\src\FluentMigrator.Runner\Processors\SQLite\SQLiteProcessor.cs:78
  at FluentMigrator.Runner.Processors.SQLite.SQLiteProcessor.TableExists (System.String schemaName, System.String tableName) [0x00000] in C:\Users\Mark\Source\Repos\fluentmigrator\src\FluentMigrator.Runner\Processors\SQLite\SQLiteProcessor.cs:47
  at FluentMigrator.Runner.VersionLoader.get_AlreadyCreatedVersionTable () [0x00000] in C:\Users\Mark\Source\Repos\fluentmigrator\src\FluentMigrator.Runner\VersionLoader.cs:124
  at FluentMigrator.Runner.VersionLoader.LoadVersionInfo () [0x00028] in C:\Users\Mark\Source\Repos\fluentmigrator\src\FluentMigrator.Runner\VersionLoader.cs:160
  at FluentMigrator.Runner.VersionLoader..ctor (FluentMigrator.Runner.IMigrationRunner runner, FluentMigrator.Infrastructure.IAssemblyCollection assemblies, FluentMigrator.IMigrationConventions conventions) [0x00077] in C:\Users\Mark\Source\Repos\fluentmigrator\src\FluentMigrator.Runner\VersionLoader.cs:50
  at FluentMigrator.Runner.MigrationRunner..ctor (FluentMigrator.Infrastructure.IAssemblyCollection assemblies, FluentMigrator.Runner.Initialization.IRunnerContext runnerContext, FluentMigrator.IMigrationProcessor processor) [0x00167] in C:\Users\Mark\Source\Repos\fluentmigrator\src\FluentMigrator.Runner\MigrationRunner.cs:102
  at FluentMigrator.Runner.MigrationRunner..ctor (System.Reflection.Assembly assembly, FluentMigrator.Runner.Initialization.IRunnerContext runnerContext, FluentMigrator.IMigrationProcessor processor) [0x00000] in C:\Users\Mark\Source\Repos\fluentmigrator\src\FluentMigrator.Runner\MigrationRunner.cs:72
  at NzbDrone.Core.Datastore.Migration.Framework.MigrationController.Migrate (System.String connectionString, NzbDrone.Core.Datastore.Migration.Framework.MigrationContext migrationContext) [0x000b5] in C:\projects\radarr-usby1\src\NzbDrone.Core\Datastore\Migration\Framework\MigrationController.cs:58
  at NzbDrone.Core.Datastore.DbFactory.Create (NzbDrone.Core.Datastore.Migration.Framework.MigrationContext migrationContext) [0x00048] in C:\projects\radarr-usby1\src\NzbDrone.Core\Datastore\DbFactory.cs:84
   --- End of inner exception stack trace ---
  at NzbDrone.Core.Datastore.DbFactory.Create (NzbDrone.Core.Datastore.Migration.Framework.MigrationContext migrationContext) [0x00121] in C:\projects\radarr-usby1\src\NzbDrone.Core\Datastore\DbFactory.cs:116
  at NzbDrone.Core.Datastore.DbFactory.Create (NzbDrone.Core.Datastore.Migration.Framework.MigrationType migrationType) [0x00000] in C:\projects\radarr-usby1\src\NzbDrone.Core\Datastore\DbFactory.cs:56
  at NzbDrone.Core.Datastore.DbFactory.RegisterDatabase (NzbDrone.Common.Composition.IContainer container) [0x00000] in C:\projects\radarr-usby1\src\NzbDrone.Core\Datastore\DbFactory.cs:36
  at Radarr.Host.NzbDroneServiceFactory.Start () [0x00037] in C:\projects\radarr-usby1\src\NzbDrone.Host\ApplicationServer.cs:60
  at Radarr.Host.Router.Route (Radarr.Host.ApplicationModes applicationModes) [0x00067] in C:\projects\radarr-usby1\src\NzbDrone.Host\Router.cs:38
  at Radarr.Host.Bootstrap.Start (Radarr.Host.ApplicationModes applicationModes, NzbDrone.Common.EnvironmentInfo.StartupContext startupContext) [0x0003d] in C:\projects\radarr-usby1\src\NzbDrone.Host\Bootstrap.cs:71
  at Radarr.Host.Bootstrap.Start (NzbDrone.Common.EnvironmentInfo.StartupContext startupContext, Radarr.Host.IUserAlert userAlert, System.Action`1[T] startCallback) [0x00075] in C:\projects\radarr-usby1\src\NzbDrone.Host\Bootstrap.cs:39
  at NzbDrone.Console.ConsoleApp.Main (System.String[] args) [0x0000e] in C:\projects\radarr-usby1\src\NzbDrone.Console\ConsoleApp.cs:27

Press enter to exit...

ah, ok, that is a 'database fail' on /config, which smells like 9pfs issues to me... let's try...

mkdir /dev/shm/config
cd /opt/radarr; mono --debug Radarr.exe --nobrowser -data=/dev/shm/config

to place the db on a tmpfs (ramfs) in the container - and - voila - we don't get the catastrophic failure, and I can browse the container on 7878.
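
To compare directories quickly without starting Radarr, something like the sketch below could be run inside the container against both /config and /dev/shm/config. The helper name `sqlite_works_in` and the file name `probe.db` are made up for illustration. It does not reproduce Radarr's exact access pattern (locking, journal mode, migrations), so a pass does not prove the directory is safe, but a failure points at the filesystem rather than at mono.

```python
import os
import sqlite3
from typing import Tuple


def sqlite_works_in(directory: str) -> Tuple[bool, str]:
    """Create, write, and read back a tiny SQLite database in `directory`.

    Returns (True, "ok") on success, or (False, "<error>") if SQLite fails.
    """
    path = os.path.join(directory, "probe.db")  # hypothetical probe file name
    try:
        conn = sqlite3.connect(path)
        try:
            conn.execute("CREATE TABLE IF NOT EXISTS t (x INTEGER)")
            conn.execute("INSERT INTO t VALUES (1)")
            conn.commit()
            count = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
        finally:
            conn.close()
        return count >= 1, "ok"
    except sqlite3.Error as exc:
        # e.g. "disk I/O error" or "unable to open database file"
        return False, f"{type(exc).__name__}: {exc}"
    finally:
        # Clean up the probe file and any journal files SQLite left behind.
        for suffix in ("", "-journal", "-wal", "-shm"):
            try:
                os.remove(path + suffix)
            except OSError:
                pass
```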

/cc @eadamsintel - I think there is the root of the issue ;-)

@sboeuf
Contributor

sboeuf commented Feb 23, 2018

@grahamwhaley oh, nice and quick debug!
What's the next step? Because it's a 9p issue, does that mean we cannot expect this to work?

@grahamwhaley
Contributor

:-( I'd have to take the next step in debug to be decisive - we'd have to know exactly what failed with the 9pfs mounted files - I suspect it will be one of the 'unlink' related issues. Normally I use strace to find that, but for mono, which is a JIT'd VM, I wonder how well that will work? :-)

Short term, at least we know what the problem is.
Mid term, we could re-visit the 9p patch sets and also look at what runv is carrying and see if we can improve the situation.
Long term, we need a more POSIX compliant fs solution.
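
As an alternative to strace, the suspected 'unlink' behaviour can be probed directly. This is a sketch under the assumption that the failure is of the open-then-unlink kind, which is only a suspicion at this point; `unlinked_fd_usable` is a hypothetical helper name. It opens a file, unlinks it, and then keeps using the still-open descriptor, which a POSIX filesystem is expected to allow.

```python
import os
import tempfile
from typing import Tuple


def unlinked_fd_usable(directory: str) -> Tuple[bool, str]:
    """Check that a file in `directory` stays usable via its fd after unlink.

    Returns (True, "ok") if every operation succeeds, else (False, "<error>").
    """
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        try:
            os.unlink(path)               # remove the name, keep the fd open
            os.write(fd, b"probe")        # write through the orphaned fd
            os.fsync(fd)                  # force the data to the backing store
            os.fstat(fd)                  # stat by descriptor
            os.ftruncate(fd, 0)           # truncate by descriptor
            os.lseek(fd, 0, os.SEEK_SET)  # seek by descriptor
        except OSError as exc:
            return False, f"{type(exc).__name__}: {exc}"
        return True, "ok"
    finally:
        os.close(fd)
```

Running it against /config (the 9p mount) and /dev/shm/config inside the container would show whether the two filesystems differ on this point.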

@sboeuf
Contributor

sboeuf commented Feb 23, 2018

@grahamwhaley using devmapper might solve this issue then (unless the file that needs to be accessed is passed through 9p as an extra mount on top of the rootfs).

@grahamwhaley
Contributor

yeah, I considered that - it is a -v volume mapping, which I think always goes as a 9p mount, doesn't it? (/cc @amshinde) That surprised me a couple of weeks ago, but having seen a recent conversation, I think we don't block-mount volumes apart from the (readonly?) rootfs, as then the 'device' would be double mounted - once on the host and once in the guest - and there could then be fs write races between the two that [cw]ould corrupt the FS....

@sboeuf
Contributor

sboeuf commented Feb 23, 2018

Oh yeah... I hadn't realized this was a -v assignment. In this case, we use 9p because we don't have the ability to package that into a block device that we could hotplug...

@amshinde
Contributor

@grahamwhaley Yes, the -v bindings are always passed using 9pfs. We haven't implemented checks to verify whether the volume passed with -v is a mount backed by a block device. We do need to implement that, as we currently handle this case only with --device.

Maybe we can try this out: loop-mount an image, pass the loop device as --device /dev/loop#/config, and see if that helps.

@grahamwhaley
Contributor

That's an idea @amshinde - hmm, I wonder if that is viable as an interim 'hack' to mount volumes into the VMs as block devices, by a loopback and device mount. It's worth a try to see if it does work and fixes the issue initially anyhow... I'll add it to my list.

@sboeuf
Contributor

sboeuf commented Feb 23, 2018

This should work, but don't expect good performance.

5 participants