Mono support #62

Closed
rogeralsing opened this issue Mar 3, 2014 · 63 comments

Comments

@rogeralsing
Contributor

We need to ensure that Akka.NET works on mono.

@rogeralsing rogeralsing added this to the Version 1 milestone Mar 3, 2014
@Aaronontheweb
Member

Noob question: What are the steps for accomplishing this?

I've used Mono on OS X systems before but I'm not sure how to go about ensuring Mono compatibility with a library written from the ground-up in Visual Studio.

@rogeralsing
Contributor Author

We have to let this one wait; there is a bug in ConcurrentQueue on Mono. There is a fix for it, but I don't know when they will release it.

Other than that, it does seem to run on mono.

@rogeralsing
Contributor Author

https://bugzilla.xamarin.com/show_bug.cgi?id=18182

There is also some issue with remoting on mono, it doesn't load the transports.
I think it might be the Ask support that fails there.

@Aaronontheweb
Member

Got it. That's a bummer, but Mono support is probably outside the scope of Milestone 1 for the time being. I'm wholly supportive of doing it in the near future though.

@rogeralsing
Contributor Author

This is apparently patched in Mono 3.2.7; however, the latest installer for Windows is 3.2.3.
Will give it a try once there is an updated installer.

@ghost

ghost commented Mar 21, 2014

There is a 3.2.7 installer for Windows here: http://www.go-mono.com/mono-downloads/download.html

@rogeralsing
Contributor Author

The actual download points to 3.2.3, or am I missing something?

@ghost

ghost commented Mar 21, 2014

Right you are

@rogeralsing
Contributor Author

Got a pro tip from Greg Young: they keep a local copy of ConcurrentQueue in their codebase.
It inherits ConcurrentQueue<T> on .NET and uses custom code on Mono.
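
A minimal sketch of that approach, for illustration only. The `MONO` compilation symbol and the `MonoConcurrentQueue<T>` name are assumptions made here, and the lock-based fallback is a stand-in rather than the code Greg Young's project actually uses:

```csharp
using System.Collections.Generic;

#if MONO
// Assumed fallback for Mono builds: a plain lock-based FIFO queue
// exposing the small subset of the ConcurrentQueue<T> API used here.
public class MonoConcurrentQueue<T>
{
    private readonly Queue<T> _items = new Queue<T>();
    private readonly object _gate = new object();

    public void Enqueue(T item)
    {
        lock (_gate) { _items.Enqueue(item); }
    }

    public bool TryDequeue(out T item)
    {
        lock (_gate)
        {
            if (_items.Count > 0)
            {
                item = _items.Dequeue();
                return true;
            }
        }
        item = default(T);
        return false;
    }

    public int Count
    {
        get { lock (_gate) { return _items.Count; } }
    }
}
#else
// On Microsoft .NET, simply reuse the framework implementation.
public class MonoConcurrentQueue<T>
    : System.Collections.Concurrent.ConcurrentQueue<T>
{
}
#endif
```

Callers reference `MonoConcurrentQueue<T>` everywhere, so the choice of implementation is made once, at compile time.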

@mattnischan
Contributor

I have a Debian install with a freshly compiled 3.4 that I can run the tests on in the next day or two.

@mattnischan
Contributor

Ok,

Building is not going so well so far with Fake. I'm running debian stable and the fsharp package is not available, so I'm attempting to build it from sources, without success. I do have a linux build.sh which so far appears to do the right thing, though, just that Fake needs fsharpi to do its magic.

@mattnischan
Contributor

Using Fake.Boot seems to be a problem here. Fake.Boot uses a part of NuGet.Core which references WeakEventManager. In Mono, WeakEventManager is mostly a bunch of throw new NotImplementedException();. So, Akka.Net will not build under Mono right now. Not sure which layer to attack that at, Fake, NuGet, or Mono.

Two questions: 1) Are we set on Fake, and 2) should I build elsewhere and just run the tests in Mono for now?

@mattnischan
Contributor

Hey guys, I know I'm new to the party, but I would like to assist with this and I seem to mostly be talking to myself.

I've got a WeakEventManager implementation done now that NuGet will work with, but patching that into everyone's Mono builds that need to work with this is not really a release-worthy solution. Would it be acceptable to create a make script parallel to the FAKE build? The docs mention a Rake script also, but I don't see it in the current source.

@Aaronontheweb
Member

Hi @mattnischan

So the Rake script is gone - I'll try to update that in the documentation.

As for your suggestion:

  • Long-term: let's see if we can page the F# team @akkadotnet/fsharpteam to make a recommendation for how to add production-grade Mono support to FAKE.
  • Short-term: I think a build.mono.fsx file (or whatever) would be fine for the time being. Submit a PR and we'll take a look.

@Aaronontheweb
Member

@mattnischan btw, could you point out where in the documentation we mention the RAKE script? We've... expanded our docs a lot since I wrote that :p

@mattnischan
Contributor

Made some progress with getting things to build using FAKE. I had to totally bypass Fake.Boot for the time being. However:

CS0234: Actor/FSM.cs(4,14): The type or namespace name 'Claims' does not exist in the namespace 'System.Security'. Are you missing an assembly reference?

After some research, it looks like Mono 3.4 doesn't have any of the ClaimsIdentity stuff at all. Taking a look at the source for 3.6 (which is finally out), it seems like some quick and dirty parts of it are working, which may be sufficient. I'm going to blow away my dev VM and start over with 3.6.

In the meantime, the NuGet.targets for package restore is quite old and definitely doesn't work with xbuild, so I'll put in a PR to update that as well as adding a build.mono.fsx that at least won't barf.

@Aaronontheweb
Member

@mattnischan the FSM actor shouldn't depend on claims at all - looks like that was some garbage that might have accidentally been ReSharper'd into the file by yours truly. I'll send a PR to remove it.

Aaronontheweb added a commit that referenced this issue Aug 21, 2014
removed Security.Claims header from FSM, which broke Mono build per #62
@mattnischan
Contributor

@Aaronontheweb I pulled that change into my branch and we have build success!

I started running the tests in Akka.Tests but I'm finding the tests are hanging the xunit runner about 30-45 tests in (at different points each run). I'm not sure if there are race conditions or if the xunit runner is not totally Mono safe yet, but I'll try and start tracking those answers down.

mattnischan added a commit to mattnischan/akka.net that referenced this issue Aug 21, 2014
@rogeralsing
Contributor Author

What version of mono are you using?

@Aaronontheweb
Member

@mattnischan I think @HCanber found an issue with the XUnit test runner in general. Some of our tests randomly hang as well.

@HCanber
Contributor

HCanber commented Sep 2, 2014

Ahh, ok. :)

@OlegZee
Contributor

OlegZee commented Sep 26, 2014

Verified on Mono 3.8.0 (under OS X) and on 3.2.8 (on a Raspberry Pi). It hangs after the "Starting remoting" message. I'd like to contribute; I just need some fresh hints/directions :)

@Aaronontheweb
Member

@OlegZee Raspberry Pi - cool! Which example was it that hung on Mono?

@OlegZee
Contributor

OlegZee commented Sep 26, 2014

@Aaronontheweb I created my own sample based on the article "Balancing Workload Across Nodes with Akka 2".
I'd like to run the Workers in a heterogeneous environment, just as a PoC.

@Aaronontheweb
Member

@OlegZee very cool! Just a heads up though, the FluentConfiguration you're using to configure your application is experimental and may not make it into the v1.0 release (although it's really cool to see someone using it!)

So is it the Worker project's Program.cs that fails to start? I'm not a big Mono user, but the issue might very well be Helios (the socket middleware Akka.Remote runs on) - I've never tested to see if it can run stand-alone on Mono.

Here's something you could do that would really help: could you try cloning the Helios source and see if you can get this example to run on Mono? https://github.com/Aaronontheweb/helios/tree/master/samples/TimeService

If it doesn't then the issue is probably that some of my socket code has some Windows-specific dependencies.

@OlegZee
Contributor

OlegZee commented Sep 26, 2014

@Aaronontheweb The Worker program starts, prints "Starting remoting", and then stays silent. So I'm going to look into Helios. Thank you!

@OlegZee
Contributor

OlegZee commented Sep 27, 2014

@Aaronontheweb I forgot my MBP at the office, so I was mostly playing on Windows and an old Mac mini with no Mono debugger.
The good news is that Helios does work (I tried TimeService).
The bad news is that Akka does not work under Mono on Windows either. The log is quite long, but the behavior is different from Mono on OS X. Here are the logs under Windows and OS X.

As I have no clue what's going on inside Akka, I need a debugger to get a better feel for the code, but my installation of Xamarin under Windows starts the .NET debugger and runs the code under Microsoft's .NET, where it runs fine.

@rogeralsing
Contributor Author

@Aaronontheweb Does Helios use ConcurrentQueue? If so, you need to snatch the code we use in order to be compatible with earlier versions of Mono for Windows (the installer still ships the bugged code).

@mattnischan
Contributor

I'm getting this when running Worker.exe under Mono 3.8 on OS X:

Cause: System.Net.Sockets.SocketException: The requested address is not valid in this context
  at System.Net.Sockets.Socket.Bind (System.Net.EndPoint local_end) [0x00000] in <filename unknown>:0
  at Helios.Reactor.ReactorBase.Start () [0x00000] in <filename unknown>:0
  at Helios.Reactor.ReactorBase+ReactorConnectionAdapter.Open () [0x00000] in <filename unknown>:0
  at Akka.Remote.Transport.Helios.HeliosHelpers.Open () [0x00000] in <filename unknown>:0
  at Akka.Remote.Transport.Helios.CommonHandlers.Open () [0x00000] in <filename unknown>:0
  at Akka.Remote.Transport.Helios.HeliosTransport.Listen () [0x00000] in <filename unknown>:0
etc.

@OlegZee
Contributor

OlegZee commented Sep 27, 2014

@mattnischan I saw the same error when I set the wrong master path in the .config file. Once the config is correct, it just hangs.

@mattnischan
Contributor

OK,

I'm not familiar enough to know what is going on, but it seems to be hanging waiting for a Task result in Remoting.Start():

var addressPromise = new TaskCompletionSource<IList<ProtocolTransportAddressPair>>();
_endpointManager.Tell(new EndpointManager.Listen(addressPromise));

addressPromise.Task.Wait(Provider.RemoteSettings.StartupTimeout); //Hangs here
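
To illustrate the symptom in isolation, here is a small sketch that is independent of the Akka.NET source. `PromiseWaitSketch` and `WaitForPromise` are names made up for this example. It relies only on the documented behavior of `Task.Wait(TimeSpan)`, which returns `false` when the timeout elapses before the task completes:

```csharp
using System;
using System.Threading.Tasks;

public static class PromiseWaitSketch
{
    // Blocks the calling thread until the promise completes or the
    // timeout elapses. Returns true only if the promise was fulfilled.
    public static bool WaitForPromise(TaskCompletionSource<string> promise, TimeSpan timeout)
    {
        return promise.Task.Wait(timeout);
    }
}
```

If nothing ever calls `SetResult` on the promise, the caller stays blocked for the whole timeout, which with a long startup timeout is hard to tell apart from a hang. Checking the returned `bool` and logging on `false` would make the failure visible.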

@mattnischan
Contributor

Some more details,

In EndpointManager.OnReceive, the block inside Listens.ContinueWith is never hit, thus the promise from Remoting is never fulfilled. I'm not sure why.

I do see a Prune message come through OnReceive() periodically, but I don't see a message handler for it except in Accept(), which doesn't get hit.
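
The dependency described above can be sketched as follows. This is a simplified illustration, not the EndpointManager code: `ChainedPromiseSketch`, `Chain`, and the `int`/`string` payload types are assumptions chosen to keep the example short.

```csharp
using System;
using System.Threading.Tasks;

public static class ChainedPromiseSketch
{
    // Returns a task that is fulfilled from inside a continuation on
    // "listens". If "listens" never completes, the continuation never
    // runs and the returned task stays pending forever.
    public static Task<string> Chain(Task<int> listens)
    {
        var addressPromise = new TaskCompletionSource<string>();
        listens.ContinueWith(t =>
        {
            if (t.IsFaulted)
                addressPromise.TrySetException(t.Exception.InnerExceptions);
            else if (t.IsCanceled)
                addressPromise.TrySetCanceled();
            else
                addressPromise.TrySetResult("bound to port " + t.Result);
        });
        return addressPromise.Task;
    }
}
```

Under this shape, a breakpoint inside the continuation that is never hit points at the antecedent task: the thing to investigate is why `listens` itself never completes.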

@Aaronontheweb
Member

@rogeralsing the Helios Fibers use a BlockingCollection internally. Does that have any known compatibility issues with Mono?

@mattnischan @OlegZee so it sounds like the issue is that some part of the ProtocolStateActor doesn't hit the state it's supposed to hit (which completes the Promise you highlighted) in Mono.

Can you confirm the following for me?

  1. Are both workers able to successfully establish socket connections with each other? This happens as a prerequisite for the Endpoint and ProtocolStateActor.
  2. Is the ProtocolStateActor able to receive Heartbeat messages? This means that the connection is healthy and the FailureDetector on either end of the connection is working.

What this sounds like to me is an issue where either a network message is being dropped / never sent / never processed (the Associate message specifically) or there's something different about the TAP implementation on Mono that makes the chained promises coming from the AkkaProtocolHandle / ProtocolStateActor not work as expected.

Also: if you have a website you can link me to that can help me figure out how to put together a debugging setup for Mono on Windows or OS X, I'd appreciate it!

@mattnischan
Contributor

@Aaronontheweb If you've got OS X, just load up MonoDevelop and Mono 3.8 from the Mono websites. There's really not much more to it.

@mattnischan
Contributor

@Aaronontheweb If I set a breakpoint in the protected constructor for ProtocolStateActor, it doesn't get hit at all.

It's hard for me to tell without knowing more about Helios whether or not the socket connection is successful. I will say that HeliosTransport.Listen() doesn't appear to do anything funny, and newServerChannel.Open() seems to get an available port.

Occasionally when I pause debug after a hang, the call stack seems to indicate waiting on a WaitOne from BlockingCollection, called from Helios.Concurrency.Impl.DedicatedThreadPoolFiber.SpawnThreads.

@rogeralsing
Contributor Author

@Aaronontheweb It depends on how you use it.
If you just new up a BlockingCollection, it will have bugs, since it will initialize a ConcurrentQueue<T> behind the scenes.

If you initialize it with some other sort of backing store, then it should be safe.
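
As a rough sketch of the second option: `BlockingCollection<T>` has a constructor that accepts any `IProducerConsumerCollection<T>` as its backing store. The `LockedQueue<T>` and `BackingStoreSketch` types below are hypothetical, written for this example, and are not part of Akka.NET or Helios.

```csharp
using System;
using System.Collections;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical lock-based FIFO store that avoids ConcurrentQueue<T>.
public sealed class LockedQueue<T> : IProducerConsumerCollection<T>
{
    private readonly Queue<T> _items = new Queue<T>();
    private readonly object _gate = new object();

    public bool TryAdd(T item)
    {
        lock (_gate) { _items.Enqueue(item); }
        return true;
    }

    public bool TryTake(out T item)
    {
        lock (_gate)
        {
            if (_items.Count > 0)
            {
                item = _items.Dequeue();
                return true;
            }
        }
        item = default(T);
        return false;
    }

    public T[] ToArray()
    {
        lock (_gate) { return _items.ToArray(); }
    }

    public void CopyTo(T[] array, int index)
    {
        lock (_gate) { _items.CopyTo(array, index); }
    }

    void ICollection.CopyTo(Array array, int index)
    {
        lock (_gate) { ((ICollection)_items).CopyTo(array, index); }
    }

    public int Count
    {
        get { lock (_gate) { return _items.Count; } }
    }

    bool ICollection.IsSynchronized { get { return true; } }
    object ICollection.SyncRoot { get { return _gate; } }

    // Enumerates a snapshot so callers never observe concurrent mutation.
    public IEnumerator<T> GetEnumerator()
    {
        return ((IEnumerable<T>)ToArray()).GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}

public static class BackingStoreSketch
{
    // Builds a bounded BlockingCollection on top of the custom store.
    public static BlockingCollection<Action> Create(int capacity)
    {
        return new BlockingCollection<Action>(new LockedQueue<Action>(), capacity);
    }
}
```

A coarse lock is slower than a lock-free queue under contention, so this trades throughput for independence from the affected Mono versions.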

@mattnischan
Contributor

Well, I'm testing on Mono 3.8, and I think OlegZee was also. Mono 3.6 and above have the fix for ConcurrentQueue.

@OlegZee
Contributor

OlegZee commented Oct 1, 2014

OK, everything I found is fairly clear even without debugging: something is wrong with transport initialization. It's hard to track the code flow, though, so I give up.

@Aaronontheweb
Member

So this might be the culprit here: https://github.com/Aaronontheweb/helios/blob/master/src/Helios/Concurrency/Impl/DedicatedThreadPoolFiber.cs#L18

public class DedicatedThreadPoolFiber : IFiber
{
    private readonly int _numThreads;
    private List<Thread> _threads;

    private readonly BlockingCollection<Action> _blockingCollection = new BlockingCollection<Action>(25000);

    public DedicatedThreadPoolFiber(int numThreads)
        : this(new BasicExecutor(), numThreads)
    {
    }

> Occasionally when I pause debug after a hang, the call stack seems to indicate waiting on a WaitOne from BlockingCollection, called from Helios.Concurrency.Impl.DedicatedThreadPoolFiber.SpawnThreads.

@mattnischan It sounds like, as @rogeralsing described, the infamous ConcurrentQueue bug might be rearing its head under the hood here, no?

@mattnischan
Contributor

@Aaronontheweb I don't think so. As was mentioned previously, we're both testing on versions of Mono (3.6+) that don't suffer from the same race condition in ConcurrentQueue that previous versions did. That's not to say that there might not be another race condition in it still, but I know the one that was previously reported has been resolved.

@mattnischan
Contributor

I wonder if the WaitOne is really just a false positive while GetConsumingEnumerable blocks looking for queue items. That would seem to indicate a transport problem, but I don't know why that would prevent the ActorSystem with remoting from starting. Does an actor system using remoting communicate internally over the transport even if the system actors are local?
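
For reference, a minimal sketch of why an idle consumer shows up as a blocked thread. This is not the Helios implementation; `IdleConsumerSketch` and `Run` are names invented for this example, and only the capacity of 25000 is taken from the snippet quoted earlier.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

public static class IdleConsumerSketch
{
    // Starts one worker that drains the queue, feeds it a single action,
    // then shuts down. Returns how many actions the worker executed.
    public static int Run()
    {
        var queue = new BlockingCollection<Action>(25000);
        int executed = 0;

        var worker = new Thread(() =>
        {
            // Blocks inside the enumerator whenever the queue is empty.
            // A debugger paused here reports a wait, not a deadlock.
            foreach (var action in queue.GetConsumingEnumerable())
            {
                action();
            }
        });
        worker.IsBackground = true;
        worker.Start();

        queue.Add(() => Interlocked.Increment(ref executed));
        queue.CompleteAdding(); // ends the enumeration once drained
        worker.Join();
        return executed;
    }
}
```

A worker parked in `GetConsumingEnumerable` is therefore the normal state of an empty queue, so that call stack alone does not show where the startup is stuck.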

@Aaronontheweb
Member

> Does an actor system using remoting communicate internally over the transport even if the system actors are local?

No, remoting is only used when communicating with actors who have a non-local address. All local communication is still done in-memory.

If the actor system itself is locking, it's probably not a transport / Helios issue - it's probably something inside the initialization steps for the RemoteActorRefProvider that's blocking.

@OlegZee
Contributor

OlegZee commented Oct 3, 2014

@Aaronontheweb @mattnischan I replaced the BlockingCollection with a List guarded by a couple of locks (plus taking a snapshot) before I figured out that _blockingCollection.Add() is never called, not even once. So BlockingCollection is not the showstopper here.

As far as I understand, the transport is activated only after AssociationListenerPromise is raised, but that never happens. As a result, EndpointManager.Listens never completes.

@rogeralsing
Contributor Author

related to #694

@rogeralsing
Contributor Author

This works fine now. Closing.
