Marathon on CENTOS 7 Fails to start #7136
Comments
I have the same issue. I deleted everything in /var/lib/mesos and /var/lib/zookeeper and I still have this shit. I reported almost a year ago that mesos-dns does not report IP addresses correctly, and it is still not fixed. |
Which version are you using? |
1.9.109-1.el7, and 1.9.136-1.el7 behaves the same. |
@f1-outsourcing, you should not blindly delete files. Also, remember that this is all open source. We are happy to accept bug fixes from you. @seanfulton, sorry for replying so late. Do you still have this issue? If I understood correctly, Marathon fails to load the old state. Was this after an upgrade? |
I am deleting files and removing configuration options to see whether that changes anything. Whatever I change, I am only able to get marathon-1.7.216-9e2a9b579 working with mesos-1.10.0-2.0.1.

This is what I posted to the marathon-framework mailing list:

All of a sudden I am having problems with the Marathon UI getting stuck at 'loading', and endpoints like http://m01.local:8081/v2/info are not responding (http://m01.local:8081/ping gives pong). I have now downgraded the test cluster to one node, running only mesos-master, zookeeper and marathon, and I clean the /var/lib/zookeeper and /var/lib/mesos directories between tests. I have also removed many of the configuration options I had, like ssl etc.

I am only able to run marathon-1.7.216-9e2a9b579. marathon-1.8.222-86475ddac and marathon-1.10.17-c427ce965 show the above-mentioned problem.

I have been comparing the marathon 1.7 and marathon 1.8 logs, and this is what I have noticed: quite a few log statements are missing between 'All services up and running. (mesosphere.marathon.MarathonApp:main' and 'akka://marathon/deadLetters' in the 1.8 log. Has anyone had something similar?

[@mesos-master]# rpm -qa | grep java
[@mesos-master]# uname -a
[@mesos-master]# cat /etc/redhat-release
CentOS Linux release 7.8.2003 (Core)

marathon 1.8 (unresponsive)
Jun 7 17:40:59 m01 marathon: [2020-06-07 17:40:59,696] INFO All services up and running. (mesosphere.marathon.MarathonApp:main)
Jun 7 17:41:13 m01 marathon: [2020-06-07 17:41:13,879] INFO Message [mesosphere.marathon.MarathonSchedulerActor$TasksReconciled$] from Actor[akka://marathon/user/MarathonScheduler/$a#1746491390] to Actor[akka://marathon/deadLetters] was not delivered. [1] dead letters encountered. If this is not an expected behavior, then [Actor[akka://marathon/deadLetters]] may have terminated unexpectedly, This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'. (akka.actor.DeadLetterActorRef:marathon-akka.actor.default-dispatcher-7)

marathon 1.7 (ok)
Jun 7 17:37:02 m01 marathon: [2020-06-07 17:37:02,681] INFO All services up and running. (mesosphere.marathon.MarathonApp:main)
Jun 7 17:37:16 m01 marathon: [2020-06-07 17:37:16,459] INFO Message [mesosphere.marathon.MarathonSchedulerActor$TasksReconciled$] from Actor[akka://marathon/user/MarathonScheduler/$a#-463341905] to Actor[akka://marathon/deadLetters] was not delivered. [1] dead letters encountered. If this is not an expected behavior, then [Actor[akka://marathon/deadLetters]] may have terminated unexpectedly, This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'. (akka.actor.DeadLetterActorRef:marathon-akka.actor.default-dispatcher-8) |
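The log comparison described above is easier to do mechanically once the parts of each line that always differ between runs (syslog prefix, timestamp, actor id, dispatcher thread number) are masked out. The following is only a rough sketch of that idea; the regular expressions are assumptions based on the log lines quoted in this thread, and `normalize` and `missing_messages` are hypothetical helper names, not part of Marathon.

```python
import re

# Patterns inferred from the log lines quoted in this thread (assumptions).
SYSLOG_PREFIX = re.compile(
    r"^[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s+\S+\s+\S+?:\s+"
)
TIMESTAMP = re.compile(r"\[\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}\]\s*")
ACTOR_ID = re.compile(r"#-?\d+")
THREAD_NO = re.compile(r"(dispatcher|Thread)-\d+")


def normalize(line):
    """Mask the parts of a log line that differ between otherwise equal runs."""
    line = SYSLOG_PREFIX.sub("", line.strip())
    line = TIMESTAMP.sub("", line)
    line = ACTOR_ID.sub("#N", line)
    line = THREAD_NO.sub(r"\1-N", line)
    return line.strip()


def missing_messages(good_lines, bad_lines):
    """Return normalized messages that occur in the good log but not the bad one."""
    seen_in_bad = {normalize(line) for line in bad_lines}
    missing = []
    for line in good_lines:
        message = normalize(line)
        if message and message not in seen_in_bad and message not in missing:
            missing.append(message)
    return missing
```

Feeding the lines of the working 1.7 log as `good_lines` and the lines of the unresponsive 1.8 log as `bad_lines` would list the statements that appear only in the working run, in their original order.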
@f1-outsourcing, could you attach the complete logs of Marathon 1.8 from when you start until you made a request to |
Is this useful? |
Is this being looked at still? |
If your strategy at D2iQ/mesosphere is to give 'shitty' support to marathon because you want to push people into using DCOS, you should consider that there is a flip side to that approach. I perceive this as:
Someone else reported the same issue[1] on your JIRA in March, along with his workaround of downgrading to Marathon 1.7. He also did not get any attention for 6 months. Whether or not your software is open source, you should attend more quickly to issues that force people to downgrade so many versions. |
Also not working. What about this message:
[info] [2020-07-18 13:14:42,675] INFO Message [mesosphere.marathon.MarathonSchedulerActor$TasksReconciled$] from Actor[akka://marathon/user/MarathonScheduler/$a#-1125671270] to Actor[akka://marathon/deadLetters] was not delivered. [1] dead letters encountered. If this is not an expected behavior, then [Actor[akka://marathon/deadLetters]] may have terminated unexpectedly, This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'. (akka.actor.DeadLetterActorRef:marathon-akka.actor.default-dispatcher-8) |
debug log of http://test2.local:7070/v2/info
[info] [2020-07-18 13:26:09,845] DEBUG Current State: LaunchTokens:100 OffersWanted:false Matchers:0 OfferQueues:0 UnprocessedOffers:0 (mesosphere.marathon.core.matcher.manager.impl.OfferMatcherManagerActor:marathon-akka.actor.default-dispatcher-9) |
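The symptom reported in this thread is that /ping answers with pong while /v2/info never returns. A small probe with a hard timeout makes that difference easy to reproduce and record for each Marathon version. This is a minimal sketch using only the Python standard library; `probe` and `compare_endpoints` are hypothetical helper names, and the 5 second timeout is an arbitrary assumption.

```python
import socket
import urllib.error
import urllib.request


def probe(url, timeout=5.0):
    """Issue one GET request and classify the outcome as a (status, detail) pair."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read(200).decode("utf-8", errors="replace")
            return ("ok", f"HTTP {resp.status}: {body.strip()}")
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status code.
        return ("http_error", f"HTTP {exc.code}")
    except urllib.error.URLError as exc:
        # Connection-phase failures; a connect timeout is wrapped in URLError.
        if isinstance(exc.reason, (socket.timeout, TimeoutError)):
            return ("timeout", f"no response within {timeout}s")
        return ("unreachable", str(exc.reason))
    except (socket.timeout, TimeoutError):
        # Connected, but the server never sent a response in time.
        return ("timeout", f"no response within {timeout}s")
    except OSError as exc:
        return ("unreachable", str(exc))


def compare_endpoints(base_url, paths=("/ping", "/v2/info"), timeout=5.0):
    """Probe each path under base_url and return {path: (status, detail)}."""
    return {path: probe(base_url + path, timeout) for path in paths}
```

Calling `compare_endpoints("http://test2.local:7070")` against an affected instance would be expected to report `ok` for `/ping` and `timeout` for `/v2/info`, which matches the behaviour described above; a healthy instance should report `ok` for both.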
tracelog of 1.8 after deploying a task with 1.7
[info] Loading project definition from /home/software/marathon2/project/project
[warn] Canceling execution...
[error] I0718 15:24:56.504509 1575186 sched.cpp:2166] Asked to stop the driver
[error] I0718 15:24:56.504631 1575172 sched.cpp:1204] Stopping framework 5262ced9-70e2-4c0d-9064-ab4173118409-0000 |
This is the v2/info request compared between 1.7.236 and 1.10.25.
1.7.236
1.10.25
1.10.25 the request ends with these lines
where 1.7.236 continues like this
|
I have a new install of marathon/mesos/zookeeper on centos 7. I am using the RPMs (1.9.109). Everything fires up OK, but when I go to the marathon interface, it spins with Loading Applications ... and never finishes. The only thing I can find in the logs is this on the master:
INFO Found no roles suitable for revive repetition. (mesosphere.marathon.core.launchqueue.impl.ReviveOffersStreamLogic$ReviveRepeaterLogic:marathon-akka.actor.default-dispatcher-3)
Feb 08 10:35:25 nj-dcos01-cl01 marathon[581]: [2020-02-08 10:35:25,513] INFO Found no roles suitable for revive repetition. (mesosphere.marathon.core.launchqueue.impl.ReviveOffersStreamLogic$ReviveRepeaterLogic:marathon-akka.actor.default-dispatcher-2)
Feb 08 10:35:30 nj-dcos01-cl01 marathon[581]: [2020-02-08 10:35:30,513] INFO Found no roles suitable for revive repetition. (mesosphere.marathon.core.launchqueue.impl.ReviveOffersStreamLogic$ReviveRepeaterLogic:marathon-akka.actor.default-dispatcher-7)
Feb 08 10:35:32 nj-dcos01-cl01 marathon[581]: [2020-02-08 10:35:32,723] INFO Prompting Mesos for a heartbeat via explicit task reconciliation (mesosphere.marathon.core.heartbeat.MesosHeartbeatMonitor$$anon$1:marathon-akka.actor.default-dispatcher-8)
Feb 08 10:35:32 nj-dcos01-cl01 marathon[581]: [2020-02-08 10:35:32,726] INFO Received fake heartbeat task-status update (mesosphere.marathon.core.heartbeat.MesosHeartbeatMonitor:Thread-172)
Firewall/iptables is off. I can't submit a test job to marathon or create an app. Mesos seems to be working OK.
sean