Failed to push app in Cloud Foundry on Mesos environment #3
Comments
And the following is the log of the docker container which acted as the Mesos executor and was created from the image "jianhuiz/diego-cell":
|
And I also see something in rep.stdout.log, not sure if it is related.
|
And this is what I see in garden.stdout.log:
|
It appears that the consul agent wasn't successfully started, you may want to check the consul for details. You can also start the diego-cell container with |
The timestamps are quite different, so it's a bit hard to guess at what's happening. The slave logs end at So, for instance, the garden logs suggest it was never asked to create a container for the staging task, but the timestamps are so far behind when the mesos slave was even asked to create the cell that I wonder if those logs are just misleading. Or maybe the NTP config on your VMs is out of whack?

Based on the rep logs, it looks like at least 2 important things aren't working. Inside the executor container, garden should be running, bound to port 7777, and listening on either 127.0.0.1 or 0.0.0.0, but it seems like the rep is not able to talk to it (both rep and garden should be running in the same executor container). It also looks like it can't look up the bbs address, which may be an issue with how consul is running inside that executor container. You could check whether the consul agent is running in the executor container in agent mode, and whether it has joined the consul membership ring, by checking its logs. |
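As a rough way to confirm the first point, here is a minimal Python sketch that tries a plain TCP connection to the garden port from inside the executor container. The helper name `is_port_open` is made up for illustration; only the address 127.0.0.1 and port 7777 come from the comment above, and this checks reachability only, not that garden itself is healthy.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises an OSError subclass on refusal or timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # 7777 is the garden port mentioned in the thread; run this inside the
    # executor container, where both rep and garden are expected to live.
    print("garden reachable on 127.0.0.1:7777:", is_port_open("127.0.0.1", 7777))
```

If this prints `False` inside the executor container, rep would presumably hit the same failure when it tries to reach garden.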
Thanks @jianhuiz and @Amit-PivotalLabs. I manually ran the diego-cell container with the command below:
And then inside the container, I ran "/entrypoint.sh /executor -logtostderr=true" (I changed the entrypoint.sh by adding a "sleep 10000" in it so that it can hang there), and found:
So it seems the consul agent indeed wasn't started successfully, and then I manually ran this in the container:
So the exit code shows that something went wrong, but there is no other detailed info, and there is no consul agent log:
So is there a way to debug why the consul agent cannot start? Thanks! |
The /var/vcap/jobs/consul_agent/bin/agent_ctl script should log std{out,err} to /var/vcap/sys/log/monit/consul_agent.{out,err}.log. If it gets to the point of actually running (the thin wrapper around) the /var/vcap/sys/log/consul_agent/consul_agent.std{out,err}.log Can you check the logs in those locations?
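For convenience, a small Python sketch that expands the two brace patterns above into concrete paths and prints the tail of each file that exists. The helper names `expand_braces` and `tail_lines` are hypothetical; the paths are the ones quoted in this comment, and nothing here is specific to how the consul_agent job actually writes its logs.

```python
import os
import re
from typing import List


def expand_braces(pattern: str) -> List[str]:
    """Expand {a,b,...} groups in pattern, similar to shell brace expansion."""
    match = re.search(r"\{([^{}]*)\}", pattern)
    if match is None:
        return [pattern]
    head, tail = pattern[: match.start()], pattern[match.end():]
    results: List[str] = []
    for option in match.group(1).split(","):
        results.extend(expand_braces(head + option + tail))
    return results


def tail_lines(path: str, count: int = 20) -> List[str]:
    """Return the last count lines of a file, or [] if it cannot be read."""
    try:
        with open(path, "r", errors="replace") as handle:
            return handle.readlines()[-count:]
    except OSError:
        return []


# Log locations as quoted in the comment above.
LOG_PATTERNS = [
    "/var/vcap/sys/log/monit/consul_agent.{out,err}.log",
    "/var/vcap/sys/log/consul_agent/consul_agent.std{out,err}.log",
]

if __name__ == "__main__":
    for pattern in LOG_PATTERNS:
        for path in expand_braces(pattern):
            status = "present" if os.path.exists(path) else "missing"
            print(f"== {path} ({status})")
            for line in tail_lines(path):
                print(line.rstrip())
```

A path reported as missing suggests the corresponding stage never got far enough to write anything.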
|
I found the root cause of why the consul agent was not started. In the script "/var/vcap/jobs/consul_agent/bin/agent_ctl" in the diego-cell container, there is a line:
This command failed! I reran it manually in the diego-cell container:
The reason that command failed is that the storage driver of my docker host is aufs
However, the command "setcap" does not work on aufs; see moby/moby#5650 for details. So I changed the storage driver from aufs to devicemapper by following the steps in this link: http://muehe.org/posts/switching-docker-from-aufs-to-devicemapper/ Now I can see the consul agent start, but later it still fails. Here is its log:
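For anyone who wants to check this up front, here is a minimal Python sketch that reads the `Storage Driver:` line from `docker info` output. The function names are hypothetical, and the sketch assumes `docker info` prints a line of that form (the exact indentation varies between Docker versions, which is why the key is stripped before comparison).

```python
import subprocess
from typing import Optional


def storage_driver_from_info(info_text: str) -> Optional[str]:
    """Extract the value of the 'Storage Driver:' line from `docker info` text."""
    for line in info_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Storage Driver":
            return value.strip()
    return None


def current_storage_driver() -> Optional[str]:
    """Run `docker info`; return the driver name, or None if docker is unavailable."""
    try:
        result = subprocess.run(
            ["docker", "info"], capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return storage_driver_from_info(result.stdout)


if __name__ == "__main__":
    driver = current_storage_driver()
    print("storage driver:", driver)
    if driver == "aufs":
        # Per the thread, setcap failed on aufs (see moby/moby#5650).
        print("warning: setcap inside containers may fail on this driver")
```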
It seems consul agent always fails to join ... |
When I manually start consul agent in diego-cell container, I see the following in consul server's log:
So it appears that consul server has encryption configured, but the consul agent in diego-cell container does not have it configured. How can I disable encryption for consul when deploying it with Cloud Foundry? |
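One way to confirm such a mismatch is to compare the `encrypt` setting in the two Consul JSON configs. The sketch below is only an illustration: the function names are made up, the config texts have to be fetched from wherever your deployment keeps them (the thread does not give those paths), and it assumes gossip encryption is configured through the top-level `encrypt` key of a JSON config file.

```python
import json


def encryption_enabled(config_text: str) -> bool:
    """Return True if a Consul JSON config sets a non-empty top-level "encrypt" key."""
    config = json.loads(config_text)
    return bool(config.get("encrypt"))


def gossip_settings_match(agent_config: str, server_config: str) -> bool:
    """Return True if both configs agree on "encrypt" (both unset also counts)."""
    agent_key = json.loads(agent_config).get("encrypt")
    server_key = json.loads(server_config).get("encrypt")
    # Treat a missing key and an empty string the same way.
    return (agent_key or None) == (server_key or None)


if __name__ == "__main__":
    # Placeholder configs for illustration only; the key is not a real one.
    server = '{"server": true, "encrypt": "PLACEHOLDER-KEY"}'
    agent = '{"server": false}'
    print("server encrypted:", encryption_enabled(server))
    print("agent encrypted:", encryption_enabled(agent))
    print("settings match:", gossip_settings_match(agent, server))
```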
Assuming you generated the BOSH-Lite manifest for Cloud Foundry using the
|
Thanks @Amit-PivotalLabs, I have successfully deployed Cloud Foundry on Mesos :-) |
It seems my CF on Mesos env is not stable: sometimes I can push an app successfully, but sometimes I cannot:
Any idea about what happened? |
Couple questions:
|
Can you please let me know where I can get app staging logs? Here is what I see for a successful push:
And in the Mesos slave log, I see "Starting container ...":
I will check more logs for the failed push. |
Hmm, not sure what
The following push shows much more output (the name of my app is
I'm not sure yet why those logs aren't showing up for you, but perhaps for simplicity you would like to open that as a separate issue. For why it's actually not working, mesos slave is starting the executor container. Then you need to see what's happening inside the executor container. There should be several processes in there, |
Sorry to comment on an old thread, but since it's still open I figured it might be okay. First off, a big thanks to all the commenters in here. I hit essentially all of the same problems, and was also able to get through them by walking through this. I can successfully push apps, and scale them through

The problem I am having, though, is that when I launch the test application from @jianhuiz (https://github.com/jianhuiz/cf-apps), I get a 502 error code when trying to visit the URL. When I test the app with just the Diego backend, everything works, but once I switch over to Mesos, I get 502s. I've traced through haproxy and the gorouter, and that all seems fine. I dumped the routes in the gorouter, and found that when I tried to hit the URLs the connection failed. For example, when I dump the gorouter routes, I see this for my app (with two instances):
I can ping that IP just fine, but no luck with curl
This was done from the gorouter VM. That IP address is the IP of the mesos-slave. Where I get really confused is that when I visit the mesos slave, there is nobody listening on those TCP ports.
If I From within the Docker container, there isn't anybody listening there either. When I look at how the Garden container was launched, the config seems to be right, with flags like but again, using netstat, no one is listening on port 60000. Am I missing something? When I watched @jianhuiz's YouTube video demo (https://youtu.be/2XZK3Mu32-s), I noticed that he never visits the app in a browser. Can you comment on whether this part of things actually should be working? Otherwise, I have been able to recreate everything I've seen in the demos. Cool stuff! |
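To make the "is anybody listening" check repeatable without netstat, here is a minimal Python sketch that lists listening TCP ports by parsing `/proc/net/tcp`. The function name is hypothetical, and the sketch assumes a Linux host where that file has the usual layout (local address in column 2, state in column 4, `0A` meaning LISTEN). It only sees sockets in the network namespace of the process that runs it, so a listener inside another container's namespace would not show up.

```python
from typing import Set

TCP_LISTEN = "0A"  # state code for LISTEN in /proc/net/tcp


def listening_ports(proc_net_tcp: str) -> Set[int]:
    """Return the set of local ports in LISTEN state from /proc/net/tcp text."""
    ports: Set[int] = set()
    for line in proc_net_tcp.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4 or fields[3] != TCP_LISTEN:
            continue
        # local_address looks like "0100007F:1E61"; the port is hex after ':'.
        _, _, port_hex = fields[1].rpartition(":")
        ports.add(int(port_hex, 16))
    return ports


if __name__ == "__main__":
    found: Set[int] = set()
    for path in ("/proc/net/tcp", "/proc/net/tcp6"):
        try:
            with open(path) as handle:
                found |= listening_ports(handle.read())
        except OSError:
            pass  # file missing, e.g. not on Linux or IPv6 disabled
    # 60000 is the port mentioned in the comment above.
    print("port 60000 listening:", 60000 in found)
    print("all listening ports:", sorted(found))
```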
Hi,
I followed the steps in https://github.com/mesos/cloudfoundry-mesos/blob/master/docs/getting-started.md to set up a Cloud Foundry on Mesos environment. In my environment there is only one physical machine, which has Vagrant and VirtualBox installed, and there are two VMs on this physical machine:
And I also patched auctioneer so that it can register Mesos as a framework, see the Mesos master log:
Everything looks good at this point, and then I pushed the hello world app (https://github.com/jianhuiz/cf-apps/tree/master/hello) from the physical machine, but it failed:
The following is what I saw in Mesos slave log for how a task was handled:
In the above log, I can see that the task was started and running, but later (after about 1 min) it finished, and then the executor exited, which does not seem correct ...
Any help will be appreciated!
BTW, before I patched auctioneer, the app could be pushed successfully and ran very well in the original Cloud Foundry + Diego env (without integrating with Mesos).