
Multiple Docker Hosts discovery #20

Closed
dannygu opened this issue Jul 12, 2017 · 8 comments

@dannygu

dannygu commented Jul 12, 2017

Hi,

We have two hosts running several Docker containers, and we are trying to leverage Consul to assist us with Hazelcast discovery.

I managed to get the Hazelcast nodes listed in Consul, but it seems as if Hazelcast (HC) refuses to connect:

12-Jul-2017 19:28:38.353 INFO [hz._hzInstance_1_docker_int.IO.thread-in-0] com.hazelcast.nio.tcp.TcpIpConnection.null [172.17.0.3]:5701 [docker_int] [3.8.3] Connection[id=1, /172.17.0.3:42637->/172.17.0.6:5701, endpoint=[172.17.0.6]:5701, alive=false, type=MEMBER] closed. Reason: Connection closed by the other side

This somewhat makes sense. We have two hosts, 10.0.1.160 and 10.0.2.71. Our HC nodes are set to listen on port 5701, which is mapped to a random port on the host, e.g. 10.0.1.160:42637 -> 172.17.0.3:5701.

I'm not really sure why the nodes are trying to connect via their internal IPs. This obviously won't work: each Docker container runs on a different host, so they cannot communicate between 172.17.0.3 and 172.17.0.6. They can only communicate between 10.0.1.160 and 10.0.2.71, which are the real host IPs.

I switched org.bitsofinfo.hazelcast.discovery.consul.LocalDiscoveryNodeRegistrator to DoNothingRegistrator and used Registrator to automatically register my nodes in Consul. That worked, and Consul got the correct IP and port. However, HC now refuses to connect with 'This node is not requested endpoint', because it's looking for 172.17.0.3:5701 while Consul returned 10.0.1.160:randomport.

Is it even possible to make this work on two different hosts with random ports?

Thanks for the help!

Here is my configuration:

<discovery-strategies>
  <discovery-strategy class="org.bitsofinfo.hazelcast.discovery.consul.ConsulDiscoveryStrategy" enabled="true">
    <properties>
      <property name="consul-host">10.0.1.160</property>
      <property name="consul-port">8500</property>
      <property name="consul-service-name">hz-cluster</property>
      <property name="consul-healthy-only">true</property>
      <property name="consul-service-tags">hazelcast, test1</property>
      <property name="consul-discovery-delay-ms">10000</property>

      <property name="consul-acl-token"></property>
      <property name="consul-ssl-enabled">false</property>
      <property name="consul-ssl-server-cert-file-path"></property>
      <property name="consul-ssl-server-cert-base64"></property>
      <property name="consul-ssl-server-hostname-verify">false</property>

      <property name="consul-registrator">org.bitsofinfo.hazelcast.discovery.consul.LocalDiscoveryNodeRegistrator</property>
      <property name="consul-registrator-config"><![CDATA[
        {
          "preferPublicAddress":true,
          "healthCheckProvider":"org.bitsofinfo.hazelcast.discovery.consul.TcpHealthCheckBuilder",
          "healthCheckTcp":"#MYIP:8080",
          "healthCheckTcpIntervalSeconds":30
        }
      ]]></property>
    </properties>
  </discovery-strategy>
</discovery-strategies>
@bitsofinfo
Owner

bitsofinfo commented Jul 12, 2017

Also, did you read the README section on "Containerization (Docker) notes"?

I.e., the LocalDiscoveryNodeRegistrator just uses the IP detected from WITHIN the container and registers that (your 172.17.x.x addresses), so obviously that won't work.

Have you looked at just running your stuff in Docker Swarm mode with https://github.com/bitsofinfo/hazelcast-docker-swarm-discovery-spi ?

It's a much better way to go if you're running Hazelcast apps in Docker. Hazelcast has too many interface/binding issues when running in containers, as mentioned in numerous Hazelcast issues.

See below for how I did it.

@bitsofinfo
Owner

bitsofinfo commented Jul 12, 2017

Here is what I had to do for a non-Swarm setup where Hazelcast apps talk from Docker host to Docker host. It's a chicken-and-egg kind of issue.

  • Have the app use https://github.com/bitsofinfo/docker-discovery-registrator-consul. This registers a separate, non-Hazelcast service in Consul for the app, using the Docker host's IP and the host port mapped to 5701.

  • Next, start each container on each Docker host with the -e env vars described in the docker-discovery-registrator-consul docs, plus the registrator's required env vars.

  • BEFORE initializing Hazelcast in the app, use the docker-discovery-registrator-consul library to determine the container host's public IP and mapped port (i.e. your 10.x address and mapped port). Set them as system properties, e.g. System.setProperty("hz.consul.client.ip", "xx.xx.xx.xx") and System.setProperty("hz.consul.client.port", "xxxx").

  • Next, in the Hazelcast app's config I have:

<property name="consul-registrator-config"><![CDATA[
  {
    "registerWithIpAddress":"${hz.consul.client.ip}",
    "registerWithPort":${hz.consul.client.port},
    "healthCheckProvider":"${hz.consul.health.check.provider}",
    "healthCheckTcp":"#MYIP:#MYPORT",
    "healthCheckTcpIntervalSeconds":30,

    "healthCheckScript":"exec 6<>/dev/tcp/#MYIP/#MYPORT || (exit 3)",
    "healthCheckScriptIntervalSeconds":30,
    "healthCheckHttp":"http://#MYIP:80",
    "healthCheckHttpIntervalSeconds":30
  }
]]></property>
  • THEN initialize Hazelcast in the app against that config. When Hazelcast starts and this SPI kicks off, it will register the 10.x address detected by the docker-discovery-registrator-consul library, and the source/destination issue will go away.

@dannygu
Author

dannygu commented Jul 12, 2017

Hi,

Thanks for the quick reply, highly appreciated.

We are currently trying to run this on AWS ECS, so Docker Swarm isn't really an option we've considered.

This is what Consul returns for the nodes' IPs:

curl http://10.0.1.160:8500/v1/catalog/service/hz-cluster

[
  {
    "ID": "85309773-62fa-f1ea-11fb-d95fe3a5bc3a",
    "Node": "ip-10-0-1-160",
    "Address": "10.0.1.160",
    "Datacenter": "dc1",
    "TaggedAddresses": {
      "lan": "10.0.1.160",
      "wan": "10.0.1.160"
    },
    "NodeMeta": {},
    "ServiceID": "hz-cluster-172.17.0.3-172.17.0.3-5701",
    "ServiceName": "hz-cluster",
    "ServiceTags": [
      "hazelcast",
      " test1"
    ],
    "ServiceAddress": "172.17.0.3",
    "ServicePort": 5701,
    "ServiceEnableTagOverride": false,
    "CreateIndex": 18515,
    "ModifyIndex": 18712
  },
  {
    "ID": "85309773-62fa-f1ea-11fb-d95fe3a5bc3a",
    "Node": "ip-10-0-1-160",
    "Address": "10.0.1.160",
    "Datacenter": "dc1",
    "TaggedAddresses": {
      "lan": "10.0.1.160",
      "wan": "10.0.1.160"
    },
    "NodeMeta": {},
    "ServiceID": "hz-cluster-172.17.0.6-172.17.0.6-5701",
    "ServiceName": "hz-cluster",
    "ServiceTags": [
      ""
    ],
    "ServiceAddress": "172.17.0.6",
    "ServicePort": 5701,
    "ServiceEnableTagOverride": false,
    "CreateIndex": 18544,
    "ModifyIndex": 18544
  }
]

@bitsofinfo
Owner

bitsofinfo commented Jul 12, 2017

See my comment above. Unfortunately, again, search the Hazelcast issues for "docker", "this node is not requested endpoint", etc. There are tons of issues, complaints and outstanding TODOs on their end for all these stupid interface/binding issues. Some of these SPI tools I've made can make it all work.

hazelcast/hazelcast#4537

@bmudda
Collaborator

bmudda commented Jul 13, 2017

Late to the party, but @dannygu, I think you need to specify your Hazelcast public IP address and port in the public-address element. Replace [HZ_PUBLIC_ADDRESS_IP] and [HZ_PUBLIC_ADDRESS_PORT] with the appropriate IP and port.

<network>
  <port auto-increment="true">5701</port>
  <public-address>[HZ_PUBLIC_ADDRESS_IP]:[HZ_PUBLIC_ADDRESS_PORT]</public-address>
  <join>
    <multicast enabled="false"/>
    <aws enabled="false"/>
    <tcp-ip enabled="false"/>
    <discovery-strategies>
      <discovery-strategy class="org.bitsofinfo.hazelcast.discovery.consul.ConsulDiscoveryStrategy" enabled="true">
        <properties>
          ...
        </properties>
      </discovery-strategy>
    </discovery-strategies>
  </join>
</network>
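Because the mapped host port is random per container, it can't be hard-coded in one shared hazelcast.xml. Below is a minimal sketch that assumes each container's host IP and mapped port are set as the JVM system properties hz.consul.client.ip and hz.consul.client.port before Hazelcast loads its config. Those are the property names from bitsofinfo's comment; any names would work. The sketch relies on Hazelcast's declarative config replacing ${...} placeholders with system property values.

```xml
<network>
  <port auto-increment="true">5701</port>
  <!-- Resolved from JVM system properties when the config is loaded, e.g. the
       Docker host IP (10.0.1.160) plus the host port mapped to this
       container's 5701. Both properties must be set before startup. -->
  <public-address>${hz.consul.client.ip}:${hz.consul.client.port}</public-address>
  <join>
    <multicast enabled="false"/>
    <aws enabled="false"/>
    <tcp-ip enabled="false"/>
    <!-- discovery-strategies block unchanged from the snippet above -->
  </join>
</network>
```

With this in place, the address other members dial (host IP plus mapped port) should match the address each member considers its own. That mismatch is what the 'This node is not requested endpoint' error complains about.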

@dannygu
Author

dannygu commented Jul 13, 2017

Hi,

@bitsofinfo @bmudda Thanks for all the suggestions. I managed to get everything working by using the public-address option in hazelcast.xml.

I pull all of the information (host + random port) from AWS metadata before I bootstrap Hazelcast, and it works great.

Bottom line: discovery using Consul plus bootstrapping with AWS = win.

Thanks Again!
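The thread doesn't show the final config, so the following is only a sketch of how the Consul side might look once public-address is set. It keeps the LocalDiscoveryNodeRegistrator with "preferPublicAddress":true from the original configuration. It assumes that option makes the registrator publish the member's configured public address (host IP and mapped port) instead of its 172.17.x.x container address. The #MYIP:#MYPORT health check is borrowed from bitsofinfo's config.

```xml
<discovery-strategies>
  <discovery-strategy class="org.bitsofinfo.hazelcast.discovery.consul.ConsulDiscoveryStrategy" enabled="true">
    <properties>
      <property name="consul-host">10.0.1.160</property>
      <property name="consul-port">8500</property>
      <property name="consul-service-name">hz-cluster</property>
      <property name="consul-healthy-only">true</property>
      <property name="consul-discovery-delay-ms">10000</property>
      <!-- Assumption: with preferPublicAddress=true the registrator registers
           the public-address configured under network (host IP and mapped
           port), not the container-internal 172.17.x.x address. -->
      <property name="consul-registrator">org.bitsofinfo.hazelcast.discovery.consul.LocalDiscoveryNodeRegistrator</property>
      <property name="consul-registrator-config"><![CDATA[
        {
          "preferPublicAddress":true,
          "healthCheckProvider":"org.bitsofinfo.hazelcast.discovery.consul.TcpHealthCheckBuilder",
          "healthCheckTcp":"#MYIP:#MYPORT",
          "healthCheckTcpIntervalSeconds":30
        }
      ]]></property>
    </properties>
  </discovery-strategy>
</discovery-strategies>
```

If this holds, the address Consul hands out and the address each member advertises should be the same host IP and port. That keeps Registrator and DoNothingRegistrator out of the picture.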

@bitsofinfo
Owner

Great! Please star the project if you found it useful!

@bitsofinfo
Owner

Also, @dannygu, it would be great if you could contribute a small PR with a markdown file describing a how-to for your use case and solution.
