This project leverages Vagrant and Apache Ambari to create a multi-VM PivotalHD 3.0 or Hortonworks HDP 2.x Hadoop cluster, including HAWQ 1.3 (SQL on Hadoop) and Spring XD 1.2.
The logical structure of the cluster is defined in a Blueprint. A related Host-Mapping defines how the blueprint is mapped onto physical machines. The Vagrantfile script provisions Virtual Machines (VMs) for the hosts defined in the Host-Mapping and, with the help of the Ambari Blueprint API, deploys the Blueprint in the cluster. Vagrant supports PivotalHD 3.0 (PHD) and Hortonworks 2.x (HDP) blueprint stacks.
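To make the relationship between the two files concrete, here is a minimal Ruby sketch (Ruby being the language of the Vagrantfile). It assumes the general shape of Ambari blueprint and cluster-creation JSON documents. The blueprint name, host group names, components and host names are made up for illustration and are not copied from this project's files, and the helpers `hosts_from_mapping` and `unmapped_groups` are hypothetical, not necessarily how the project's Vagrantfile works.

```ruby
require 'json'

# Illustrative Blueprint: the logical layout (host groups and their components).
# All names here are hypothetical examples, not the project's real files.
BLUEPRINT_TEXT = <<~EOS
  {
    "Blueprints": { "blueprint_name": "example-blueprint",
                    "stack_name": "PHD", "stack_version": "3.0" },
    "host_groups": [
      { "name": "master",  "cardinality": "1", "components": [ { "name": "NAMENODE" } ] },
      { "name": "workers", "cardinality": "2", "components": [ { "name": "DATANODE" } ] }
    ]
  }
EOS

# Illustrative Host-Mapping: which concrete hosts fill each host group.
HOST_MAPPING_TEXT = <<~EOS
  {
    "blueprint": "example-blueprint",
    "host_groups": [
      { "name": "master",  "hosts": [ { "fqdn": "phd1.localdomain" } ] },
      { "name": "workers", "hosts": [ { "fqdn": "phd2.localdomain" },
                                      { "fqdn": "phd3.localdomain" } ] }
    ]
  }
EOS

# Collect every host name listed in the mapping (one VM per host).
def hosts_from_mapping(mapping)
  mapping.fetch('host_groups').flat_map do |group|
    group.fetch('hosts').map { |host| host.fetch('fqdn') }
  end
end

# Host groups declared in the blueprint that the mapping leaves without hosts.
def unmapped_groups(blueprint, mapping)
  declared = blueprint.fetch('host_groups').map { |g| g.fetch('name') }
  mapped   = mapping.fetch('host_groups').map { |g| g.fetch('name') }
  declared - mapped
end

blueprint = JSON.parse(BLUEPRINT_TEXT)
mapping   = JSON.parse(HOST_MAPPING_TEXT)

puts "Hosts to provision: #{hosts_from_mapping(mapping).join(', ')}"
puts "Unmapped host groups: #{unmapped_groups(blueprint, mapping).inspect}"
```

A check like `unmapped_groups` is a cheap way to catch a Blueprint/Host-Mapping pair that does not belong together before any VM is started.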
The default All-Services-Blueprint creates four virtual machines: one for Apache Ambari and three for the Pivotal HD cluster, where Apache Hadoop® (HDFS, YARN, Pig, Zookeeper, HBase), HAWQ (SQL-on-Hadoop) and Spring XD are installed.
- From a hardware standpoint, you need a 64-bit architecture. The default blueprint requires at least 16GB of physical memory and around 120GB of free disk space (you can get by with only 24GB of disk space, but you will not be able to install all Pivotal services together).
- Install Vagrant (1.7.2+).
- Install VirtualBox or VMware Fusion (note that VMware Fusion requires a paid Vagrant license).
- Clone this project
git clone https://github.com/tzolov/vagrant-pivotalhd.git
- Follow the Packages download instructions to collect all required tarballs and store them inside the /packages subfolder.
- Edit the Vagrantfile BLUEPRINT_FILE_NAME and HOST_MAPPING_FILE_NAME properties to select the Blueprint/Host-Mapping pair to deploy. All blueprints and mapping files are in the /blueprints subfolder. By default the 4-node All-Services blueprint is used.
From the top directory run
vagrant up --provider virtualbox
Depending on the blueprint stack, either a PivotalHD or a Hortonworks cluster will be created. The default blueprint/host-mapping will create 4 Virtual Machines.
When the vagrant up command returns, the VMs are provisioned, the Ambari Server is installed and the cluster deployment is in progress. Open the Ambari interface to monitor the deployment progress:
http://10.211.55.100:8080 (username: admin, password: admin)
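Progress can also be checked from a script instead of the browser. The Ruby sketch below assumes that the Ambari REST API reports a deployment request at `/api/v1/clusters/<cluster>/requests/<id>` and that the response carries a `Requests` object with `request_status` and `progress_percent` fields. The request id `1` (the first request on a fresh server), the cluster name `CLUSTER1` and the helper names are assumptions made for illustration, so verify them against your Ambari version.

```ruby
require 'json'
require 'net/http'
require 'uri'

AMBARI_URL = 'http://10.211.55.100:8080' # Ambari address given above
CLUSTER    = 'CLUSTER1'                  # assumed cluster name

# Build the URI of a deployment request (the id is an assumption).
def request_uri(base, cluster, request_id)
  URI.join(base, "/api/v1/clusters/#{cluster}/requests/#{request_id}")
end

# Extract status and progress from a response body of the assumed shape.
def parse_progress(body)
  request = JSON.parse(body).fetch('Requests')
  { status: request['request_status'], percent: request['progress_percent'].to_f }
end

# Query a live Ambari server (network call, needs the cluster VMs running).
def fetch_progress(base, cluster, request_id, user: 'admin', password: 'admin')
  uri = request_uri(base, cluster, request_id)
  get = Net::HTTP::Get.new(uri)
  get.basic_auth(user, password)
  response = Net::HTTP.start(uri.hostname, uri.port) { |http| http.request(get) }
  raise "Ambari returned HTTP #{response.code}" unless response.is_a?(Net::HTTPSuccess)

  parse_progress(response.body)
end

# Offline demonstration with a hand-written sample response.
sample = '{"Requests": {"id": 1, "request_status": "IN_PROGRESS", "progress_percent": 42.5}}'
progress = parse_progress(sample)
puts "#{progress[:status]}: #{progress[:percent]}%"
```

Against a running cluster, `fetch_progress(AMBARI_URL, CLUSTER, 1)` could be called in a loop with a `sleep` between attempts until the status is no longer `IN_PROGRESS`.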
The following Vagrantfile configuration properties can be used to customize a cluster deployment.
For instructions on how to create a custom Blueprint or Host-Mapping, read the blueprints section.
Property | Description | Default Value |
---|---|---|
BLUEPRINT_FILE_NAME | Specifies the Blueprint file name to deploy. The file must exist in the /blueprints subfolder. | phd-all-services-blueprint.json |
HOST_MAPPING_FILE_NAME | Specifies the Host-Mapping file name to deploy. The file must exist in the /blueprints subfolder. | 4-node-all-services-hostmapping.json |
CLUSTER_NAME | Sets the cluster name as it will appear in Ambari | CLUSTER1 |
VM_BOX | Vagrant box name to use. Tested options are: bigdata/centos6.4_x86_64 (40G disk), bigdata/centos6.4_x86_64_small (just 8G of disk space) and chef/centos-6.6 (CentOS 6.6 box). | chef/centos-6.6 |
AMBARI_NODE_VM_MEMORY_MB | Memory (MB) allocated for the Ambari VM | 768 |
PHD_NODE_VM_MEMORY_MB | Memory (MB) allocated for every PHD VM | 2048 |
AMBARI_HOSTNAME_PREFIX | Set the Ambari host name prefix. The suffix is fixed to '.localdomain'. Note: the resulting FQDN should NOT be in the phd[1-N].localdomain range. | ambari |
DEPLOY_BLUEPRINT_CLUSTER | Set TRUE to deploy a cluster defined by BLUEPRINT_FILE_NAME and HOST_MAPPING_FILE_NAME. Set to FALSE if you prefer to install the cluster with the Ambari wizard. | TRUE |
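
Before changing the memory or host name settings, it can help to sanity-check them. The Ruby sketch below copies the default values from the table and adds two hypothetical helpers (`total_vm_memory_mb` and `ambari_fqdn_conflicts?`) that are not part of the project's Vagrantfile. The first sums the memory reserved for all VMs. The second tests the rule that the Ambari FQDN must stay outside the phd[1-N].localdomain range.

```ruby
# Default values copied from the properties table.
AMBARI_NODE_VM_MEMORY_MB = 768
PHD_NODE_VM_MEMORY_MB    = 2048
AMBARI_HOSTNAME_PREFIX   = 'ambari'

# Total memory (MB) reserved for the VMs: one Ambari node plus the cluster nodes.
def total_vm_memory_mb(cluster_nodes,
                       ambari_mb: AMBARI_NODE_VM_MEMORY_MB,
                       node_mb: PHD_NODE_VM_MEMORY_MB)
  ambari_mb + cluster_nodes * node_mb
end

# True when "<prefix>.localdomain" would collide with a phd[1-N].localdomain host.
def ambari_fqdn_conflicts?(prefix)
  "#{prefix}.localdomain".match?(/\Aphd[1-9]\d*\.localdomain\z/)
end

# Default layout: 1 Ambari VM + 3 cluster VMs.
puts "VM memory for the default layout: #{total_vm_memory_mb(3)} MB"
puts "Ambari prefix conflicts with cluster hosts: #{ambari_fqdn_conflicts?(AMBARI_HOSTNAME_PREFIX)}"
```

With the defaults, the four VMs reserve 768 + 3 × 2048 = 6912 MB, which leaves headroom within the recommended 16GB of physical memory for the host operating system.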