Hadoop comprises four main layers:
- Hadoop Common is the collection of utilities and libraries that support the other Hadoop modules.
- HDFS, the Hadoop Distributed File System, is responsible for persisting data to disk across the cluster.
- YARN, short for Yet Another Resource Negotiator, acts as the cluster's "operating system": it schedules jobs and allocates compute resources on top of HDFS.
- MapReduce is the original processing model for Hadoop clusters. It distributes work across the cluster (the map step), then collects and combines the results from the nodes into a response to a query (the reduce step); a conceptual sketch follows this list. Many other processing models are available for the 3.x line of Hadoop.
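As a rough analogy only (plain shell, not Hadoop), a word count split into map, shuffle, and reduce phases looks like this, assuming a local file named input.txt:
# map: emit one word per line; shuffle: sort groups identical words; reduce: count each group
tr -s '[:space:]' '\n' < input.txt | sort | uniq -c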
Hadoop installation instructions
Ubuntu 20.04/Debian
Let's first update the package index:
sudo apt update
then install Java:
sudo apt install default-jdk
Check that it worked with this command:
java -version; javac -version
Then, to enable remote access and key-based authentication, install SSH:
sudo apt install ssh -y
or
sudo apt install openssh-server openssh-client -y
Next, generate a public/private key pair:
ssh-keygen -t ed25519
then authorize passwordless access to localhost:
cat ~/.ssh/id_ed25519.pub >> ~/.ssh/authorized_keys
Let's test it:
ssh localhost
You should be logged in without being prompted for a password.
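If you are still prompted for a password, the usual culprit is permissions; sshd ignores authorized_keys files with overly permissive modes, so tightening them is a common fix:
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys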
Let's download the latest release from hadoop.apache.org.
Here we choose hadoop-3.3.3.tar.gz.
First, make sure you are in your home directory:
cd ~/
then
wget -c https://dlcdn.apache.org/hadoop/common/hadoop-3.3.3/hadoop-3.3.3.tar.gz
(the -c flag lets wget resume an interrupted download)
Then check the SHA512 checksum against the value published on the download page:
SHA512 (hadoop-3.3.3.tar.gz) = 1f5762682cef3daff8b2379fe7e40efca107bb7e8dcaa4a513e3bc0c082067759dd05d493ec997433dde2c89ea63dbc93aee0bba60045f89d3ec2d3f687f58b3
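On Ubuntu/Debian, you can compute the digest locally with GNU coreutils and compare it to the published value:
sha512sum hadoop-3.3.3.tar.gz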
Extract the archive in your home directory.
Change back to it if needed:
cd ~/
then extract the downloaded tarball:
tar -xf hadoop-3.3.3.tar.gz
Move the extracted folder to /usr/local:
sudo mv hadoop-3.3.3 /usr/local/hadoop
Let's first check the location of the Java installation:
readlink -f /usr/bin/java | sed "s:bin/java::"
To begin, open hadoop-env.sh:
sudo nano /usr/local/hadoop/etc/hadoop/hadoop-env.sh
and uncomment and set the JAVA_HOME line:
export JAVA_HOME=$(readlink -f /usr/bin/java | sed "s:bin/java::")
If you have trouble finding the line, use CTRL+W
to quickly search through the text. Once you’re done, exit with CTRL+X
and save your file.
You can verify the layout by invoking the hadoop binary directly (it prints its usage message):
/usr/local/hadoop/bin/hadoop
Next, add Hadoop-related environment variables to your shell profile:
nano ~/.bashrc
and append the following lines:
#Hadoop Related Options
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
Activate the new variables:
source ~/.bashrc
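If the variables are active, the hadoop command should now be on your PATH; a quick check:
hadoop version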
Now configure core-site.xml, which sets Hadoop's temporary directory and the default filesystem URL:
sudo nano $HADOOP_HOME/etc/hadoop/core-site.xml
Add the following configuration:
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hdoop/tmpdata</value>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://127.0.0.1:9000</value>
  </property>
</configuration>
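Hadoop won't create this directory for you. Assuming you keep the path above (swap in your own home directory if your username isn't hdoop), create it now:
mkdir -p /home/hdoop/tmpdata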
Next, configure hdfs-site.xml, which sets where the NameNode and DataNode keep their data and the block replication factor:
sudo nano $HADOOP_HOME/etc/hadoop/hdfs-site.xml
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/hdoop/dfsdata/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/hdoop/dfsdata/datanode</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
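As before, create the storage directories up front (adjusting the paths to your own home directory if needed):
mkdir -p /home/hdoop/dfsdata/namenode /home/hdoop/dfsdata/datanode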
Then configure mapred-site.xml to run MapReduce on YARN:
sudo nano $HADOOP_HOME/etc/hadoop/mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
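Note: on Hadoop 3.x, the bundled example jobs can fail with class-not-found errors unless the MapReduce classpath is set. The official single-node setup guide adds this property to mapred-site.xml, so consider including it inside the <configuration> block as well:
<property>
  <name>mapreduce.application.classpath</name>
  <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
</property>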
Finally, configure yarn-site.xml for the NodeManager and ResourceManager:
sudo nano $HADOOP_HOME/etc/hadoop/yarn-site.xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>127.0.0.1</value>
  </property>
  <property>
    <name>yarn.acl.enable</name>
    <value>0</value>
  </property>
  <property>
    <name>yarn.nodemanager.env-whitelist</name>
    <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
  </property>
</configuration>
- First, format the NameNode (this is only done once, before the first start):
hdfs namenode -format
- Start HDFS (start-dfs.sh is on the PATH via $HADOOP_HOME/sbin):
start-dfs.sh
- Start YARN:
start-yarn.sh
Check that all the daemons came up:
jps
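You should see entries like the following, each prefixed with its process ID (the PIDs will differ):
NameNode
DataNode
SecondaryNameNode
ResourceManager
NodeManager
Jps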
Then open the NameNode web UI in a browser at:
http://localhost:9870
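The YARN ResourceManager UI is available at http://localhost:8088 by default. To confirm the cluster can actually run jobs, you can try one of the bundled examples; this is a sketch based on the official single-node guide, assuming the hadoop-mapreduce-examples-3.3.3.jar that ships in the tarball:
# create a home directory in HDFS and stage some input files
hdfs dfs -mkdir -p /user/$(whoami)/input
hdfs dfs -put $HADOOP_HOME/etc/hadoop/*.xml /user/$(whoami)/input
# run the bundled "grep" example and print the results
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.3.jar grep /user/$(whoami)/input /user/$(whoami)/output 'dfs[a-z.]+'
hdfs dfs -cat /user/$(whoami)/output/*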