
Index

  1. Setup EC2 AWS
  2. Create an alarm to stop the EC2 instance on inactivity
  3. Creating S3 Bucket
  4. Access Key to S3 Bucket
  5. AWS CLI
  6. Cloning git repository
  7. Installing Docker
  8. Building Docker image
  9. Launching Jupyter Notebook
  10. Adding libraries
  11. Writing and Reading from S3 with Pandas

1 Setup EC2 AWS

1.1 Choose an AMI

An AMI (Amazon Machine Image) is a virtual machine image preconfigured with an operating system (Linux or Windows). I recommend using the latest Ubuntu Server release. At the time of writing, that is Ubuntu Server 18.04 LTS (HVM), SSD Volume Type.

choose_ami

1.2 Choose an instance Type

A small instance is enough for the initial setup.

choose_instance_type

1.3 Configure instance details

For our purpose, this tab can be ignored. Just so you know, here you can create several instances at once, request spot instances, and choose a VPC, among other options.

choose_configure_instance_details

1.4 Add Storage

I recommend using 16 GB. That is enough to install Docker and the Jupyter image. To store datasets we are going to use S3.

choose_add_storage

1.5 Add Tags

For our purpose, this tab can be ignored.

choose_add_tags

1.6 Configure Security Group

Choose "Create a new security group". Then give the security group a name, for example "jupyter-docker-security-group", and a description, for example "Ports: 22,8888,2376,443,80".

Ports and usages:

  • 22 : SSH
  • 80 : HTTP
  • 443 : HTTPS
  • 2376 : Docker daemon (TLS)
  • 8888 : Jupyter

Then configure the following security rules as in the image:

configure_security_group

1.7 Review Instance Launch

This tab shows a summary of the EC2 instance configuration. Verify it and then just click "Launch".

review_instance_launch

1.8 Create a Key Pair

You will be asked to select or create a key pair. We are going to create a new one. Give the key a name and save it. It is important not to lose the key pair, because you won't be able to download it again later.

1.9 Setting a Static IP Address

Before connecting to the EC2 instance, we are going to set a static IP address. Each time an instance starts, AWS assigns it a public IP reachable through the internet, so each time you start the instance you get a different public IP. You can deal with that, but I prefer to have a static IP. This way your connection string won't change.

To create a static IP you have to allocate an Elastic IP address. To do that, go to the left panel; in the "Network & Security" group you will find "Elastic IPs". Then click on "Allocate Elastic IP address".

Then just click on "Allocate".

Then we have to associate the Elastic IP address with the EC2 instance we launched before. To do that, click on "Actions", then "Associate Elastic IP address".

Then select the instance and then click on "Associate".

1.10 Connect to your instance

To connect to your instance you need an SSH client. It can be Git Bash, PuTTY, MobaXterm, or even PowerShell. I am going to use Visual Studio Code, because you can edit remote files as if they were local and send commands through the console in the same workspace.

After installing VS Code, go to Extensions in the left panel and type "remote development". Install the first option.

Then go to AWS and make sure the instance is running.

Locate your AWS key pair in "C:\Users\UserName".

Then add a new connection in SSH Targets.

ssh -i "[aws key pair name]" ubuntu@[elastic ip address]
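If you prefer, the same connection can be stored as an entry in your SSH config file ("C:\Users\UserName\.ssh\config"), which is what the Remote - SSH extension reads. A sketch, where the host alias, key file name, and IP are placeholders to replace with your own:

```
Host jupyter-docker-ec2
    HostName [elastic ip address]
    User ubuntu
    IdentityFile C:\Users\UserName\[aws key pair name].pem
```

With this entry, "jupyter-docker-ec2" shows up directly in the SSH Targets list.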

Press Ctrl+` (Ctrl+ñ on a Spanish keyboard layout) to open the terminal.

Upgrade Ubuntu.

sudo apt update
sudo apt-get upgrade -y

Install pip.

sudo apt install python-pip -y

2 Create an alarm to stop the EC2 instance on inactivity

Since this instance is going to be used for development, it is recommended to create a CloudWatch alarm that stops the EC2 instance when it is not in use. This way we avoid unexpected charges.

Go to Services, type "CloudWatch", and then go to Alarms in the left panel.

2.1 Creating Cloudwatch Alarm

2.1.1 Specify metric and conditions

First you will select the metric used to set a threshold. Just click on "Select metric".

Then select "EC2" metrics.

Then select "Per-Instance Metrics"

Then filter by your instance id and filter "CPUUtilization" (1). Then select your instance-id metric (2). Then just click the "Select metric" button (3).

Then you have to select the statistic and the period. In Statistic, select "Average". In Period, select "1 minute". It should appear as in the image.

Then you have to set the conditions. First select "Static" as the threshold type (1). Then in "Whenever CPUUtilization is..." select "Lower" (2). Then define the threshold at 5 percent (3). I recommend starting with this value. If you find that your EC2 instance turns off unexpectedly while you are working on it, decrease this value.

Then define the number of datapoints that will cause the ALARM state. Each datapoint covers a period of one minute. "a out of b" means that if "a" of the last "b" datapoints are past the threshold, the alarm goes to the ALARM state. I recommend using 30 out of 30 (4). This means 30 minutes of CPU utilization below 5% must pass before the alarm fires. Again, if you find that your EC2 instance turns off unexpectedly while you are working on it, increase this value.

Finally, select "Treat missing data as bad (breaching threshold)": in case a datapoint is missing, we treat it as a breaching point (5).
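The "a out of b" evaluation and the missing-data setting above can be sketched in Python. This is a simplified model of CloudWatch's behavior for illustration, not AWS code; the function name and defaults are my own:

```python
def alarm_state(datapoints, threshold=5.0, m=30, n=30, missing_is_breaching=True):
    """Return "ALARM" if at least m of the last n datapoints breach the threshold.

    datapoints: average CPUUtilization values, one per minute; None = missing.
    A point breaches when it is below the threshold; with "treat missing data
    as breaching", a missing point also counts as breaching.
    """
    last = datapoints[-n:]
    breaching = sum(
        1
        for p in last
        if (p is None and missing_is_breaching) or (p is not None and p < threshold)
    )
    return "ALARM" if breaching >= m else "OK"
```

With 30 out of 30, a single minute of real work (CPU above 5%) inside the window keeps the instance alive, which is why a long window is forgiving for interactive use.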

2.1.2 Configure Actions

First, remove the notification. We don't want notifications about alarm states; we just want to stop the EC2 instance quietly.

Then click on "Add ec2 action".

In "Whenever this alarm state is" select "in Alarm". Then select the action "Stop this instance"

2.1.3 Add Description

Then define a unique name and a description for the alarm. For example, "Stop inactive ec2 instance : i-0d1e1d80f7d901dd8" as the alarm name, and a description like "Stop ec2 instance when cpu utilization is less than 5%". Then click on "Next".

2.1.4 Preview and Create

Finally you get a preview just to confirm the configurations. Click on "Create".

Then you will be sent to the panel of active alarms. You can see in the image that the alarm has been set up.

Then go to the EC2 service and wait until the instance enters the In alarm state and turns off. It will take at least a minute. Then start the instance again.

2.2 Add a cpu stress in the initialization file of the ec2 instance

The alarm we just created has two states, "In alarm" and "OK". When the instance is stopped, the default state is "In alarm". When we start the instance, it doesn't change to "OK" automatically, because the startup process doesn't consume enough CPU to flip the alarm state. And if the alarm doesn't change to "OK", the instance won't turn off, because the alarm needs a transition from "OK" to "In alarm" to stop the instance.

To force the state to change to "OK", we are going to add a CPU stress when the instance initializes. First install "stress-ng" and then set up the file "rc.local", which is executed at startup.

https://wiki.ubuntu.com/Kernel/Reference/stress-ng

sudo apt install stress-ng -y
sudo nano /etc/systemd/system/rc-local.service

Copy and paste this code below.

[Unit]
Description=/etc/rc.local Compatibility
ConditionPathExists=/etc/rc.local

[Service]
Type=forking
ExecStart=/etc/rc.local start
TimeoutSec=0
StandardOutput=tty
RemainAfterExit=yes
SysVStartPriority=99

[Install]
WantedBy=multi-user.target

Then create the /etc/rc.local file by executing this command.

printf '%s\n' '#!/bin/bash' 'exit 0' | sudo tee -a /etc/rc.local

Then add execute permission to /etc/rc.local file.

sudo chmod +x /etc/rc.local

After that, enable the service on system boot. Then start the service and check its status.

sudo systemctl enable rc-local
sudo systemctl start rc-local.service
sudo systemctl status rc-local.service

Then add the stress command in the /etc/rc.local file.

sudo nano /etc/rc.local

Copy the code below.

#!/bin/bash
stress-ng -c 0 -l 50 -t 120

exit 0

Then reboot the system.

sudo reboot

3 Creating S3 Bucket

In order to share the datasets and models we make, we are going to use S3. Go to Services and type "S3".

Then click on "Create Bucket".

Then give your bucket a name. For example, my bucket name is "s3-jupyter-docker-aws-20200102-1238". Bucket names must be unique across all existing bucket names in S3, so in my example I appended the creation date as "yyyyMMdd-hhmm".

This step can be ignored.

By default, all S3 buckets block public access, so leave this step as is.

Finally click on "Create Bucket"

4 Access Key to S3 Bucket

To save files in S3 programmatically, we are going to use awscli (command line) and boto3 (Python). In order to do that we need credentials, or access keys. So go to the "IAM" service.

Then on the left panel click on "Users".

Then "Add user".

Give your user a name. For example, my user name is "iam-user-jupyter-docker". Then check both access types. Then uncheck "Require password reset". Then click on "Next: Permissions".

Then we are going to set permissions for the user. We are going to create a specific policy. So, click on "Attach existing policies directly" and then click on "Create Policy".

A new window is going to open.

Go to the "JSON" tab. Copy and paste the snippet below, changing the bucket name to yours.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetBucketLocation",
                "s3:ListAllMyBuckets"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::s3-jupyter-docker-aws-20200102-1238"
            ]
        },
        {
            "Effect": "Allow",
            "Action": "s3:*",
            "Resource": [
                "arn:aws:s3:::s3-jupyter-docker-aws-20200102-1238/*"
            ]
        }
    ]
}

Then click on "Review policy".

Then click on "Create policy".

Then you are going to see a success message.

Then we are going to apply the policy to the user. Go back to the "Add User" tab.

Then type the name of the policy to search and then select it.

Skip this window.

Then click on "Create user".

Then you have to download the credentials before closing the window.

Here you can see the credentials.

5 AWS CLI

AWS CLI is the command-line interface to interact with AWS services programmatically. You can install it with pip. Then configure the access key as in the pictures. In "region name" put "us-east-2". In "output format" just press Enter.

sudo pip install awscli
aws configure

To test the connection execute this command.

aws s3 ls

To test uploading files:

touch file.txt
aws s3 cp file.txt s3://s3-jupyter-docker-aws-20200102-1238/file.txt
aws s3 ls s3-jupyter-docker-aws-20200102-1238

You can verify in the AWS console that the file has been uploaded.

6 Cloning git repository

Clone the git repository to get the basic Dockerfile to build the image.

mkdir wd
cd wd
git clone https://github.com/ArnoldHueteG/jupyter-docker.git

7 Installing Docker

We are going to use Docker to run Jupyter. To install it, copy and paste the code below. The final command will restart the instance.

curl -sSL https://get.docker.com/ | sh
sudo usermod -aG docker ubuntu
sudo reboot

8 Building Docker image for Jupyter

Then we have to build the image we are going to use with a Dockerfile. In this file we can add the libraries we need on top of the base image "jupyter/scipy-notebook".

Verify you have the credential files in the "~/.aws" folder.

cd
cat .aws/credentials

cat .aws/config

Then copy the ".aws" folder into the "jupyter-docker" folder.

cd ~/wd/jupyter-docker
cp -avr ~/.aws .aws
chmod 664 .aws/*

Then build the image.

cd ~/wd/jupyter-docker
docker build -t jupyter_docker_aws .

9 Launching Jupyter Notebook

To launch Jupyter, execute the command below.

docker run --name jupyter \
-v /home/ubuntu/wd:/home/jovyan/work/ \
-d -p 8888:8888 \
-e GRANT_SUDO=yes --user root \
jupyter_docker_aws 
  • "--name" : gives a name to the container.
  • "-v" : mounts a directory from the instance into the container. It lets you persist files.
  • "-d" : runs in the background.
  • "-p" : maps a port on the instance to a port in the container.
  • "-e GRANT_SUDO=yes --user root" : to use sudo within the container. Useful to install libraries.
  • "jupyter_docker_aws" : the name of the image we just built.

The command won't print the token. To see the token:

docker logs jupyter

Change the link below to use your Elastic IP and token, then open it in a browser.

http://3.136.67.189:8888/lab?token=62e558c74e33bf55714bdeb367bba5f3f42bf4a8aab83bcf

Then we are going to set up a password. To do that, open a bash shell in your container.

docker exec -it jupyter bash

Then use the command below to generate the hashed password.

ipython -c "from notebook.auth import passwd; passwd()"

Then you will get the hashed password.
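For reference, here is a sketch of how that hash is constructed, based on my reading of the notebook library's `passwd` implementation; treat the details as an assumption and prefer the `ipython` command above for real use:

```python
import hashlib
import secrets

def make_password_hash(passphrase, salt=None, algorithm="sha1"):
    """Build a notebook-style password hash: "<algorithm>:<salt>:<digest>".

    Assumption: mirrors notebook.auth.passwd, which hashes the passphrase
    concatenated with a random hex salt.
    """
    if salt is None:
        salt = secrets.token_hex(6)  # 12 hex characters, like notebook's salt
    h = hashlib.new(algorithm)
    h.update(passphrase.encode("utf-8") + salt.encode("ascii"))
    return f"{algorithm}:{salt}:{h.hexdigest()}"
```

Because the salt is embedded in the stored value, the server can recompute and compare the digest at login time.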

Press Ctrl+D to exit the container.

Then rerun the jupyter container with the code below, changing the hashed password to yours.

docker stop jupyter && docker rm jupyter && \
docker run --name jupyter \
-v /home/ubuntu/wd:/home/jovyan/work/ \
-d -p 8888:8888 \
-e GRANT_SUDO=yes --user root \
jupyter_docker_aws \
start-notebook.sh --NotebookApp.password='sha1:06a9d429c494:136c8587efbdd31e09929e53519388e6ff99773a'   

To launch Jupyter at instance startup, add the code below to "/etc/rc.local".

sudo nano /etc/rc.local
#!/bin/bash
docker stop jupyter && docker rm jupyter && \
docker run --name jupyter \
-v /home/ubuntu/wd:/home/jovyan/work/ \
-d -p 8888:8888 \
-e GRANT_SUDO=yes --user root \
jupyter_docker_aws \
start-notebook.sh --NotebookApp.password='sha1:06a9d429c494:136c8587efbdd31e09929e53519388e6ff99773a'   
stress-ng -c 0 -l 50 -t 60

exit 0

Then restart the instance to test.

sudo reboot

Then refresh the browser.

10 Adding libraries

By default, all the libraries you install inside a container are ephemeral. To make libraries persist in the image, it is best practice to use a Dockerfile. But to test an installation, we are going to do it through a terminal in JupyterLab and then add the command to the Dockerfile and rebuild the image.

10.1 Install a library through the terminal in JupyterLab.

Open a terminal in jupyter.

Install the library with pip. You can try with conda too.

Finally check if the library is installed.

10.2 Add pip or conda command to Dockerfile and rebuild image.

Just open the Dockerfile and add the pip command as in the image below. You can use the Jupyter text editor.
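If you don't have the image handy, the addition might look like the sketch below. The `COPY .aws` line and the `plotly` library are illustrative assumptions, not necessarily what the repository's actual Dockerfile contains:

```dockerfile
FROM jupyter/scipy-notebook

# Copy the AWS credentials so awscli/boto3 work inside the container
COPY .aws /home/jovyan/.aws

# Libraries added after testing them in the JupyterLab terminal
RUN pip install --no-cache-dir plotly
```

Each library you validate in the terminal gets appended to the `RUN pip install` line so it survives a container rebuild.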

Then rebuild the image.

cd ~/wd/jupyter-docker
docker build -t jupyter_docker_aws .

10.3 Relaunch jupyter.

Then relaunch the jupyter container with the code below.

docker stop jupyter && docker rm jupyter && \
docker run --name jupyter \
-v /home/ubuntu/wd:/home/jovyan/work/ \
-d -p 8888:8888 \
-e GRANT_SUDO=yes --user root \
jupyter_docker_aws \
start-notebook.sh --NotebookApp.password='sha1:06a9d429c494:136c8587efbdd31e09929e53519388e6ff99773a'

Reload Jupyter in your browser.

Open a notebook and import the library to verify it was installed successfully.

11 Writing and Reading from S3 with Pandas

You can see examples in "Writing & Reading over S3.ipynb".
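The notebook itself isn't reproduced here, but the core pattern is a sketch like the following. It assumes the `s3fs` library is installed in the image so pandas can handle `s3://` URIs directly, and the bucket and key names are examples to replace with your own:

```python
import pandas as pd

# Example bucket name -- replace with your own.
BUCKET = "s3-jupyter-docker-aws-20200102-1238"

def write_csv(df, path):
    """Write a DataFrame to a CSV at `path` (a local path or an s3:// URI).

    For s3:// URIs, pandas delegates to the s3fs library, which picks up
    the credentials from ~/.aws inside the container.
    """
    df.to_csv(path, index=False)

def read_csv(path):
    """Read a CSV from `path` (a local path or an s3:// URI) into a DataFrame."""
    return pd.read_csv(path)

# Usage against S3 (requires s3fs and valid credentials):
# df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
# write_csv(df, f"s3://{BUCKET}/datasets/example.csv")
# round_trip = read_csv(f"s3://{BUCKET}/datasets/example.csv")
```

The same functions work unchanged with local paths, which makes it easy to develop against local files and switch to S3 by swapping the path.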