Operations

Once deployed, the application does not require any special regular maintenance. However, at some point it may be necessary to scale the service up or down to address a changing traffic profile, or to access the files/directories generated by the service for troubleshooting or in an emergency. The following sections address these concerns.

Service troubleshooting

Logs from all tasks are aggregated and available in CloudWatch (CloudWatch -> Logs -> Log Groups -> <task definition name>).
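
The log group can also be tailed from a terminal with the AWS CLI v2; a minimal sketch, assuming the log group is named after the task definition:

```bash
# Follow the application's log group in near real time.
# "<task definition name>" is a placeholder for the actual log group name.
aws logs tail "<task definition name>" --follow --since 1h
```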

It is also possible to log into the EC2 instances directly to examine the file system, or to log into the running Docker containers (docker exec -it <container name> sh, where <container name> can be found in the output of the docker container ls command). To do this, use the corresponding SSH key used during deployment configuration.
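
A typical session might look as follows (a sketch; the key path, user name, and host are assumptions to be replaced with real values):

```bash
# Connect to the instance with the deployment SSH key (hypothetical path and user).
ssh -i ~/.ssh/deploy-key.pem ec2-user@<instance-public-ip>

# On the instance: list running containers to find the container name...
docker container ls

# ...then open a shell inside the chosen container.
docker exec -it <container name> sh
```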

Data layout

Application data resides at /app/prebid.js on the EC2 instances and is mounted to the same path inside the application containers.

Structure of the /app/prebid.js directory:

  • /app/prebid.js/working_master - the directory where the PrebidJS git repository is checked out; changes from origin (including tags) are pulled into it periodically
  • /app/prebid.js/prebid_<version> - a copy of the working_master directory with a particular version (tag) checked out. Contains the pre-built PrebidJS core and modules of the corresponding version.
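
For illustration, a node serving versions 3.22.0 and 3.23.0 would have a layout along these lines (a hypothetical listing; the exact versions depend on the tags that have been picked up):

```bash
$ ls /app/prebid.js
prebid_3.22.0    # pre-built core and modules for tag 3.22.0
prebid_3.23.0    # pre-built core and modules for tag 3.23.0
working_master   # git checkout tracking origin
```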

Removing a particular release

Problem: a PrebidJS release (git tag) was created accidentally and picked up by the app before it was removed from GitHub.

Goal: remove the PrebidJS version from the list of available versions returned by the /versions endpoint.

Solution:

  1. Delete the tag from the local working copy of the PrebidJS repository on the node running the application. There are two ways of doing that:
    1. Delete the /app/prebid.js/working_master directory on the EC2 instances hosting the application containers - it will be created anew by the builder script
    2. Log in to the builder container on the EC2 instances hosting the application containers (see above) and delete the git tag manually:
      cd /app/prebid.js/working_master
      git tag -d 3.22.0
      
  2. Delete the directory in /app/prebid.js/ named after the tag:
    rm -rf /app/prebid.js/prebid_3.22.0
    
  3. Restart the application containers via the ECS Console (ECS -> Clusters -> Tasks -> Stop); the whole procedure is also sketched below
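
The steps above, consolidated into one shell sketch (the cluster name is a placeholder; 3.22.0 is the example tag used throughout this section):

```bash
# Inside the builder container (see "Service troubleshooting" above):
cd /app/prebid.js/working_master
git tag -d 3.22.0                      # drop the accidental tag locally
rm -rf /app/prebid.js/prebid_3.22.0    # remove the pre-built copy

# From a workstation: restart the tasks via the AWS CLI instead of the console.
for task in $(aws ecs list-tasks --cluster <cluster> --query 'taskArns[]' --output text); do
  aws ecs stop-task --cluster <cluster> --task "$task"
done
```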

Scaling up

This section describes the steps to take to scale the service up, i.e. add new serving nodes.

Scaling the service up involves the following actions in the AWS Console (a CLI equivalent is sketched after the list):

  1. Adding a new instance to the Auto Scaling Group

    1. Go to EC2 -> Auto Scaling Groups
    2. Edit the corresponding group and increase Min and Desired Capacity (in most cases incrementing the value by 1 is enough); adjust Max to be greater than Desired Capacity if necessary.
    3. Wait until the new instance(s) come up and become operational.
  2. Adding a new Task to the ECS Service

    1. Go to ECS -> <Cluster name>
    2. Edit the corresponding Service and increase the Number of tasks (the increment should correspond to the number of instances added to the Auto Scaling Group in the previous step).
    3. Wait until the new task(s) are launched.
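
A minimal CLI sketch of the same two steps, assuming placeholder names and an increase from one node to two:

```bash
# Step 1: grow the Auto Scaling Group by one instance.
aws autoscaling update-auto-scaling-group \
  --auto-scaling-group-name <asg-name> \
  --min-size 2 --desired-capacity 2 --max-size 3

# Step 2: run one more task on the new capacity.
aws ecs update-service --cluster <cluster> --service <service> --desired-count 2
```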

Scaling down

This section describes the steps to take to scale the service down, i.e. remove unneeded serving nodes. This process is mostly the scaling-up process inverted, with an additional safety step to avoid service disruption.

Scaling the service down involves the following actions in the AWS Console (a CLI equivalent is sketched after the list):

  1. Removing a Task from the ECS Service
    1. Go to ECS -> <Cluster name>
    2. Edit the corresponding Service and decrease the Number of tasks (the decrement should correspond to the number of instances to be removed from the Auto Scaling Group in the next step).
    3. Wait until the task(s) are terminated (see the Tasks tab).
  2. Removing an instance from the Auto Scaling Group
    1. Go to EC2 -> Auto Scaling Groups
    2. Find the corresponding group, open the Instances tab, then identify and select the instances that must not be terminated (i.e. all instances with running Tasks; see the ECS Cluster -> ECS Instances tab to find which instances have Tasks running on them). Click Actions -> Instance Protection -> Set Scale In Protection. This step guarantees that running Tasks will not be affected by the Auto Scaling Group shrinking.
    3. Edit the corresponding group and decrease Min and Desired Capacity (the new number should correspond to the number of remaining ECS Tasks running); adjust Max if necessary.
    4. Wait until the instance(s) are terminated.
    5. Clear the Scale In Protection flag from the remaining instances.
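
The same procedure with the AWS CLI; a sketch assuming placeholder names and a reduction from two nodes to one:

```bash
# Step 1: stop one task by shrinking the service first.
aws ecs update-service --cluster <cluster> --service <service> --desired-count 1

# Step 2: protect the instance that still runs a task from scale-in...
aws autoscaling set-instance-protection \
  --auto-scaling-group-name <asg-name> \
  --instance-ids <instance-id-with-running-task> \
  --protected-from-scale-in

# ...shrink the group...
aws autoscaling update-auto-scaling-group \
  --auto-scaling-group-name <asg-name> \
  --min-size 1 --desired-capacity 1

# ...and clear the protection flag once the surplus instance is gone.
aws autoscaling set-instance-protection \
  --auto-scaling-group-name <asg-name> \
  --instance-ids <instance-id-with-running-task> \
  --no-protected-from-scale-in
```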

Application redeployment

To keep things simple there is only one image tag per environment - latest for prod and dev for dev, i.e. no versioning support.

In order to update the application:

  1. Build the new version of the Docker images, tag them with the latest (or dev) tag, and push them to the ECR repositories (see the sketch below)
  2. Stop the ECS Tasks one by one - ECS will start a new Task in place of each stopped one, which will cause the new images to be pulled
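
A sketch of the build-and-push step; the account ID, region, and repository name are placeholders, not the project's actual values:

```bash
REPO=<account-id>.dkr.ecr.<region>.amazonaws.com/<repository>

# Authenticate Docker to ECR (AWS CLI v2).
aws ecr get-login-password --region <region> | \
  docker login --username AWS --password-stdin <account-id>.dkr.ecr.<region>.amazonaws.com

# Build, tag, and push the new image under the environment's single tag.
docker build -t "$REPO:latest" .   # use :dev for the dev environment
docker push "$REPO:latest"
```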

Note: ECS service deployments will not complete if they run on only one task (as they cannot be done safely). To redeploy to a single host, or to replace the host (for example when moving to a bigger instance type), it is necessary to set both the ECS service's desired count and the ASG's desired capacity to 0 - to take the old host out of service - then back to 1.
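
One possible CLI sequence for that dance (names are placeholders; expect a short outage while the counts are at 0):

```bash
# Take the single host out of service...
aws ecs update-service --cluster <cluster> --service <service> --desired-count 0
aws autoscaling update-auto-scaling-group \
  --auto-scaling-group-name <asg-name> --min-size 0 --desired-capacity 0

# ...then bring a fresh host and a fresh task back up.
aws autoscaling update-auto-scaling-group \
  --auto-scaling-group-name <asg-name> --min-size 1 --desired-capacity 1
aws ecs update-service --cluster <cluster> --service <service> --desired-count 1
```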