Cloud Assignment 4

Purdue University CS351 Spring 2026

Due Friday April 24th 11:00PM

Introduction

During this assignment you will migrate a monolithic Software-as-a-Service application to a microservices architecture running on Kubernetes. You'll deploy the existing N-Tier bird.ai stack one last time on EC2 and RDS, then incrementally pull each piece into a managed Kubernetes cluster (Amazon EKS) using the Strangler Fig migration pattern. You'll write your own Kubernetes manifests, configure an ingress controller for public HTTPS access, extract the ML inference path into a dedicated microservice, and tear down the legacy infrastructure once the new architecture is verified. Along the way you'll get hands-on practice with kubectl, Kustomize overlays, ConfigMaps and Secrets, persistent storage, and the operational rituals that make zero-downtime migrations possible.

📌 Notice

Some hyperlinks in this document may not work unless you right-click to open them in a new tab.

This assignment is distributed as an HTML file on Brightspace, which displays it to you in an iframe. An iframe's capabilities are limited for security reasons, and iframes also break the same-origin policy model of web security used by many websites. Certain websites restrict cross-origin resource sharing, causing our hyperlinks to them to fail when they load in an iframe.

Contents

  1. Initial Setup
    1. Set AWS Budget
    2. Set AWS IAM Permissions
    3. Create IAM User
  2. Update from bird.ai
  3. Installing required software
    1. Terraform (v1.0 or newer)
    2. Ansible
    3. Docker
    4. kubectl
    5. AWS CLI v2
    6. PostgreSQL client tools (v17)
    7. SSH key pair
  4. Deploy the N-Tier Architecture
    1. Deploy bird.ai on EC2 and RDS
    2. Configure kubectl
  5. Migrate the Database
  6. bird.ai on Kubernetes
  7. Public DNS and Teardown
  8. Extract ML Microservice
  9. Estimating Cloud Cost
    1. What you pay for
    2. Minimizing your bill
    3. Why the shared cluster is nearly free
  10. Teardown

Initial Setup

🚨 Important

We've added an additional AWS IAM step. Pay careful attention to the section Create IAM User.

Use an account eligible for the AWS Free Tier. Make a post on Ed if you don't have an eligible account. This assignment uses the AWS region "us-east-1". Do not use AWS services in any other region. You can control your current region from the menu bar of the AWS Console.

https://us-east-1.console.aws.amazon.com/console/home?region=us-east-1#
Figure: AWS Console Regions menu with cursor pointing at us-east-1

Set AWS Budget

We're going to set up an AWS Budget that will alert you if you get any monetary charges.

  1. Navigate to the Billing feature in the AWS Billing and Cost Management Service.
  2. Click "Create Budget"
  3. Select "Use a template" then "Zero spend budget"
  4. Add your email to the "Email recipients" text area.
  5. Click "Create budget"

Set AWS IAM Permissions

We're going to change the permission structure that grants instructors read-only access to your account so our autograding scripts can check your progress on a Cloud Assignment.

  1. Navigate to IAM (Identity and Access Management) in your AWS account.
  2. Click on "Roles" in the left-hand navigation menu.
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "sts:AssumeRole",
                "Principal": {
                    "AWS": "arn:aws:iam::949430458732:user/autograder"
                },
                "Condition": {
                    "StringEquals": {
                        "sts:ExternalId": "<TODO_REPLACE_WITH_YOUR_OWN_EXTERNAL_ID>"
                    }
                }
            }
        ]
    }
    

Finish by creating a new credentials file in an INI file format.

credentials
[default]
aws_account_id = YOUR_AWS_ACCOUNT_ID
external_id = YOUR_EXTERNAL_ID_FROM_THE_TRUST_POLICY
🚨 Important

Keep the credentials file safe. You will use it for assignment submissions during Cloud Assignment 4.
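
If you ever want to read these values back in a script, here is a minimal sketch. It assumes the exact "key = value" layout shown above. The helper name ini_get is made up for illustration and is not part of the starter code, and this handout does not specify how the autograder itself parses the file.

```shell
# Print the value of KEY from a simple "key = value" INI file.
# ini_get is a hypothetical helper, not part of the starter code.
ini_get() {
    file="$1"
    key="$2"
    awk -F'=' -v k="$key" '
        {
            # Trim whitespace around the key name before comparing.
            name = $1
            gsub(/^[ \t]+|[ \t]+$/, "", name)
            if (name == k) {
                # Keep everything after the first "=" so values may contain "=".
                value = substr($0, index($0, "=") + 1)
                gsub(/^[ \t]+|[ \t]+$/, "", value)
                print value
                exit
            }
        }
    ' "$file"
}

# Usage: ini_get credentials aws_account_id
```

Section headers such as [default] are skipped because they never match a key name.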

Create IAM User

We're going to create the IAM User "ca4" you'll be using for programmatic access to and control of AWS services.

  1. Navigate to IAM (Identity and Access Management) in your AWS account.

  2. Click on "Users" in the left-hand navigation menu.

    🌈 The More You Know

    PowerUserAccess grants full access to the AWS services we'll touch (EC2, RDS, ECR, STS) but blocks IAM management. This is for security: if your ca4 access key leaks, an attacker cannot create new IAM identities or escalate privileges in your account.

  3. Click the new "ca4" user on the Users list page.

  4. Open a terminal and run aws configure --profile ca4

    1. Paste in your "Access key" from Step 3 as the "AWS Access Key ID"
    2. Paste in your "Secret access key" from Step 3 as the "AWS Secret Access Key"
    3. Set your "Default region name" as "us-east-1"
    4. Set your "Default output format" as "json"
🚨 Important

Every shell session during this assignment needs the AWS_PROFILE environment variable set.

Terminal
$ export AWS_PROFILE="ca4"
$ aws sts get-caller-identity
{
    "UserId": "AIDA...",
    "Account": "<YOUR_ACCOUNT_ID>",
    "Arn": "arn:aws:iam::<YOUR_ACCOUNT_ID>:user/ca4"
}

Verify that the "Account" value matches the AWS Account ID you'll be using for this assignment, and that the "Arn" value ends in :user/ca4, the user you'll use to complete the assignment.
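
The same check can be scripted. The sketch below is a convenience under stated assumptions, not part of the starter code: the function name is_ca4_arn is hypothetical, and it only inspects an ARN string you hand it. One way to obtain that string is aws sts get-caller-identity --query Arn --output text.

```shell
# Succeed only if the given ARN belongs to the IAM user "ca4".
# is_ca4_arn is a hypothetical helper name used for illustration.
is_ca4_arn() {
    case "$1" in
        arn:aws:iam::*:user/ca4) return 0 ;;
        *) return 1 ;;
    esac
}

# Example with a placeholder account ID:
if is_ca4_arn "arn:aws:iam::123456789012:user/ca4"; then
    echo "profile OK"
fi
```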

Optional: add export AWS_PROFILE=ca4 to your shell's rc file (e.g. ~/.bashrc if using bash) while you're working on this assignment.

Update from bird.ai

Three letters have been on everyone's mind: IPO. No, it's not that beer all the millennials drink, that's IPA. IPO is an "Initial Public Offering", and the office is buzzing about it. The company has been a lot more relaxed since it overcame its scaling challenges, and mentions of PMF have given way to meetings about KPIs and SLAs.

Headcount is growing. "Two's a company, three's a crowd" you think to yourself as the hum of the office drowns out the traffic sitting outside your new company headquarters. Business has been doing well, and you've doubled the size and number of teams in the company to meet the demands of your new customers.

With scale comes new challenges, however. Your competitive advantage is your innovative technology, speed to market, and customer relationships, but these large new teams have been stepping on each other's toes, causing conflict and slowing down delivery of new features.

You've been stuck in long meetings with the board of executives discussing the IPO, and planning where you'll invest the new capital. Your CTO is looking forward to the research projects you'll be able to fund, the CFO has already made a list of potential acquisition targets, and as CEO you're worried about keeping ahead of your competitors.

Looking to the future, you recognize your software architecture can't keep pace with the changes you anticipate. For the first time in your life, you decide to be proactive and take care of the problem before the last minute.

📋 Instructions

Complete the five sections that follow to migrate bird.ai from its existing N-Tier architecture onto Kubernetes. Each section builds on the previous one, but you can submit your credentials to Gradescope at any point along the way for partial credit. The autograder gives independent feedback on every section. Some steps tell you exactly what to do, and others give you hints or ask you to apply something you learned in a previous Cloud Assignment. When you see a 🤖 Autograder block at the end of a section, that's your cue to submit and check your progress before moving on.

🌈 The More You Know

This document uses jhenry as a placeholder for your own subdomain. The name is a nod to John Henry, the "steel-driving man" of American folklore who raced a steam-powered drill through a mountain and won, then died with his hammer in his hand. The story feels freshly relevant in 2026: you are about to do the kind of deep infrastructure work a generation of engineers learned by hand, at exactly the moment in software history when AI tools are automating that same work faster than any of us can keep up with. But don't worry, you won't die completing this assignment 😄.

Installing required software

This assignment uses quite a few tools to execute the migration from an N-Tier Architecture to Kubernetes. Install each of these on your personal computer before the first section. All of them are cross-platform, so pick the install method that matches your operating system.

Terraform (v1.0 or newer)

Follow the official installation instructions. On macOS, brew install terraform works. On WSL Ubuntu, HashiCorp ships an apt repository, and the instructions walk you through it.

Ansible

Follow the official installation instructions. pipx install ansible is the recommended path on any system with Python already installed.

Docker

Install Docker Desktop on macOS or Windows. On native Linux or WSL, install the Docker engine directly per your distribution's guide.

kubectl

Follow the official kubectl install guide. On macOS, brew install kubectl. On WSL Ubuntu, use the Kubernetes apt repository.

AWS CLI v2

Install AWS CLI v2. Version 1 (what your package manager may have as awscli) is end-of-life and won't work for the commands in this assignment.

PostgreSQL client tools (v17)

You need pg_dump and psql version 17 to match the RDS Postgres engine this assignment provisions. pg_dump refuses to dump a server newer than the client, so an older postgresql-client package will fail during the database migration.
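
To catch a version mismatch before the migration, you could compare the client's major version against the server's. This is a minimal sketch: the helper names pg_major and pg_client_ok are made up, and it assumes pg_dump --version prints a line shaped like "pg_dump (PostgreSQL) 17.2", with the version number as the third word.

```shell
# Extract the major version from a line like "pg_dump (PostgreSQL) 17.2".
# pg_major is a hypothetical helper name used for illustration.
pg_major() {
    printf '%s\n' "$1" | awk '{ split($3, parts, "."); print parts[1] }'
}

# Succeed if the client's major version is at least the required one.
pg_client_ok() {
    [ "$(pg_major "$1")" -ge "$2" ]
}

# On your machine you would pass the real output: pg_client_ok "$(pg_dump --version)" 17
pg_client_ok "pg_dump (PostgreSQL) 17.2" 17 && echo "client is new enough"
```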

SSH key pair

You'll generate a throwaway SSH key pair dedicated to this assignment and store it inside the starter directory (not ~/.ssh).

From inside the starter root (the directory containing the assignment README.md):

Terminal
$ ssh-keygen -t ed25519 -f ./id_ed25519 -C ca4

When prompted for a passphrase, you may leave it empty. This is appropriate for ephemeral course infrastructure. For production keys you'd normally set a passphrase and use ssh-agent.

Deploy the N-Tier Architecture

Architecture migrations are tricky endeavors for an engineering team, especially one whose software serves a significant number of users. It's like driving a car and trying to change the tires, windshields, and eventually the engine, while trying not to slow down or have any passengers fly out the window.

A practical approach to these kinds of migrations has been developed, called the Strangler Fig pattern. The goal is to pick apart an existing service piece-by-piece, examining the data flow, isolating core services, and understanding what can be run concurrently. The key moments are the switchovers, when a part of the old architecture is swapped out with a service in the new architecture. At each switchover you verify the service is continuing to run as normal by monitoring error rates, performance metrics, and validating responses from the new service component.

We're going to get the existing N-Tier Architecture up and running so we have a target to migrate from. Get ready to practice using Terraform and Ansible again.

Deploy bird.ai on EC2 and RDS

  1. From the starter root, run Terraform to provision an EC2 instance and an RDS Postgres database.

    Terminal
    $ cd terraform
    $ terraform init
    $ terraform apply -var "public_key=$(cat ../id_ed25519.pub)"
    $ cd ..

    This creates a free-tier-eligible EC2 instance and a db.t3.micro RDS postgres database. The RDS instance takes a few minutes to come up, so Terraform's apply step can run for roughly five minutes.

  2. Run the Ansible playbook. It installs Docker on the EC2 instance, pulls the pre-built bird-ai-app image from the class ECR, and runs it with a DATABASE_URL pointing at your new RDS instance.

    Terminal
    $ cd ansible
    $ ansible-playbook playbook.yml
    ...
    TASK [Show app URL] *********************************************************************
    ok: [birdai-app] =>
       msg: bird.ai is running at http://18.234.100.45
    
    PLAY RECAP ******************************************************************************
    birdai-app: ok=14 changed=8 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
    localhost: ok=3 changed=1 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
    $ cd ..

    The final "Show app URL" task prints the public IP of your new bird.ai instance.

  3. Open the URL in your browser. Create an account, upload a photo, and verify that detection and history work. Upload at least one geotagged image so you have real data to watch survive the migration in the next section.

🤖 Autograder +10 (10/100)

Configure kubectl

Kubernetes groups related resources into namespaces, which act as lightweight isolation boundaries inside a cluster. Every student in CS351 has their own namespace on the shared cs351 EKS cluster, and a role binding that grants full CRUD on pods, deployments, services, configmaps, secrets, PVCs, and ingresses inside that one namespace. You cannot see or touch any other student's namespace, and you cannot create cluster-scoped resources like ClusterRoles or Nodes.

Your namespace is derived from your Purdue email. Take the local part (everything before @), lowercase it, replace any dots with hyphens, then prefix with student-:

Email                   Namespace
jhenry@purdue.edu       student-jhenry
alice.wong@purdue.edu   student-alice-wong
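
The derivation rule above can be sketched as a small shell function. The name email_to_namespace is made up for illustration, and the instructor's provisioning script may implement the rule differently.

```shell
# Derive the namespace from a Purdue email: take the local part,
# lowercase it, replace dots with hyphens, and prefix "student-".
# email_to_namespace is a hypothetical helper name.
email_to_namespace() {
    local_part="${1%%@*}"    # strip the first "@" and everything after it
    printf 'student-%s\n' "$(printf '%s' "$local_part" | tr 'A-Z' 'a-z' | tr '.' '-')"
}

email_to_namespace "alice.wong@purdue.edu"
```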

The provisioning happens automatically once you're enrolled. If kubectl get pods below fails with "Unauthorized", post on Ed with your AWS account ID and an instructor will confirm.

From the starter root, substitute your namespace into the kubeconfig template and save the result to ~/.kube/eks-student.yaml:

Terminal
$ mkdir -p ~/.kube
$ sed 's/NAMESPACE/student-YOURNAME/g' kubeconfig-template.yaml > ~/.kube/eks-student.yaml
$ export KUBECONFIG=~/.kube/eks-student.yaml
$ kubectl get pods
No resources found in student-YOURNAME namespace.
🌈 The More You Know

A kubeconfig file tells kubectl three things: what cluster to talk to, what credentials to present, and what namespace to scope commands to. The template you just filled in points at the instructor's EKS cluster, uses an exec credential plugin to call aws eks get-token (so your AWS ca4 profile becomes your Kubernetes identity), and scopes every command to your namespace. The sed command substitutes your namespace into two places in the template: the namespace: field on the context, and the IAM role ARN in the exec block.

The No resources found message is the success state. The cluster is reachable, your namespace exists, and you have permission to list pods in it. There are zero pods so far because you haven't deployed anything to Kubernetes yet. That's the next section.
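
To see why a single sed expression fills in both placeholders, here is a small demonstration. The two-line template below is a made-up stand-in: the real kubeconfig-template.yaml has more fields, and its role ARN format may differ.

```shell
# A made-up two-line stand-in for kubeconfig-template.yaml; the real
# template's fields and ARN format may differ.
template='namespace: NAMESPACE
role-arn: arn:aws:iam::000000000000:role/NAMESPACE'

# sed applies the expression to every line, and the trailing "g" replaces
# every NAMESPACE within a line, so both placeholders are filled in.
rendered="$(printf '%s\n' "$template" | sed 's/NAMESPACE/student-jhenry/g')"
printf '%s\n' "$rendered"
```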

🍭 Food for thought

You just reproduced everything you built by hand in CA3 with two commands. Look inside terraform/ and ansible/: what exactly did those files abstract away, and what would change if you had to repeat this deploy for a thousand customers instead of one?

Migrate the Database

The first switchover in the Strangler Fig pattern is the database. Right now bird.ai's Django container on EC2 reads and writes against RDS, and we want to break that dependency before moving any application code. If we left RDS in place and just redeployed Django on Kubernetes, we'd still be paying for and operating a managed database instance sitting outside the cluster, and every new pod would need network access back into the RDS private subnet. Instead, we'll move the data into a postgres pod running in our namespace. After this section, Kubernetes is the only system we need to operate for the rest of the migration.

The provided Ansible playbook handles the whole orchestration for you. It runs pg_dump on the EC2 instance (the only host that can reach RDS, which lives in a private subnet), fetches the dump file back to your laptop, and pipes it into the postgres pod via kubectl exec -i. You never have to SSH anywhere by hand. Before running the playbook, you'll generate secrets and bring up the Kubernetes workloads so there's a postgres pod waiting to receive the data.

  1. Generate two strong secrets. openssl rand -hex 25 produces URL-safe hex output, so neither secret will corrupt the DATABASE_URL you'll wire up in the next section.

    Terminal
    $ openssl rand -hex 25
    16e0bdfc0c000ea178d8dcecc765349b87bc974d228f0f1a08
    $ openssl rand -hex 25
    2b6cca7c44543dc98532feb257d2bd88afe4cfe3e9bd0122db

    Edit the two TODO fields in k8s/base/secret.yaml, one for POSTGRES_PASSWORD, one for DJANGO_SECRET_KEY.

    k8s/base/secret.yaml
    apiVersion: v1
    kind: Secret
    metadata:
      name: bird-ai-secrets
    type: Opaque
    stringData:
      POSTGRES_PASSWORD: ""             # TODO
      DJANGO_SECRET_KEY: ""             # TODO
      S3_ACCESS_KEY: minioadmin
      S3_SECRET_KEY: minioadmin
  2. Deploy the whole production overlay into your namespace. This brings up postgres, minio, the bird.ai app, and the bird.ai proxy all at once.

    Terminal
    $ kubectl apply -k k8s/overlays/production/
    configmap/bird-ai-config created
    secret/bird-ai-secrets created
    service/bird-ai-app created
    service/bird-ai-proxy created
    service/minio created
    service/postgresql created
    persistentvolumeclaim/minio-data created
    persistentvolumeclaim/postgresql-data created
    deployment.apps/bird-ai-app created
    deployment.apps/bird-ai-proxy created
    deployment.apps/minio created
    deployment.apps/postgresql created
    job.batch/minio-create-bucket created
    🌈 The More You Know

    That one kubectl apply spun up four distinct services in your namespace. Each is a separate Deployment with its own ClusterIP Service, so pods can reach it by name.

    • postgresql: the Postgres database, now running as a pod in your namespace instead of the managed RDS instance you left behind. Backed by a PersistentVolumeClaim, so data survives pod restarts.
    • minio: an S3-compatible object store. It replaces the AWS S3 bucket from CA3 as the place bird.ai uploads user images. A one-shot Job seeds it with the bird-ai-media bucket on first boot.
    • bird-ai-app: the Django monolith. It handles HTTP requests, reads and writes postgres, uploads to minio, and (for now) runs the YOLO inference in-process. This is the service you'll break apart in the final section of the assignment.
    • bird-ai-proxy: an nginx reverse proxy. It fronts the app, forwards / traffic to bird-ai-app:8000, and serves /media/ paths directly out of minio so Django never has to proxy image downloads.
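    Architecture diagram: User → bird-ai-proxy (nginx) on :80. bird-ai-proxy → bird-ai-app (Django + YOLO) on :8000 for /. bird-ai-proxy → minio (S3-compatible) on :9000 for /media/. bird-ai-app → postgresql on :5432. bird-ai-app → minio on :9000 for uploads. postgresql and minio each store their data on a PVC.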

    Expect a few things to look broken at first, which is fine for this section.

    You can see all four pod states with kubectl get pods:

    Terminal
    $ kubectl get pods
    NAME                            READY   STATUS             RESTARTS   AGE
    bird-ai-app-7f9bdd4d7b-k2vxp    0/1     CrashLoopBackOff   3          45s
    bird-ai-proxy-74f5f5d6d9-q8hnl  0/1     Running            0          45s
    minio-6f4c5b9f98-xw8zt          1/1     Running            0          45s
    postgresql-7c9d8f6b84-m2nft     1/1     Running            0          45s

    We only need postgres to be Ready for the migration. Wait for it to finish rolling out before continuing:

    Terminal
    $ kubectl rollout status deployment/postgresql
    deployment "postgresql" successfully rolled out
  3. Run the migration playbook. It reads your Terraform outputs to find the RDS instance, SSHes into the EC2 instance to install the postgresql17 client and run pg_dump, fetches the dump file back to your laptop, and pipes it into the postgres pod via kubectl exec -i.

    Terminal
    $ cd ansible
    $ ansible-playbook migrate-db.yml
    ...
    TASK [Show migration row count] ********
    ok: [localhost] =>
       msg: django_migrations has 20 rows
    TASK [Show detection row count] ********
    ok: [localhost] =>
       msg: app_detection has 3 rows (your Phase 0 uploads)

    If both row counts are non-zero, your migration succeeded. The playbook is idempotent, so you can re-run it safely if you want to reset the Kubernetes postgres to the latest RDS state.

    💡 Hint

    If the playbook fails with "Permission denied (publickey)", it is looking for your SSH key at ../id_ed25519 relative to the ansible/ directory, which is the project-local key you generated in the SSH key pair section. If your key is somewhere else, override it explicitly:

    Terminal
    $ ansible-playbook migrate-db.yml --extra-vars "ssh_key=/absolute/path/to/id_ed25519"
🤖 Autograder +10 (20/100)
🍭 Food for thought

Examine ansible/migrate-db.yml. List every step it takes and group them into three phases: (1) things it runs on your laptop, (2) things it runs on the EC2 instance, and (3) things it runs inside the Kubernetes postgres pod. What is the purpose of each phase, and why can't it all run on your laptop?

bird.ai on Kubernetes

The data is migrated but bird.ai is still stuck in CrashLoopBackOff. The postgres pod is up, the secrets are in place, the Deployment is rolled out, and yet Django refuses to start. This is the standard shape of a debugging session on Kubernetes: some pods are happy, one pod is not, and the answer is in the logs.

  1. Start with the logs of the most recent crash. --previous asks for the logs of the previously terminated container instead of the current (possibly restarting) one.

    Terminal
    $ kubectl logs deploy/bird-ai-app --previous --tail=5
    return self._cursor()
            ^^^^^^^^^^^^^^
     File "/usr/local/lib/python3.12/site-packages/django/db/backends/dummy/base.py", line 20, in complain
       raise ImproperlyConfigured(
    django.core.exceptions.ImproperlyConfigured: settings.DATABASES is improperly configured. Please supply the ENGINE value. Check settings documentation for more details.
    💡 Hint

    If kubectl logs returns nothing, your pod has not actually started yet. Try kubectl describe pod -l app=bird-ai-app instead. Describe shows Kubernetes' view of the pod: scheduling events, image pull status, probe results, and the exit code of the last container. Logs tell you what the container said; describe tells you what Kubernetes thinks happened to the pod.

    Django's settings.DATABASES is unconfigured. In bird.ai, settings.py builds the database connection from the DATABASE_URL environment variable, and that variable is empty. You'll set it in the ConfigMap and roll the Deployment to pick up the new value.

    🌈 The More You Know

    A ConfigMap is a Kubernetes object that holds non-confidential configuration as key/value pairs. Pods consume ConfigMaps in one of three ways: as environment variables (via envFrom), as command-line arguments, or as files mounted into a volume. The bird-ai-app Deployment uses envFrom, so every key in the bird-ai-config ConfigMap is injected as an env var when the pod starts. Anything Django reads from os.environ, including DATABASE_URL, comes from there.

    A Secret has the same shape, but is intended for sensitive values like passwords, keys, and tokens. That is why POSTGRES_PASSWORD and DJANGO_SECRET_KEY live in secret.yaml instead of configmap.yaml, even though Django reads them from os.environ the same way. URLs go in ConfigMaps, passwords go in Secrets.

  2. Edit k8s/base/configmap.yaml and fill in DATABASE_URL.

    💡 Hint

    A Postgres connection URL has this shape:

    postgres://USER:PASSWORD@HOST:PORT/DATABASE

    For your in-cluster postgres:

    • USER is bird_ai, set by POSTGRES_USER in k8s/base/postgres.yaml.
    • PASSWORD is the POSTGRES_PASSWORD value you pasted into k8s/base/secret.yaml in the previous section.
    • HOST is postgresql. Every Kubernetes Service in your namespace is reachable by the service's name as a DNS hostname, so postgresql resolves to whichever pod the Service currently routes to.
    • PORT is 5432, the default Postgres port exposed by the postgres Service.
    • DATABASE is bird_ai, set by POSTGRES_DB in k8s/base/postgres.yaml.
  3. Apply the updated ConfigMap and restart the app Deployment so it picks up the new environment.

    Terminal
    $ kubectl apply -k k8s/overlays/production/
    $ kubectl rollout restart deployment/bird-ai-app
    🍭 Food for thought

    A plain kubectl apply updates the ConfigMap in the cluster, but it does not restart the pods that consume it. You have to trigger the rollout yourself. Why do you think that is? What other software have we used in this course that has a similar "edit the config, then tell the service to reload it" workflow?

  4. Wait for the rollout to finish, then confirm all four pods are healthy.

    Terminal
    $ kubectl rollout status deployment/bird-ai-app
    Waiting for deployment "bird-ai-app" rollout to finish: 0 of 1 updated replicas are available...
    deployment "bird-ai-app" successfully rolled out
    $ kubectl get pods
    NAME                            READY   STATUS    RESTARTS   AGE
    bird-ai-app-95dcbf7c7-tx6vb     1/1     Running   0          36s
    bird-ai-proxy-99bc55596-h2s8v   1/1     Running   0          73m
    minio-6589b7c8f4-dvcxn          1/1     Running   0          73m
    postgresql-b854bbf7d-2xnvg      1/1     Running   0          73m
  5. Try your Kubernetes-powered bird.ai locally by port-forwarding the proxy Service to your laptop.

    Terminal
    $ kubectl port-forward svc/bird-ai-proxy 8080:80

    Visit http://localhost:8080 in your browser. Log in with the user you created back in the first section when bird.ai was still running on EC2 and RDS. Your history is intact. The Strangler Fig just ate the database layer, and you did not lose anything.
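
As a recap of the hint in step 2, the connection URL can be assembled from its five parts like this. The helper build_database_url is hypothetical and not part of the starter code. It also rejects passwords that are not lowercase hex, on the assumption that you generated yours with openssl rand -hex, because other characters could need URL escaping.

```shell
# Assemble a Postgres URL of the form postgres://USER:PASSWORD@HOST:PORT/DATABASE.
# build_database_url is a hypothetical helper, not part of the starter code.
build_database_url() {
    user="$1"; password="$2"; host="$3"; port="$4"; database="$5"
    # Hex-only passwords (as produced by openssl rand -hex) need no URL escaping.
    case "$password" in
        ''|*[!0-9a-f]*)
            echo "password is empty or not lowercase hex" >&2
            return 1
            ;;
    esac
    printf 'postgres://%s:%s@%s:%s/%s\n' "$user" "$password" "$host" "$port" "$database"
}

# Example with a placeholder password; substitute your own POSTGRES_PASSWORD.
build_database_url bird_ai 0123abcd postgresql 5432 bird_ai
```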

🤖 Autograder +30 (50/100)

Public DNS and Teardown

Your bird.ai deployment is working end-to-end on Kubernetes, but it is only reachable through kubectl port-forward on your laptop. Nobody else can load http://localhost:8080 in their browser. In this section you will expose the app at a real public URL (for example, https://jhenry.ca4.cs351.cloud for a student whose namespace is student-jhenry) through the NGINX Ingress controller running on the cluster, verify the whole stack works against the new hostname, and then tear down the EC2 and RDS infrastructure you provisioned in the first section. This is the last switchover of the Strangler Fig pattern. After it, the N-Tier architecture is gone.
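
The mapping from namespace to public URL can be sketched as two small functions. Both names are made up for illustration. The bare hostname is the form the DOMAIN setting expects later in this section, and the full https:// origin is the form CSRF_TRUSTED_ORIGINS expects.

```shell
# Derive the public hostname from a namespace such as "student-jhenry".
# Both function names are hypothetical.
namespace_to_host() {
    printf '%s.ca4.cs351.cloud\n' "${1#student-}"    # drop the "student-" prefix
}

# Derive the full HTTPS origin (scheme plus hostname) from the same namespace.
namespace_to_origin() {
    printf 'https://%s\n' "$(namespace_to_host "$1")"
}

namespace_to_host student-jhenry
namespace_to_origin student-jhenry
```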

🌈 The More You Know

An Ingress resource declares routing rules from an external URL to a Service inside your namespace. Creating the Ingress by itself does not route any traffic; it is just a declaration stored in etcd. A separate component called an Ingress Controller watches the cluster for Ingress resources and implements their routing rules. The cs351 cluster runs the NGINX Ingress Controller, deployed once by the instructor and shared across every student namespace.

DNS for ca4.cs351.cloud is handled with a single wildcard: a *.ca4.cs351.cloud A record that points at a shared AWS Network Load Balancer in front of the NGINX Ingress Controller. Every subdomain under ca4.cs351.cloud resolves to the same NLB. The controller then routes requests to your Service by inspecting the incoming Host: header and matching it against each Ingress object's rules[*].host. You do not register your subdomain anywhere. As long as your Ingress resource declares host: jhenry.ca4.cs351.cloud, requests to that hostname find your Service automatically.

TLS is handled the same way. cert-manager running on the cluster issued one wildcard certificate for *.ca4.cs351.cloud via a DNS-01 challenge, and the Ingress Controller terminates HTTPS with that cert for every student subdomain. Nothing to configure on your end.
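
Host-based routing is easier to picture with a toy model. The sketch below is not how the NGINX Ingress Controller is implemented. It only mimics the idea of matching a request's Host header against a table of rules, and the rule table and function name are invented for illustration.

```shell
# Toy model of host-based routing: each rule maps an Ingress host to a
# backend Service. The rules and names here are illustrative only; the real
# controller builds its table from the Ingress objects in the cluster.
rules='jhenry.ca4.cs351.cloud student-jhenry/bird-ai-proxy
alice-wong.ca4.cs351.cloud student-alice-wong/bird-ai-proxy'

# Print the backend whose host exactly matches the request's Host header,
# and return a non-zero status if no rule matches.
route_host() {
    printf '%s\n' "$rules" | awk -v host="$1" '
        $1 == host { print $2; found = 1; exit }
        END { exit (found ? 0 : 1) }
    '
}

route_host jhenry.ca4.cs351.cloud
```

Every request arrives at the same load balancer, so the Host header is the only thing that tells two students' traffic apart.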

  1. Edit k8s/base/ingress.yaml and replace SUBDOMAIN with your namespace minus the student- prefix. For example, if your namespace is student-jhenry, use jhenry.

    k8s/base/ingress.yaml
    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
     name: bird-ai-ingress
     annotations:
       nginx.ingress.kubernetes.io/proxy-body-size: "20m"
    spec:
     ingressClassName: nginx
     rules:
     - host: SUBDOMAIN.ca4.cs351.cloud
       http:
         paths:
         - path: /
           pathType: Prefix
           backend:
             service:
               name: bird-ai-proxy
               port:
                 number: 80
  2. Uncomment # - ingress.yaml in k8s/base/kustomization.yaml so the Ingress resource is actually applied when you run the overlay.

    k8s/base/kustomization.yaml
    apiVersion: kustomize.config.k8s.io/v1beta1
    kind: Kustomization
    resources:
       - app.yaml
       - proxy.yaml
       - postgres.yaml
       - minio.yaml
       - minio-bucket-job.yaml
       - configmap.yaml
       - secret.yaml
       # TODO uncomment after editing ingress.yaml
       # - ingress.yaml
    🌈 The More You Know

    Kustomize is a template-free Kubernetes config tool built into kubectl. A kustomization.yaml file lists a set of resource manifests under resources:, and kubectl apply -k applies them as a group. On top of that base you can stack overlays that patch specific fields (image tags, replica counts, environment variables) per environment, which is how the starter code keeps overlays/local/ (minikube) and overlays/production/ (EKS) distinct without duplicating every base manifest.

    The ingress.yaml entry starts commented out so the overlay you applied in the database migration section works without requiring you to also fill in an Ingress resource. Now that your app is healthy and you have a subdomain, it is time to flip that entry on.

  3. Edit k8s/base/configmap.yaml and set CSRF_TRUSTED_ORIGINS and DOMAIN to your new public URL. CSRF_TRUSTED_ORIGINS needs the full URL including the https:// scheme, while DOMAIN is just the hostname.

    k8s/base/configmap.yaml
    apiVersion: v1
    kind: ConfigMap
    metadata:
       name: bird-ai-config
    data:
       # Example: https://jhenry.ca4.cs351.cloud
       CSRF_TRUSTED_ORIGINS: "http://localhost:8080,http://127.0.0.1:8080"
       # Example: jhenry.ca4.cs351.cloud
       DOMAIN: ""
  4. Apply the updated overlay and restart the app so it picks up the new CSRF_TRUSTED_ORIGINS and DOMAIN values.

    Terminal
    $ kubectl apply -k k8s/overlays/production/
    $ kubectl rollout restart deployment/bird-ai-app
    🚨 Important

    Do not skip the rollout restart. A plain kubectl apply -k updates the ConfigMap in the cluster, but the running bird-ai-app pods captured the old environment variables when they started, and environment variables are frozen for the lifetime of a process. If you visit your HTTPS URL before restarting, the login page will render fine, but submitting the form will return 403 Forbidden (CSRF verification failed) because Django's running process still thinks the only trusted origin is http://localhost:8080 from the previous section. This is the same "edit config, then tell the service to reload it" rule from the Food for thought prompt in the previous section, and it is the single most common footgun in the assignment.

  5. Visit https://jhenry.ca4.cs351.cloud (substituting your own subdomain) in your browser. The app should load with a valid HTTPS certificate and show your data. Log in, browse your history, and upload a geotagged image. After a few seconds, check the map page. Your new upload should appear there, along with every other student's most recent geotagged detections. The map is backed by a shared gossip service that collects detection metadata from every student's namespace, so you are seeing the class's combined map of the world's birds and squirrels.

  6. Once you have verified the Kubernetes app is fully working, destroy the original EC2 and RDS infrastructure.

    🚨 Important

    terraform destroy will permanently delete your RDS database. Before running this, verify that your Kubernetes app is fully working: log in, browse your history, upload a new image. If you need to start over, you will NOT be able to recover the original data. You will have to re-run the first section of this assignment from scratch.

    Terminal
    $ cd terraform
    $ terraform destroy
🤖 Autograder +20 (70/100)
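If the login form returned 403 Forbidden after step 4, one quick check is whether the running pod actually sees the new values. The sketch below assumes your kubeconfig already points at your namespace and that the app image ships printenv (most Linux base images do). check_app_env is a hypothetical helper name, not something provided in the assignment repo.

```shell
# Hypothetical helper: confirm the restarted pod sees the new settings.
check_app_env() {
  # Wait for the restart to finish (give up after two minutes).
  kubectl rollout status deployment/bird-ai-app --timeout=120s || return 1
  # Print the values as seen from inside the running container.
  kubectl exec deploy/bird-ai-app -- printenv CSRF_TRUSTED_ORIGINS DOMAIN
}
```

Run check_app_env after the rollout restart. If the first line of output still lists only http://localhost:8080, either the overlay was not applied or the pods were not restarted.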
🍭 Food for thought

Look back at the three switchovers you just performed: moving postgres into the cluster, moving the app runtime into the cluster, and moving public access onto the shared Ingress. Which one was the most irreversible? If you had to do this migration on a live customer-facing production system, which switchover would you most want a rollback plan for, and why?

Extract ML Microservice

You have one monolith left to break apart. Right now the bird-ai-app pod handles HTTP, Django views, Postgres, MinIO uploads, AND runs the YOLO image-segmentation model in-process to detect birds and squirrels. The first four are cheap, stateless work. The fifth needs gigabytes of memory, benefits from specialized hardware, and has a completely different scaling profile from the rest of the app. In this section you will split the YOLO inference out of the Django monolith into its own dedicated ml-service microservice, running in its own pod, reachable from bird-ai-app by an in-cluster Service URL.

🌈 The More You Know

There are lots of reasons to extract a service from a monolith: team autonomy, independent deploy cycles, language or framework diversity, fault isolation. The reason that matters here is resource profile. Django's request and response handlers are lightweight: a few megabytes of Python per request, I/O bound, scaled by adding small replicas. YOLO inference is the opposite: gigabytes of model weights resident in memory per pod, CPU or GPU bound, scaled by adding larger but fewer replicas on instance types sized for ML work.

Packing both into the same pod forces a bad compromise. Every bird-ai-app pod carries the model weights in memory whether it is serving an upload or not, the pod's memory limit has to be sized for the worst case, and the instance type you pick has to fit ML work even though most of what the pod does is plain HTTP. Splitting them means you can scale the Django pods horizontally on cheap general-purpose nodes, and scale the ml-service pods vertically (or put them on a completely different node pool) without touching the rest of the app.

  1. Review the provided files in ml-service/.

    That is the whole service. Four files, one endpoint. That is the "micro" in microservice.

  2. Log in to the instructor ECR registry, build the image for linux/amd64, and push it with your subdomain as the tag. Substitute your own subdomain for jhenry in the commands below.

    Terminal
    $ aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin 949430458732.dkr.ecr.us-east-1.amazonaws.com
    $ docker build --platform linux/amd64 -t 949430458732.dkr.ecr.us-east-1.amazonaws.com/cs351-ml-service:jhenry ml-service/
    $ docker push 949430458732.dkr.ecr.us-east-1.amazonaws.com/cs351-ml-service:jhenry

    The build step downloads the PyTorch base image and copies the model weights into the image, so it can take a few minutes on a first run. While it runs, you can move on to step 3.

    💡 Hint

    The ECR registry you are pushing to lives in the instructor's AWS account (949430458732), not your student account. The cs351-ml-service repository has a cross-account repository policy that allows push and pull from any AWS principal, which is why your ca4 IAM user can push here without any extra setup in your own account. Everyone pushing to the same repo under different tags is how we keep the grader's ECR lookup simple: it just looks for an image tagged with your subdomain.

  3. Create k8s/base/ml-service.yaml. The fastest way is to copy k8s/base/app.yaml as a starting template and edit the fields below. Both files declare a Deployment and a Service, so the overall shape is the same; only the values change.

    Terminal
    $ cp k8s/base/app.yaml k8s/base/ml-service.yaml

    | # | Field | Old value | New value |
    |---|-------|-----------|-----------|
    | 1 | metadata.name (Deployment) | bird-ai-app | ml-service |
    | 2 | spec.selector.matchLabels.app | bird-ai-app | ml-service |
    | 3 | spec.template.metadata.labels.app | bird-ai-app | ml-service |
    | 4 | Container name | bird-ai-app | ml-service |
    | 5 | Container image | cs351-bird-ai-app:latest | 949430458732.dkr.ecr.us-east-1.amazonaws.com/cs351-ml-service:jhenry |
    | 6 | Container ports[0].containerPort | 8000 | 5000 |
    | 7 | Readiness probe path | /login/ | /health |
    | 8 | Readiness probe port | 8000 | 5000 |
    | 9 | envFrom block | configmap + secret refs | delete the whole block |
    | 10 | env block (if present) | any values | delete the whole block |
    | 11 | metadata.name (Service) | bird-ai-app | ml-service |
    | 12 | Service spec.selector.app | bird-ai-app | ml-service |
    | 13 | Service port | 8000 | 5000 |
    | 14 | Service targetPort | 8000 | 5000 |

    Keep the strategy: { type: Recreate } and the resource requests and limits unchanged. Your namespace's ResourceQuota cannot fit two ml-service pods at once, so a rolling update would get stuck waiting for quota that will never free up. Recreate tells Kubernetes to delete the old pod first and only then start the new one.

    🌈 The More You Know

    A Kubernetes manifest is a YAML file that declares the desired state of one or more Kubernetes objects. The file you just copied contains two objects separated by a --- document marker: a Deployment (which manages a ReplicaSet that manages pods) and a Service (which gives those pods a stable DNS name inside the cluster). The Deployment is the thing that keeps your pod running and handles rollouts when you change the manifest. The Service is the thing that routes traffic to whichever pod the Deployment currently owns. You need both even for a single-replica service, because the pod's IP changes every time it restarts but the Service's name stays constant.

  4. Add ml-service.yaml to the resources list in k8s/base/kustomization.yaml so the overlay picks it up.

  5. Edit k8s/base/configmap.yaml and set ML_SERVICE_URL to the in-cluster address of your new Service.

    k8s/base/configmap.yaml
       # Set ML_SERVICE_URL to the in-cluster address of your ML
       # microservice. Leave empty to run inference locally in the
       # Django pod (the monolith default).
       #
       # Example: http://ml-service:5000
       ML_SERVICE_URL: ""
  6. Apply the overlay and restart the app so it picks up the new ML_SERVICE_URL environment variable. You know the drill by now.

    Terminal
    $ kubectl apply -k k8s/overlays/production/
    $ kubectl rollout restart deployment/bird-ai-app
  7. Verify the split works end-to-end. Visit https://jhenry.ca4.cs351.cloud (substituting your own subdomain), log in, and upload an image of a bird or squirrel. Then tail the logs of the ml-service pod to confirm the inference is actually running there and not in the Django process.

    Terminal
    $ kubectl logs deploy/ml-service --tail=20
    * Running on http://0.0.0.0:5000
    127.0.0.1 - - [...] "GET /health HTTP/1.1" 200 -
    0: 352x640 1 bird, 1639.1ms
    127.0.0.1 - - [...] "POST /predict HTTP/1.1" 200 -

    The 0: 352x640 1 bird, 1639.1ms line is YOLO's per-request inference log. If you see it, your bird.ai deployment is now genuinely microservice-shaped: HTTP and application logic in one pod, inference in another, talking over an in-cluster Service.

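    Figure: The new architecture. The User reaches bird-ai-proxy (nginx) on :80. The proxy forwards / to bird-ai-app (Django) on :8000 and /media/ to minio (S3-compatible) on :9000. bird-ai-app talks to postgresql on :5432, sends uploads to minio on :9000, and calls ml-service (YOLO inference) at :5000 /predict. postgresql and minio each have their own PVC.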
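For reference, the edits in step 3 might leave k8s/base/ml-service.yaml looking roughly like the sketch below. This is a minimal sketch, assuming app.yaml uses the standard apps/v1 Deployment and v1 Service layout with a single replica; your copy will also carry whatever probe timings and resource settings app.yaml had, which are not reproduced here. The snippet writes to a scratch file (a hypothetical path) so it cannot overwrite your real manifest, and jhenry is the example subdomain used throughout this handout.

```shell
# Write a reference sketch of the edited manifest to a scratch file.
# Assumption: standard apps/v1 Deployment + v1 Service layout, one replica.
sketch="${TMPDIR:-/tmp}/ml-service-sketch.yaml"
cat > "$sketch" <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ml-service
spec:
  replicas: 1
  strategy:
    type: Recreate
  selector:
    matchLabels:
      app: ml-service
  template:
    metadata:
      labels:
        app: ml-service
    spec:
      containers:
        - name: ml-service
          image: 949430458732.dkr.ecr.us-east-1.amazonaws.com/cs351-ml-service:jhenry
          ports:
            - containerPort: 5000
          readinessProbe:
            httpGet:
              path: /health
              port: 5000
          # resources: keep the requests and limits copied from app.yaml
---
apiVersion: v1
kind: Service
metadata:
  name: ml-service
spec:
  selector:
    app: ml-service
  ports:
    - port: 5000
      targetPort: 5000
EOF
echo "wrote $sketch"
```

Comparing your own file against the sketch with diff k8s/base/ml-service.yaml "$sketch" can help spot a missed field from the table. The sketch leaves out the resources block on purpose, so treat it as a comparison aid rather than something to apply as-is.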

🤖 Autograder +30 (100/100)
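If the autograder does not award these points, a few checks can narrow down which step went wrong. The helper below is a hypothetical sketch (check_ml_split is not part of the assignment repo) that assumes it is run from the repository root with your kubeconfig pointed at your namespace, and that the app image ships printenv.

```shell
# Hypothetical helper: walk the path from config to the ml-service pod.
check_ml_split() {
  # 1. The rendered overlay should include the ml-service objects (step 4).
  kubectl kustomize k8s/overlays/production/ | grep -q 'name: ml-service' \
    || { echo "ml-service missing from rendered overlay"; return 1; }
  # 2. The Django pod should see the new URL (steps 5 and 6).
  kubectl exec deploy/bird-ai-app -- printenv ML_SERVICE_URL || return 1
  # 3. The ml-service Deployment should finish rolling out, which
  #    requires its /health readiness probe to pass (step 3).
  kubectl rollout status deployment/ml-service --timeout=120s
}
```

A failure at check 1 points at kustomization.yaml, an empty value at check 2 points at the ConfigMap or a missed rollout restart, and a timeout at check 3 points at the image tag, port, or probe settings in ml-service.yaml.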
🎉 Nice work!

You migrated bird.ai from an N-Tier monolith on EC2 and RDS to a microservice architecture on Kubernetes with a shared Ingress, a self-managed database, a self-managed object store, and a dedicated inference service. Every piece of the Strangler Fig pattern you read about in the first section is now something you have done by hand.

🔮 Vibe Check

How did Cloud Assignment 4 go? Let us know on Ed.

Estimating Cloud Cost

CA4's cost story has two halves. Your bird.ai workload after the database migration runs inside a shared EKS cluster managed by the course staff, and every pod, Service, Ingress, and load balancer in that cluster is free to you. What you do pay for is the EC2 instance and RDS database you stood up in the first section, and only while they are running. Because you built both with Terraform, the total cost of the assignment is really the sum of the hours you leave those two resources standing.

What you pay for

Rates for the resources you provision in your own account (us-east-1, on-demand):

| Resource | Type | Approx. rate |
|----------|------|--------------|
| EC2 instance | c7i-flex.large | ~$0.0725/hr |
| RDS instance | db.t3.micro PostgreSQL | ~$0.018/hr |
| EBS root volume | 15 GB gp3 | ~$1.20/mo |
| RDS storage | 20 GB gp2 | ~$2.30/mo |
| ECR push to cs351-ml-service | instructor's repository | Free |

EC2 and RDS are both billed for the time the instances exist: rates are quoted per hour but metered per second, with a short minimum charge each time an instance starts. terraform destroy deletes them and billing stops. terraform apply brings them back and billing resumes. Storage is billed per GB-month and is deleted along with the instance.

If your AWS account is new, both instance types above are covered by the current AWS Free Tier credit pool, so your effective cost is zero until that pool is exhausted. The estimates below assume you are OFF the free tier and are therefore conservative upper bounds.

Minimizing your bill

| Scenario | Hours billed | Cost |
|----------|--------------|------|
| Complete the first four sections in one sitting (~6 hrs) | 6 | ~$0.50 |
| Spread across a week, destroy between each session | ~10 | ~$0.80 |
| Spread across a week, never destroy | ~168 | ~$13 |
| Forget about it for a month | ~720 | ~$56 |

Rows two and three represent the same amount of actual work. The difference is idle time. Leaving the EC2 instance and RDS database running all week costs roughly fifteen times more than destroying them at the end of each session. terraform destroy takes under three minutes and drops your hourly bill back to zero until the next terraform apply.
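To estimate your own scenario, multiply the hours you expect the stack to exist by a combined hourly rate. The sketch below is a hypothetical helper (estimate_cost is not part of the assignment repo) whose default rate is simply the sum of the approximate EC2 and RDS rates listed under What you pay for. It ignores storage, which adds well under a cent per hour. The scenario figures in this section are rounded approximations, so expect the sketch to land in the same ballpark rather than match them to the cent.

```shell
# Rough cost estimate: hours of runtime times a combined hourly rate.
# Default rate (assumption) = EC2 ~0.0725 + RDS ~0.018 dollars per hour.
estimate_cost() {
  hours="$1"
  rate="${2:-0.0905}"
  awk -v h="$hours" -v r="$rate" 'BEGIN { printf "%.2f\n", h * r }'
}
```

For example, estimate_cost 6 covers a single sitting and estimate_cost 168 covers a week without a destroy. Pass a second argument to try a different hourly rate.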

Once you have completed Public DNS and Teardown, your ongoing cost for CA4 is $0. Everything after that runs in the shared cluster.

Why the shared cluster is nearly free

The cs351 EKS cluster has real running costs: the EKS control plane, two to four worker nodes, the NLB fronting the cluster, a wildcard TLS certificate, the hosted zone for ca4.cs351.cloud, and the ECR repositories that hold the bird.ai and ml-service images. Those are paid by the instructor and not passed on to you. A rough comparison against running the same workload yourself:

| Approach | Monthly cost |
|----------|--------------|
| Your own EKS cluster + 1 t3.medium node + NLB | ~$115 |
| k3s on a single t3.large EC2 + NLB | ~$76 |
| ECS Fargate + ALB | ~$60 |
| Your slice of the shared cs351 cluster | ~$2–3 |

The shared cluster is roughly twenty to fifty times cheaper than the self-hosted options above. The fixed costs (control plane, NLB, certificate, hosted zone) are amortized across every student using the cluster, so each of you pays a small fraction of what you would pay alone. The same amortization is how managed Kubernetes services (EKS Auto Mode, GKE Autopilot) and application platforms (Fly.io, Railway, Render) charge per-tenant prices an order of magnitude below the self-hosted equivalent.

You pay back those savings by accepting less control over Kubernetes version, node instance types, upgrade timing, and which add-ons are installed. That is the core tradeoff of every platform-as-a-service: you give up some control in exchange for not having to operate (or pay for) the platform itself.

Teardown

🚨 Important

Once you have a full score on Gradescope, tear down everything you provisioned in your own AWS account. Your grade is already recorded, and leaving resources running is the only way CA4 can cost you real money. Double-check both the EC2 console and the RDS console to confirm they are empty.

  1. Destroy the N-Tier infrastructure if you did not already do it in Public DNS and Teardown. This deletes the EC2 instance, the RDS database, the security groups, the key pair, and the storage volumes Terraform created with them.

    Terminal
    $ export AWS_PROFILE=ca4
    $ cd terraform
    $ terraform destroy
  2. Clean up your Kubernetes namespace. Delete every resource the production overlay applied so your namespace is empty and no pods are running.

    Terminal
    $ export KUBECONFIG=~/.kube/eks-student.yaml
    $ kubectl delete -k k8s/overlays/production/

    The namespace itself is managed by the course staff and will be removed after the semester ends, so you do not need to delete it yourself.

  3. Verify nothing is running in your own account. Both of these commands should return empty output.

    Terminal
    $ aws ec2 describe-instances --filters "Name=instance-state-name,Values=running" --query 'Reservations[].Instances[].InstanceId' --output text
    $ aws rds describe-db-instances --query 'DBInstances[].DBInstanceIdentifier' --output text

    If either command prints an identifier, find what is left and destroy it, either with terraform destroy (if Terraform owns it) or by hand in the AWS console.

  4. Optional: remove the ca4 IAM user. If you do not plan to reuse this access key for a future assignment, delete the user and its access key in the IAM console. This is good security hygiene: a leaked access key in the future cannot be used against your account if the user no longer exists.

Your CS351 course budget alert from the first section will email you if anything slips through. If you ever see a budget alert after this teardown, come back to step 3 and figure out what is still running.
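The two checks from step 3 can be wrapped into one helper that reports a clear pass or fail. This is a minimal sketch, assuming AWS_PROFILE=ca4 is exported and your default region is the one you deployed to; verify_teardown is a hypothetical name, not part of the assignment repo.

```shell
# Hypothetical helper wrapping the two checks from step 3.
verify_teardown() {
  running="$(aws ec2 describe-instances \
    --filters "Name=instance-state-name,Values=running" \
    --query 'Reservations[].Instances[].InstanceId' --output text)"
  databases="$(aws rds describe-db-instances \
    --query 'DBInstances[].DBInstanceIdentifier' --output text)"
  if [ -z "$running" ] && [ -z "$databases" ]; then
    echo "clean: no running EC2 instances or RDS databases"
  else
    echo "still present: EC2=[$running] RDS=[$databases]"
    return 1
  fi
}
```

Like the original commands, this only looks at one region and only at EC2 instances in the running state. A stopped instance does not show up here, but its EBS volume still accrues storage charges, so a glance at the EC2 console is still worthwhile.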