Cloud Assignment 4

Purdue University CS351 Spring 2026

Due Friday April 24th 11:00PM

Introduction

During this assignment you will migrate a monolithic Software-as-a-Service application to a microservices architecture running on Kubernetes. You'll deploy the existing N-Tier bird.ai stack one last time on EC2 and RDS, then incrementally pull each piece into a managed Kubernetes cluster (Amazon EKS) using the Strangler Fig migration pattern. You'll write your own Kubernetes manifests, configure an ingress controller for public HTTPS access, extract the ML inference path into a dedicated microservice, and tear down the legacy infrastructure once the new architecture is verified. Along the way you'll get hands-on practice with kubectl, Kustomize overlays, ConfigMaps and Secrets, persistent storage, and the operational rituals that make zero-downtime migrations possible.

📌 Notice

Some hyperlinks in this document may not work unless you right-click to open them in a new tab.

This assignment is distributed as an HTML file on Brightspace, which displays it to you in an iframe. An iframe's capabilities are limited for security reasons, and they also break the same-origin policy model of web security used by many websites. Certain websites restrict cross-origin resource sharing, causing our hyperlinks to them to fail when they load in an iframe.

Initial Setup
Update from bird.ai
Installing required software
Deploy the N-Tier Architecture
1. Deploy bird.ai on EC2 and RDS
2. Configure kubectl
Migrate the Database
bird.ai on Kubernetes
Public DNS and Teardown
Extract ML Microservice
Estimating Cloud Cost
Teardown

Initial Setup

🚨 Important

We've added an additional AWS IAM step. Pay careful attention to the section Create IAM User.

Use an account eligible for the AWS Free Tier. Make a post on Ed if you don't have an eligible account. This assignment uses the AWS region "us-east-1". Do not use AWS services in any other region. You can control your current region from the menu bar of the AWS Console.

https://us-east-1.console.aws.amazon.com/console/home?region=us-east-1#

AWS Console Regions menu with cursor pointing at us-east-1

Set AWS Budget

We're going to set up an AWS Budget that will alert you if you get any monetary charges.

Navigate to the Budgets feature in the AWS Billing and Cost Management Service.
Click "Create Budget"
Select "Use a template" then "Zero spend budget"
Add your email to the "Email recipients" text area.
Click "Create budget"

Set AWS IAM Permissions

We're going to change the permission structure that grants instructors read-only access to your account so our autograding scripts can check your progress on a Cloud Assignment.

Navigate to IAM (Identity and Access Management) in your AWS account.
Click on "Roles" in the left-hand navigation menu.
- Click "Create role".
- Select "Custom trust policy" from the trusted entity type options.
- Paste in the following JSON
```
{
    "Version": "2012-10-17",
    "Statement": [
	{
	    "Effect": "Allow",
	    "Action": "sts:AssumeRole",
	    "Principal": {
		"AWS": "arn:aws:iam::949430458732:user/autograder"
	    },
	    "Condition": {
		"StringEquals": {
		    "sts:ExternalId": "<TODO_REPLACE_WITH_YOUR_OWN_EXTERNAL_ID>"
		}
	    }
	}
    ]
}
```
- In the trust policy JSON, set "sts:ExternalId" with a unique string of your own making. Do not share it with anyone, but store it for later use in the credentials file.
- Click "Next".
- Filter the permissions policies by "AWS managed - job function" and search for "ReadOnlyAccess".
- Check "ReadOnlyAccess"
- Click "Next"
- Fill in "CS351-autograder" for "Role name"
- Click "Create role"

Finish by creating a new credentials file in an INI file format.

credentials

[default]
aws_account_id = YOUR_AWS_ACCOUNT_ID
external_id = YOUR_EXTERNAL_ID_FROM_THE_TRUST_POLICY
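
If you'd rather not invent the external ID by hand, here is a minimal Python sketch, assuming the file only needs the two keys shown above. The helper name `build_credentials` and the account ID `123456789012` are made up for illustration, and `secrets.token_hex` is just one reasonable way to produce a hard-to-guess string.

```python
import configparser
import io
import secrets


def build_credentials(aws_account_id: str, external_id: str) -> str:
    """Render the credentials file contents in INI format."""
    config = configparser.ConfigParser()
    config["default"] = {
        "aws_account_id": aws_account_id,
        "external_id": external_id,
    }
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


# 16 random bytes rendered as 32 hexadecimal characters.
external_id = secrets.token_hex(16)

# "123456789012" is a placeholder, not a real account ID.
credentials_text = build_credentials("123456789012", external_id)
print(credentials_text)
```

Whatever value you generate, the same string has to appear in both the trust policy's "sts:ExternalId" field and the credentials file.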

🚨 Important

Safekeep the credentials file, you will use it for assignment submissions during Cloud Assignment 4.

Create IAM User

We're going to create the IAM User "ca4" you'll be using for programmatic access to and control of AWS services.

Navigate to IAM (Identity and Access Management) in your AWS account.
Click on "Users" in the left-hand navigation menu.
- Click "Create user"
- Write "ca4" as the user name and click "Next"
- Under "Permissions Options" select "Attach Policies Directly"
- Under "Permissions Policies", filter by "AWS managed - job function" and select "PowerUserAccess"
- Click "Next" then "Create user"
🌈 The More You Know
PowerUserAccess grants full access to the AWS services we'll touch (EC2, RDS, ECR, STS) but blocks IAM management. This is for security: if your ca4 access key leaks, an attacker cannot create new IAM identities or escalate privileges in your account.
Click the new "ca4" user on the Users list page.
- Select the "Security credentials" tab
- Click "Create access key"
- Select "Other" then click "Next" and click "Create access key"
- Copy the "Access key" and "Secret access key" and store them temporarily (we'll use them in the next step)
Open a terminal and run aws configure --profile ca4
1. Paste in your "Access key" from Step 3 as the "AWS Access Key ID"
2. Paste in your "Secret access key" from Step 3 as the "AWS Secret Access Key"
3. Set your "Default region name" as "us-east-1"
4. Set your "Default output format" as "json"

🚨 Important

Every shell session during this assignment needs the AWS_PROFILE environment variable set.

Terminal

$ export AWS_PROFILE="ca4"
$ aws sts get-caller-identity
{
"UserId": "AIDA...",
"Account": "<YOUR_ACCOUNT_ID>",
"Arn": "arn:aws:iam::<YOUR_ACCOUNT_ID>:user/ca4"
}

Verify that the "Account" value matches the AWS Account ID you'll be using for this assignment, and that the "Arn" value ends in :user/ca4, the user you'll use to complete the assignment.
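
The same check can be scripted. This is a small sketch, assuming you have saved the JSON printed by aws sts get-caller-identity into a string. The function name `check_identity` and the sample account ID are hypothetical.

```python
import json


def check_identity(raw_json: str, expected_account: str, expected_user: str = "ca4") -> bool:
    """Return True when the caller identity matches the expected account and IAM user."""
    identity = json.loads(raw_json)
    return (
        identity["Account"] == expected_account
        and identity["Arn"].endswith(f":user/{expected_user}")
    )


# Sample output with a placeholder account ID, shaped like the response above.
sample = """
{
    "UserId": "AIDAEXAMPLE",
    "Account": "123456789012",
    "Arn": "arn:aws:iam::123456789012:user/ca4"
}
"""

print(check_identity(sample, "123456789012"))  # True
```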

Optional: add export AWS_PROFILE=ca4 to your shell's rc file (e.g. ~/.bashrc if using bash) while you're working on this assignment.

Update from bird.ai

Three letters have been on everyone's mind: IPO. No, it's not that beer all the millennials drink, that's IPA. IPO is an "Initial Public Offering", and the office is buzzing about it. The company has been a lot more relaxed since it overcame its scaling challenges, and mentions of PMF have given way to meetings about KPIs and SLAs.

Headcount is growing. "Two's a company, three's a crowd" you think to yourself as the hum of the office drowns out the traffic sitting outside your new company headquarters. Business has been doing well, and you've doubled the size and number of teams in the company to meet the demands of your new customers.

With scale comes new challenges, however. Your competitive advantage is your innovative technology, speed to market, and customer relationships, but these large new teams have been stepping on each other's toes, causing conflict and slowing down delivery of new features.

You've been stuck in long meetings with the board of executives discussing the IPO, and planning where you'll invest the new capital. Your CTO is looking forward to the research projects you'll be able to fund, the CFO has already made a list of potential acquisition targets, and as CEO you're worried about keeping ahead of your competitors.

Looking to the future, you recognize your software architecture can't keep pace with the changes you anticipate. For the first time in your life, you decide to be proactive and take care of the problem before the last minute.

📋 Instructions

Complete the five sections that follow to migrate bird.ai from its existing N-Tier architecture onto Kubernetes. Each section builds on the previous one, but you can submit your credentials to Gradescope at any point along the way for partial credit. The autograder gives independent feedback on every section. Some steps tell you exactly what to do, and others give you hints or ask you to apply something you learned in a previous Cloud Assignment. When you see a 🤖 Autograder block at the end of a section, that's your cue to submit and check your progress before moving on.

🌈 The More You Know

This document uses jhenry as a placeholder for your own subdomain. The name is a nod to John Henry, the "steel-driving man" of American folklore who raced a steam-powered drill through a mountain and won, then died with his hammer in his hand. The story feels freshly relevant in 2026: you are about to do the kind of deep infrastructure work a generation of engineers learned by hand, at exactly the moment in software history when AI tools are automating that same work faster than any of us can keep up with. But don't worry, you won't die completing this assignment 😄.

Installing required software

This assignment uses quite a few tools to execute the migration from an N-Tier Architecture to Kubernetes. Install each of these on your personal computer before the first section. All of them are cross-platform, so pick the install method that matches your operating system.

Terraform (v1.0 or newer)

Follow the official installation instructions. On macOS, brew install terraform works. On WSL Ubuntu, HashiCorp ships an apt repository, and the instructions walk you through it.

Ansible

Follow the official installation instructions. pipx install ansible is the recommended path on any system with Python already installed.

Docker

Install Docker Desktop on macOS or Windows. On native Linux or WSL, install the Docker engine directly per your distribution's guide.

kubectl

Follow the official kubectl install guide. On macOS, brew install kubectl. On WSL Ubuntu, use the Kubernetes apt repository.

AWS CLI v2

Install AWS CLI v2. Version 1 (what your package manager may have as awscli) is end-of-life and won't work for the commands in this assignment.

PostgreSQL client tools (v17)

You need pg_dump and psql version 17 to match the RDS Postgres engine this assignment provisions. pg_dump refuses to dump a server newer than the client, so an older postgresql-client package will fail during the database migration.

macOS: brew install libpq && brew link --force libpq
WSL Ubuntu: sudo apt install postgresql-client-17. If your apt only ships older versions, add the official PostgreSQL apt repo first.
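
To make the version rule concrete, here is a rough Python sketch of the comparison. It assumes the client reports its version in the usual `pg_dump (PostgreSQL) 17.x` form printed by pg_dump --version. The helper names are hypothetical, and the check only looks at major versions.

```python
import re


def major_version(version_output: str) -> int:
    """Extract the major version from text like 'pg_dump (PostgreSQL) 17.2'."""
    match = re.search(r"(\d+)(?:\.\d+)*", version_output)
    if match is None:
        raise ValueError(f"no version number found in {version_output!r}")
    return int(match.group(1))


def client_can_dump(client_output: str, server_major: int) -> bool:
    """The client's major version must be at least the server's."""
    return major_version(client_output) >= server_major


print(client_can_dump("pg_dump (PostgreSQL) 16.4", 17))  # False
print(client_can_dump("pg_dump (PostgreSQL) 17.2", 17))  # True
```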

SSH key pair

You'll generate a throwaway SSH key pair dedicated to this assignment and store it inside the starter directory (not ~/.ssh).

From inside the starter root (the directory containing the assignment README.md):

Terminal

$ ssh-keygen -t ed25519 -f ./id_ed25519 -C ca4

When prompted for a passphrase, you may leave it empty. This is appropriate for ephemeral course infrastructure. For production keys you'd normally set a passphrase and use ssh-agent.

Deploy the N-Tier Architecture

Architecture migrations are tricky endeavors for an engineering team, especially one whose software serves a significant number of users. It's like driving a car and trying to change the tires, windshields, and eventually the engine, while trying not to slow down or have any passengers fly out the window.

A practical approach to these kinds of migrations has been developed, called the Strangler Fig pattern. The goal is to pick apart an existing service piece-by-piece, examining the data flow, isolating core services, and understanding what can be run concurrently. The key moments are the switchovers, when a part of the old architecture is swapped out with a service in the new architecture. At each switchover you verify the service is continuing to run as normal by monitoring error rates, performance metrics, and validating responses from the new service component.
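
One way to picture the verification step at a switchover is as a simple comparison of error rates before and after. The sketch below is illustrative only: the `SwitchoverCheck` type, the 1% tolerance, and the request counts are all made up, and a real migration would look at more signals than this.

```python
from dataclasses import dataclass


@dataclass
class SwitchoverCheck:
    """Request counts observed for one component before and after a switchover."""
    requests_before: int
    errors_before: int
    requests_after: int
    errors_after: int


def error_rate(errors: int, requests: int) -> float:
    """Fraction of requests that failed; zero when there was no traffic."""
    return errors / requests if requests else 0.0


def keep_switchover(check: SwitchoverCheck, tolerance: float = 0.01) -> bool:
    """Keep the new component only if its error rate is within `tolerance` of the old one."""
    before = error_rate(check.errors_before, check.requests_before)
    after = error_rate(check.errors_after, check.requests_after)
    return after <= before + tolerance


print(keep_switchover(SwitchoverCheck(1000, 5, 1000, 8)))   # True
print(keep_switchover(SwitchoverCheck(1000, 5, 1000, 40)))  # False
```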

We're going to get the existing N-Tier Architecture up and running so we have a target to migrate from. Get ready to practice using Terraform and Ansible again.

Deploy bird.ai on EC2 and RDS

From the starter root, run Terraform to provision an EC2 instance and an RDS Postgres database.
Terminal
$ cd terraform
$ terraform init
$ terraform apply -var "public_key=$(cat ../id_ed25519.pub)"
$ cd ..
This creates a free-tier-eligible EC2 instance and a db.t3.micro RDS postgres database. The RDS instance takes a few minutes to come up, so Terraform's apply step can run for roughly five minutes.

Run the Ansible playbook. It installs Docker on the EC2 instance, pulls the pre-built bird-ai-app image from the class ECR, and runs it with a DATABASE_URL pointing at your new RDS instance.

Terminal

$ cd ansible
$ ansible-playbook playbook.yml
...
TASK [Show app URL] *********************************************************************
ok: [birdai-app] =>
   msg: bird.ai is running at http://18.234.100.45

PLAY RECAP ******************************************************************************
birdai-app: ok=14 changed=8 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
localhost: ok=3 changed=1 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
$ cd ..

The final "Show app URL" task prints the public IP of your new bird.ai instance.

Open the URL in your browser. Create an account, upload a photo, and verify that detection and history work. Upload at least one geotagged image so you have real data to watch survive the migration in the next section.

🤖 Autograder +10 (10/100)

Configure kubectl

Kubernetes groups related resources into namespaces, which act as lightweight isolation boundaries inside a cluster. Every student in CS351 has their own namespace on the shared cs351 EKS cluster, and a role binding that grants full CRUD on pods, deployments, services, configmaps, secrets, PVCs, and ingresses inside that one namespace. You cannot see or touch any other student's namespace, and you cannot create cluster-scoped resources like ClusterRoles or Nodes.

Your namespace is derived from your Purdue email. Take the local part (everything before @), lowercase it, replace any dots with hyphens, then prefix with student-:

Email	Namespace
`jhenry@purdue.edu`	`student-jhenry`
`alice.wong@purdue.edu`	`student-alice-wong`
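
The derivation rule can be written as a short Python function. This is a sketch of the rule exactly as stated above. The function name `namespace_for` is hypothetical, and it does not handle characters other than dots.

```python
def namespace_for(email: str) -> str:
    """Derive the namespace: local part, lowercased, dots to hyphens, 'student-' prefix."""
    local_part = email.split("@", 1)[0]
    return "student-" + local_part.lower().replace(".", "-")


print(namespace_for("jhenry@purdue.edu"))      # student-jhenry
print(namespace_for("alice.wong@purdue.edu"))  # student-alice-wong
```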

The provisioning happens automatically once you're enrolled. If kubectl get pods below fails with "Unauthorized", post on Ed with your AWS account ID and an instructor will confirm.

From the starter root, substitute your namespace into the kubeconfig template and save the result to ~/.kube/eks-student.yaml:

Terminal

$ mkdir -p ~/.kube
$ sed 's/NAMESPACE/student-YOURNAME/g' kubeconfig-template.yaml > ~/.kube/eks-student.yaml
$ export KUBECONFIG=~/.kube/eks-student.yaml
$ kubectl get pods
No resources found in student-YOURNAME namespace.

🌈 The More You Know

A kubeconfig file tells kubectl three things: what cluster to talk to, what credentials to present, and what namespace to scope commands to. The template you just filled in points at the instructor's EKS cluster, uses an exec credential plugin to call aws eks get-token (so your AWS ca4 profile becomes your Kubernetes identity), and scopes every command to your namespace. The sed command substitutes your namespace into two places in the template: the namespace: field on the context, and the IAM role ARN in the exec block.

The No resources found message is the success state. The cluster is reachable, your namespace exists, and you have permission to list pods in it. There are zero pods so far because you haven't deployed anything to Kubernetes yet. That's the next section.

🍭 Food for thought

You just reproduced everything you built by hand in CA3 with two commands. Look inside terraform/ and ansible/: what exactly did those files abstract away, and what would change if you had to repeat this deploy for a thousand customers instead of one?

Migrate the Database

The first switchover in the Strangler Fig pattern is the database. Right now bird.ai's Django container on EC2 reads and writes against RDS, and we want to break that dependency before moving any application code. If we left RDS in place and just redeployed Django on Kubernetes, we'd still be paying for and operating a managed database instance sitting outside the cluster, and every new pod would need network access back into the RDS private subnet. Instead, we'll move the data into a postgres pod running in our namespace. After this section, Kubernetes is the only system we need to operate for the rest of the migration.

The provided Ansible playbook handles the whole orchestration for you. It runs pg_dump on the EC2 instance (the only host that can reach RDS, which lives in a private subnet), fetches the dump file back to your laptop, and pipes it into the postgres pod via kubectl exec -i. You never have to SSH anywhere by hand. Before running the playbook, you'll generate secrets and bring up the Kubernetes workloads so there's a postgres pod waiting to receive the data.

Generate two strong secrets. openssl rand -hex 25 produces URL-safe hex output, so neither secret will corrupt the DATABASE_URL you'll wire up in the next section.

Terminal

$ openssl rand -hex 25
16e0bdfc0c000ea178d8dcecc765349b87bc974d228f0f1a08
$ openssl rand -hex 25
2b6cca7c44543dc98532feb257d2bd88afe4cfe3e9bd0122db
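
If openssl isn't available on your machine, Python's standard library can produce equivalent values. This is a sketch of an alternative, not a required step. The variable names are placeholders.

```python
import secrets
import string

# 25 random bytes rendered as 50 hexadecimal characters, like `openssl rand -hex 25`.
postgres_password = secrets.token_hex(25)
django_secret_key = secrets.token_hex(25)

# Hex output only uses 0-9 and a-f, none of which need escaping inside a URL.
assert set(postgres_password) <= set(string.hexdigits.lower())
print(len(postgres_password), len(django_secret_key))  # 50 50
```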

Edit the two TODO fields in k8s/base/secret.yaml, one for POSTGRES_PASSWORD, one for DJANGO_SECRET_KEY.

k8s/base/secret.yaml

apiVersion: v1
kind: Secret
metadata:
  name: bird-ai-secrets
type: Opaque
stringData:
  POSTGRES_PASSWORD: ""             # TODO
  DJANGO_SECRET_KEY: ""             # TODO
  S3_ACCESS_KEY: minioadmin
  S3_SECRET_KEY: minioadmin

Deploy the whole production overlay into your namespace. This brings up postgres, minio, the bird.ai app, and the bird.ai proxy all at once.
Terminal
$ kubectl apply -k k8s/overlays/production/
configmap/bird-ai-config created
secret/bird-ai-secrets created
service/bird-ai-app created
service/bird-ai-proxy created
service/minio created
service/postgresql created
persistentvolumeclaim/minio-data created
persistentvolumeclaim/postgresql-data created
deployment.apps/bird-ai-app created
deployment.apps/bird-ai-proxy created
deployment.apps/minio created
deployment.apps/postgresql created
job.batch/minio-create-bucket created
🌈 The More You Know
That one kubectl apply spun up four distinct services in your namespace. Each is a separate Deployment with its own ClusterIP Service, so pods can reach it by name.
- postgresql: the Postgres database, now running as a pod in your namespace instead of the managed RDS instance you left behind. Backed by a PersistentVolumeClaim, so data survives pod restarts.
- minio: an S3-compatible object store. It replaces the AWS S3 bucket from CA3 as the place bird.ai uploads user images. A one-shot Job seeds it with the bird-ai-media bucket on first boot.
- bird-ai-app: the Django monolith. It handles HTTP requests, reads and writes postgres, uploads to minio, and (for now) runs the YOLO inference in-process. This is the service you'll break apart in the final section of the assignment.
- bird-ai-proxy: an nginx reverse proxy. It fronts the app, forwards / traffic to bird-ai-app:8000, and serves /media/ paths directly out of minio so Django never has to proxy image downloads.
Architecture diagram
Expect a few things to look broken at first, which is fine for this section:
- postgres and minio come up Ready 1/1.
- bird-ai-app lands in CrashLoopBackOff. The pod has no DATABASE_URL set yet (you'll add it in the next section), so Django fails to start and the container exits. Kubernetes keeps restarting it, waiting longer between attempts each time, and reports the pod status as CrashLoopBackOff.
- bird-ai-proxy is Running but not Ready. Its readiness probe proxies to the crashing app and returns 502.
You can see all four pod states with kubectl get pods:
Terminal
$ kubectl get pods
NAME                             READY   STATUS             RESTARTS   AGE
bird-ai-app-7f9bdd4d7b-k2vxp     0/1     CrashLoopBackOff   3          45s
bird-ai-proxy-74f5f5d6d9-q8hnl   0/1     Running            0          45s
minio-6f4c5b9f98-xw8zt           1/1     Running            0          45s
postgresql-7c9d8f6b84-m2nft      1/1     Running            0          45s
We only need postgres to be Ready for the migration. Wait for it to finish rolling out before continuing:
Terminal
$ kubectl rollout status deployment/postgresql
deployment "postgresql" successfully rolled out
Run the migration playbook. It reads your Terraform outputs to find the RDS instance, SSHes into the EC2 to install the postgresql17 client and run pg_dump, fetches the dump file back to your laptop, and pipes it into the postgres pod via kubectl exec -i.
Terminal
$ cd ansible
$ ansible-playbook migrate-db.yml
...
TASK [Show migration row count] ********
ok: [localhost] =>
   msg: django_migrations has 20 rows
TASK [Show detection row count] ********
ok: [localhost] =>
   msg: app_detection has 3 rows (your Phase 0 uploads)
If both row counts are non-zero, your migration succeeded. The playbook is idempotent, so you can re-run it safely if you want to reset the Kubernetes postgres to the latest RDS state.
💡 Hint
If the playbook fails with "Permission denied (publickey)", it is looking for your SSH key at ../id_ed25519 relative to the ansible/ directory, which is the project-local key you generated in the SSH key pair step of Installing required software. If your key is somewhere else, override it explicitly:
Terminal
$ ansible-playbook migrate-db.yml --extra-vars "ssh_key=/absolute/path/to/id_ed25519"

🤖 Autograder +10 (20/100)

🍭 Food for thought

Examine ansible/migrate-db.yml. List every step it takes and group them into three phases: (1) things it runs on your laptop, (2) things it runs on the EC2 instance, and (3) things it runs inside the Kubernetes postgres pod. What is the purpose of each phase, and why can't it all run on your laptop?

bird.ai on Kubernetes

The data is migrated but bird.ai is still stuck in CrashLoopBackOff. The postgres pod is up, the secrets are in place, the Deployment is rolled out, and yet Django refuses to start. This is the standard shape of a debugging session on Kubernetes: some pods are happy, one pod is not, and the answer is in the logs.

Start with the logs of the most recent crash. --previous asks for the logs of the previously terminated container instead of the current (possibly restarting) one.
Terminal
$ kubectl logs deploy/bird-ai-app --previous --tail=5
    return self._cursor()
           ^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/django/db/backends/dummy/base.py", line 20, in complain
    raise ImproperlyConfigured(
django.core.exceptions.ImproperlyConfigured: settings.DATABASES is improperly configured. Please supply the ENGINE value. Check settings documentation for more details.
💡 Hint
If kubectl logs returns nothing, your pod has not actually started yet. Try kubectl describe pod -l app=bird-ai-app instead. Describe shows Kubernetes' view of the pod: scheduling events, image pull status, probe results, and the exit code of the last container. Logs tell you what the container said, describe tells you what Kubernetes thinks happened to the pod.

Django's settings.DATABASES is unconfigured. In bird.ai, settings.py builds the database connection from the DATABASE_URL environment variable, and that variable is empty. You'll set it in the ConfigMap and roll the Deployment to pick up the new value.
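
To see why an empty variable produces that particular error, here is a rough sketch of how a settings module might turn DATABASE_URL into Django's DATABASES dict. It is an assumption about the general technique, not a copy of bird.ai's settings.py. The function name `databases_from_url` and the example URL are hypothetical.

```python
from urllib.parse import urlparse


def databases_from_url(database_url: str) -> dict:
    """Build a Django-style DATABASES dict; an empty URL yields no usable config."""
    if not database_url:
        # With no ENGINE configured, Django falls back to its dummy backend,
        # which raises ImproperlyConfigured on first use.
        return {"default": {}}
    parts = urlparse(database_url)
    return {
        "default": {
            "ENGINE": "django.db.backends.postgresql",
            "NAME": parts.path.lstrip("/"),
            "USER": parts.username,
            "PASSWORD": parts.password,
            "HOST": parts.hostname,
            "PORT": parts.port,
        }
    }


print(databases_from_url(""))
print(databases_from_url("postgres://user:pw@db.example:5432/mydb"))
```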

🌈 The More You Know
A ConfigMap is a Kubernetes object that holds non-confidential configuration as key/value pairs. Pods consume ConfigMaps in one of three ways: as environment variables (via envFrom), as command-line arguments, or as files mounted into a volume. The bird-ai-app Deployment uses envFrom, so every key in the bird-ai-config ConfigMap is injected as an env var when the pod starts. Anything Django reads from os.environ, including DATABASE_URL, comes from there.
A Secret has the same shape, but is intended for sensitive values like passwords, keys, and tokens. That is why POSTGRES_PASSWORD and DJANGO_SECRET_KEY live in secret.yaml instead of configmap.yaml, even though Django reads them from os.environ the same way. URLs go in ConfigMaps, passwords go in Secrets.
Edit k8s/base/configmap.yaml and fill in DATABASE_URL.
💡 Hint
A Postgres connection URL has this shape:
postgres://USER:PASSWORD@HOST:PORT/DATABASE
For your in-cluster postgres:
- USER is bird_ai, set by POSTGRES_USER in k8s/base/postgres.yaml.
- PASSWORD is the POSTGRES_PASSWORD value you pasted into k8s/base/secret.yaml in the previous section.
- HOST is postgresql. Every Kubernetes Service in your namespace is reachable by the service's name as a DNS hostname, so postgresql resolves to whichever pod the Service currently routes to.
- PORT is 5432, the default Postgres port exposed by the postgres Service.
- DATABASE is bird_ai, set by POSTGRES_DB in k8s/base/postgres.yaml.
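
The pieces in the hint fit together as in the sketch below. The helper name `build_database_url` and the password `example-password` are placeholders. The percent-escaping is included only to show why a hex password is convenient: it contains nothing that needs escaping.

```python
from urllib.parse import quote


def build_database_url(user: str, password: str, host: str, port: int, database: str) -> str:
    """Assemble postgres://USER:PASSWORD@HOST:PORT/DATABASE, escaping the credentials."""
    return f"postgres://{quote(user, safe='')}:{quote(password, safe='')}@{host}:{port}/{database}"


# "example-password" is a placeholder; use your own POSTGRES_PASSWORD value.
print(build_database_url("bird_ai", "example-password", "postgresql", 5432, "bird_ai"))
# postgres://bird_ai:example-password@postgresql:5432/bird_ai
```

A password containing characters such as @ or / would have to be percent-escaped before it could sit inside the URL, which is the corruption the hex output avoids.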
Apply the updated ConfigMap and restart the app Deployment so it picks up the new environment.
Terminal
$ kubectl apply -k k8s/overlays/production/
$ kubectl rollout restart deployment/bird-ai-app
🍭 Food for thought
A plain kubectl apply updates the ConfigMap in the cluster, but it does not restart the pods that consume it. You have to trigger the rollout yourself. Why do you think that is? What other software have we used in this course that has a similar "edit the config, then tell the service to reload it" workflow?

Wait for the rollout to finish, then confirm all four pods are healthy.

Terminal

$ kubectl rollout status deployment/bird-ai-app
Waiting for deployment "bird-ai-app" rollout to finish: 0 of 1 updated replicas are available...
deployment "bird-ai-app" successfully rolled out
$ kubectl get pods
NAME                            READY   STATUS    RESTARTS   AGE
bird-ai-app-95dcbf7c7-tx6vb     1/1     Running   0          36s
bird-ai-proxy-99bc55596-h2s8v   1/1     Running   0          73m
minio-6589b7c8f4-dvcxn          1/1     Running   0          73m
postgresql-b854bbf7d-2xnvg      1/1     Running   0          73m

Try your Kubernetes-powered bird.ai locally by port-forwarding the proxy Service to your laptop.
Terminal
$ kubectl port-forward svc/bird-ai-proxy 8080:80
Visit http://localhost:8080 in your browser. Log in with the user you created back in the first section when bird.ai was still running on EC2 and RDS. Your history is intact. The Strangler Fig just ate the database layer, and you did not lose anything.

🤖 Autograder +30 (50/100)

Public DNS and Teardown

Your bird.ai deployment is working end-to-end on Kubernetes, but it is only reachable through kubectl port-forward on your laptop. Nobody else can load http://localhost:8080 in their browser. In this section you will expose the app at a real public URL (for example, https://jhenry.ca4.cs351.cloud for a student whose namespace is student-jhenry) through the NGINX Ingress controller running on the cluster, verify the whole stack works against the new hostname, and then tear down the EC2 and RDS infrastructure you provisioned in the first section. This is the last switchover of the Strangler Fig pattern. After it, the N-Tier architecture is gone.

🌈 The More You Know

An Ingress resource declares routing rules from an external URL to a Service inside your namespace. Creating the Ingress by itself does not route any traffic, it is just a declaration stored in etcd. A separate component called an Ingress Controller watches the cluster for Ingress resources and implements their routing rules. The cs351 cluster runs the NGINX Ingress Controller, deployed once by the instructor and shared across every student namespace.

DNS for ca4.cs351.cloud is handled with a single wildcard: a *.ca4.cs351.cloud A record that points at a shared AWS Network Load Balancer in front of the NGINX Ingress Controller. Every subdomain under ca4.cs351.cloud resolves to the same NLB. The controller then routes requests to your Service by inspecting the incoming Host: header and matching it against each Ingress object's rules[*].host. You do not register your subdomain anywhere. As long as your Ingress resource declares host: jhenry.ca4.cs351.cloud, requests to that hostname find your Service automatically.

TLS is handled the same way. cert-manager running on the cluster issued one wildcard certificate for *.ca4.cs351.cloud via a DNS-01 challenge, and the Ingress Controller terminates HTTPS with that cert for every student subdomain. Nothing to configure on your end.
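
The host-based routing described above boils down to a lookup keyed on the Host: header. The sketch below is a simplification, not how the NGINX Ingress Controller is implemented. The rule table and the `route` function are hypothetical, and only exact host matches are modeled.

```python
from typing import Optional

# Hypothetical Ingress rules from two namespaces: host -> (namespace, service, port).
INGRESS_RULES = {
    "jhenry.ca4.cs351.cloud": ("student-jhenry", "bird-ai-proxy", 80),
    "alice-wong.ca4.cs351.cloud": ("student-alice-wong", "bird-ai-proxy", 80),
}


def route(host_header: str) -> Optional[tuple]:
    """Pick a backend by exact match on the Host header, ignoring case and any port suffix."""
    host = host_header.split(":", 1)[0].lower()
    return INGRESS_RULES.get(host)


print(route("jhenry.ca4.cs351.cloud"))  # ('student-jhenry', 'bird-ai-proxy', 80)
print(route("nobody.ca4.cs351.cloud"))  # None
```

The second lookup mirrors a subdomain that resolves through the wildcard DNS record but has no matching Ingress rule, so there is no Service to send the request to.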

Edit k8s/base/ingress.yaml and replace SUBDOMAIN with your namespace minus the student- prefix. For example, if your namespace is student-jhenry, use jhenry.

k8s/base/ingress.yaml

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
 name: bird-ai-ingress
 annotations:
   nginx.ingress.kubernetes.io/proxy-body-size: "20m"
spec:
 ingressClassName: nginx
 rules:
 - host: SUBDOMAIN.ca4.cs351.cloud
   http:
     paths:
     - path: /
       pathType: Prefix
       backend:
         service:
           name: bird-ai-proxy
           port:
             number: 80

Uncomment # - ingress.yaml in k8s/base/kustomization.yaml so the Ingress resource is actually applied when you run the overlay.
k8s/base/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - app.yaml
  - proxy.yaml
  - postgres.yaml
  - minio.yaml
  - minio-bucket-job.yaml
  - configmap.yaml
  - secret.yaml
  # TODO uncomment after editing ingress.yaml
  # - ingress.yaml
🌈 The More You Know
Kustomize is a template-free Kubernetes config tool built into kubectl. A kustomization.yaml file lists a set of resource manifests under resources:, and kubectl apply -k applies them as a group. On top of that base you can stack overlays that patch specific fields (image tags, replica counts, environment variables) per environment, which is how the starter code keeps overlays/local/ (minikube) and overlays/production/ (EKS) distinct without duplicating every base manifest.
The ingress.yaml entry starts commented out so the overlay you applied in the database migration section works without requiring you to also fill in an Ingress resource. Now that your app is healthy and you have a subdomain, it is time to flip that entry on.

Edit k8s/base/configmap.yaml and set CSRF_TRUSTED_ORIGINS and DOMAIN to your new public URL. CSRF_TRUSTED_ORIGINS needs the full URL including the https:// scheme, while DOMAIN is just the hostname.

k8s/base/configmap.yaml

apiVersion: v1
kind: ConfigMap
metadata:
   name: bird-ai-config
data:
   # Example: https://jhenry.ca4.cs351.cloud
   CSRF_TRUSTED_ORIGINS: "http://localhost:8080,http://127.0.0.1:8080"
   # Example: jhenry.ca4.cs351.cloud
   DOMAIN: ""
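
The difference between the two values is only the scheme. As a small sketch, assuming the jhenry placeholder subdomain (the function name `public_config` is hypothetical):

```python
def public_config(subdomain: str, base_domain: str = "ca4.cs351.cloud") -> dict:
    """Return the two ConfigMap values derived from a student subdomain."""
    hostname = f"{subdomain}.{base_domain}"
    return {
        "CSRF_TRUSTED_ORIGINS": f"https://{hostname}",  # scheme plus hostname
        "DOMAIN": hostname,                             # hostname only
    }


print(public_config("jhenry"))
```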

Apply the updated overlay and restart the app so it picks up the new CSRF_TRUSTED_ORIGINS and DOMAIN values.
Terminal
$ kubectl apply -k k8s/overlays/production/
$ kubectl rollout restart deployment/bird-ai-app
🚨 Important
Do not skip the rollout restart. A plain kubectl apply -k updates the ConfigMap in the cluster, but the running bird-ai-app pods captured the old environment variables when they started, and environment variables are frozen for the lifetime of a process. If you visit your HTTPS URL before restarting, the login page will render fine, but submitting the form will return 403 Forbidden (CSRF verification failed) because Django's running process still thinks the only trusted origin is http://localhost:8080 from the previous section. This is the same "edit config, then tell the service to reload it" rule from the Food for thought prompt in the previous section, and it is the single most common footgun in the assignment.
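
You can reproduce the "frozen environment" behavior on your laptop without Kubernetes. In this sketch the parent process stands in for the cluster config and the child process stands in for a running pod. The variable name DEMO_ORIGIN is made up for the demonstration.

```python
import os
import subprocess
import sys

# The child waits for a line on stdin, then prints the variable it was started with.
CHILD_CODE = "import os, sys; sys.stdin.readline(); print(os.environ['DEMO_ORIGIN'])"

child = subprocess.Popen(
    [sys.executable, "-c", CHILD_CODE],
    env={**os.environ, "DEMO_ORIGIN": "http://localhost:8080"},
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    text=True,
)

# "Apply" a new value after the child has already started.
os.environ["DEMO_ORIGIN"] = "https://jhenry.ca4.cs351.cloud"

# The running child still sees the value from its own start-up.
output, _ = child.communicate("\n")
print(output.strip())  # http://localhost:8080
```

Only a newly started child would see the updated value, which is what the rollout restart gives you.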
Visit https://jhenry.ca4.cs351.cloud (substituting your own subdomain) in your browser. The app should load with a valid HTTPS certificate and show your data. Log in, browse your history, and upload a geotagged image. After a few seconds, check the map page. Your new upload should appear there, along with every other student's most recent geotagged detections. The map is backed by a shared gossip service that collects detection metadata from every student's namespace, so you are seeing the class's combined map of the world's birds and squirrels.
Once you have verified the Kubernetes app is fully working, destroy the original EC2 and RDS infrastructure.

🚨 Important
terraform destroy will permanently delete your RDS database. Before running this, verify that your Kubernetes app is fully working: log in, browse your history, upload a new image. If you need to start over, you will NOT be able to recover the original data, you will have to re-run the first section of this assignment from scratch.
Terminal
$ cd terraform
$ terraform destroy

🤖 Autograder +20 (70/100)

🍭 Food for thought

Look back at the three switchovers you just performed: moving postgres into the cluster, moving the app runtime into the cluster, and moving public access onto the shared Ingress. Which one was the most irreversible? If you had to do this migration on a live customer-facing production system, which switchover would you most want a rollback plan for, and why?

Extract ML Microservice

You have one monolith left to break apart. Right now the bird-ai-app pod handles HTTP, Django views, Postgres, MinIO uploads, AND runs the YOLO image-segmentation model in-process to detect birds and squirrels. The first four are cheap, stateless work. The fifth needs gigabytes of memory, benefits from specialized hardware, and has a completely different scaling profile from the rest of the app. In this section you will split the YOLO inference out of the Django monolith into its own dedicated ml-service microservice, running in its own pod, reachable from bird-ai-app by an in-cluster Service URL.

🌈 The More You Know

There are lots of reasons to extract a service from a monolith: team autonomy, independent deploy cycles, language or framework diversity, fault isolation. The reason that matters here is resource profile. Django's request and response handlers are lightweight: a few megabytes of Python per request, I/O bound, scaled by adding small replicas. YOLO inference is the opposite: gigabytes of model weights resident in memory per pod, CPU or GPU bound, scaled by adding larger but fewer replicas on instance types sized for ML work.

Packing both into the same pod forces a bad compromise. Every bird-ai-app pod carries the model weights in memory whether it is serving an upload or not, the pod's memory limit has to be sized for the worst case, and the instance type you pick has to fit ML work even though most of what the pod does is plain HTTP. Splitting them means you can scale the Django pods horizontally on cheap general-purpose nodes, and scale the ml-service pods vertically (or put them on a completely different node pool) without touching the rest of the app.

Review the provided files in ml-service/.
- app.py: a complete Flask /predict endpoint that loads YOLO, runs inference on an uploaded image, and returns JSON. You do not need to modify this.
- Dockerfile: provided. Builds the ML service image with the model weights baked in.
- requirements.txt: Python dependencies.
- yoloe-26l-seg.pt and mobileclip2_b.ts: the model weights.
That is the whole service: a handful of files and one prediction endpoint. That is the "micro" in microservice.
Log in to the instructor ECR registry, build the image for linux/amd64, and push it with your subdomain as the tag. Substitute your own subdomain for jhenry in the commands below.
Terminal
$ aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin 949430458732.dkr.ecr.us-east-1.amazonaws.com
$ docker build --platform linux/amd64 -t 949430458732.dkr.ecr.us-east-1.amazonaws.com/cs351-ml-service:jhenry ml-service/
$ docker push 949430458732.dkr.ecr.us-east-1.amazonaws.com/cs351-ml-service:jhenry
The build step downloads the PyTorch base image and copies the model weights into the image, so it can take a few minutes on a first run. While it runs, you can move on to the next step and start creating k8s/base/ml-service.yaml.

💡 Hint
The ECR registry you are pushing to lives in the instructor's AWS account (949430458732), not your student account. The cs351-ml-service repository has a cross-account repository policy that allows push and pull from any AWS principal, which is why your ca4 IAM user can push here without any extra setup in your own account. Everyone pushing to the same repo under different tags is how we keep the grader's ECR lookup simple: it just looks for an image tagged with your subdomain.

Create k8s/base/ml-service.yaml. The fastest way is to copy k8s/base/app.yaml as a starting template and edit the fields below. Both files declare a Deployment and a Service, so the overall shape is the same; only the values change.

Terminal

$ cp k8s/base/app.yaml k8s/base/ml-service.yaml

#	Field	Old value	New value
1	`metadata.name` (Deployment)	`bird-ai-app`	`ml-service`
2	`spec.selector.matchLabels.app`	`bird-ai-app`	`ml-service`
3	`spec.template.metadata.labels.app`	`bird-ai-app`	`ml-service`
4	Container `name`	`bird-ai-app`	`ml-service`
5	Container `image`	`cs351-bird-ai-app:latest`	`949430458732.dkr.ecr.us-east-1.amazonaws.com/cs351-ml-service:jhenry`
6	Container `ports[0].containerPort`	`8000`	`5000`
7	Readiness probe `path`	`/login/`	`/health`
8	Readiness probe `port`	`8000`	`5000`
9	`envFrom` block	configmap + secret refs	delete the whole block
10	`env` block (if present)	any values	delete the whole block
11	`metadata.name` (Service)	`bird-ai-app`	`ml-service`
12	Service `spec.selector.app`	`bird-ai-app`	`ml-service`
13	Service `port`	`8000`	`5000`
14	Service `targetPort`	`8000`	`5000`

Keep the strategy: { type: Recreate } and the resource requests and limits unchanged. Your namespace's ResourceQuota cannot fit two ml-service pods at once, so a rolling update would get stuck waiting for quota that will never free up. Recreate tells Kubernetes to delete the old pod first and only then start the new one.
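Most of the rows in the table have to agree with each other: the Deployment selector, the pod template labels, and the Service selector must match, and the Service targetPort must be a port the container exposes. The sketch below checks that wiring on plain Python dicts that mirror the manifest structure. `check_wiring` is a hypothetical helper, it does not parse YAML, and the dicts contain only the fields relevant to the check.

```python
def check_wiring(deployment, service):
    """Return a list of human-readable wiring problems (empty means consistent)."""
    problems = []
    spec = deployment["spec"]
    pod_labels = spec["template"]["metadata"]["labels"]

    # Rows 2 and 3: the Deployment selector must match the pod template labels.
    for key, value in spec["selector"]["matchLabels"].items():
        if pod_labels.get(key) != value:
            problems.append(f"Deployment selector {key}={value} matches no pod label")

    # Row 12: the Service selector must match the same pod labels.
    for key, value in service["spec"]["selector"].items():
        if pod_labels.get(key) != value:
            problems.append(f"Service selector {key}={value} matches no pod label")

    # Rows 6 and 14: the Service targetPort must be a declared containerPort.
    container_ports = {
        port["containerPort"]
        for container in spec["template"]["spec"]["containers"]
        for port in container.get("ports", [])
    }
    for port in service["spec"]["ports"]:
        if port["targetPort"] not in container_ports:
            problems.append(f"targetPort {port['targetPort']} is not a containerPort")
    return problems


deployment = {
    "spec": {
        "selector": {"matchLabels": {"app": "ml-service"}},
        "template": {
            "metadata": {"labels": {"app": "ml-service"}},
            "spec": {"containers": [{"name": "ml-service",
                                     "ports": [{"containerPort": 5000}]}]},
        },
    }
}
service = {
    "spec": {
        "selector": {"app": "ml-service"},
        "ports": [{"port": 5000, "targetPort": 5000}],
    }
}

print(check_wiring(deployment, service))  # []
```

A half-edited copy of app.yaml, for example one where the Service still selects bird-ai-app, would show up here as a non-empty list.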

🌈 The More You Know

A Kubernetes manifest is a YAML file that declares the desired state of one or more Kubernetes objects. The file you just copied contains two objects separated by a --- document marker: a Deployment (which manages a ReplicaSet that manages pods) and a Service (which gives those pods a stable DNS name inside the cluster). The Deployment is the thing that keeps your pod running and handles rollouts when you change the manifest. The Service is the thing that routes traffic to whichever pod the Deployment currently owns. You need both even for a single-replica service, because the pod's IP changes every time it restarts but the Service's name stays constant.

Add ml-service.yaml to the resources list in k8s/base/kustomization.yaml so the overlay picks it up.

Edit k8s/base/configmap.yaml and set ML_SERVICE_URL to the in-cluster address of your new Service.

k8s/base/configmap.yaml

   # Set ML_SERVICE_URL to the in-cluster address of your ML
   # microservice. Leave empty to run inference locally in the
   # Django pod (the monolith default).
   #
   # Example: http://ml-service:5000
   ML_SERVICE_URL: ""
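To make the empty-versus-set behavior concrete, here is a minimal sketch of how an application could branch on this variable. It assumes the Django side appends the /predict path to the configured base URL; `inference_target` is a hypothetical name, and the real bird.ai code may be organized differently.

```python
from typing import Optional
from urllib.parse import urljoin


def inference_target(env: dict) -> Optional[str]:
    """Return the remote predict URL, or None to run inference in-process."""
    base = env.get("ML_SERVICE_URL", "").strip()
    if not base:
        # Empty value: the monolith default, inference stays in the Django pod.
        return None
    # Normalize the trailing slash so urljoin appends instead of replacing.
    return urljoin(base.rstrip("/") + "/", "predict")


print(inference_target({"ML_SERVICE_URL": ""}))
print(inference_target({"ML_SERVICE_URL": "http://ml-service:5000"}))
```

The host name ml-service resolves inside the cluster because it is the name of the Service you declared in ml-service.yaml.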

Apply the overlay and restart the app so it picks up the new ML_SERVICE_URL environment variable. You know the drill by now.
Terminal
$ kubectl apply -k k8s/overlays/production/
$ kubectl rollout restart deployment/bird-ai-app
Verify the split works end-to-end. Visit https://jhenry.ca4.cs351.cloud (substituting your own subdomain), log in, and upload an image of a bird or squirrel. Then tail the logs of the ml-service pod to confirm the inference is actually running there and not in the Django process.
Terminal
$ kubectl logs deploy/ml-service --tail=20
 * Running on http://0.0.0.0:5000
127.0.0.1 - - [...] "GET /health HTTP/1.1" 200 -
0: 352x640 1 bird, 1639.1ms
127.0.0.1 - - [...] "POST /predict HTTP/1.1" 200 -
The 0: 352x640 1 bird, 1639.1ms line is YOLO's per-request inference log. If you see it, your bird.ai deployment is now genuinely microservice-shaped: HTTP and application logic in one pod, inference in another, talking over an in-cluster Service.
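If you would rather not eyeball the log, the inference line can be picked out with a regular expression. The sketch below is based only on the single sample line shown above, so the pattern is an assumption about the log format rather than a specification of it; `inference_lines` is a hypothetical helper.

```python
import re

# Matches lines shaped like "0: 352x640 1 bird, 1639.1ms" (format assumed
# from the sample output above).
INFERENCE_LINE = re.compile(
    r"^\d+: (?P<shape>\d+x\d+) (?P<detections>.+), (?P<ms>\d+(?:\.\d+)?)ms$"
)


def inference_lines(log_text):
    """Return (detections, milliseconds) for every inference line in the log."""
    results = []
    for line in log_text.splitlines():
        match = INFERENCE_LINE.match(line.strip())
        if match:
            results.append((match["detections"], float(match["ms"])))
    return results


sample = """\
 * Running on http://0.0.0.0:5000
127.0.0.1 - - [...] "GET /health HTTP/1.1" 200 -
0: 352x640 1 bird, 1639.1ms
127.0.0.1 - - [...] "POST /predict HTTP/1.1" 200 -
"""

print(inference_lines(sample))  # [('1 bird', 1639.1)]
```

An empty result after an upload suggests the request never reached ml-service, which usually points back at ML_SERVICE_URL or a skipped rollout restart.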

The new architecture

🤖 Autograder +30 (100/100)

🎉 Nice work!

You migrated bird.ai from an N-Tier monolith on EC2 and RDS to a microservice architecture on Kubernetes with a shared Ingress, a self-managed database, a self-managed object store, and a dedicated inference service. Every piece of the Strangler Fig pattern you read about in the first section is now something you have done by hand.

🔮 Vibe Check

How did Cloud Assignment 4 go? Let us know on Ed.

Estimating Cloud Cost

CA4's cost story has two halves. Your bird.ai workload after the database migration runs inside a shared EKS cluster managed by the course staff, and every pod, Service, Ingress, and load balancer in that cluster is free to you. What you do pay for is the EC2 instance and RDS database you stood up in the first section, and only while they are running. Because you built both with Terraform, the total cost of the assignment is really the sum of the hours you leave those two resources standing.

What you pay for

Rates for the resources you provision in your own account (us-east-1, on-demand):

Resource	Type	Approx Rate
EC2 instance	c7i-flex.large	~$0.0725/hr
RDS instance	db.t3.micro PostgreSQL	~$0.018/hr
EBS root volume	15 GB gp3	~$1.20/mo
RDS storage	20 GB gp2	~$2.30/mo
ECR push to `cs351-ml-service`	instructor's repository	Free

EC2 is billed per second and RDS is billed per hour while they exist. terraform destroy deletes them and billing stops. terraform apply brings them back and billing resumes. Storage is billed per GB-month and is deleted along with the instance.

If your AWS account is new, both instance types above are covered by the current AWS Free Tier credit pool, so your effective cost is zero until that pool is exhausted. The estimates below assume you are OFF the free tier and are therefore conservative upper bounds.

Minimizing your bill

Scenario	Hours billed	Cost
Complete the first four sections in one sitting (~6 hrs)	6	~$0.50
Spread across a week, destroy between each session	~10	~$0.80
Spread across a week, never destroy	~168	~$13
Forget about it for a month	~720	~$56

Rows two and three represent the same amount of actual work. The difference is idle time. Leaving the EC2 and RDS running overnight costs roughly fifteen times more than destroying them at the end of each session. terraform destroy takes under three minutes and drops your hourly bill back to zero until the next terraform apply.
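The arithmetic behind the scenarios is just hours multiplied by the combined hourly rate. The sketch below uses the two hourly rates from the "What you pay for" table and ignores storage; the function name `compute_cost` is hypothetical. Its totals land in the same range as the scenario table but somewhat higher, so treat both sets of numbers as ballpark figures.

```python
EC2_RATE = 0.0725  # $/hr, c7i-flex.large, from the rates table
RDS_RATE = 0.018   # $/hr, db.t3.micro PostgreSQL, from the rates table

SCENARIOS = [
    ("One sitting", 6),
    ("A week, destroy between sessions", 10),
    ("A week, never destroy", 168),
    ("Forgotten for a month", 720),
]


def compute_cost(hours):
    """Instance cost in dollars for the given number of billed hours."""
    return hours * (EC2_RATE + RDS_RATE)


for label, hours in SCENARIOS:
    print(f"{label:35s} {hours:4d} h  ${compute_cost(hours):6.2f}")

# Same work, different idle time: ratio of the two one-week scenarios.
print(f"idle-time multiplier: {compute_cost(168) / compute_cost(10):.1f}x")
```

The multiplier depends only on the ratio of billed hours, not on the rates, which is why destroying between sessions matters more than which instance type you picked.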

Once you have completed Public DNS and Teardown, your ongoing cost for CA4 is $0. Everything after that runs in the shared cluster.

Why the shared cluster is nearly free

The cs351 EKS cluster has real running costs: the EKS control plane, two to four worker nodes, the NLB fronting the cluster, a wildcard TLS certificate, the hosted zone for ca4.cs351.cloud, and the ECR repositories that hold the bird.ai and ml-service images. Those are paid by the instructor and not passed on to you. A rough comparison against running the same workload yourself:

Approach	Monthly cost
Your own EKS cluster + 1 `t3.medium` node + NLB	~$115
k3s on a single `t3.large` EC2 + NLB	~$76
ECS Fargate + ALB	~$60
Your slice of the shared `cs351` cluster	~$2–3

The shared cluster is roughly twenty to fifty times cheaper than the self-hosted options in the table. The fixed costs (control plane, NLB, certificate, hosted zone) are amortized across every student using the cluster, so each of you pays a small fraction of what you would pay alone. The same amortization is how managed Kubernetes services (EKS Auto Mode, GKE Autopilot) and application platforms (Fly.io, Railway, Render) charge per-tenant prices an order of magnitude below the self-hosted equivalent.
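The size of the saving can be read straight off the comparison table by dividing each self-hosted figure by the shared-cluster range. A small sketch, using only the approximate monthly costs listed above (`savings_range` is a hypothetical helper):

```python
# Approximate monthly costs from the comparison table, in dollars.
SELF_HOSTED = {
    "Own EKS cluster + t3.medium + NLB": 115,
    "k3s on t3.large + NLB": 76,
    "ECS Fargate + ALB": 60,
}
SHARED_SLICE = (2, 3)  # low and high estimate for one student's slice


def savings_range(monthly_cost):
    """Return (min, max) how many times cheaper the shared slice is."""
    low, high = SHARED_SLICE
    return monthly_cost / high, monthly_cost / low


for name, cost in SELF_HOSTED.items():
    smallest, largest = savings_range(cost)
    print(f"{name:35s} {smallest:4.0f}x to {largest:4.0f}x cheaper")
```

Because every input is an approximation, the resulting multipliers are order-of-magnitude guides rather than precise figures.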

You pay back those savings by accepting less control over Kubernetes version, node instance types, upgrade timing, and which add-ons are installed. That is the core tradeoff of every platform-as-a-service: you give up some control in exchange for not having to operate (or pay for) the platform itself.

Teardown

🚨 Important

Once you have a full score on Gradescope, tear down everything you provisioned in your own AWS account. Your grade is already recorded, and leaving resources running is the only way CA4 can cost you real money. Double-check both the EC2 console and the RDS console to confirm they are empty.

Destroy the N-Tier infrastructure if you did not already do it in Public DNS and Teardown. This deletes the EC2 instance, the RDS database, the security groups, the key pair, and the storage volumes Terraform created with them.
Terminal
$ export AWS_PROFILE=ca4
$ cd terraform
$ terraform destroy
Clean up your Kubernetes namespace. Delete every resource the production overlay applied so your namespace is empty and no pods are running.
Terminal
$ export KUBECONFIG=~/.kube/eks-student.yaml
$ kubectl delete -k k8s/overlays/production/
The namespace itself is managed by the course staff and will be removed after the semester ends, so you do not need to delete it yourself.
Verify nothing is running in your own account. Both of these commands should return empty output.
Terminal
$ aws ec2 describe-instances --filters "Name=instance-state-name,Values=running" --query 'Reservations[].Instances[].InstanceId' --output text
$ aws rds describe-db-instances --query 'DBInstances[].DBInstanceIdentifier' --output text
If either command prints an identifier, find what is left and destroy it, either with terraform destroy (if Terraform owns it) or by hand in the AWS console.
Optional: remove the ca4 IAM user. If you do not plan to reuse this access key for a future assignment, delete the user and its access key in the IAM console. This is good security hygiene: a leaked access key in the future cannot be used against your account if the user no longer exists.

Your CS351 course budget alert from the first section will email you if anything slips through. If you ever see a budget alert after this teardown, come back to the verification commands above and figure out what is still running.
