Purdue University CS351 Spring 2026
Due Friday April 24th 11:00PM
During this assignment you will migrate a monolithic Software-as-a-Service application to a microservices architecture running on Kubernetes. You'll deploy the existing N-Tier bird.ai stack one last time on EC2 and RDS, then incrementally pull each piece into a managed Kubernetes cluster (Amazon EKS) using the Strangler Fig migration pattern. You'll write your own Kubernetes manifests, configure an ingress controller for public HTTPS access, extract the ML inference path into a dedicated microservice, and tear down the legacy infrastructure once the new architecture is verified. Along the way you'll get hands-on practice with kubectl, Kustomize overlays, ConfigMaps and Secrets, persistent storage, and the operational rituals that make zero-downtime migrations possible.
Some hyperlinks in this document may not work unless you right-click to open them in a new tab.
This assignment is distributed as an HTML file on Brightspace, which displays it you in an iframe. An iframe's capabilities are limited for security reasons, and they also break the same-origin policy model of web security used by many websites. Certain websites restrict cross-origin resource sharing, causing our hyperlinks to them to fail when they load in an iframe.
We've added an additional AWS IAM step, pay careful attention to the section Create IAM User.
Use an account eligible for the AWS Free Tier. Make a post on Ed if you don't have an eligible account. This assignment will use AWS region "us-east-1", do not use AWS services in any other region. You can control your current region from the menu bar of the AWS Console.
We're going to set up an AWS Budget that will alert you if you get any monetary charges.
We're going to change the permission structure that grants instructors read-only access to your account so our autograding scripts can check your progress on a Cloud Assignment.
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": "sts:AssumeRole",
"Principal": {
"AWS": "arn:aws:iam::949430458732:user/autograder"
},
"Condition": {
"StringEquals": {
"sts:ExternalId": "<TODO_REPLACE_WITH_YOUR_OWN_EXTERNAL_ID>"
}
}
}
]
}
Finish by creating a new credentials file in an INI file format.
[default]
aws_account_id = YOUR_AWS_ACCOUNT_ID
external_id = YOUR_EXTERNAL_ID_FROM_THE_TRUST_POLICYSafekeep the credentials file, you will use it for assignment submissions during Cloud Assignment 4.
We're going to create the IAM User "ca4" you'll be using for programmatic access to and control of AWS services.
Navigate to IAM (Identity and Access Management) in your AWS account.
Click on "Users" in the left-hand navigation menu.
PowerUserAccess grants full access to the AWS services we'll touch (EC2, RDS, ECR, STS) but blocks IAM management. This is for security: if your ca4 access key leaks, an attacker cannot create new IAM identities or escalate privileges in your account.
Click the new "ca4" user on the Users list page.
Open a terminal and run aws configure --profile ca4
Every shell session during this assignment needs the AWS_PROFILE environment variable set.
$ export AWS_PROFILE="ca4"
$ aws sts get-caller-identity
{
"UserId": "AIDA...",
"Account": "<YOUR_ACCOUNT_ID>",
"Arn": "arn:aws:iam::<YOUR_ACCOUNT_ID>:user/ca4"
}Verify that the "Account" value matches the AWS Account ID you'll be using for this assignment, and that the "ARN" value ends in :user/ca4, the user you'll complete the assignment using.
Optional: add export AWS_PROFILE=ca4 to your shell's rc file (e.g. ~/.bashrc if using bash) while you're working on this assignment.
Three letters have been on everyone's mind: IPO. No, it's not that beer all the millennials drink, that's IPA. IPO is an "Initial Public Offering", and the office is buzzing about it. The company has been a lot more relaxed since it overcame its scaling challenges, and mentions of PMF have given way to meetings about KPIs and SLAs.
Headcount is growing. "Two's a company, three's a crowd" you think to yourself as the hum of the office drowns out the traffic sitting outside your new company headquarters. Business has been doing well, and you've doubled the size and number of teams in the company to meet the demands of your new customers.
With scale comes new challenges, however. Your competitive advantage is your innovative technology, speed to market, and customer relationships, but these large new teams have been stepping on each other's toes, causing conflict and slowing down delivery of new features.
You've been stuck in long meetings with the board of executives discussing the IPO, and planning where you'll invest the new capital. Your CTO is looking forward to the research projects you'll be able to fund, the CFO has already made a list of potential acquisition targets, and as CEO you're worried about keeping ahead of your competitors.
Looking to the future, you recognize your software architecture can't keep pace with the changes you anticipate. For the first time in your life, you decide to be proactive and take care of the problem before the last minute.
Complete the five sections that follow to migrate bird.ai from its existing N-Tier architecture onto Kubernetes. Each section builds on the previous one, but you can submit your credentials to Gradescope at any point along the way for partial credit. The autograder gives independent feedback on every section. Some steps tell you exactly what to do, and others give you hints or ask you to apply something you learned in a previous Cloud Assignment. When you see a
This document uses jhenry as a placeholder for your own subdomain. The name is a nod to John Henry, the "steel-driving man" of American folklore who raced a steam-powered drill through a mountain and won, then died with his hammer in his hand. The story feels freshly relevant in 2026: you are about to do the kind of deep infrastructure work a generation of engineers learned by hand, at exactly the moment in software history when AI tools are automating that same work faster than any of us can keep up with. But don't worry, you won't die completing this assignment 😄.
This assignment uses quite a few tools to execute the migration from an N-Tier Architecture to Kubernetes. Install each of these on your personal computer before the first section. All of them are cross-platform, pick the install method that matches your operating system.
Follow the official installation instructions. On macOS, brew install terraform works. On WSL Ubuntu, HashiCorp ships an apt repository, the instructions walk you through it.
Follow the official installation instructions. pipx install ansible is the recommended path on any system with Python already installed.
Install Docker Desktop on macOS or Windows. On native Linux or WSL, install the Docker engine directly per your distribution's guide.
Follow the official kubectl install guide. On macOS, brew install kubectl. On WSL Ubuntu, use the Kubernetes apt repository.
Install AWS CLI v2. Version 1 (what your package manager may have as awscli) is end-of-life and won't work for the commands in this assignment.
You need pg_dump and psql version 17 to match the RDS Postgres engine this assignment provisions. pg_dump refuses to dump a server newer than the client, so an older postgresql-client package will fail during the database migration.
brew install libpq && brew link --force libpqsudo apt install postgresql-client-17. If your apt only ships older versions, add the official PostgreSQL apt repo first.You'll generate a throwaway SSH key pair dedicated to this assignment and store it inside the starter directory (not ~/.ssh).
From inside the starter root (the directory containing the assignment README.md):
$ ssh-keygen -t ed25519 -f ./id_ed25519 -C ca4When prompted for a passphrase, you may leave it empty. This is appropriate for ephemeral course infrastructure, for production keys you'd normally set a passphrase and use ssh-agent.
Architecture migrations are tricky endeavors for an engineering team, especially one whose software serves a significant number of users. It's like driving a car and trying to change the tires, windshields, and eventually the engine, while trying not to slow down or have any passengers fly out the window.
A practical approach to these kinds of migrations has been developed, called the Strangler Fig pattern. The goal is to pick apart an existing service piece-by-piece, examining the data flow, isolating core services, and understanding what can be run concurrently. The key moments are the switchovers, when a part of the old architecture is swapped out with a service in the new architecture. At each switchover you verify the service is continuing to run as normal by monitoring error rates, performance metrics, and validating responses from the new service component.
We're going to get the existing N-Tier Architecture up and running so we have a target to migrate from. Get ready to practice using Terraform and Ansible again.
From the starter root, run Terraform to provision an EC2 instance and an RDS Postgres database.
$ cd terraform
$ terraform init
$ terraform apply -var "public_key=$(cat ../id_ed25519.pub)"
$ cd ..This creates a free-tier-eligible EC2 instance and a db.t3.micro RDS postgres database. The RDS instance takes a few minutes to come up, so Terraform's apply step can run for roughly five minutes.
Run the Ansible playbook. It installs Docker on the EC2, pulls the pre-built bird-ai-app image from the class ECR, and runs it with a DATABASE_URL pointing at your new RDS instance.
$ cd ansible
$ ansible-playbook playbook.yml
...
TASK [Show app URL] *********************************************************************
ok: [birdai-app] =>
msg: bird.ai is running at http://18.234.100.45
PLAY RECAP ******************************************************************************
birdai-app: ok=14 changed=8 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
localhost: ok=3 changed=1 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
$ cd ..The final "Show app URL" task prints the public IP of your new bird.ai instance.
Open the URL in your browser. Create an account, upload a photo, and verify that detection and history work. Upload at least one geotagged image so you have real data to watch survive the migration in the next section.
Kubernetes groups related resources into namespaces, which act as lightweight isolation boundaries inside a cluster. Every student in CS351 has their own namespace on the shared cs351 EKS cluster, and a role binding that grants full CRUD on pods, deployments, services, configmaps, secrets, PVCs, and ingresses inside that one namespace. You cannot see or touch any other student's namespace, and you cannot create cluster-scoped resources like ClusterRoles or Nodes.
Your namespace is derived from your Purdue email. Take the local part (everything before @), lowercase it, replace any dots with hyphens, then prefix with student-:
| Namespace | |
|---|---|
jhenry@purdue.edu | student-jhenry |
alice.wong@purdue.edu | student-alice-wong |
The provisioning happens automatically once you're enrolled. If kubectl get pods below fails with "Unauthorized", post on Ed with your AWS account ID and an instructor will confirm.
From the starter root, substitute your namespace into the kubeconfig template and save the result to ~/.kube/eks-student.yaml:
$ mkdir -p ~/.kube
$ sed 's/NAMESPACE/student-YOURNAME/g' kubeconfig-template.yaml > ~/.kube/eks-student.yaml
$ export KUBECONFIG=~/.kube/eks-student.yaml
$ kubectl get pods
No resources found in student-YOURNAME namespace.A kubeconfig file tells kubectl three things: what cluster to talk to, what credentials to present, and what namespace to scope commands to. The template you just filled in points at the instructor's EKS cluster, uses an exec credential plugin to call aws eks get-token (so your AWS ca4 profile becomes your Kubernetes identity), and scopes every command to your namespace. The sed command substitutes your namespace into two places in the template: the namespace: field on the context, and the IAM role ARN in the exec block.
The No resources found message is the success state. The cluster is reachable, your namespace exists, and you have permission to list pods in it. There are zero pods so far because you haven't deployed anything to Kubernetes yet, that's the next section.
You just reproduced everything you built by hand in CA3 with two commands. Look inside terraform/ and ansible/: what exactly did those files abstract away, and what would change if you had to repeat this deploy for a thousand customers instead of one?
The first switchover in the Strangler Fig pattern is the database. Right now bird.ai's Django container on EC2 reads and writes against RDS, and we want to break that dependency before moving any application code. If we left RDS in place and just redeployed Django on Kubernetes, we'd still be paying for and operating a managed database instance sitting outside the cluster, and every new pod would need network access back into the RDS private subnet. Instead, we'll move the data into a postgres pod running in our namespace. After this section, Kubernetes is the only system we need to operate for the rest of the migration.
The provided Ansible playbook handles the whole orchestration for you. It runs pg_dump on the EC2 instance (the only host that can reach RDS, which lives in a private subnet), fetches the dump file back to your laptop, and pipes it into the postgres pod via kubectl exec -i. You never have to SSH anywhere by hand. Before running the playbook, you'll generate secrets and bring up the Kubernetes workloads so there's a postgres pod waiting to receive the data.
Generate two strong secrets. openssl rand -hex 25 produces URL-safe hex output, so neither secret will corrupt the DATABASE_URL you'll wire up in the next section.
$ openssl rand -hex 25
16e0bdfc0c000ea178d8dcecc765349b87bc974d228f0f1a08
$ openssl rand -hex 25
2b6cca7c44543dc98532feb257d2bd88afe4cfe3e9bd0122dbEdit the two TODO fields in k8s/base/secret.yaml, one for POSTGRES_PASSWORD, one for DJANGO_SECRET_KEY.
apiVersion: v1
kind: Secret
metadata:
name: bird-ai-secrets
type: Opaque
stringData:
POSTGRES_PASSWORD: "" # TODO
DJANGO_SECRET_KEY: "" # TODO
S3_ACCESS_KEY: minioadmin
S3_SECRET_KEY: minioadminDeploy the whole production overlay into your namespace. This brings up postgres, minio, the bird.ai app, and the bird.ai proxy all at once.
$ kubectl apply -k k8s/overlays/production/
configmap/bird-ai-config created
secret/bird-ai-secrets created
service/bird-ai-app created
service/bird-ai-proxy created
service/minio created
service/postgresql created
persistentvolumeclaim/minio-data created
persistentvolumeclaim/postgresql-data created
deployment.apps/bird-ai-app created
deployment.apps/bird-ai-proxy created
deployment.apps/minio created
deployment.apps/postgresql created
job.batch/minio-create-bucket createdThat one kubectl apply spun up four distinct services in your namespace. Each is a separate Deployment with its own ClusterIP Service, so pods can reach it by name.
bird-ai-media bucket on first boot./ traffic to bird-ai-app:8000, and serves /media/ paths directly out of minio so Django never has to proxy image downloads.(click to open)
Expect a few things to look broken at first, which is fine for this section:
DATABASE_URL set yet (you'll add it in the next section), so Django fails to connect and the container exits. Kubernetes restarts it a few times, gives up, and marks the pod CrashLoopBackOff.You can see all four pod states with kubectl get pods:
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
bird-ai-app-7f9bdd4d7b-k2vxp 0/1 CrashLoopBackOff 3 45s
bird-ai-proxy-74f5f5d6d9-q8hnl 0/1 Running 0 45s
minio-6f4c5b9f98-xw8zt 1/1 Running 0 45s
postgresql-7c9d8f6b84-m2nft 1/1 Running 0 45sWe only need postgres to be Ready for the migration. Wait for it to finish rolling out before continuing:
$ kubectl rollout status deployment/postgresql
deployment "postgresql" successfully rolled outRun the migration playbook. It reads your Terraform outputs to find the RDS instance, SSHes into the EC2 to install the postgresql17 client and run pg_dump, fetches the dump file back to your laptop, and pipes it into the postgres pod via kubectl exec -i.
$ cd ansible
$ ansible-playbook migrate-db.yml
...
TASK [Show migration row count] ********
ok: [localhost] =>
msg: django_migrations has 20 rows
TASK [Show detection row count] ********
ok: [localhost] =>
msg: app_detection has 3 rows (your Phase 0 uploads)If both row counts are non-zero, your migration succeeded. The playbook is idempotent, so you can re-run it safely if you want to reset the Kubernetes postgres to the latest RDS state.
If the playbook fails with "Permission denied (publickey)", it is looking for your SSH key at ../id_ed25519 relative to the ansible/ directory, which is the project-local key you generated in Prerequisites. If your key is somewhere else, override it explicitly:
$ ansible-playbook migrate-db.yml --extra-vars "ssh_key=/absolute/path/to/id_ed25519"Examine ansible/migrate-db.yml. List every step it takes and group them into three phases: (1) things it runs on your laptop, (2) things it runs on the EC2 instance, and (3) things it runs inside the Kubernetes postgres pod. What is the purpose of each phase, and why can't it all run on your laptop?
The data is migrated but bird.ai is still stuck in CrashLoopBackOff. The postgres pod is up, the secrets are in place, the Deployment is rolled out, and yet Django refuses to start. This is the standard shape of a debugging session on Kubernetes: some pods are happy, one pod is not, and the answer is in the logs.
Start with the logs of the most recent crash. --previous asks for the logs of the previously terminated container instead of the current (possibly restarting) one.
$ kubectl logs deploy/bird-ai-app --previous --tail=5
return self._cursor()
^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/django/db/backends/dummy/base.py", line 20, in complain
raise ImproperlyConfigured(
django.core.exceptions.ImproperlyConfigured: settings.DATABASES is improperly configured. Please supply the ENGINE value. Check settings documentation for more details.If kubectl logs returns nothing, your pod has not actually started yet. Try kubectl describe pod -l app=bird-ai-app instead. Describe shows Kubernetes' view of the pod: scheduling events, image pull status, probe results, and the exit code of the last container. Logs tell you what the container said, describe tells you what Kubernetes thinks happened to the pod.
Django's settings.DATABASES is unconfigured. In bird.ai, settings.py builds the database connection from the DATABASE_URL environment variable, and that variable is empty. You'll set it in the ConfigMap and roll the Deployment to pick up the new value.
A ConfigMap is a Kubernetes object that holds non-confidential configuration as key/value pairs. Pods consume ConfigMaps in one of three ways: as environment variables (via envFrom), as command-line arguments, or as files mounted into a volume. The bird-ai-app Deployment uses envFrom, so every key in the bird-ai-config ConfigMap is injected as an env var when the pod starts. Anything Django reads from os.environ, including DATABASE_URL, comes from there.
A Secret has the same shape, but is intended for sensitive values like passwords, keys, and tokens. That is why POSTGRES_PASSWORD and DJANGO_SECRET_KEY live in secret.yaml instead of configmap.yaml, even though Django reads them from os.environ the same way. URLs go in ConfigMaps, passwords go in Secrets.
Edit k8s/base/configmap.yaml and fill in DATABASE_URL.
A Postgres connection URL has this shape:
postgres://USER:PASSWORD@HOST:PORT/DATABASE
For your in-cluster postgres:
bird_ai, set by POSTGRES_USER in k8s/base/postgres.yaml.POSTGRES_PASSWORD value you pasted into k8s/base/secret.yaml in the previous section.postgresql. Every Kubernetes Service in your namespace is reachable by the service's name as a DNS hostname, so postgresql resolves to whichever pod the Service currently routes to.5432, the default Postgres port exposed by the postgres Service.bird_ai, set by POSTGRES_DB in k8s/base/postgres.yaml.Apply the updated ConfigMap and restart the app Deployment so it picks up the new environment.
$ kubectl apply -k k8s/overlays/production/
$ kubectl rollout restart deployment/bird-ai-appA plain kubectl apply updates the ConfigMap in the cluster, but it does not restart the pods that mount it. You have to trigger the rollout yourself. Why do you think that is? What other software have we used in this course that has a similar "edit the config, then tell the service to reload it" workflow?
Wait for the rollout to finish, then confirm all four pods are healthy.
$ kubectl rollout status deployment/bird-ai-app
Waiting for deployment "bird-ai-app" rollout to finish: 0 of 1 updated replicas are available...
deployment "bird-ai-app" successfully rolled out
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
bird-ai-app-95dcbf7c7-tx6vb 1/1 Running 0 36s
bird-ai-proxy-99bc55596-h2s8v 1/1 Running 0 73m
minio-6589b7c8f4-dvcxn 1/1 Running 0 73m
postgresql-b854bbf7d-2xnvg 1/1 Running 0 73mTry your Kubernetes-powered bird.ai locally by port-forwarding the proxy Service to your laptop.
$ kubectl port-forward svc/bird-ai-proxy 8080:80Visit http://localhost:8080 in your browser. Log in with the user you created back in the first section when bird.ai was still running on EC2 and RDS. Your history is intact. The Strangler Fig just ate the database layer, and you did not lose anything.
Your bird.ai deployment is working end-to-end on Kubernetes, but it is only reachable through kubectl port-forward on your laptop. Nobody else can load http://localhost:8080 in their browser. In this section you will expose the app at a real public URL (for example, https://jhenry.ca4.cs351.cloud for a student whose namespace is student-jhenry) through the NGINX Ingress controller running on the cluster, verify the whole stack works against the new hostname, and then tear down the EC2 and RDS infrastructure you provisioned in the first section. This is the last switchover of the Strangler Fig pattern. After it, the N-Tier architecture is gone.
An Ingress resource declares routing rules from an external URL to a Service inside your namespace. Creating the Ingress by itself does not route any traffic, it is just a declaration stored in etcd. A separate component called an Ingress Controller watches the cluster for Ingress resources and implements their routing rules. The cs351 cluster runs the NGINX Ingress Controller, deployed once by the instructor and shared across every student namespace.
DNS for ca4.cs351.cloud is handled with a single wildcard: a *.ca4.cs351.cloud A record that points at a shared AWS Network Load Balancer in front of the NGINX Ingress Controller. Every subdomain under ca4.cs351.cloud resolves to the same NLB. The controller then routes requests to your Service by inspecting the incoming Host: header and matching it against each Ingress object's rules[*].host. You do not register your subdomain anywhere. As long as your Ingress resource declares host: jhenry.ca4.cs351.cloud, requests to that hostname find your Service automatically.
TLS is handled the same way. cert-manager running on the cluster issued one wildcard certificate for *.ca4.cs351.cloud via a DNS-01 challenge, and the Ingress Controller terminates HTTPS with that cert for every student subdomain. Nothing to configure on your end.
Edit k8s/base/ingress.yaml and replace SUBDOMAIN with your namespace minus the student- prefix. For example, if your namespace is student-jhenry, use jhenry.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: bird-ai-ingress
annotations:
nginx.ingress.kubernetes.io/proxy-body-size: "20m"
spec:
ingressClassName: nginx
rules:
- host: SUBDOMAIN.ca4.cs351.cloud
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: bird-ai-proxy
port:
number: 80Uncomment # - ingress.yaml in k8s/base/kustomization.yaml so the Ingress resource is actually applied when you run the overlay.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- app.yaml
- proxy.yaml
- postgres.yaml
- minio.yaml
- minio-bucket-job.yaml
- configmap.yaml
- secret.yaml
# TODO uncomment after editing ingress.yaml
# - ingress.yamlKustomize is a template-free Kubernetes config tool built into kubectl. A kustomization.yaml file lists a set of resource manifests under resources:, and kubectl apply -k applies them as a group. On top of that base you can stack overlays that patch specific fields (image tags, replica counts, environment variables) per environment, which is how the starter code keeps overlays/local/ (minikube) and overlays/production/ (EKS) distinct without duplicating every base manifest.
The ingress.yaml entry starts commented out so the overlay you applied in the database migration section works without requiring you to also fill in an Ingress resource. Now that your app is healthy and you have a subdomain, it is time to flip that entry on.
Edit k8s/base/configmap.yaml and set CSRF_TRUSTED_ORIGINS and DOMAIN to your new public URL. CSRF_TRUSTED_ORIGINS needs the full URL including the https:// scheme, while DOMAIN is just the hostname.
apiVersion: v1
kind: ConfigMap
metadata:
name: bird-ai-config
data:
# Example: https://jhenry.ca4.cs351.cloud
CSRF_TRUSTED_ORIGINS: "http://localhost:8080,http://127.0.0.1:8080"
# Example: jhenry.ca4.cs351.cloud
DOMAIN: ""Apply the updated overlay and restart the app so it picks up the new CSRF_TRUSTED_ORIGINS and DOMAIN values.
$ kubectl apply -k k8s/overlays/production/
$ kubectl rollout restart deployment/bird-ai-appDo not skip the rollout restart. A plain kubectl apply -k updates the ConfigMap in the cluster, but the running bird-ai-app pods captured the old environment variables when they started, and environment variables are frozen for the lifetime of a process. If you visit your HTTPS URL before restarting, the login page will render fine, but submitting the form will return 403 Forbidden (CSRF verification failed) because Django's running process still thinks the only trusted origin is http://localhost:8080 from the previous section. This is the same "edit config, then tell the service to reload it" rule from the Reflect prompt in the previous section, and it is the single most common footgun in the assignment.
Visit https://jhenry.ca4.cs351.cloud (substituting your own subdomain) in your browser. The app should load with a valid HTTPS certificate and show your data. Log in, browse your history, and upload a geotagged image. After a few seconds, check the map page. Your new upload should appear there, along with every other student's most recent geotagged detections. The map is backed by a shared gossip service that collects detection metadata from every student's namespace, so you are seeing the class's combined map of the world's birds and squirrels.
Once you have verified the Kubernetes app is fully working, destroy the original EC2 and RDS infrastructure.
terraform destroy will permanently delete your RDS database. Before running this, verify that your Kubernetes app is fully working: log in, browse your history, upload a new image. If you need to start over, you will NOT be able to recover the original data, you will have to re-run the first section of this assignment from scratch.
$ cd terraform
$ terraform destroyLook back at the three switchovers you just performed: moving postgres into the cluster, moving the app runtime into the cluster, and moving public access onto the shared Ingress. Which one was the most irreversible? If you had to do this migration on a live customer-facing production system, which switchover would you most want a rollback plan for, and why?
You have one monolith left to break apart. Right now the bird-ai-app pod handles HTTP, Django views, Postgres, MinIO uploads, AND runs the YOLO image-segmentation model in-process to detect birds and squirrels. The first four are cheap, stateless work. The fifth needs gigabytes of memory, benefits from specialized hardware, and has a completely different scaling profile from the rest of the app. In this section you will split the YOLO inference out of the Django monolith into its own dedicated ml-service microservice, running in its own pod, reachable from bird-ai-app by an in-cluster Service URL.
There are lots of reasons to extract a service from a monolith: team autonomy, independent deploy cycles, language or framework diversity, fault isolation. The reason that matters here is resource profile. Django's request and response handlers are lightweight: a few megabytes of Python per request, I/O bound, scaled by adding small replicas. YOLO inference is the opposite: gigabytes of model weights resident in memory per pod, CPU or GPU bound, scaled by adding larger but fewer replicas on instance types sized for ML work.
Packing both into the same pod forces a bad compromise. Every bird-ai-app pod carries the model weights in memory whether it is serving an upload or not, the pod's memory limit has to be sized for the worst case, and the instance type you pick has to fit ML work even though most of what the pod does is plain HTTP. Splitting them means you can scale the Django pods horizontally on cheap general-purpose nodes, and scale the ml-service pods vertically (or put them on a completely different node pool) without touching the rest of the app.
Review the provided files in ml-service/.
app.py: a complete Flask /predict endpoint that loads YOLO, runs inference on an uploaded image, and returns JSON. You do not need to modify this.Dockerfile: provided. Builds the ML service image with the model weights baked in.requirements.txt: Python dependencies.yoloe-26l-seg.pt and mobileclip2_b.ts: the model weights.That is the whole service. Four files, one endpoint. That is the "micro" in microservice.
Log in to the instructor ECR registry, build the image for linux/amd64, and push it with your subdomain as the tag. Substitute your own subdomain for jhenry in the commands below.
$ aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin 949430458732.dkr.ecr.us-east-1.amazonaws.com
$ docker build --platform linux/amd64 -t 949430458732.dkr.ecr.us-east-1.amazonaws.com/cs351-ml-service:jhenry ml-service/
$ docker push 949430458732.dkr.ecr.us-east-1.amazonaws.com/cs351-ml-service:jhenryThe build step downloads the PyTorch base image and copies the model weights into the image, so it can take a few minutes on a first run. While it runs, you can move on to step 3.
The ECR registry you are pushing to lives in the instructor's AWS account (949430458732), not your student account. The cs351-ml-service repository has a cross-account repository policy that allows push and pull from any AWS principal, which is why your ca4 IAM user can push here without any extra setup in your own account. Everyone pushing to the same repo under different tags is how we keep the grader's ECR lookup simple: it just looks for an image tagged with your subdomain.
Create k8s/base/ml-service.yaml. The fastest way is to copy k8s/base/app.yaml as a starting template and edit the fields below. Both files declare a Deployment and a Service, so the overall shape is the same, only the values change.
$ cp k8s/base/app.yaml k8s/base/ml-service.yaml| # | Field | Old value | New value |
|---|---|---|---|
| 1 | metadata.name (Deployment) | bird-ai-app | ml-service |
| 2 | spec.selector.matchLabels.app | bird-ai-app | ml-service |
| 3 | spec.template.metadata.labels.app | bird-ai-app | ml-service |
| 4 | Container name | bird-ai-app | ml-service |
| 5 | Container image | cs351-bird-ai-app:latest | 949430458732.dkr.ecr.us-east-1.amazonaws.com/cs351-ml-service:jhenry |
| 6 | Container ports[0].containerPort | 8000 | 5000 |
| 7 | Readiness probe path | /login/ | /health |
| 8 | Readiness probe port | 8000 | 5000 |
| 9 | envFrom block | configmap + secret refs | delete the whole block |
| 10 | env block (if present) | any values | delete the whole block |
| 11 | metadata.name (Service) | bird-ai-app | ml-service |
| 12 | Service spec.selector.app | bird-ai-app | ml-service |
| 13 | Service port | 8000 | 5000 |
| 14 | Service targetPort | 8000 | 5000 |
Keep the strategy: { type: Recreate } and the resource requests and limits unchanged. Your namespace's ResourceQuota cannot fit two ml-service pods at once, so a rolling update would get stuck waiting for quota that will never free up. Recreate tells Kubernetes to delete the old pod first and only then start the new one.
A Kubernetes manifest is a YAML file that declares the desired state of one or more Kubernetes objects. The file you just copied contains two objects separated by a --- document marker: a Deployment (which manages a ReplicaSet that manages pods) and a Service (which gives those pods a stable DNS name inside the cluster). The Deployment is the thing that keeps your pod running and handles rollouts when you change the manifest. The Service is the thing that routes traffic to whichever pod the Deployment currently owns. You need both even for a single-replica service, because the pod's IP changes every time it restarts but the Service's name stays constant.
Add ml-service.yaml to the resources list in k8s/base/kustomization.yaml so the overlay picks it up.
Edit k8s/base/configmap.yaml and set ML_SERVICE_URL to the in-cluster address of your new Service.
# Set ML_SERVICE_URL to the in-cluster address of your ML
# microservice. Leave empty to run inference locally in the
# Django pod (the monolith default).
#
# Example: http://ml-service:5000
ML_SERVICE_URL: ""Apply the overlay and restart the app so it picks up the new ML_SERVICE_URL environment variable. You know the drill by now.
$ kubectl apply -k k8s/overlays/production/
$ kubectl rollout restart deployment/bird-ai-appVerify the split works end-to-end. Visit https://jhenry.ca4.cs351.cloud (substituting your own subdomain), log in, and upload an image of a bird or squirrel. Then tail the logs of the ml-service pod to confirm the inference is actually running there and not in the Django process.
$ kubectl logs deploy/ml-service --tail=20
* Running on http://0.0.0.0:5000
127.0.0.1 - - [...] "GET /health HTTP/1.1" 200 -
0: 352x640 1 bird, 1639.1ms
127.0.0.1 - - [...] "POST /predict HTTP/1.1" 200 -The 0: 352x640 1 bird, 1639.1ms line is YOLO's per-request inference log. If you see it, your bird.ai deployment is now genuinely microservice-shaped: HTTP and application logic in one pod, inference in another, talking over an in-cluster Service.
(click to open)
You migrated bird.ai from an N-Tier monolith on EC2 and RDS to a microservice architecture on Kubernetes with a shared Ingress, a self-managed database, a self-managed object store, and a dedicated inference service. Every piece of the Strangler Fig pattern you read about in the first section is now something you have done by hand.
How did Cloud Assignment 4 go? Let us know on Ed.
CA4's cost story has two halves. Your bird.ai workload after the database migration runs inside a shared EKS cluster managed by the course staff, and every pod, Service, Ingress, and load balancer in that cluster is free to you. What you do pay for is the EC2 instance and RDS database you stood up in the first section, and only while they are running. Because you built both with Terraform, the total cost of the assignment is really the sum of the hours you leave those two resources standing.
Rates for the resources you provision in your own account (us-east-1, on-demand):
| Resource | Type | Approx Rate |
|---|---|---|
| EC2 instance | c7i-flex.large | ~$0.0725/hr |
| RDS instance | db.t3.micro PostgreSQL | ~$0.018/hr |
| EBS root volume | 15 GB gp3 | ~$1.20/mo |
| RDS storage | 20 GB gp2 | ~$2.30/mo |
ECR push to cs351-ml-service | instructor's repository | Free |
EC2 is billed per second and RDS is billed per hour while they exist. terraform destroy deletes them and billing stops. terraform apply brings them back and billing resumes. Storage is billed per GB-month and is deleted along with the instance.
If your AWS account is new, both instance types above are covered by the current AWS Free Tier credit pool, so your effective cost is zero until that pool is exhausted. The estimates below assume you are OFF the free tier and are therefore conservative upper bounds.
| Scenario | Hours billed | Cost |
|---|---|---|
| Complete the first four sections in one sitting (~6 hrs) | 6 | ~$0.50 |
| Spread across a week, destroy between each session | ~10 | ~$0.80 |
| Spread across a week, never destroy | ~168 | ~$13 |
| Forget about it for a month | ~720 | ~$56 |
Rows two and three represent the same amount of actual work. The difference is idle time. Leaving the EC2 and RDS running overnight costs roughly fifteen times more than destroying them at the end of each session. terraform destroy takes under three minutes and drops your hourly bill back to zero until the next terraform apply.
Once you have completed Public DNS and Teardown, your ongoing cost for CA4 is $0. Everything after that runs in the shared cluster.
The cs351 EKS cluster has real running costs: the EKS control plane, two to four worker nodes, the NLB fronting the cluster, a wildcard TLS certificate, the hosted zone for ca4.cs351.cloud, and the ECR repositories that hold the bird.ai and ml-service images. Those are paid by the instructor and not passed on to you. A rough comparison against running the same workload yourself:
| Approach | Monthly cost |
|---|---|
Your own EKS cluster + 1 t3.medium node + NLB | ~$115 |
k3s on a single t3.large EC2 + NLB | ~$76 |
| ECS Fargate + ALB | ~$60 |
Your slice of the shared cs351 cluster | ~$2–3 |
The shared cluster is roughly twenty to fifty times cheaper than the cheapest self-hosted option. The fixed costs (control plane, NLB, certificate, hosted zone) are amortized across every student using the cluster, so each of you pays a small fraction of what you would pay alone. The same amortization is how managed Kubernetes services (EKS Auto Mode, GKE Autopilot) and application platforms (Fly.io, Railway, Render) charge per-tenant prices an order of magnitude below the self-hosted equivalent.
You pay back those savings by accepting less control over Kubernetes version, node instance types, upgrade timing, and which add-ons are installed. That is the core tradeoff of every platform-as-a-service: you give up some control in exchange for not having to operate (or pay for) the platform itself.
Once you have a full score on Gradescope, tear down everything you provisioned in your own AWS account. Your grade is already recorded, and leaving resources running is the only way CA4 can cost you real money. Double-check both the EC2 console and the RDS console to confirm they are empty.
Destroy the N-Tier infrastructure if you did not already do it in Public DNS and Teardown. This deletes the EC2 instance, the RDS database, the security groups, the key pair, and the storage volumes Terraform created with them.
$ export AWS_PROFILE=ca4
$ cd terraform
$ terraform destroyClean up your Kubernetes namespace. Delete every resource the production overlay applied so your namespace is empty and no pods are running.
$ export KUBECONFIG=~/.kube/eks-student.yaml
$ kubectl delete -k k8s/overlays/production/The namespace itself is managed by the course staff and will be removed after the semester ends, so you do not need to delete it yourself.
Verify nothing is running in your own account. Both of these commands should return empty output.
$ aws ec2 describe-instances --filters "Name=instance-state-name,Values=running" --query 'Reservations[].Instances[].InstanceId' --output text
$ aws rds describe-db-instances --query 'DBInstances[].DBInstanceIdentifier' --output textIf either command prints an identifier, find what is left and destroy it, either with terraform destroy (if Terraform owns it) or by hand in the AWS console.
Optional: remove the ca4 IAM user. If you do not plan to reuse this access key for a future assignment, delete the user and its access key in the IAM console. This is good security hygiene: a leaked access key in the future cannot be used against your account if the user no longer exists.
Your CS351 course budget alert from the first section will email you if anything slips through. If you ever see a budget alert after this teardown, come back to step 3 and figure out what is still running.