
All Weekly Reports

16 reports · 34,333 words · ~172 min total

Week 1

1,133 words · 6 min read

Summary

January 12 - January 18

Meetings

Accomplishments

Next Steps

Analyzing course use of Gradescope’s Autograder

During the spring term of 2025, Patryk Tomalak developed four cloud assignments that used a Gradescope feature called “Autograder”. By packaging resources in a zip file, instructors can create a custom grading experience for any assignment. The autograding experience for the cloud assignments was designed as a set of “mastery learning” exercises, encouraging students to solve assignments step-by-step as they worked towards a perfect score. The autograder scripts ran on each student submission, providing feedback by awarding points or displaying helpful error messages. Students had an unlimited number of submission attempts during the assignment time period, and assignments were self-contained. That is, apart from registering for an Amazon Web Services (AWS) account, every step required to pass an assignment was documented within the assignment PDF itself, and successful completion of an assignment did not rely on the result of previous assignments.

I will take a moment to analyze the approach Patryk took in developing these assignments. Assignment tasks revolved around provisioning AWS resources (virtual machines, cloud functions) and using orchestration features, both open-source and proprietary to the AWS platform. The autograder scripts relied on an AWS access key to review a student’s AWS account. Each student would create a “CS351-autograder” user with read-only access, generate an AWS access key for that user, and store the key for later submission to Gradescope. The scripts also initiated key-based SSH access to any virtual machine by means of a private key packaged in the grader zip file, for which the corresponding public key was shared with students in each assignment. These access mechanisms allowed instructors to query two sources of information: first, the student’s account with a cloud provider, which contains the data on how they are using the platform, and second, the resources provisioned using the platform. Together, these enabled comprehensive auto-verification of an assignment’s requirements.

The grader zip files followed a common directory structure:

grader.zip
├── config
├── id_rsa
├── requirements.txt
├── run_autograder
├── run_tests.py
├── setup.sh
└── tests
    └── test_aws.py

Several Python libraries power the unit tests:

A typical unit test will verify that the autograder has access to the student’s AWS account, that the “CS351-autograder” user exists, and that an AWS budget was created to detect free-tier overruns, before listing any expected resources, like an EC2 instance, and connecting to or asserting against them. The stdout of the executed unit tests is captured by another Gradescope utility, “JSONTestRunner”, and stored along with the test results in a file, “results.json”. The contents determine what the student will see in their assignment submission and their final score.

Creating a cloud assignment

Visit https://www.gradescope.com. Log in and visit course “CS 35100-CLC”.

Click “Assignments” on the left menu.

https://www.gradescope.com/courses/1214735
Left menu link to assignments page

Click “Create Assignment” at the bottom right.

https://www.gradescope.com/courses/1214735/assignments
Assignment page create assignment

Select the “Programming Assignment” type then click “Next”.

https://www.gradescope.com/courses/1214735/assignments/new
Create assignment select assignment type

Fill in the displayed fields. You must specify the total points for the assignment, the release date, and the due date. Last session, all cloud assignments were worth 100 points. Then click “Create Assignment”.

https://www.gradescope.com/courses/1214735/assignments/new
Create assignment select assignment type

You have created a new assignment and are now able to upload a zip file containing the autograder files that will run when students submit.

Let’s start designing a simple cloud assignment. Our goal for this assignment is to verify students have correctly configured our autograder’s access to their AWS account, and that they are able to spin up an EC2 instance the autograder can SSH into. This will help instructional staff feel confident all students are prepared for the course assignments.

Remember the directory structure for our autograder setup, and keep the documentation page as reference.

grader.zip
├── config
├── id_rsa # we'll change to id_ed25519
├── requirements.txt
├── run_autograder
├── run_tests.py
├── setup.sh
└── tests
    └── test_aws.py

The config file should contain our AWS SDK settings.

config
[default]
region = us-east-1
output = json

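Next we’ll generate a new private key using ssh-keygen, this time a shorter ed25519 key. Recent OpenSSH releases generate ed25519 keys by default; on older releases, pass -t ed25519 explicitly. Do not set a passphrase. Keep the public key at hand, since we’ll include it in the assignment document.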

Terminal
$ ssh-keygen -o
Generating public/private ed25519 key pair.
Enter file in which to save the key (~/.ssh/id_ed25519): ./id_ed25519
Enter passphrase for "./id_ed25519" (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in ./id_ed25519
Your public key has been saved in ./id_ed25519.pub
The key fingerprint is:
SHA256:npsK/8qXY9TFTsDJEbxcAxqwsWfNenrfg2dkReiVYbA user@hostname.local
The key's randomart image is:
+--[ED25519 256]--+
|      o..++= ..+o|
|       + =* o +.o|
|      o +.o= E o |
|       o .o + . .|
|        S..+   . |
|       ..+. . o  |
|    .  .+..  +   |
|     +  =+ ...+  |
|      =*+.  .o.. |
+----[SHA256]-----+

Our Python script will rely on libraries defined in requirements.txt,

requirements.txt
gradescope-utils>=0.3.1
botocore
boto3
paramiko

and setup.sh initializes the Ubuntu container created for our autograder assignment.

setup.sh
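#!/usr/bin/env bash
# Install Python and pip in the autograder container
apt-get install -y python3 python3-pip python3-dev

# Install the libraries our unit tests import
pip3 install -r /autograder/source/requirements.txt

# Copy the AWS SDK settings into place and the SSH private key into the working directory
mkdir -p ~/.aws/
cp /autograder/source/config ~/.aws/config
cp /autograder/source/id_ed25519 "$PWD"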

run_autograder will run on each submission,

run_autograder
#!/usr/bin/env bash

# Copy credentials to the AWS credentials directory
cp /autograder/submission/credentials ~/.aws/credentials

cd /autograder/source

# Run the Python script
python3 run_tests.py

passing control to Python through run_tests.py.

run_tests.py
import unittest
from gradescope_utils.autograder_utils.json_test_runner import JSONTestRunner

if __name__ == '__main__':
  suite = unittest.defaultTestLoader.discover('tests')
  with open('/autograder/results/results.json', 'w') as f:
      JSONTestRunner(visibility='visible', stream=f).run(suite)

Everything we have defined so far will remain much the same for each cloud assignment. It is the simple scaffolding that enables our method of grading assignments.

Now, let’s get to the meat of the autograder setup: the unit tests. These will vary by assignment, and will represent most of the work of creating cloud assignments. All our unit test files will be in a directory tests/. For now, all unit test functions will be in a single file, test_aws.py.

Let’s create a minimal test_aws.py file.

test_aws.py
import unittest
import boto3 # AWS SDK
import paramiko # SSH client
from gradescope_utils.autograder_utils.decorators import weight, number

class TestAWS(unittest.TestCase):
  @weight(1)
  @number(1)
  def test_case(self):
      self.fail("TODO")

Finally, we’ll zip everything up.

Terminal
$ tree .
.
└── grader
   ├── config
   ├── id_ed25519
   ├── id_ed25519.pub
   ├── requirements.txt
   ├── run_autograder
   ├── run_tests.py
   ├── setup.sh
   └── tests
       └── test_aws.py
$ (cd grader; zip -r ../grader.zip *)
adding: config (stored 0%)
adding: id_ed25519 (deflated 33%)
adding: id_ed25519.pub (deflated 4%)
adding: requirements.txt (deflated 4%)
adding: run_autograder (deflated 33%)
adding: run_tests.py (deflated 36%)
adding: setup.sh (deflated 36%)
adding: tests/ (stored 0%)
adding: tests/test_aws.py (deflated 33%)
$ zipinfo grader.zip
Archive:  grader.zip
Zip file size: 2532 bytes, number of entries: 9
-rw-r--r--  3.0 unx       43 tx stor 26-Jan-18 17:59 config
-rw-------  3.0 unx      444 tx defN 26-Jan-18 18:00 id_ed25519
-rw-r--r--  3.0 unx      123 tx defN 26-Jan-18 18:00 id_ed25519.pub
-rw-r--r--  3.0 unx       49 tx defN 26-Jan-18 19:48 requirements.txt
-rw-r--r--  3.0 unx      200 tx defN 26-Jan-18 19:49 run_autograder
-rw-r--r--  3.0 unx      307 tx defN 26-Jan-18 19:49 run_tests.py
-rw-r--r--  3.0 unx      221 tx defN 26-Jan-18 19:49 setup.sh
drwxr-xr-x  3.0 unx        0 bx stor 26-Jan-18 19:56 tests/
-rw-r--r--  3.0 unx      257 tx defN 26-Jan-18 19:56 tests/test_aws.py
9 files, 1644 bytes uncompressed, 1152 bytes compressed:  29.9%

It’s time to upload our autograder definition and run a test to see if it’s working.

Navigate to the “CA0” assignment we created and click “Select Autograder”. Upload the zip file we just created.

https://www.gradescope.com/courses/1214735/assignments/7494302/configure_autograder
autograder configuration page for cloud assignment 0

Then click “Update Autograder”. This will start a build of the Docker image, and the build log, along with any errors, will be displayed below the update button for debugging purposes.

Now click “Test Autograder”.

https://www.gradescope.com/courses/1214735/assignments/7494302/configure_autograder
autograder configuration page for cloud assignment 0

You’ll be asked to upload a submission file. Our current autograder script doesn’t care what you upload, so upload a blank file for now. Students will eventually submit a file containing the AWS access key credentials for the autograder to access their AWS account.

Click “Upload” and wait for the autograder to finish. The result will show a failure, as we expected, and include debugging information from the run.

https://www.gradescope.com/courses/1214735/assignments/7494302/submissions/381402520#
autograder result summary displaying a failure

This upload and test flow will be how we test cloud assignments before release.

Let’s take a look at part of the final test_aws.py file for Cloud Assignment 0.

test_aws.py
import unittest
import boto3
import paramiko
from gradescope_utils.autograder_utils.decorators import weight, number

class TestAWS(unittest.TestCase):
  _cache = {}

  @property
  def account_id(self):
      """
      Returns the AWS account id for the current AWS user
      """
      if 'identity' not in self._cache:
          self._cache['identity'] = boto3.client('sts').get_caller_identity()
      return self._cache['identity']['Account']

  ...

  @property
  def ec2_instances(self):
      """
      Returns a list of all ec2 instances in the account
      """
      if 'ec2_instances' not in self._cache:
          self._cache['ec2_instances'] = (
              boto3.client('ec2').describe_instances(Filters=[
                  {'Name': 'instance-state-name', 'Values': ['running']}
              ])
          )
      return [
          instance
          for res in self._cache['ec2_instances']['Reservations']
          for instance in res['Instances']
      ]

  ...

  @weight(15)
  @number(1.1)
  def test_aws_account_access(self):
      """
      Autograding - can connect with AWS?
      """
      if not any(char.isalpha() for char in self.account_id):
          print(f'AWS Account ID: {self.account_id}')
      else:
          self.fail('Accessing the account failed, go back to the cloud assignment handout and try following the instructions carefully')

  @weight(15)
  @number(1.2)
  def test_aws_autograder_user(self):
      """
      Autograding - required user 'CS351-autograder' exists?
      """
      if any(user['UserName'] == 'CS351-autograder' for user in self.iam_users):
          print('CS351-autograder detected in users')
      else:
          self.fail('unable to find CS351-autograder in IAM users')

  ...

  @weight(15)
  @number(3.1)
  def test_ec2_instance_exists(self):
      """
      EC2 - Does the correct instance exist?
      """
      num_instances = len(self.ec2_instances)
      if num_instances != 1:
          self.fail(f'Did not find the expected number of EC2 instances running (found {num_instances}, expected 1). Remember, use the us-east-1 region and have exactly one instance running for this lab.')

      instance = self.ec2_instances[0]
      tags = instance.get('Tags', [])
      if not any(tag['Key'] == 'Name' and tag['Value'] == 'ca0' for tag in tags):
          self.fail('EC2 instance must be named ca0')

      if instance['InstanceType'] != 't2.micro':
          self.fail('EC2 instance must be a t2.micro')

      volume_sizes = [volume['Size'] for volume in self.ebs_volumes]
      if not all(size <= 30 for size in volume_sizes):
          self.fail('You exceeded the free EBS volume size of 30GB.')

      platform_details = instance.get('PlatformDetails')
      if not platform_details or 'Linux' not in platform_details:
          self.fail(f'Expected EC2 instance to be running Linux. (found "{platform_details}")')

  ...

  @weight(25)
  @number(3.2)
  def test_ec2_accessible_by_ssh(self):
      """
      EC2 - can connect to instance over SSH?
      """
      instance = self.ec2_instances[0]
      public_ip = instance.get('PublicIpAddress')
      if not public_ip:
          self.fail('Unable to find public ip address for EC2 instance')

      keyname = instance.get('KeyName')
      if not keyname:
          self.fail('Unable to find key pair attached to EC2 instance. Make sure to attach your generated secret key when creating the VM')

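      pkey = paramiko.Ed25519Key.from_private_key_file('./id_ed25519')
      ssh = paramiko.SSHClient()
      ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
      try:
          ssh.connect(hostname=public_ip, username='ec2-user', pkey=pkey, timeout=2)
          stdin, stdout, stderr = ssh.exec_command('uname -a')
          output = stdout.read().decode('utf-8')
      except (paramiko.SSHException, OSError) as e:
          # connection and authentication errors raise instead of returning empty output
          self.fail(f'Unable to connect to EC2 instance over SSH. ({e})')
      finally:
          ssh.close()

      if not output:
          self.fail('Unable to connect to EC2 instance over SSH.')

      print('Successfully connected to EC2 instance over SSH')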

There are two autograder details to note. First, the title shown in the UI for a particular question is determined by the docstring of the unit test function. Second, the weights of all unit tests must add up to the total number of points for the assignment.
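A weight mismatch is easy to miss, so it may be worth summing the weights locally before uploading a grader. The following is a minimal sketch, not part of the original grader: it uses a stand-in weight decorator that stores its value on a __weight__ attribute (an assumption about how gradescope-utils records weights), and total_weight, iter_tests, and TestExample are hypothetical names chosen for illustration.

```python
import unittest


def weight(value):
    """Stand-in for the gradescope-utils decorator (assumed to set __weight__)."""
    def decorator(func):
        func.__weight__ = value
        return func
    return decorator


def iter_tests(suite):
    """Flatten a (possibly nested) TestSuite into individual test cases."""
    for item in suite:
        if isinstance(item, unittest.TestSuite):
            yield from iter_tests(item)
        else:
            yield item


def total_weight(suite):
    """Sum the __weight__ of every test method in the suite."""
    total = 0
    for test in iter_tests(suite):
        method = getattr(test, test._testMethodName)
        total += getattr(method, '__weight__', 0)
    return total


class TestExample(unittest.TestCase):
    @weight(60)
    def test_a(self):
        pass

    @weight(40)
    def test_b(self):
        pass


suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestExample)
assert total_weight(suite) == 100, 'weights must add up to the assignment total'
```

In a real grader the suite would come from unittest.defaultTestLoader.discover('tests'), as in run_tests.py, and the expected total would be the point value entered when the assignment was created.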

Week 2

5,662 words · 29 min read

Summary

January 19 - January 25

Meetings

Accomplishments

Reflections on Cloud Assignment 0

Students encountered several problems while working through the assignment. I’ll summarize the problems they shared through Ed.

Invalid credentials file format

There were three issues involving the credentials file.

  1. The credentials file values should not have had quotes around the stored values. This was a problem with the assignment handout, which contained quotes around the placeholder values and misled students.
  2. The credentials file must not have a file extension. Several students had a file extension at the end of the filename, such as “.txt”, which caused our grader script to not find it. Only a minority of files go without an extension by convention, so students assuming the common case would have encountered this issue, as most GUI file editors add file extensions to files. This problem is exacerbated by a common operating system practice, hiding file extensions in the file explorer, which both macOS and Windows do by default.
  3. The credentials file must not be “UTF-8 (with BOM)” encoded or in a non-plain text format. In a striking demonstration of cloud computing’s pervasiveness, a student created a credentials file in Google Docs, downloaded it as a .txt on their own computer, and then uploaded it to Gradescope as a submission. The file was “UTF-8 (with BOM)” encoded, which the AWS SDK was unable to read. Another student had the same issue with file encodings when using VS Code. Interestingly, one student tried uploading a PDF as a credentials file.

Let’s discuss how to remediate these problems.

(1) can be fixed by testing the cloud assignment with more people ahead of time, to catch any typos. Often the person writing the assignment holds different assumptions than those working through it for the first time. In my case, I had my old credentials file lying around, which I re-used, and which didn’t contain quotes, rather than making a new one during testing.

(2) can be fixed by having our grader script load any file with “credentials” in the name. Postel’s law would serve us well: “be conservative in what you send, be liberal in what you accept.”

(3) can be detected by the grader script, which can convert the file from one encoding to another. Worse issues, like uploading a PDF, can be avoided by improving the assignment instructions.
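Remediations (2) and (3) could be sketched roughly as follows. This is a minimal illustration under stated assumptions rather than the grader we deployed: find_credentials and normalize_credentials are hypothetical helper names, the matching rule (any filename containing “credentials”) is the one proposed above, and stripping quotes from values is an extra tolerance for problem (1).

```python
import codecs
import tempfile
from pathlib import Path


def find_credentials(submission_dir):
    """Return the first submitted file whose name contains 'credentials'."""
    for path in sorted(Path(submission_dir).iterdir()):
        if path.is_file() and 'credentials' in path.name.lower():
            return path
    return None


def normalize_credentials(raw):
    """Strip a UTF-8 byte-order mark and any quotes around values."""
    if raw.startswith(codecs.BOM_UTF8):
        raw = raw[len(codecs.BOM_UTF8):]
    lines = []
    # A non-text upload such as a PDF would raise UnicodeDecodeError here.
    for line in raw.decode('utf-8').splitlines():
        key, sep, value = line.partition('=')
        if sep:
            line = f'{key.strip()} = {value.strip().strip(chr(34) + chr(39))}'
        lines.append(line)
    return '\n'.join(lines) + '\n'


# Usage: a BOM-encoded, quoted file saved with a .txt extension.
with tempfile.TemporaryDirectory() as tmp:
    sample = codecs.BOM_UTF8 + b'[default]\naws_access_key_id = "AKIAEXAMPLE"\n'
    (Path(tmp) / 'credentials.txt').write_bytes(sample)
    found = find_credentials(tmp)
    found_name = found.name
    cleaned = normalize_credentials(found.read_bytes())
```

In run_autograder, the cleaned text would be written to ~/.aws/credentials in place of the plain cp. Catching the decode error would also let the grader show a helpful message when someone uploads a PDF.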

AWS Free Tier Changes

The AWS Free Tier changed on July 15, 2025, last summer, after the previous course section ended. I will discuss the specific changes in a later section; here the relevant detail is that the EC2 instance types eligible for free tier billing changed from t2.micro and t3.micro to t3.micro, t3.small, t4g.micro, t4g.small, c7i-flex.large, and m7i-flex.large.

My account was created before July 15, 2025, so I was on a different free tier than students in the current section. For this reason, I did not realize the free tier instances had changed, and I developed my grader script to verify the constraints of the old free tier. It took me several deployments and some back-and-forth with the students to get the new grading script working.

This exposed several problems: (1) the free tier assumptions inherited from past assignments no longer hold, requiring us to reconsider our approach to developing assignments, (2) the develop-debug-deploy loop for Gradescope’s autograder is relatively slow, (3) we don’t have an automated testing framework for our grader scripts, forcing manual testing, and (4) Ed, while quite good, is not ideal for back-and-forth debugging with students: I’m forced to ask for screenshots and for them to try things and get back to me.

Let’s briefly discuss some steps we could take to address these problems.

(1) is a problem that has become an opportunity. The new free tier is much more flexible, and opens up new opportunities for assignments. Instead of the free tier dictating the architecture for our cloud assignments, our architectural choices have been freed, and now the free tier simply dictates the scale. More on this later.

(2) The develop-debug-deploy loop for the autograder script can be sped up. Gradescope allows for two deployment options. The first is manually uploading a zip file, our current approach, which requires Gradescope to build a Docker image on our behalf based on its contents. The second is registering a link to an image in a container registry we control: we simply build and deploy our image, and the autograder will pull it on each container run. This gives us more flexibility with how we engineer our Docker container, and many more opportunities to improve the automation of and speed at which we can deploy our grader scripts.

(3) Our scripts are manually tested at the moment. Automated testing would improve our confidence in their results, and help prevent regressions as we update and develop the assignments. Additionally, it would help future sections of the course as new teaching assistants inherit maintenance of these assignments. However, it is challenging to develop integration tests that include third-party components, particularly infrastructure. I think we can take two approaches, of which I favor the latter. First, we could determine whether it’s feasible to mock API call results to make sure our test assertions are appropriate, which requires us to fake what AWS API responses will look like. Second, we could develop Ansible and Terraform scripts that deploy solutions on our own infrastructure, figure out some way to trigger an autograder run either locally or automatically, and perform an end-to-end integration test at development time, not deploy time.
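To get a feel for the mocking approach, here is a minimal sketch using only the standard library. It assumes the grading logic is factored into plain functions that take an EC2 client as an argument. running_instances and check_single_named_instance are hypothetical names, and the fake response is hand-written to contain only the fields our assertions read, so it is not a faithful copy of a real AWS response.

```python
from unittest import mock


def running_instances(ec2_client):
    """Flatten describe_instances() reservations into a list of instances."""
    response = ec2_client.describe_instances(Filters=[
        {'Name': 'instance-state-name', 'Values': ['running']}
    ])
    return [
        instance
        for reservation in response['Reservations']
        for instance in reservation['Instances']
    ]


def check_single_named_instance(instances, name):
    """Return an error message, or None when the requirement is met."""
    if len(instances) != 1:
        return f'found {len(instances)} running instances, expected 1'
    tags = instances[0].get('Tags', [])
    if not any(t['Key'] == 'Name' and t['Value'] == name for t in tags):
        return f'EC2 instance must be named {name}'
    return None


# A hand-written fake response, shaped like the fields the grader reads.
fake_ec2 = mock.Mock()
fake_ec2.describe_instances.return_value = {
    'Reservations': [
        {'Instances': [{'InstanceId': 'i-0example',
                        'Tags': [{'Key': 'Name', 'Value': 'ca0'}]}]}
    ]
}

instances = running_instances(fake_ec2)
ok_result = check_single_named_instance(instances, 'ca0')
bad_result = check_single_named_instance(instances, 'ca1')
```

The weakness of this approach is visible in the sketch itself: the test is only as good as our guess at the response shape, which is part of why I favor the end-to-end option.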

(4) Ed is a great platform for asynchronous communication, but debugging with students is more effective synchronously, and when they can share what they’re looking at. Currently they’re restricted by how fast they can type and how many screenshots they can share. Perhaps we should consider an office hour for cloud assignment issues, or a weekly Zoom meeting that students can drop into to share their screen and have a quick discussion.

Delayed Mail Delivery for Purdue Outlook

One student had an issue Thursday (1/22) evening. When you register for an AWS account, AWS sends a verification code to the provided email address, and the code expires after 10 minutes. That email was consistently delivered to the student’s inbox more than 10 minutes later, preventing them from verifying their account. The email verification step is required if students are to create their own AWS accounts. That night I sent myself an email from my personal email account to my Purdue account. It took 4 hours to be delivered, and it landed in my junk email folder, where I had to report it as non-spam. The next morning I sent another test email, and it was delivered within a minute. Unfortunately, this has exposed that we are at the mercy of Purdue’s e-mail infrastructure when students are setting up their AWS accounts and working on the cloud assignments.

A possible fix for this issue touches on a prevailing conversation about administering this course: to grant students regular IAM accounts registered under an instructional staff-controlled AWS account. Thus, students would receive a simple username and password login at the start of the course, without email verification, and on first login could be easily required to register a proper MFA device for subsequent logins. I will discuss course administration approaches later in this report.

Sharing privileged credentials on Ed

In a private Ed post, a student shared a screenshot of their AWS credentials with me. Ed uploads files to its content delivery network (CDN), which does not perform authentication, for performance and access reasons. I informed them of the risk they were taking, removed the image from their message, and advised them to rotate their credentials. In future assignments I will emphasize the privileged nature of their AWS credentials, the risk they assume if they are irresponsible in guarding them, common mistakes to avoid such as committing them to source control, how to evaluate whether or not to put certain data on cloud providers, and useful ways to store and secure secrets for their personal projects.

Projected calendar of the course

Let’s take a look:

| Week | Dates | Textbook | Topics | Assignments | Exams |
|------|-------|----------|--------|-------------|-------|
| 1 | 1/12-1/18 | Ch. 1,2 | The Motivations for Cloud, Elastic Computing and its advantages | | |
| 2 | 1/19-1/25 | Ch. 3 | Types of Clouds and Cloud Providers | CA0 (1/20-1/23) | |
| 3 | 1/26-2/1 | Ch. 4,5 | Data Center Infrastructure and Equipment, Virtual Machines | CA1 (1/27-2/6) | |
| 4 | 2/2-2/8 | Ch. 6 | Containers | | |
| 5 | 2/9-2/15 | Ch. 7 | Virtual Networks | CA2 (2/10-2/20) | |
| 6 | 2/16-2/22 | Ch. 8 | Virtual Storage | | Midterm 1, Feb. 17 |
| 7 | 2/23-3/1 | Ch. 9 | Automation | CA3 (2/24-3/6) | |
| 8 | 3/2-3/8 | Ch. 10 | Orchestration: Automated Replication and Parallelism | | |
| 9 | 3/9-3/15 | Ch. 11 | The MapReduce Paradigm | CA4 (3/10-3/20) | |
| 10 | 3/16-3/22 | Ch. 12 | Microservices | | |
| 12 | 3/23-3/29 | Ch. 13,14 | Controller-based Management Software, Serverless Computing and Event Processing | CA5 (3/24-4/3) | |
| 13 | 3/30-4/5 | Ch. 15 | DevOps | | Midterm 2, Mar. 31 |
| 14 | 4/6-4/12 | Ch. 16 | Edge Computing and IIoT | | |
| 15 | 4/13-4/19 | Ch. 17 | Cloud Security and Privacy | | |
| 16 | 4/20-4/26 | Ch. 18 | Controlling the Complexity of Cloud-Native Systems | | |

I’ll note there are 1-2 weeks of space to delay a cloud assignment. I expect to release Cloud Assignment 1 on January 27th, and Cloud Assignment 2 on February 10th.

Reflections on assignments in Dan Marinescu’s Cloud Computing: Theory and Practice

Dan Marinescu’s Cloud Computing textbook is very good, and I’m enjoying reading it so far. I’m through the first two chapters, and have read the two appendices, which are the most relevant to the cloud assignments for CS 351. I’ll discuss the appendices and several of their suggestions in this section.

Appendix A: Cloud projects

Appendix A discusses possible cloud projects for students to complete. They are as follows:

  1. Cloud simulation of a distributed trust algorithm
  2. A trust management service
  3. Simulation of traffic management in a smart city
  4. A cloud service for adaptive data streaming
  5. Optimal FPGA synthesis
  6. Tensor network contraction on AWS
  7. A simulation study of machine-learning scalability
  8. Cloud-based task alert application
  9. Cloud-based health-monitoring application

I’ll break the suspense — I think “A cloud service for adaptive data streaming” or “A simulation study of machine-learning scalability” are possible choices for an assignment.

“A cloud service for adaptive data streaming” is a project to find the optimal architecture for adaptive data streaming problems. Consider adaptive audio streaming, which is a multiobjective optimization problem. From the text, “We wish to convert the highest quality audio file stored on the cloud to a resolution corresponding to the rate that can be sustained by the available bandwidth; at the same time, we wish to minimize the cost on the cloud site and also minimize the buffer requirements for the mobile device to accommodate the transmission jitter. Finally, we wish to reduce to a minimum the start-up time for the content delivery.” The performance of a solution depends on resource constraints: available CPU cycles, buffer space on the sender and receiver, and network bandwidth.

“A simulation study of machine-learning scalability” is based on work done to control a video game, StarCraft, which will likely engage students who are fans of video games in general or of the game itself. Graduate students were asked to build a convolutional neural network (CNN) to predict the computational effort required to build a deep neural network (DNN), and then (1) build a dataset by running a scenario 20,000 times, (2) train the model to predict a “best” action, then (3) rerun game scenarios using the new predicted best action.

I chose these assignments because they are relevant to current industry trends: streaming multimedia to clients, and training AI/ML algorithms. They are also easily benchmarked, allowing us to evaluate students’ implementations by comparing their performance against each other. This produces a ranking of solutions by performance, with which we can confer extra credit on the best-performing solutions, incentivizing students to be creative and go above and beyond.

The textbook contains the cloud architecture used to implement some of these projects, but the assignments would have to be adapted for the skill level of our students. Additionally, these projects are outside the scope of my expertise, and I would require support from other instructional staff.

Now to discuss the remaining assignments. Project 1 is a distributed computation where students would be expected to implement the algorithm. It’s a simulation of a cloud architecture, not an implementation of the cloud architecture. It’s both outside the scope, and in a way trivial, for this course. Project 2 uses an algorithm to assess which nodes in a cluster or network are malicious. I believe it would use too many resources to implement, or require adaptation to multiple containers on a node, while not being relevant to most industry work at the moment. Project 3 is quite fun, but more an exercise in object-oriented simulation. Project 5 uses a third-party tool for electrical and computer engineers. Project 6 is a condensed matter physics simulation that is algorithmically focused, and whose cloud principle, that is, choosing the right instance for the right job, is better expressed in another manner. Project 8 uses a considerable number of AWS services to develop a web application, whose architecture does not fit in the free tier, whose effort exceeds 10 days, and whose principles can be taught more simply. Project 9 is an IoT data-streaming application, which would require us to provide or simulate IoT devices to stream data to student servers.

Appendix B: Cloud application development

Appendix B is an introduction to application development on AWS. It’s useful as a tutorial and exposes useful features, but suffers from some issues: (1) it’s specific to a single cloud vendor, (2) language examples are in C# and Java, neither of which was used in the course last semester, and (3) the information is out of date.

Instructional Infrastructure

My experience developing CA0, the issues students have had working on the assignment, and my curiosity with regard to student engagement with the cloud assignments, have spurred questions about our “instructional infrastructure”.

In my formulation, instructional infrastructure is the set of processes and rules used when producing content for the course, and the mechanisms that enable feedback and review of that work. Let’s consider Cloud Assignment 0. I took a series of steps to (1) develop the assignment, through consideration of current textbook content and discussion with the instructional staff, (2) test the assignment, by producing an autograder script and manually testing it, (3) publish the assignment, by writing it using a web framework and exporting the resulting web page as a PDF shared through Brightspace, and (4) receive feedback about the assignment, by observing Ed discussion, answering questions, and tracking student progress using Gradescope’s log of grading results.

I believe the most valuable practice we can adopt this semester is a culture of assessment and reflection. We can improve our ability to assess student experience and engagement with assignments by collecting more and higher quality data, and we can tighten feedback loops with regard to that data through automation and clear processes. I will discuss some possible steps to enrich these factors.

Currently, we get information about student engagement with the assignment through Ed discussion, which is working fine. Using Gradescope submission statistics, we can see student progress on an assignment, but Gradescope only shares the data of their final submission attempt. Finally, we have the capability to inspect a student’s AWS account using the autograder, but only at the time that they perform an assignment submission, and we don’t take advantage of its full potential. We also do not get any signal from how they experiment with AWS between submissions.

Here I’ll propose some ideas to tackle problems (1) granular tracking of assignment progress and (2) evaluation of student AWS use.

Consider the autograder: Gradescope invokes a Docker container on each student submission. The container contains our grading script, which runs arbitrary code, and is where we have defined the point values of assignment “questions” corresponding to particular assertions we make about the state of a student’s AWS account and derived resources, which add up to the student’s final grade. Now, to work towards (1), we can place monitoring code in the grading container that records the time of submission, a snapshot of the state in a student’s AWS account, and the point total of that particular submission, and store that information in a database under instructor control. We’ll then be able to visualize and calculate statistics related to student engagement: submissions per assignment, speed of completion, activity over time, and much more. We can use this information to assess cloud assignments and learning outcomes in order to improve future assignments. It would require setting up cloud infrastructure under management of the instructional staff to collect, store, and display this data.
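The monitoring code described above could start out as small as the sketch below. Everything in it is an assumption made for illustration: build_record and store_record are hypothetical helpers, the results dictionary is assumed to carry a “tests” list with “score” and “max_score” fields, and SQLite merely stands in for whatever instructor-controlled database we end up choosing (a file inside the grading container would not persist between runs).

```python
import json
import sqlite3
from datetime import datetime, timezone


def build_record(student, results, snapshot, now=None):
    """Summarize one autograder run as a flat record."""
    now = now or datetime.now(timezone.utc)
    tests = results.get('tests', [])
    return {
        'student': student,
        'submitted_at': now.isoformat(),
        'score': sum(t.get('score', 0) for t in tests),
        'max_score': sum(t.get('max_score', 0) for t in tests),
        'snapshot': json.dumps(snapshot, sort_keys=True),
    }


def store_record(conn, record):
    """Append the record to a submissions table, creating it if needed."""
    conn.execute(
        'CREATE TABLE IF NOT EXISTS submissions '
        '(student TEXT, submitted_at TEXT, score REAL, max_score REAL, snapshot TEXT)'
    )
    conn.execute(
        'INSERT INTO submissions VALUES '
        '(:student, :submitted_at, :score, :max_score, :snapshot)',
        record,
    )
    conn.commit()


# Usage with made-up data for a single submission.
conn = sqlite3.connect(':memory:')
results = {'tests': [{'score': 15, 'max_score': 15}, {'score': 0, 'max_score': 25}]}
record = build_record('student1', results, {'running_instances': 1},
                      now=datetime(2026, 1, 21, tzinfo=timezone.utc))
store_record(conn, record)
rows = conn.execute('SELECT student, score, max_score FROM submissions').fetchall()
```

With one row per submission, statistics like submissions per assignment or time to first full score become simple queries.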

With regard to (2), we have inverted our control as instructional staff. Students grant us permission to review their solutions, on their terms. Thus, to get more information about a student’s use of their account, we must ask permission or use indirect means of access; currently, that means the “credentials” files uploaded during assignment submission (also note that this method places privileged student data, namely an access token with full administrative access to their personal AWS account tied to their credit card, on Gradescope’s servers, subject to their security controls).

Let’s consider how to address this problem. Our goal is two-fold; in fact, we are already trying to achieve both parts today, but our means are insufficient. The two goals are (1) to control student resource use so that students do not incur charges for completing course assignments, and (2) to understand and enable student cloud use so we can assess and meet learning outcomes.

I suggest we create a class AWS account under the control of instructional staff. At the start of the course, we can create IAM users for each student based on their career account username, and assign a temporary password for their first login. After logging in, they can change their password and configure an MFA device, which we can enforce as account administrators. For each new cloud assignment, we can create a permission group that grants student access to only those resources needed to complete the assignment. This lets us control whether a student has permission to create an EC2 instance or a Lambda function. However, it is limited in that the permission group will not limit the number of EC2 instances a student can spin up. So, we still have the problem of a student spinning up Bitcoin miners on our dime.

To address that particular issue, we can register an AWS EventBridge rule that triggers a Lambda function every time an EC2 instance is launched. It would track student instance use in a DynamoDB table, and if a student goes over the allotted resource limit for a particular assignment, the function could restrict EC2 service access temporarily and kill any excess instances, notifying us if necessary. Events are not limited to EC2 instance spinups; we can actually monitor billing in real-time, letting us track course spend granularly during the semester.
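The core decision such a Lambda function would make can be sketched without any AWS calls. The function below is a hypothetical helper (`excess_instances` is not an AWS API): given the instances currently tracked per student and a per-assignment limit, it returns the instance IDs that exceed the limit, keeping the oldest launches. The tuple layout and the example IDs are assumptions for illustration.

```python
from collections import defaultdict


def excess_instances(running, limit_per_student):
    """Return the instance IDs to terminate so each student is within the limit.

    `running` is a list of (student, instance_id, launch_time) tuples.
    The oldest instances are kept; launches beyond the limit are flagged.
    """
    by_student = defaultdict(list)
    for student, instance_id, launch_time in running:
        by_student[student].append((launch_time, instance_id))

    to_terminate = []
    for instances in by_student.values():
        instances.sort()  # oldest launch first
        for _, instance_id in instances[limit_per_student:]:
            to_terminate.append(instance_id)
    return sorted(to_terminate)


# Made-up tracking data: alice has three instances, bob has one.
tracked = [
    ("alice", "i-1", 1),
    ("alice", "i-2", 2),
    ("bob", "i-3", 1),
    ("alice", "i-4", 3),
]
flagged = excess_instances(tracked, limit_per_student=2)
print(flagged)
```

In a real handler, the flagged IDs would be passed to the EC2 terminate call and the tracking data would come from the DynamoDB table rather than an in-memory list.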

By creating a method by which we are account administrators with full control over student resource access, we now have the appropriate permission level to monitor student resource usage holistically, and the data by which to evaluate that usage.

The dollars and cents of cloud assignments

Two of our previous discussion topics require us to rephrase cloud assignment proposals in financial terms. The changes to AWS’s Free Tier limits total cloud usage for the course to under $100 per student (possibly $200), and the suggestion to bring students under an instructor-controlled account makes cloud assignments a departmental budget item. Here I’ll suggest a framework for how to price a cloud assignment.

A cloud assignment uses several AWS resources, is available during the time period defined by its release and due date, and is expected to be worked on by all students enrolled in the class. Using this information, we’ll define a simple formula to evaluate the cost of an EC2-based assignment.

(Assignment length in days) * (24 hours / day) * (EC2 instance cost / hr) * (Number of instances used to complete assignment) * (Number of students) = Maximum total cost of assignment

We can use this formula to estimate funding for the class. The formula assumes the worst case of max utilization per student for the whole assignment period, aka, the “class of bitcoin miners” scenario. We don’t have the data for a more accurate estimate. We can now architect assignments from two directions: what cloud principles would we like them to learn and how many resources will they have access to learn those principles? The synthesis of these two approaches clearly defines possible system architectures students can implement.

Note: EC2 instance cost depends on instance type and chosen operating system. Amazon Linux is the most affordable option; Ubuntu demands a 4% premium, RHEL 28%, SUSE 66%, and Windows a whopping 103%. These premiums can change depending on the instance type. An instance type’s base price depends on the allotted type of CPU chip, number of vCPUs, GiB of memory, network bandwidth, block storage bandwidth, and hypervisor scheduling algorithm. AWS has different hypervisor scheduling schemes for assigning hardware capacity to each virtual machine; the two available to free tier instances are “burst” and “flex”.

Let’s take as an example the traditional three-tier web application system architecture containing 1 proxy server, 3 application servers, and 1 database server. Allow me to digress, to develop more deeply the details needed to produce both an appropriate estimate of the cost of the cloud assignment, and an appropriate architecture reflecting real-world considerations.

An application server will need a general purpose CPU. The proxy server should be memory and/or network optimized: it wants to cache a lot of stuff, hold connections open, and route requests quickly. The database needs an I/O optimized instance with plenty of block storage bandwidth, whose memory size will depend on the data it’s storing.

Let’s examine the properties of free tier instances:

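| Type | CPU | Price/hr | vCPU | Memory | Network | Block Storage | Hypervisor Schedule |
| --- | --- | --- | --- | --- | --- | --- | --- |
| t3.micro | Intel Xeon Platinum 8000 1st/2nd Gen | $0.0104 | 2 | 1 GiB | 5 Gbps | 2.8 Gbps | Burst |
| t3.small | Intel Xeon Platinum 8000 1st/2nd Gen | $0.0208 | 2 | 2 GiB | 5 Gbps | 2.8 Gbps | Burst |
| t4g.micro | AWS Graviton 2 | $0.0084 | 2 | 1 GiB | 5 Gbps | 2.8 Gbps | Burst |
| t4g.small | AWS Graviton 2 | $0.0168 | 2 | 2 GiB | 5 Gbps | 2.8 Gbps | Burst |
| c7i-flex.large | Intel Xeon 4th Gen | $0.08479 | 2 | 4 GiB | 12.5 Gbps | 10 Gbps | Flex |
| m7i-flex.large | Intel Xeon 4th Gen | $0.09576 | 2 | 8 GiB | 12.5 Gbps | 10 Gbps | Flex |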

AWS Graviton 2 are proprietary chips (latest is gen. 5) and the t4g family is “burstable”. “Burstable” is AWS’s answer to the noisy neighbor and bin packing problems of virtual machines. VMs compete for resources: you can’t respond to an HTTP request on a cloud server if another VM is using the physical interface you’re requesting, i.e. the network interface card. Hypervisors therefore control a virtual machine’s access to resources. AWS has a proprietary hypervisor, “Nitro”, that they use for most of their EC2 instances.

t3 and t4g instances are heavily throttled by the hypervisor; they are only guaranteed a 10%-20% baseline performance per vCPU. A VM earns “CPU Credits” when idle, which are paid to the hypervisor to “burst” and get more resource access. Those resources could be CPU time, network bandwidth, or disk bandwidth. t3 instances use Intel Xeon Platinum chips, while t4g instances use AWS Graviton CPUs based on the ARM “Neoverse” design.

c7i and m7i are Intel Xeon chips; they have half the maximum cores of the ARM chips, but 4x more cache and a 25% faster clock rate. Databases perform better on these chips. m7i are “memory optimized” with a 4:1 ratio of memory to vCPU, twice as much as the c7 family. Both are “flex” scheduled, which guarantees a more generous 40% baseline performance.

Returning to our 3-tier architecture, the database server can use the c7i. Assignment datasets are small so the DB doesn’t need much RAM, but it should be a non-burstable instance because it will serve multiple clients and we want it to be consistently fast. Flex is the next best free option. The proxy will use the m7i instance type; it will need to hold open a lot of concurrent connections and copy a lot of data from memory, so the extra memory will directly contribute to scaling the service for more active users.

The application servers will be performing a variety of jobs, all with different execution patterns. Application servers are also often written in a single-threaded language. JavaScript runs serially using an event loop for concurrency. Python has a “Global Interpreter Lock” preventing parallelism. They cannot take advantage of more vCPUs easily. Using affordable, general purpose instances with a low vCPU count that are easy to recreate and fail independently is a good strategy. The t3 and t4g instance types fit the bill; t3 are x86 and t4g are ARM.

Now we can plan out how much a cloud assignment will cost to implement a 3-tier architecture using three t4g instances, a c7i-flex.large, and a m7i-flex.large.

(77) * (3 * .0168 + .08479 + .09576) * (10 * 24) = $4267.956 ($55.428/student)

If we use t4g.micro instead, it’s $3802.26 ($49.38/student). Remember, this is at max utilization for the whole assignment period.
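The pricing formula can be written as a small function to make it easy to rerun with other instance mixes. This is a minimal sketch: the hourly prices come from the free tier table in this report, the 77 students and 10-day period come from the calculation above, and the names `PRICE_PER_HOUR` and `max_assignment_cost` are made up for illustration.

```python
HOURS_PER_DAY = 24

# Hourly prices taken from the free tier instance table in this report.
PRICE_PER_HOUR = {
    "t4g.micro": 0.0084,
    "t4g.small": 0.0168,
    "c7i-flex.large": 0.08479,
    "m7i-flex.large": 0.09576,
}


def max_assignment_cost(days, instances, students):
    """Worst-case cost: every instance runs for the whole assignment period.

    `instances` maps an instance type to how many of it each student runs.
    Returns (cost per student, total cost for the class).
    """
    hourly = sum(PRICE_PER_HOUR[name] * count for name, count in instances.items())
    per_student = days * HOURS_PER_DAY * hourly
    return per_student, per_student * students


small_fleet = {"t4g.small": 3, "c7i-flex.large": 1, "m7i-flex.large": 1}
micro_fleet = {"t4g.micro": 3, "c7i-flex.large": 1, "m7i-flex.large": 1}

for label, fleet in (("t4g.small", small_fleet), ("t4g.micro", micro_fleet)):
    per_student, total = max_assignment_cost(10, fleet, 77)
    print(f"{label}: ${total:.2f} total, ${per_student:.2f} per student")
```

Running it reproduces the two estimates above, rounded to cents.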

I hope this approach to pricing assignments is clear. In fact, I’ll suggest implementing this particular architecture as a good goal for cloud assignment 2. The AWS free tier change has caused short-term problems, but inspired long-term possibilities.

Development of Cloud Assignment 1

Currently covered chapters in the textbook are:

This assignment will incorporate elastic computing concepts, use technologies likely introduced in previous coursework (web servers, databases), and be an introduction to deploying an application on infrastructure-as-a-service. It is not too difficult, but is significant enough to merit a 2-week assignment period (10 days). It will address the same topics and use the same resources as Patryk’s past cloud assignment, but to my taste.

The previous cloud assignment had students spin up an EC2 VM, create an init.sql file with a very simple schema and some example values, and load it into a SQLite database using a Dockerfile. It also had students create an HTTP web server using any approach, although nginx was recommended.

I like the idea of a student building a web application and “shipping” it on cloud infrastructure, but let’s have them deploy a realistic application.

Using the Django web framework and SQLite, students will implement a simple user interface powered by the Compiler Explorer API. It will have a text area where a user can type in Python code and click a button to display resulting Python bytecode. There will be another button to click, that will run the code in a container environment on their VM, and display the result. The students will learn how to use an application framework (Django+SQLite), and solve little web application problems along the way. This will introduce students to making a Software-as-a-Service application using an Infrastructure-as-a-Service provider.

The students will be graded on:

  1. Navigating AWS
  2. Bootstrapping a VM
  3. Running a Python application server
  4. Configuring a database with an application server
  5. Writing an HTML user interface
  6. Writing an MVC controller
  7. Writing an MVC model
  8. Executing code in a container
  9. Querying SQLite statistics
  10. Creating a REST endpoint
  11. Benchmark Score (Extra Credit)

By the end of the assignment, they will have installed dependencies on the server, created a web application project, implemented application functionality using the framework, examined database query statistics, become familiar with REST principles, and deployed an application to the cloud.

The benchmark score is meant to incentivize students to be creative and go above-and-beyond. We can load test their final application and create a leaderboard of every student. It should be a separate Gradescope assignment/submission, so if the load test crashes their VM, it doesn’t slow them down getting a 100% on every other section. The top 5 students can get extra credit, but only if they stand up in class and share the optimizations they used to get the highest score. For example, they could put a proxy server written in a compiled language to improve cache performance, or tune OS settings to improve network performance and CPU utilization. There is the issue of only “burst” and “flex” instances being available, so performance is highly dependent on the hypervisor scheduling algorithm, but this is meant to be fun. The autograder can check they are not using more than one instance of the allowed type to enforce fairness. I can release it on the Tuesday before the Friday it’s due, 7 days in, so I can have more time to implement the leaderboard, and so students that finish the assignment early get another task to learn from and challenge them.

The assignment should be completed on a single EC2 instance, so we as instructors can be confident the first assignment is firmly in the free tier.

The new AWS Free Tier

Previously, particular services had time-based restrictions on usage, as well as by usage type, for example, “750hrs of t3.small per month” for EC2, which lasted for 12 months. Now, new accounts receive $100 in credits (rather than time) that can only be spent on certain resources and within particular usage limits. These credits last for a shorter time than before, 6 months, and only by using up the $100 credits will your account be credited an additional $100 before the free tier ends. This is AWS’s way to encourage people to experiment with a variety of services shortly after creating their account.

Accounts created before the free tier changes remain on the old plan.

EC2

Free tier is limited to instance types: t3.micro t3.small t4g.micro t4g.small c7i-flex.large m7i-flex.large

EBS

Free tier use is capped at 30 GB of storage, 2 million I/Os, and 1 GB of snapshot storage with Amazon Elastic Block Store (EBS).

RDS

Free tier choices are db.t3.micro and db.t4g.micro instances and 4 engines: MySQL, PostgreSQL, MariaDB, and Microsoft SQL Server

Lambda

Capped at 1,000,000 free requests per month. Up to 400,000 GB-seconds or 3.2 million seconds of compute time per month

S3

No usage limits on the free tier.

SageMaker AI and Amazon Bedrock

These are AI focused services also in the free tier.

Additional Services

There are many additional services with a free tier allotment, but they are not commonly used.

Ideation for future cloud assignments

Cloud Assignment 2

I will propose the n-tier architecture project for cloud assignment two. The textbook chapters covered during assignment 2’s release period are:

We can benchmark students’ implementations by performing a load test. We can choose a problem such that we submit data to the student cluster that they have to process, and then we can query that data somehow.

Cloud Assignment 3

Will be about automation and orchestration. Should introduce Ansible, Terraform, and Kubernetes.

Professor Adams also wants it to be about networking and storage too. I could do a Kubernetes dive into these things. Also emphasize firewall settings (See Table B.1 in Dan Marinescu’s Cloud Computing, Appendix B). From Appendix A, project “A.4 A cloud service for adaptive data streaming” could be a good one here, or assignment 2. ████ and ████ could also work on implementing some of these algorithms while I work on the surrounding infrastructure and create the write up. They could do “A.7 A simulation study of machine-learning scalability” together.

Cloud Assignment 4

Probably something to do with the RCAC visit.

Cloud Assignment 5

Should be the FaaS project and implementation

Approach to developing Cloud Assignment 1

Goal: Students will develop their own SaaS on top of IaaS, becoming familiar with EC2 virtual machines.

Application: Detect a bird in a photo.

Constraints: 1 t4g.small EC2 VM. No other AWS services.

Total Possible Cost: Using our above formula, the Max Cost per student for this assignment is (10 * 24) * (0.0168) = $4.032.

Rationale for Cloud Assignment 1

The assignment was designed to meet four goals.

  1. To demonstrate the breathtaking progress in cloud services and intelligent computing.
  2. To provide students experience with a realistic application they can learn from and experiment with.
  3. To establish foundational application-level knowledge that can be expanded on to teach system-level decision-making.
  4. To choose an application design that has implications beyond computing and into ethical and privacy concerns.

Inspired by the XKCD comic “Tasks”, I wanted to demonstrate to students that what we’re able to do today was unthinkable 10 years ago. We can run a complicated image recognition task that takes a natural language input and transforms that not just into a classification of an image but the generation of a bounding box and a coloring of the object shape itself. All of this on the cheapest available virtual machine on AWS, and in less than a second. Unbelievable.

I also wanted students to have experience with a realistic and common type of application they will encounter in industry. Having them deploy one of the most commonly used web frameworks using the basic tools you would find on a POSIX-compatible system introduces struggle and manual grunt work, the recent memory of which will motivate the adoption of cloud-focused tooling like containers and automation. For example, struggling to install Python helps you realize how useful it is to be able to reference a specific version of Python in a Dockerfile.

Another point is that a realistic application can be scaled according to systematic, non-contrived principles. Slightly modifying the requirements of the application can necessitate major modifications to the system architecture. The bird image recognition application can be taken in different directions with regard to scale and requirements without compromising the conceptual integrity of the service. This allows students to have a stable reference point for system design decisions, and recognize more easily the subtleties of change and its implications. For example, if students were asked to implement a history feature, we would need sufficient storage space for uploaded images, which motivates the introduction of S3. Or if they were asked to scale the service to accommodate more load, we could introduce an N-tier web architecture with multiple EC2 instances to serve as the proxy, application, and database servers.

Finally, image recognition is an ethically charged technology. Today, human rights abuses are enabled by malicious application of face detection and other image-based machine learning algorithms. Discrimination and bias can easily intrude on naive image recognition applications. We’ve started with an innocuous software service, bird recognition, but we could raise the stakes easily by having students strip geolocation data from EXIF metadata embedded in uploaded images, bringing up questions of user privacy. Or the “Bird.ai” startup introduced in Assignment 1 could raise VC funding contingent on implementing a feature where the background of a bird photo is analyzed to mine features for other ML algorithms, raising questions of data sovereignty. Students today are the future professionals that must grapple with these technologically driven issues.

I hope this assignment serves as a good foundation for following course assignments, and may provide interesting ideas for the development of coursework in future sections of the course.

Week 3

2,376 words · 12 min read

Summary

January 26 - February 1

Meetings

Accomplishments

Building an autograder script

I built a Docker test harness. It speeds up development as I don’t need to deploy to Autograder to test the autograding script. I still need to publish as a .zip because we don’t have a class container image registry.

Autograder has a custom test harness. I’m trying to hook into it to distinguish between test failures from assertions and test failures from the test harness being wrong. I want students to get a message “TestHarness failed: Contact Ed” so they don’t waste time thinking they did something wrong, and I get notified quickly I need to update the autograder script. It’s tricky with Gradescope’s setup though.

I made a choice with the autograder harness that if a non-AssertionError occurs (that is, something is wrong with how the test is coded rather than with what the student implemented) it will display a message and a stack trace to the student, prompting them to share it with a TA (me) on Ed. This is the best I can do for Cloud Assignment 1, because we don’t have any infrastructure for storing and responding to errors. I can build that later.
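The distinction between the two kinds of failure can be sketched as a wrapper around each check. This is not the actual harness code: `run_test`, the result dictionary keys, and the example checks are hypothetical and only illustrate separating an `AssertionError` from every other exception.

```python
import traceback


def run_test(name, test_fn, max_score):
    """Run one grading check and build a result entry for it.

    An AssertionError means the student's work failed the check. Any other
    exception is treated as a bug in the grading script itself.
    """
    try:
        test_fn()
    except AssertionError as err:
        return {"name": name, "score": 0, "max_score": max_score,
                "output": f"Check failed: {err}"}
    except Exception:
        return {"name": name, "score": 0, "max_score": max_score,
                "output": "TestHarness failed: Contact Ed and share this trace.\n"
                          + traceback.format_exc()}
    return {"name": name, "score": max_score, "max_score": max_score,
            "output": "Passed"}


def check_student_work():
    # Hypothetical check that fails because of the student's setup.
    assert 1 == 2, "no running instance found"


def check_with_grader_bug():
    # Hypothetical check that fails because the grader reads a missing key.
    response = {}
    return response["Reservations"]


student_result = run_test("Bootstrapping a VM", check_student_work, 10)
harness_result = run_test("Navigating AWS", check_with_grader_bug, 10)
print(student_result["output"])
print(harness_result["output"].splitlines()[0])
```

Only the second kind of failure shows the stack trace, so a student can tell at a glance whether to keep debugging their own setup or to post on Ed.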

Cloud assignment 2

Cloud assignment 2 will consist of the implementation of two features, and the expansion of system architecture to accommodate those features.

The motivation for the concepts I’m introducing in Cloud Assignment 2 is summarized by the following blurb: “In Cloud Assignment 1, bird.ai was building their MVP. Now, they’ve raised a seed round and are ready to get more customers”

The first feature is a history of submitted images and the resulting classification. This introduces a need to store user-provided photos and keep track of submissions per-user. Disk space on a server is limited (especially on the AWS free tier), and students will be walked through why we transition data off disk into object storage, in this case, S3. By carefully managing on-disk data usage, we are able to scale the history feature to many more users and image submissions without greatly expanding the storage capabilities of the servers themselves, saving money and improving service resilience.

Now that images are stored in AWS S3, students will have to track submission metadata and content in order to recover and display that information to the user through a user interface. To that end, they will have to implement a simple Django data model that will power the history feature experience, introducing them to the “Model” in “Model-View-Controller”, highlighting how system architecture decisions must be tied to application-level implementations.

It’s not enough that images are stored in S3. Content-delivery networks, or CDNs, are an important user experience and cost-saving measure. Users are sensitive to latency, and delivering static images is not something either Python or S3 is exceptional at. Any image requests from either of those resources have to travel to the origin server in a specific AWS Region, so latency grows for users farther away from that region. Additionally, pulling data out of S3 is expensive in terms of data transfer. We will use these motivations to introduce AWS CloudFront as a CDN in front of the S3 bucket, reducing latency by pushing static assets to the edge of the AWS global network. AWS CloudFront is both lower latency and lower cost, reducing pressure on origin servers, and improving key metrics of the user experience. CloudFront achieves lower latency by taking advantage of 100+ “Points of Presence”, or small regional data centers, around the globe, running software that is optimized for serving media. CloudFront is 10x lower cost than S3 due to different trade-offs made in its architecture and to encourage customers to reduce load on S3 servers.

The second feature I would like students to implement is geo-location related to image submissions (Note: this may get pushed to the third assignment depending on assignment 2’s length). Images have EXIF metadata that, among other parameters, records the longitude and latitude where a photo was taken. Students will be asked to strip the metadata from each submitted photo and store it in the database, using that data to populate a map showing where every photo was taken. This feature is meant to demonstrate to students how much information you are sharing with cloud software vendors when you use their services. There is a possibility to grab the IP of the user at time of submission and perform IP-based geo-location, but it can be difficult to find an IP-to-location database with a permissive licensing model. Students will learn how to query PostgreSQL using its geospatial capabilities to power the feature, and a mapping service will be hosted, introducing multi-node service architectures. Again, this is a stretch goal and will likely be introduced in Cloud Assignment 3.
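One small step students would hit when storing EXIF locations: EXIF records latitude and longitude as degrees, minutes, and seconds plus a hemisphere reference (N/S/E/W), while maps and geospatial queries usually expect signed decimal degrees. A minimal sketch of that conversion follows; the function name `dms_to_decimal` and the example coordinate are made up, and reading the values out of an image file is left to whichever EXIF library the assignment ends up using.

```python
def dms_to_decimal(degrees, minutes, seconds, ref):
    """Convert an EXIF-style degrees/minutes/seconds value to decimal degrees.

    `ref` is the hemisphere reference: "N" or "E" gives a positive value,
    "S" or "W" gives a negative one.
    """
    decimal = degrees + minutes / 60 + seconds / 3600
    return -decimal if ref in ("S", "W") else decimal


# Made-up coordinate for illustration: 40 deg 25' 28.2" N, 86 deg 55' 15.6" W.
latitude = dms_to_decimal(40, 25, 28.2, "N")
longitude = dms_to_decimal(86, 55, 15.6, "W")
print(round(latitude, 6), round(longitude, 6))
```

The signed decimal pair is the form that would be stored in the database and plotted on the map.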

The final application architecture will consist of an AWS RDS instance running PostgreSQL instead of SQLite. It introduces different types of relational databases, and gives an opportunity to contrast SQLite’s embedded, in-process database model with PostgreSQL’s client-server model. PostgreSQL is commonly used in industry, and has a vast open-source ecosystem, which students can take advantage of for self-learning. Students will deploy their updated application onto an EC2 instance, and configure the application to connect to the AWS-managed database, giving them valuable DevOps skills. Students will also have to write a Dockerfile for their Django application, so they codify concepts and commands learned in Cloud Assignment 1 and put them to work containerizing the application.

This is the nuts and bolts of software development, and students’ mental model of software will be challenged by pushing this assignment to production.

It’s important to touch on what was not introduced in this assignment. Ansible has been omitted, so students gain more experience manually configuring servers. Infrastructure-as-Code has not made an entrance either; students will gain more familiarity with the AWS console by setting up AWS CloudFront and S3 manually in this assignment. Orchestration has not been introduced yet either. We are slowly ramping up the complexity of our application, creating a correspondence with how software grows in the real world: organically, through communication with users, with changes being motivated by technical or product needs.

There is an opportunity to discuss virtual networking in Linux through how Docker sets up container networking, which will be mentioned in the assignment as well. Students will be encouraged to explore the network configuration on the EC2 instance, and to answer questions about what changes the Docker daemon makes when running containers. Virtual storage will also be mentioned, in the context of Docker Volumes, EBS, and S3.

Cloud Assignment 3

Cloud Assignment 3 coincides with two important chapters in the textbook: Automation and Orchestration. These practices are important when scaling a software service beyond a single server into multiple supporting resources.

This assignment will introduce an N-tier web architecture for the bird.ai SaaS. All previous features will continue to be supported, but focus will be placed on a system architecture meant for scaling. Part of the deliverables for the assignment will be producing an architecture that can handle a sustained load test from the autograder. Another focus will be placed on the instance types chosen for each component of this service. Students are presented with a wide array of choices from EC2; how will they make decisions about what to use? Connecting how software operates to the hardware that best supports that operation is an important principle to be aware of, and will help students become informed consumers of cloud services.

Let’s discuss what an N-tier architecture looks like. First, web application servers are meant to be scaled horizontally. They are typically stateless, and handle a variety of tasks. They are best run on general-purpose instances. Multiple application servers must be load balanced, so a proxy server will be introduced, requiring a memory-optimized instance to support caching and a higher number of concurrent connections. Finally, application servers will connect to a shared database, which should be run on hardware that is IO-optimized and whose architecture has been well-tuned for that type of workload. All these pieces expand a student’s perspective from an application as a single process on a single machine, to a suite of processes across multiple machines that work together to achieve more than they could separately. It also forces them to inspect and understand the software they are running, so they can choose the right platform to run it on.

Having more than one server introduces challenges that will motivate automation-based practices to initialize servers and deploy applications. This assignment will introduce Ansible, and students will have to write playbooks for common tasks. The assignment will also introduce Terraform, a common tool in industry, which will be useful for them to manage the complexity of this assignment with less risk of incurring costs due to unused resources remaining provisioned. The load test performed against their architecture comes with risks, such as crashing EC2 instances or web servers. Their automation scripts will allow them to get back to where they were before quickly, with the threat of a crash being additional motivation to have automation already defined for a software service.

I had initially thought this assignment would be a good opportunity to introduce Kubernetes. I believe that will wait until Microservices have been introduced, during the time period for Cloud Assignment 4. It will already be a lot to introduce multi-node architectures and different automation practices to students. Kubernetes is such a big topic (auto-scaling, control-planes, scheduling, software-defined networking, etc.) it merits a more focused assignment.

Calendar Update

I’ve updated the course calendar to account for Spring Break and to give a week break to students after each midterm. The projected course calendar is now on this project’s home page.

Publishing HTML to Brightspace

I’ve developed a script that converts assignment assets to a single file that is easy to publish to Brightspace. It inlines styles, fonts, and images into the HTML document itself, making it completely self-contained for rendering by a web browser.

I’ve also added Grace to my repository holding my senior project files, so she can stay on top of progress and have a point of reference if she has questions about how assignments are developed or tested.

Markdown Editor

Grace will be taking over the class after I graduate. She uses a markdown editor for writing, so it would be ideal if I could integrate a WYSIWYG editor with Astro so she could edit assignments more easily, and I could turn the publishing workflow into a GUI rather than a TUI.

Possible approaches:

CloudBank

During a meeting with Grace, Justin Gillingham popped in. He asked us how we were managing the AWS free tier, and I explained our current approach to cloud assignments and outlined how we were planning resource utilization across the course to make sure students were staying within the free tier. He specifically asked what we were doing for students who had already exhausted their free tier. I told him of the two students who had raised the issue already, and how I told them it would be best if they could get a credit card from a parent who likely hadn’t used the free tier before, and that this first assignment in particular could be completed quickly with only a few cents in charges if they were ok with that.

He then mentioned there is a NSF program called “CloudBank” which provides credits for different cloud platforms to researchers. He suggested we write a proposal and try to get credits through the program, which would grant us an AWS account pre-filled with credits we would then use for instruction. We would have to figure out a way to manage cloud resources so students don’t consume too much.

Justin also let me know there is a graduate section of the course that Douglas Comer is teaching right now, he offered to put me in touch with the GTA. I’m curious what assignments they have planned for the semester.

Gradescope’s Autograder Runtime

Gradescope’s Autograder feature relies on executing instructor-created grader scripts in a Docker container to produce the resulting grade for a student’s cloud assignment submission.

The typical use case is to run and verify a student’s code submission, perhaps by running unit tests in a sandboxed environment. Our usage is different: there’s nothing to sandbox, because we reach out of the environment and into the student’s cloud infrastructure. A student submission is not code at all; it’s a permission slip, sharing AWS credentials that grant read-only access to their account.

The container that runs on every student submission may be defined in two ways: as a zip file, whose upload triggers a rebuild of the autograder container for that assignment, or as a URL pointing to an image in a container repository. That image in the container repository is pulled every time a student submits their assignment, and I’ll be exploring it in the next few paragraphs.

Any autograder container has to derive from Gradescope’s base image gradescope/autograder-base. Currently, the base image is built from Ubuntu 22.04 for x86 and has not been updated in 2 years. Its image layers contain test harness logic and an SSH configuration for monitoring its execution. A metadata file and the student submission are mounted into the container at runtime, and the container fetches the latest test harness code from Gradescope’s S3 bucket in us-west-2 before executing the “run_autograder” script provided by an instructor. The default Python installation is version 3.10.12 (released Jun 2023). A final quirk: it uses dumb-init as PID 1 in the container instead of the more typical tini.

Our Elastic Container Registry must be accessible by Gradescope’s autograder. That’s only possible as a public repository, which students could theoretically find. The only other options are to host the images using GitHub or DockerHub. I have reached out to Gradescope support about this issue, who responded and informed me they have notified the technical team. I am awaiting a response.

After some more research, there may be a way to circumvent this issue. Gradescope’s own infrastructure is hosted on AWS, and I’m able to glean their AWS Account ID by the default configuration of autograder containers. I may be able to grant access to the private ECR repository with only that ID. I will experiment with that this week.

Week 4

3,486 words · 18 min read

Summary

February 2 - February 8

Meetings

Accomplishments

Calendar Update

Midterm 1 will be delayed by a week.

Autograder ECR access

I’ve been creating a new Cloud Assignment, which means I need to write a new autograder script to package in a Gradescope container.

Gradescope has two methods of uploading an autograder container. The first is to package the grader script in a zip file and upload it through the user interface. This is manual, repetitive, and slow. The second method is to register a container repository that follows the Open Container Initiative (OCI) distribution specification and publish an assignment image to it. I’ve decided to take this approach, since it’s easier to manage and quicker to update.

Elastic Container Registry (ECR) is Amazon’s service to manage container images. I’ve created a private repository to store all assignment autograder images. When I first registered this repository with Gradescope, autograder runs were not able to pull container images from the private repo because they didn’t have access permissions.

I then noticed that the default image repository in the Gradescope user interface was filled with a URL.

405699249069.dkr.ecr.us-west-2.amazonaws.com/production-autograders-0042:us-prod-docker_image-570142

That’s the image Gradescope built for us when we uploaded a zip file to the user interface. Sitting in their company infrastructure. Which means I now know their Account ID. That I’ll use to grant them and only them access to the private ECR repository.

And it worked. This policy let them pull the autograder image.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowCrossAccountPull",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::405699249069:root"
      },
      "Action": [
        "ecr:GetDownloadUrlForLayer",
        "ecr:BatchGetImage",
        "ecr:BatchCheckLayerAvailability"
      ]
    }
  ]
}
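As a sketch of the reasoning above: an ECR image reference embeds the owning account ID in its registry hostname, so the ID can be read off the default image URL and dropped into the pull policy. The helper names `parse_ecr_image` and `cross_account_pull_policy` are hypothetical and not part of any AWS SDK.

```python
import json
import re

# ECR registry hosts look like <account-id>.dkr.ecr.<region>.amazonaws.com
ECR_HOST = re.compile(
    r"^(?P<account>\d{12})\.dkr\.ecr\.(?P<region>[a-z0-9-]+)\.amazonaws\.com$"
)


def parse_ecr_image(image_url: str) -> dict:
    """Split an ECR image reference into account ID, region, repository, and tag."""
    host, _, path = image_url.partition("/")
    match = ECR_HOST.match(host)
    if match is None:
        raise ValueError(f"not an ECR image reference: {image_url!r}")
    repository, _, tag = path.partition(":")
    return {
        "account_id": match["account"],
        "region": match["region"],
        "repository": repository,
        "tag": tag,
    }


def cross_account_pull_policy(account_id: str) -> dict:
    """Build a repository policy that lets one AWS account pull images."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowCrossAccountPull",
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{account_id}:root"},
                "Action": [
                    "ecr:GetDownloadUrlForLayer",
                    "ecr:BatchGetImage",
                    "ecr:BatchCheckLayerAvailability",
                ],
            }
        ],
    }


if __name__ == "__main__":
    default_image = (
        "405699249069.dkr.ecr.us-west-2.amazonaws.com/"
        "production-autograders-0042:us-prod-docker_image-570142"
    )
    info = parse_ecr_image(default_image)
    print(json.dumps(cross_account_pull_policy(info["account_id"]), indent=2))
```

With boto3, a document like this could be attached through the ECR client’s `set_repository_policy` call, passing the JSON-serialized policy as `policyText`.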

Instructional Site

I’ve created an instructional website to help run the course. It performs various tasks, the most important of which is to track cloud assignment grading requests from students.

https://cs351.couetil.com
Instructional Site Home page

There is a monitoring script in the autograder containers. On each run, it will collect submission metadata and the result of a grading attempt. We can use the assignment data to uniquely identify a student using their email address.

https://cs351.couetil.com/submissions/
Instructional Site Submissions page

The submission metadata also provides their Purdue ID.

https://cs351.couetil.com/students/1/
Instructional Site Student Detail Page

The monitoring script collects their credentials file. The site will associate the credentials file with a user and keep track of any changes between assignments. Storing the user’s credentials gives us two more capabilities.

First, we can create a snapshot of their AWS account and write it to a JSON file during the test run. We use it in the test script, and send it back to the instructional site in order to record exactly the data that gave a student a particular score on Gradescope. If a student ever has a question about why they got a certain score on a Cloud Assignment, we’ll be able to tell them exactly.

https://cs351.couetil.com/submissions/8/
Instructional Site Submission page

Let’s look closer at that snapshot. Remember, I said there was a second capability we’ve gained by storing a student’s credentials for each submission. Why don’t we go to the “Take Snapshot” feature and select a student.

https://cs351.couetil.com/snapshots/take/?student=1
Instructional Site Take Snapshot page

Click!

https://cs351.couetil.com/snapshots/6/
Instructional Site Snapshot detail page

Now we can look at a student’s AWS Account configuration whenever we want. If they raise an issue on Ed, we can quickly help them debug by taking a snapshot of their AWS Account and using the information to reduce how much back and forth we do through Ed messages.

All these features are gated behind a login page. There is a public create account page, but any new user has to be approved by an existing user before they can log in.

https://cs351.couetil.com/accounts/login/?next=/
Instructional Site Login Failed page

For an autograder container to send submission files to our instructional site, the site must expose a public API endpoint. I’ve implemented basic assignment management capabilities: an API token is generated per assignment, granting access to the API and linking a student submission to the particular assignment.

https://cs351.couetil.com/assignments/1/
Instructional Site Assignment page

To log onto the instructional website, create an account and send me a message, I’ll approve you ASAP.

Instructional Site Infrastructure

The Instructional Site lives in my AWS account, and I’m managing it the same way I’m teaching students to manage cloud projects in these assignments.

All resource use is defined using Terraform. Application deploys and EC2 instance initialization are performed with Ansible.

The site’s resources are in us-east-1. Keeping this site in the same AWS region we have students perform their assignments in reduces data transfer fees. I’ve enabled CloudWatch alarms for projected monthly billing that exceeds $20 and $50. All resources created for instructional infrastructure carry a “Project” tag I can use for billing attribution.

The projected monthly cost for this application architecture is ~$30/month.

The infrastructure is meant to be managed through several commands packaged as scripts:

The database server is a db.t3.micro that allows TCP connections to port 5432 from the EC2 instance hosting the Django server. Backups are only kept for 1 day to reduce cost. A private subnet and an availability zone are shared between application and database to reduce data transfer charges and disallow ingress traffic from the public internet.

The database is minimally configured, so let’s discuss some future changes.

  1. Add deletion protection on the database. At the moment, a terraform destroy will cause data loss.
  2. Extend the backup time window, which will increase monthly costs.
  3. Increase database size, it is currently at 1GB of RAM and 20GB of disk storage.

I’ve implemented CloudWatch alarms that will alert me at connor@couetil.com if any of the following conditions are triggered:

  1. CPU > 80% (sustained over 10 min)
  2. Freeable memory < 100 MB (sustained over 10 min)
  3. Free storage < 2GB (single 5-min check)
  4. Connections > 50

The application server is a t4g.small with 2 vCPU, 2 GiB RAM, and Amazon Linux 2023 as OS. It’s configured with a systemd unit definition that starts the application container. The Django application is configured to use an AWS RDS connection with values injected into the systemd definition from Terraform state at deploy time. The application is deployed to the EC2 instance using an Ansible playbook.

I’ve added a CloudWatch EC2 status check for failure and the server has been hardened with fail2ban to restrict SSH access attempts and with dnf-automatic to automate software package updates.

The Django application will have a default super user “admin” configured. I’ve stored the password in my password manager.

SSL termination is performed by an AWS CloudFront distribution using the EC2 instance as an origin server. DNS is managed by Cloudflare, and the website URL is “cs351.couetil.com” (I’ve moved my senior project website to “senior-project.couetil.com”). Non-SSH access to the application server is restricted to the AWS CloudFront IP range.

Access to the Django admin page is a concern, since I’m storing student AWS credentials. There are two means of interfacing with the Django application: through a standard user interface, which can be accessed by normal users, and through the Django admin page, only accessible by super users. The Django admin page provides a granular view of database tables, whereas the standard user interface hides privileged information and performs all sensitive operations server-side, so no student credentials are sent out of AWS. I’ve made the admin page inaccessible publicly, so it can only be accessed using SSH forwarding.

Account Snapshots

I can now generate snapshots of an AWS account, both in the autograder container and on the instructional website. This is performed using a Python library I’ve created, aws_snapshot.

I’ve taken the AWS SDK calls being made ad-hoc in the testing script and created a typed specification that produces a JSON structure that can be stored, inspected, and asserted against. You can create a snapshot of all the resources you care about from any authorized AWS session.

When new assignments are created, the snapshot structure can be incrementally added to in a safe way. The library has 100% test coverage (in fact, all the code for my senior project has 100% test coverage, even the instructional site.)

There’s a config.ini file injected into each autograder image that powers the snapshot feature. It specifies the API URL to POST the submission data to, the API token for the assignment, and the AWS credentials the container uses to authorize a snapshot of the student’s AWS account.

config.ini
[api]
url = <SUBMISSION_API_ENDPOINT>
token = <API_TOKEN>

[aws]
aws_access_key_id = <FROM_INSTRUCTIONAL_AUTOGRADER_USER>
aws_secret_access_key = <FROM_INSTRUCTIONAL_AUTOGRADER_USER>
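A minimal sketch of how a container might load this file, using only the standard library. The `AutograderConfig` dataclass and `load_config` function are hypothetical names for illustration; the section and key names come from the config.ini shown above.

```python
import configparser
from dataclasses import dataclass


@dataclass(frozen=True)
class AutograderConfig:
    api_url: str
    api_token: str
    aws_access_key_id: str
    aws_secret_access_key: str


def load_config(text: str) -> AutograderConfig:
    """Parse the contents of config.ini into a typed structure."""
    # Interpolation is disabled so a literal "%" in a URL or secret is kept as-is.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return AutograderConfig(
        api_url=parser["api"]["url"],
        api_token=parser["api"]["token"],
        aws_access_key_id=parser["aws"]["aws_access_key_id"],
        aws_secret_access_key=parser["aws"]["aws_secret_access_key"],
    )
```

A missing section or key raises `KeyError`, which seems preferable to grading with a half-configured container.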

Student credential concerns

I am storing student AWS credentials in plain text in a database that is encrypted on disk and in a private network. The credentials grant read-only access to a student AWS account. I am still concerned about storing long-lived privileged secrets. Gradescope already stores them and shows them to any instructor or teaching assistant assigned to a class. It would be better if we had a mechanism for a student to grant us an IAM role temporarily for a single autograder run.

Let’s discuss a way to grant that temporary credential.

We’ll instruct students to set up cross-account role assumption. Students will create an IAM “Role” instead of a “User”. The role will be called “CS351-autograder”, and its configuration will follow this template:

{
  "Role": "CS351-autograder",
  "TrustPolicy": {
    "Version": "2012-10-17",
    "Statement": [
      {
        "Effect": "Allow",
        "Principal": {
          "AWS": "arn:aws:iam::<INSTRUCTIONAL_ACCOUNT_ID>:user/autograder"
        },
        "Action": "sts:AssumeRole"
      }
    ]
  },
  "ManagedPolicyArn": "arn:aws:iam::aws:policy/ReadOnlyAccess"
}

The policy grants permission to an IAM user in the instructional AWS account to “assume” the role a student created in their account. Role assumption comes with temporary credentials granting all privileges associated with that role. Students will assign the ReadOnlyAccess policy to the role and will no longer have to generate an access key. Roles are different from users. A role has no password and no long-lived access key of its own; it can only be assumed by a principal named in its trust policy, and each assumption yields short-lived credentials.
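Later in this report, students submit an external ID alongside their account ID. One way the trust policy could enforce it is with the `sts:ExternalId` condition key, sketched below. The function name `grading_trust_policy` is hypothetical, and adding the condition is an assumption about how the final assignment might be set up rather than something the template above already does.

```python
import json


def grading_trust_policy(instructional_account_id: str, external_id: str) -> dict:
    """Trust policy for the student's CS351-autograder role.

    Only the instructional "autograder" user may assume the role, and only
    when it presents the external ID the student chose.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "AWS": f"arn:aws:iam::{instructional_account_id}:user/autograder"
                },
                "Action": "sts:AssumeRole",
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder account ID and external ID, for illustration only.
    print(json.dumps(grading_trust_policy("111122223333", "example-external-id"), indent=2))
```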

We’ll discuss the other half of this authorization flow, but first I want to describe how it has worked for previous assignments. Our grading scripts run on Gradescope’s infrastructure in a container we provide them. Gradescope likely runs the autograder container as an ECS task in their AWS account. That means the default AWS permissions in our executing container are defined by Gradescope’s technical team. This has been experimentally validated in the following manner: I published an assignment’s autograder container to a private Elastic Container Registry (ECR) repository in the instructional AWS account, and updated the assignment settings to pull an image from that repository on each submission. At first, autograding attempts failed because the repository was private. When I granted Gradescope’s AWS Account ID access to the instructional ECR repository, their autograding infrastructure was able to pull repository images.

If the default AWS permissions for an autograder run are controlled by Gradescope, how can our autograder script get access to a student’s AWS account for grading? That’s where the credentials file comes in, and why we require students to create an IAM user with a long-lived credential that they submit on Gradescope’s website. In this manner, we use a student’s own privileged credentials at the start of each container run to establish a new AWS SDK session that has the privileges needed to grade a cloud assignment.

With that background, the changes needed to move liability from students to the instructional staff, while improving security for both, become clearer. Revisiting role assumption, the instructional AWS account will hold an IAM user “autograder”. The user will have the privilege of being able to assume a role “CS351-autograder” in any AWS account. When a student submits their AWS Account ID as a credential to a Gradescope assignment, the autograder container will read their account ID and external ID (more about that in AWS documentation and in the assignment), then assume that role. Now, short-lived credentials will be passed from the student account to the grading script and grading will continue as normal. The only long-lived AWS credential during an autograding run will be an access key from the instructional account packaged with the container that will set up the initial AWS SDK session. This means students are protected from accidental credential exposure (which has already happened), and instructors can carefully manage permissions between Gradescope, grading scripts, and student accounts.

The credentials packaged in the grading container for user “autograder” from the instructional account have minimal privileges. All they can do is assume a role in a student account, and get the user ID and account ID for the current AWS session. No other access to the instructional account is enabled.
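The role assumption step could look roughly like the sketch below. It takes the STS client as a parameter so it runs without AWS access; in the container one would presumably pass `boto3.client("sts")` built from the packaged “autograder” access key. `assume_grading_role`, `grading_role_arn`, and the `submission_id` parameter are hypothetical names; the `assume_role` keyword arguments and the `Credentials` response fields follow the STS AssumeRole API.

```python
from typing import Any, Protocol


class StsClient(Protocol):
    """The one STS call this sketch needs (boto3's STS client satisfies it)."""

    def assume_role(self, **kwargs: Any) -> dict: ...


ROLE_NAME = "CS351-autograder"


def grading_role_arn(account_id: str) -> str:
    """ARN of the grading role inside a student's account."""
    return f"arn:aws:iam::{account_id}:role/{ROLE_NAME}"


def assume_grading_role(
    sts: StsClient, account_id: str, external_id: str, submission_id: str
) -> dict:
    """Exchange the instructional identity for short-lived student credentials."""
    response = sts.assume_role(
        RoleArn=grading_role_arn(account_id),
        RoleSessionName=f"autograde-{submission_id}",
        ExternalId=external_id,
        DurationSeconds=900,  # the shortest session STS allows
    )
    creds = response["Credentials"]
    # Keys are named so the dict can be splatted into boto3.Session(**creds).
    return {
        "aws_access_key_id": creds["AccessKeyId"],
        "aws_secret_access_key": creds["SecretAccessKey"],
        "aws_session_token": creds["SessionToken"],
    }
```

Requesting the minimum session duration keeps any leaked credential useful for at most fifteen minutes, which fits a grading run that only takes a snapshot.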

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": "sts:AssumeRole",
            "Effect": "Allow",
            "Resource": "arn:aws:iam::*:role/CS351-autograder",
            "Sid": "AllowAssumeGradingRole"
        },
        {
            "Action": "sts:GetCallerIdentity",
            "Effect": "Allow",
            "Resource": "*",
            "Sid": "AllowGetCallerIdentity"
        }
    ]
}

And with that the instructional staff becomes an intermediary managing access between Gradescope and the student.

Students now only have to submit their Account ID and their secret External ID to Gradescope to trigger an autograde request. They will continue to upload a credentials file, which will be an INI file with the following fields:

[default]
aws_account_id = ...
external_id = ...
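A sketch of how the autograder might read and validate this file before attempting role assumption, so that a malformed value produces a clear message instead of an opaque AWS error. `parse_student_credentials` is a hypothetical name. The external ID pattern is an assumption modeled on the length and character set AWS documents for external IDs; the real assignment may use stricter rules.

```python
import configparser
import re
from typing import Tuple

# AWS account IDs are exactly 12 digits.
ACCOUNT_ID = re.compile(r"\d{12}")
# Assumption: 2 to 1224 characters drawn from letters, digits, "_" and "+=,.@:/-".
EXTERNAL_ID = re.compile(r"[\w+=,.@:/-]{2,1224}", re.ASCII)


def parse_student_credentials(text: str) -> Tuple[str, str]:
    """Return (account_id, external_id) or raise ValueError with a readable message."""
    parser = configparser.ConfigParser(interpolation=None)
    try:
        parser.read_string(text)
        section = parser["default"]
        account_id = section["aws_account_id"]
        external_id = section["external_id"]
    except (configparser.Error, KeyError) as exc:
        raise ValueError(
            "credentials file needs a [default] section with "
            "aws_account_id and external_id"
        ) from exc
    if not ACCOUNT_ID.fullmatch(account_id):
        raise ValueError("aws_account_id must be exactly 12 digits")
    if not EXTERNAL_ID.fullmatch(external_id):
        raise ValueError("external_id contains unsupported characters or has a bad length")
    return account_id, external_id
```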

We have now made some progress towards securing the execution environment. Let’s discuss possible work for the future by reflecting on the question, “what essential roles does Gradescope perform?” First and foremost, Gradescope integrates with Brightspace to pass along grades. Second, it authenticates a student and provides course management options to the instructor. Third, it is an innovative user interface for students and staff to interact with. A fourth feature used by this class appears to be essential, but is, in fact, incidental. Autograding does not need to be triggered by Gradescope. They are the inspiration, but the autograding action performed is simple and wholly defined by the instructional staff in assignment Docker containers.

If we want to take more control over the cloud assignments, we can take one step, then another.

The first step is to have our autograder scripts execute in the instructional AWS account rather than Gradescope’s. The flow is as follows. An assignment Docker container (for example, “Cloud Assignment 2”) is built with a configuration file containing an API token for the instructional site’s REST API, with permissions scoped to a particular assignment. Gradescope will execute this container when a student submits their account ID, and the container will collect the credentials file and POST it to the instructional site along with the assignment ID and API token. The instructional site will read the student’s AWS account ID from the request, find the grading script for the corresponding assignment, and start an ECS task on Fargate that will perform the grading. It responds to the initial container request with the URL of an endpoint the Gradescope container should poll until the grading task is complete.

When the grading task has completed, the poll request will return an updated status and a URL to fetch the results.json file for the student submission. Remember, the results.json file is the API Gradescope has defined to control a student’s score on a particular assignment. The autograder container will download the results.json file to the appropriate location on its filesystem and exit. The student will then see the result of their submission in Gradescope’s user interface.

With this setup, all autograder containers submitted to Gradescope are the same except for a configuration file that provides an assignment ID and API token for the instructional website. No AWS credentials, either from the instructional staff or students, will ever be shared with Gradescope. Publishing assignment containers becomes simpler and more secure, and instructors will have total control over the methodology used to grade cloud assignments, opening up avenues for automation and innovation.
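The polling half of this flow can be sketched as follows. The HTTP calls are passed in as plain callables so the loop can be exercised without a network; the `status` and `results_url` field names, and the function name `poll_for_results`, are hypothetical since the instructional site’s API is not defined yet. In Gradescope’s autograder layout the results file lives at /autograder/results/results.json.

```python
import json
import time
from pathlib import Path
from typing import Callable


def poll_for_results(
    fetch_status: Callable[[], dict],
    fetch_results: Callable[[str], dict],
    results_path: Path,
    interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Poll the grading task until it completes, then write results.json."""
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if status.get("status") == "complete":
            # The final poll response carries the URL of the finished results.
            results = fetch_results(status["results_url"])
            results_path.parent.mkdir(parents=True, exist_ok=True)
            results_path.write_text(json.dumps(results), encoding="utf-8")
            return results
        if clock() >= deadline:
            raise TimeoutError("grading task did not finish in time")
        sleep(interval)
```

A timeout matters here because Gradescope enforces its own limit on autograder runs, so the container should fail with a readable error before that limit is reached.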

The second step, which is much too long of a step to consider at the moment, is exploring what this class looks like without using Gradescope’s autograding feature, and experimenting with different ways to provide Cloud Assignment material to students.

Billing Event Streams

It would be nice to get a stream of billing events from students. That would indicate what resources they’re experimenting with, as well as give us as instructors an idea of whether they are within or exceeding any free tier limits.

That might be possible by setting up an Amazon EventBridge event bus across instructional and student accounts to collect billing events. EventBridge charges per 64KB chunk sent, and $1 buys a million chunks. Unfortunately, cross-account delivery is charged twice, once for each account. This approach would necessarily incur a charge for the student, even if small.

It’s impractical to stream billing events from AWS: it’s expensive, and the real accounting happens only every 24 hours anyway. Instead, the CloudWatch “EstimatedCharges” metric will be recorded during each snapshot.
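A sketch of how a snapshot might read that metric. The CloudWatch client is passed in so the function can be tested with a stub; with boto3 it would presumably be a `cloudwatch` client created in us-east-1, where AWS publishes billing metrics. It also assumes the student has enabled billing alerts in their account, since the metric is not published otherwise. `latest_estimated_charges` is a hypothetical name.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Optional


def latest_estimated_charges(
    cloudwatch: Any, now: Optional[datetime] = None
) -> Optional[float]:
    """Return the newest EstimatedCharges value in USD, or None if unavailable."""
    now = now or datetime.now(timezone.utc)
    response = cloudwatch.get_metric_statistics(
        Namespace="AWS/Billing",
        MetricName="EstimatedCharges",
        Dimensions=[{"Name": "Currency", "Value": "USD"}],
        StartTime=now - timedelta(days=1),
        EndTime=now,
        Period=21600,  # six-hour buckets; billing data only updates a few times a day
        Statistics=["Maximum"],
    )
    datapoints = response.get("Datapoints", [])
    if not datapoints:
        return None
    # Datapoints are not guaranteed to arrive in order, so pick the newest.
    newest = max(datapoints, key=lambda point: point["Timestamp"])
    return newest["Maximum"]
```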

Submission Data Visualization

I will have to consider what pieces of data I’ll want to visualize per-assignment.

This is all I can think of at the moment.

Cloud Assignment 1 reflection

Next week I will reflect on Cloud Assignment 1’s Ed issues. I don’t have time to go through them at the moment.

Managing Autograder SSH credentials

In order to grade Cloud Assignments, we have to access student instances over SSH. Currently, we ask students to SSH into a fresh instance and manually include autograder’s public key in the authorized_keys file. We publish the public key in the assignment.

We could create an Amazon Machine Image (AMI) with the authorized_keys file containing the autograder public key already present. Creating AMIs for students to use in assignments provides opportunities for automation and monitoring during assignments.

Accessibility Concerns

Cloud Assignments already have alt-text for all images and links. But it is manually created, so they are minimally accessible. As part of the build step for the Astro project, I can send each image and link to an LLM to summarize as alt-text, then include that alt-text in the final build result. I can also cache the result for every image and link based on a content hash, so repeated calls to an LLM for the same content will not be made. This is a low priority at the moment, and will likely be worked on when I develop the markdown editor integration ahead of handing off this course.
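The content-hash cache could be as small as the sketch below. It is written in Python rather than as part of the Astro build, and the LLM call is represented by a `summarize` callable because no provider or API has been chosen yet. `cached_alt_text` and the one-file-per-hash cache layout are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Callable


def cached_alt_text(
    content: bytes, cache_dir: Path, summarize: Callable[[bytes], str]
) -> str:
    """Return alt-text for content, calling summarize only on a cache miss."""
    # Identical bytes always hash to the same key, so unchanged images and
    # links never trigger a second LLM call.
    digest = hashlib.sha256(content).hexdigest()
    entry = cache_dir / f"{digest}.txt"
    if entry.exists():
        return entry.read_text(encoding="utf-8")
    alt_text = summarize(content)
    cache_dir.mkdir(parents=True, exist_ok=True)
    entry.write_text(alt_text, encoding="utf-8")
    return alt_text
```

Keying on the content rather than the filename means a renamed image still hits the cache, while an edited image is summarized again.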

As a start, I could perform this step when I generate a single file of an assignment.

Accessibility standards provided by Grace.

I need a Microsoft Copilot API key from Grace.

There is GenAI studio offered by Purdue. Grace shared the link, it didn’t work for me at first. https://www.rcac.purdue.edu/knowledge/genaistudio

Student concerns about cost

There have been a lot of students outside of the free tier. Several students have already incurred (minimal) usage costs. One student, with minimal extra usage, suggested he might have blown $3 on his mistake.

It would be good to be upfront with students about how much an assignment will cost in cloud credits. First of all, it teaches them to always attach a cost number to their cloud use because it is not free (for serious use). Second, it’s an opportunity for us to present the business side of cloud computing in a way that directly affects their wallet. Being able to price cloud usage, and knowing all the different components that go into a bill, is actually really useful. So each assignment can have a “Cost” section where we do a projected calculation of the price of an assignment, based on some parameters. This would be fun, and a good educational addition to this course.

So an assignment will have a projected cost, and comparing the cost of different assignments will give students a notion of the “scale” of an assignment. It will also keep us, the instructional staff, honest about how much of a student’s free tier we’re using, and force us to make that calculation for each assignment so we can present it to the student. Keeps us honest, keeps them informed.
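A “Cost” section could be generated from a short list of line items, as in this sketch. Every quantity and rate in the example is a placeholder chosen for easy arithmetic, not a real AWS price; real numbers would have to come from the current pricing pages. `LineItem` and `projected_cost` are hypothetical names.

```python
from decimal import Decimal
from typing import Iterable, NamedTuple


class LineItem(NamedTuple):
    description: str
    quantity: Decimal    # how many units the assignment is expected to use
    unit: str            # for example "instance-hours" or "GB-months"
    unit_price: Decimal  # USD per unit


def projected_cost(items: Iterable[LineItem]) -> Decimal:
    """Sum quantity times unit price across all line items."""
    # Decimal avoids the rounding surprises of binary floats in money math.
    return sum((item.quantity * item.unit_price for item in items), Decimal("0"))


if __name__ == "__main__":
    # PLACEHOLDER rates for illustration only.
    example = [
        LineItem("Virtual machine", Decimal("10"), "instance-hours", Decimal("0.01")),
        LineItem("Block storage", Decimal("5"), "GB-months", Decimal("0.10")),
    ]
    for item in example:
        print(f"{item.description}: {item.quantity} {item.unit} x ${item.unit_price}")
    print(f"Projected total: ${projected_cost(example)}")
```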

Cloud Assignment Structure

Cloud assignments are starting to develop a common structure

Week 5

4,827 words · 25 min read

Summary

February 9 - February 15

Meetings

Accomplishments

Cloud Assignment 3

Motivation: The detection history feature went viral. Funny bird photos captured with your application are all over BipBop and CrackerGraham. It proved to investors you’ve gained traction and found PMF. You just raised a Series-A round of $30 million. Time to scale.

Ethical concerns: Investors are demanding that they find new sources of revenue. One is also on the board of a large financial company interested in purchasing your user data to feed into credit models they use to set interest rates for clients who love squirrels.

Part 1 of Cloud Assignment 3 will focus on implementing the N-tier system architecture for Software-as-a-Service applications. Part 2 will focus on implementing an application feature that introduces the idea of distributed systems. This system/application split will continue through assignments. This format inspires many new assignments. If we take up our previous conversation of different “tracks” in the Cloud Computing course, for one or two assignments, students could select from a set of Cloud Assignments, each with a different focus: “Network Engineer”, “Data Engineer”, “AI/ML”, “Full-stack Developer”, “Cybersecurity”, “High-performance Computing”, “Computational Biology”, “Simulation and Modeling”, “Site Reliability Engineering”. Each assignment introduces a system architecture in Part 1, and then solves a relevant problem using it in Part 2.

Part 1: N-Tier Architecture

Part 2: Squirrel-due University

I will have to host an assignment server with two REST endpoints, acting as a centralized coordination server to power the gossip protocol implementation.

George, I’d like to have a guest blog or video on the topic of software ethics in the context of this assignment, can you recommend anyone?

Grace, I’d like you to write a “The More You Know” for Cloud Assignment 3. We have students updating the application code of the YOLO convolutional neural network model to detect “squirrel” as well as “bird”. You’ve worked with CNNs before. Would you like to write a description of the specific YOLO model we are using, and provide specifics about how our natural language classification is transformed into an image classification?

The assignment is in progress.

Cloud Assignment 4

Motivation: Series B-*

Orchestration + Kubernetes

There could be an interesting ML element to add to this one. Maybe they distill an LLM in RCAC, then take the model weights and deploy them to a Kubernetes cluster using an ML runtime like Ollama.

The students will handle DNS delegation using CoreDNS, the DNS server that Kubernetes itself uses. I will delegate a subdomain matching their student account name to them, and they’ll configure CoreDNS for DNS delegation. I will also have to figure out some way to handle the SSL certificates for the cluster.

It would be nice if I could collaborate with other students, professors, or computing professionals at Purdue for this assignment, I don’t have practical experience with Kubernetes and I’m sure someone on Purdue’s campus does. Does RCAC run any Kubernetes clusters? They have an Anvil slide deck that mentions Kubernetes clusters there: https://www.rcac.purdue.edu/training/anvilcomposable. It’s basically a training document, so I’ll use it as reference material for the assignment.

I’d also like to meet someone who has trained LLMs before. I’d like their opinion and perspective on training and deploying an LLM model as part of an assignment.

Cloud Assignment 5

Motivation: IPO or Acquisition (Exit)

They will build their own FaaS. They will write their own Terraform provider (using Go!) that will deploy an application onto that FaaS. That application will be vibe coded by the student. The autograder will test that their FaaS works and that their terraform provider for it works, but the student will have to bring their vibe coded application deployed on their own infrastructure into class to show it off to the instructional staff and their peers to get the rest of their points.

Crazy Idea

My Cloud Assignments could be “written” by AI, like Claude Code. It’s already a conversational stream of markdown text, peppered with <components> and begging to be interactive. A cloud assignment could be truly very interactive.

AI-controlled UI components: I have “The More You Know”, “Hint”, and “Vibe Check” interspersed throughout the assignment content. An “AI” bot is already involved in the course. It even assigns points! Talk about feedback! This goes beyond mastery learning: students could ask questions straight to the assignment, and the assignment would answer back.

It would be like “A Young Lady’s Illustrated Primer” but for computer science education. An assignment would be a document published by the instructional staff, introducing and motivating the assignment. The student would then talk to the assignment to figure out what to do, driving the experience. The AI would guide them through the assignment, and add or subtract points along the way as they complete various tasks, until DING!DING!DING!DING! 100 points!

We would record every conversation a student has with the assignment, and pull a bunch of data about how they are learning and how we could improve. For example, with conversation content and telemetry, we would know precisely how much time a student spends on an assignment. I believe this is the holy grail metric of teaching. Studying the amount of time students spend on assignments gives incredible feedback as to the courses students are engaged with. Of course, different students will be engaged with different courses, interests vary. But I believe examining the variability of time spent on assignments by students would provide some very interesting patterns of behavior that would help improve assignment development and scheduling.

If the assignments continue in my style, they would be cute and fun. I could keep writing fun assignments like this, and they would become part of training data, so that the interactivity becomes better and more creative over time. If this is expanded to other classes, the components would be themed and appear differently according to the tastes of the instructor or the intended audience.

For example, my “<Info>” component appears as “The More You Know” callout in cloud assignments. It could appear as “Dive Deeper” in another course and be styled differently, even though the underlying component and intent is the same. This idea unifies the intent of different sections in an assignment across courses, while preserving a distinct identity for each course. A course could choose to disable certain types of components for their assignments if they don’t think it fits the content. For example, a competitive programming course would enable a leet-code style challenge component, while an intro programming class would disable it, but both would have “<Info>” enabled.

We could anonymize and examine individual learning outcomes by looking at the chat conversations. Both writing and voice would be possible. The student would speak, and have to read the response (no voice back, we don’t want to have to choose an accent for an AI voice, plus students should be encouraged to read).

I’ve already noticed during CS351 lectures, a majority of students have the course slides out and open on their computer. Most are ready to engage with course content that has been distributed to them over the internet. Also, students ask for practice exams, expressing a desire to practice curated content outside of class to prepare for an exam. A minority of students play games during parts of the lecture. If we could create interactive, curated, gamified assignments, students could practice on their own time in a fun and interesting way, and instructors would gain useful engagement data that would help improve their course (a flywheel).

Now, on the instructor’s side, it’s actually the same but flipped. The students will have an AI bot that guides them through completing an assignment. The instructors will have an AI bot that helps them write an assignment. This is an important part of the process. The student experience will only be as good as the assignments the instructors can write. Exceptional source material will deliver an exceptional educational experience. By giving an instructor tools that help them write the best assignment they’ve ever written, we’re giving a professor an opportunity to amaze students, motivating them to take initiative and collaborate. The bot could also help the instructors manage the course. I still remember taking CS408 Software Testing with Pedro Fonseca. The assignments were exceptional in that course. I spent a lot of time on them and learned so much.

The instructor would create a new assignment and write out the learning outcomes. They would define a series of tasks they want the student to complete, and each task would have an associated test that could verify it was completed correctly. They would then write short essays on topics and ideas in the assignment, and add links to resources. This would be organized as a knowledge graph rather than a written document. The AI would then arrange and orchestrate the assignment, guiding students through the tasks according to the learning outcomes, and including content from the assignment knowledge graph along the way.

We could share the software with other universities and collect collaborators. We would share assignment content, increasing the base of high-quality training data for the AI. We would also share anonymized and processed conversation data, for analysis and training.

The AI Assignment would also hyperlink to high-quality course materials, whether that is news links, class slides (which are often open by half the class while in lecture), videos, or textbook chapters. When the AI creates a hyperlink we would archive the target’s content at the time so whatever was linked to and viewed in an assignment could be faithfully and accurately reproduced.

Research Direction:

Preliminary data on student submissions

We had two students submit the day part 1 of the assignment was released.

The first student to finish part 1 of the assignment was ██████████████████, just a few hours after the assignment was released. He asked a question on Ed at 9:40PM, and his first submission was at 10:02PM. His last submission was at 10:23PM, so he was likely working on this assignment for just under an hour. He made 7 total submissions over 20 minutes.

The first student to submit was █████████████████, 4 hours after the assignment was announced. He posted on Ed when he couldn’t find the assignment on Gradescope. He made the first submission 1 minute after I activated it.

A metric to track is estimated time spent on an assignment. We would define a “session” as a series of attempts where each subsequent submission is within an hour of the previous one. We could calculate the time per session by measuring the time between its first and last submission, then calculate the total time spent on an assignment by adding up all the session times. We could also average the number of sessions spent on the assignment. This could be visualized per-student, and as a class average and standard deviation.
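The session definition above can be sketched in a few lines. This is a minimal illustration, not the instructional site's implementation: the function names and the sample timestamps are made up, and only the one-hour gap comes from the text.

```python
from datetime import datetime, timedelta

SESSION_GAP = timedelta(hours=1)  # gap threshold from the session definition


def split_into_sessions(timestamps, gap=SESSION_GAP):
    """Group submission times into sessions.

    A new session starts whenever the time since the previous
    submission is longer than `gap`.
    """
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= gap:
            sessions[-1].append(ts)
        else:
            sessions.append([ts])
    return sessions


def estimated_time_spent(timestamps, gap=SESSION_GAP):
    """Sum of (last - first) submission time over all sessions."""
    return sum(
        (s[-1] - s[0] for s in split_into_sessions(timestamps, gap)),
        timedelta(),
    )


# Hypothetical submission times for one student (illustration only).
submissions = [
    datetime(2026, 2, 3, 22, 2),
    datetime(2026, 2, 3, 22, 10),
    datetime(2026, 2, 3, 22, 23),
    datetime(2026, 2, 5, 15, 0),
    datetime(2026, 2, 5, 15, 45),
]
sessions = split_into_sessions(submissions)
total = estimated_time_spent(submissions)
```

One consequence of this definition is that a session with a single submission contributes zero time, so the estimate is a lower bound on the time a student actually spent.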

A student submitted 5 times in a row with the same result. She made a post on Ed and it turns out she had a validation error in her choice of external ID (the autograder has been updated to have informative error messages for both the edge cases she triggered). Another metric to track will be the number of consecutive submissions with the same grading result. That would be a proactive indicator that something is wrong with the assignment. She later completed Part 1 of the assignment after 16 submissions over 3 “sessions”, taking an estimated 4 hours.
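A rough sketch of that indicator follows. The threshold of 5 mirrors the example above but is otherwise an arbitrary choice, and the function names and score history are hypothetical.

```python
from itertools import groupby


def longest_identical_run(scores):
    """Length of the longest run of consecutive submissions with the same score."""
    return max((len(list(group)) for _, group in groupby(scores)), default=0)


def looks_stuck(scores, threshold=5):
    """Flag a student whose grading result repeated `threshold` or more times in a row."""
    return longest_identical_run(scores) >= threshold


# Hypothetical score history for one student (illustration only).
history = [10, 30, 30, 30, 30, 30, 50]
stuck = looks_stuck(history)
```

Comparing only the total score is the simplest version. Comparing the full set of passed and failed tests would catch students who trade one failure for another without changing their score.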

Apart from Gradescope submissions, there are two other sources of student engagement. The first is the class Ed discussion board. When a student posts about an issue they are having with the assignment, we can be sure they are engaging with it. The second is Brightspace. It records the first time students view the Cloud Assignment HTML file. Using these three sources, we’re able to build a larger picture of student engagement, from their first look at an assignment to their final submission.

Reflection on finishing Cloud Assignment 2

Cloud Assignment 2 has been a lot of work. I haven’t gotten enough sleep this week, and the effort has de-prioritized work for other classes. I’m happy with how the assignment has turned out, and I believe the trade-off I made this week was a good one. I am having a lot of fun! It’s bringing me back to new product launches I’ve been a part of.

This is part of the growing pains of delivering new curriculum content, and just like riding a bike, sometimes you need to pedal hard to get up to speed before you can cruise. I’m almost at a point where I can maintain a cruising velocity. Let’s consider what I could build that would help me deliver course content, and what I have already built during cloud assignment 2 that does.

Next assignment will go more smoothly, but perhaps we can keep the “Part 1” then “Part 2” release schedule. It lets us put out a cloud assignment early, and it encourages students to sit down and make time for two sessions of work on a cloud assignment, possibly increasing student engagement and learning outcomes. Students will have a first attempt, reflect on that attempt, and then approach the second part with a new perspective. It also lets us get early feedback on an assignment, the first day Ed posts of students engaging with the assignment release were helpful - “early adopter” feedback. For the curriculum developer, it places less pressure on delivering a whole assignment in a short time period by adding 2-3 days of slack for “Part 2”. Because our assignments have students complete them using a DevOps-oriented approach, returning to an assignment after a few days shouldn’t be an imposition because they’ll have built sufficient automation to get up and running quickly, and will have the autograder for instant feedback. I placed a <Notice> after Part 1 of the assignment is done that the student can stop their EC2 instance if they want to take a break without losing their work. Stopping the EC2 instance will save them money because they are not paying for idle CPU time. Because of the work we performed in Part 1, all they’ll need to start Part 2 is start their stopped instance and they’ll already have 50/100 points from the autograder and can get started on the rest of the assignment immediately.

Our Cloud Assignments are evolving to have two parts: the first has a system architectural focus, where students have to implement a canonical cloud architecture. The second is application focused, the students have to implement an application feature that takes advantage of, or demonstrates the capabilities of, that new system architecture. This highlights the feedback mechanism between product requirements and system capabilities, and provides insight into a common misunderstanding between product teams and engineering teams. For example, in our fictional bird.ai startup, the cloud assignment 2 conversation may have gone like: Product - “Hey, we’re just implementing a history page, how hard can that be?”, Engineering - “So we’re letting users upload and store as many images as they want? But at what cost?”. We could have early releases of the architectural parts of the assignment, then the application implementation part would be released a few days later.

It has taken me about 40 hours to complete this assignment, from writing the assignment text to developing the starter application and autograder test suite. I’d like to praise the programming language Python. Its ability to introspect its own structure has been exceptionally useful for developing an assignment that students can iteratively update and deploy in a DevOps style. I’ve created a stub Django model that students update throughout the assignment. The controller that uses that model introspects its structure to determine which step of the assignment the student is on, taking a different control path based on its guess. That way, students don’t have to update controller code to account for structural changes in the data model (like a professional software engineer would); they can focus on implementing the concept introduced in the assignment and then immediately see the effects of their implementation. If they’re curious, they can also then see how their implementation is being used by reading the controller file. The controller file is heavily commented, contrasting with the empty stub model file. Students will learn how the controller works by reading the code, and then have to use that knowledge to create a model based on just a few instructions and hints.
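To illustrate the idea, here is a small sketch that uses plain dataclasses as a stand-in for a Django model, so it runs without Django installed. In Django the same question could be asked through the model's `_meta.get_fields()`. The field names and step mapping are invented for illustration and are not the assignment's real schema.

```python
from dataclasses import dataclass, fields

# Hypothetical field expected to appear at each assignment step, in order.
STEP_FIELDS = ["image_key", "uploaded_at", "label"]


def detect_step(model_cls):
    """Guess how many steps are complete from which fields the model defines.

    Steps are assumed to be completed in order, so counting stops at the
    first missing field.
    """
    present = {f.name for f in fields(model_cls)}
    completed = 0
    for field_name in STEP_FIELDS:
        if field_name not in present:
            break
        completed += 1
    return completed


@dataclass
class StubModel:
    """What the student receives: no fields yet."""


@dataclass
class PartialModel:
    """What the model might look like after the first two steps."""
    image_key: str
    uploaded_at: str


stub_step = detect_step(StubModel)
partial_step = detect_step(PartialModel)
```

The controller can then branch on the detected step, for example rendering a placeholder page at step 0 and the full feature once every field is present.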

Something very useful during my time developing the assignment was having a solution application and starter application split. The solution application was the finished implementation of the whole project. I could iterate on that during the initial phase of writing the assignment in order to test out any ideas. The starter application is what the students will receive. It’s heavily commented, with indications where to perform the implementation for each assignment step. During testing, I could copy the starter into a new directory, iteratively apply updates taken from the solution application, and emulate the student experience. It showed me a lot: first, I realized I had disabled DEBUG mode in the Django applications, so error messages were obscured, which would have made debugging issues way harder for students. I also realized there were inconsistencies between the solution application I had been iterating on and the starter application, which I was able to fix. I will keep this application split for future assignments, it has worked out well.

A challenge during this assignment was managing disk space, both locally and on the remote server. Let’s identify the places where this crops up:

For the next assignment, I will have to optimize the size of the Python dependencies. I will also have to instruct students on using the --rm option with docker run and introduce them to the docker system prune command, to clean up unused images. The systemd service definition I provided them handles this on the server. I will avoid having students export tar files to disk by having them push images to an ECR repository.

I’m not sure why Intel images are so large; when I did a multi-platform build with a linux/amd64 target, the image was 8GB. This is a concern because many students in the class will be using Intel chips. I’m realizing I’ll have to develop assignments to target both architectures. Although containers are lightweight on a server, they’ve ended up being quite heavy on a laptop. A possible way to remedy this would be to introduce a continuous integration server (aka build server), where students send their build context over, it builds it for them, and stores it in a class container repository. Students would then pull from this image repository as needed, and we wouldn’t have to worry about (1) whether their PC architecture matches the cloud server, (2) whether their PC is powerful enough to perform multi-platform container builds, or (3) whether they have enough space on their laptop to deal with the detritus of local container development.

A student was building containers for the platform linux/amd64 on their x86 architecture computer. When archived and sent to the t4g.small instance (ARM), the container would not run because Docker Engine does not come bundled with a virtual machine like Docker Desktop does for development.
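A small pre-flight check could warn students about this mismatch before they build. The sketch below maps the local machine type to a Docker platform string and compares it with the server's. The mapping table and function names are my own, and `linux/arm64` is assumed as the target because the t4g.small is an ARM instance.

```python
import platform

# Values reported by platform.machine(), mapped to Docker platform strings.
_MACHINE_TO_DOCKER = {
    "x86_64": "linux/amd64",
    "amd64": "linux/amd64",
    "arm64": "linux/arm64",
    "aarch64": "linux/arm64",
}


def docker_platform(machine=None):
    """Docker platform string for a machine type, or None if unrecognized."""
    machine = (machine or platform.machine()).lower()
    return _MACHINE_TO_DOCKER.get(machine)


def needs_cross_build(target="linux/arm64", machine=None):
    """True when the local native platform differs from the deployment target."""
    return docker_platform(machine) != target


intel_laptop = needs_cross_build(machine="x86_64")
arm_laptop = needs_cross_build(machine="arm64")
```

When the check returns True, the student would need to pass an explicit target such as `--platform linux/arm64` to `docker buildx build`, or build the image on the instance itself.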

Another aspect to this assignment is that a lot of compromises were made with security in order to simplify this assignment. We’re deploying a development-mode, permissive, somewhat “hackable” Django application onto the public internet. The risk is not high, there’s no privileged data, the instance has almost no privileges to other AWS services, the blast radius is small. How should we communicate this to students? That this is a learning exercise and to truly make the Django application production-ready would require a lot more detail work? I will address this in Cloud Assignment 3.

The instructional infrastructure has already been useful for helping debug student issues. A student posted on Ed about how they were stuck at 80/100 for the assignment. I was able to look at their last submission on the instructional site, grab their EC2 instance IP from the AWS snapshot taken of their account, and visit their web server to inspect the issue. This helped me quickly determine they had a permissions issue between CloudFront and S3, and I was able to provide a timely and thorough response to them and encourage them to make the post public so it would help other students. Having this information available has lowered the effort to debug student issues with assignments, allowing curriculum developers and teaching assistants to focus on the primary work of creating better course content, rather than become mired in chat-based debugging of a student issue.

Updates to instructional site

I’ve created graphs and summary tables on the instructional website to summarize assignment submission data. There are informational tooltips you can hover over to learn more about what data the charts display.

https://cs351.couetil.com/assignments/1/
Instructional Site: Assignment Detail View

I will be collecting SSH session history and HTTP session history soon, and experiment with ways of displaying this history clearly to an instructor. There’s an opportunity to “auto-debug” using RCAC’s GenAI service.

The styles have been modernized, and an effort has been made to make it easily navigable. Please reach out with any feedback.

Paying the bills

I am spending money to help run the course. Let’s explore in what ways:

I will have clearer accounting of the instructional infrastructure and cloud assignment testing costs at the end of the billing period, and I will summarize it then. I have upgraded my Claude subscription to Max at $100/month, which will be sufficient as a coding agent for the rest of the semester, and I use Gemini at $20/month for research.

I will tag resources based on their purpose to help keep track of spend, and do an end-of-billing-period review every month.

Domain name for use with the course

I’d like a domain name managed by the instructional staff so students can learn about DNS and host websites behind human-readable names; they’ve been doing everything IP-based so far. I’m proposing that CA3 and CA4 introduce DNS to students in the form of A records first, then DNS delegation.

Let’s discuss possible domain names:

Jonathan suggested there is a possibility of running a class domain using Purdue’s infrastructure.

How I’ve been using AI

I use Claude Code for agentic coding. The new Opus 4.6 model released a month ago is a lot better at managing context length and working on multi-step long running tasks. It makes many fewer mistakes than 6 months ago, it’s a night and day difference.

One thing AI is good at is managing change. It is relentlessly detail-oriented, and when I have to update one detail to satisfy some part of the assignment, it will scan all other parts of the assignment for pieces that rely on that one detail. So if I have to make a quick update because a student identified a bug, the AI will manage the change in that structure, and the correspondence between assignment text and autograder script, much more quickly than I could myself, or by using basic tools like grep.

Another thing AI is good at is driving automated testing loops. Because we have an autograder script, and I can create new infrastructure and deploy code onto it, the loop that builds the test script becomes very fast. I can test a bunch of scenarios by manually setting up the intended architecture, and the AI will write the autograder script that checks for and manages the differences between those scenarios. It makes me much more productive.

I do notice the AI makes mistakes, but they tend to be (1) conceptual, it doesn’t understand the purpose behind an assignment component, or (2) it makes different trade-offs when trying to solve problems, and chooses a different solution from the sphere of possibilities than the one I think is appropriate. To avoid these becoming systemic issues, I rely heavily on the “plan mode” with Claude Code. I’m able to fully review the plan document before any implementation starts, and I often go back and forth to resolve small details with it. Smaller changes are easy and quick to review, especially if I’m able to review a backlog of changes as it is moving ahead with the implementation. Using version control is important here; frequent commits improve confidence and let you backtrack easily. Integral to this is the practice of maintaining 100% line coverage in automated tests. It’s not a silver bullet, but it’s an effective “golden master” test, verifying code updates do not introduce regressions by verifying behavior has not changed, even if it hasn’t perfectly verified the behavior is correct.

Now for what I DON’T use AI for. All my writing, the text of the assignments, the weekly reports, is in my voice. For better or for worse! To make the cloud assignments interesting, fun, and engaging is a lot of work, and although I use AI for research (Gemini is great here), I don’t use it for writing.

Cloud Assignment 1 reflection

Ed Issues

11 issues, 3 remained private.

There were questions about requirements.txt. A student pointed out I had not updated the due date in Brightspace. Three students had issues running out of temporary storage when installing Python dependencies on the server. A student explained how to set the architecture type in order to find the instance needed for the assignment. A student had a clarification question about installing Python on the EC2 instance. A student could not connect to her instance, but did not respond to inform me of the root cause. A student has two AWS accounts and used different ones for CA0 and CA1, leading to some confusion. ███████████ posted about not being eligible for the free tier. And finally, an hour before midnight, there was a 27-message back-and-forth debugging session between a student and me to help him complete the assignment before the due date; the root cause was a requirements.txt file that installed unnecessarily large dependencies.

From looking at the lecture responses from students about CA1, I remember many mentioned the parts of the assignment where they had to figure out some task without explicit hand-holding, for example, many mentioned how figuring out rsync was valuable. I will keep this in mind for future assignments, that students like to figure things out on their own.

Week 6

2,384 words · 12 min read

Summary

February 16 - February 22

Meetings

Accomplishments

Cloud Assignment 2 Data

Cloud assignment 2 had an average of 18 submissions and 4.5 hours spent per student. I’m happy to report 76% of students achieved a 100%, and 94% received at least a 50%. 5 hours seems like a reasonable amount of time for an assignment that is open for 10 days, and the time was likely inflated by container build times for several students.

Submission times are clustered right after class, dropping a bit around dinner time, with the biggest spike between 10pm-midnight. There are very few submissions in the morning, students work on cloud assignments in the afternoon and at night. Over a third of submissions were made on the due date. Only a small percentage of students started the assignment on the last day, but 50% of students started working on the assignment only 3 days before it was due, and a majority of students only achieved a perfect score on the due date.

We now have a lot of data about student engagement for an assignment. When we write the Cloudbank proposal, we can use this data to produce a more accurate cost estimation of the course over the whole semester (they require us to provide a “Period of Performance (PoP) Cloud Spent management” table that projects month-by-month cloud spending for the research endeavor). This is also useful cost and engagement data we can bring to request more funding for the class. We’ll hold off on any specific projections until a couple more assignments are complete.

I did k-means clustering on a group of features and the students were best split into two cohorts. The most distinguishing features of these cohorts were (1) over how many days they worked on the assignment, (2) total number of attempts, (3) total number of sessions, (4) estimated time spent, and (5) number of days actively working on an assignment. The logistic regression surfaced similar features, though none are predictive. We’re left with common sense: the more time and effort you put into the assignment, the better score you’ll get. Also, we shouldn’t build predictive models for cloud assignments; we’re not here to predict an outcome, but to identify which students are struggling and push them all to the best outcome. Predictive analysis is limited in this domain, and time is best spent observing descriptive statistics of student progress and effort.

https://www.cs351.cloud/assignments/1/
CA2 Score Distribution
https://www.cs351.cloud/assignments/1/
CA2 Time Spent Distribution
https://www.cs351.cloud/assignments/1/
CA2 Submission Time of Day
https://www.cs351.cloud/assignments/1/
CA2 Due Date Submissions
https://www.cs351.cloud/assignments/1/analysis/
CA2 Feature Importance
https://www.cs351.cloud/assignments/1/analysis/
CA2 StudentClusters

AWS Account Organization

I’ve overhauled how the instructional site is managed on AWS to anticipate:

  1. Course infrastructure handover to new staff.
  2. Cloudbank integration requiring infrastructure handover.
  3. Student account management requiring proper billing and permissions.

How the instructional site was initially set up has several downsides:

  1. Billing ambiguity - course costs are mixed with my personal use of AWS.
  2. Blast radius - misconfigured IAM policies or inappropriate resource use can escalate without limitation.
  3. No student isolation - difficult to impossible to properly restrict student account boundaries in the event of deeper course integration.
  4. Handoff friction - account handoff requires removing and shifting resources, causing operational burden and service downtime.

I’ve addressed these issues, and provided simpler solutions for the eventualities, by creating a dedicated AWS Organization for the CS351 project. Instructional infrastructure is isolated to a dedicated member account of the organization, and students may also be given their own personal member accounts, with all member accounts governed by Service Control Policies (SCPs), which provide greater guardrails than IAM policies.

Future maintainers can take over governance of the organization, or the member accounts can be transferred to another AWS organization under another institution’s control (Cloudbank, Purdue, RCAC). All resources in operation in the moved account at the time of the move will be unaffected. Student accounts may be created at the beginning of a semester, and deleted at the end, and will be governed according to different SCPs than the instructional infrastructure.

Billing is now isolated to each individual member account, making cost easy to measure. Resource use restrictions are easier to audit, and automation to this effect will be simpler to implement.

The instructional infrastructure permission model is simple to understand, it has full access to the capabilities of an AWS account. Let’s discuss a student account’s capabilities and characteristics.

A student’s AWS Account will be:

  1. Fully isolated - no cross-account access can occur, these safeguards are as strong as the ones between any normal users’ AWS accounts.
  2. Secure by default - we are able to enforce MFA and other security policies for any student account.
  3. Easy cost attribution - any billing activity within a student account is only attributable to that student.
  4. Easily torn down - student accounts will be defined in infrastructure-as-code, and can be created, modified and destroyed consistently and immediately (e.g. instant autograder configuration, updates for a new assignment).
  5. Assignment guardrails - Students will be granted resource access by assignment-specific SCPs limiting resource use to only what’s needed to complete the assignment.

Let’s discuss some of the initial guardrails, to paint a picture of what’s possible.

  1. Region lock - students can only use and create resources in “us-east-1”.
  2. IAM restrictions - students are limited to one IAM user provisioned for them by the instructional staff (for logging into the AWS console).
  3. Pre-approved Instance types - students may only use approved instance types in EC2.
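As a rough picture of what two of these guardrails might look like, the sketch below builds SCP documents as Python dictionaries. It is an assumption-laden illustration rather than the policies actually deployed: the statement IDs and the approved instance list are invented, and a real region lock would likely need exemptions for global services such as IAM.

```python
import json


def region_lock_scp(allowed_region="us-east-1"):
    """Deny any request made outside the allowed region."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "DenyOutsideAllowedRegion",  # hypothetical name
            "Effect": "Deny",
            "Action": "*",
            "Resource": "*",
            "Condition": {
                "StringNotEquals": {"aws:RequestedRegion": [allowed_region]}
            },
        }],
    }


def instance_type_scp(allowed_types):
    """Deny launching EC2 instances whose type is not on the approved list."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "DenyUnapprovedInstanceTypes",  # hypothetical name
            "Effect": "Deny",
            "Action": "ec2:RunInstances",
            "Resource": "arn:aws:ec2:*:*:instance/*",
            "Condition": {
                "StringNotEquals": {"ec2:InstanceType": list(allowed_types)}
            },
        }],
    }


# Hypothetical approved list; t4g.small is the type used in earlier assignments.
region_policy = json.dumps(region_lock_scp(), indent=2)
instance_policy = json.dumps(instance_type_scp(["t4g.small"]), indent=2)
```

Because SCPs only restrict and never grant, students would still need IAM permissions inside their accounts. The SCP sets the outer boundary of what those permissions can do.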

I’ve hinted at ways to limit what services a student has access to, and how to limit their choices within a service, but I haven’t discussed how to limit the quantity of those services they’re able to consume. The practical approach is to have an AWS Budget alert at some threshold value ($20, $50, …), which when triggered causes an AWS Lambda function to apply a deny-all SCP to the account that surpassed the threshold. Instructional staff would also be alerted, and would have to get involved to determine what the student did to exhaust their resource quota, and what that means for their assignment grade.
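The budget-triggered lockout could be sketched as below. The event shape, the policy ID, and the notification callback are all hypothetical. The only real API assumed is the `attach_policy(PolicyId=..., TargetId=...)` call on a boto3 Organizations client, which is replaced here by a recording stand-in so the sketch runs without AWS credentials.

```python
DENY_ALL_POLICY_ID = "p-example123"  # hypothetical ID of a pre-created deny-all SCP


def handle_budget_alert(event, org_client, notify, threshold_usd=50.0):
    """Attach the deny-all SCP when an account's spend crosses the threshold.

    `event` is a simplified, hypothetical shape with "account_id" and
    "actual_spend_usd" keys. `org_client` is expected to behave like
    boto3.client("organizations").
    """
    account_id = event["account_id"]
    spend = float(event["actual_spend_usd"])
    if spend < threshold_usd:
        return False
    org_client.attach_policy(PolicyId=DENY_ALL_POLICY_ID, TargetId=account_id)
    notify(f"Account {account_id} locked after spending ${spend:.2f}")
    return True


class RecordingOrgClient:
    """Stand-in for the real client; records calls instead of making them."""

    def __init__(self):
        self.attached = []

    def attach_policy(self, PolicyId, TargetId):
        self.attached.append((PolicyId, TargetId))


client = RecordingOrgClient()
alerts = []
locked = handle_budget_alert(
    {"account_id": "111122223333", "actual_spend_usd": "52.10"},
    client,
    alerts.append,
)
```

A production handler would also need to tolerate repeated alerts for the same account and give instructional staff a way to lift the lock once they have reviewed what happened.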

We are now ready to (1) accurately track course infrastructure cost, (2) hand over its management to any responsible party, (3) onboard Cloudbank as a funding source for cloud computing at Purdue, and (4) grant accounts to this semester’s student cohort and accept billing responsibility for future cloud assignments.

In order to organize a move to Purdue-controlled AWS accounts, from the instructor perspective, we would have to request a quota increase to the number of AWS accounts we’re allowed to make in an organization; the default is 10. From the student perspective, in class we would show them how we use the cloud to help manage the course, and explain we are gathering data for an NSF proposal that will secure funding for the class, so we can do cool projects.

I suggest we bring this organization structure to the graduate section of the course. It may help them set up the infrastructure needed for their cloud project at the end of the semester.

graph TD
subgraph Before["Before: Single Account"]
  A1[Personal AWS Account]
  A1 --- SP[Senior Project Website]
  A1 --- II[Instructional Infra]
  A1 --- SA[Student Resources<br/>shared namespace]
end

subgraph After["After: AWS Organization"]
  MGMT[Management Account<br/>org definition + billing]
  MGMT --> INFRA[Infra Account<br/>EC2, RDS, ECR, CloudFront]
  MGMT --> S1[Student Account 1]
  MGMT --> S2[Student Account 2]
  MGMT --> SN[Student Account N]
  PERSONAL[Personal Account<br/>senior project website<br/>unchanged]
end

Before --> After
https://us-east-1.console.aws.amazon.com/organizations/v2/home/accounts
AWS Organization for CS351

Gradescope service interruption

Gradescope deployed an update on the Thursday before the assignment was due that affected the test harness they deploy onto autograder instances. We hadn’t updated our container code for a few days, so the error did not originate from us. Gradescope’s test harness phones home at each startup to download an updated script. Something in that update prevented the grading script from executing. I reached out to their help email that day. They were responsive, and the problem was fixed in about 6-8 hours. We were not the only group affected by the issue.

This raises the question: can we operate the course when Gradescope is down? At the moment, no. However, we have a clear path forward in case we want to re-implement any features, and it won’t be very difficult. The primary challenge is onboarding students to hold accounts on the instructional site. SSO through Purdue is likely out of the question, so the practical approach is “magic link” onboarding/login that is restricted to Purdue emails, and only those emails assigned to a class within the instructional site. Once students are onboarded, implementing autograding is easy.
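A magic link is essentially a signed, short-lived token emailed to the student. The sketch below shows one way to issue and verify such a token using only the standard library. The secret, the roster, the 15-minute lifetime, and the token layout are all assumptions made for illustration.

```python
import base64
import hashlib
import hmac
import time

SECRET_KEY = b"replace-with-a-real-secret"  # hypothetical
ALLOWED_DOMAIN = "purdue.edu"
ENROLLED = {"student@purdue.edu"}  # hypothetical class roster


def _sign(payload: bytes) -> str:
    return hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()


def make_token(email, now=None, ttl_seconds=900):
    """Issue a login token for an enrolled Purdue address."""
    email = email.lower()
    if not email.endswith("@" + ALLOWED_DOMAIN) or email not in ENROLLED:
        raise ValueError("email is not eligible for this class")
    issued = now if now is not None else time.time()
    payload = f"{email}|{int(issued + ttl_seconds)}".encode()
    encoded = base64.urlsafe_b64encode(payload).decode()
    return f"{encoded}.{_sign(payload)}"


def verify_token(token, now=None):
    """Return the email if the token is authentic and unexpired, else None."""
    try:
        encoded, signature = token.split(".", 1)
        payload = base64.urlsafe_b64decode(encoded.encode())
    except (ValueError, TypeError):
        return None
    if not hmac.compare_digest(signature.encode(), _sign(payload).encode()):
        return None
    email, _, expires = payload.decode().rpartition("|")
    current = now if now is not None else time.time()
    if int(expires) < current:
        return None
    return email


token = make_token("student@purdue.edu", now=1000)
```

The token would be appended to a login URL and emailed to the student. A real deployment would also want single-use tokens, which requires storing issued tokens server-side.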

If we want to run a “dual-stack”, that is, let students choose between submitting to Gradescope and the instructional site, that is also easy. The Gradescope container simply polls the instructional site for the autograding result, and when displaying the final score in the user interface, will link to a results page hosted on the instructional site.

We do need more error monitoring, I was only alerted to the issue because students on Ed brought it to my attention. Unfortunately, it’s a chicken-and-egg issue. We can’t log a failure if Gradescope doesn’t even run our code.

Ed Integration

I’m syncing all activity in the Ed discussion board to the instructional site. I’ve cloned the discussion board, and created an activity summary dashboard.

https://www.cs351.test/ed/
Instructional Site Ed Discussion
https://www.cs351.test/ed/dashboard/
Instructional Site Ed Dashboard

I’ve also integrated RCAC’s GenAI service and started experimenting with AI summaries of Ed categories.

https://www.cs351.cloud/ed/ai-summaries/
Instructional Site Ed AI Summaries

I have a couple more experiments in mind.

  1. AI Ed Triage - answer student questions immediately using the mix of different data we have (class syllabus, cloud assignment text, student aws account snapshot, etc.).
  2. AI Real-time Assignment Debugging - If a student has an issue with a cloud assignment, have the AI run verification loops against the autograding script and “reference architecture” (a solution deployed in the instructional infrastructure).

Estimating the cost of instructional infrastructure.

So far, the instructional site has cost me ~$25. The new infrastructure will cost more, I’ve added

In order to audit these costs transparently, I’ve added an “Infrastructure Costs” section to the instructional site that uses the AWS Cost Explorer API and the CloudWatch “EstimatedCharges” metric to display costs conveniently on-demand.

Instructional Site Downtime

I had downtime due to pushing a feature out without an integration test between the autograder container and the instructional site. They were tested individually, but not together. This was compounded by accidentally rotating the assignment API token, which took me a moment to work out how to replace. The first domino to fall was PostgreSQL rejecting a raw file upload because it contained null characters, and I had inadequate logging to notice it quickly.
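PostgreSQL text columns cannot store the NUL character, so uploaded file contents need to be cleaned (or stored as binary) before insertion. A minimal sketch of the cleaning approach, with a hypothetical function name:

```python
def to_postgres_safe_text(raw: bytes) -> str:
    """Decode an uploaded file for storage in a text column.

    Undecodable bytes become the Unicode replacement character, and NUL
    characters are dropped because PostgreSQL text types reject them.
    """
    text = raw.decode("utf-8", errors="replace")
    return text.replace("\x00", "")


cleaned = to_postgres_safe_text(b"print('hi')\x00\x00\n")
```

If the original bytes matter, for example to reproduce a student's submission exactly, storing the raw upload in a `bytea` column alongside the cleaned text avoids losing information.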

I’ve added rolling application deployments to the instructional site so data isn’t lost during deployments. The autograder containers have been updated to perform retries with exponential backoff when submitting student assignment results in order to accommodate blips in API availability during deploys.
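The retry behaviour can be sketched as follows. The function name, the attempt count, and the delays are illustrative choices rather than the autograder's actual settings, and `send` stands in for whatever function posts results to the instructional site.

```python
import random
import time


def submit_with_retries(send, payload, max_attempts=5, base_delay=1.0,
                        max_delay=30.0, sleep=time.sleep):
    """Call send(payload), retrying on ConnectionError with exponential backoff.

    The delay doubles after each failure (1s, 2s, 4s, ...) up to max_delay,
    with a little random jitter so many containers do not retry in lockstep.
    """
    for attempt in range(max_attempts):
        try:
            return send(payload)
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the failure
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay / 10))


# Demonstration with a fake API that fails twice, then succeeds.
calls = {"count": 0}


def flaky_send(payload):
    calls["count"] += 1
    if calls["count"] <= 2:
        raise ConnectionError("site is mid-deploy")
    return {"status": "ok", "received": payload}


delays = []
result = submit_with_retries(flaky_send, {"score": 80}, sleep=delays.append)
```

Passing `sleep` as a parameter keeps the function testable without actually waiting. In the container it would default to `time.sleep`.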

I’ve also added rolling EC2 instance deployments by using the Elastic IP feature from AWS. Once I’ve confirmed a new instance is healthy after an AMI upgrade, or change in type/size, I can point the Elastic IP to it and remove the old instance. This is a two-step instance switchover primarily driven through Terraform and Ansible.

Our infrastructure is now more flexible and robust to both application and system upgrades, but could use more monitoring so we can be aware and proactive when addressing performance issues and bugs.

Instructional Infrastructure improvements

Submission page load times grew to 10-20s, now at 500ms-1s. Our submission table was growing quadratically in size due to how much data Gradescope includes in the submission_metadata.json file. Our separated table is still growing quadratically in size, but it’s contained, and that problem is easy to address, the production database is 40MB on-disk out of 20GB available.

Remember, we’re still a t4g.small. It has 2 vCPUs with burst, but that drops to a 0.4 vCPU baseline, and it has 2GB of memory. The database is even smaller, a 1GB, 2 vCPU db.t3.micro that accrues CPU credits at half the rate of the t4g.small. Both network and storage bandwidth are throttled. The application is pretty fast considering these limitations; burst scheduling fits it well.
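The baseline figure follows from how burstable instances earn CPU credits: one credit is one vCPU running at 100% for one minute. The credit rates below (24 per hour for t4g.small, 12 per hour for db.t3.micro) are the values I recall from AWS documentation and should be treated as assumptions to verify.

```python
def baseline_vcpu(credits_per_hour):
    """Sustained vCPU an instance can use without draining its credit balance.

    One CPU credit equals one vCPU at 100% utilization for one minute,
    so credits earned per hour divided by 60 gives the sustainable vCPU.
    """
    return credits_per_hour / 60


# Assumed credit earn rates; check against current AWS documentation.
app_server = baseline_vcpu(24)   # t4g.small
database = baseline_vcpu(12)     # db.t3.micro
```

Under these assumptions the application server sustains 0.4 vCPU and the database 0.2 vCPU, which is consistent with the database accruing credits at half the rate.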

Speedups:

Solutions:

Total speedup: ~20x

In-class questions

I will be making an effort to record in-class questions and address them in Cloud Assignment content.

Content-delivery networks

Confusion about CDN vs data center networking. George drew a good distinction between

I will update Cloud Assignment 2, the final CDN section “Push it to the edge”, with a “<Info>” section that addresses some of these ideas.

Docker image naming

Confusion about Docker image names (id vs name vs tag). Will be addressed in Cloud Assignment 3.

Cloud Assignment 2 Issues

Lots of student issues to work through this assignment, it was not a breeze. The code was good but not great, and lots of bugs slipped through. There were two primary problems:

Stuck at “push it to the edge”

Many students were stuck at 80 points, failing the last section, because the straightforward path through the assignment could cause caching issues. I built this part of the assignment when I was tired and working quickly, so it’s no surprise it is challenging for students to complete. This should have been a better experience for them; it was unnecessarily frustrating. Typically, obstacles in cloud computing are a learning experience, but I think this one went a little too far. It should be updated and refined for next semester’s class.

Image Build Times

Image builds took hours for many students. I counseled one student to build the image on the EC2 instance, and it went much faster.

Part of the issue was that I had students use Python 3.14.2, the most recent release, and a lot of packages didn’t have pre-compiled binaries available for it, so the build defaulted to compiling them from source. A better Python version target would have been 3.12.

x86 images had a lot of GPU code. I could have given them better hints for how to compile a slimmer CPU-only image.

Cloud Assignment 4

Technical Motivation: introduce microservices by taking the strangler fig approach, transition our N-tier web architecture to Kubernetes piece-by-piece. Helps “scale” parts of the service that have different reliability and load requirements.

Social Motivation: We have a big engineering team stepping on each other’s toes. Introduce Conway’s law, and microservices, as a solution to this problem. (Idea: should students work in teams? Gradescope allows group submissions.) Different teams have different KPIs and perform different work (reliability vs agility, front-end vs back-end, etc.). How will you as a DevOps engineer make everyone productive?

Week 7

2,471 words · 13 min read

Summary

February 23 - March 1

Meetings

Accomplishments

1-on-1-on-1 Meeting Outcomes

Assignment 3 Part 1 on Tuesday. Due Saturday night. Gradescope assignment “CA03 Part 1”. Will be worth 50 points.

Assignment 3 Part 2 on Saturday. Due Friday night. Gradescope assignment “CA03 Part 2”. Will be worth 50 points. The autograder will show the Part 1 checks as 0/0 points.

Assignment 4 will use student AWS Accounts under instructional control and will be a group assignment.

Connor’s office hours: I will do 2:30-5:00pm on Fridays.

Theory of Assignment Development and Student Engagement

Split the 10-day cloud assignment into two 5-day assignments.

Students who start earlier do better. 2h 30m per assignment part is doable in 5 days for students.

Issues are caught earlier. Less pressure to deliver on both faculty and students. Establish a class cadence of practical projects, like shop class.

We can enable this by making hard things easy through automation, so students can tackle larger problems over the semester.

We should tell them it’s ok to use AI. They can use RCAC GenAI service for this class, it’s free for students.

I’ve implemented magic-link onboarding for staff, and magic-link logins for staff and students. Over the next month, I will be transitioning the website to a purely email-based login flow for security and ease of use. For now, there continues to be a password-based login option.

First, we side-step issues surrounding accidental password disclosure by pushing that responsibility to the email provider. Second, it reduces the number of passwords users need to keep track of and makes sign-in straightforward (although you will have to wait for an email to arrive). Heavy users of the site (visiting more than once every two weeks) may never have to log in again. Temporary visitors (visiting once in a two-week period) simply have to wait for the email to arrive to log in.

Staff will have a new onboarding flow. They’ll receive an onboarding link by email that can be used once. Once it is used, the account is confirmed and they can log in with a different magic link. The student onboarding flow is tied to class enrollment. If a student is assigned to an active class (which already records their email), then magic-link logins will immediately work for them. Students do not have the same access to the instructional site that staff do; they will see only pages related to making submissions to an active cloud assignment.
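To make the magic-link idea concrete, here is a minimal standard-library sketch of a signed, expiring, optionally single-use token. It is not the instructional site's implementation; `SECRET_KEY`, `used_tokens`, and both function names are hypothetical, and a real deployment would keep the key in settings and the used-token record in the database.

```python
import hashlib
import hmac

SECRET_KEY = b"replace-me"  # hypothetical; load from configuration in practice
used_tokens = set()  # stand-in for a table of consumed one-time links


def _sign(payload: str) -> str:
    return hmac.new(SECRET_KEY, payload.encode(), hashlib.sha256).hexdigest()


def make_token(email: str, expires_at: int) -> str:
    """Sign the email and expiry so the link cannot be forged or extended."""
    payload = f"{email}|{expires_at}"
    return f"{payload}|{_sign(payload)}"


def verify_token(token: str, now: int, single_use: bool = False):
    """Return the email if the token is authentic, unexpired, and unused; else None."""
    try:
        email, expires_at, signature = token.rsplit("|", 2)
    except ValueError:
        return None
    # Constant-time comparison avoids leaking signature bytes through timing.
    if not hmac.compare_digest(signature, _sign(f"{email}|{expires_at}")):
        return None
    if now > int(expires_at):
        return None
    if single_use:
        if token in used_tokens:
            return None
        used_tokens.add(token)
    return email
```

An onboarding link would be verified with `single_use=True`, while an ordinary login link could rely on the expiry alone; `now` would normally be `int(time.time())`.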

I’ve been trying to get AWS to approve production usage of our email service. They denied the first request, so it’s still stuck in sandbox mode (you have to pre-approve every recipient, and they have to confirm). I’m going to try again, but George, you may have to create an AWS account with the instructional site and jump in as an official representative of this Purdue course. This is blocking students from using the instructional site directly for submissions, so for now we will continue to rely on Gradescope as the primary interface students use for assignment submissions.

https://www.cs351.cloud/accounts/login/?next=/
Instructional Site Login Page

Implementing Autograding

This feature requires significant architectural consideration, and I’ve made an effort to keep the design simple and cost-effective. First, I’ll list some key features, then mention the added infrastructure.

The autograder jobs run on a dedicated t4g.small worker instance that coordinates with the main application server through the database, using a Django 6 feature called “tasks” with a PostgreSQL backend. My original idea was to spawn ECS tasks on-demand, but there are a lot of edge cases surrounding the ECS/Fargate integration that are avoided by having a dedicated worker node. The pros are better local/production parity and control over the runtime environment. The cons are a risk that the site can’t handle a burst of concurrent student submissions and a higher baseline cost of compute.
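The idea of two machines coordinating through the database can be sketched as follows. This is not Django's tasks API; it is an illustration using `sqlite3` in place of PostgreSQL, with a hypothetical `jobs` table and function names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE jobs (id INTEGER PRIMARY KEY, student TEXT, status TEXT DEFAULT 'queued')"
)


def enqueue(student: str) -> int:
    """Called by the web server when a submission arrives."""
    cur = conn.execute("INSERT INTO jobs (student) VALUES (?)", (student,))
    conn.commit()
    return cur.lastrowid


def claim_next():
    """Called by the worker: mark the oldest queued job as running and return it."""
    row = conn.execute(
        "SELECT id, student FROM jobs WHERE status = 'queued' ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    # Re-checking the status in the UPDATE keeps two workers from claiming one job.
    cur = conn.execute(
        "UPDATE jobs SET status = 'running' WHERE id = ? AND status = 'queued'",
        (row[0],),
    )
    conn.commit()
    return row if cur.rowcount == 1 else None
```

The web server only ever inserts rows and the worker only ever claims them, so neither needs a network connection to the other. On PostgreSQL the claim step is often written with `SELECT ... FOR UPDATE SKIP LOCKED` instead.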

It’s worth mentioning the series of features I had to implement to make this possible.

  1. Enable AWS Organization SSO for the instructional site (granting access to AWS console for manual tasks like creating support requests that enable email)
  2. Integrate email-sending into the application (enables login and onboarding)
  3. Course management, student enrollment statuses (manages student access to website)
  4. Magic link onboarding (piggybacks on Outlook’s Purdue SSO integration to authorize students)
  5. Staff impersonation of student accounts (allows instructors to see what a particular student sees, necessary for testing and debugging the student dashboard)
  6. Observability dashboards (tracking emails sent, logging all HTTP requests, and surfacing traffic insights in a dashboard for security and to understand usage)
  7. Load testing under burst and baseline conditions (to prepare for students rushing to the site to submit around deadlines)
  8. Container Registry / ECR integration with cloud assignments (so autograding runs can pull an assignment-specific autograding image)
  9. Dedicated task queue worker node in production (this worker node runs all background jobs, not just autograding tasks)

After all that, I was finally able to construct a simple student dashboard that first triggered autograding jobs locally, working through the challenges that came up one-by-one until I finally got a working production deployment. I then finished by revisiting and reimplementing Gradescope autograding, this time powered by the instructional site. Phew!

https://www.cs351.cloud/dashboard/
Instructional site student dashboard
https://www.cs351.cloud/dashboard/job/8/
Instructional site pending job
https://www.cs351.cloud/dashboard/job/8/
Instructional site completed job
https://www.gradescope.com/courses/1214735/assignments/7764520/submissions/394279250#
Gradescope completed polling
https://www.cs351.cloud/autograde/
Instructional Site autograde jobs

Load testing

I’ve been running load tests against the application stack (haproxy + django) to tune healthcheck settings and identify slow pages, preparing for increased usage from students and instructional staff.

I’m running the application stack both in “burst” mode, where the containers are given their full burst capacity of 2GB of memory and 100% of CPU time, and “baseline” mode, where CPU time is restricted to 40% and 20% CPU time for app and db, respectively.

The first insight from this test was that any un-paginated views in the application would grow without bound.

The second insight was that under sustained load in baseline settings, Django handles 100 concurrent users, but response times can rise above the healthcheck threshold that HAProxy uses to determine whether a backend is unhealthy. Increasing the tolerance for slow health check responses from 5s to 15s dramatically reduces the false positive rate for unhealthy backends, improving application availability under high load.

The third insight was that as concurrent users rise to 80+, the default two Django workers had insufficient threads to multiplex the user sessions, causing dropped connections. Increasing HAProxy timeouts and giving workers 8 threads each instead of two dropped the error rate from ~40% with 100 concurrent users to 0%.

There are a few parameters we now use to tune performance:

Under sustained load from 100 concurrent users with ~10 requests per second, the failure rate is 0% and response times rise to 10s per-page on average at baseline settings.
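As a sanity check, these figures are consistent with Little's law (average requests in flight = arrival rate × time in system). The sketch below reuses the numbers reported above; treating every user as always waiting on a response, and the 2 workers × 8 threads as the only serving slots, are simplifying assumptions.

```python
def requests_in_flight(requests_per_second: float, avg_response_seconds: float) -> float:
    """Little's law: average number of requests inside the system."""
    return requests_per_second * avg_response_seconds


in_flight = requests_in_flight(10, 10)  # ~10 req/s at ~10s per page
server_slots = 2 * 8  # two workers with 8 threads each

print(f"requests in flight: {in_flight:.0f}")
print(f"handled concurrently: {server_slots}")
print(f"queued ahead of the workers: {in_flight - server_slots:.0f}")
```

With 100 requests in flight and 16 threads, most of the 10s response time would be queueing rather than processing, which is one reason the longer HAProxy timeouts matter.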

I want to call out that there are a lot of DB writes per view because I track a lot of usage statistics. There are many opportunities for optimization, but that is outside the scope of this project. Our usage is relatively low, and we use the cheapest and weakest EC2 instance types. If the website gets slow, the best and most obvious choice is to rent bigger instance types. We also don’t have any performance metrics that would be part of an SLA. Such metrics would define an acceptable performance boundary against which we would set harder limits on parameters to guide performance improvement decisions, balancing cost against user experience.

https://www.cs351.cloud/db/
Instructional site database performance
https://www.cs351.cloud/requests/
Instructional site http request log

Agent Tool Calls

A lot of people aren’t aware of what these coding agents are capable of. Can I quantify it somehow? The best idea I have at the moment is to keep track of every tool call made by an agent while developing the instructional site.

I took this idea from “entire.io”, a tool by ex-GitHub engineers that I was using to keep track of my token usage and agent conversations. Their tool was slow and failed often, so I analyzed it with Claude Code to understand all the data they were tracking, then recreated it as a Rust CLI tool that stores data locally in a SQLite DB and shows a more useful statistics summary than they were exposing in their web application.

Is the vibe-coding class tracking agentic tool use statistics across all their students?

Here is the agent usage from my weekend of cloud assignment development, pulled from my tool claude-track:

Terminal
$ claude-track stats
=== Claude Code Usage Stats ===
Database: /Users/connor/.claude/claude-track.db (15.3 MB)
Tracking since: 2026-02-28T03:01:23Z

--- Sessions ---
Total sessions:         126
Total duration:     81h 14m
Avg session:            40m
Sessions today:          16

--- Models ---
Model            I/O Toks  Sessions
───────────────  ────────  ────────
claude-opus-4-6  1,631,309       106  ████████████████████

--- Token Usage ---
Input tokens:             175,256
Cache creation:        14,929,111
Cache reads:          514,766,398
Output tokens:          1,456,053
API calls:                  8,127
Cache hit rate:             97.2%
Est. cost (total):       $387.97

--- Prompts ---
Total prompts:          829
Avg per session:        6.9
Avg length:         119 chars

--- Plans ---
Total plans:            280

--- Tool Usage ---
Total tool calls: 12,232
 Calls  Tool
──────  ───────────────
 4,718  Read             ████████████████████
 3,166  Bash             █████████████
 1,244  Edit             █████
   851  Grep             ████
   607  Glob             ███
   541  TaskUpdate       ██
   277  TaskCreate       █
   247  Write            █
   187  Agent            █
    98  ExitPlanMode     █
    68  WebFetch         █
    67  Skill            █
    44  WebSearch        █
    35  EnterPlanMode    █
    28  Task             █
    27  AskUserQuestion  █
    19  EnterWorktree    █
     6  TaskOutput       █
     2  TaskStop         █

--- Top 10 Files Read ---
 Reads  File
──────  ────
   120  ~/.../gradescope/tests.py
   109  ~/.../accounts/tests.py
   103  ~/.../accounts/views.py
    91  ~/.../app/settings.py
    83  ~/.../requestlog/tests.py
    82  ~/repos/claude-question/src/commands/stats.rs
    74  ~/.../app/urls.py
    71  ~/.../requestlog/dashboard.html
    63  ~/.../submissions/views.py
    60  ~/.../requestlog/views.py

--- Top 10 Bash Commands ---
  Runs  Command
──────  ───────
   695  git
   458  find
   443  cd
   341  ls
   263  grep
   125  docker
   120  instructional_site/bin/django-tests
    80  cat
    72  cargo
    62  python3

--- Activity by Date ---
Date         Calls
──────────  ──────
2026-02-28   6,022
2026-03-01   4,853
2026-03-02   1,357

--- By Project ---
 Calls  Project
──────  ───────
11,232  ~/repos/senior_project_cloud_computing
   986    ↳ gradescope-api-skill
   538    ↳ dapper-percolating-flamingo
   310    ↳ iridescent-imagining-rocket
   276    ↳ cache-headers
   240    ↳ giggly-hopping-lerdorf
   233    ↳ nav-search
   210    ↳ statistical-models-evaluation
   143    ↳ radiant-waddling-lollipop
   122    ↳ request-log-pagination
    94    ↳ invite-instructor-wording
    75    ↳ shimmying-juggling-sky
    67    ↳ assignment-course-display
    60    ↳ test-django-dev
    28    ↳ responsive-nav
   968  ~/repos/claude-question
    31  ~/repos/dotfiles
     1  /tmp

Best thing since sliced bread.

SSO access to AWS Organization

I’ve enabled “AWS SSO” in the “IAM Identity Center” for the AWS management account. This allows us to provide AWS console access to the infrastructure account for instructional staff. The management account continues to be restricted to root login only (I have those credentials). Any users with access to the infrastructure account on AWS will be required to register an MFA key for login. There is a link on the instructional site dashboard to log into the infrastructure AWS console.

Enabling a new user to log into the AWS console is currently a manual process: the Terraform configuration has to be updated to create a user with a target email. It will remain that way for the time being, since I am the only one developing this project.

I had to implement this so I could see the status of our email integration. We send “noreply@cs351.cloud” emails using AWS SES, which has its own approval process.

Gradescope API integration

I’ve integrated Gradescope syncing by storing my personal login credentials in the instructional site, then authenticating against the Gradescope website and parsing their webpage HTML. Gradescope is a Ruby on Rails application that server-side renders some React-driven components to the page. This is a classic startup architecture, and is favorable to scraping by simple html-parsers.

There were challenges developing against the API. When Gradescope makes certain web page updates, it breaks our parsing logic, so I developed a live-data, fixture-based testing loop to verify that our integration continues to work over time. The implementation is tied pretty tightly to the instructional site. If I did it over again, or if we move towards tighter integration with Gradescope rather than away from it, I would make it an entirely separate Python library and run the integrations periodically on a schedule so the package self-heals when Gradescope makes any API changes. The added benefit is that I could then create a typed and tested library interface to the Gradescope API for the instructional site, avoiding context pollution and test bloat in the instructional site.
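A minimal sketch of what fixture-based testing of a scraper can look like, using only the standard library. The HTML below is an invented stand-in, not Gradescope's actual markup, and `ScoreParser` and `parse_scores` are hypothetical names.

```python
from html.parser import HTMLParser

# Hypothetical saved fixture; a real one would be captured from a live page.
FIXTURE = """
<table id="submissions">
  <tr><td class="name">Student A</td><td class="score">80.0</td></tr>
  <tr><td class="name">Student B</td><td class="score">100.0</td></tr>
</table>
"""


class ScoreParser(HTMLParser):
    """Collect (name, score) pairs from td.name and td.score cells."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._field = None
        self._current = {}

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "td" and css_class in ("name", "score"):
            self._field = css_class

    def handle_data(self, data):
        if self._field:
            self._current[self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "td":
            self._field = None
        elif tag == "tr" and self._current:
            # A missing key raises KeyError, so a markup change fails loudly.
            self.rows.append((self._current["name"], float(self._current["score"])))
            self._current = {}


def parse_scores(html: str):
    parser = ScoreParser()
    parser.feed(html)
    return parser.rows


print(parse_scores(FIXTURE))
```

Refreshing the fixture from the live site on a schedule and re-running the same assertion is what turns a silent parsing break into a failing test.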

This is one of the first pieces of technical debt we’ve accrued in the instructional site, and should be addressed in the future when we reassess our use of Gradescope.

AWS Bill

I got a $38.70 bill from AWS for February. Not bad considering all the instructional site and cloud assignment 2 work.

Cluster Analysis from Cloud Assignment 2

The two-cluster student categorization ranks students along two axes.

The horizontal axis is a proxy for the span of time a student has worked on the project. The right-most outlier is ████████████, who worked on the assignment over 5 days, and the left-most outlier is ████████████████, who worked on the assignment over 5 hours.

The horizontal axis may be a proxy for sustained effort regardless of point outcome.

The vertical axis has a positive outlier, ██████████████████, who started the assignment early and finished in 2 hours in 2 sessions over 2 days. It seems to be a proxy for rapidity of progress and starting early.

I’ve added a visualization to the cluster data: when you select a factor, each plotted dot is resized in proportion to how far that student’s value for the factor is from the mean. This makes it easy to do a visual check of student clusters by particular factors.

I bet we’ll have more, and clearer, student clusters with more data. Once we have that data, let’s think of some questions to ask.

Here’s a quick discussion of what clusters appeared when applying K-means and K-medians to the Cloud Assignment 2 data.

K-Means: k=2 — a simple binary split:

K-Median: k=6 — six distinct behavioral profiles:

Removing instructor outliers let k-median explore higher k values and find meaningful sub-populations. The standout finding is Cluster 2 — 12 students working almost exclusively at night on weekends with only a 33% success rate. That’s a group k-means completely misses by lumping them into the generic “low effort” bucket. Similarly, k-median distinguishes the efficient early starters (Cluster 4, 100% success in just 8 attempts) from the persistent grinders (Cluster 5, 92% success but needing 30 attempts).
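To illustrate why the two methods can disagree, here is a small pure-Python sketch of both. The data points are made up, the starting centers are fixed by hand, and this is not the analysis code behind the page; it only shows that a mean-based center is pulled toward an outlier while a median-based center is not.

```python
from statistics import mean, median


def squared_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def cluster(points, centers, center_fn, distance, iterations=20):
    """Alternate between assigning points to the nearest center and recomputing centers."""
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: distance(p, centers[i]))
            groups[nearest].append(p)
        # An empty group keeps its previous center.
        centers = [
            tuple(center_fn(axis) for axis in zip(*g)) if g else c
            for g, c in zip(groups, centers)
        ]
    return centers


# Made-up (attempts, hours) pairs; (40, 40) plays the role of an outlier.
points = [(1, 1), (2, 1), (1, 2), (2, 2), (10, 10), (11, 10), (10, 11), (40, 40)]
start = [(1, 1), (10, 10)]

kmeans_centers = cluster(points, start, mean, squared_euclidean)
kmedian_centers = cluster(points, start, median, manhattan)
print(kmeans_centers)   # [(1.5, 1.5), (17.75, 17.75)]
print(kmedian_centers)  # [(1.5, 1.5), (10.5, 10.5)]
```

On this toy data the second k-means center lands at (17.75, 17.75), far from any typical point, while the k-medians center stays at (10.5, 10.5) among the bulk of its group.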

https://www.cs351.cloud/assignments/1/analysis/
cloud assignment 2 k means analysis
https://www.cs351.cloud/assignments/1/analysis/
cloud assignment 2 k medians analysis

Data inconsistencies

█████████ is a student where I noticed a score discrepancy between Gradescope and the instructional site. See: https://www.cs351.cloud/students/55/.

His last score on Gradescope is an 80. His last score on the instructional site is 5.

Gradescope seems to have a race condition where the submission_metadata does not correspond with the results.json. The student submitted twice in less than a minute, and I believe this caused the inconsistency. There may also be an issue with concurrent autograder scripts running, since a reboot was part of the last assignment. I have to block concurrency for autograding attempts; that is, a student can only have one autograding run happening at a time, otherwise their results will not be accurate. I’ve also made it so that even if a double submission happens on Gradescope, only one autograding run is triggered and both Gradescope runs are given the same results endpoint to poll, eliminating concurrent autograding jobs.
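The one-run-per-student rule can be sketched as below. This is an in-memory illustration with hypothetical names and URL shape; the instructional site would presumably enforce the same rule in the database rather than in a process-local dictionary.

```python
import threading

_lock = threading.Lock()
_active_jobs = {}  # student_id -> job id of the run currently in flight
_next_job_id = 0


def request_autograde(student_id: str) -> str:
    """Return the results endpoint to poll, starting a job only if none is in flight."""
    global _next_job_id
    with _lock:
        job_id = _active_jobs.get(student_id)
        if job_id is None:
            _next_job_id += 1
            job_id = _next_job_id
            _active_jobs[student_id] = job_id
    return f"/results/{job_id}/"  # hypothetical endpoint shape


def finish_autograde(student_id: str) -> None:
    """Called when the run completes, allowing the student's next submission to start a job."""
    with _lock:
        _active_jobs.pop(student_id, None)
```

Two submissions arriving within the same minute both receive the same endpoint, so only one autograding run touches the student's resources.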

In this case it turned out alright for the student; his highest score was the one that stuck.

Week 8

98 words · 1 min read

Summary

March 2 - March 8

Meetings

SES production rejection

We are trying to enable magic-link logins to the instructional site for students. That requires “production” SES access. Our account, being new, only has “sandbox” access, limiting what email addresses we can send to. Our initial request for production access was rejected. We need to implement a system for handling email blocks and complaints, as well as add a privacy policy to the website.

Other Classes

This week I spent most of my time on my other classes, CS 354 and CS 381.

Week 9

198 words · 1 min read

Summary

3/9 - 3/15

Meetings

SES Production Access

I’ve re-opened the service request to get production SES access, allowing us to send emails from our domain cs351.cloud.

Obsidian Plugin

I’ve created an Obsidian plugin for writing Cloud Assignments. I’ll dogfood it when writing Cloud Assignment 3.

Student Access to instructional site over spring break

A student, ██████████████, requested some way to practice Cloud Assignment 2 over Spring Break.

Steps to enroll student:

Improvement: A cloud assignment can be in one of several modes:

Then, I enable “Practice” mode on a cloud assignment for the whole class. I would also record the mode in which each submission was made, so grading is accurate.

Week 10

653 words · 4 min read

Summary

3/16 - 3/22

Meetings

Accomplishments

Cloud Assignment Part 1

Cloud Assignment Part 1 is ready for release on Tuesday 3/24; it will be due Sunday 3/29 at 11:59PM.

Part 2 will be ready for release on Saturday 3/28; it will be due Friday 4/3 at 11:59PM.

Cloud Assignment Part 1 introduces students to

Planning out the rest of the semester

We don’t have many weeks left in the semester. I’d like to discuss the final deliverables for my Senior Project so I can manage my time effectively across all my courses.

Confirmed Deliverables:

Unconfirmed Deliverables:

Cloud Assignment 3

Plan complete, assignment in progress.

Cloud Assignment 4

Plan in-progress, group assignment where students deploy a vibe-coded application to a Kubernetes cluster.

Cloud Assignment handoff

I’d like to discuss how we’ll perform the handoff to whoever will run the Cloud Assignments next course session.

Instructional Website maintenance

I’d like to discuss who will maintain the instructional website after I’m gone.

Research Paper

I’d like to discuss what we might publish, and whether that paper should be delivered during the semester or during the summer.

Authoring Cloud Assignments

A concern raised in previous meetings was the method by which I author Cloud Assignments. Initially, I would edit assignment markdown documents in vim and run an Astro dev server in another terminal, allowing me to preview the rendered assignment in a web page. Obviously, this requires the writer to be comfortable working from the command line in a typical software engineering workflow. It was suggested that Obsidian is a convenient GUI for editing markdown documents and is familiar to existing instructional staff.

I’ve created a custom Obsidian plugin for authoring Cloud Assignments that lets a preview sit side-by-side with an editing pane. It integrates directly with the instructional site, and you can authenticate against the instructional site from within the plugin.

Obsidian

Features:

This is the first step towards a user-friendly platform for authoring Cloud Assignments.

Enabling e-mail for the Instructional Site

The Instructional Site is still stuck in the SES sandbox, and is unlikely to get approved for production access. I spent a few days trying to get the approval to go through.

In an effort to get the approval, I implemented:

I clearly outlined our use case, but it was still denied. The reasons for the denial were likely:

There are a few paths forward, all of which require addressing the above issues.

The first option is more practical, but encumbers the instructional site with additional services to depend on. The second option requires students click an email link within 24 hours and is subject to aggressive rate-limits. The third option is a sizable manual effort, with no guarantee that we will be approved.

We will have to discuss the trade-offs. For now, students will continue to use Gradescope to work on Cloud Assignment 3.

Week 11

2,522 words · 13 min read

Summary

3/23 - 3/29

Meetings

Accomplishments

Cloud Assignment 3 Part 2

Cloud Assignment 3 was completed this week. The final tasks of the cloud assignment are:

Part 1: N-Tier Architecture

Part 2: Squirrel-due University

I’ve kept a running list of obstacles and architectural decisions for Part 2.

The first task introduced HTTPS and SSL certificates. My initial idea was to set up Let’s Encrypt’s Certbot running on the proxy and have the students configure Nginx to read the certificates it generates. However, Let’s Encrypt limits issuance to 50 certificates per week for a single registered domain (our cs351.cloud), and we have 70 students in the class. It’s unfortunate, because it’s one of the most universal and accessible methods for implementing TLS encryption. Instead, I’ve had them put a CloudFront distribution in front of the proxy to perform TLS termination, although traffic between the CDN and the EC2 instance remains unencrypted. AWS Certificate Manager handles certificates for CloudFront and has high rate limits (2,500/year per account).

Future assignments incorporating HTTPS should give more thought to provisioning certificates. We would have to run a central server that manages DNS and coordinates with Let’s Encrypt using DNS-01. Students would interact with it through an API/UI to manage assignment DNS. The student experience could be enhanced by providing Amazon Machine Images (AMI) that automate this.

Deploying production applications is tricky, and not always easy to encapsulate in an assignment. That said, CloudFront+ACM is quite robust (I use it for most of my personal projects), so this assignment can stay the same for future sections. Whether it transfers neatly to other clouds, I’m not as sure.

Claude has gotten really good. Context size increased from 250,000 tokens to 1,000,000 tokens while I was developing this assignment, improving its ability to complete long tasks. I’ve also noticed its ability to explain complex topics has improved, and it has developed a propensity for creating charts and examples as part of these explanations. My crazy idea seems more possible every day; the key is understanding what type of scaffold (aka “agent harness”) is needed to deliver that type of experience.

I had to increase the YOLO model size from nano to large (that’s a size increase from 11MB to 75MB) because the nano wouldn’t detect squirrels. Curiously, the xlarge model was worse at detecting squirrels and birds than the large. It has to do with the open text embeddings we use to power this NLP-based image detection: smaller models can be trained to better detect these open-ended embeddings than large models. The increase in model size increased response times 10x, and memory use has increased quite a bit too. One time during inference the website returned a 503 because the thread had been killed for running out of memory; the t4g.small is not the right fit for ML applications. I rode my bike around Purdue on Sunday taking pictures of squirrels and birds, so the gossip protocol is seeded with images and locations. You can visit my application at https://ccouetil.ca3.cs351.cloud. The YOLO model is really bad at detecting birds and squirrels!

Here’s a screenshot of the new feature students implement in Part 2.

https://ccouetil.ca3.cs351.cloud/map/
bird.ai map

Keeping this assignment in the free tier was difficult. ECR Public vs Private size limits and rate limits caused issues, TLS certificate rate limits caused others, and disk space limits on EC2 instances were another constraint. Overall, I estimated the cost of this assignment to be around $35 if a student leaves their infrastructure running for 10 days outside of the free tier. The resources used in the assignment are:

The EXIF data stripping didn’t go quite as expected. iOS aggressively strips location data from images, even if you explicitly allow location access in a bunch of settings. I think the website would have to be a smartphone application in order to make the location stripping work effectively. It wasn’t feasible to do GeoIP for this assignment. To keep the EXIF intact for images I took with my iPhone, I had to enable location services and export the photos out of the Photos app onto my Desktop folder before uploading those files to the web application. I think this section should be tweaked for future versions of this assignment.
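For reference, EXIF stores GPS position as degrees, minutes, and seconds plus a hemisphere letter (N/S or E/W), so any map feature has to convert it to signed decimal degrees. A minimal sketch of that conversion follows; the function name is hypothetical, the coordinates are illustrative, and reading the tags out of an image file is not shown.

```python
def dms_to_decimal(degrees: float, minutes: float, seconds: float, ref: str) -> float:
    """Convert EXIF-style degrees/minutes/seconds and a hemisphere letter to decimal degrees."""
    value = degrees + minutes / 60 + seconds / 3600
    # South latitudes and west longitudes are negative in decimal notation.
    return -value if ref in ("S", "W") else value


latitude = dms_to_decimal(40, 25, 30, "N")
longitude = dms_to_decimal(86, 54, 36, "W")
print(round(latitude, 4), round(longitude, 4))
```

When a phone has stripped the GPS tags there is nothing to convert, which is the failure described above.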

I developed an instructional API server to support the DNS and gossip features. The instructional API server supports both:

the students:

and the autograder:

The API server uses an open-source Python library called FastAPI, and runs on a separate EC2 instance from the instructional site. I can’t really justify why I made this API separate from the instructional site; I just preferred that any supporting resources for a particular cloud assignment be separate from the instructional site itself, which is meant to support all cloud assignments. However, this is another deployment and maintenance burden when this cloud assignment is run again in future sections of the course.

Cloud Assignment 4

RCAC has gotten back to me; apparently I missed their initial message during Spring Break. I will try to get a meeting with them next week, and see what they can offer in terms of support for this assignment.

Let me consider what the two parts of CA4 will be.

Part 1: Migrating bird.ai to Kubernetes

Part 2: Vibe code something and deploy to Kubernetes

I believe at this point in the semester, if the students ran the architecture from all assignments for the full 10 days of each release period, we would be about 2/3rds of the way through the $100 AWS Free Tier.

I will research AWS EKS (Elastic Kubernetes Service) this week.

Cloud Assignment 5 ideas

This will not happen this class section, but in my conception CA5 would be implementing a function-as-a-service on Kubernetes.

Learning outcomes:

Task list:

Advanced Cloud Assignments

Cloud assignments could be adapted for the graduate course. Instead of detailed starter code, the assignment would contain guidelines, and students would have to code it all themselves, using autograder feedback to guide their implementation to the correct solution. Basically, they vibe code their way to the solution with no starter code, just the goals, principles, and deliverables expressed in prose, diagrams, rules, and tables.

Riffing on IEEE Spectrum Article

The article provoked some thought. What does the future of software look like? Or computational products more broadly?

I love tennis. Let’s train a video model to perform line calls and score tracking for a tennis match. Use a simulated world model of a tennis match to RLHF the video model.

This paper built a tennis game simulation to train a physical robot to play tennis, and successfully. They open sourced the code.

https://zzk273.github.io/LATENT/static/scripts/Humanoid_Tennis.pdf

The training pipeline from the open-source repository is:

  1. Motion capture 5 amateur tennis players
  2. Convert human poses to robot joint angles using LocoMuJoCo
  3. Train a movement policy using reinforcement learning in a physics simulator (MuJoCo)

Additional steps taken but not released as open-source are:

You would partner with Schwartz Tennis Center to bootstrap the real human play video data used to (1) categorize a larger variety of play styles the robot can learn from (2) fine-tune the general purpose video model to a particular environment, perspective, or video device, using diffusion models that alter each frame of the simulation video to match the target environment. It’s about developing a data flywheel.

Once an accurate and successful model exists, you would use the NVIDIA Deep Learning Accelerator (https://nvdla.org/) to design a chip that does this at low power, and explore deploying it in different form factors: with a solar panel, on a drone, etc. Live video feeds, line calls, and score tracking would stream directly to your phone through Wi-Fi and Bluetooth during a match, and be backed up to the cloud using the phone as a proxy, to be accessible at any time.

What is the future of software? Are we sick of our screens? Is our most important work embodied in the real world? Are sensors and robots the future of the technology industry, is SaaS doomed? What happens when “I’m a coder” means “I tweak AI model architectures” instead of “I tweak database queries”? Is prompting doomed as an inexact science, will it be replaced by a training data set and reinforcement learning policy?

Who knows, but this is all pretty fun. I created a tennis court in the simulator and watched the rendered model run around. I didn’t pull the trigger on performing a training run, instead using human motion reference data, but it would have cost me around $15-30 over 1-2h renting a p4d.24xlarge (8x A100 GPU) on the AWS spot market.

I will confidently claim (and put my ignorance on show for fun) that:

Cloud Assignment Development Process

I’m concerned that my development process for cloud assignments will not scale beyond me, or be easily inherited by other instructional staff. In fact, I’m quite certain of it. Let’s list some challenges

If this course continues to move in the direction we’ve established, we’ll need to build additional tools that help instructors maintain the assignments. Let’s list what assets an assignment needs, and consider what any additional tools may look like.

Assignment deliverables:

Helpful tools for instructors:

Student Experience:

Regardless, the assignment development process should be standardized and fully integrated with the instructional site.

Week 12

4,970 words · 25 min read

Summary

3/30 - 4/5

Meetings

Accomplishments

Cloud Assignment 3 Retrospective

This assignment took two-thirds less time to complete than the last. However, students procrastinated about 25% more on this assignment: almost half of all submissions occurred on the due date, and the average time of day at which students submitted shifted a few hours towards night. 78% (53) of students achieved a perfect score, exactly the same as Cloud Assignment 2.

Assignment Analysis

Let’s take a look at our Cloud Assignment analysis page.

https://www.cs351.cloud/assignments/2/
chart

Average time was shorter this cloud assignment, and the confidence interval is tighter.

https://www.cs351.cloud/assignments/2/
chart

I’ve added our class time to the course data model.

https://www.cs351.cloud/assignments/2/
chart

The submission timeline chart now displays vertical lines showing when 50% of students submitted or completed the cloud assignment, marking how much time before the due date the event occurred. 50%+ of submissions occurred 2d 1h 39m before the due date, and 50%+ of completions occurred 7h 37m before due.
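A minimal sketch of how such a 50% marker could be computed, assuming each student contributes one timestamp per event (first submission, or first perfect score). The function name and the sample dates are hypothetical, not taken from the instructional site's code.

```python
import math
from datetime import datetime, timedelta


def half_reached_lead_time(event_times, due):
    """Return how long before `due` at least 50% of students had reached the event.

    `event_times` holds one datetime per student (hypothetical input shape).
    """
    ordered = sorted(event_times)
    # Index of the student whose event pushes the cumulative share to >= 50%.
    idx = math.ceil(len(ordered) / 2) - 1
    return due - ordered[idx]


# Hypothetical due date and submission times, for illustration only.
due = datetime(2026, 4, 3, 23, 59)
times = [due - timedelta(days=5), due - timedelta(days=3),
         due - timedelta(days=2), due - timedelta(hours=1)]
print(half_reached_lead_time(times, due))
```

The same function works for submissions and completions; only the input timestamps differ.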

https://www.cs351.cloud/assignments/2/
chart

We now have a Student Planning section, where I provide actionable recommendations for students to improve their score. I want students to improve by helping them manage their time more effectively. It is hard to judge how long a computer science assignment will take. By examining students who scored a 100 on the assignment, and taking the median of how many days before the due date they started, the number of working sessions, and the time spent per session, we can give data-driven advice to students on when to start working and how much time to expect to spend.

https://www.cs351.cloud/assignments/2/
chart

Looking at the data, the advice for Cloud Assignment three is very reasonable: “Start at least three days before the due date and plan three working sessions each about an hour in length.” I’ve already shared this with the students in Ed.
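The median-based advice can be sketched as follows. This is an illustration under assumptions: the record fields (`score`, `start_days_before_due`, `sessions`, `minutes_per_session`) are hypothetical names, and the sample records are made up rather than real class data.

```python
from statistics import median


def planning_advice(students):
    """Summarize how perfect scorers worked, using medians.

    `students` is a list of dicts with hypothetical keys:
    score, start_days_before_due, sessions, minutes_per_session.
    Returns None when nobody scored 100.
    """
    perfect = [s for s in students if s["score"] == 100]
    if not perfect:
        return None
    return {
        "start_days_before_due": median(s["start_days_before_due"] for s in perfect),
        "sessions": median(s["sessions"] for s in perfect),
        "minutes_per_session": median(s["minutes_per_session"] for s in perfect),
    }


# Made-up records for illustration.
sample = [
    {"score": 100, "start_days_before_due": 3, "sessions": 3, "minutes_per_session": 55},
    {"score": 100, "start_days_before_due": 5, "sessions": 2, "minutes_per_session": 70},
    {"score": 100, "start_days_before_due": 1, "sessions": 4, "minutes_per_session": 60},
    {"score": 60, "start_days_before_due": 0, "sessions": 1, "minutes_per_session": 200},
]
print(planning_advice(sample))
```

Using the median rather than the mean keeps a single marathon session or a very early starter from skewing the recommendation.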

Characterizing Student Performance

I’m always looking for measurements of how students compare to each other, so I can (1) identify what adequate class performance looks like so I can help students manage their assignment workload, and (2) identify outliers, or inadequate performers, so we can decide to take pro-active action to help a student or reflect on our assignment experience more broadly.

https://www.cs351.cloud/students/?course=1
chart

Effort vs Outcome compares students’ scores and estimated assignment times averaged across assignments, separating students into four quadrants. We do not need to worry about most students in Efficient or Strong. Outliers in Efficient suggest cheating; outliers in Strong indicate conceptual misunderstanding or inability to satisfy assignment pre-requisites (tools, hardware, permissions, money). Disengaged students are your classic C-students; to improve their performance you have to address and encourage them as a group. Struggling students merit individual attention. They are spending time on assignments but persistently failing.
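A small sketch of the quadrant assignment. The mapping of names to quadrants is inferred from the descriptions above (Efficient: high score, low time; Strong: high score, high time; Disengaged: low score, low time; Struggling: low score, high time), and the cut-off values are placeholders, not the thresholds the site actually uses.

```python
def classify(avg_score, avg_hours, score_cut=90.0, hours_cut=4.0):
    """Place a student in an Effort vs Outcome quadrant.

    score_cut and hours_cut are assumed thresholds for illustration.
    """
    high_score = avg_score >= score_cut
    high_effort = avg_hours >= hours_cut
    if high_score:
        return "Strong" if high_effort else "Efficient"
    return "Struggling" if high_effort else "Disengaged"


for score, hours in [(100, 2.0), (98, 6.5), (70, 1.0), (65, 9.0)]:
    print(score, hours, classify(score, hours))
```

In practice the cuts could be set from the class medians so that the quadrants adapt to each assignment's difficulty.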

https://www.cs351.cloud/students/?course=1
chart

Nineteen students (28% of the class) do not have a perfect score on both CA2 and CA3. Assignment trajectories demonstrate how those students’ performance changed between cloud assignments. I’ve categorized students into groups based on trajectory features and quadrant features. I’ll discuss the features with the largest number of students: Improving, Declining, and Lost. Eight students were Improving and saw their score increase from CA2 to CA3. Ten were Declining, and saw their score decrease. Five are Lost and scored significantly worse on CA3 than on CA2.

https://www.cs351.cloud/students/?course=1
chart

I also collect the score progressions for individual students in a chart sequence on the student list page.

https://www.cs351.cloud/students/
chart

API Data from “Gossip Squirrel”

CA3 required special instructor-hosted infrastructure to enable in-assignment features, namely, a DNS interface for students and the coordination server for the gossip protocol. I’ve realized this may be part of future assignments, and is another opportunity to collect data on student engagement with the assignment.

I created a CA3 Analytics Dashboard in the instructional site to demonstrate different engagement data visualizations.

https://www.cs351.cloud/ca3/
ca3 analytics

Performance was very stable, no errors and average response time <100ms. The API server was a Python server using the FastAPI framework with a SQLite database. It exposed an /analytics endpoint to sync data to the instructional site. Authentication was enforced on the student side using their AWS Account ID and External ID, and on the instructional site side with a secret API token.

https://www.cs351.cloud/ca3/
ca3 analytics

It’s fun to see what domain names students chose for their deployment.

https://www.cs351.cloud/ca3/
ca3 analytics

The Gossip feature of the assignment had low participation. Only two out of 53 students uploaded their own location-enabled picture. I had hoped more of the class would make an effort to take a photo of a bird or squirrel on campus. The next time the class runs, I would demo the feature in class to encourage students to try it themselves; I didn’t really encourage them at all this semester.

https://www.cs351.cloud/ca3/
ca3 analytics

I also kept a live activity feed so I could debug any student issues that came up, but I didn’t need to use it.

Ed Issue Summary

I’ve improved the AI Summary interface for the Ed discussion board. You can select assignment categories, then trigger a run of multiple models at the same time.

Remember, LLMs take on the tone of the content they summarize. This assignment was a success for most students. Students who post on Ed are generally exceptions encountering issues. With that said, I found Gemma 3’s analysis the best. Let’s examine its high-level view of the student experience of CA3.

Student Experience: Initial stages (Part 1 setup) seemed relatively smooth, with questions focusing on clarification of due dates (#98) and initial confirmation of tools being in place. However, as students progressed, particularly into Part 2 and the more complex integration of Terraform, Ansible, and image/model updates, the difficulty increased significantly. Several students reported issues with infrastructure setup and configuration, leading to frustration and attempts to restart from scratch (#123, #125). There’s a clear pattern: initial enthusiasm and understanding at the beginning, followed by increasing complexity and more frequent issues requiring support. Students seemed to appreciate the detailed assignment documentation, but the depth of integration required between multiple tools proved challenging. The late-stage issues suggest that time management was also a concern, as one student admitted to running out of time and attempting to revert to a previous submission (#129).

The summaries also include an analysis of (1) Common Issues & Errors, (2) Assignment Text Improvements, and (3) Instructor Action Items. It’s nice scaffolding to kick-start a reflection on a Cloud Assignment.

https://www.cs351.cloud/ed/ai-summaries/?category=Assignments&subcategory=CA3
ai summaries
https://www.cs351.cloud/ed/ai-summaries/run/1/
ai summaries

I also had Claude summarize Ed usage data. This is easy, because all Ed data is synced to our instructional site, even more than the usual interface. I’m able to track deleted posts, determine what time posts occurred, and derive additional statistics.

CA3 has roughly half the volume of CA2 but higher reply density per thread (4.9 vs 3.6). The AI auto-reply system was active for CA3 (21 runs, 18 bot comments synced back), while CA2 had minimal AI involvement (3 runs, 0 synced). Answered rate is much lower for CA3 (39% vs 81%), and engagement (views, votes) is down.

https://www.cs351.cloud/ed/dashboard/
ed dashboard

Assignment Flaws

The terraform init section was misleading: I had reorganized the assignment and the section became out of date. The sequence of commands needs updating. For example, I asked students to inspect their Terraform state file before they had generated it.

Sometimes it’s not clear when a student is being asked to perform a task. At the moment, the cloud assignments test students’ reading comprehension. I don’t think this is a big issue, and we can improve this with better automation anyway.

We need more direct feedback from students after each Cloud Assignment.

Office Hours

Office hour participation was low. I had one student attend. It was valuable to watch her move through the assignment, and be able to ask her questions.

Students who used Ed reply bot after my office hours on the last day:

Half of the last-minute Ed posters achieved 100/100. It’s not clear whether the auto-reply bot helped them, as neither of them responded to any of its replies. Of the other two students, only Andrew replied to the reply bot; he used it three times.

Grading High Scores

I’m promising students we give them their highest received score, so I have to make sure that’s the case. Do I need access to Brightspace grading or do I already have it? I might be able to do this manually in Gradescope; it wouldn’t take much time.

Data-driven CA4 Groups

For CA4, any student that didn’t get 100 (15 of them) should be spread across all the groups. We’ll do a random grouping of the full scorers, and then distribute the partial scorers.
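One way the grouping could be done, as a sketch: shuffle the full scorers and deal them round-robin into groups, then deal the partial scorers starting from the opposite end so group sizes stay balanced. The function name, the seed, and the group count of 15 are assumptions for illustration.

```python
import random


def make_groups(full_scorers, partial_scorers, n_groups, seed=None):
    """Randomly group full scorers, then spread partial scorers across groups."""
    rng = random.Random(seed)
    full = list(full_scorers)
    partial = list(partial_scorers)
    rng.shuffle(full)
    rng.shuffle(partial)

    groups = [[] for _ in range(n_groups)]
    # Deal full scorers round-robin; the first groups may end up one larger.
    for i, student in enumerate(full):
        groups[i % n_groups].append(student)
    # Deal partial scorers from the last group backwards, so the smaller
    # groups are topped up first and sizes differ by at most one.
    for i, student in enumerate(partial):
        groups[n_groups - 1 - (i % n_groups)].append(student)
    return groups


# Hypothetical roster: 53 full scorers, 15 partial scorers, 15 groups.
full = [f"full-{i}" for i in range(53)]
partial = [f"partial-{i}" for i in range(15)]
for group in make_groups(full, partial, n_groups=15, seed=351):
    print(len(group), group)
```

With 15 partial scorers and 15 groups, each group ends up with exactly one partial scorer.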

The groups will schedule a time to meet with me for 15 minutes every week, either in-person or on zoom.

CS351 Ed ReplyBot

I’m experimenting with automating Ed replies. I’ve implemented an AI reply bot that responds to every thread, and if the student responds to its reply, it continues generating responses, as if they were having a conversation.

Originally I was using deepseek-r1:70b, a reasoning model that gave high-quality responses, but it has high latency on the GenAI cluster: requests were timing out or taking ~5 minutes to complete. I switched to gpt-oss:120b, which is quite fast, with ~10s of latency, and medium quality. It tends to guess and mislead students for two reasons: (1) it doesn’t have reasoning enabled, and (2) it doesn’t have the context of a student’s code. It’s a great model to use for this experiment though: it is very reliable, and it likely has more cluster resources allocated to it than other models. I’ve just recently switched to gemma3:27b, and I really like its writing style. I’m realizing some models are better at the “frontend” (talking to users), and others are better at the “backend” (adhering to instructions).

There are some limitations to my implementation. (1) Student posts can contain images. The bot will only see the URL of the image, not the image content, and it has no tools to inspect images. This makes it harder for the model to understand the full context of the student question when students share screenshots of their terminal. (2) The RCAC cluster enforces a rate limit of 20 calls per minute.
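To stay under a limit like 20 calls per minute, the bot could throttle itself on the client side. The sketch below is a generic sliding-window limiter, not the ReplyBot's actual code; the class name and the injectable `clock` parameter are my own choices for illustration.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` within any rolling window of `period` seconds."""

    def __init__(self, max_calls=20, period=60.0, clock=time.monotonic):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._calls = deque()

    def wait_time(self):
        """Seconds to wait before the next call is allowed (0.0 if allowed now)."""
        now = self.clock()
        # Forget calls that have aged out of the window.
        while self._calls and now - self._calls[0] >= self.period:
            self._calls.popleft()
        if len(self._calls) < self.max_calls:
            return 0.0
        return self.period - (now - self._calls[0])

    def record(self):
        """Note that a call was just made."""
        self._calls.append(self.clock())


limiter = SlidingWindowLimiter(max_calls=20, period=60.0)
delay = limiter.wait_time()
if delay > 0:
    time.sleep(delay)
limiter.record()  # ...then send the request to the model endpoint
```

A caller would check `wait_time()` before each model request, sleep if needed, and call `record()` once the request is sent.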

I’ve considered some other improvements: (1) switching to more up-to-date, high-performing open-weight models like Google’s Gemma 4, StepFun’s Step 3.5 Flash, and Kimi’s K2 Thinking. (2) I could keep a running AI-generated summary of all posts and issues for a category on Ed. The summary would have two sections: a summary of unresolved posts, and a summary of resolved posts, giving the AI a running idea of the collection of issues students are experiencing now and how previous issues were solved. Each new post and comment will get sent to the AI with the existing summary, and the AI returns an updated summary that incorporates the new information, which will then be included as context by future AI auto-replies. AI software systems are a new way of thinking about data!

An additional feature I implemented was transforming the markdown that LLMs emit into the XML format Ed uses for its comments, so that an LLM’s natural output format looks nice. This is more robust than getting the LLM to try to output Ed’s XML format, something it hasn’t been trained on. Parsers are everywhere in AI applications.

Let’s examine a student’s interaction with the AI auto-reply feature. ███████ posted a Terraform question on Friday morning while I was playing tennis. He had correctly noticed that Terraform was going to delete and recreate the credential he had made and attached to an EC2 instance manually. He was wondering if this was intended and if he had to modify the Terraform code to avoid this. Now, the AI replied, but because it lacked the source code in its context, it misled him by requesting that he update the provided Terraform code. Then, he replied to the AI message with more debugging information! This is exactly what I wanted!

Hilariously, he ended his message with “I’d like to talk to a human if possible.” It’s important we provide frequent opportunities for students to interact with instructional staff throughout the semester, even if they don’t take advantage of it at first. We should not automate all human interaction; we should provide students tools that give them feedback. ███████ was clearly frustrated, partly because he was working on the assignment on the last day, and partly because the AI output was misleading and quite long (the classic “AI sounds right but is actually wrong”). Clearly, the reply bot’s prompt should be updated according to student feedback on how helpful the response was, but the mechanism is not clear.

I noticed students ask a question on Ed, then delete their post. This happened 3 times during CA3. The ReplyBot is perfect for these types of posts. Students get quick feedback without feeling embarrassed while waiting, and the rest of the class gets to benefit from their question, improving the class success rate.

Automating Cloud Assignment Development

A past weekly report discussed splitting Cloud Assignments into two parts. The benefits were (1) practical, splitting work into smaller chunks with more frequent due dates, and (2) structural, “Part 1” implements a reference architecture, while “Part 2” implements an application feature enabled by that architecture.

Now, how do you automate creativity? It’s hard creative work to create a cloud assignment. When you have a hard problem, split it up into smaller easier problems. “Part 1” is actually well-described. “You will implement N-Tier architecture” simply means Part 2 will have a proxy, application server, and database it can use to solve a problem.

Let’s describe additional architectures:

An instructor would select an architecture that has been developed separately, and the AI will adapt that architecture to help them create a “Part 2”.

I’ve discussed ideas for “Part 2” earlier, but I believe this is the tougher problem. Solving the easier problem of “Part 1” reference architectures is the right starting point for creating a curriculum development tool.

Claude Code Source Code Leak

Claude Code had its source code leaked because JavaScript source maps were accidentally published alongside a new build release. We can use Claude Code as a reference point for how to create a high-performing agent harness to drive the student experience of cloud assignments.

RCAC Meeting

We had an RCAC meeting on Friday 4/3 at 9am with Dane Deemer. I reached out on 2/27 at the behest of Justin Gillingham to seek help and resources with Cloud Assignment 4.

Two topics were discussed:

Dane has been at Purdue since 2020, he does computational biology. He’s been at RCAC since October 2025. They use the Rancher GUI to manage their Kubernetes clusters. There are two clusters at RCAC: Anvil, an NSF resource (Dane works on this one), and Geddes, a Purdue cluster. Dane suggested Geddes is the better choice for this class. Part of the Rancher GUI provides a link to a kubectl configuration that lets students deploy directly to the cluster. They can use “Minikube” for running a cluster on their local computer, then deploy it separately to RCAC.

We informed them we will have ~15 student groups who need access to the cluster, and suggested resources on the scale of ~8GB RAM and ~80GB disk each. The assignment will run for 17 days, Dane suggested it should be easy to get a month of access.

They have some type of mechanism to enable DNS for custom domains, and we should be able to take advantage of that. They have IP whitelisting and other firewall tools we can configure for the cluster if needed. Purdue is pushing RCAC customers to use their object storage, built using a Ceph cluster that provides distributed block storage.

Dane can be reached by email, ddeemer@purdue.edu, and he said he’ll reach out to the Geddes team and the team running the GenAI studio. I suggested adding Gemma 4 to the cluster and explained to him how we are using GenAI studio to help manage the course.

I have a good feeling about this, we’ll see!

Supporting Student Success

When I mentioned the estimated time to complete Cloud Assignment 3 in class (it should take 2-4 hours), I saw a female student breathe a deep sigh of relief. Giving students some indication of how long an assignment takes, so they can manage their time more effectively, will reduce a lot of the stress that students have.

I’ve discussed instructor tools, and student experiences of assignments, but what other student tools can we provide? I have some ideas:

Gradescope failures

We had Gradescope failures. One of the issues is that there are not enough resources for the job queue on Prod. I will upgrade. The analysis document is in the repo: docs/2026-04-02-ca3-autograde-timeout-analysis.md

I’m using the AI to analyze past usage patterns and judge how much we need to scale the service to accommodate the increased load. Call it an AI-assisted informed guess. It’s looking at the past peaks in load on the last due day to judge what we should expect on Friday as students rush to submit. At some point in the future, this process will be automated, and auto-scaling will happen automatically based on past usage data.

This task is made easier because I tracked so much data. If (1) you track the right data and (2) the AI has tools to query that data, you can answer a lot of tricky questions quickly and robustly, and you pave a path towards further reflection and automation.

The problem was that autograding jobs were starved of resources due to high-latency AI jobs filling the single queue. The fix was to increase the priority of autograding jobs relative to AI related jobs, add another worker process to the worker node in production, and increase the thread limit per worker.
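The prioritization part of the fix can be illustrated with a small priority queue. This is a sketch of the general technique, not the worker code running in production; the `AUTOGRADE` and `AI` priority constants and the `JobQueue` class are hypothetical names.

```python
import heapq
import itertools

# Lower number = served first (assumed convention for this sketch).
AUTOGRADE = 0
AI = 1


class JobQueue:
    """Single queue where autograding jobs jump ahead of AI jobs."""

    def __init__(self):
        self._heap = []
        # Tie-breaker that preserves FIFO order within one priority level.
        self._counter = itertools.count()

    def put(self, priority, job):
        heapq.heappush(self._heap, (priority, next(self._counter), job))

    def get(self):
        _priority, _order, job = heapq.heappop(self._heap)
        return job

    def __len__(self):
        return len(self._heap)


queue = JobQueue()
queue.put(AI, "summarize-ed-thread")
queue.put(AI, "reply-bot-run")
queue.put(AUTOGRADE, "grade-submission")
while queue:
    print(queue.get())
```

Priority alone does not help once every worker is busy with a long AI job, which is presumably why the fix also added a worker process and raised the thread limit.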

We had no more errors after those updates were deployed.

Refunds

A student spoke with me after class. He received a bill from AWS for January, February, and March, and he’d like to be reimbursed if possible. Here are the totals for each month:

█████████: (███████@purdue.edu)

It would be nice if we could generate a cost report alongside each finished cloud assignment, representing how much money it cost (1) students and (2) instructors to run the assignment.

Cloud Assignment Features

How can we make cloud assignments an exceptional student experience?

VibeCheck component for student feedback

Students can submit a sentence that describes how they feel at this point in the assignment, or in reply to a prompt. Other students can then vote on the response they vibe with the most, or add their own new response to the list. Rich source of student engagement data, while also giving opportunities for students to be fun and social within a class assignment, so they don’t feel so alone.

Before the assignment begins, instructors would create 2-3 VibeCheck prompts to ask the students at various points in the text. These VibeCheck prompts should be designed to answer a particular question the instructors have, to get feedback on ideas from students, or be an opportunity for students to have fun making memes. The assignment should always end with a VibeCheck soliciting feedback on the particular cloud assignment.

Having a meme VibeCheck is kind of like how the New Yorker ends each magazine with a “Submit a Caption” contest. It’s one of the best parts of the magazine.

At the end of the assignment, the instructors would review the leaderboard and reflect on their initial questions and the student experience of the assignment.

This is my leaderboard idea, but instead of best score, it’s best memes. There is prior art, the teaching assistants for CS 381 run a meme contest in the class Ed discussion board after every midterm exam.

Reflection on learning outcomes

During my impromptu office hours, I was helping a student, █████████, debug issues with the ECR portion of the assignment. I found myself listening to her experience, and guiding her through the assignment. I saw a small portion of the assignment through her eyes, and it was illuminating. Clearly, we must sit with users and work with them through an assignment to improve the experience. I found myself discussing the <Reflect> section titled “Food for thought”. The section asks the student to reflect on the steps they took to deploy in CA2, and what they just did in CA3, and to compare the two. The point I wanted to drive home was that all the manual steps we performed in CA2 (docker build, make swap, set authorized_keys) are now done in one Ansible command.

What can we draw from this experience? A student was working through a section of the text. I narrated and guided part of the assignment for her. We landed on a <Reflect> component. I identified a key learning outcome that should result from that reflection. I asked the student to reflect, and we talked until I brought her to understand the learning outcome.

I thought this was a valuable learning experience, and I considered how I would use AI to scale this interaction to all our students, as I can only sit down and guide one student at a time. I believe the key is recognizing I wanted the student to state the learning outcome herself, so I could be sure she knew it.

If each assignment had a set of learning outcomes, we could make <Reflect> components for each one. As the student progresses through the assignment, accumulating points, the AI introduces the reflections. A reflection is a separate, but stateful conversation reflecting on the question posed by the prompt. The student has to converse with the AI by text or voice until they are able to state the learning outcome. The AI will teach the concepts and guide them during the conversation.

A student will gain points from sections separate from these reflections, but they must finish every reflection in the assignment in order to have the points count towards a grade. That means reflections are required but earn no points, and if you don’t answer them all, you get a zero on the assignment.
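The grading rule above is simple enough to state in a few lines. This is a sketch of the proposed rule only; the section and reflection names are made up.

```python
def final_score(section_points, reflections_done):
    """Points count only if every reflection in the assignment is finished.

    section_points: dict of section name -> points earned (hypothetical shape)
    reflections_done: dict of reflection name -> True/False
    """
    if not all(reflections_done.values()):
        return 0
    return sum(section_points.values())


points = {"part-1": 40, "part-2": 60}
print(final_score(points, {"food-for-thought": True, "wrap-up": True}))
print(final_score(points, {"food-for-thought": True, "wrap-up": False}))
```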

Developer Website and CLI

This project would benefit from having a web application and command line interface product-izing how the instructional site is developed. Basically, build a web app that helps you build a web app.

Will help developers onboard if we get interns, teaching assistants, or staff. Enables flywheel for AI-powered product development by creating and providing tools specific to our use-case for use by coding agents.

Tools:

I’ve noticed the AI guesses at commands and working directories a lot. It loses track of where it is in the file system, and when there are multiple ways to use a tool, conventions can be ambiguous causing it to fail until it finds out “how we do things around here”.

If one person continues to work on this application, it’s practical for tools to remain ad-hoc. I believe there is real value in building an interface to query application logs, however. The instructional site has a long way to go before it’s properly observable.

For all our progress, the instructional site remains a proof-of-concept. There is a lot more work to do on it.

UI Polling traffic

Website traffic went from ~200 requests/day to ~20,000 requests/day because I implemented one user interface feature. I added a CA3 infrastructure analytics page that polls the server for updates, and left 1 tab open all day. I’ve separated this traffic from normal user traffic in the “Request Log” page, where it’s labeled as “HTMX”, the JavaScript library used to perform this polling update.

Real-time updates are good. I want the website to check for updates. Clearly, real-time updates massively increase request volume to the application. How do we mitigate risks as we add more real-time features?

  1. Have a global polling loop on the instructional site web page so polling events are batched and request number is minimized. We will collect metrics on load and request volume based on the polling parameters, and surface it in a UI, so we can tune that polling interval/behavior over time.
  2. Add appropriate HTTP headers to cache HTML partials used in real-time update requests, reducing load on origin servers. This may be fine-tuned according to SLAs (update frequency) and projections (active daily users).
  3. Create our first automated benchmark test, which load tests only those pages with the real-time polling features. These load tests are actually feasible: they don’t test complex business logic, they simply test if polling requests for updated state complete successfully and quickly. Those are easy scenarios to create with database mocks, as we already do for our unit tests.
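To reason about how polling parameters drive request volume, a back-of-the-envelope model helps. The sketch below assumes each poll is a single HTTP request and the tab polls for the whole time it is open; the real page may issue several requests per poll, so the implied interval is only a rough estimate.

```python
def polling_requests_per_day(open_tabs, interval_seconds, hours_open=24.0):
    """Estimated daily requests from tabs that poll at a fixed interval."""
    return int(open_tabs * hours_open * 3600 / interval_seconds)


def implied_interval(requests_per_day, open_tabs=1, hours_open=24.0):
    """Polling interval (seconds) that would produce the observed volume."""
    return open_tabs * hours_open * 3600 / requests_per_day


# One tab left open all day producing ~20,000 requests/day implies
# a poll roughly every 4.3 seconds under these assumptions.
print(implied_interval(20_000))
# Fifty students each keeping one tab open for 2 hours at a 5 s interval:
print(polling_requests_per_day(open_tabs=50, interval_seconds=5, hours_open=2))
```

Doubling the interval halves the volume, so the interval is the cheapest knob to tune before adding caching or batching.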

This class is running pretty well on two t4g.small instances (app+worker), I’m impressed. We’ve bumped up against the limits a few times now, but most of the time we’re feeling cozy.

Expanding Submission Materials

I met with a student, ███████████. He was stuck at the “Container Repository” section. He was having trouble pushing to the ECR repository. He had created a docker compose file, and instead of building Nginx, he had been building bird.ai. Bird.ai is too big to fit within AWS Free Tier limits, and too big for ECR public repository limits. This is an easy mistake to make; it’s logical to assume bird.ai is the next part to work on, even if the assignment states to build the “proxy”. In fact, I was originally planning to make this the path through the assignment until I ran into AWS free tier limits and had to change my plan.

Why was AI unhelpful in debugging this? The error did not lie in his submitted credentials, AWS snapshot, or EC2 instance state. The error was in the docker compose file on his laptop. Andrew made an interesting suggestion. Why don’t students submit their whole working directory to the autograder to get graded, not just the credentials file?

I like this idea for three reasons:

  1. We can verify student file structure matches what is assumed by the instructions in the cloud assignment.
  2. We can provide additional checks and assign points based on their written code, not just deployed infrastructure, even running unit tests easily! (which I’d like to do now but is tricky in our current setup).
  3. We can provide all their code (or a diff against the starter code) to an AI to help them debug. This is the missing piece for any GenAI models to have the full context of what they’ve been working on.
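The third point, a diff against the starter code, could be produced with Python's standard `difflib`. This is a sketch assuming both trees have already been read into dicts mapping relative path to file text; the file name and contents in the example are made up to mirror the bird.ai mix-up.

```python
import difflib


def starter_diff(starter, submitted):
    """Unified diff of a submission against the starter code.

    Both arguments map relative file path -> file text (hypothetical shape).
    Files missing on one side are treated as empty.
    """
    chunks = []
    for path in sorted(set(starter) | set(submitted)):
        old = starter.get(path, "").splitlines(keepends=True)
        new = submitted.get(path, "").splitlines(keepends=True)
        chunks.extend(
            difflib.unified_diff(
                old, new,
                fromfile=f"starter/{path}",
                tofile=f"submitted/{path}",
            )
        )
    return "".join(chunks)


starter = {"docker-compose.yml": "services:\n  proxy:\n    build: ./nginx\n"}
submitted = {"docker-compose.yml": "services:\n  proxy:\n    build: ./bird.ai\n"}
print(starter_diff(starter, submitted))
```

A diff like this is usually far smaller than the full working directory, which matters when it has to fit in a model's context alongside the assignment text.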

I might implement this for Cloud Assignment 4, I have to think about it. It will be a good way for us to inspect the vibe-coded application code the students created, and after their presentation give them targeted/generated feedback based on the actual application code they submitted. Basically, finish the assignment, present your demo, and then receive a code review as feedback.

How to inspire new cloud assignments

There are a few universities that introduce Kubernetes in their computer science coursework, I’ll highlight a few, I’m sure there are more.

Undergraduate courses:

Graduate courses:

None of those courses have our auto-grading approach. Student solutions are manually reviewed as they have small class sizes.

How do we test out new cloud assignment ideas? We host a hackathon in the Fall, where students select from 1-5 new cloud assignments, spend 1-2 hours going through them with a group, and spend the rest of the hackathon using what they learned from the assignment to build whatever they want. At the end of the hackathon we get data about (1) what new cloud assignment is most popular with students (2) feedback about student experience of a new cloud assignment (3) issues highlighted in a non-graded setting way ahead of when the class would run them in the spring. Additionally, this hackathon acts as advertising for the Spring class, and would increase enrollment and visibility.

One of NYU’s projects is quite interesting. It benchmarks different ML algorithms across hardware so you can make informed system architecture decisions. It is similar to ours in that they have students deploy an ML microservice on Kubernetes.

Inspired by NYU’s class, we could create an additional Cloud Assignment series. It will emphasize benchmarks and observability for software deployed on different cloud systems.

Week 13

1,860 words · 10 min read

Summary

4/6 - 4/12

Meetings

Accomplishments

CA4

Cloud Assignment 4 took a lot of iteration. The Geddes cluster was unwieldy (students would have to use Purdue VPN), on-par cost-wise with AWS EKS (final accounting TBD), and fatally, had scheduled downtime during the two busiest days for student submissions this assignment.

I transitioned to AWS EKS and built a CA4 dashboard that provides monitoring and control of the assignment experience for students.

https://www.cs351.cloud/ca4/
ca4 dashboard

I then designed the CA4 assignment to be a straightforward Strangler Fig migration. We had discussed (1) making this assignment a group assignment, and (2) making the second part creative, where students either deploy to the Kubernetes cluster an open-source application and create an RFC for a contribution to the project, or vibe-code their own application and create an elevator pitch demo video for it. I was time-limited and we reduced scope, and I believe the student experience this semester will be the better for that.

Our EKS cluster provides a namespace for each student, representing a claim on resources in the cluster. Students authenticate against the cluster using their personal AWS account. Because we know their account ID, we can grant them cross-account access limited to only their namespace and an ECR repository, and nothing else in the instructional account. A student namespace provides 1.5 vCPU and 2.5 GiB of RAM on average, with the ability to burst up to 6 vCPU and 9 GiB of RAM. Students are also limited to 15 pods, 5 services, and 3 volume claims.

The baseline cluster capacity rests on three t3.2xlarge nodes, which give us the best bang-for-buck capacity vs cost for this assignment. t3.2xlarge nodes are Intel Xeon Platinum 8000 processors with 8 vCPU, 32 GiB memory, and 40% baseline performance, costing $0.3341/hour. I added auto-scaling, primarily as a cost control measure, as the majority of student resource use occurs in the last three days of the assignment period, and I didn’t want to manually manage the cluster size. The maximum size of the cluster is set to 13 of the t3.2xlarge instances.
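The capacity and cost figures above can be checked with some quick arithmetic. The sketch uses only the numbers stated here (8 vCPU, 32 GiB, $0.3341/hour per node; 1.5 vCPU and 2.5 GiB per namespace on average) and deliberately ignores system pods, per-node overhead, and the T3 CPU credit model, so the namespace counts are optimistic upper bounds.

```python
NODE_VCPU = 8
NODE_MEM_GIB = 32
NODE_PRICE_PER_HOUR = 0.3341  # on-demand price quoted above
AVG_NAMESPACE_VCPU = 1.5
AVG_NAMESPACE_MEM_GIB = 2.5


def cluster_summary(nodes):
    """Raw capacity, daily cost, and namespaces that fit at average usage."""
    vcpu = nodes * NODE_VCPU
    mem = nodes * NODE_MEM_GIB
    return {
        "vcpu": vcpu,
        "mem_gib": mem,
        "cost_per_day": round(nodes * NODE_PRICE_PER_HOUR * 24, 2),
        # CPU is the binding constraint at these ratios.
        "namespaces_at_average": int(min(vcpu / AVG_NAMESPACE_VCPU,
                                         mem / AVG_NAMESPACE_MEM_GIB)),
    }


print(cluster_summary(3))   # baseline
print(cluster_summary(13))  # auto-scaling maximum
```

Under these assumptions the three-node baseline costs about $24 a day and the 13-node maximum about $104 a day, which is the spread the auto-scaler is meant to manage.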

In some ways this assignment was harder to develop: I did not have much Kubernetes experience, and it took a lot of iteration to get to a good final state. In other ways it was easier to develop, because I didn’t have to keep everything in the free tier, I simplified the N-Tier architecture for this assignment, and Kubernetes provides a lot of primitives as part of the platform that AWS makes separate paying services (no wonder AWS wants you to use ECS on EC2 rather than EKS).

Claude got a lot smarter again. It handles large contexts flawlessly, is exceptionally detail-oriented, and adjusts to varied tasks very well. And now it’s good at understanding how long an assignment task will take a student (I saw it debating whether a student could write a Dockerfile in 30 minutes and it decided it was unlikely). These AI are becoming very good assignment writers, and good at estimating the time to perform a task (and will become better with data). Again, my crazy idea is becoming more real.

A lot of the final prose of this assignment was written by Claude as I don’t have enough Kubernetes experience for certain explanations. I had it examine my past assignments to understand the tone to use and pedagogical flow. When I examined its output, I was impressed, and I only had to edit and refine about half the content. Part of my process for driving assignment development revolved around an assignment log, where bugs and fixes were recorded across iterations, in essence a history of the development process. I also had it create a thorough assignment walkthrough used as reference for the final assignment document text and autograder test suite. I believe a few conventions like these will make assignment development much easier.

When part of an agent harness, I believe a systems software assignment could be developed and deployed to students in 2-8 hours, plus any time for brainstorming. The deliverables would include an assignment document, starter code, supporting infrastructure, and an autograding script. It would automatically hook into the shared assignment infrastructure that performs data collection and produces the assignment analysis page.

More assignments should focus on Kubernetes. Its role is expanding past containers into VM management and ML training/inference architecture. It’s such a big topic that one assignment is not enough. To constrain assignment size, I was unable to discuss etcd, how Kubernetes schedules containers, container networking, and many other topics. I think 3 Kubernetes assignments would be appropriate and I would suggest the following Cloud Assignment series for the next class section:

Final Presentation Outline

I think I might be able to get this done by quiet week, I’m really not sure.

  1. Overview of Senior Project
    • Goals and Accomplishments
    • Cloud Assignments (bird.ai, submission data analysis)
    • Instructional Site
    • Student Engagement and Success metrics
    • Conclusion
  2. Cloud Assignment Experience and Pedagogy
    • Bird.ai application evolution
    • Assignment Architectures
    • Learning outcomes (assignment data analysis)
    • Lessons learned guiding future assignment development
    • Ideas for future assignment series
  3. Instructional Site Infrastructure and Automation
    • Use of Claude Code to develop site and assignments
    • Architecture of Instructional site
    • Architecture used to support assignment experience for students
    • Ed scraping, category summarization, and auto-reply bot
    • Astro for writing cloud assignments, integration with Obsidian plugin
  4. Challenges, evolution, and opportunities
    • AWS Free Tier (constraints, cost monitoring)
    • Gradescope vs. Self-hosted infrastructure (integration details, discussion of downtime)
    • Automation to reduce technical demands for instructional staff.
    • performance improvement work (load balancing, work queue optimization)
    • GenAI RCAC
    • monitoring and observability
    • Discuss history of feature implementations, and how each enabled the next (e.g. submission tracking + Ed syncing + assignment storage/obsidian plugin = enabled Ed auto-reply bot with full context)
    • need to dig through weekly reports
  5. Future Work
    • AI-driven cloud assignments
    • Student tools for success
    • Enhanced data collection and analysis
    • Automating Cloud Assignments Part 1 (reference architectures, pre-instrumented VMs and software for monitoring/autograding)
    • Multi-cloud
    • Institution provided accounts for students
    • Metered and monitored AI access for students
    • Onboard students to instructional site
    • Course management features, integrations with other tools (brightspace)

Student Best Scores highlighted

A student’s last score may not be their best score. I’ve highlighted students who meet that criterion on the assignment detail page, to identify which students need manual verification of a final Cloud Assignment grade.

https://www.cs351.cloud/assignments/2/
summary

Learning Outcome Studies

You could test out AI guided conversations towards learning outcomes by asking lecturers to collaborate, using their class session to collect data.

Before the lecture, we ask the professor to identify one key learning outcome he expects for his students out of the material. During the middle of lecture, we’ll interrupt and have students go through an AI guided lecture reflection, and measure the content (we can anonymize if needed).

As part of the proposal for this study, I would have to clearly define the data I intend to collect, and the analysis and visualizations I will produce using this data, and share that with the professors we collaborate with.

We would perform these studies in a multi-disciplinary fashion, starting in the College of Science, but I believe the business school or liberal arts would be good candidates.

AI: The greatest opportunity for Teachers and Researchers

AI makes hard things easy. Now we can ask students to do harder things, and scale how we share information and crowd-source troubleshooting.

A physics professor in the College of Science at Purdue won the Murphy award. A key reason was that he wrote a physics textbook that brought Quantum Physics to the undergraduate level. No one believed it was possible; now his teaching methods are global.

What are LLMs good at today? They excel at summarizing vast quantities of information, and communicating it to a user. What is teaching? The summarization and communication of information. My word, we have tools that teach intelligently. LLMs may not be able to write the perfect software (yet at least), invent a new kind of rocket, or fix your emotional problems, but they can definitely teach you new ideas and guide you through obstacles to understanding.

AI will be a revolution in teaching. We’ve been teaching the same way for over a century. The internet began to disrupt that (it’s where I learned software engineering), and AI is a force multiplier feeding off that initial disruption.

Universities are at risk of more and more competition. High schools are already banning phones because they can’t keep students’ attention. A university will not be able to ban phones or computers. So how do they respond?

Researchers’ work just became more important: they are at the forefront of knowledge. Procedural knowledge has been automated, not new knowledge. And knowledge has just received an incredibly effective new form of distribution. With distribution comes influence. Researchers will become more influential, if they can understand how to package and share their knowledge in new ways.

Let’s discuss computer science curriculum. Computer science assignments are difficult to develop due to a variety of student hardware and the rapid pace of change in software.

Student time spent on assignments increases significantly when they run into incidental issues with software configuration. Debugging a minor software issue for hours is not a good use of time now that we have AI debugging. Student time should be spent on conceptual problem solving and engineering plan implementation.

A centralized learning platform with AI analysis of issues students experience in real time helps instructors highlight issues with assignment assumptions and respond rapidly to unblock students on assignments. This is partly a question of distribution: how do we get updated learning materials to students? How do we communicate information that will help them succeed in a course? The other component, assignment progress summarization and analysis, is tractable, as my experiments in this course have suggested. An experiment with distribution was attempted when I implemented a student dashboard in the instructional site, but its realization as a class tool was blocked by email configuration constraints.

With new tools that can evaluate intelligently, can we invent a new style of exam? Is a multiple choice question on a test the best way to assess student learning outcomes, or is it simply the easiest way to grade a large class? This is the flip side to curriculum development, student evaluation. We’ve invented a new kind of homework assignment, a new way to teach. But we haven’t invented a new way to test. The instructional site has no concept of an exam.

To close, AI depends on high-quality training data. I believe universities are in a position to create that data. In our case, creating a data set of high-quality assignments different AI can be fine-tuned on or benchmarked against.

Purdue’s position as a research institution benefits from this. They can afford to hire instructional staff, whose focus is creating high-quality written materials that can be used as assignment structure, prompts, and training data, powering an agent interface professors or TAs will use to create class-specific assignments in much less time, and with much less need to be a talented writer or curriculum developer. Professors will be able to focus on being experts in their field, sharing their knowledge and perspective, rather than laboring over the mechanical production of teaching materials.

Week 14

862 words · 5 min read

Summary

April 13 - April 19

Meetings

Accomplishments

Deliveries this week

Cloud Assignment 4 is humming along, it looks like it’s taking students about 2 hours at the moment.

I had to increase disk allocation to the nodes in the cluster. When a student reaches the final strangler fig portion of the assignment, they have to build and push a large (3-4GB) container image to the node. When the node only has 20GB, two or more of those images take up all space on disk. My integration tests did not discover it, because I tested with two student test accounts, not three. Nodes now carry 80GB of disk space.

The initial cluster for the assignment was over-provisioned. I had too much redundancy, too much initial capacity. I had overshot the obvious scale of the class, so I downgraded the EC2 instance type from a t3.2xlarge to a t3.large, half the cost, and reduced the min cluster size from three nodes to two. The installed autoscaler will do the rest; I’m curious to see how it will handle the last-minute submission burst. This change brought the estimated cost to run the assignment from $500 to $200.

https://www.cs351.cloud/cost/
cost graph
https://www.cs351.cloud/cost/
cost breakdown

We have 3 students who are always the first to start. I’ve improved the Ed AI reply bot to have the starter code and k8s cluster state in its context, which should help students debug. The larger context has been causing timeouts in RCAC GenAI Studio with gemma3:27b, so I implemented auto-reply retries, falling back to gpt-oss:120b. Ed issues are minor so far, and the starter code only needed a single file update after release to fix a bug. One bug and 12 perfect scores.
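The retry-then-fall-back behavior described above can be sketched roughly as follows. This is not the bot's actual code: `call_model` is a hypothetical callable standing in for whatever client talks to GenAI Studio, the number of attempts per model is an assumption, and a timeout is assumed to surface as Python's built-in `TimeoutError`.

```python
from typing import Callable, Sequence


def reply_with_fallback(
    call_model: Callable[[str, str], str],
    prompt: str,
    models: Sequence[str] = ("gemma3:27b", "gpt-oss:120b"),
    attempts_per_model: int = 2,
) -> str:
    """Try each model in order, retrying on timeout before falling back."""
    last_error = None
    for model in models:
        for _ in range(attempts_per_model):
            try:
                return call_model(model, prompt)
            except TimeoutError as exc:
                # Remember the failure; retry, then move to the next model.
                last_error = exc
    raise RuntimeError("all models timed out") from last_error


if __name__ == "__main__":
    def fake_call(model: str, prompt: str) -> str:
        # Stand-in for a real API client: the first model always times out.
        if model == "gemma3:27b":
            raise TimeoutError("simulated timeout")
        return f"[{model}] reply to: {prompt}"

    print(reply_with_fallback(fake_call, "Why is my pod pending?"))
```

Passing the client in as a callable keeps the retry policy separate from the HTTP details, so the policy can be exercised with a fake like the one in the usage example.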

We’re just about at the amount of data, 120MB, at which common workloads put memory pressure on the burstable db.t3.micro instance (2 vCPU, 1GB of memory), pushing free memory below our 100MB threshold. This DB has served us well this semester.

I’ve implemented a “Redact mode”, which obscures all student names, emails, and IDs on the website. It’s off by default, but when the course ends, I will enable it by default, to respect student privacy. I will also anonymize those same identifiers in the database itself and remove any infrastructure snapshots and audit logs. We may also consider publishing the anonymized data.

https://www.cs351.cloud/accounts/me/
redact setting
https://www.cs351.cloud/assignments/3/
redacted submissions
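One possible way to replace identifiers in the database is with stable pseudonyms, so that a student's submissions remain linked to each other without revealing who they are. The sketch below uses a keyed hash (HMAC-SHA256) for that; the field names, the pseudonym format, and the approach itself are assumptions for illustration, not a description of how the instructional site does it.

```python
import hashlib
import hmac


def pseudonym(value: str, secret: bytes, prefix: str = "student") -> str:
    """Map an identifier to a stable pseudonym using HMAC-SHA256."""
    normalized = value.strip().lower().encode("utf-8")
    digest = hmac.new(secret, normalized, hashlib.sha256)
    return f"{prefix}-{digest.hexdigest()[:10]}"


def redact_record(record: dict, secret: bytes,
                  fields=("name", "email", "student_id")) -> dict:
    """Return a copy of `record` with identifying fields replaced."""
    redacted = dict(record)
    for field in fields:
        if redacted.get(field) is not None:
            redacted[field] = pseudonym(str(redacted[field]), secret, prefix=field)
    return redacted


if __name__ == "__main__":
    # Hypothetical record and secret, for demonstration only.
    row = {"name": "Ada Lovelace", "email": "ada@example.edu",
           "student_id": "0012345", "score": 100}
    print(redact_record(row, secret=b"rotate-me"))
```

The same input always maps to the same pseudonym for a given secret, so discarding the secret after the final pass makes the mapping hard to reproduce.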

It would be easy to scale the cloud computing class to double the number of students next semester. With the preliminary usage data, we can prepare easily for resource use by students, improve the assignments to address v0 issues, and use historical debug logs to improve student feedback from the autograder.

Perplexity, Jerry Ma

Jerry Ma, a West Lafayette High School graduate and VP at Perplexity AI, gave a talk at Purdue.

His opening slides showed how open source model intelligence had caught up with frontier models this year. He followed with an examination of an abstract computer consisting of two parts: Compute and I/O. Compute, the internal state of the abstract computer, unites two different kinds of execution: a Model, a stochastic process, and a Sandbox, a deterministic process. I/O, the computer’s interaction with the outside world, also consists of two parts: Search, a vaguely defined concept, and Embeddings, representing an ontology, or a base of known facts.

He found it important to explain the model training process (pre-training, mid-training, and post-training), emphasizing that the importance of training has shifted from pre-training to mid- and post-training. He put this in context by invoking the Jagged Frontier, a measure of a model’s generality, specifically its performance at a range of professional tasks (coding, creative writing, law, etc.). Usage data at Perplexity shows different models are suited to different tasks due to the architecture of their mid-training and post-training pipelines, and what “they emphasize”.

He finished with an exploration of the AI Stack. In some ways, it’s a measure of the intellectual property of a software product. Two years ago in 2024, the model was the largest part of the stack, the largest source of IP. Now, interfaces are becoming larger parts of the stack. People need to be able to use AI. What is intelligence without action?

I was able to ask a few questions:

Q: What is the future of systems software research?

  1. rolling back state changes in OS/system
  2. sandboxing not just a single container/vm, but a distributed system.

Q: How do they perform evaluations?

  1. they look at usage metrics of their users, they have a lot of data.
  2. If and when a user switches models is an important signal.
  3. In-house model evaluations are too much manual labor, and get out of date.

Q: Intellectual Property for AI generated work?

Very contested, he can’t speak to it, said to “ask perplexity” for the latest developments in this space.

Fascinating breath of fresh air flown in from California. Some points he made lend themselves well to our endeavor.

Week 15

246 words · 2 min read

Summary

April 20 - April 26

Meetings

Accomplishments

Senior Project Website

The senior project website is now publicly available, with no password-protected pages. I’ve gone through and redacted any student names or other protected/publicly identifiable information. As the semester settles, I will begin sharing my senior project with others. Job applications, social media posts, and more are on the table. On Thursday I spoke with Jonathan Poole, a technical writer with RCAC; they may be running a “this week in science” style post on me to advertise their community efforts.

Cloud Assignment 4

There was some confusion as to the due date of CA4, so it has been extended to next Friday. About half the class has completed the assignment, and the estimated time of completion is around 2 hours. Students have not encountered many obstacles so far. There were a couple of bugs raised on Ed, but they have been addressed with updated starter code and instructions. I’ll provide a full summary and analysis of the assignment when it is complete next week, in my final report.

End-of-semester survey

I created a public survey for students in the instructional site.

There are two primary goals for the survey.

  1. Get feedback on the student experience of cloud assignments so we can improve them for next year.
  2. Understand what skill areas students feel like they improved on.

I’m looking forward to the data we’ll collect.

https://www.cs351.cloud/survey/

Week 16

585 words · 3 min read

Summary

April 27 - May 3

Meetings

Accomplishments

When mistakes mean bills

We had a student accumulate ~$78 in bills at the end of the semester.

I took two approaches to analyzing this problem

The snapshots indicated that he had left two Aurora db.r7g.large instances stopped, and that he left the t4g.small running for 6 weeks. Snapshots don’t cover all possible AWS services, and are sparse data points that cannot provide a clear picture of a student’s AWS use or tell us whether billing resulted from a cloud assignment or personal use.

I tried to have Claude examine his AWS usage more closely using the same role assumption authentication used for autograding, but he has disabled his role. So I had it estimate the costs based on the resources we were able to see. It’s possible that the full $78 was generated by leaving assignment resources running, but his bill screenshot showed service charges from WAF (not part of any cloud assignment) and data transfer (high considering assignment requirements).

These are clues something else is going on, but the instructional site does not have enough data to perform a proper accounting. The student billing feature doesn’t run on a cron, nor does the individual snapshot, which is an oversight I never attended to. I’ve recommended he go into AWS Cost Explorer and export the data as a CSV we can inspect.

Let’s consider how we’ll remedy this in the future:

Daily cost tracking is cheap (one or two API calls on a cron), and should have been the default already. CS351-specific labels were never implemented in a cloud assignment, but they can be made a prerequisite for passing one, allowing us to enforce a billing attribution structure in student accounts. Per-student assignment budgets require us to estimate the cost of completing a cloud assignment, so we can set thresholds for alerts. We can do a better job helping students manage costs by verifying that a student has properly torn down an assignment’s resources, either by proactive monitoring after they get a perfect score, or by having the autograder for later assignments verify that previous assignment infrastructure has been torn down.
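As a rough illustration of the daily cost tracking and budget threshold ideas, the sketch below pulls daily totals from the AWS Cost Explorer `GetCostAndUsage` API and compares cumulative spend to a budget. The client is passed in (in practice it could be `boto3.client("ce")` created with the student's assumed-role credentials, assuming that role allows Cost Explorer reads); the function names and the budget figure are hypothetical, and pagination is ignored for brevity.

```python
from datetime import date


def daily_costs(ce_client, start: date, end: date) -> dict:
    """Return {day: unblended cost in USD} for start <= day < end."""
    response = ce_client.get_cost_and_usage(
        TimePeriod={"Start": start.isoformat(), "End": end.isoformat()},
        Granularity="DAILY",
        Metrics=["UnblendedCost"],
    )
    return {
        item["TimePeriod"]["Start"]: float(item["Total"]["UnblendedCost"]["Amount"])
        for item in response["ResultsByTime"]
    }


def over_budget(costs: dict, budget_usd: float) -> bool:
    """True once cumulative spend exceeds the assignment budget."""
    return sum(costs.values()) > budget_usd


if __name__ == "__main__":
    class FakeCostExplorer:
        """Stand-in client returning canned data shaped like the real response."""

        def get_cost_and_usage(self, **kwargs):
            return {"ResultsByTime": [
                {"TimePeriod": {"Start": "2026-04-01", "End": "2026-04-02"},
                 "Total": {"UnblendedCost": {"Amount": "1.25", "Unit": "USD"}}},
                {"TimePeriod": {"Start": "2026-04-02", "End": "2026-04-03"},
                 "Total": {"UnblendedCost": {"Amount": "2.50", "Unit": "USD"}}},
            ]}

    costs = daily_costs(FakeCostExplorer(), date(2026, 4, 1), date(2026, 4, 3))
    print(costs, "alert" if over_budget(costs, budget_usd=3.0) else "ok")
```

Run on a cron, a check like this would give a per-day record to compare against the estimated cost of each assignment, rather than relying on sparse snapshots.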

Cloud Assignment 4

The assignment took most students just under 2 hours to complete. It’s an atypical assignment, it was an easier one, and students had 18 working days instead of the usual 10. Let’s take a look at the assignment graphs.

https://www.cs351.cloud/assignments/3/
https://www.cs351.cloud/assignments/3/analysis/

The students have improved. Almost all have pushed themselves into positive territory on the cloud assignments (one of the “Disengaged” cohort dropped the course, leaving two others). 48 students (~70%) received a perfect score on all of the last three cloud assignments.

https://www.cs351.cloud/assignments/3/analysis/

I deprovisioned all student deployments and destroyed the EKS cluster and all other AWS resources powering the CA4 assignment. The dashboard I created for CA4 is now a historical record of activity during the assignment.

https://www.cs351.cloud/ca4/

We incurred almost $500 in costs during this cloud assignment. While planning, I posted in Slack on April 13th that “expected spend during the 10 days of the assignment is ~$500”. We hit that only after 18 days of running the assignment, so my post-launch cost optimization tweaks worked well.

https://www.cs351.cloud/cost/

This concludes the cloud assignment experience this year. Thank you!