January 19 - January 25
Students encountered several problems while working through the assignment. I’ll summarize the problems they shared through Ed.
There were three issues involving the credentials file.
Let’s discuss how to remediate these problems.
(1) can be fixed by testing the cloud assignment with more people ahead of time, to catch any typos. Often the person writing the assignment holds different assumptions than those working through it for the first time. In my case, I had my old credentials file lying around, which I re-used, and which didn’t contain quotes, rather than making a new one during testing.
(2) can be fixed by having our grader script load any file with “credentials” in the name. Postel’s law would serve us well: “be conservative in what you send, be liberal in what you accept.”
(3) can be detected by the grader script which can convert it from one encoding to another. Worse issues, like uploading a PDF, can be avoided by improving assignment instructions.
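The fixes for (2) and (3) can be sketched together. This is a minimal sketch, not the actual grader script: the function names `find_credentials_file` and `decode_credentials` are hypothetical, and it assumes a wrongly encoded file is UTF-8 or UTF-16 with a byte-order mark, which is what some Windows tools write by default.

```python
import codecs
from pathlib import Path

# Byte-order marks we look for, paired with a codec that strips the mark.
# (Assumption: mis-encoded uploads carry a BOM; BOM-less UTF-16 is not handled.)
_BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def find_credentials_file(submission_dir):
    """Return the first file whose name contains 'credentials', ignoring case.

    Accepts names such as 'credentials', 'Credentials.txt' or
    'my_credentials (1).txt'. Returns None if nothing matches.
    """
    candidates = sorted(
        path
        for path in Path(submission_dir).iterdir()
        if path.is_file() and "credentials" in path.name.lower()
    )
    return candidates[0] if candidates else None


def decode_credentials(raw: bytes) -> str:
    """Decode uploaded bytes to text, honoring a byte-order mark if present."""
    for bom, encoding in _BOMS:
        if raw.startswith(bom):
            return raw.decode(encoding)
    # No BOM: treat the upload as plain UTF-8 (ASCII is a subset).
    return raw.decode("utf-8")
```

A grader would call `find_credentials_file` on the submission directory and pass the file's bytes through `decode_credentials` before parsing. A `UnicodeDecodeError` at that point is a reasonable signal that the upload is not a text file at all, such as a PDF.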
The AWS Free Tier changed on July 15, 2025, after the previous course section ended. I will discuss the specific changes in a later section; here the relevant detail is that the EC2 instance types eligible for free tier billing changed from t2.micro and t3.micro to t3.micro, t3.small, t4g.micro, t4g.small, c7i-flex.large, and m7i-flex.large.
My account was created before July 15, 2025, so I was on a different free tier than students in the current section. For this reason, I did not realize the free tier instances had changed, and I developed my grader script to verify the constraints of the old free tier. It took me several deployments and back and forth with the students to have the new grading script working.
This exposed several problems: (1) the free tier assumptions inherited from past assignments no longer hold, requiring us to reconsider our approach to developing assignments; (2) the develop-debug-deploy loop for Gradescope’s autograder is relatively slow; (3) we don’t have an automated testing framework for our grader scripts, forcing manual testing; and (4) Ed, while quite good, is not ideal for back-and-forth debugging with students, since I’m forced to ask for screenshots and for them to try things and get back to me.
Let’s briefly discuss some steps we could take to address these problems.
(1) is a problem that has become an opportunity. The new free tier is much more flexible, and opens up new opportunities for assignments. Instead of the free tier dictating the architecture for our cloud assignments, our architectural choices have been freed, and now the free tier simply dictates the scale. More on this later.
(2) The develop-debug-deploy loop for the autograder script can be sped up. Gradescope allows for two deployment options. The first, our current approach, is to manually upload a zip file, which requires Gradescope to build a Docker image on our behalf based on its contents. The second is to register a link to an image in a container registry we control: we simply build and deploy our image, and the autograder will pull it on each container run. This gives us more flexibility with how we engineer our Docker container, and many more opportunities to improve the automation of and speed at which we can deploy our grader scripts.
(3) Our scripts are manually tested at the moment. Automated testing would improve our confidence in their results, and help prevent any regressions as we update and develop the assignments. Additionally, it would help future sections of the course as new teaching assistants inherit maintenance of these assignments. However, it is challenging to develop integration tests that include third-party components, particularly infrastructure. I think we can take two approaches, of which I favor the latter. First, we could determine if it’s feasible to mock API call results to make sure our test assertions are appropriate, which requires us to fake what AWS API responses will look like. Second, we could develop Ansible and Terraform scripts that deploy solutions on our own infrastructure, figure out some way to trigger an autograder run either locally or automatically, and perform an end-to-end integration test at development time, not deploy time.
(4) Ed is a great platform for asynchronous communication, but debugging with students is more effective synchronously, and when they can share what they’re looking at. Currently they’re restricted by how fast they can type and how many screenshots they can share. Perhaps we should consider an office hour for cloud assignment issues, or a weekly Zoom meeting that students can drop into to share their screen and have a quick discussion.
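To make the first testing approach in (3) concrete, here is a minimal sketch of mocking an AWS response with Python's standard `unittest.mock`. The grader check `check_free_tier` and the helper `fake_ec2` are hypothetical names, not part of the existing grader script. The fake response copies only the fields of boto3's `describe_instances` result that the check reads, and pagination is ignored.

```python
from unittest.mock import Mock


def running_instance_types(ec2_client):
    """Collect the instance types of all running instances in the account."""
    response = ec2_client.describe_instances()
    return [
        instance["InstanceType"]
        for reservation in response["Reservations"]
        for instance in reservation["Instances"]
        if instance["State"]["Name"] == "running"
    ]


def check_free_tier(ec2_client, allowed_types):
    """Hypothetical grader assertion: every running instance is an allowed type."""
    return all(t in allowed_types for t in running_instance_types(ec2_client))


def fake_ec2(instances):
    """Build a stand-in EC2 client from (instance_type, state) pairs.

    Only the response fields used above are faked.
    """
    client = Mock()
    client.describe_instances.return_value = {
        "Reservations": [
            {
                "Instances": [
                    {
                        "InstanceId": f"i-{n:04d}",
                        "InstanceType": instance_type,
                        "State": {"Name": state},
                    }
                    for n, (instance_type, state) in enumerate(instances)
                ]
            }
        ]
    }
    return client
```

The weakness of this approach is visible in the sketch: the test is only as good as our guess at what the real API returns. This is one reason to prefer end-to-end tests against real infrastructure.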
One student had an issue Thursday (1/22) evening. When you register for an AWS account, AWS sends a verification code to the provided email address, and the code expires after 10 minutes. That email was consistently delivered to the student’s inbox more than 10 minutes later, preventing them from verifying their account. The email verification step is required if students are to create their own AWS accounts. That night I sent myself an email from my personal email account to my Purdue account; it took 4 hours to be delivered, and it landed in my junk email folder, where I had to report it as non-spam. The next morning I sent another test email, and it was delivered within a minute. Unfortunately, this has exposed that we are at the mercy of Purdue’s email infrastructure when students are setting up their AWS accounts and working on the cloud assignments.
A possible fix for this issue touches on a prevailing conversation about administering this course: to grant students regular IAM accounts registered under an instructional staff-controlled AWS account. Thus, students would receive a simple username and password login at the start of the course, without email verification, and on first login could be easily required to register a proper MFA device for subsequent logins. I will discuss course administration approaches later in this report.
In a private Ed post, a student shared a screenshot of their AWS credentials with me. Ed uploads files to its content delivery network (CDN), which does not perform authentication, for performance and access reasons. I informed them of the risk they were taking, removed the image from their message, and advised them to rotate their credentials. In future assignments I will emphasize the privileged nature of their AWS credentials, the risk they assume if they are careless in guarding them, common mistakes to avoid such as committing them to source control, how to evaluate whether or not to put certain data on cloud providers, and useful ways to store and secure secrets for their personal projects.
Let’s take a look at the course schedule.
| Week | Dates | Textbook | Topics | Assignments | Exams |
|---|---|---|---|---|---|
| 1 | 1/12-1/18 | Ch. 1,2 | The Motivations for Cloud, Elastic Computing and its advantages | ||
| 2 | 1/19-1/25 | Ch. 3 | Types of Clouds and Cloud Providers | CA0 (1/20-1/23) | |
| 3 | 1/26-2/1 | Ch. 4,5 | Data Center Infrastructure and Equipment, Virtual Machines | CA1 (1/27-2/6) | |
| 4 | 2/2-2/8 | Ch. 6 | Containers | ||
| 5 | 2/9-2/15 | Ch. 7 | Virtual Networks | CA2 (2/10-2/20) | |
| 6 | 2/16-2/22 | Ch. 8 | Virtual Storage | | Midterm 1, Feb. 17 |
| 7 | 2/23-3/1 | Ch. 9 | Automation | CA3 (2/24-3/6) | |
| 8 | 3/2-3/8 | Ch. 10 | Orchestration: Automated Replication and Parallelism | ||
| 9 | 3/9-3/15 | Ch. 11 | The MapReduce Paradigm | CA4 (3/10-3/20) | |
| 10 | 3/16-3/22 | Ch. 12 | Microservices | ||
| 12 | 3/23-3/29 | Ch. 13,14 | Controller-based Management Software, Serverless Computing and Event Processing | CA5 (3/24-4/3) | |
| 13 | 3/30-4/5 | Ch. 15 | DevOps | | Midterm 2, Mar. 31 |
| 14 | 4/6-4/12 | Ch. 16 | Edge Computing and IIoT | ||
| 15 | 4/13-4/19 | Ch. 17 | Cloud Security and Privacy | ||
| 16 | 4/20-4/26 | Ch. 18 | Controlling the Complexity of Cloud-Native Systems | | |
I’ll note there are 1-2 weeks of space to delay a cloud assignment. I expect to release Cloud Assignment 1 on January 27th, and Cloud Assignment 2 on February 10th.
Dan Marinescu’s Cloud Computing textbook is very good, and I’m enjoying reading it so far. I’m through the first two chapters, and have read the two appendices, which are the most relevant to the cloud assignments for CS 351. I’ll discuss the appendices and several of their suggestions in this section.
Appendix A discusses possible cloud projects for students to complete. They are as follows:
I’ll break the suspense — I think “A cloud service for adaptive data streaming” or “A simulation study of machine-learning scalability” are possible choices for an assignment.
“A cloud service for adaptive data streaming” is a project to find the optimal architecture for adaptive data streaming problems. Consider adaptive audio streaming, which is a multiobjective optimization problem. From the text, “We wish to convert the highest quality audio file stored on the cloud to a resolution corresponding to the rate that can be sustained by the available bandwidth; at the same time, we wish to minimize the cost on the cloud site and also minimize the buffer requirements for the mobile device to accommodate the transmission jitter. Finally, we wish to reduce to a minimum the start-up time for the content delivery.” The performance of a solution depends on resource constraints: available CPU cycles, buffer space on the sender and receiver, and network bandwidth.
“A simulation study of machine-learning scalability” is based on work done to control a video game, StarCraft, which will likely engage students who are fans of video games in general or the game itself. Graduate students were asked to build a convolutional neural network (CNN) to predict the computational effort required to build a deep neural network (DNN), and then (1) build a dataset by running a scenario 20,000 times, (2) train the model to predict a “best” action, then (3) rerun game scenarios using the new predicted best action.
I chose these assignments because they are relevant to current industry trends: streaming multimedia to clients, and training AI/ML algorithms. They are also easily benchmarked, allowing us to evaluate students’ implementations by comparing their performance against each other. This produces a ranking of solutions by performance, with which we can confer extra credit on the best performing solutions, incentivizing students to be creative and go above and beyond.
The textbook contains the cloud architecture they used to implement some of these projects, but the assignments would have to be adapted for the skill level of our students. Additionally, these projects are outside the scope of my expertise, and I would require support from other instructional staff.
Now to discuss the remaining assignments. Project 1 is a distributed computation where students would be expected to implement the algorithm. It’s a simulation of a cloud architecture, not an implementation of one. It’s both outside the scope of this course and, in a way, trivial for it. Project 2 uses an algorithm to assess which nodes in a cluster or network are malicious. I believe it would use too many resources to implement, or require adaptation to multiple containers on a node, while not being relevant to most industry work at the moment. Project 3 is quite fun, but more an exercise in object-oriented simulation. Project 5 uses a third-party tool for electrical and computer engineers. Project 6 is a condensed matter physics simulation that is algorithmically focused, and whose cloud principle, choosing the right instance for the right job, is better expressed in another manner. Project 8 uses a considerable number of AWS services to develop a web application, whose architecture does not fit in the free tier, whose effort exceeds 10 days, and whose principles can be taught more simply. Project 9 is an IoT data-streaming application, which would require us to provide or simulate IoT devices to stream data to student servers.
Appendix B is an introduction to application development on AWS. It’s useful as a tutorial and exposes useful features, but suffers from some issues: (1) it’s specific to a single cloud vendor, (2) language examples are in C# and Java, neither of which was used in the course last semester, and (3) the information is out of date.
My experience developing CA0, the issues students have had working on the assignment, and my curiosity with regard to student engagement with the cloud assignments, have spurred questions about our “instructional infrastructure”.
In my formulation, instructional infrastructure is the processes and rules used when producing content for the course, and the mechanisms that enable feedback and review of that work. Let’s consider cloud assignment 0. I took a series of steps to (1) develop the assignment, through consideration of current textbook content and discussion with the instructional staff, (2) test the assignment, by producing an autograder script and manually testing it, (3) publish the assignment, by writing it using a web framework and exporting the resulting web page as a PDF shared through Brightspace, and (4) receive feedback about the assignment, by observing Ed discussion, answering questions, and tracking student progress using Gradescope’s log of grading results.
I believe the most valuable practice we can adopt this semester is a culture of assessment and reflection. We can improve our ability to assess student experience and engagement with assignments by collecting more and higher quality data, and we can tighten feedback loops with regard to that data through automation and clear processes. I will discuss some possible steps to enrich these factors.
Currently, we get information about student engagement with the assignment through Ed discussion, which is working fine. Using Gradescope submission statistics, we can see student progress on an assignment, but Gradescope only shares the data of their final submission attempt. Finally, we have the capability to inspect a student’s AWS account using the autograder, but only at the time that they perform an assignment submission, and we don’t take advantage of its full potential. We also do not get any signal from how they experiment with AWS between submissions.
Here I’ll propose some ideas to tackle problems (1) granular tracking of assignment progress and (2) evaluation of student AWS use.
Consider the autograder: Gradescope invokes a Docker container on each student submission. The container contains our grading script, which runs arbitrary code, and is where we have defined the point values of assignment “questions” corresponding to particular assertions we make about the state of a student’s AWS account and derived resources, which add up to the student’s final grade. Now, to work towards (1), we can place monitoring code in the grading container that records the time of submission, a snapshot of the state in a student’s AWS account, and the point total of that particular submission, and store that information in a database under instructor control. We’ll then be able to visualize and calculate statistics related to student engagement: submissions per assignment, speed of completion, activity over time, and much more. We can use this information to assess cloud assignments and learning outcomes in order to improve future assignments. It would require setting up cloud infrastructure under management of the instructional staff to collect, store, and display this data.
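The monitoring idea can be sketched as a small recording function. This is a minimal sketch under stated assumptions: SQLite stands in for whatever instructor-controlled database is eventually chosen, and the table layout and the names `record_submission` and `submissions_per_student` are hypothetical.

```python
import json
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema: one row per autograder run.
SCHEMA = """
CREATE TABLE IF NOT EXISTS submissions (
    id           INTEGER PRIMARY KEY,
    student      TEXT NOT NULL,
    assignment   TEXT NOT NULL,
    submitted_at TEXT NOT NULL,   -- ISO 8601 timestamp, UTC
    score        REAL NOT NULL,   -- point total of this submission
    snapshot     TEXT NOT NULL    -- JSON snapshot of observed account state
)
"""


def record_submission(conn, student, assignment, score, snapshot, now=None):
    """Store the time, score, and account snapshot of one submission."""
    now = now or datetime.now(timezone.utc)
    conn.execute(SCHEMA)
    conn.execute(
        "INSERT INTO submissions (student, assignment, submitted_at, score, snapshot) "
        "VALUES (?, ?, ?, ?, ?)",
        (student, assignment, now.isoformat(), score, json.dumps(snapshot, sort_keys=True)),
    )
    conn.commit()


def submissions_per_student(conn, assignment):
    """Return attempts and best score per student for one assignment."""
    rows = conn.execute(
        "SELECT student, COUNT(*), MAX(score) FROM submissions "
        "WHERE assignment = ? GROUP BY student ORDER BY student",
        (assignment,),
    )
    return {student: {"attempts": n, "best": best} for student, n, best in rows}
```

Because every run is stored rather than only the final attempt, statistics such as speed of completion and activity over time reduce to queries over `submitted_at`.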
With regard to (2), we have inverted our control as instructional staff. Students grant us permission to review their solutions, on their terms. Thus, to get more information about a student’s use of their account, we must ask permission or use indirect means of access; currently, that means the “credentials” files uploaded during assignment submission. Note that this method also places privileged student data (an access token with full administrative access to their personal AWS account, which is tied to their credit card) on Gradescope’s servers, subject to their security controls.
Let’s consider how to address this problem. We have two goals, which we are already desperately trying to achieve today with insufficient means: (1) control student resource use so that students do not incur charges for completing course assignments, and (2) understand and enable student cloud use so we can assess and meet learning outcomes.
I suggest we create a class AWS account under the control of instructional staff. At the start of the course, we can create IAM users for each student based on their career account username, and assign a temporary password for their first login. After logging in, they can change their password and configure an MFA device, which we can enforce as account administrators. For each new cloud assignment, we can create a permission group that grants students access to only those resources needed to complete the assignment. This lets us control whether a student has permission to create an EC2 instance or a Lambda function; however, it is limited in that the permission group will not limit the number of EC2 instances a student can spin up. So, we still have the problem of a student spinning up Bitcoin miners on our dime.
To address that particular issue, we can register an Amazon EventBridge rule that triggers a Lambda function every time an EC2 instance is launched. It would track student instance use in a DynamoDB table, and if a student goes over the allotted resource limit for a particular assignment, the function could restrict EC2 service access temporarily and kill any excess instances, notifying us if necessary. Events are not limited to EC2 instance spin-ups; we can also monitor billing in real time, letting us track course spend granularly during the semester.
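As an illustration of a per-assignment permission group, the sketch below generates an IAM policy document that allows a handful of EC2 actions but denies launching any instance type outside an allowed list. The function name `assignment_policy` and the exact action list are assumptions for illustration. The `ec2:InstanceType` condition key is a standard one for `ec2:RunInstances`.

```python
import json


def assignment_policy(allowed_instance_types):
    """Build an IAM policy document (as a dict) for one cloud assignment.

    The action list is a placeholder; a real assignment would grant exactly
    the actions its write-up requires.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowEc2ForAssignment",
                "Effect": "Allow",
                "Action": [
                    "ec2:Describe*",
                    "ec2:RunInstances",
                    "ec2:StopInstances",
                    "ec2:TerminateInstances",
                ],
                "Resource": "*",
            },
            {
                # An explicit Deny overrides the Allow above whenever the
                # requested instance type is not in the allowed list.
                "Sid": "DenyOtherInstanceTypes",
                "Effect": "Deny",
                "Action": "ec2:RunInstances",
                "Resource": "arn:aws:ec2:*:*:instance/*",
                "Condition": {
                    "StringNotEquals": {
                        "ec2:InstanceType": sorted(allowed_instance_types)
                    }
                },
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(assignment_policy(["t4g.small"]), indent=2))
```

A policy like this constrains what kind of instance a student can launch, not how many, so the instance-count problem described above still needs a separate mechanism.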
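The decision at the heart of such a function can be sketched without any AWS calls. In this minimal sketch the `Instance` record and `instances_to_terminate` are hypothetical names, and it assumes each instance can be attributed to a student, for example through a tag applied at launch. The real function would still need to read and write its tracking table and call the EC2 API to act on the result.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    instance_id: str
    owner: str        # student username (assumed to come from a tag)
    launch_time: str  # ISO 8601 in UTC, so string order is chronological


def instances_to_terminate(instances, limit_per_student):
    """Return IDs of instances that exceed each student's allowance.

    The oldest instances are kept; the most recently launched are flagged.
    """
    by_owner = {}
    for instance in sorted(instances, key=lambda i: i.launch_time):
        by_owner.setdefault(instance.owner, []).append(instance)

    excess = []
    for owned in by_owner.values():
        excess.extend(i.instance_id for i in owned[limit_per_student:])
    return excess
```

Keeping the oldest instances is a design choice: it protects work a student has already set up and removes only the launches that pushed them over the limit.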
By creating a method by which we are account administrators with full control over student resource access, we now have the appropriate permission level to monitor student resource usage holistically, and the data by which to evaluate that usage.
Two of our previous discussion topics require us to rephrase cloud assignment proposals in financial terms. The changes to AWS’s Free Tier limit total cloud usage for the course to under $100 per student (possibly $200), and the suggestion to bring students under an instructor-controlled account makes cloud assignments a departmental budget item. Here I’ll suggest a framework for how to price a cloud assignment.
A cloud assignment uses several AWS resources, is available during the time period defined by its release and due date, and is expected to be worked on by all students enrolled in the class. Using this information, we’ll define a simple formula to evaluate the cost of an EC2-based assignment.
(Assignment length in days) * (24 hours / day) * (EC2 instance cost / hr) * (Number of instances used to complete assignment) * (Number of students) = Maximum total cost of assignment
We can use this formula to estimate funding for the class. The formula assumes the worst case of maximum utilization per student for the whole assignment period, aka the “class of bitcoin miners” scenario. We don’t have the data for a more accurate estimate. We can now architect assignments from two directions: what cloud principles would we like students to learn, and how many resources will they have access to in order to learn those principles? The synthesis of these two approaches clearly defines possible system architectures students can implement.
Note: EC2 instance cost depends on instance type and chosen operating system. Amazon Linux is the most affordable option, Ubuntu demands a 4% premium, RHEL 28%, SUSE 66%, and Windows a whopping 103%. These premiums can change depending on the instance type. An instance type’s base price depends on the allotted type of CPU chip, number of vCPUs, GiB of memory, network bandwidth, block storage bandwidth, and hypervisor scheduling algorithm. AWS has different hypervisor scheduling formulas to assign hardware resource capacity to each virtual machine; the two available to free tier instances are “burst” and “flex”.
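The worst-case formula translates directly into a few lines of Python. The function name `max_assignment_cost` is hypothetical. The hourly price is an on-demand price per instance, and the length in days is converted to hours before multiplying.

```python
def max_assignment_cost(days, hourly_price, instances_per_student, students):
    """Worst-case cost: every instance runs for the whole assignment period."""
    hours = days * 24
    return hours * hourly_price * instances_per_student * students


if __name__ == "__main__":
    # One instance at $0.0168/hr for a 10-day assignment, for a single student.
    print(f"${max_assignment_cost(10, 0.0168, 1, 1):.2f}")
```

Because all four factors multiply, halving any one of them (a shorter assignment, a cheaper instance, fewer instances) halves the budget ceiling.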
Let’s take as an example the traditional three-tier web application system architecture containing 1 proxy server, 3 application servers, and 1 database server. Allow me to digress, to develop more deeply the details needed to produce both an appropriate estimate of the cost of the cloud assignment, and an appropriate architecture reflecting real-world considerations.
An application server will need a general purpose CPU. The proxy server should be memory and/or network optimized: it wants to cache a lot of data, hold connections open, and route requests quickly. The database needs an I/O optimized instance with plenty of block storage bandwidth, whose memory size will depend on the data it’s storing.
Let’s examine the properties of free tier instances:
| Type | CPU | Price/hr | vCPU | Memory | Network | Block Storage | Hypervisor Schedule |
|---|---|---|---|---|---|---|---|
| t3.micro | Intel Xeon Platinum 8000 1st/2nd Gen | $0.0104 | 2 | 1 GiB | 5 Gbps | 2.8 Gbps | Burst |
| t3.small | Intel Xeon Platinum 8000 1st/2nd Gen | $0.0208 | 2 | 2 GiB | 5 Gbps | 2.8 Gbps | Burst |
| t4g.micro | AWS Graviton 2 | $0.0084 | 2 | 1 GiB | 5 Gbps | 2.8 Gbps | Burst |
| t4g.small | AWS Graviton 2 | $0.0168 | 2 | 2 GiB | 5 Gbps | 2.8 Gbps | Burst |
| c7i-flex.large | Intel Xeon 4th Gen | $0.08479 | 2 | 4 GiB | 12.5 Gbps | 10 Gbps | Flex |
| m7i-flex.large | Intel Xeon 4th Gen | $0.09576 | 2 | 8 GiB | 12.5 Gbps | 10 Gbps | Flex |
AWS Graviton 2 chips are proprietary (the latest is gen. 5), and the t4g family is “burstable”. “Burstable” is AWS’s answer to the noisy neighbor and bin packing problems of virtual machines. VMs compete for resources: you can’t respond to an HTTP request on a cloud server if another VM is using the physical interface you’re requesting, i.e. the network interface card. Hypervisors therefore control a virtual machine’s access to resources. AWS has a proprietary hypervisor, “Nitro”, that they use for most of their EC2 instances.
t3 and t4g instances are heavily throttled by the hypervisor; they are only guaranteed a 10%-20% baseline performance per vCPU. A VM earns “CPU Credits” when idle, which are paid to the hypervisor to “burst” and get more resource access. Those resources could be CPU time, network bandwidth, or disk bandwidth. t3 instances are Intel Xeon Platinum chips, while t4g are AWS Graviton CPUs based on the ARM “Neoverse” design.
c7i and m7i are Intel Xeon chips; they have half the maximum cores of the ARM chips, but 4x more cache and a 25% faster clock rate. Databases perform better on these chips. m7i is the more memory-heavy option (AWS classes the M family as “general purpose”), with a 4:1 ratio of memory in GiB to vCPU, twice as much as the c7i family. Both are “flex” scheduled, which guarantees a more generous 40% baseline performance.
Returning to our 3-tier architecture, the database server can use the c7i. Assignment datasets are small, so the DB doesn’t need much RAM, but it should be a non-burstable instance: it will serve multiple clients, and we want it to be consistently fast. Flex is the next best free option. The proxy will use the m7i instance type. It will need to hold open a lot of concurrent connections and copy a lot of data from memory, so the extra memory will directly contribute to scaling the service for more active users.
The application servers will be performing a variety of jobs, all with different execution patterns. Application servers are also often written in a single-threaded language. JavaScript runs serially, using an event loop for concurrency. Python has a “Global Interpreter Lock” preventing parallelism. They cannot easily take advantage of more vCPUs. Using affordable, general purpose instances with a low vCPU count that are easy to recreate and fail independently is a good strategy. The t3 and t4g instance types fit the bill; t3 are x86 and t4g are ARM.
Now we can estimate how much a cloud assignment will cost when it implements a 3-tier architecture using three t4g.small instances, a c7i-flex.large, and an m7i-flex.large, for 77 students over 10 days.
(77) * (3 * 0.0168 + 0.08479 + 0.09576) * (10 * 24) = $4,267.96 ($55.43/student)
If we use t4g.micro instead, it’s $3,802.26 ($49.38/student). Remember, this is at max utilization for the whole assignment period.
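The arithmetic above can be reproduced for any mix of instance types. This is a minimal sketch: the prices are copied from the free tier table in this report, while the dictionary layout and the function name `max_cost_per_student` are assumptions made for illustration.

```python
# On-demand prices per hour, taken from the free tier table above.
PRICE_PER_HOUR = {
    "t3.micro": 0.0104,
    "t3.small": 0.0208,
    "t4g.micro": 0.0084,
    "t4g.small": 0.0168,
    "c7i-flex.large": 0.08479,
    "m7i-flex.large": 0.09576,
}


def max_cost_per_student(fleet, days):
    """Worst-case cost of a fleet ({instance_type: count}) running nonstop."""
    hourly = sum(PRICE_PER_HOUR[kind] * count for kind, count in fleet.items())
    return hourly * days * 24


# 3 application servers, 1 database server, 1 proxy server.
SMALL_FLEET = {"t4g.small": 3, "c7i-flex.large": 1, "m7i-flex.large": 1}
MICRO_FLEET = {"t4g.micro": 3, "c7i-flex.large": 1, "m7i-flex.large": 1}

if __name__ == "__main__":
    students = 77
    for name, fleet in (("t4g.small", SMALL_FLEET), ("t4g.micro", MICRO_FLEET)):
        per_student = max_cost_per_student(fleet, days=10)
        print(f"{name}: ${per_student:.2f}/student, ${per_student * students:,.2f} total")
```

Swapping the application tier from t4g.small to t4g.micro saves only about $6 per student. In this architecture the two flex instances, not the application servers, dominate the cost.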
I hope this approach to pricing assignments is clear. In fact, I’ll suggest implementing this particular architecture as a good goal for cloud assignment 2. The AWS free tier change has caused short-term problems, but inspired long-term possibilities.
Currently covered chapters in the textbook are:
This assignment will incorporate elastic computing concepts, use technologies likely introduced in previous coursework (web servers, databases), and be an introduction to deploying an application on infrastructure-as-a-service. It is not too difficult, but is significant enough to merit a 2 week assignment period (10 days). It will address the same topics and use the same resources as Patryk’s past cloud assignment, but to my taste.
The previous cloud assignment had students spin up an EC2 VM, create an init.sql file with a very simple schema and some example values, and load it into a SQLite database using a Dockerfile. It also had students create an HTTP web server using any approach, although nginx was recommended.
I like the idea of a student building a web application and “shipping” it on cloud infrastructure, but let’s have them deploy a realistic application.
Using the Django web framework and SQLite, students will implement a simple user interface powered by the Compiler Explorer API. It will have a text area where a user can type in Python code and click a button to display resulting Python bytecode. There will be another button to click, that will run the code in a container environment on their VM, and display the result. The students will learn how to use an application framework (Django+SQLite), and solve little web application problems along the way. This will introduce students to making a Software-as-a-Service application using an Infrastructure-as-a-Service provider.
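To give a sense of the output such a page would display, the sketch below produces a bytecode listing for a snippet of Python source. Note the swap: it uses the standard library's `dis` module locally rather than the Compiler Explorer API named above, and `bytecode_listing` is a hypothetical helper, so treat it as an illustration of the feature rather than the assignment's implementation.

```python
import dis


def bytecode_listing(source):
    """Compile source (without running it) and list its bytecode instructions.

    Returns (opname, argument description) pairs. Exact opcodes differ
    between Python versions.
    """
    code = compile(source, "<user-input>", "exec")
    return [(ins.opname, ins.argrepr) for ins in dis.get_instructions(code)]


if __name__ == "__main__":
    for opname, arg in bytecode_listing("x = 1 + 2"):
        print(f"{opname:<20} {arg}")
```

Compiling is safe to do in the web process because nothing is executed. Actually running user code is a different matter, which is why the second button runs it in a container.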
The students will be graded on:
By the end of the assignment, they will have installed dependencies on the server, created a web application project, implemented application functionality using the framework, examined database query statistics, become familiar with REST principles, and deployed an application to the cloud.
The benchmark score is meant to incentivize students to be creative and go above-and-beyond. We can load test their final application and create a leaderboard of every student. It should be a separate Gradescope assignment/submission, so if the load test crashes their VM, it doesn’t slow them down getting a 100% on every other section. The top 5 students can get extra credit, but only if they stand up in class and share the optimizations they used to get the highest score. For example, they could put a proxy server written in a compiled language to improve cache performance, or tune OS settings to improve network performance and CPU utilization. There is the issue of only “burst” and “flex” instances being available, so performance is highly dependent on the hypervisor scheduling algorithm, but this is meant to be fun. The autograder can check they are not using more than one instance of the allowed type to enforce fairness. I can release it on the Tuesday before the Friday it’s due, 7 days in, so I can have more time to implement the leaderboard, and so students that finish the assignment early get another task to learn from and challenge them.
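The ranking step of the leaderboard is simple to sketch. The sketch assumes each student's load test result is a single number where higher is better, such as sustained requests per second. Both that metric and the function name `leaderboard` are assumptions, since the load test itself is not yet designed.

```python
def leaderboard(results, top_n=5):
    """Rank students by score, highest first; ties break alphabetically.

    `results` maps student username to a load test score.
    Returns (rank, student, score) tuples for the top_n entries.
    """
    ranked = sorted(results.items(), key=lambda item: (-item[1], item[0]))
    return [
        (rank, student, score)
        for rank, (student, score) in enumerate(ranked[:top_n], start=1)
    ]


if __name__ == "__main__":
    scores = {"alice": 950.0, "bob": 1200.5, "carol": 950.0, "dave": 300.0}
    for rank, student, score in leaderboard(scores):
        print(f"{rank}. {student}: {score:.1f} req/s")
```

Since burst and flex scheduling make a single measurement noisy, it may be fairer to rank on the median of several runs than on a single one.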
The assignment should be completed on a single EC2 instance, so we as instructors can be confident the first assignment stays firmly within the free tier.
Previously, particular services had time-based restrictions on usage, as well as restrictions by usage type, for example, “750 hours of t2.micro per month” for EC2, which lasted for 12 months. Now, new accounts receive $100 in credits (rather than time) that can only be spent on certain resources and within particular usage limits. These credits last for a shorter time than before, 6 months, and only by using up the $100 credits will your account be credited an additional $100 before the free tier ends. This is AWS’s way to encourage people to experiment with a variety of services shortly after creating their account.
Accounts created before the free tier changes remain on the old plan.
Free tier is limited to instance types: t3.micro, t3.small, t4g.micro, t4g.small, c7i-flex.large, and m7i-flex.large.
Free tier use is capped at 30 GB of storage, 2 million I/Os, and 1 GB of snapshot storage with Amazon Elastic Block Store (EBS).
Free tier choices are db.t3.micro and db.t4g.micro instances and 4 engines: MySQL, PostgreSQL, MariaDB, and Microsoft SQL Server
Capped at 1,000,000 free requests per month. Up to 400,000 GB-seconds or 3.2 million seconds of compute time per month
No usage limits on the free tier.
These are AI focused services also in the free tier.
There are many additional services with a free tier allotment, but they are not commonly used.
I will propose the n-tier architecture project for cloud assignment two. The textbook chapters covered during assignment 2’s release period are:
We can benchmark students’ implementations by performing a load test. We can choose a problem such that we submit data to the student cluster that it has to process, and then query that data to verify the result.
Will be about automation and orchestration. Should introduce Ansible, Terraform, and Kubernetes.
Professor Adams also wants it to be about networking and storage too. I could do a Kubernetes dive into these things. Also emphasize firewall settings (See Table B.1 in Dan Marinescu’s Cloud Computing, Appendix B). From Appendix A, project “A.4 A cloud service for adaptive data streaming” could be a good one here, or assignment 2. ████ and ████ could also work on implementing some of these algorithms while I work on the surrounding infrastructure and create the write up. They could do “A.7 A simulation study of machine-learning scalability” together.
Probably something to do with the RCAC visit.
Should be the FaaS project and implementation
Goal: Students will develop their own SaaS on top of IaaS, becoming familiar with EC2 virtual machines.
Application: Detect a bird in a photo.
Constraints: 1 t4g.small EC2 VM. No other AWS services.
Total Possible Cost: Using our above formula, the Max Cost per student for this assignment is (10 * 24) * (0.0168) = $4.032.
The assignment was designed to meet four goals.
Inspired by the XKCD comic “Tasks”, I wanted to demonstrate to students that what we’re able to do today was unthinkable 10 years ago. We can run a complicated image recognition task that takes a natural language input and transforms that not just into a classification of an image but the generation of a bounding box and a coloring of the object shape itself. All of this on the cheapest available virtual machine on AWS, and in less than a second. Unbelievable.
I also wanted students to have experience with a realistic and common type of application they will encounter in industry. Having them deploy one of the most commonly used web frameworks using the basic tools you would find on a POSIX-compatible system introduces struggle and manual grunt work, the recent memory of which will motivate the adoption of cloud-focused tooling like containers and automation. For example, struggling to install Python helps you realize how useful it is to be able to reference a specific version of Python in a Dockerfile.
Another point is that a realistic application can be scaled according to systematic, non-contrived principles. Slightly modifying the requirements of the application can necessitate major modifications to the system architecture. The bird image recognition application can be taken in different directions with regard to scale and requirements without compromising the conceptual integrity of the service. This allows students to have a stable reference point for system design decisions, and recognize more easily the subtleties of change and its implications. For example, if students were asked to implement a history feature, we would need sufficient storage space for uploaded images, which motivates the introduction of S3. Or if they were asked to scale the service to accommodate more load, we could introduce an N-tier web architecture with multiple EC2 instances to serve as the proxy, application, and database servers.
Finally, image recognition is an ethically charged technology. Today, human rights abuses are enabled by malicious application of face detection and other image-based machine learning algorithms. Discrimination and bias can easily intrude on naive image recognition applications. We’ve started with an innocuous software service, bird recognition, but we could raise the stakes easily by having students strip geolocation data from EXIF metadata embedded in uploaded images, bringing up questions of user privacy. Or the “Bird.ai” startup introduced in Assignment 1 could raise VC funding contingent on implementing a feature where the background of a bird photo is analyzed to mine features for another ML algorithm, raising questions of data sovereignty. Students today are the future professionals who must grapple with these technologically driven issues.
I hope this assignment serves as a good foundation for following course assignments, and may provide interesting ideas for the development of coursework in future sections of the course.