Orphaned - The Death of Your AWS Budget
The Hidden Tax: Finding Cloud Waste in Federal Infrastructure
I’m Pursuing the AWS Solutions Architect Associate Certification
I’m currently working through the AWS Certified Solutions Architect – Associate (SAA-C03) curriculum. I’m about one third of the way through Stephane Maarek’s Udemy course, and almost immediately I can see the value for federal IT environments. It’s eye-opening.
Federal Cloud Is Here and AWS is Huge
Throughout the past decade, I’ve worked with mostly on-premises infrastructure. But the landscape has shifted. Cloud computing in the U.S. federal government is not only the norm, it’s almost considered a requirement. Even if you find savings or efficiency under the on-prem model, the policy pressure to go cloud is very real. Under a series of federal directives, from the earlier Cloud First and Cloud Smart policies to recent OMB mandates, agencies are being pushed to modernize legacy systems and migrate workloads to cloud environments (don’t get me started on how they define legacy). For most agencies, this means AWS GovCloud or Microsoft Azure Government. Both platforms are tailored to address the compliance and security requirements of the public sector.
Working in federal IT now means operating in the cloud: not just reading about it, but working in it at the program management and architecture level. With AI ramping up, even program-level managers are expected to get into the weeds.
Finding Baked in Waste
Just a couple of weeks into the course, one thing is becoming clear to me. The most immediate, practical value a solutions architect brings isn’t necessarily designing new infrastructure from scratch. It’s in auditing what already exists and identifying inefficiency. It’s everywhere!
Here are a few concrete examples of the kinds of waste that are hiding in plain sight. You don’t have to know the specific terms to understand the waste.
Amazon EC2 offers instance types optimized for different use cases (varying combinations of CPU, memory, storage, and networking capacity). Right-sizing is the process of matching instance types and sizes to actual workload requirements. Organizations tend to provision instances for peak theoretical load and never revisit them. Right-sizing must become an ongoing process, not a one-time exercise.
Consider an agency running a document processing application on an m5.4xlarge EC2 instance (16 vCPUs, 64 GB of RAM) provisioned during initial deployment. Fast forward two years: AWS Cost Explorer shows average CPU utilization hovering around 8% and memory utilization rarely exceeding 12%. Nobody revisited the instance type after go-live because the application was “working.” More than likely, the stakeholders want the extra headroom for a “just-in-case” scenario. This is very common and usually the wrong move.
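A minimal sketch of how that kind of finding could be flagged once utilization averages have been exported from your monitoring tool of choice. The instance IDs, the second sample instance, and the 20% thresholds are hypothetical and chosen only for illustration; they are not AWS recommendations.

```python
from dataclasses import dataclass


@dataclass
class InstanceUsage:
    instance_id: str      # hypothetical IDs, for illustration only
    instance_type: str
    avg_cpu_pct: float    # average CPU utilization over the review window
    avg_mem_pct: float    # average memory utilization over the same window


def rightsizing_candidates(usages, cpu_threshold=20.0, mem_threshold=20.0):
    """Return instances whose CPU AND memory averages are both below the thresholds.

    The 20% defaults are an assumption made for this sketch.
    """
    return [
        u for u in usages
        if u.avg_cpu_pct < cpu_threshold and u.avg_mem_pct < mem_threshold
    ]


fleet = [
    # The document processing server from the example above.
    InstanceUsage("i-docproc-01", "m5.4xlarge", avg_cpu_pct=8.0, avg_mem_pct=12.0),
    # A made-up, reasonably busy instance for contrast.
    InstanceUsage("i-webapp-01", "m5.xlarge", avg_cpu_pct=55.0, avg_mem_pct=61.0),
]

for u in rightsizing_candidates(fleet):
    print(f"{u.instance_id} ({u.instance_type}): "
          f"CPU {u.avg_cpu_pct}%, memory {u.avg_mem_pct}%")
```

Requiring both metrics to be low is a deliberately conservative choice: an instance with idle CPU but heavy memory use is not an obvious candidate for a smaller size.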
That single instance runs roughly $560/month on-demand. A right-sized m5.xlarge (4 vCPUs, 16 GB RAM) would handle that actual workload at around $140/month. That’s $420 saved per month on one instance, or about $5,000 per year. Now multiply that across 20, 50, or 100 instances and you’re looking at hundreds of thousands of dollars in waste every year. All without design changes, just lowering the resources.
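The arithmetic is simple enough to sketch in a few lines. This assumes the approximate monthly on-demand figures quoted above and, as a simplification, that every instance in the fleet is over-provisioned by the same amount; real pricing varies by region and purchase option.

```python
def annual_savings(current_monthly, rightsized_monthly, instance_count=1):
    """Yearly savings from moving `instance_count` instances to a cheaper size."""
    monthly_delta = current_monthly - rightsized_monthly
    return monthly_delta * 12 * instance_count


# Approximate monthly on-demand figures from the example above.
CURRENT_MONTHLY = 560      # m5.4xlarge
RIGHTSIZED_MONTHLY = 140   # m5.xlarge

for count in (1, 20, 50, 100):
    savings = annual_savings(CURRENT_MONTHLY, RIGHTSIZED_MONTHLY, count)
    print(f"{count:>3} instances: ${savings:,} per year")
```

Under those assumptions, 20 instances come to $100,800 per year and 100 instances to $504,000 per year.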
Multi-AZ deployments (which replicate data across multiple Availability Zones) and read replicas come at a cost. Running in multiple zones costs more, but may be required for production high-availability workloads. Non-critical workloads can often remain in a single zone. The real question is how the government defines critical. Getting the stakeholders’ agreement that your application is not critical is the hard part. Defining the application as non-critical would, on its own, save those costs without any other work involved. It’s free!
S3 Lifecycle policies allow you to transition infrequently accessed data to cheaper storage tiers automatically. If data is only retrieved once or twice a year, storing it in S3 Standard is throwing money into the trash. S3 Glacier or S3 Intelligent-Tiering would be more appropriate. Amazon S3 Storage Lens can identify cost optimization opportunities, and S3 Intelligent-Tiering can automate data lifecycle management. Define infrequent with your stakeholders. I bet they provision based on what-if and just-in-case scenarios. This is bad practice.
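As a rough sketch, a lifecycle rule of this kind can be expressed as a plain dictionary and handed to boto3. The prefix, rule ID, bucket name, and the 30/90-day cutoffs below are assumptions for illustration; the right numbers come out of the "define infrequent" conversation with your stakeholders.

```python
def build_lifecycle_config(prefix, ia_after_days=30, glacier_after_days=90,
                           rule_id="tier-down-infrequent-data"):
    """Build a lifecycle configuration dict in the shape boto3 expects.

    The day counts and rule ID are illustrative defaults, not recommendations.
    """
    if glacier_after_days <= ia_after_days:
        raise ValueError("Glacier transition must come after the Standard-IA transition")
    return {
        "Rules": [
            {
                "ID": rule_id,
                "Filter": {"Prefix": prefix},   # only objects under this prefix
                "Status": "Enabled",
                "Transitions": [
                    {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_after_days, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }


def apply_lifecycle_config(bucket, config):
    """Apply the configuration to a bucket. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder above runs without it

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(Bucket=bucket, LifecycleConfiguration=config)


config = build_lifecycle_config("archive/")
for transition in config["Rules"][0]["Transitions"]:
    print(transition)
```

Calling `apply_lifecycle_config("my-agency-bucket", config)` with a real bucket name would push the rule. Be aware that this call replaces whatever lifecycle configuration the bucket already has, so existing rules should be merged in first.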
The underlying theme here is that it’s the human in the loop causing the costs to be high. Redefine what is critical. Stop worrying about edge cases. Be realistic about your needs. Behavior changes are the hardest changes to make in an organization, but they cost absolutely nothing, and can save thousands.
AI and ML Workloads
There’s another reason this skillset is becoming increasingly important. AI and machine learning workloads are exploding within the government. This is driving demand for GPU compute, large-scale data pipelines, and complex storage architectures. What you need is the ability to evaluate the full infrastructure holistically. Knowing where the data lives, how it moves, how frequently it’s accessed, what the compute pattern looks like, and whether the architecture actually matches the business need is a valuable skill to have. Don’t forget, convincing the decision makers will still be the hardest part. This soft skill shouldn’t be underappreciated.
Understanding access patterns, retrieval frequency, data pipelines, and cost tradeoffs at the infrastructure level is where program managers with technical depth can differentiate themselves.
Where I Am and Where This Is Going
I’m inching towards the halfway point with Stephane Maarek’s SAA-C03 course and working hands-on in a live AWS environment. Certifications alone are worth absolutely nothing. I could drill TutorialsDojo practice exams, pass the SAA-C03, and still be useless in a real cloud environment. The credential is a door opener, something to complement my other skills, nothing more. Hands-on projects, documented cost savings, and architectural decisions are what you want in your portfolio.
That said, I’m aware this post stays at the conceptual level. I plan to go much deeper in upcoming posts. I’m debating creating some hands-on deep dives that get into the granular mechanics of cloud cost optimization: how to actually find the waste, what the findings look like in the console (love the command line!), and what specific remediation steps produce real savings. Until then, there are plenty of other great resources for those looking to upskill.
Thank you for visiting my Substack at newsletter.markgingrass.com.
