Cloud Cost Optimizer

Status — Concept. This is a design and architecture study. No public repository or live deployment exists yet. Everything below describes the intended approach; nothing here has been shipped.

One-line summary

A planned dashboard that walks an AWS account, ranks the top sources of waste by projected monthly cost, and exposes one-click corrective actions.

Problem

AWS environments accumulate forgotten EBS volumes, idle Elastic IPs, oversized RDS instances, and dev resources that never got cleaned up. The Cost Explorer console shows totals but doesn't rank waste in dollars-per-month or surface a clear next action. Engineers see "your bill went up 18%" and have to hunt.

Solution

A Next.js app that pulls metrics from AWS Cost Explorer and CloudWatch, applies a small set of opinionated heuristics, and presents each finding as a card with the resource, the projected savings if you act, and a button that runs the corrective action against AWS via signed server actions.
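
Since nothing is built yet, the following is only a rough sketch of what one heuristic and its resulting finding might look like. The `Finding` and `VolumeRecord` shapes, the `findUnattachedVolumes` name, the 14-day threshold, and the caller-supplied price are all illustrative assumptions, not a settled design or real AWS rates.

```typescript
// Hypothetical finding shape; every name here is an illustrative assumption.
type FindingKind = "unattached-ebs" | "idle-eip" | "oversized-rds";

interface Finding {
  resourceId: string;
  kind: FindingKind;
  region: string;
  projectedMonthlySavingsUsd: number;
  // Label of the corrective action the card's button would trigger.
  action: string;
}

// Assumed input: one row per EBS volume as recorded by the nightly scan.
interface VolumeRecord {
  volumeId: string;
  region: string;
  sizeGiB: number;
  attached: boolean;
  daysUnattached: number;
}

function findUnattachedVolumes(
  volumes: VolumeRecord[],
  pricePerGiBMonthUsd: number, // caller-supplied placeholder, not a real AWS rate
  minDaysUnattached = 14, // arbitrary example threshold
): Finding[] {
  return volumes
    .filter((v) => !v.attached && v.daysUnattached >= minDaysUnattached)
    .map((v): Finding => ({
      resourceId: v.volumeId,
      kind: "unattached-ebs",
      region: v.region,
      // Round to cents so cards show stable dollar amounts.
      projectedMonthlySavingsUsd:
        Math.round(v.sizeGiB * pricePerGiBMonthUsd * 100) / 100,
      action: "delete-volume",
    }));
}
```

Keeping each heuristic a pure function over scan rows would make it easy to unit-test without touching AWS; the Lambda scanner would be responsible for producing the rows.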

My role

If built, this would be a solo end-to-end project: architecture, AWS IAM scoping, the Next.js app, the Postgres schema, the Terraform module for the Lambda + scheduler, and the CI pipeline.

Tech stack

App — Next.js (App Router), TypeScript, Tailwind, shadcn/ui
Backend — Server actions, AWS SDK v3, PostgreSQL via Prisma
Infra — Terraform module for IAM role, Lambda scanner, EventBridge cron
CI/CD — GitHub Actions with typecheck, lint, build, and terraform plan
Auth — IAM role assumed via OIDC; the app holds no long-lived AWS keys

Architecture diagram

                    ┌─────────────────────────┐
                    │   GitHub Actions OIDC   │
                    └───────────┬─────────────┘
                                │ assume role
                                ▼
┌──────────────┐   nightly   ┌───────────────┐   metrics    ┌─────────────┐
│ EventBridge  ├────────────▶│ Lambda scan   ├─────────────▶│ Postgres DB │
└──────────────┘             └──────┬────────┘              └──────┬──────┘
                                    │                              │
                                    ▼                              ▼
                              ┌────────────┐                ┌────────────┐
                              │ AWS APIs   │                │ Next.js UI │
                              │ (CE, CW)   │                │ + actions  │
                              └────────────┘                └────────────┘

Key features (planned)

Rank every finding in projected dollars-per-month, not "high/medium/low".
One-click safe actions: stop instance, delete unattached volume, release EIP.
Per-account RBAC; sensitive actions require a second click and log the actor.
Diff view between scans to see what changed since last week.
All Terraform-managed — no ClickOps in the AWS console.
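
As a sketch of how the dollar ranking and the scan-to-scan diff listed above could work, assuming findings are identified by kind plus resource ID. The `ScanFinding`, `rankBySavings`, and `diffScans` names are hypothetical, and the default of ten results is an arbitrary choice for illustration.

```typescript
// Hypothetical minimal finding shape used only for this sketch.
interface ScanFinding {
  resourceId: string;
  kind: string;
  projectedMonthlySavingsUsd: number;
}

// Sort by projected dollars-per-month, largest first, without mutating the input.
function rankBySavings(findings: ScanFinding[], topN = 10): ScanFinding[] {
  return [...findings]
    .sort((a, b) => b.projectedMonthlySavingsUsd - a.projectedMonthlySavingsUsd)
    .slice(0, topN);
}

interface ScanDiff {
  added: ScanFinding[]; // present now, absent in the previous scan
  resolved: ScanFinding[]; // present before, gone now
  changed: { before: ScanFinding; after: ScanFinding }[]; // savings moved
}

function diffScans(previous: ScanFinding[], current: ScanFinding[]): ScanDiff {
  // Assumption: kind + resourceId uniquely identifies a finding across scans.
  const keyOf = (f: ScanFinding) => `${f.kind}:${f.resourceId}`;
  const prevByKey = new Map<string, ScanFinding>();
  for (const f of previous) prevByKey.set(keyOf(f), f);

  const diff: ScanDiff = { added: [], resolved: [], changed: [] };
  const seen = new Set<string>();
  for (const after of current) {
    const key = keyOf(after);
    seen.add(key);
    const before = prevByKey.get(key);
    if (!before) {
      diff.added.push(after);
    } else if (before.projectedMonthlySavingsUsd !== after.projectedMonthlySavingsUsd) {
      diff.changed.push({ before, after });
    }
  }
  prevByKey.forEach((f, key) => {
    if (!seen.has(key)) diff.resolved.push(f);
  });
  return diff;
}
```

In the UI, `rankBySavings` would feed the card list and `diffScans` the "what changed since last week" view, with both reading from scan results already stored in Postgres.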

Anticipated challenges

IAM scoping. The Lambda role needs the smallest set of permissions that still covers Cost Explorer plus per-service metrics, which usually takes several iterations to get right.
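Cost Explorer rate limits and per-request charges. The scanner will need jittered backoff to ride out throttling, plus a 24h cache in Postgres: Cost Explorer API requests are billed per call, and the savings shouldn't be eaten by the API bill.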
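Action idempotency. Corrective-action endpoints have to be safe to retry, since network blips are the norm at scale. Releasing an EIP that is already released, for example, should report success rather than fail.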
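
A minimal sketch of how the backoff, cache, and retry-safety concerns above might be handled, assuming "full jitter" exponential backoff (a uniform random delay up to an exponentially growing cap). All function names, defaults, and the `isAlreadyDone` predicate are hypothetical; which AWS errors count as retryable or "already done" is left to the caller because that mapping has not been designed yet.

```typescript
// Full jitter: delay is uniform in [0, min(capMs, baseMs * 2^attempt)).
// `random` is injectable so the function can be tested deterministically.
function backoffDelayMs(
  attempt: number,
  baseMs = 200,
  capMs = 20000,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Retry `fn` up to `maxRetries` extra times, sleeping with jitter in between.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxRetries = 5,
  isRetryable: (err: unknown) => boolean = () => true,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= maxRetries || !isRetryable(err)) throw err;
      await sleep(backoffDelayMs(attempt));
    }
  }
}

// A cached Cost Explorer response is reusable while it is younger than the TTL.
function isCacheFresh(
  fetchedAtMs: number,
  nowMs: number,
  ttlMs = 24 * 60 * 60 * 1000,
): boolean {
  return nowMs - fetchedAtMs < ttlMs;
}

// Idempotent wrapper: if the resource is already in the target state,
// report that instead of surfacing an error to the user.
async function runIdempotent(
  action: () => Promise<void>,
  isAlreadyDone: (err: unknown) => boolean,
): Promise<"done" | "already-done"> {
  try {
    await action();
    return "done";
  } catch (err) {
    if (isAlreadyDone(err)) return "already-done";
    throw err;
  }
}
```

The pieces are meant to compose: a corrective action would be wrapped in `runIdempotent` first and `withRetry` second, so that a retry after a dropped response resolves to "already-done" rather than an error. AWS SDK v3 clients ship their own retry strategy, so in practice `withRetry` might only be needed around calls where that built-in behavior is not enough.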

What I expect to learn

Terraform modules pay for themselves the second time you spin up an environment.
Designing for dollars instead of "best practices" makes prioritization obvious and product-shaped.
OIDC plus short-lived role assumption beats stored AWS keys for both security and operational sanity.

Improvements planned beyond v1

Azure and GCP collectors behind the same finding interface.
Slack alerts when a single finding crosses a configurable savings threshold.
A read-only public demo seeded with synthetic data.