
Many teams ask for an “AWS startup architecture” that ships quickly, stays secure, and won’t buckle at Series A. The hard part is settling on a baseline stack: too little and reliability suffers; too much and delivery slows. This blueprint offers a pragmatic 10‑service foundation built on AWS managed services, so small teams can focus on product while keeping a clean path to scale.
The 10‑Service Stack (What and Why)
1- VPC (Networking foundation)
- Isolated network with public/private subnets, NAT Gateways, and VPC Endpoints for private access to AWS services.
- Why: Security perimeter, controlled egress, and future-proofing for multi‑AZ/region.
2- Application Load Balancer (ALB)
- HTTP/HTTPS load balancing with path/host routing, WAF integration, and auth offload.
- Why: Scalable entry point for web/mobile/API traffic that supports zero‑downtime (rolling or blue/green) deployments.
3- Compute: ECS Fargate or EKS (containers) or Lambda (serverless)
- ECS Fargate: simplest managed containers; no nodes to manage.
- EKS: Kubernetes when you need custom controllers, multi‑tenant platform engineering, or portability.
- Lambda: event-driven/serverless for spiky or low‑ops workloads.
- Why: Fit compute to team skills and workload patterns; start simple.
4- Data Store: RDS (Aurora/Postgres/MySQL) or DynamoDB
- RDS: ACID transactions, SQL, joins; great for typical SaaS CRUD.
- DynamoDB: serverless, low-latency key‑value at scale with known access patterns.
- Why: Choose by access patterns and SLOs; add caching later.
5- S3 (Object storage)
- Static assets, user uploads, logs, backups, and data lake foundation.
- Why: Cheap, durable, lifecycle tiers; decouple hot path from analytics.
6- CloudFront (CDN)
- Global edge caching for static assets and APIs, signed URLs for private content.
- Why: Lower latency and egress; reduces origin load and cost.
7- Observability: CloudWatch (baseline) and/or OpenSearch (search/log analytics)
- CloudWatch for metrics, logs, alarms, synthetic checks; OpenSearch if you need log search and relevance queries.
- Why: See problems early; searchable logs speed diagnosis.
8- Secrets Manager
- Centralized secret rotation, least‑privilege access via IAM, audit trails.
- Why: Eliminate hardcoded secrets; enable safe key rotation.
9- IAM (Identity and access management)
- Roles with least privilege, service-specific policies, SSO for humans, and IAM Access Analyzer.
- Why: Security posture is built, not bolted on.
10- AWS Backup (and/or native backups)
- Central policies for RDS snapshots, DynamoDB PITR, EBS, and S3 backups (with lifecycle).
- Why: Compliance, disaster recovery, and no‑drama restores.
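To make the VPC item above more concrete, here is a minimal sketch of how one VPC range could be split into a public and a private subnet per Availability Zone. The 10.0.0.0/16 range, the /20 subnet size, and the function name plan_subnets are assumptions chosen for illustration, not values prescribed by this blueprint.

```python
import ipaddress


def plan_subnets(vpc_cidr: str, az_count: int = 3, new_prefix: int = 20) -> dict:
    """Split a VPC CIDR into one public and one private subnet per AZ.

    Hypothetical helper: the first `az_count` blocks become public subnets,
    the next `az_count` become private; the remaining blocks stay unallocated
    for future growth.
    """
    vpc = ipaddress.ip_network(vpc_cidr)
    blocks = list(vpc.subnets(new_prefix=new_prefix))
    needed = az_count * 2
    if len(blocks) < needed:
        raise ValueError(
            f"{vpc_cidr} yields {len(blocks)} /{new_prefix} blocks, need {needed}"
        )
    return {
        "public": [str(b) for b in blocks[:az_count]],
        "private": [str(b) for b in blocks[az_count:needed]],
    }


plan = plan_subnets("10.0.0.0/16")
for tier, cidrs in plan.items():
    print(tier, cidrs)
```

Leaving most of the range unallocated is deliberate: it keeps room for extra tiers or additional AZs without renumbering.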
Reference Architecture (Narrative Diagram)
- Edge: Route 53 → CloudFront (TLS) → ALB (WAF optional)
- App Tier:
  - Option A: ECS Fargate services across 2–3 AZs
  - Option B: EKS with managed node groups (later Fargate profiles)
  - Option C: Lambda behind ALB or API Gateway
- Data Tier:
- RDS Aurora (multi‑AZ, read replica) or DynamoDB (on‑demand → provisioned)
- S3 for assets, logs, and lake (Athena/Glue optional)
- Observability & Ops: CloudWatch metrics/logs/alarms, OpenSearch for log search (optional), X-Ray for tracing
- Security & Secrets: IAM roles, Secrets Manager, KMS keys, Security Groups, VPC Endpoints
- Backup & DR: AWS Backup policies, cross‑region snapshot copy (critical data)
Choosing Your Compute: ECS, EKS, or Lambda
- Choose ECS Fargate if: small team, containerized app, low ops tolerance, steady web/API workloads.
- Choose Lambda if: event-driven, bursty traffic, or you want minimal ops with pay‑per‑use; watch cold starts and limits.
- Choose EKS if: platform team exists, need advanced scheduling/controllers, or multi‑tenant clusters with strong isolation patterns.
Tip: Start with ECS or Lambda; adopt EKS when you have platform engineering capacity.
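The selection rules above can be sketched as a small decision function. This is a deliberate simplification: the Workload fields and the function name choose_compute are hypothetical, and real decisions weigh more factors (cold starts, limits, team skills) than four booleans can capture.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical summary of the traits the guidance above keys on."""
    event_driven: bool = False
    bursty_traffic: bool = False
    has_platform_team: bool = False
    needs_k8s_features: bool = False  # custom controllers, advanced scheduling


def choose_compute(w: Workload) -> str:
    # EKS only when there is both a Kubernetes-specific need and the
    # platform engineering capacity to run it.
    if w.has_platform_team and w.needs_k8s_features:
        return "EKS"
    # Event-driven or bursty work fits pay-per-use Lambda.
    if w.event_driven or w.bursty_traffic:
        return "Lambda"
    # Default for a small team with steady web/API traffic.
    return "ECS Fargate"


print(choose_compute(Workload()))
print(choose_compute(Workload(bursty_traffic=True)))
print(choose_compute(Workload(has_platform_team=True, needs_k8s_features=True)))
```

Note that a Kubernetes need without a platform team still falls through to the simpler options, which mirrors the tip above.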
Choosing Your Data Store: RDS vs DynamoDB
- Pick RDS when: you need SQL, joins, transactions, and flexible queries. Start with Aurora Serverless v2 for spiky loads.
- Pick DynamoDB when: access patterns are key‑based with strict latency targets at scale; design keys/GSIs upfront.
- Common pattern: RDS as source of truth + ElastiCache for hot reads; S3 + Athena for analytics. Add OpenSearch for text search.
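To illustrate what "design keys/GSIs upfront" can look like for DynamoDB, the sketch below composes partition and sort keys for one assumed access pattern: list a tenant's orders by date. The attribute names (pk, sk, gsi1pk, gsi1sk), the key prefixes, and the in-memory query function are all hypothetical; the function only imitates a key-based query with a begins_with condition on the sort key and does not call AWS.

```python
def order_item(tenant_id: str, order_id: str, created_at: str, status: str) -> dict:
    """Build an item whose keys encode the access patterns we expect."""
    return {
        "pk": f"TENANT#{tenant_id}",                  # all of a tenant's data
        "sk": f"ORDER#{created_at}#{order_id}",       # orders sorted by time
        "gsi1pk": f"TENANT#{tenant_id}#STATUS#{status}",  # orders by status
        "gsi1sk": created_at,
        "status": status,
    }


def query(items: list, pk: str, sk_prefix: str = "") -> list:
    """In-memory stand-in for a key query with a sort-key prefix condition."""
    matches = [i for i in items if i["pk"] == pk and i["sk"].startswith(sk_prefix)]
    return sorted(matches, key=lambda i: i["sk"])


items = [
    order_item("t1", "o-1", "2024-05-02T10:00:00Z", "PAID"),
    order_item("t1", "o-2", "2024-06-11T09:30:00Z", "OPEN"),
    order_item("t2", "o-3", "2024-05-20T14:15:00Z", "PAID"),
]

# Access pattern: tenant t1's orders created in May 2024.
may_orders = query(items, "TENANT#t1", "ORDER#2024-05")
print([i["sk"] for i in may_orders])
```

The point of the exercise is that each query the application needs maps to a key or index chosen in advance; a pattern that does not fit the keys tends to require a new GSI or a table scan.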
Security and Compliance by Default
- Network: Private subnets for app/data, least‑open security groups, NACLs used sparingly.
- Identity: Least‑privilege IAM roles for workloads; IAM Identity Center (SSO) for humans.
- Data protection: Encrypt at rest (KMS) and in transit (TLS), signed URLs for private S3 content.
- Monitoring: GuardDuty, CloudTrail, Config rules, Security Hub for posture.
- Secrets: Store in Secrets Manager; rotate keys; remove long‑lived access keys.
- Backups/DR: PITR for DynamoDB, automated RDS snapshots, periodic restore tests.
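As one example of least privilege in practice, the sketch below builds an IAM policy document that lets a task role read a single secret and nothing else. The ARN, account ID, and function name are placeholders invented for the example. If the secret is encrypted with a customer-managed KMS key, the role would also need permission to decrypt with that key, which this sketch leaves out.

```python
import json


def secret_read_policy(secret_arn: str) -> dict:
    """Return a policy document allowing reads of exactly one secret."""
    if "*" in secret_arn:
        # Refuse wildcards so the policy cannot silently widen in scope.
        raise ValueError("use a specific secret ARN, not a wildcard")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadOneSecret",
                "Effect": "Allow",
                "Action": ["secretsmanager:GetSecretValue"],
                "Resource": secret_arn,
            }
        ],
    }


# Placeholder ARN for illustration only.
EXAMPLE_ARN = "arn:aws:secretsmanager:us-east-1:123456789012:secret:prod/app/db-AbCdEf"
policy = secret_read_policy(EXAMPLE_ARN)
print(json.dumps(policy, indent=2))
```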
Cost‑Aware Defaults (FinOps from Day One)
- Compute: Right‑size tasks/functions; scale to SLOs; consider Graviton; mix Savings Plans for baseline + Spot for burst.
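- Network: Use VPC Endpoints to reduce NAT data‑processing costs (gateway endpoints for S3 and DynamoDB carry no hourly charge); cache via CloudFront; keep chatty services in the same AZ to avoid cross‑AZ transfer charges.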
- Storage: S3 lifecycle rules to IA/Glacier; right‑size RDS IOPS; smart GSIs in DynamoDB.
- Observability: Sample traces; set log retention by environment; avoid noisy metrics.
- Governance: Enforce tags (owner, env, cost‑center), budgets, and anomaly detection alerts.
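Tag enforcement from the governance item can start as a simple check in CI before resources are created. The sketch below assumes the three tag keys named above are required and treats blank values as missing; the function name missing_tags and the case-insensitive matching are illustrative choices, not an AWS feature.

```python
REQUIRED_TAGS = ("owner", "env", "cost-center")


def missing_tags(tags: dict) -> list:
    """Return the required tag keys that are absent or have blank values."""
    present = {key.lower() for key, value in tags.items() if str(value).strip()}
    return [tag for tag in REQUIRED_TAGS if tag not in present]


print(missing_tags({"owner": "team-a", "env": "prod"}))
print(missing_tags({"Owner": "team-a", "Env": "prod", "Cost-Center": "42"}))
```

A pipeline step could fail the build whenever the returned list is non-empty, so untagged resources never reach the bill.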
Observability That Tells a Story
- Metrics: RED (rate, errors, duration) and USE (utilization, saturation, errors) patterns, plus custom business metrics (signups, checkouts).
- Logs: Structured logs with request IDs; centralize in CloudWatch Logs; ship to OpenSearch if search is needed.
- Traces: X‑Ray or OpenTelemetry across services for root cause clarity.
- SLOs: Start with latency, error rate, availability; manage with error budgets.
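Error budgets follow directly from the SLO target, and a worked calculation may help. The sketch below assumes a 30-day window and a request-based availability SLO; the function names and the sample numbers are made up for the example.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Minutes of full downtime a time-based SLO allows in the window."""
    if not 0 < slo < 1:
        raise ValueError("slo must be a fraction between 0 and 1, e.g. 0.999")
    return (1 - slo) * window_days * 24 * 60


def budget_remaining(slo: float, total_requests: int, failed_requests: int) -> float:
    """Fraction of a request-based error budget still unspent (negative if blown)."""
    if total_requests <= 0:
        return 1.0  # no traffic yet, nothing spent
    allowed_failures = (1 - slo) * total_requests
    return 1 - failed_requests / allowed_failures


# A 99.9% target over 30 days allows roughly 43 minutes of downtime.
print(round(error_budget_minutes(0.999), 1))
# 250 failures out of 1,000,000 requests spends about a quarter of the budget.
print(round(budget_remaining(0.999, 1_000_000, 250), 2))
```

When the remaining fraction approaches zero, the usual response is to slow feature rollout and spend time on reliability until the budget recovers.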
IaC and Automation
- Use Terraform/CDK/CloudFormation to define the stack; store state securely; PR‑based changes.
- Pipelines: CI/CD with blue/green or canary deployments via CodeDeploy/Spinnaker/GitHub Actions + ECS/EKS/Lambda integrations.
- Golden paths: Service templates with logging, metrics, tracing, health checks, and tagging baked in.
2‑Week Implementation Plan (MVP to Production‑Ready)
Week 1
- Day 1–2: Provision VPC, subnets, gateways, endpoints; set baseline IAM policies and SSO.
- Day 3–4: Stand up ALB + ECS Fargate service (or Lambda) for a “hello world” API.
- Day 5: Create RDS (or DynamoDB) with encryption, backups, and Secrets Manager integration.
Week 2
- Day 6–7: S3 buckets (assets, logs), CloudFront distribution with TLS and caching rules.
- Day 8: Observability: CloudWatch dashboards/alarms, X‑Ray; optional OpenSearch for logs.
- Day 9: AWS Backup policies; cross‑region snapshot copy for critical data.
- Day 10: CI/CD pipeline with blue/green; load test; run a DR restore drill.
Deliverables: IaC repos, runbooks (deploy, rollback, incident), tagging standards, budget alerts, and an SLO dashboard.
Migration and Growth Path (Seed → Series A)
- Add environments: Staging with prod‑like topology; separate accounts via AWS Organizations.
- Horizontal scale: More ECS services, autoscaling policies, read replicas (RDS) or provisioned capacity (DynamoDB).
- Specialized needs:
  - Analytics: S3 + Glue + Athena or Redshift.
  - Search: OpenSearch for full‑text and relevance.
  - Eventing: EventBridge/SNS/SQS for decoupled workflows.
- Security maturity: Private API endpoints, WAF, rate limits, detective controls, and periodic pen tests.
Common Pitfalls (And How to Avoid Them)
- Over‑engineering early: Don’t start with EKS unless you must; ECS/Lambda usually suffice.
- Skipping tagging/governance: Leads to bill surprises; enforce tags and budgets now.
- Mixing analytics with OLTP: Move heavy queries to S3+Athena/Redshift to protect app performance.
- Ignoring backups/DR tests: Snapshots aren’t a plan; verify restores quarterly.
- YAML sprawl: Use templates and modules; keep golden paths simple.
FAQs
RDS or DynamoDB for SaaS?
Start with RDS if you need joins and SQL; choose DynamoDB for predictable low‑latency key‑value at scale with well‑designed access patterns.
Can we mix ECS and Lambda?
Yes. Use Lambda for event-driven tasks and ECS for long-lived services.
When should we adopt EKS?
When you have platform engineering bandwidth and Kubernetes‑specific needs.