AWS Certified Solutions Architect – Associate (SAA-C03) — Domain 2: Design Resilient Architectures

Exam Code: SAA-C03  |  Level: Associate
Domain Weight: 26%  |  Total Domains: 4  |  Passing Score: 720/1000


Table of Contents

  1. Decoupling and Messaging Patterns
  2. Serverless and Event-Driven Architectures
  3. Load Balancing and API Management
  4. Highly Available and Fault-Tolerant Architecture
  5. Storage Durability and Resilience
  6. Resilience Patterns and Observability
  7. Exam Tips & Quick Reference

1. Decoupling and Messaging Patterns

Tight coupling means Service A calls Service B directly — if B is slow or down, A is blocked. Decoupling with queues or event buses breaks this dependency so each component scales and fails independently. This is one of the most tested architectural concepts on the SAA-C03.

1.1 Amazon SQS — Message Queuing

Standard vs. FIFO Queues

Feature | Standard Queue | FIFO Queue
Throughput | Nearly unlimited | 300 TPS (3,000 with batching)
Ordering | Best-effort; not guaranteed | Strictly guaranteed FIFO
Delivery | At-least-once (duplicates possible) | Exactly-once processing
Naming | Any name | Must end in .fifo
Message Groups | Not supported | Yes; groups are processed in parallel, with strict order within each group

Key SQS Concepts

Concept | Value / Behavior
Visibility Timeout | Default 30 seconds; max 12 hours. Message hidden after receipt; reappears if not deleted in time.
Message Retention | 1 minute to 14 days; default 4 days
Dead Letter Queue (DLQ) | Receives messages that fail processing N times; used for debugging
Long Polling | Consumer waits up to 20 seconds for messages; reduces empty responses and cost; always prefer over short polling
Delay Queue | Delay delivery 0–900 seconds; useful for initial processing pause
Message Size | Up to 256 KB per message

Exam Tip: SQS Visibility Timeout is a critical concept. If your consumer takes longer than the timeout to process, the message becomes visible again and another consumer may pick it up. Extend the timeout if processing is slow; don't rely on the 30-second default for complex workloads.
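The redelivery behavior described in the tip can be illustrated with a small in-memory model. This is only a sketch and not the SQS API: `TinyQueue`, its methods, and the logical `now` clock are hypothetical names invented for the illustration.

```python
class TinyQueue:
    """In-memory stand-in for a queue with a visibility timeout (logical clock)."""

    def __init__(self, visibility_timeout=30):
        self.visibility_timeout = visibility_timeout
        self._messages = {}       # message id -> body
        self._hidden_until = {}   # message id -> time it becomes visible again
        self._next_id = 0

    def send(self, body):
        self._next_id += 1
        self._messages[self._next_id] = body
        self._hidden_until[self._next_id] = 0
        return self._next_id

    def receive(self, now):
        """Return (id, body) for the first visible message and hide it, else None."""
        for msg_id, body in self._messages.items():
            if self._hidden_until[msg_id] <= now:
                self._hidden_until[msg_id] = now + self.visibility_timeout
                return msg_id, body
        return None

    def delete(self, msg_id):
        """Consumers must delete a message once processing has succeeded."""
        self._messages.pop(msg_id, None)
        self._hidden_until.pop(msg_id, None)


queue = TinyQueue(visibility_timeout=30)
queue.send("resize-image-42")

first = queue.receive(now=0)          # consumer A takes the message
during = queue.receive(now=10)        # still hidden: consumer B gets nothing
redelivered = queue.receive(now=45)   # A was too slow, so B receives the same message
queue.delete(redelivered[0])          # B finishes and deletes it
after_delete = queue.receive(now=100)
```

In the model, consumer A never deletes the message, so it reappears at `now=45`. This is the duplicate-processing risk that a longer timeout is meant to avoid.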

SQS Auto Scaling Integration

Custom CloudWatch Metric:
  ApproximateNumberOfMessagesVisible / NumberOfRunningInstances

Target Tracking Policy:
  Target = desired messages per instance (e.g., 100)
  
Result: ASG scales workers proportionally to queue backlog
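As a rough sketch of the arithmetic behind this policy, the steady-state fleet size is the backlog divided by the per-instance target, clamped to the group's limits. The function name and the `min_size`/`max_size` defaults are assumptions for illustration; real target tracking converges gradually rather than jumping straight to this value.

```python
import math


def desired_workers(visible_messages, target_per_instance, min_size=1, max_size=20):
    """Steady-state worker count that keeps backlog-per-instance at the target."""
    needed = math.ceil(visible_messages / target_per_instance)
    return max(min_size, min(max_size, needed))


# 1,500 visible messages on 5 running instances is a backlog of 300 per instance,
# three times the target of 100, so the fleet should grow to about 15 workers.
backlog_per_instance = 1500 / 5
workers = desired_workers(visible_messages=1500, target_per_instance=100)
```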

SQS FIFO — Deduplication and Ordering

A MessageGroupId is required and determines ordering scope — messages with the same group ID are processed in strict FIFO order. Deduplication uses either content-based deduplication (SHA-256 hash of body, 5-minute window) or an explicit MessageDeduplicationId.
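The deduplication rule can be sketched in a few lines. This is an illustrative model only: `FifoDeduplicator` and its `accept` method are hypothetical names, and the logical `now` argument stands in for the service's clock.

```python
import hashlib

DEDUP_WINDOW_SECONDS = 5 * 60  # the 5-minute deduplication window


class FifoDeduplicator:
    """Accepts a message unless the same dedup key was accepted within the window."""

    def __init__(self):
        self._accepted_at = {}  # dedup key -> time the message was accepted

    def accept(self, body, now, dedup_id=None):
        # Explicit ID if given, otherwise a SHA-256 hash of the body
        # (content-based deduplication).
        key = dedup_id or hashlib.sha256(body.encode("utf-8")).hexdigest()
        accepted_at = self._accepted_at.get(key)
        if accepted_at is not None and now - accepted_at < DEDUP_WINDOW_SECONDS:
            return False  # duplicate inside the window: dropped
        self._accepted_at[key] = now
        return True


dedup = FifoDeduplicator()
results = [
    dedup.accept("order-1001", now=0),     # first copy is accepted
    dedup.accept("order-1001", now=120),   # retry 2 minutes later is dropped
    dedup.accept("order-1001", now=301),   # window has passed: accepted again
]
```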


1.2 Amazon SNS — Pub/Sub Messaging

SNS is a push-based publish/subscribe service. Publishers send to a Topic; all subscribers receive the same message simultaneously.

SNS vs. SQS

Feature | SQS | SNS
Pattern | Pull: consumers poll the queue | Push: SNS pushes to all subscribers
Persistence | Yes, up to 14 days | No; fire-and-forget
Multiple Consumers | No; each message goes to one consumer | Yes; all subscribers receive the message
Ordering | FIFO queues only | Not guaranteed

Fan-Out Pattern (Frequently Tested)

Order Placed → SNS Topic
                    ├── SQS Queue A → Fulfillment Service
                    ├── SQS Queue B → Notification Service
                    └── SQS Queue C → Analytics Service

Each queue processes independently. A failure in one downstream service does not affect the others. This is the correct answer for "process one event in multiple systems simultaneously."

SNS Subscribers: SQS, Lambda, HTTP/HTTPS endpoint, email, SMS, mobile push notifications, Kinesis Data Firehose.


1.3 Amazon EventBridge

EventBridge is a serverless event bus for loosely coupled application integration. It is more powerful than SNS for complex routing scenarios.

Feature | EventBridge | SNS
Event Sources | AWS services, SaaS apps, custom apps | Publishers (AWS services or code)
Routing | Complex pattern matching on event fields | Topic subscription only
Schema Registry | Yes | No
SaaS Integration | Yes (Zendesk, Datadog, etc.) | No
Event Replay | Yes (archive and replay) | No
Scheduling | Yes (EventBridge Scheduler; EventBridge itself is the successor to CloudWatch Events) | No
Number of Targets | 20+ target types | ~10 protocols
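To make "pattern matching on event fields" concrete, here is a deliberately simplified matcher. It is a sketch, not EventBridge's implementation: it handles only exact-value lists and nested objects, and the `matches` function and sample event are illustrative assumptions. Real event patterns also support operators such as prefix and numeric matching, which are omitted here.

```python
def matches(pattern, event):
    """True if every field named in the pattern matches the event.

    Leaf values in the pattern are lists of allowed values; nested dicts
    are matched recursively against nested event objects.
    """
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


pattern = {
    "source": ["aws.ec2"],
    "detail": {"state": ["stopped", "terminated"]},
}

event = {
    "source": "aws.ec2",
    "detail-type": "EC2 Instance State-change Notification",
    "detail": {"state": "stopped", "instance-id": "i-0123456789abcdef0"},
}

is_match = matches(pattern, event)
```

Fields that the pattern does not mention (here `detail-type` and `instance-id`) are ignored, so a rule only needs to name the fields it cares about.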

Key Concept: EventBridge Pipes create direct point-to-point connections between a source (SQS, DynamoDB Stream, Kinesis) and a target with optional filtering and enrichment via Lambda. Use Pipes when you need to connect two services with minimal code.


1.4 Amazon MQ

Amazon MQ is a managed message broker for Apache ActiveMQ and RabbitMQ. Use it when migrating existing applications that use standard messaging protocols (AMQP, STOMP, MQTT, OpenWire, JMS) to avoid rewriting application code.

Exam Tip: For new applications, always choose SQS/SNS (more scalable, fully managed, AWS-native). Choose Amazon MQ only when the question mentions existing applications using standard broker protocols that cannot be refactored.


2. Serverless and Event-Driven Architectures

2.1 AWS Lambda — Deep Dive

Lambda Limits and Characteristics

Property | Value
Maximum timeout | 15 minutes per invocation
Memory range | 128 MB – 10,240 MB (CPU scales proportionally)
Ephemeral storage (/tmp) | Up to 10,240 MB
Concurrent executions | 1,000 default per region (can request increase)
Deployment package | 50 MB (zip) / 250 MB (unzipped)

Lambda Invocation Types

Type | Triggered By | Error Handling
Synchronous | API Gateway, ALB, SDK, CLI | Caller receives error; caller handles retries
Asynchronous | S3 events, SNS, EventBridge | Lambda retries up to 2 times automatically, then sends the event to a DLQ or on-failure destination if one is configured
Event Source Mapping | SQS, DynamoDB Streams, Kinesis, MSK | Lambda polls the source; batch processing

Lambda Concurrency Controls

  • Reserved concurrency — limits the maximum concurrent executions for one function; prevents it from consuming the entire account concurrency (throttles excess requests).
  • Provisioned concurrency — pre-initializes N function instances to eliminate cold starts; charged per hour provisioned; critical for latency-sensitive applications.

Key Concept: Cold starts occur when Lambda has to initialize a new execution environment before it can run an invocation. Mitigate with Provisioned Concurrency (any runtime) or Lambda SnapStart, which snapshots the initialized state for fast restore (introduced for Java and later extended to other runtimes such as Python and .NET).

Lambda Best Practices

  • Move expensive initialization (DB connections, SDK clients) outside the handler function so it is reused across warm invocations.
  • Use Lambda Layers for shared libraries to reduce deployment package size.
  • Store configuration in environment variables or SSM Parameter Store.
  • When Lambda needs VPC resources (RDS, ElastiCache), attach it to the VPC — but add a NAT Gateway for internet access and VPC Endpoints for AWS service calls.
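The first practice in this list can be sketched as follows. `_create_client`, `INIT_COUNT`, and the `TABLE_NAME` variable are hypothetical stand-ins; in a real function the module-level step would open a database connection or build an SDK client.

```python
import os

INIT_COUNT = 0  # counts how often the expensive setup actually runs


def _create_client():
    """Hypothetical stand-in for expensive setup such as opening a DB connection."""
    global INIT_COUNT
    INIT_COUNT += 1
    # Configuration comes from an environment variable, with a fallback for local runs.
    return {"table": os.environ.get("TABLE_NAME", "orders")}


# Module scope: runs once per execution environment (at cold start),
# not once per invocation.
CLIENT = _create_client()


def handler(event, context):
    # Warm invocations reuse CLIENT instead of rebuilding it.
    return {"table": CLIENT["table"], "order_id": event["order_id"]}


first = handler({"order_id": 1}, None)
second = handler({"order_id": 2}, None)
```

Both calls share one client, so `INIT_COUNT` stays at 1 however many warm invocations follow.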

Lambda Destinations (Async Only)

Route the result of asynchronous invocations to a next step without polling:

  • On Success → SQS, SNS, EventBridge, or another Lambda
  • On Failure → SQS, SNS, EventBridge, or another Lambda

2.2 AWS Step Functions

Step Functions orchestrates multi-step workflows as state machines, coordinating Lambda functions and AWS services with built-in error handling, retries, and parallel execution.

Feature | Standard Workflow | Express Workflow
Max Duration | 1 year | 5 minutes
Execution Semantics | Exactly-once | At-least-once
Execution History | Full audit in console | CloudWatch Logs
Cost | Per state transition | Per execution duration + requests
Use For | Order processing, long ETL, human approval workflows | High-volume, short-duration event processing
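As an illustration of the built-in retries and error handling, the sketch below builds a small state machine definition in Amazon States Language and serializes it to JSON. The workflow, state names, and Lambda ARNs are hypothetical placeholders; only the structural fields (`StartAt`, `States`, `Retry`, `Catch`, `Next`, `End`) are the point.

```python
import json

state_machine = {
    "Comment": "Hypothetical order workflow (illustrative only)",
    "StartAt": "ChargeCard",
    "States": {
        "ChargeCard": {
            "Type": "Task",
            # Placeholder ARN; substitute a real function in practice.
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:charge-card",
            # Retry failed attempts after 2 s, 4 s, then 8 s.
            "Retry": [{
                "ErrorEquals": ["States.TaskFailed"],
                "IntervalSeconds": 2,
                "MaxAttempts": 3,
                "BackoffRate": 2.0,
            }],
            # If retries are exhausted (or any other error occurs), branch to a failure state.
            "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "NotifyFailure"}],
            "Next": "ShipOrder",
        },
        "ShipOrder": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:ship-order",
            "End": True,
        },
        "NotifyFailure": {
            "Type": "Fail",
            "Error": "OrderFailed",
            "Cause": "Payment could not be completed",
        },
    },
}

definition = json.dumps(state_machine, indent=2)
```

The retry and catch logic lives in the definition rather than in function code, which is why Step Functions is the usual answer for "multi-step workflow with error handling" scenarios.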

2.3 Container Orchestration — ECS and EKS

ECS Launch Types

Feature | EC2 Launch Type | Fargate Launch Type
Who Manages EC2 | You: patch, scale, manage | AWS: fully managed
Pricing | Per EC2 instance | Per vCPU/memory/second
Use Case | Steady workload, cost control, GPU | Variable load, no management overhead

ECS Key Concepts

Concept | Definition
Task Definition | Blueprint: container image, CPU, memory, port mappings, environment variables, IAM task role
Task | A running instance of a task definition
Service | Maintains N running tasks; auto-restarts failed tasks; integrates with ALB
Cluster | Logical grouping of tasks and services

ECS Networking Modes:

  • awsvpc (recommended) — each task gets its own ENI and security group; best isolation and control
  • bridge — shared host network; port mapped via NAT on host
  • host — task shares the host's ENI directly; maximum performance, minimal isolation

Exam Tip: ECS tasks have two IAM roles. The Execution Role allows ECS to pull the image from ECR and write logs. The Task Role gives your application code permissions to call AWS services. These are separate and both may be required.

Amazon EKS

Managed Kubernetes control plane. Worker nodes run on EC2 node groups or Fargate. More complex than ECS but portable (standard Kubernetes API) across cloud providers. Use when the team is already Kubernetes-native or when workloads must be portable.

Amazon ECR

Managed container registry integrated with IAM. Features: image scanning (basic on-push or enhanced/continuous via Inspector), lifecycle policies to automatically remove old images, and cross-region/cross-account replication.


3. Load Balancing and API Management

3.1 Elastic Load Balancing — Full Comparison

Feature | ALB (Application) | NLB (Network) | GWLB (Gateway)
OSI Layer | Layer 7 (HTTP/HTTPS) | Layer 4 (TCP/UDP/TLS) | Layer 3 (IP)
Static IP | No | Yes, per AZ | No
Content-Based Routing | Path, host header, query string, HTTP header | No | No
WebSocket / gRPC | Yes | Yes | No
TLS Termination | Yes | Yes | No
HTTP → HTTPS Redirect | Yes (built-in rule) | No | No
Millions Req/sec | Yes | Yes | Yes
Preserve Client IP | Via X-Forwarded-For header | Yes (native) | Yes
Sticky Sessions | Yes (cookie-based) | Yes | No
Use Case | Web apps, APIs, microservices | Low-latency TCP, static IPs, extreme throughput | Third-party security appliances

ALB Content-Based Routing

Listener Rules (evaluated in priority order, lowest number first; first match wins):
├── IF path = /api/*       → forward to API target group
├── IF host = admin.co.com → forward to Admin target group
├── IF header X-Version=v2 → forward to V2 target group
├── IF query ?color=blue   → weighted: 80% Blue TG, 20% Green TG
└── DEFAULT                → forward to Main target group
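The first-match-wins evaluation of the listener rules above can be sketched as a small routing function. The rule tuples, target group names, and request shape are illustrative assumptions, and the weighted rule is left out to keep the example deterministic.

```python
from fnmatch import fnmatchcase

# (priority, condition, target group); lower priority numbers are checked first.
RULES = [
    (1, lambda req: fnmatchcase(req["path"], "/api/*"), "api-tg"),
    (2, lambda req: req["host"] == "admin.co.com", "admin-tg"),
    (3, lambda req: req["headers"].get("X-Version") == "v2", "v2-tg"),
]
DEFAULT_TARGET_GROUP = "main-tg"


def route(request):
    """Return the target group of the first rule whose condition matches."""
    for _priority, condition, target_group in sorted(RULES, key=lambda rule: rule[0]):
        if condition(request):
            return target_group
    return DEFAULT_TARGET_GROUP


# This request matches both the path rule and the host rule;
# the path rule wins because it has the lower priority number.
both = route({"path": "/api/users", "host": "admin.co.com", "headers": {}})
fallback = route({"path": "/index.html", "host": "www.co.com", "headers": {}})
```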

Target Types: EC2 instances, IP addresses (including on-premises via Direct Connect), or Lambda functions. An ALB can in turn be registered as a target of an NLB, which combines the NLB's static IPs with ALB routing.


3.2 Amazon API Gateway

API Types

Type | Best For
REST API | Standard HTTP/REST; most features; response caching
HTTP API | Lower latency and cost; OIDC and OAuth 2.0 support; simpler routing
WebSocket API | Real-time bidirectional communication (chat, live dashboards)

Key Features

  • Throttling — default 10,000 req/sec per account; burst limit 5,000; configurable per stage and per method
  • Caching — cache responses at the API Gateway level; configurable TTL (default 300 seconds); reduces backend load
  • Usage Plans — throttle and quota per API key; used for API monetization and partner access tiers
  • Stages — separate environments (dev/staging/prod) each with independent settings, throttling, and logging

API Gateway Authorizers

Authorizer | How It Works
Lambda Authorizer | Custom auth logic in Lambda; returns an IAM policy allow/deny
Cognito User Pool | Validates JWT tokens from a Cognito User Pool; no Lambda needed
IAM Authorization | Requires AWS Signature V4 signing; for internal service-to-service calls

3.3 Caching Strategies

Amazon ElastiCache — Redis vs. Memcached

Feature | Redis | Memcached
Persistence | Yes (RDB snapshots, AOF) | No
Multi-AZ Failover | Yes, automatic | No
Pub/Sub | Yes | No
Data Structures | Sorted sets, lists, hashes, geospatial | Strings only
Cluster Mode (Sharding) | Yes | Yes (multi-threaded horizontal scaling)
Transactions | Yes | No

Choose Redis when you need persistence, HA, pub/sub, rich data structures, or leaderboards. Choose Memcached when you need simple, pure caching with multi-threaded performance and no durability requirements.

Caching Patterns

Pattern | Behavior | Best For
Cache-aside (Lazy Loading) | App checks cache → miss → read DB → write to cache | Read-heavy; acceptable stale data window
Write-through | Write to cache AND DB simultaneously | No stale data; accepts additional write latency
Write-behind (Write-back) | Write to cache; async write to DB | Highest write performance; risk of data loss on failure
TTL | Cache entries expire after a set duration | All patterns; balance freshness vs. DB load
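A minimal sketch of cache-aside combined with a TTL is shown below. `CacheAside`, the dictionary standing in for the database, and the logical `now` clock are assumptions made for the illustration; a real deployment would put Redis or Memcached behind the same logic.

```python
class CacheAside:
    """Lazy-loading cache: check the cache, fall back to the DB on a miss, then populate."""

    def __init__(self, load_from_db, ttl_seconds=60):
        self._load = load_from_db
        self._ttl = ttl_seconds
        self._entries = {}   # key -> (value, expires_at)
        self.db_reads = 0

    def get(self, key, now):
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]              # cache hit
        self.db_reads += 1               # miss or expired: read the database
        value = self._load(key)
        self._entries[key] = (value, now + self._ttl)
        return value


database = {"user:1": "Ada"}
cache = CacheAside(database.get, ttl_seconds=60)

a = cache.get("user:1", now=0)    # miss: loaded from the database
b = cache.get("user:1", now=30)   # hit
database["user:1"] = "Grace"      # the database changes behind the cache
c = cache.get("user:1", now=45)   # still the old value: the stale-data window
d = cache.get("user:1", now=61)   # TTL expired: fresh value loaded
```

The stale read at `now=45` is the trade-off named in the table: cache-aside accepts a window of stale data, and the TTL bounds how long that window can be.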

Amazon DAX (DynamoDB Accelerator)

In-memory read-through and write-through cache for DynamoDB. Uses the same DynamoDB API, so it is largely transparent to the application. Provides microsecond read latency for frequently accessed items. Does not help write-heavy workloads. Cluster size: 1 to 11 nodes (one primary plus up to 10 read replicas), deployable across multiple AZs.


4. Highly Available and Fault-Tolerant Architecture

4.1 Multi-AZ and Multi-Region Design Patterns

Availability Zones

Each AZ consists of one or more discrete data centers within a Region, with independent power, cooling, and networking. Design all production workloads to span at least 2 AZs (3 recommended for critical applications). Auto Scaling groups automatically rebalance instances across the specified AZs.

Disaster Recovery Strategy Comparison

Strategy | Description | RTO | RPO | Cost
Backup & Restore | Periodic backups copied to DR Region; restore on failure | Hours | Hours | $
Pilot Light | Minimal core infrastructure (DB) running in DR; compute off | 10–60 min | Minutes | $$
Warm Standby | Scaled-down running copy of full environment in DR | Minutes | Seconds | $$$
Multi-Site Active/Active | Full production capacity in both Regions; live traffic split | Near-zero | Near-zero | $$$$

Key Concept: RPO is the maximum acceptable data loss (measured in time). RTO is the maximum acceptable downtime. Lower RPO and RTO = higher cost. The exam often presents a cost constraint and asks which DR strategy fits.


4.2 Database High Availability

RDS Multi-AZ vs. Read Replicas

Feature | Multi-AZ Deployment | Read Replica
Primary Purpose | High availability and automatic failover | Read scaling and DR
Replication | Synchronous; zero data loss | Asynchronous; potential lag
Readable | No; standby is not accessible | Yes; redirect read queries
Automatic Failover | Yes; 1–2 minutes; DNS updates automatically | No; manual promotion
Cross-Region | No; standby is in same region only | Yes; cross-region replicas supported

Exam Tip: Multi-AZ standby is NOT readable. If a question asks about offloading read traffic, the answer is Read Replicas. If a question asks about automatic failover or high availability, the answer is Multi-AZ. These are different features that can (and should) both be used together.

Amazon Aurora HA Architecture

Aurora stores 6 copies of data across 3 AZs automatically (2 copies per AZ). It can sustain writes with 4/6 copies and reads with 3/6 copies, and self-heals corrupted blocks via peer-to-peer replication.
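The quorum figures translate into simple arithmetic about which failures the storage layer tolerates. The sketch below only encodes the numbers from the paragraph above; the function and its parameter names are illustrative.

```python
COPIES_PER_AZ = 2
TOTAL_COPIES = 6      # 2 copies in each of 3 AZs
WRITE_QUORUM = 4      # writes need 4 of 6 copies
READ_QUORUM = 3       # reads need 3 of 6 copies


def availability(failed_azs=0, extra_failed_copies=0):
    """Report whether reads and writes still have quorum after the given failures."""
    healthy = TOTAL_COPIES - failed_azs * COPIES_PER_AZ - extra_failed_copies
    return {
        "healthy": healthy,
        "writes": healthy >= WRITE_QUORUM,
        "reads": healthy >= READ_QUORUM,
    }


one_az_down = availability(failed_azs=1)                                # 4 copies left
one_az_plus_one = availability(failed_azs=1, extra_failed_copies=1)     # 3 copies left
two_azs_down = availability(failed_azs=2)                               # 2 copies left
```

Losing a whole AZ leaves 4 copies, so writes continue. Losing an AZ plus one more copy leaves 3, which is enough for reads but not writes.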

Aurora Feature | Detail
Read Replicas | Up to 15; replicas share the cluster storage volume, so replica lag is minimal (typically well under 100 ms) rather than zero
Aurora Serverless v2 | Scales in fine-grained ACU increments with per-second billing; 0.5 to 128 ACUs is the range most exam material cites, and newer engine versions extend it (including pausing to 0 ACUs)
Aurora Global Database | Cross-region replication; RPO < 1 second; RTO < 1 minute; up to 5 secondary Regions
Writer / Reader Endpoints | Writer always points to primary; Reader load-balances across all replicas

Amazon DynamoDB High Availability Features

DynamoDB replicates data across 3 AZs by default — no configuration needed.

Feature | Detail
Global Tables | Multi-region, multi-master active-active replication; requires DynamoDB Streams enabled
On-Demand Capacity | Auto-scales instantly; no capacity planning; higher cost per request
DynamoDB Streams | Captures item-level changes (INSERT, MODIFY, REMOVE); triggers Lambda for event-driven processing

4.3 EC2 Auto Scaling — Full Reference

Scaling Policy Types

Policy Type | Behavior | Best For
Simple | One alarm triggers one fixed action; cooldown period | Basic scaling needs
Step | Multiple thresholds; proportional response steps | Graduated response to varying load levels
Target Tracking | Maintain a specific metric value automatically | Most use cases; easiest to configure
Scheduled | Pre-defined capacity changes at specific times | Predictable load patterns (business hours, batch windows)
Predictive | ML-based forecast; provisions capacity proactively before demand | Recurring, cyclical traffic patterns

Key Concept: Target Tracking is the simplest and recommended default. You specify a target metric value (e.g., 50% CPU) and AWS automatically adjusts capacity to maintain it. Predictive Scaling forecasts from historical load (up to 14 days of data, with at least 24 hours needed before it can forecast) and provisions capacity ahead of expected demand spikes.

Auto Scaling Lifecycle Hooks

Lifecycle hooks allow custom actions during scale-out and scale-in events. The instance is paused in Pending:Wait (scale-out) or Terminating:Wait (scale-in) state.

  • EC2_INSTANCE_LAUNCHING hook — configure the instance before it enters service (install agents, run tests)
  • EC2_INSTANCE_TERMINATING hook — drain connections, copy logs to S3, or deregister from service discovery before termination

Launch Templates vs. Launch Configurations

Feature | Launch Template | Launch Configuration
Versioning | Yes; multiple versions | No; immutable
Spot + On-Demand Mix | Yes | No
Required for New Features | Yes | No
Recommendation | Preferred | Legacy; avoid for new ASGs

4.4 Route 53 for Availability

Routing Policy Reference

Policy | Behavior | Use For
Failover | Route to secondary when health check on primary fails | Active-passive failover
Latency-Based | Route to Region with lowest measured latency | Multi-region for global users
Weighted | Split traffic by percentage | A/B testing; gradual version migration
Geolocation | Route based on user's geographic location | Content localization; regulatory compliance
Geoproximity | Route by location with adjustable bias | Fine-tune traffic distribution
Multivalue | Return multiple healthy IPs | Simple load distribution; not an ELB replacement

Route 53 Health Checks

Route 53 health checkers are globally distributed. Supported check types: HTTP, HTTPS, TCP, and string matching (verify response body contains specific text). A Calculated health check combines multiple health checks with AND/OR logic. Use a CloudWatch alarm health check for resources that are not publicly accessible (e.g., internal ALBs).


5. Storage Durability and Resilience

5.1 S3 Durability, Storage Classes, and Replication

S3 Storage Class Durability and Availability

Storage Class | Durability | Availability | AZs | Notes
Standard | 11 nines | 99.99% | ≥ 3 | General purpose; most resilient
Standard-IA | 11 nines | 99.9% | ≥ 3 | Infrequent access; retrieval fee
One Zone-IA | 11 nines | 99.5% | 1 | Lower cost; risk if AZ fails
Glacier Instant | 11 nines | 99.9% | ≥ 3 | Archive; millisecond retrieval
Glacier Flexible | 11 nines | 99.99% | ≥ 3 | Archive; minutes–hours retrieval
Glacier Deep Archive | 11 nines | 99.99% | ≥ 3 | 12–48 hour retrieval; lowest cost
Intelligent-Tiering | 11 nines | 99.9% | ≥ 3 | Auto-moves between access tiers

Exam Tip: All S3 storage classes, including One Zone-IA, are designed for 11 nines (99.999999999%) of durability. The difference is that One Zone-IA keeps data in a single AZ, so it does not protect against the loss or destruction of that AZ. Designed availability differs across classes.
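To put 11 nines in perspective, the arithmetic below treats the figure as an independent annual loss probability per object, which is a simplifying assumption. The 10-million-object workload is an illustrative choice.

```python
from fractions import Fraction

# 11 nines of durability, interpreted as a per-object, per-year survival probability.
DURABILITY = Fraction(99_999_999_999, 100_000_000_000)
annual_loss_probability = 1 - DURABILITY      # 1 in 100 billion

stored_objects = 10_000_000
expected_losses_per_year = stored_objects * annual_loss_probability
years_per_expected_loss = 1 / expected_losses_per_year

print(years_per_expected_loss)  # 10000
```

Under these assumptions, storing ten million objects corresponds to an expected loss of one object roughly every 10,000 years. `Fraction` is used so the result is exact rather than subject to floating-point rounding.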

S3 Replication

Both replication types require versioning enabled on both source and destination.

Feature | CRR (Cross-Region) | SRR (Same-Region)
Purpose | DR, compliance, reduce latency for distant users | Log aggregation, test/prod separation
Latency | Near real-time (asynchronous) | Near real-time (asynchronous)
Replicate Existing Objects | No; only new objects after enabling (use S3 Batch Replication for existing objects) | Same
Delete Marker Replication | Optional (off by default) | Optional

5.2 Block and File Storage Resilience

EBS Resilience

EBS volumes are AZ-specific — they exist in one AZ only. For resilience:

  • Take snapshots (stored in S3, multi-AZ) at regular intervals
  • Copy snapshots to another Region for cross-region DR
  • Use AWS Data Lifecycle Manager (DLM) to automate snapshot schedules and retention

EFS Resilience

Amazon EFS automatically replicates data across multiple AZs in a Region (Standard tier). Use EFS Replication to create a read-only replica in a different Region for DR.

FSx for Windows — Multi-AZ

FSx for Windows File Server supports a Multi-AZ deployment with automatic failover between file servers in separate AZs.


5.3 AWS Backup

AWS Backup provides centralized, policy-driven backup management across: EC2, EBS, RDS, Aurora, DynamoDB, EFS, FSx, Storage Gateway, and S3.

Feature | Detail
Backup Plans | Schedules, retention periods, and lifecycle transition rules
Cross-Account Backup | Copy backups to another account for isolation from operational account
Cross-Region Backup | Copy backups to another Region for DR compliance
Vault Lock (WORM) | Immutable backup vault; compliance mode prevents deletion even by root

6. Resilience Patterns and Observability

6.1 Architectural Resilience Patterns

Queue-Based Load Leveling

Place an SQS queue between a fast producer and a slow consumer. The producer never waits; the consumer processes at its own pace. Queue depth provides a scaling signal.

[Fast Producer] → [SQS Queue] → [Consumer Workers]
                                     ↑
                             (Auto Scaling based on
                             ApproximateNumberOfMessagesVisible)

Circuit Breaker Pattern

Stop calling a failing downstream service to prevent cascading failures. When the error rate exceeds a threshold, open the circuit (fail fast). Periodically allow a test request through to detect recovery.
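A minimal circuit breaker might look like the sketch below. It simplifies the description above in one respect: it opens after a number of consecutive failures rather than tracking an error rate. The class name, thresholds, and logical `now` clock are assumptions for illustration.

```python
class CircuitBreaker:
    """Opens after consecutive failures, fails fast while open, probes after a timeout."""

    def __init__(self, failure_threshold=3, reset_timeout=30):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, now):
        if self.opened_at is not None and now - self.opened_at < self.reset_timeout:
            # Open: do not touch the downstream service at all.
            raise RuntimeError("circuit open: failing fast")
        # Closed, or half-open (timeout elapsed): let the request through.
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = now  # open, or re-open after a failed probe
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result


downstream_calls = []


def flaky_service():
    downstream_calls.append("called")
    raise ConnectionError("downstream unavailable")


breaker = CircuitBreaker(failure_threshold=2, reset_timeout=30)
for t in (0, 1):
    try:
        breaker.call(flaky_service, now=t)
    except ConnectionError:
        pass  # two real failures open the circuit

try:
    breaker.call(flaky_service, now=5)   # rejected without calling the service
    rejected = False
except RuntimeError:
    rejected = True

recovered = breaker.call(lambda: "ok", now=40)  # probe succeeds, circuit closes
```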

Retry with Exponential Backoff and Jitter

Do not retry immediately on failure — wait an exponentially increasing interval. Add jitter (random delay) to prevent a "thundering herd" where all retrying clients hit the service simultaneously. AWS SDKs implement this by default.
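The retry behavior can be sketched as follows, using one common jitter variant in which each wait is drawn uniformly between zero and the capped exponential delay. The function name, defaults, and injectable `sleep` are illustrative assumptions, not a description of how any particular AWS SDK implements its retries.

```python
import random
import time


def call_with_retries(func, max_attempts=5, base=0.5, cap=20.0, sleep=time.sleep):
    """Call func, retrying on any exception with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # The upper bound doubles each attempt (0.5 s, 1 s, 2 s, ...) up to the cap;
            # the random draw spreads retrying clients apart in time.
            upper_bound = min(cap, base * 2 ** attempt)
            sleep(random.uniform(0, upper_bound))


attempts = {"count": 0}
waits = []


def sometimes_fails():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TimeoutError("transient failure")
    return "ok"


# Record the delays instead of actually sleeping so the example runs instantly.
result = call_with_retries(sometimes_fails, sleep=waits.append)
```

Passing `sleep=waits.append` is only a convenience for demonstration and testing; in production the default `time.sleep` would be used.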

Bulkhead Pattern

Isolate workloads so a failure in one does not affect others. Use separate SQS queues per consumer type, separate Lambda functions per purpose, and separate ECS services per workload. Avoid monoliths that fail entirely.


6.2 Monitoring and Tracing

Amazon CloudWatch

Feature | Purpose
Metrics | Time-series data from AWS services and custom sources; standard resolution is 1 minute, and high-resolution custom metrics go down to 1 second
Logs | Centralized log storage from EC2 (via agent), Lambda, ECS, API Gateway, VPC Flow Logs
Alarms | Trigger on threshold or anomaly; actions include SNS notification, Auto Scaling, EC2 reboot
Dashboards | Cross-service, cross-region metric visualization
Container Insights | Enhanced metrics for ECS/EKS (CPU, memory, disk, network per task/pod)
Anomaly Detection | ML model of expected metric range; alarm when actual deviates

AWS X-Ray

Distributed tracing for microservices and serverless applications. Traces requests end-to-end across Lambda, ECS, EC2, API Gateway, SQS, and DynamoDB. The Service Map provides a visual representation of all components and inter-service latency. Use annotations (indexed) for filtering traces and metadata (non-indexed) for additional context. Sampling controls the fraction of traces collected to manage cost.


7. Exam Tips & Quick Reference

Scenario-to-Answer Mapping

Scenario Keyword / Requirement | Correct Answer
"Decouple components to handle traffic spikes" | SQS queue between producer and consumer
"Process messages in strict order, exactly once" | SQS FIFO queue
"One event → multiple systems process simultaneously" | SNS topic → fan-out to multiple SQS queues
"Migrate from JMS/AMQP broker to AWS" | Amazon MQ (protocol compatibility required)
"Serverless multi-step workflow with error handling" | AWS Step Functions
"Auto Scaling based on SQS queue depth" | Custom backlog-per-instance metric (ApproximateNumberOfMessagesVisible / running instances) → Target Tracking
"Eliminate Lambda cold starts for critical function" | Lambda Provisioned Concurrency
"DynamoDB microsecond read latency" | DynamoDB DAX
"RDS high availability for production" | RDS Multi-AZ deployment
"Offload read queries from RDS primary" | RDS Read Replicas (each replica has its own endpoint; Aurora adds a load-balanced reader endpoint)
"Aurora cross-region DR with RPO < 1 second" | Aurora Global Database
"DynamoDB active-active multi-region replication" | DynamoDB Global Tables (requires Streams enabled)
"Gradual traffic shift to a new app version" | Weighted routing in Route 53 OR ALB weighted target groups
"Windows EC2 needs shared file system" | FSx for Windows File Server (SMB)
"Linux EC2 needs shared file system across AZs" | Amazon EFS (NFS)
"Retain instance logs before Auto Scaling terminates" | Lifecycle hook (Terminating:Wait) → copy logs to S3/CloudWatch
"Real-time distributed tracing across microservices" | AWS X-Ray
"Detect metric anomalies automatically" | CloudWatch Anomaly Detection
"Scheduled task without managing EC2" | EventBridge Scheduler + Lambda
"Long-running workflow up to 1 year" | Step Functions Standard Workflow
"High-volume short-duration event workflow" | Step Functions Express Workflow
"Automatic failover when primary Region fails" | Route 53 Failover routing + health checks
"Pilot Light DR; restore compute in minutes" | Pilot Light pattern (AMIs + Launch Templates pre-configured)

Common Traps

  • Multi-AZ standby not readable: The RDS Multi-AZ standby replica cannot serve read traffic. If the question mentions read scaling, use Read Replicas, not Multi-AZ.
  • DynamoDB Global Tables require Streams: Always enable DynamoDB Streams before creating Global Tables — a frequently tested prerequisite.
  • Amazon MQ vs. SQS: Only choose MQ when the scenario explicitly mentions existing applications using standard protocols (JMS, AMQP, STOMP). New applications should use SQS/SNS.
  • Step Functions Express duration: Express Workflows max out at 5 minutes. Standard Workflows support up to 1 year. Confusing these two is a common exam mistake.
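  • SNS does not persist messages: SNS retries delivery according to the subscription's retry policy, but it does not store messages for later consumption. Once retries are exhausted, the message is dropped unless a dead-letter queue is attached to the subscription. Add an SQS queue as a subscriber to provide durability.
  • Lambda in a VPC: A VPC-attached Lambda function has no internet access unless its subnet routes through a NAT Gateway. To reach AWS services it needs either that NAT path or VPC Endpoints for the services it calls.
  • S3 One Zone-IA risk: One Zone-IA carries the same 11-nines design figure, but that figure does not cover the loss of the single AZ that holds the data. Do not use One Zone-IA for data that cannot be recreated.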

Key Terms — Domain 2

Term | One-Line Definition
Visibility Timeout | Time an SQS message is hidden after receipt; reappears if not deleted within this window
DLQ (Dead Letter Queue) | Receives SQS/SNS messages that fail processing after N attempts
Fan-Out Pattern | SNS → multiple SQS queues; one event triggers multiple independent consumers
Provisioned Concurrency | Pre-warmed Lambda environments; eliminates cold starts; charged per hour
Task Definition | ECS blueprint defining container image, CPU, memory, ports, and IAM roles
Task Role | IAM role granting the application code inside an ECS container access to AWS services
Target Tracking | ASG policy that maintains a specific CloudWatch metric at a target value
Lifecycle Hook | Pauses an EC2 instance during ASG launch or termination for custom automation
Multi-AZ | RDS standby in a separate AZ; synchronous replication; automatic failover
Read Replica | RDS/Aurora async copy; readable; offloads queries; RDS replicas must be manually promoted
Aurora Global Database | Cross-region active-passive Aurora cluster; RPO < 1s; RTO < 1 min
DAX | DynamoDB Accelerator; in-memory write-through cache; microsecond reads
Circuit Breaker | Pattern that stops calling a failing service to prevent cascading failures
RPO | Recovery Point Objective; maximum acceptable data loss measured in time
RTO | Recovery Time Objective; maximum acceptable downtime before recovery
X-Ray | AWS distributed tracing service; end-to-end request visibility across services

End of Domain 2. Continue to Domain 3: Design High-Performing Architectures →
