Deployment, Provisioning, and Automation
AWS Certified CloudOps Engineer - Associate (SOA-C03) — Domain 3: Deployment, Provisioning and Automation
Exam Code: SOA-C03 | Level: Associate
Domain Weight: 22% | Total Domains: 5 | Passing Score: 720/1000
Table of Contents
- AWS CloudFormation — Deep Dive
- AWS Systems Manager — Complete Reference
- EC2 Instances — Provisioning and Purchasing
- Containers and Serverless
- Messaging and Decoupling
- AWS Organizations — Account Governance
- Exam Tips & Quick Reference
1. AWS CloudFormation — Deep Dive
CloudFormation is the primary infrastructure-as-code service for AWS. The exam tests template structure, dependency management, update controls, and multi-account deployment via StackSets.
1.1 Template Structure
AWSTemplateFormatVersion: "2010-09-09"
Description: "Template description"
Parameters: # Inputs at deploy time — enables reuse across environments
Mappings: # Lookup tables (e.g., region → AMI ID)
Conditions: # Conditionally create resources
Transform: # SAM/macros transformations
Resources: # REQUIRED — all AWS resources defined here
Outputs: # Values to export for cross-stack references or display
Parameters — Enabling Template Reuse
Parameters allow the same template to deploy across dev/staging/prod with different values:
Parameters:
  EnvironmentType:
    Type: String
    AllowedValues: [dev, staging, prod]
    Default: dev
  InstanceType:
    Type: String
    Default: t3.micro
    AllowedValues: [t3.micro, t3.small, t3.medium, m5.large]
Exam Tip: "Write a single CloudFormation template that can be reused for multiple environments" → use Parameters — not nested stacks, not user data, not stack policies.
Mappings — Region-Specific Values
AMI IDs are region-specific — the same AMI ID does not exist in both us-east-1 and eu-west-1. Use Mappings with !FindInMap to resolve the correct AMI per region:
Mappings:
  RegionMap:
    us-east-1:
      AMI: ami-0abc123
    eu-west-1:
      AMI: ami-0def456
Resources:
  WebServer:
    Type: AWS::EC2::Instance
    Properties:
      ImageId: !FindInMap [RegionMap, !Ref "AWS::Region", AMI]
Exam Tip: "CloudFormation template works in us-east-1 but fails in eu-west-1 with 'AMI does not exist'" → run aws ec2 copy-image to copy the AMI to the new region first, then add the resulting per-region AMI IDs to the Mappings section.
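The two-level lookup that !FindInMap performs can be illustrated with a plain dictionary. This is only a sketch of the lookup logic, not how CloudFormation is implemented; REGION_MAP and find_in_map are hypothetical names, and the AMI IDs are the placeholder values from the template above.

```python
# Hypothetical stand-in for the template's Mappings section.
REGION_MAP = {
    "us-east-1": {"AMI": "ami-0abc123"},
    "eu-west-1": {"AMI": "ami-0def456"},
}


def find_in_map(mapping, top_level_key, second_level_key):
    """Mimic the two-level lookup of !FindInMap [Map, TopKey, SecondKey]."""
    try:
        return mapping[top_level_key][second_level_key]
    except KeyError:
        # A region missing from the map is the dictionary analogue of
        # deploying to a region the template has no AMI entry for.
        raise KeyError(
            f"no mapping entry for {top_level_key!r} / {second_level_key!r}"
        ) from None


ami_for_ireland = find_in_map(REGION_MAP, "eu-west-1", "AMI")
```

A region that has no entry raises an error, which is why every target region needs its own row in the map.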
1.2 Resource Dependencies
DependsOn Attribute
Use DependsOn when CloudFormation cannot infer the dependency automatically — for example, an EC2 instance that connects to RDS on startup must wait for the DB to be created:
WebServer:
  Type: AWS::EC2::Instance
  DependsOn: DatabaseInstance
  Properties: ...
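Note: Changing the order of resources in the template does NOT affect creation order. CloudFormation builds a dependency graph from references such as !Ref and !GetAtt, so the order in the template file is irrelevant. Where no such reference exists, you must declare the dependency explicitly with DependsOn.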
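Why file order does not matter can be seen with a small topological sort. The sketch below uses Python's standard graphlib module as an analogy for a dependency graph; it is not CloudFormation's actual scheduler, and creation_order is a hypothetical helper.

```python
from graphlib import TopologicalSorter


def creation_order(depends_on):
    """Return one valid creation order.

    depends_on maps each resource to the set of resources it must wait for,
    which is the role DependsOn (or an implicit reference) plays in a template.
    """
    return list(TopologicalSorter(depends_on).static_order())


# WebServer is listed first, yet it is still ordered after DatabaseInstance.
listed_web_first = {
    "WebServer": {"DatabaseInstance"},
    "DatabaseInstance": set(),
}
order = creation_order(listed_web_first)
```

Reordering the dictionary entries leaves the result unchanged, because only the declared edges determine the order.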
CreationPolicy — Waiting for Bootstrap Completion
For Auto Scaling groups and EC2 instances that run initialization scripts, use CreationPolicy to wait for cfn-signal before marking the resource as complete:
WebServerGroup:
  Type: AWS::AutoScaling::AutoScalingGroup
  CreationPolicy:
    ResourceSignal:
      Count: 3
      Timeout: PT4H # ISO 8601 duration
  Properties:
    MinSize: "3"
    MaxSize: "6"
The user data script must call cfn-signal at the end:
#!/bin/bash
yum install -y httpd
# ... configure application ...
/opt/aws/bin/cfn-signal -e $? \
  --stack ${AWS::StackName} \
  --resource WebServerGroup \
  --region ${AWS::Region}
Exam Tip: "Stack creation fails because Auto Scaling group wait condition is not receiving required signals" → run cfn-signal at the completion of the user data script. If cfn-signal is not called, CloudFormation waits until the timeout expires and then rolls back.
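The pass/fail rule behind ResourceSignal can be modeled in a few lines. This is a simplified sketch under stated assumptions: it ignores MinSuccessfulInstancesPercent, treats any on-time failure signal as fatal, and creation_succeeds is a hypothetical function name.

```python
def creation_succeeds(signals, required_count, timeout_seconds):
    """Decide whether a resource with a CreationPolicy would reach COMPLETE.

    signals is a list of (seconds_since_start, exit_code) tuples, one per
    cfn-signal call; exit_code 0 means the bootstrap script succeeded.
    """
    on_time = [code for elapsed, code in signals if elapsed <= timeout_seconds]
    if any(code != 0 for code in on_time):
        return False  # a failure signal fails the resource
    return len(on_time) >= required_count


PT4H_SECONDS = 4 * 60 * 60  # the PT4H timeout from the template above

all_three_signalled = creation_succeeds(
    [(600, 0), (620, 0), (650, 0)], required_count=3, timeout_seconds=PT4H_SECONDS
)
```

With Count: 3, two success signals are not enough: the resource keeps waiting until the timeout and the stack then rolls back.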
1.3 Stack Update Policies
Stack Policies — Preventing Accidental Updates
A stack policy is a JSON document applied to a stack that controls which resources can be updated:
{
  "Statement": [
    { "Effect": "Allow", "Principal": "*", "Action": "Update:*", "Resource": "*" },
    { "Effect": "Deny", "Principal": "*", "Action": "Update:*", "Resource": "LogicalResourceId/ProductionDatabase" }
  ]
}
Note: Stack policies protect specific resources within a stack. IAM policies protect actions on CloudFormation as a service. They serve different purposes.
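How the two statements above interact can be sketched as a small evaluator in which an explicit Deny beats any Allow and a resource with no matching Allow stays protected. This is a simplified model, not the service's evaluation engine: it ignores Principal, NotAction, NotResource, conditions, and list-valued fields, and update_allowed is a hypothetical name.

```python
from fnmatch import fnmatchcase

# The stack policy statements from the example above.
STACK_POLICY = [
    {"Effect": "Allow", "Action": "Update:*", "Resource": "*"},
    {
        "Effect": "Deny",
        "Action": "Update:*",
        "Resource": "LogicalResourceId/ProductionDatabase",
    },
]


def update_allowed(statements, logical_id, action="Update:Modify"):
    """Return True if the policy lets `action` run against `logical_id`."""
    target = f"LogicalResourceId/{logical_id}"
    allowed = False
    for statement in statements:
        matches = fnmatchcase(action, statement["Action"]) and fnmatchcase(
            target, statement["Resource"]
        )
        if not matches:
            continue
        if statement["Effect"] == "Deny":
            return False  # explicit Deny always wins
        allowed = True
    return allowed  # no matching Allow means the update is denied
```

Under this policy an update to ProductionDatabase is rejected, while every other resource in the stack can still be updated.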
AutoScalingRollingUpdate Policy
Controls how instances in an Auto Scaling group are replaced during stack updates:
WebServerGroup:
  Type: AWS::AutoScaling::AutoScalingGroup
  UpdatePolicy:
    AutoScalingRollingUpdate:
      MaxBatchSize: 2
      MinInstancesInService: 3
      PauseTime: PT5M
      WaitOnResourceSignals: true
AutoScalingReplacingUpdate creates a completely new Auto Scaling group and switches traffic after it is healthy — full capacity maintained, fast rollback, but double capacity cost temporarily.
1.4 Resource Management
Deletion Policies
Control what happens to resources when a stack is deleted:
| Policy | Behavior |
|---|---|
| Delete (default) | Resource is deleted with the stack |
| Retain | Resource persists after stack deletion |
| Snapshot | Takes a final snapshot before deleting (RDS, EBS, ElastiCache) |
Exam Tip: "Delete CloudFormation stack without deleting the DynamoDB table" → add DeletionPolicy: Retain to the DynamoDB resource.
DELETE_FAILED Stack State
Stack gets stuck in DELETE_FAILED when a resource cannot be deleted. Common causes: S3 bucket contains objects, or a security group has dependencies from other resources.
Fix: delete the stack again but specify resources to retain:
aws cloudformation delete-stack \
  --stack-name my-stack \
  --retain-resources ProblematicSecurityGroup S3BucketWithData
Dynamic References to Secrets
Instead of hardcoding credentials, use dynamic references:
Database:
  Type: AWS::RDS::DBInstance
  Properties:
    MasterUserPassword: "{{resolve:secretsmanager:MyDBSecret:SecretString:password}}"
    # Or for SSM Parameter Store:
    # SomeParameter: "{{resolve:ssm:/my/parameter/name}}"
    # Or for SSM SecureString:
    # SomeSecret: "{{resolve:ssm-secure:/my/secret/name}}"
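The anatomy of a dynamic reference string is easier to see when it is split into its colon-separated fields. The parser below is an illustrative sketch, not part of any AWS SDK: parse_dynamic_reference and its field names are hypothetical, and it assumes the secret is identified by name (an ARN contains colons and would need different handling).

```python
def parse_dynamic_reference(ref):
    """Split a {{resolve:...}} string into its named parts."""
    prefix, suffix = "{{resolve:", "}}"
    if not (ref.startswith(prefix) and ref.endswith(suffix)):
        raise ValueError(f"not a dynamic reference: {ref!r}")
    service, _, rest = ref[len(prefix):-len(suffix)].partition(":")
    if service == "secretsmanager":
        # Order: secret-id : SecretString : json-key : version-stage : version-id
        keys = ["secret_id", "secret_string", "json_key", "version_stage", "version_id"]
        return {"service": service, **dict(zip(keys, rest.split(":")))}
    if service in ("ssm", "ssm-secure"):
        name, _, version = rest.partition(":")
        return {"service": service, "name": name, "version": version or None}
    raise ValueError(f"unsupported service: {service!r}")


db_password_ref = parse_dynamic_reference(
    "{{resolve:secretsmanager:MyDBSecret:SecretString:password}}"
)
```

In the example, password is the JSON key inside the secret value, so only that field of the secret is substituted into the template.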
1.5 StackSets
StackSets deploy the same CloudFormation template across multiple accounts and regions from a single management account.
Management Account
│
▼
StackSets ──► Account A (us-east-1, eu-west-1)
Account B (us-east-1, eu-west-1)
Account C (us-east-1, eu-west-1)
Exam Tip: "Enable AWS Config in all accounts of the organization and in all AWS Regions in the most operationally efficient way" → CloudFormation StackSets from the management account with the SERVICE_MANAGED permission model. SERVICE_MANAGED relies on roles that AWS Organizations creates on your behalf, and with automatic deployment enabled, new accounts added to a target OU automatically receive the StackSet's stack instances.
When a StackSet stack instance shows OUTDATED, common causes are: the template is creating a global resource with a name collision across regions, the service is not available in the target region, or there is a permission issue in the target account.
1.6 Nested Stacks
Nested stacks reuse common template components across multiple parent templates. The parent passes parameters to child stacks and uses !GetAtt to read child stack outputs.
| Feature | Nested Stacks | StackSets |
|---|---|---|
| Purpose | Reuse components within one account/region | Deploy same template across multiple accounts/regions |
| Scope | Single account, single region | Multi-account, multi-region |
| Use case | Modular templates | Org-wide governance |
2. AWS Systems Manager — Complete Reference
2.1 Session Manager
Session Manager provides browser-based or CLI interactive shell access to EC2 instances and on-premises servers with no SSH port (22) required, no SSH key pairs, and no bastion hosts. Session activity can be logged to CloudWatch Logs or S3.
Prerequisites: SSM Agent installed and running, IAM instance profile with AmazonSSMManagedInstanceCore, outbound HTTPS (443) from instance to SSM endpoints, and for private subnet instances without NAT: VPC interface endpoints for SSM.
Controlling Access by Instance Tags
Use IAM policy conditions on tags to restrict which instances a user can access:
{
  "Effect": "Allow",
  "Action": "ssm:StartSession",
  "Resource": "*",
  "Condition": {
    "StringEquals": { "ssm:resourceTag/Environment": "Development" }
  }
}
2.2 SSM Run Command
Run Command executes commands or scripts on many instances simultaneously without SSH, targeting by instance ID, tag, resource group, or all managed instances. Documents include AWS-RunShellScript (Linux) and AWS-RunPowerShellScript (Windows). Output is stored in S3 or CloudWatch Logs.
Run Command vs. State Manager
| Feature | Run Command | State Manager |
|---|---|---|
| Execution | One-time, on-demand | Continuous, on schedule |
| Purpose | Ad-hoc task execution | Enforce desired state |
| Good for | Patching, security response, one-off tasks | Ensure consistent config, agent health |
2.3 SSM Patch Manager
Patch Manager Architecture
Patch Baseline → defines which patches to approve/reject
Patch Group → set of instances (tagged with "Patch Group: <name>")
Maintenance Window → when patching occurs (days, times, duration)
Run Command → executes patching using AWS-RunPatchBaseline document
Instances are assigned to patch groups via the tag key Patch Group (exactly this spelling, including the space). Each patch group can have its own patch baseline — for example, development baseline auto-approves patches 2 days after release while production baseline auto-approves after 5 days.
Exam Tip: "Dev instances patched 2 days after release; prod patched 5 days after release; 2-hour maintenance window for all" → two patch groups + two patch baselines + one maintenance window.
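The effect of two baselines with different auto-approval delays can be worked through with simple date arithmetic. This is a sketch of the delay rule only; the group names, the APPROVE_AFTER_DAYS table, and both function names are hypothetical, and the real service applies additional filters (classification, severity) that are ignored here.

```python
from datetime import date, timedelta

# Hypothetical patch groups mapped to their baseline's auto-approval delay.
APPROVE_AFTER_DAYS = {"dev": 2, "prod": 5}


def approval_date(release_date, patch_group):
    """Date on which a patch becomes approved for the given patch group."""
    return release_date + timedelta(days=APPROVE_AFTER_DAYS[patch_group])


def is_approved(release_date, patch_group, today):
    """True once the group's auto-approval delay has elapsed."""
    return today >= approval_date(release_date, patch_group)


released = date(2024, 3, 1)
dev_ready = approval_date(released, "dev")    # 2024-03-03
prod_ready = approval_date(released, "prod")  # 2024-03-06
```

A maintenance window that runs on 2024-03-04 would therefore install the patch on dev instances but skip it on prod instances, even though both groups share the same window.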
2.4 Parameter Store vs. Secrets Manager
| Feature | Parameter Store | Secrets Manager |
|---|---|---|
| Cost | Free (standard) | $0.40/secret/month + API charges |
| Secret rotation | Manual only | Built-in rotation with Lambda |
| RDS integration | None | Native RDS password rotation |
| Encryption | Optional (SecureString with KMS) | Always encrypted with KMS |
| Max value size | 4 KB standard, 8 KB advanced | 64 KB |
| Cross-account sharing | Advanced-tier parameters only (via AWS RAM) | Yes (resource policy) |
| Requirement | Use |
|---|---|
| Store and auto-rotate RDS credentials every 30 days | Secrets Manager |
| Store CloudWatch agent JSON configuration for fleet | Parameter Store |
| Store non-secret application configuration values | Parameter Store |
3. EC2 Instances — Provisioning and Purchasing
3.1 AMI Management
Pre-Baking (Golden AMI) Pattern
User data scripts that install and configure software take 5–20 minutes, making scale-out responses slow. The solution is to pre-bake AMIs: launch a base instance, install all software, create an AMI from the configured instance, and use this golden AMI in launch templates. New instances launch with software already installed and are ready in under 2 minutes.
Use EC2 Image Builder to automate the golden AMI creation pipeline: source image → build steps (install software) → test steps → distribute to regions.
3.2 EC2 Placement Groups
| Group Type | Layout | Max Instances per AZ | Use Case |
|---|---|---|---|
| Cluster | Instances packed close together in a single AZ; up to 10 Gbps per-flow throughput between instances | No fixed limit (subject to available capacity) | HPC, MPI, tight low-latency communication |
| Spread | Each instance on a different rack | 7 | Small deployments needing maximum fault isolation |
| Partition | Groups of instances per partition; each partition on separate hardware | Hundreds | Large distributed systems (Hadoop, Cassandra, Kafka) |
Exam Tip: "10 EC2 instances, highly available, distinct underlying hardware" → Spread placement group spanning multiple AZs. "HPC application, minimum latency between nodes" → Cluster placement group in a single subnet.
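The reason the 10-instance scenario needs more than one Availability Zone follows directly from the limit of 7 instances per AZ in the table. The arithmetic is sketched below; min_azs_for_spread is a hypothetical helper.

```python
import math

# Limit of running instances per AZ in one spread placement group (from the table).
SPREAD_LIMIT_PER_AZ = 7


def min_azs_for_spread(instance_count):
    """Smallest number of AZs that can hold instance_count spread instances."""
    return math.ceil(instance_count / SPREAD_LIMIT_PER_AZ)


azs_for_ten = min_azs_for_spread(10)  # 2
```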
3.3 EC2 Purchasing Options
| Option | Commitment | Best For |
|---|---|---|
| Standard Reserved Instances | 1 or 3 years; specific family/size/region | Stable workloads; can sell on Marketplace |
| Convertible Reserved Instances | 1 or 3 years; exchangeable for different family/OS | When instance type may need to change mid-term |
| Compute Savings Plans | $/hour commitment; any EC2 + Lambda + Fargate; any region | Region migrations during commitment period |
| Instance Savings Plans | $/hour; specific family and region | Predictable workloads needing higher discount |
| Spot Instances | None; up to 90% discount; interruptible | Fault-tolerant batch, rendering, testing |
| Spot Blocks (retired) | 1–6 hour defined duration; not interruptible | Legacy option for batch jobs < 6 hours; AWS ended support on December 31, 2022, so it appears only in older practice questions |
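Exam Tip: "Batch jobs run < 2 hours; if a job fails it must restart from beginning" → older practice questions expect Spot Blocks (defined duration), because regular Spot could be interrupted mid-job and force a full restart. AWS has since retired Spot Blocks, so in a current account a job that cannot tolerate interruption needs On-Demand capacity.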
Spot Fleet Allocation Strategies
| Strategy | Behavior | Use Case |
|---|---|---|
| lowest-price | Selects cheapest pool(s) | Maximum cost savings; higher interruption risk |
| capacity-optimized | Selects pools with most available capacity | Minimum interruption risk |
| diversified | Distributes across all pools | Balance of cost and reliability |
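The difference between the three strategies can be shown with a toy allocator. This is a deliberately simplified sketch: real Spot capacity data is not exposed as a number per pool, the pool names, prices, and capacity figures are invented for illustration, and allocate is a hypothetical function.

```python
def allocate(pools, target, strategy):
    """Distribute `target` instances across Spot pools.

    pools maps a pool name to {"price": float, "capacity": int}, where
    capacity is an invented stand-in for how much spare capacity a pool has.
    Returns {pool_name: instance_count}.
    """
    if strategy == "lowest-price":
        cheapest = min(pools, key=lambda name: pools[name]["price"])
        return {cheapest: target}
    if strategy == "capacity-optimized":
        deepest = max(pools, key=lambda name: pools[name]["capacity"])
        return {deepest: target}
    if strategy == "diversified":
        base, extra = divmod(target, len(pools))
        return {
            name: base + (1 if index < extra else 0)
            for index, name in enumerate(sorted(pools))
        }
    raise ValueError(f"unknown strategy: {strategy!r}")


POOLS = {
    "pool-a": {"price": 0.03, "capacity": 5},
    "pool-b": {"price": 0.05, "capacity": 50},
    "pool-c": {"price": 0.04, "capacity": 20},
}
```

With these invented numbers, lowest-price puts everything in pool-a, the cheapest pool but also the one with the least spare capacity, which is the cost versus interruption-risk trade-off the table describes.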
3.4 Elastic Beanstalk Deployment Policies
| Policy | Downtime | Capacity During Update | Rollback Speed |
|---|---|---|---|
| All at once | Yes | Reduced (all updating) | Redeploy (same speed as deploy) |
| Rolling | No | Reduced by batch size | Rolling back each batch |
| Rolling with additional batch | No | Full (extra instances launched first) | Roll back batches |
| Immutable | No | Full (new ASG created alongside old) | Instant (route back to old ASG) |
Exam Tip: "Maintain full capacity during deployment" → Immutable or Rolling with additional batch. Both maintain full capacity, but Immutable allows faster rollback by simply routing traffic back to the old Auto Scaling group.
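The "Capacity During Update" column can be restated as a small calculation of how many instances keep serving traffic at the lowest point of a deployment. This is a sketch of the table's logic only; the function name is hypothetical, and the policy strings follow the Elastic Beanstalk DeploymentPolicy option values.

```python
def in_service_during_deploy(policy, fleet_size, batch_size):
    """Lowest number of instances serving traffic while a deployment runs."""
    if policy == "AllAtOnce":
        return 0  # every instance is updated together, so there is downtime
    if policy == "Rolling":
        return fleet_size - batch_size  # one batch is out of service at a time
    if policy in ("RollingWithAdditionalBatch", "Immutable"):
        return fleet_size  # extra instances are launched before any are taken out
    raise ValueError(f"unknown policy: {policy!r}")


rolling_floor = in_service_during_deploy("Rolling", fleet_size=4, batch_size=1)  # 3
```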
4. Containers and Serverless
4.1 Amazon ECS — Task Networking
| Network Mode | ENI Assignment | Security Group | Use Case |
|---|---|---|---|
| awsvpc | Per-task ENI | Per-task security group | Production; micro-segmentation |
| bridge | Shared host ENI | Host-level security group | Simple apps; port mapping |
| host | Shares host ENI directly | Host security group | Maximum performance |
Exam Tip: "Monitor traffic flows between ECS tasks" → requires awsvpc network mode so each task gets its own ENI that can have VPC Flow Logs.
4.2 Lambda in VPC
Lambda functions run outside your VPC by default and have internet access but no access to VPC resources (RDS, ElastiCache, etc.). Placing Lambda in a VPC gives it access to VPC resources but removes direct internet access — just like any other private subnet resource.
To restore internet access for Lambda in a VPC: place Lambda in a private subnet and route through a NAT gateway in a public subnet.
Lambda → RDS connection problem: Lambda creates many short-lived connections and can exhaust the RDS max_connections parameter. Fix: deploy RDS Proxy between Lambda and RDS.
4.3 EKS Configuration
The kubeconfig file is required for kubectl to communicate with an EKS cluster. It maps the cluster API server endpoint, cluster CA certificate, and AWS IAM authentication credentials.
aws eks update-kubeconfig \
  --region us-east-1 \
  --name my-cluster
Exam Tip: "CloudOps Engineer needs to manage EKS cluster using kubectl. What must be configured?" → The kubeconfig file.
5. Messaging and Decoupling
5.1 Amazon SQS — Complete Reference
Standard vs. FIFO Queues
| Feature | Standard | FIFO |
|---|---|---|
| Message ordering | Best-effort only | Guaranteed strict ordering |
| Delivery | At-least-once (duplicates possible) | Exactly-once processing |
| Throughput | Nearly unlimited | 300 TPS (3,000 with batching); higher with high-throughput mode |
| Naming | Any name | Must end in .fifo |
Migrating from Standard to FIFO
You cannot convert an existing Standard queue to FIFO. You must: create a new FIFO queue (with .fifo suffix), enable content-based deduplication (or have producers send a MessageDeduplicationId with each message), update the application to include MessageGroupId in all messages, update consumers to read from the new queue, then drain and delete the old queue.
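What content-based deduplication and MessageGroupId do can be modeled in memory. The class below is a hedged sketch, not the SQS API: FifoQueueSketch is a hypothetical name, the SHA-256 hash of the body stands in for the deduplication ID, and the real service's 5-minute deduplication window is ignored.

```python
import hashlib


class FifoQueueSketch:
    """In-memory model of FIFO deduplication and per-group ordering."""

    def __init__(self):
        self.seen_dedup_ids = set()
        self.groups = {}  # MessageGroupId -> list of bodies in arrival order

    def send(self, body, message_group_id):
        """Accept the message unless an identical body was already sent."""
        dedup_id = hashlib.sha256(body.encode("utf-8")).hexdigest()
        if dedup_id in self.seen_dedup_ids:
            return False  # duplicate: dropped instead of delivered twice
        self.seen_dedup_ids.add(dedup_id)
        self.groups.setdefault(message_group_id, []).append(body)
        return True


queue = FifoQueueSketch()
first = queue.send("order-1 created", "customer-42")
retry = queue.send("order-1 created", "customer-42")  # producer retry
second = queue.send("order-1 paid", "customer-42")
```

The retried message is discarded, and the two remaining messages stay in send order within their group, which is the behavior a Standard queue cannot guarantee.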
SQS Priority Queue Pattern
To process alarms before status updates: create two separate queues (high-priority and low-priority). The application polls the high-priority queue first and only polls the low-priority queue when the high-priority queue is empty.
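The polling rule can be sketched with two in-memory queues standing in for the two SQS queues. This is an illustration of the consumer logic only; next_message and the sample messages are hypothetical, and a real consumer would call ReceiveMessage on each queue instead.

```python
from collections import deque


def next_message(high_priority, low_priority):
    """Return the next message to process, always preferring high priority."""
    if high_priority:
        return high_priority.popleft()
    if low_priority:
        return low_priority.popleft()
    return None  # both queues are empty


alarms = deque(["alarm-1", "alarm-2"])
status_updates = deque(["status-1"])

processed = []
while (message := next_message(alarms, status_updates)) is not None:
    processed.append(message)
```

Status updates are handled only after every alarm has been drained, regardless of which messages arrived first.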
6. AWS Organizations — Account Governance
6.1 Service Control Policies (SCPs)
Key Concept: SCPs set the maximum permissions for accounts in an organization. They apply to all users and roles including the root user of member accounts. SCPs cannot grant permissions — they only restrict. An explicit Deny in an SCP overrides any Allow in IAM policies.
Common SCP patterns:
Restrict to approved regions:
{
  "Effect": "Deny",
  "Action": "*",
  "Resource": "*",
  "Condition": {
    "StringNotEquals": { "aws:RequestedRegion": ["us-east-1", "us-west-2"] }
  }
}
Require tags on resource creation:
{
  "Effect": "Deny",
  "Action": ["ec2:RunInstances"],
  "Resource": "*",
  "Condition": { "Null": { "aws:RequestTag/CostCenter": "true" } }
}
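SCPs can be applied at the Root (all accounts), OU level (all accounts in that OU), or individual account level. An account's effective permissions are the intersection of what every SCP on the path from the root to that account allows, and an explicit Deny at any level applies to everything beneath it.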
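The interaction between an IAM Allow and the region-restriction SCP can be reduced to a two-input check. This is a simplified sketch of the rule "SCPs restrict but never grant", not the full policy evaluation logic; APPROVED_REGIONS mirrors the SCP example above, and both function names are hypothetical.

```python
# Regions listed in the StringNotEquals condition of the SCP example.
APPROVED_REGIONS = {"us-east-1", "us-west-2"}


def scp_denies(requested_region):
    """The Deny applies whenever the requested region is NOT in the approved list."""
    return requested_region not in APPROVED_REGIONS


def is_permitted(iam_allows, requested_region):
    """A request succeeds only if IAM allows it and no SCP denies it."""
    return iam_allows and not scp_denies(requested_region)


blocked_by_scp = is_permitted(iam_allows=True, requested_region="eu-west-1")   # False
not_granted = is_permitted(iam_allows=False, requested_region="us-east-1")     # False
```

The second case shows that passing the SCP is not enough on its own: without an IAM Allow the request is still denied.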
6.2 AWS Control Tower
Account Factory
Automates new account creation with pre-configured baselines: IAM Identity Center SSO, centralized logging, CloudTrail, Config, and guardrails. Every new account starts compliant.
Guardrails
Preventive guardrails (SCPs) prevent non-compliant actions. Detective guardrails (Config rules) detect and report non-compliance after the fact.
6.3 AWS Service Catalog
When a portfolio is shared with another AWS account, the recipient can view and launch products from the imported portfolio and add products to their own local portfolio, but cannot modify products in the imported portfolio — all changes must be made in the original admin account.
Exam Tip: "Most efficient way to create replica of existing infrastructure in a new AWS account" → Share the Service Catalog portfolio with the new account → import it into the new account.
TagOptions Library enforces consistent tagging at product launch by defining allowed tag keys and values that are required or automatically applied when a user launches a product.
Exam Tips & Quick Reference
Scenario-to-Answer Mapping
| Scenario Keyword / Requirement | Correct Answer |
|---|---|
| "Same template across environments" | Use Parameters in CloudFormation template |
| "CloudFormation fails — AMI not found in region" | Use Mappings section with AMI IDs per region |
| "Wait for bootstrap before stack completes" | CreationPolicy + cfn-signal in user data |
| "cfn-signal not received, stack fails" | Add cfn-signal call at end of user data script |
| "Stack DELETE_FAILED due to dependencies" | Delete again, specify resources to retain |
| "Prevent accidental updates to specific resources" | CloudFormation stack policy with Deny |
| "Apply same CloudFormation to all org accounts" | CloudFormation StackSets with SERVICE_MANAGED |
| "Store/auto-rotate RDS credentials in template" | AWS::SecretsManager::Secret + RotationSchedule |
| "Access to EC2 without SSH; centrally logged" | SSM Session Manager |
| "Run commands on fleet by OS tag" | SSM Run Command with tag-based targets |
| "Continuously enforce CloudWatch agent config" | SSM State Manager association |
| "Automated patching with different schedules dev/prod" | Two patch baselines + two patch groups + one maintenance window |
| "Moving regions; need committed discount" | Compute Savings Plans (not Standard RIs) |
| "Change instance type mid-commitment" | Convertible RIs |
| "Batch job < 2 hrs; cannot be interrupted" | Spot Blocks (defined duration) in older questions; the feature is retired, so On-Demand today |
| "Maintain full capacity during Beanstalk deployment" | Immutable or Rolling with additional batch |
| "kubectl won't connect to EKS" | Configure kubeconfig file |
| "Lambda → RDS too many connections" | Deploy RDS Proxy |
| "Migrate from SQS Standard to FIFO" | Create new FIFO queue; cannot convert existing |
| "Prioritize alarm messages over status messages" | Two SQS queues; application polls alarm queue first |
| "Replica of infra in new account via Service Catalog" | Share portfolio → import in new account |
| "Prevent specific service in all org accounts" | SCP at organization root |
| "New account provisioning with guardrails" | AWS Control Tower Account Factory |
| "Tag enforcement when launching products" | Service Catalog TagOptions Library |
Common Traps
- Changing resource order in a template has no effect: CloudFormation uses a dependency graph. Creation order follows references (!Ref, !GetAtt) and explicit DependsOn declarations.
- StackSets vs. nested stacks: StackSets are for multi-account/multi-region deployment. Nested stacks are for modular template reuse within one account. These are frequently confused.
- Compute vs. Instance Savings Plans: If the scenario involves migrating regions or changing instance families during the commitment period, the answer is always Compute Savings Plans.
- SCP cannot grant permissions: SCPs can only restrict what IAM policies can do — they do not replace or override IAM.
Key Terms — Domain 3
| Term | One-Line Definition |
|---|---|
| CloudFormation Mappings | Lookup table section used to resolve region-specific values like AMI IDs |
| CreationPolicy | CloudFormation attribute that waits for cfn-signal before marking a resource COMPLETE |
| Stack Policy | JSON document protecting specific resources in a stack from accidental updates |
| StackSets | CloudFormation feature for deploying templates across multiple accounts and regions |
| DeletionPolicy: Retain | Causes a resource to persist after the CloudFormation stack is deleted |
| Spot Blocks | Retired Spot option that guaranteed a defined duration (1–6 hours) without interruption |
| Immutable deployment | Beanstalk deployment that creates a new ASG alongside the old one |
| Compute Savings Plans | Cost commitment that applies to any EC2, Lambda, or Fargate in any region |
| SCP | Service Control Policy setting maximum permissions for all principals in an AWS account |
| Account Factory | Control Tower feature that automates new account creation with compliant baselines |
End of Domain 3. Continue to Domain 4: Security and Compliance →