Domain 1: Describe Cloud Concepts
Microsoft Azure Fundamentals (AZ-900) — Domain 1: Describe Cloud Concepts
Exam Code: AZ-900 | Level: Foundational
Domain Weight: 25–30% | Total Domains: 3 | Passing Score: 700/1000
Table of Contents
- What Is Cloud Computing?
- Benefits of Using Cloud Services
- Cloud Service Types
- Cloud Economics: CapEx vs OpEx
- Exam Tips & Quick Reference
1. What Is Cloud Computing?
Cloud computing is the delivery of computing services — including servers, storage, databases, networking, software, analytics, and intelligence — over the internet to offer faster innovation, flexible resources, and economies of scale. You pay only for the cloud services you use, which helps lower operating costs, run infrastructure more efficiently, and scale as business needs change. The AZ-900 exam tests your understanding of what cloud is, how it works, and why organizations adopt it.
1.1 Definition and Core Idea
Cloud computing is on-demand network access to a shared pool of configurable computing resources that can be rapidly provisioned and released with minimal management effort or service-provider interaction. This definition is adapted from NIST (National Institute of Standards and Technology) Special Publication 800-145 and underpins all cloud concepts on the exam.
Key Point: Cloud computing shifts the ownership and management of infrastructure from the customer to the provider — so the customer only manages the service, not the physical hardware.
The five essential characteristics defined by NIST are:
| Characteristic | What It Means |
|---|---|
| On-demand self-service | Provision compute resources automatically without human interaction from the provider. |
| Broad network access | Resources are available over the network and accessible through standard mechanisms (laptop, phone, tablet). |
| Resource pooling | Provider's resources serve multiple consumers using a multi-tenant model; physical location is abstracted. |
| Rapid elasticity | Resources can be scaled up or down quickly, appearing unlimited to the consumer. |
| Measured service | Resource usage is monitored, controlled, and reported — enabling a pay-per-use billing model. |
Key Concept: The cloud does not eliminate infrastructure — it abstracts it. Servers still exist in Microsoft's data centers; you simply don't manage them physically.
The Traditional Data Center Model (Before Cloud)
Before cloud, organizations ran on-premises data centers. They had to:
- Purchase physical servers outright (large upfront cost)
- Provision capacity based on peak demand (leading to wasted idle hardware)
- Hire staff to manage racks, cooling, power, networking, and OS patching
- Wait weeks or months to provision new servers
- Absorb full financial loss if hardware became obsolete
Cloud computing addresses each of these problems, although some tasks (such as OS patching in IaaS) remain with the customer.
1.2 The Shared Responsibility Model
The Shared Responsibility Model defines who is responsible for what between the cloud provider (Microsoft) and the customer. This is a foundational concept tested heavily on AZ-900.
Key Concept: Responsibility is divided between Microsoft and the customer — and which party owns which layer depends entirely on the service type (IaaS, PaaS, or SaaS).
The model divides responsibilities across several layers:
| Responsibility Layer | On-Premises | IaaS | PaaS | SaaS |
|---|---|---|---|---|
| Physical datacenter | Customer | Microsoft | Microsoft | Microsoft |
| Physical network | Customer | Microsoft | Microsoft | Microsoft |
| Physical hosts | Customer | Microsoft | Microsoft | Microsoft |
| Operating system | Customer | Customer | Microsoft | Microsoft |
| Network controls | Customer | Customer | Shared | Microsoft |
| Applications | Customer | Customer | Shared | Microsoft |
| Identity & directory | Customer | Customer | Shared | Shared |
| Data & information | Customer | Customer | Customer | Customer |
| Devices (endpoints) | Customer | Customer | Customer | Customer |
| Accounts & identities | Customer | Customer | Customer | Customer |
Exam Tip: The customer is always responsible for their own data, accounts, and end-user devices — no matter the service model. Microsoft is always responsible for the physical infrastructure.
Always Customer-Owned (regardless of model):
- Data stored in or processed by the cloud
- Devices used to access cloud resources
- Accounts and identities of users
Always Microsoft-Owned (regardless of model):
- Physical datacenter
- Physical network hardware
- Physical host servers
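The fixed rows of the model can be captured in a small lookup. The sketch below is illustrative only: `RESPONSIBILITY`, `owner`, and `fixed_layers` are hypothetical names, and the dictionary encodes just a subset of the layers from the table above (the physical layers, the operating system, and the three always-customer layers).

```python
# Hypothetical lookup covering a subset of the layers in the table above.
# Column order mirrors the table: On-Premises, IaaS, PaaS, SaaS.
MODELS = ("on-premises", "iaas", "paas", "saas")

RESPONSIBILITY = {
    "physical datacenter":   ("customer", "microsoft", "microsoft", "microsoft"),
    "physical network":      ("customer", "microsoft", "microsoft", "microsoft"),
    "physical hosts":        ("customer", "microsoft", "microsoft", "microsoft"),
    "operating system":      ("customer", "customer", "microsoft", "microsoft"),
    "data & information":    ("customer", "customer", "customer", "customer"),
    "devices (endpoints)":   ("customer", "customer", "customer", "customer"),
    "accounts & identities": ("customer", "customer", "customer", "customer"),
}


def owner(layer: str, model: str) -> str:
    """Return who is responsible for `layer` under `model`."""
    return RESPONSIBILITY[layer.lower()][MODELS.index(model.lower())]


def fixed_layers(party: str) -> list[str]:
    """Layers owned by `party` in every cloud model (IaaS, PaaS, SaaS)."""
    return [
        layer
        for layer, owners in RESPONSIBILITY.items()
        if all(o == party for o in owners[1:])  # skip the on-premises column
    ]


print(owner("operating system", "PaaS"))
print(fixed_layers("customer"))
print(fixed_layers("microsoft"))
```

Only the operating system row changes owner between IaaS and PaaS here, which is the kind of boundary the exam likes to probe.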
1.3 Cloud Deployment Models
There are three primary deployment models for cloud computing (public, private, and hybrid), plus multi-cloud as a related approach. Each suits a different organizational need.
Public Cloud
A public cloud is built, controlled, and maintained by a third-party cloud provider (e.g., Microsoft Azure, AWS, Google Cloud). Resources are shared across multiple organizations (multi-tenancy) and accessed over the public internet.
- No CapEx to scale up
- Fast provisioning; pay-as-you-go
- No hardware to manage
- Limited control over infrastructure specifics
Private Cloud
A private cloud is a cloud environment used exclusively by a single organization. It can be hosted in the organization's own datacenter or by a third party, but hardware is not shared.
- Maximum control and customization
- Higher cost — all CapEx is borne by the organization
- Meets strict regulatory or compliance requirements
- Common in banking, government, and healthcare
Hybrid Cloud
A hybrid cloud combines public and private clouds connected by secure network links, allowing data and applications to move between them. Organizations use the private cloud for sensitive workloads and the public cloud for burst capacity or less-sensitive apps.
- Most flexible model
- Organizations control which workloads go where
- Allows compliance requirements to be met while leveraging public cloud scale
- More complex to manage than either model alone
Multi-Cloud
A multi-cloud approach uses services from two or more cloud providers simultaneously. Common for avoiding vendor lock-in, or when different providers offer best-in-class services for specific workloads.
Note: Azure Arc is Microsoft's tool that extends Azure management to multi-cloud and on-premises environments — often tested in Domain 3.
| Feature | Public | Private | Hybrid |
|---|---|---|---|
| Hardware ownership | Provider | Customer | Both |
| Cost model | OpEx (pay-as-you-go) | CapEx + OpEx | Mixed |
| Scalability | Near-unlimited | Limited by hardware | Flexible |
| Control level | Low | High | Medium–High |
| Best for | Startups, agile teams | Regulated industries | Large enterprises |
| Security compliance | Shared (provider secures the infrastructure) | Customer-managed | Shared |
2. Benefits of Using Cloud Services
Microsoft Azure, like all major cloud platforms, provides a core set of benefits that form the business justification for cloud adoption. These benefits are explicitly tested on AZ-900 and should be understood both conceptually and through real-world scenarios.
2.1 High Availability and Reliability
High Availability (HA) is the ability of a system to remain operational and accessible for the maximum possible amount of time — measured by uptime SLAs (Service Level Agreements) expressed as a percentage.
Key Concept: Azure backs its availability targets with an SLA. If Microsoft fails to meet the SLA, customers can claim service credits, so the commitment is financial and not just an apology.
Common SLA tiers and their meaning:
| SLA Uptime | Allowed Downtime / Month | Allowed Downtime / Year |
|---|---|---|
| 99% | ~7.3 hours | ~3.65 days |
| 99.9% | ~43.8 minutes | ~8.76 hours |
| 99.95% | ~21.9 minutes | ~4.38 hours |
| 99.99% | ~4.38 minutes | ~52.6 minutes |
| 99.999% | ~26 seconds | ~5.26 minutes |
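The downtime figures follow directly from the SLA percentage. As a minimal sketch, assuming an average month of 730 hours (8,760 hours per year divided by 12), the helper below reproduces the arithmetic; `allowed_downtime_hours` is an illustrative name, not an Azure API.

```python
HOURS_PER_YEAR = 365 * 24               # 8,760 hours
HOURS_PER_MONTH = HOURS_PER_YEAR / 12   # 730 hours (average month, an assumption)


def allowed_downtime_hours(sla_percent: float, period_hours: float) -> float:
    """Maximum downtime (in hours) that still satisfies the SLA over the period."""
    return period_hours * (1 - sla_percent / 100)


for sla in (99.0, 99.9, 99.95, 99.99, 99.999):
    monthly_minutes = allowed_downtime_hours(sla, HOURS_PER_MONTH) * 60
    yearly_hours = allowed_downtime_hours(sla, HOURS_PER_YEAR)
    print(f"{sla}%: {monthly_minutes:.1f} min/month, {yearly_hours:.2f} h/year")
```

Using a 30-day month instead of the 730-hour average shifts the monthly figures slightly, which is why published tables sometimes differ by a few minutes.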
Reliability is the ability of a system to recover from failures and continue to function. Azure achieves this through redundant hardware, automatic failover, and geographically distributed resources.
Key Point: HA focuses on preventing downtime. Reliability focuses on recovering from failure. Both are required for a resilient system.
2.2 Scalability and Elasticity
Scalability is the ability to increase (or decrease) resources to meet demand. There are two types:
Vertical Scaling (Scale Up / Scale Down)
Adding more power to an existing resource. For example: upgrading an Azure VM from 4 vCPUs to 16 vCPUs.
- Simple to implement
- Has an upper limit (max VM size)
- Often involves downtime: resizing an Azure VM typically restarts it
Horizontal Scaling (Scale Out / Scale In)
Adding more instances of a resource. For example: increasing Azure VM instances from 2 to 10 behind a load balancer.
- Virtually unlimited scale
- No single point of failure at the instance level (traffic shifts to the remaining instances)
- Preferred architecture for cloud-native applications
Elasticity is automatic scalability — the system dynamically adjusts resource count based on real-time demand, then scales back down when demand drops. This is what Azure Autoscale delivers.
Exam Tip: Scalability is about capacity; elasticity is about automatic adjustment. Elasticity is a subset of scalability. An elastic system is also scalable, but not all scalable systems are elastic.
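To make the elasticity idea concrete, here is a minimal sketch of a threshold-based scale-out / scale-in rule. It is not the Azure Autoscale API: the function name, the CPU thresholds (70% and 30%), and the instance limits (2 to 10) are all assumptions chosen for illustration.

```python
def desired_instances(
    current: int,
    avg_cpu_percent: float,
    scale_out_above: float = 70.0,  # hypothetical threshold
    scale_in_below: float = 30.0,   # hypothetical threshold
    minimum: int = 2,               # hypothetical floor
    maximum: int = 10,              # hypothetical ceiling
) -> int:
    """Return the instance count an elastic system would move to next."""
    if avg_cpu_percent > scale_out_above:
        target = current + 1   # scale out: add an instance
    elif avg_cpu_percent < scale_in_below:
        target = current - 1   # scale in: remove an instance
    else:
        target = current       # demand is within the comfortable band
    # Never go below the floor or above the ceiling.
    return max(minimum, min(maximum, target))


print(desired_instances(current=2, avg_cpu_percent=85))   # busy: grows
print(desired_instances(current=5, avg_cpu_percent=12))   # idle: shrinks
```

If a person ran this decision by hand, the system would be scalable; running it automatically on live metrics is what makes it elastic.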
2.3 Agility and Geographic Distribution
Agility in cloud computing means the ability to rapidly develop, test, and launch software applications. Cloud infrastructure can be provisioned in minutes, not weeks, enabling development teams to iterate faster.
- Deploy a new environment in minutes via Azure Portal or CLI
- Spin up isolated dev/test environments without impacting production
- Delete resources immediately when no longer needed — pay only for what you used
Geographic distribution refers to Azure's global network of datacenters organized into regions. By deploying workloads to multiple regions, organizations can:
- Serve users with low latency by placing resources close to them
- Meet data residency and sovereignty requirements (some countries require data to stay within borders)
- Build geographically redundant architectures for disaster recovery
Note: Microsoft states that Azure operates in more than 60 regions globally, more than any other cloud provider. This is a strategic differentiator that Microsoft highlights both commercially and on the exam.
2.4 Disaster Recovery and Fault Tolerance
Disaster Recovery (DR) is the ability to restore business operations after a catastrophic event (datacenter fire, natural disaster, ransomware attack). In the cloud, DR is typically faster and cheaper than traditional approaches because standby capacity does not require building a second physical datacenter.
Fault Tolerance is the system's ability to continue operating without interruption even when one or more components fail. It is achieved through redundancy — having duplicate components ready to take over instantly.
Key Azure DR concepts:
| Term | Definition |
|---|---|
| RTO | Recovery Time Objective — the maximum acceptable time to restore service after a failure. |
| RPO | Recovery Point Objective: the maximum acceptable amount of data loss, measured in time (e.g., an RPO of 4 hours tolerates losing up to the last 4 hours of data). |
| Failover | Automatically switching to a standby system when the primary fails. |
| Geo-redundancy | Replicating data and services across geographically separated Azure regions. |
| Azure Site Recovery | Azure's managed DR service — orchestrates replication, failover, and failback. |
Normal Operation:
User Traffic ──► Primary Region (East US)
│
▼
[App + DB Active]
│
(Replication)
│
▼
Secondary Region (West US)
[App + DB Standby]
Disaster Event:
Primary Region FAILS
│
▼
Azure Site Recovery triggers FAILOVER
│
▼
User Traffic ──► Secondary Region (West US)
[App + DB now Active]
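RTO and RPO are easiest to remember as two time differences around the failure. The sketch below, with hypothetical timestamps and a hypothetical `check_recovery` helper, measures the data-loss window (failure minus last replication) against the RPO and the downtime (restore minus failure) against the RTO.

```python
from datetime import datetime, timedelta


def check_recovery(
    last_replication: datetime,
    failure: datetime,
    restored: datetime,
    rpo: timedelta,
    rto: timedelta,
) -> dict:
    """Compare an incident's actual data loss and downtime with its objectives."""
    data_loss_window = failure - last_replication  # data written after this is lost
    downtime = restored - failure                  # time the service was unavailable
    return {
        "data_loss_window": data_loss_window,
        "downtime": downtime,
        "rpo_met": data_loss_window <= rpo,
        "rto_met": downtime <= rto,
    }


# Hypothetical incident: replication ran 15 minutes before the failure,
# and service was restored in the secondary region 90 minutes after it.
result = check_recovery(
    last_replication=datetime(2024, 1, 1, 9, 45),
    failure=datetime(2024, 1, 1, 10, 0),
    restored=datetime(2024, 1, 1, 11, 30),
    rpo=timedelta(hours=1),
    rto=timedelta(hours=1),
)
print(result)
```

In this made-up incident the RPO is met (15 minutes of data lost against a 1-hour objective) but the RTO is missed (90 minutes of downtime against a 1-hour objective).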
2.5 Security in the Cloud
Security in the cloud is a shared responsibility (covered in 1.2), but Azure provides a comprehensive set of built-in security tools:
- Physical security: Biometric access, 24/7 security staff, surveillance at all Azure datacenters
- Network security: DDoS protection, firewalls, private networking (VNet)
- Identity security: Microsoft Entra ID (formerly Azure Active Directory), Multi-Factor Authentication, Conditional Access
- Data security: Encryption at rest is enabled by default for core services such as Azure Storage, and TLS protects data in transit
- Compliance: Azure maintains 90+ compliance offerings covering standards and regulations such as ISO 27001, SOC 2, GDPR, HIPAA, and FedRAMP
Note: Security is not automatically handled by Microsoft — the customer is responsible for configuring identity, access controls, and data classification correctly. Azure provides the tools; the customer must use them.
3. Cloud Service Types
Cloud services are divided into three main categories — IaaS, PaaS, and SaaS — based on how much of the underlying infrastructure is managed by the provider versus the customer. Understanding this spectrum is critical for AZ-900.
3.1 Infrastructure as a Service (IaaS)
Infrastructure as a Service provides virtualized computing resources over the internet. The cloud provider manages the physical hardware, virtualization, networking, and storage. The customer manages everything above the hypervisor: OS, middleware, runtime, applications, and data.
Mental Model: IaaS is like leasing an empty office space. The building (hardware) is maintained by the landlord; you furnish and decorate it (OS, apps) yourself.
Azure IaaS Examples:
- Azure Virtual Machines (VMs)
- Azure Virtual Network (VNet)
- Azure Managed Disks
- Azure Blob Storage (raw object storage)
Common Use Cases:
- Lift-and-shift migrations from on-premises
- Test and development environments
- High-performance computing workloads
- Disaster recovery scenarios
Key Point: IaaS gives the most control but requires the most management effort from the customer. It is best when the team needs OS-level control or is migrating existing workloads.
3.2 Platform as a Service (PaaS)
Platform as a Service provides a managed environment for building, testing, and deploying applications. The cloud provider manages the infrastructure, operating system, middleware, and runtime. The customer manages only the application and its data.
Mental Model: PaaS is like leasing a fully furnished office with internet, electricity, and receptionist included. You just bring your work (application code).
Azure PaaS Examples:
- Azure App Service (web hosting)
- Azure SQL Database (managed relational DB)
- Azure Kubernetes Service (managed container orchestration)
- Azure Functions (serverless compute)
- Azure AI services (formerly Azure Cognitive Services; pre-built AI APIs)
Common Use Cases:
- Developing cloud-native web and mobile applications
- API hosting and management
- Data analytics pipelines
- Rapid prototyping
Exam Tip: PaaS eliminates OS patching, runtime management, and middleware configuration — the customer focuses only on application code and business logic.
3.3 Software as a Service (SaaS)
Software as a Service delivers fully built applications over the internet. The cloud provider manages the infrastructure, OS, application code, and the platform that stores the data. The customer only uses the application, typically through a browser, but remains responsible for its own data, user accounts, and devices.
Mental Model: SaaS is like a hotel room. Everything is set up, cleaned, and maintained. You just use it.
Azure SaaS Examples (Microsoft ecosystem):
- Microsoft 365 (Word, Excel, Teams, Outlook)
- Microsoft Dynamics 365 (CRM/ERP)
- Azure DevOps (project management and CI/CD)
- Power BI (business intelligence)
Common Use Cases:
- Email and collaboration tools (Microsoft 365)
- CRM and ERP systems
- HR and payroll software
- Customer support platforms
Key Point: SaaS has the lowest management overhead but the least customization and control. It is ideal for standard business processes that don't require custom infrastructure.
3.4 IaaS vs PaaS vs SaaS Comparison
| Attribute | IaaS | PaaS | SaaS |
|---|---|---|---|
| Customer manages | OS, middleware, runtime, apps, data | Apps, data | Data, user access, and app configuration |
| Provider manages | Physical infra, virtualization, networking | Infra + OS + middleware + runtime | Infrastructure through application |
| Control level | High | Medium | Low |
| Flexibility | Maximum | Moderate | Minimal |
| Management effort | High | Medium | Low |
| Time to deploy | Medium (configure OS) | Fast (deploy code) | Instant (sign in and use) |
| Azure example | Virtual Machines | App Service | Microsoft 365 |
| Best for | Lift-and-shift | App development | Productivity tools |
| Cost predictability | Variable | Moderate | High (subscription) |
Exam Tip: A scenario question that says "the team wants to focus only on writing code without managing servers or operating systems" is describing PaaS. A question about needing OS-level control or running a custom OS image points to IaaS.
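The scenario logic in this tip reduces to two questions asked in order. The sketch below is a study aid under that simplification, not an official decision tree; `suggest_service_type` and its parameters are hypothetical names.

```python
def suggest_service_type(needs_os_control: bool, writes_own_code: bool) -> str:
    """Map the two scenario cues from the exam tip to a service type."""
    if needs_os_control:
        # OS-level control or a custom OS image points to IaaS.
        return "IaaS"
    if writes_own_code:
        # The team deploys its own code but does not manage servers or the OS.
        return "PaaS"
    # The team just signs in and uses a finished application.
    return "SaaS"


print(suggest_service_type(needs_os_control=True, writes_own_code=True))
print(suggest_service_type(needs_os_control=False, writes_own_code=True))
print(suggest_service_type(needs_os_control=False, writes_own_code=False))
```

The order of the checks matters: a team that needs OS-level control lands on IaaS even if it also writes its own code.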
4. Cloud Economics: CapEx vs OpEx
One of the most compelling business reasons to move to the cloud is financial. Understanding the difference between CapEx and OpEx — and how cloud shifts the model — is a guaranteed AZ-900 topic.
4.1 Capital Expenditure
Capital Expenditure (CapEx) refers to upfront spending on physical infrastructure that is then depreciated over time. In the traditional IT model, CapEx dominates.
Examples of CapEx in IT:
- Purchasing physical servers
- Buying networking hardware (routers, switches, firewalls)
- Building or leasing a data center facility
- Paying for software licenses upfront
Characteristics of CapEx:
- High initial cost before value is delivered
- Value depreciates over time (servers become obsolete)
- Difficult to scale — you over-provision for peak and waste money during low demand
- Long procurement cycles (weeks to months)
- Fixed: if your demand estimate is wrong, you're stuck with the hardware
4.2 Operational Expenditure
Operational Expenditure (OpEx) refers to spending on products and services as they are consumed — billed as a recurring expense with no upfront cost. Cloud computing is fundamentally an OpEx model.
Examples of OpEx in cloud:
- Paying for Azure VMs by the hour/minute
- Monthly subscription to Microsoft 365
- Per-GB storage fees for Azure Blob Storage
- Data egress fees
Characteristics of OpEx:
- No upfront investment; start immediately
- Pay only for what you use
- Easily scalable up or down based on real demand
- Costs are predictable through reserved instances or budgets
- Typically deductible as a business expense in the year incurred (tax treatment varies by jurisdiction)
Key Concept: Cloud computing converts IT infrastructure from a capital investment into an operating expense — shifting spending from the balance sheet to the income statement. This is a major driver for CFO-level cloud adoption decisions.
4.3 Consumption-Based Model
The consumption-based model is the pricing philosophy of cloud computing: you pay only for what you use, when you use it. This is distinct from traditional licensing, where you pay for capacity regardless of actual usage.
Benefits of the consumption-based model:
- No upfront costs — start using resources immediately
- No wasted resources — pay for exactly what you consume
- Pay for additional resources when needed — scale up for peak events (Black Friday, product launches)
- Stop paying when resources are no longer needed: delete a VM (along with its disks and public IP addresses) and billing for those resources stops
Azure Pricing Models:
| Model | Description | Best For |
|---|---|---|
| Pay-as-you-go | Charged per second/hour of actual use, no commitment | Variable workloads, dev/test |
| Reserved Instances (Azure Reservations) | 1- or 3-year commitment for a significant discount (up to 72%) | Predictable, steady-state workloads |
| Spot Instances (Azure Spot Virtual Machines) | Use unused Azure capacity at up to 90% discount; VMs can be evicted when Azure needs the capacity back | Batch jobs, fault-tolerant workloads |
| Azure Hybrid Benefit | Use existing Windows Server / SQL Server licenses on Azure | Organizations with licenses covered by Software Assurance or qualifying subscriptions |
| Dev/Test Pricing | Discounted rates for non-production environments | Development teams |
Exam Tip: "No upfront costs" and "only pay for what you use" are the two phrases that define the consumption-based model. If a question mentions predicting costs, point to Reserved Instances. If it mentions maximum discount with interruptible workloads, point to Spot Instances.
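Whether a reservation beats pay-as-you-go depends on how many hours the VM actually runs. The sketch below works through that arithmetic under stated assumptions: a 730-hour month, a made-up hourly rate of $0.10, a made-up 40% reservation discount (the table only says "up to 72%"), and a reservation that is billed for every hour whether or not the VM runs. None of these numbers are Azure list prices.

```python
HOURS_PER_MONTH = 730  # average month, an assumption


def monthly_cost_payg(hourly_rate: float, hours_used: float) -> float:
    """Pay-as-you-go: billed only for the hours the VM runs."""
    return hourly_rate * hours_used


def monthly_cost_reserved(hourly_rate: float, discount: float) -> float:
    """Reservation: discounted rate, but billed for every hour of the month."""
    return hourly_rate * (1 - discount) * HOURS_PER_MONTH


def break_even_hours(discount: float) -> float:
    """Hours per month above which the reservation is the cheaper option."""
    return HOURS_PER_MONTH * (1 - discount)


rate = 0.10      # hypothetical $/hour
discount = 0.40  # hypothetical reservation discount

print(f"Reserved:  ${monthly_cost_reserved(rate, discount):.2f} per month")
print(f"PAYG 200h: ${monthly_cost_payg(rate, 200):.2f}")
print(f"PAYG 730h: ${monthly_cost_payg(rate, 730):.2f}")
print(f"Break-even at {break_even_hours(discount):.0f} hours per month")
```

This is why the table pairs reservations with steady-state workloads: a VM that runs only a couple of hundred hours a month stays cheaper on pay-as-you-go.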
Traditional IT Spend:
┌─────────────────────────────────────┐
│ Month 1: $500K (server purchase) │ ◄── CapEx spike
│ Month 2: $5K (power + staff) │
│ Month 3: $5K (power + staff) │
│ ... │
└─────────────────────────────────────┘
Risk: capacity locked in for 5 years
Cloud Spend:
┌─────────────────────────────────────┐
│ Month 1: $8K (10 VMs, avg load) │
│ Month 2: $3K (traffic dipped) │ ◄── OpEx, scales with demand
│ Month 3: $22K (Black Friday peak) │
│ Month 4: $7K (back to normal) │
└─────────────────────────────────────┘
Benefit: pay exactly for what you need
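The two spending patterns are easier to compare as running totals. This short sketch uses only the first three months of illustrative figures from the diagrams above (the later traditional months are elided there, so they are left out here); the variable names are hypothetical.

```python
from itertools import accumulate

# Monthly spend in dollars, months 1-3, taken from the diagrams above.
traditional = [500_000, 5_000, 5_000]  # CapEx spike, then power + staff
cloud = [8_000, 3_000, 22_000]         # OpEx that follows demand

traditional_total = list(accumulate(traditional))
cloud_total = list(accumulate(cloud))

for month, (t, c) in enumerate(zip(traditional_total, cloud_total), start=1):
    print(f"Month {month}: traditional ${t:,} vs cloud ${c:,}")
```

The comparison says nothing about long-run totals; it only shows that the traditional model front-loads spending before any demand has been observed.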
Exam Tips & Quick Reference
Scenario-to-Answer Mapping
| Scenario Keyword / Requirement | Correct Answer |
|---|---|
| "Pay only for what you use, no upfront cost" | Consumption-based model / OpEx |
| "Need OS-level control, custom OS image" | IaaS (Azure Virtual Machines) |
| "Want to deploy an app without managing servers or OS" | PaaS (Azure App Service) |
| "Use email and collaboration tools, no IT management" | SaaS (Microsoft 365) |
| "Combine on-premises with cloud" | Hybrid cloud |
| "Single organization, maximum control and compliance" | Private cloud |
| "Fastest deployment, low cost, shared infrastructure" | Public cloud |
| "Automatically increase/decrease resources with demand" | Elasticity / Autoscale |
| "Guaranteed uptime percentage backed by Microsoft" | SLA (Service Level Agreement) |
| "Maximum acceptable data loss in case of disaster" | RPO (Recovery Point Objective) |
| "Maximum acceptable downtime after a disaster" | RTO (Recovery Time Objective) |
| "Customer is always responsible for this in all models" | Data, accounts, and devices |
| "Microsoft is always responsible for this in all models" | Physical datacenter and hardware |
| "Add more VMs to handle increased load" | Horizontal scaling (scale out) |
| "Upgrade an existing VM to a larger size" | Vertical scaling (scale up) |
| "Use Azure to manage resources across multiple clouds" | Multi-cloud + Azure Arc |
Common Traps
- CapEx vs OpEx confusion: Candidates often think Reserved Instances are CapEx because you commit upfront. They are still OpEx — you're committing to a service fee, not buying physical hardware. The key is: does money go toward physical infrastructure ownership? If yes → CapEx. If not → OpEx.
- Elasticity vs Scalability: Many candidates use these interchangeably. Scalability = the ability to scale. Elasticity = automatic scaling in response to demand. Elasticity requires automation; manual scaling is just scalability.
- Shared Responsibility misunderstanding: A common wrong answer is that Microsoft manages the customer's data or user accounts in SaaS. Data, accounts and identities, and devices always remain the customer's responsibility, even in SaaS; only the underlying identity and directory infrastructure is shared.
- Private cloud = on-premises: A private cloud can be hosted by a third-party provider and still be considered private, as long as the hardware is dedicated exclusively to one organization. "Private" refers to exclusivity, not location.
- Fault tolerance vs High Availability: Fault tolerance = zero downtime during a component failure. High availability = minimal downtime with a defined SLA. Fault-tolerant systems are more expensive and complex to build.
- IaaS is always cheaper: Wrong. IaaS requires more customer management effort (patching, maintenance), which has a hidden cost. PaaS can be more cost-effective when total cost of ownership (including labor) is considered.
Key Terms — Domain 1
| Term | One-Line Definition |
|---|---|
| Cloud Computing | On-demand delivery of computing resources over the internet on a pay-as-you-go basis. |
| Shared Responsibility Model | A framework defining which security tasks are owned by the cloud provider vs. the customer. |
| Public Cloud | Cloud infrastructure owned and operated by a third-party provider, shared across multiple tenants. |
| Private Cloud | Cloud infrastructure used exclusively by one organization, on-premises or hosted. |
| Hybrid Cloud | A mix of public and private cloud environments connected by secure networking. |
| IaaS | Cloud model where the provider manages physical infrastructure; customer manages OS and above. |
| PaaS | Cloud model where the provider manages infra + OS; customer manages only applications and data. |
| SaaS | Cloud model where the provider manages the application and all underlying infrastructure; customer uses the software, typically via browser, and remains responsible for its data. |
| CapEx | Upfront spending on physical assets, depreciated over time (traditional IT). |
| OpEx | Recurring operational spending on services consumed, with no upfront cost (cloud IT). |
| Consumption-Based Model | Pay only for what you use, when you use it — no idle resource costs. |
| High Availability | System design ensuring maximum operational uptime, measured by an SLA percentage. |
| Reliability | The ability of a system to recover from failures and continue operating. |
| Scalability | The ability to increase or decrease resources to match demand. |
| Elasticity | Automatic, dynamic adjustment of resources based on real-time demand. |
| Agility | The ability to rapidly provision and deploy resources to support fast business changes. |
| Fault Tolerance | The ability of a system to continue operating without interruption despite component failures. |
| Disaster Recovery | The strategy and processes to restore operations after a catastrophic failure. |
| RTO | Recovery Time Objective — maximum acceptable downtime after a failure. |
| RPO | Recovery Point Objective — maximum acceptable data loss measured in time. |
| SLA | Service Level Agreement — Microsoft's contractual uptime guarantee for Azure services. |
| Reserved Instances | A 1- or 3-year commitment to a VM size/type in exchange for up to 72% discount. |
| Spot Instances | Use of unused Azure capacity at deep discounts; workload can be interrupted at any time. |