AWS Certified AI Practitioner — Domain 1: Fundamentals of AI and Machine Learning

Exam Code: AIF-C01 | Level: Foundational
Domain Weight: 20% | Total Domains: 5 | Passing Score: 700/1000

1. AI, ML, and Deep Learning
- 1.1 Definitions and Hierarchy
- 1.2 Types of AI by Capability
2. Types of Machine Learning
- 2.1 Supervised Learning
- 2.2 Unsupervised Learning
- 2.3 Reinforcement Learning
- 2.4 Semi-Supervised and Self-Supervised Learning
3. The ML Lifecycle
- 3.1 Pipeline Stages
4. Key ML Concepts and Terminology
- 4.1 Bias, Variance, and Regularization
- 4.2 Hyperparameters vs. Parameters
5. Data Fundamentals for ML
- 5.1 Data Types and Quality
- 5.2 Handling Missing Data and Class Imbalance
6. Feature Engineering
- 6.1 Feature Types and Transformations
7. Model Training and Evaluation
- 7.1 Evaluation Metrics
8. ML Algorithms Overview
- 8.1 Supervised Algorithms
- 8.2 Unsupervised Algorithms
9. Deep Learning and Neural Networks
- 9.1 Network Architectures
10. AWS AI and ML Services
- 10.1 Amazon SageMaker
- 10.2 AWS Pre-Built AI Services
Exam Tips and Quick Reference

1. AI, ML, and Deep Learning

Artificial Intelligence, Machine Learning, and Deep Learning form a nested hierarchy where each is a subset of the one above it. Understanding where each term begins and ends is the foundation for the entire exam.

1.1 Definitions and Hierarchy

Artificial Intelligence (AI) is the simulation of human intelligence in machines programmed to think, reason, and make decisions. Machine Learning (ML) is a subset of AI where models learn patterns from data without being explicitly programmed with rules. Deep Learning (DL) is a subset of ML using multi-layered neural networks.

┌──────────────────────────────────────────┐
│          Artificial Intelligence         │
│  ┌────────────────────────────────────┐  │
│  │        Machine Learning            │  │
│  │  ┌──────────────────────────────┐  │  │
│  │  │       Deep Learning          │  │  │
│  │  │  ┌────────────────────────┐  │  │  │
│  │  │  │    Generative AI       │  │  │  │
│  │  │  └────────────────────────┘  │  │  │
│  │  └──────────────────────────────┘  │  │
│  └────────────────────────────────────┘  │
└──────────────────────────────────────────┘

Term	Definition
Artificial Intelligence	Machines simulating human-like reasoning and decision-making.
Machine Learning	AI systems that learn from data without being given explicit rules.
Deep Learning	ML using neural networks with many layers to learn complex representations.
Natural Language Processing	AI for understanding and generating human language.
Computer Vision	AI for understanding images and video.

Key Concept: All ML is AI, but not all AI is ML. All Deep Learning is ML, but not all ML is Deep Learning. Generative AI is a subset of Deep Learning.

1.2 Types of AI by Capability

Type	Also Called	Description
Narrow AI	ANI	Designed for one specific task. Every production AI system today is narrow AI.
General AI	AGI	Hypothetical human-level general intelligence. Does not exist today.
Super AI	ASI	Hypothetical AI surpassing human intelligence. Theoretical only.

Exam Tip: Every AWS AI service — Bedrock, SageMaker, Rekognition — is narrow AI. AGI and ASI are conceptual terms, not products.

2. Types of Machine Learning

ML is categorized by how models learn from data. The exam frequently gives a scenario and asks you to identify the correct learning paradigm.

2.1 Supervised Learning

In supervised learning, the model learns from labeled data — datasets where the correct answer is already known. The goal is to learn a mapping from inputs to outputs.

Classification

Predicts a discrete class or category. Two sub-types exist:

Sub-type	Number of Classes	Example
Binary Classification	2	Spam vs. not spam
Multi-class Classification	3 or more	Cat, dog, or bird

Regression

Predicts a continuous numeric value. Examples include house price prediction and temperature forecasting.

2.2 Unsupervised Learning

The model finds patterns in unlabeled data. There is no correct answer to learn from — the model discovers structure on its own.

Sub-type	Goal	Example Algorithms
Clustering	Group similar data points together	K-Means, DBSCAN
Dimensionality Reduction	Reduce the number of features while preserving information	PCA, t-SNE
Association	Find co-occurrence relationships between variables	Apriori, FP-Growth (used for market basket analysis)
Anomaly Detection	Identify data points that deviate from the norm	Isolation Forest

2.3 Reinforcement Learning

An agent learns by interacting with an environment, receiving rewards for good actions and penalties for bad ones. It learns a policy — a strategy for maximizing cumulative reward over time.

Component	Definition
Agent	The learner or decision-maker.
Environment	The world the agent interacts with.
Action	A choice the agent can make at each step.
State	The current situation the agent observes.
Reward	The feedback signal — positive or negative.
Policy	The learned strategy that maps states to actions.

Exam Tip: If a scenario describes a system learning through trial, reward, and penalty — game playing, robotic control, autonomous driving — the answer is reinforcement learning.
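
To make the agent, state, action, reward, and policy vocabulary concrete, here is a minimal sketch of tabular Q-learning, one common reinforcement learning algorithm. The five-cell corridor environment, the reward values, and the names `step` and `train` are all assumptions made up for illustration; nothing here is an AWS API.

```python
import random

N_STATES = 5          # states 0..4 laid out in a row; state 4 is the goal
ACTIONS = (0, 1)      # 0 = move left, 1 = move right


def step(state, action):
    """Hypothetical environment: return (next_state, reward, done) for one move."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    if next_state == N_STATES - 1:
        return next_state, 1.0, True       # reward for reaching the goal
    return next_state, -0.01, False        # small penalty for every other step


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn a Q-table (estimated value of each action in each state)."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]   # q[state][action]
    for _ in range(episodes):
        state = 0
        for _ in range(100):                    # cap the episode length
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)    # explore: try a random action
            else:
                action = max(ACTIONS, key=lambda a: q[state][a])  # exploit
            next_state, reward, done = step(state, action)
            # Q-learning update: move the estimate toward reward + discounted future value
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


q_table = train()
# The policy maps each non-goal state to the action with the highest learned value.
policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
print(policy)
```

In this toy setup the learned policy should pick "move right" in every state, since that is the shortest path to the reward.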

2.4 Semi-Supervised and Self-Supervised Learning

Type	Data Used	How It Works
Semi-Supervised	Small labeled + large unlabeled	Uses unlabeled data to improve a model trained on limited labels.
Self-Supervised	Unlabeled only	The model generates its own labels from the data structure (e.g., predict the next word, as in GPT, or a masked word, as in BERT). This is the foundation of modern LLMs.
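
The idea that a model can "generate its own labels" is easier to see in code. The sketch below, using a hypothetical helper `next_word_pairs`, turns raw text into (context, target) training pairs without any human labeling. Real LLM pipelines work on tokens rather than whitespace-split words, so treat this as a simplification.

```python
def next_word_pairs(text, context_size=3):
    """Turn raw text into (context, target) training pairs with no human labels."""
    words = text.split()
    pairs = []
    for i in range(context_size, len(words)):
        context = tuple(words[i - context_size:i])  # the model's input
        target = words[i]                           # the "label" comes from the text itself
        pairs.append((context, target))
    return pairs


pairs = next_word_pairs("the cat sat on the mat", context_size=2)
for context, target in pairs:
    print(context, "->", target)
```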

3. The ML Lifecycle

The ML lifecycle is an iterative, six-stage process from business problem definition to production monitoring. Projects routinely loop back through earlier stages when performance degrades or requirements change.

3.1 Pipeline Stages

┌──────────────┐    ┌──────────────┐    ┌──────────────┐
│  1. Define   │───►│  2. Collect  │───►│  3. Prepare  │
│   Problem    │    │    Data      │    │    Data      │
└──────────────┘    └──────────────┘    └──────────────┘
                                               │
┌──────────────┐    ┌──────────────┐    ┌──────▼───────┐
│  6. Monitor  │◄───│  5. Deploy   │◄───│  4. Train &  │
│  & Retrain   │    │    Model     │    │   Evaluate   │
└──────────────┘    └──────────────┘    └──────────────┘

Stage	Key Activities
1. Define Problem	Identify business goal, ML problem type, and success metrics.
2. Collect Data	Gather data from databases, APIs, IoT sensors, and web sources.
3. Prepare Data	Perform EDA, cleaning, feature engineering, and data splitting.
4. Train and Evaluate	Select algorithm, train model, tune hyperparameters, evaluate on validation set.
5. Deploy	Package model as an endpoint; run A/B tests.
6. Monitor and Retrain	Detect data drift and model drift; trigger retraining when performance degrades.

Key Concept: The ML lifecycle never truly ends. Production models must be continuously monitored and periodically retrained as real-world data evolves.

4. Key ML Concepts and Terminology

4.1 Bias, Variance, and Regularization

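Bias is the systematic error introduced by overly simple assumptions: how far the model's average prediction is from the true values. Variance measures how much predictions fluctuate when the model is trained on different datasets. Reducing one tends to increase the other, so every model must balance the two.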

Condition	Bias	Variance	Problem	Fix
Underfitting	High	Low	Model too simple; misses patterns	Add features; increase model complexity
Overfitting	Low	High	Model memorizes noise; fails on new data	Regularization; more data; simpler model
Good Fit	Low	Low	Generalizes well to unseen data	—

Regularization Techniques

Technique	Mechanism	Effect
L1 (Lasso)	Adds absolute value of weights to loss	Drives some weights to zero; performs feature selection
L2 (Ridge)	Adds squared weights to loss	Shrinks all weights; no feature elimination
Dropout	Randomly deactivates neurons during training	Prevents co-adaptation in deep networks
Early Stopping	Halts training when validation loss stops improving	Avoids over-training without modifying the architecture
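
A short sketch of what the L1 and L2 penalties add to a loss, plus a simple early-stopping rule. The weights, the base loss of 1.25, the strength `lam`, and the `patience` setting are illustrative assumptions, not values from any particular framework.

```python
def l1_penalty(weights, lam):
    """L1 (Lasso): lam times the sum of absolute weights."""
    return lam * sum(abs(w) for w in weights)


def l2_penalty(weights, lam):
    """L2 (Ridge): lam times the sum of squared weights."""
    return lam * sum(w * w for w in weights)


def early_stopping_epoch(val_losses, patience=2):
    """Return the epoch index at which training would halt."""
    best, waited = float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, waited = loss, 0      # validation loss improved; reset the counter
        else:
            waited += 1                 # no improvement this epoch
            if waited >= patience:
                return epoch
    return len(val_losses) - 1          # never triggered; ran all epochs


weights = [0.5, -2.0, 0.0, 3.0]
base_loss = 1.25                        # hypothetical unregularized loss
lam = 0.1

l1_loss = base_loss + l1_penalty(weights, lam)   # 1.25 + 0.1 * 5.5
l2_loss = base_loss + l2_penalty(weights, lam)   # 1.25 + 0.1 * 13.25
stop_at = early_stopping_epoch([0.9, 0.7, 0.6, 0.62, 0.65, 0.7])
print(l1_loss, l2_loss, stop_at)
```

Note how the L2 term is dominated by the largest weight (3.0 contributes 9 of the 13.25), which is why it shrinks large weights hardest, while the L1 term grows linearly with every weight.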

4.2 Hyperparameters vs. Parameters

Key Concept: Parameters are learned automatically by the optimizer during training. Hyperparameters are set by the practitioner before training begins and control how learning happens.

Property	Parameters	Hyperparameters
Who sets them	Optimizer (automatic)	Data scientist (manual)
Examples	Weights, biases	Learning rate, number of layers, batch size
When determined	During training	Before training begins

Hyperparameter Tuning Methods

Method	Description	Speed
Grid Search	Exhaustively tries every combination	Slow
Random Search	Randomly samples combinations	Medium
Bayesian Optimization	Uses prior results to guide the next trial	Fast
SageMaker Automatic Model Tuning	Managed tuning service; supports Bayesian optimization, random search, grid search, and Hyperband	Fast
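
The difference between grid search and random search can be sketched in a few lines. The search space and the `validation_score` function below are hypothetical stand-ins; in practice the score would come from training a model and evaluating it on the validation set.

```python
import itertools
import random

search_space = {
    "learning_rate": [0.001, 0.01, 0.1],
    "batch_size": [32, 64],
    "num_layers": [2, 4],
}


def validation_score(config):
    """Hypothetical stand-in for 'train a model, then score it on validation data'."""
    return -abs(config["learning_rate"] - 0.01) - 0.01 * abs(config["num_layers"] - 4)


def grid_search(space):
    """Try every combination in the search space."""
    keys = list(space)
    configs = [dict(zip(keys, values)) for values in itertools.product(*space.values())]
    return max(configs, key=validation_score), len(configs)


def random_search(space, n_trials, seed=0):
    """Try a fixed number of randomly sampled combinations."""
    rng = random.Random(seed)
    configs = [{k: rng.choice(v) for k, v in space.items()} for _ in range(n_trials)]
    return max(configs, key=validation_score), len(configs)


best_grid, n_grid = grid_search(search_space)
best_random, n_random = random_search(search_space, n_trials=5)
print(n_grid, best_grid)
print(n_random, best_random)
```

Grid search evaluates all 12 combinations here, and the count multiplies with every hyperparameter added. Random search caps the cost at a fixed trial budget but may miss the best combination.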

5. Data Fundamentals for ML

5.1 Data Types and Quality

Data Type	Description	Examples
Structured	Fixed rows and columns with a defined schema	CSV files, relational databases
Unstructured	No predefined format or schema	Images, audio, video, raw text
Semi-structured	Partial structure without a rigid schema	JSON, XML, application log files
Time Series	Sequential observations indexed by time	Stock prices, IoT sensor readings

Data Quality Dimensions

Dimension	Definition
Accuracy	Data correctly reflects real-world values.
Completeness	No required fields are missing.
Consistency	Data is uniform and coherent across all sources.
Timeliness	Data is current and not stale.
Uniqueness	No duplicate records exist.
Validity	Data conforms to defined formats, types, and ranges.

Train / Validation / Test Split

Split	Purpose	Typical Proportion
Training	Fit model parameters	60–80%
Validation	Tune hyperparameters; monitor for overfitting	10–20%
Test	Final, one-time unbiased evaluation	10–20%

Exam Tip: Test data must remain completely unseen during training and tuning. Using test data to make any development decision invalidates the evaluation and constitutes data leakage.
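
A minimal sketch of a three-way split, assuming a 70/15/15 proportion and a hypothetical helper named `train_val_test_split`. The key property is that the three subsets never overlap.

```python
import random


def train_val_test_split(records, val_frac=0.15, test_frac=0.15, seed=42):
    """Shuffle once, then carve out disjoint test, validation, and training sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)      # fixed seed keeps the split reproducible
    n_test = round(len(shuffled) * test_frac)
    n_val = round(len(shuffled) * val_frac)
    test_set = shuffled[:n_test]
    val_set = shuffled[n_test:n_test + n_val]
    train_set = shuffled[n_test + n_val:]
    return train_set, val_set, test_set


train_set, val_set, test_set = train_val_test_split(range(100))
print(len(train_set), len(val_set), len(test_set))
```

A random shuffle like this is not appropriate for time-series data, where the split should respect time order so the model is never trained on observations from the future.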

5.2 Handling Missing Data and Class Imbalance

Missing Data Strategies

Strategy	When to Use
Deletion	Missing at random; sufficient data remains after removal
Mean / Median / Mode Imputation	Low rate of missingness; mean or median for numerical features, mode for categorical features
Forward / Backward Fill	Time-series data where adjacent values are informative
Predictive Imputation	High-value features with complex missing patterns
Indicator Variable	When the fact of missingness itself carries signal
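
Three of the strategies above, sketched on plain Python lists where `None` marks a missing value. The helper names and sample data are illustrative assumptions.

```python
from statistics import mean


def impute_mean(values):
    """Replace each missing value with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def forward_fill(values):
    """Carry the last observed value forward (a leading gap stays None)."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def missing_indicator(values):
    """Flag rows where the value was missing, so the model can use that fact."""
    return [1 if v is None else 0 for v in values]


ages = [30, None, 40, 50, None]
readings = [1.0, None, None, 1.4, None]

print(impute_mean(ages))
print(forward_fill(readings))
print(missing_indicator(ages))
```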

Class Imbalance Techniques

Technique	Description
SMOTE	Synthesize new minority-class samples
Undersampling	Randomly remove majority-class samples
Class Weights	Penalize minority-class misclassification more heavily during training
Correct Evaluation	Use F1, Precision, or Recall — not accuracy — for imbalanced datasets
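
The sketch below illustrates class weights and random undersampling on a hypothetical 95/5 fraud dataset. The weighting formula, n_samples / (n_classes × class_count), is one common "balanced" heuristic and is an assumption here; other weighting schemes exist.

```python
import random
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class inversely to its frequency."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


def random_undersample(samples, labels, seed=0):
    """Randomly drop majority-class samples until every class has the same count."""
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    n_min = min(len(items) for items in by_class.values())
    balanced = []
    for label, items in by_class.items():
        balanced.extend((sample, label) for sample in rng.sample(items, n_min))
    return balanced


labels = ["legit"] * 95 + ["fraud"] * 5
weights = balanced_class_weights(labels)
balanced = random_undersample(list(range(100)), labels)

print(weights)          # the rare class gets the much larger weight
print(len(balanced))    # 5 legit + 5 fraud
```

Undersampling here throws away 90 of the 95 majority samples, which is the main cost of the technique. Class weights keep all the data.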

6. Feature Engineering

Feature engineering is the process of creating, transforming, or selecting input variables to improve model performance. It is often the highest-leverage activity in an ML project.

6.1 Feature Types and Transformations

Numerical Transformations

Transformation	Description	When to Use
Min-Max Scaling	Scales values to the range [0, 1]	Algorithms sensitive to feature magnitude (KNN, SVM)
Standardization (Z-score)	Produces mean = 0 and standard deviation = 1	Most algorithms; removes scale effects
Log Transform	Compresses right-skewed distributions	Highly skewed numerical features
Binning	Converts a continuous value into a discrete bucket	Capturing non-linear threshold effects
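
A minimal sketch of the first three transformations, written with the standard library only. The income figures are made up, and the helpers assume the input is not constant (otherwise the divisions would fail).

```python
import math
from statistics import mean, pstdev


def min_max_scale(values):
    """Rescale to [0, 1]: (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Z-score: (x - mean) / standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def log_transform(values):
    """log(1 + x) compresses large values; requires x > -1."""
    return [math.log1p(v) for v in values]


incomes = [20_000, 30_000, 40_000, 50_000, 1_000_000]   # right-skewed by one outlier

scaled = min_max_scale(incomes)
z_scores = standardize(incomes)
logged = log_transform(incomes)
print(scaled)
print(z_scores)
print(logged)
```

With min-max scaling the single outlier squeezes the other four values into a narrow band near 0, while the log transform spreads them out more evenly. That is the practical reason to reach for a log transform on skewed features.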

Categorical Transformations

Transformation	Description	When to Use
One-Hot Encoding	Creates a binary column per category	Nominal categories with low cardinality
Label Encoding	Assigns an integer to each category	Ordinal categories; tree-based models
Target Encoding	Replaces each category with the mean of the target variable	High-cardinality categoricals
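
One-hot and label encoding side by side, as a small sketch. The helper names are hypothetical, and categories are sorted alphabetically only to make the column order predictable.

```python
def one_hot_encode(values):
    """Return the category order and one binary column per category."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


def label_encode(values):
    """Return the category-to-integer mapping and the encoded values."""
    mapping = {c: i for i, c in enumerate(sorted(set(values)))}
    return mapping, [mapping[v] for v in values]


colors = ["red", "green", "blue", "green"]

categories, one_hot = one_hot_encode(colors)
mapping, encoded = label_encode(colors)
print(categories, one_hot)
print(mapping, encoded)
```

The label-encoded integers imply an order (blue < green < red) that does not exist in the data, which can mislead models that treat the feature as numeric. One-hot encoding avoids that at the cost of one column per category.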

Text Feature Methods

Method	Description
Bag of Words	Counts the occurrences of each word per document
TF-IDF	Weights each word by its frequency in a document, discounted by how many documents in the corpus contain it
Word Embeddings	Dense vector representations trained on large corpora (Word2Vec, GloVe)
BERT Embeddings	Contextual representations produced by a transformer model
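
Bag of Words and TF-IDF can be sketched with the standard library. This uses the plain, unsmoothed form tf × log(N / df); libraries often apply smoothing or normalization, so exact numbers will differ from theirs. The three tiny documents are made up.

```python
import math
from collections import Counter


def bag_of_words(doc):
    """Count how often each word occurs in one document."""
    return Counter(doc.lower().split())


def tf_idf(docs):
    """Score each word per document: term frequency times inverse document frequency."""
    bags = [bag_of_words(d) for d in docs]
    n_docs = len(docs)
    doc_freq = Counter(word for bag in bags for word in bag)  # documents containing each word
    scores = []
    for bag in bags:
        total = sum(bag.values())
        scores.append({
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in bag.items()
        })
    return scores


docs = ["the cat sat", "the dog barked", "the cat purred"]
scores = tf_idf(docs)
print(scores[0])
```

In the first document, "the" scores 0 because it appears in every document, "cat" scores low because it appears in two of three, and "sat" scores highest because it is unique to that document.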

7. Model Training and Evaluation

7.1 Evaluation Metrics

Classification Metrics

The confusion matrix is the foundation for all classification metrics.

                    Predicted Positive    Predicted Negative
Actual Positive  │        TP            │        FN         │
Actual Negative  │        FP            │        TN         │

Metric	Formula	Optimize When
Accuracy	(TP + TN) / All	Classes are balanced
Precision	TP / (TP + FP)	False positives are costly (spam filter)
Recall (Sensitivity)	TP / (TP + FN)	False negatives are costly (cancer detection)
F1 Score	2 × (P × R) / (P + R)	Classes are imbalanced; both P and R matter
AUC-ROC	Area under the ROC curve	Evaluating overall classifier quality

Exam Tip: For fraud detection and medical diagnosis, prioritize Recall — a missed positive (false negative) is the most costly error. For spam filtering, prioritize Precision — a false positive blocks legitimate email.
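
The formulas in the table can be applied directly to confusion-matrix counts. The scenario below is a made-up fraud dataset (1,000 transactions, 10 fraudulent) chosen to show how accuracy can look excellent while recall is poor.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute the standard metrics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical model: catches 2 of the 10 frauds and raises 1 false alarm.
metrics = classification_metrics(tp=2, fp=1, fn=8, tn=989)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Accuracy comes out at 0.991 even though the model misses 8 of 10 fraud cases (recall 0.200, F1 about 0.308).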

Regression Metrics

Metric	Description	Sensitive to Outliers
MAE	Mean absolute difference between predicted and actual values	No
MSE	Mean squared difference; amplifies large errors	Yes
RMSE	Square root of MSE; same unit as the target variable	Yes
R²	Proportion of target variance explained by the model; typically 0–1, but negative when the model is worse than always predicting the mean	Moderate
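
A small worked example of the four regression metrics, using made-up values and a hypothetical helper named `regression_metrics`.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute MAE, MSE, RMSE, and R² for paired actual and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                      # unexplained variation
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)       # total variation
    r2 = 1 - ss_res / ss_tot                                 # below 0 if worse than the mean
    return {"mae": mae, "mse": mse, "rmse": rmse, "r2": r2}


y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 12.0]    # the last prediction is off by 3

results = regression_metrics(y_true, y_pred)
print(results)
```

The single large error (3) contributes 9 of the 11 units of squared error, so MSE (2.75) is pulled up far more than MAE (1.25). This is what "sensitive to outliers" means in the table.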

8. ML Algorithms Overview

8.1 Supervised Algorithms

Algorithm	Type	Key Characteristics
Linear Regression	Regression	Fits a line or hyperplane; highly interpretable
Logistic Regression	Classification	Uses sigmoid function; outputs a probability
Decision Tree	Both	Interpretable path; prone to overfitting without pruning
Random Forest	Both	Ensemble of trees using bagging; reduces variance
XGBoost / LightGBM	Both	Sequential boosting; top performer on tabular data
K-Nearest Neighbors	Both	Predicts by majority vote (classification) or average (regression) of the K nearest training points
Support Vector Machine	Both	Finds the optimal separating hyperplane with maximum margin
Naive Bayes	Classification	Probabilistic; assumes feature independence; very fast

SageMaker Built-in Algorithms

Algorithm	Problem Type	Primary Use Case
XGBoost	Supervised	General-purpose tabular data; most popular built-in
Linear Learner	Supervised	Classification and regression on large datasets
K-Means	Unsupervised	Clustering
PCA	Unsupervised	Dimensionality reduction
Random Cut Forest	Anomaly Detection	Time-series anomaly detection
DeepAR	Forecasting	Time-series forecasting with deep learning
BlazingText	NLP	Word embeddings and text classification
Object Detection	Computer Vision	Detect and locate objects in images
Image Classification	Computer Vision	Assign images to categories
Factorization Machines	Supervised	Recommendation and click-through prediction

8.2 Unsupervised Algorithms

Algorithm	Type	Key Characteristics
K-Means	Clustering	Iteratively assigns points to K centroids; requires K upfront
DBSCAN	Clustering	Density-based; finds arbitrary shapes; no K required
Hierarchical Clustering	Clustering	Builds a tree (dendrogram) of clusters; no K required
PCA	Dimensionality Reduction	Projects data onto orthogonal axes of maximum variance
t-SNE	Dimensionality Reduction	Non-linear; excellent for 2D visualization of high-dimensional data
Isolation Forest	Anomaly Detection	Isolates anomalies using random feature splits; efficient at scale
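
The "iteratively assigns points to K centroids" loop of K-Means is short enough to sketch in full. This version works on one-dimensional data with hand-picked starting centroids, both simplifying assumptions; real implementations handle many dimensions and choose initial centroids more carefully.

```python
def k_means_1d(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and recomputing centroids."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster.
        centroids = [
            sum(cluster) / len(cluster) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


points = [1.0, 1.5, 2.0, 9.0, 10.0, 11.0]
centroids, clusters = k_means_1d(points, centroids=[0.0, 5.0])   # K = 2, chosen upfront
print(centroids)
print(clusters)
```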

9. Deep Learning and Neural Networks

A neural network learns by adjusting internal weights and biases to minimize a loss function via backpropagation and gradient descent.
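
To illustrate the weight-update loop, here is a minimal sketch of gradient descent fitting a single "neuron" with one weight and one bias under mean squared error. For a model this small the gradients can be written by hand; backpropagation is the procedure that computes the same kind of gradients layer by layer in a deep network. The data, learning rate, and epoch count are illustrative assumptions.

```python
def gradient_descent(xs, ys, learning_rate=0.05, epochs=1000):
    """Fit y ≈ w * x + b by repeatedly stepping against the gradient of the MSE loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        preds = [w * x + b for x in xs]
        # Gradients of mean((pred - y)^2) with respect to w and b.
        grad_w = (2 / n) * sum((p - y) * x for p, y, x in zip(preds, ys, xs))
        grad_b = (2 / n) * sum(p - y for p, y in zip(preds, ys))
        w -= learning_rate * grad_w     # step downhill on the loss surface
        b -= learning_rate * grad_b
    return w, b


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]              # generated from y = 2x + 1

w, b = gradient_descent(xs, ys)
print(round(w, 3), round(b, 3))
```

The learned values should land very close to w = 2 and b = 1. A learning rate that is too large would make the updates overshoot and diverge instead.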

9.1 Network Architectures

Activation Functions

Function	Output Range	Primary Use
ReLU	[0, ∞)	Default for hidden layers; computationally efficient
Sigmoid	(0, 1)	Binary classification output layer
Softmax	(0, 1); outputs sum to 1	Multi-class classification output layer
Tanh	(−1, 1)	Recurrent networks
Leaky ReLU	(−∞, ∞)	Avoids dying ReLU problem in deep networks
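
The activation functions in the table are one-liners. This sketch uses only the standard library; the Leaky ReLU slope of 0.01 is a commonly used default and is an assumption here.

```python
import math


def relu(x):
    return max(0.0, x)                          # negative inputs become 0


def leaky_relu(x, slope=0.01):
    return x if x > 0 else slope * x            # small non-zero output for negatives


def sigmoid(x):
    return 1 / (1 + math.exp(-x))               # squashes any input into (0, 1)


def softmax(xs):
    m = max(xs)                                 # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]            # values in (0, 1) that sum to 1


probs = softmax([1.0, 2.0, 3.0])
print(relu(-2.0), leaky_relu(-2.0), sigmoid(0.0), math.tanh(0.0))
print(probs, sum(probs))
```

Softmax differs from applying sigmoid to each output independently: its outputs always sum to 1, so they can be read as a probability distribution over the classes.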

Neural Network Types

Architecture	Abbreviation	Best For
Feedforward Neural Network	FNN / MLP	Tabular data; general classification and regression
Convolutional Neural Network	CNN	Images and video; detects local spatial patterns
Long Short-Term Memory	LSTM	Sequential data; handles long-range dependencies
Transformer	—	Text, code, multi-modal; basis of nearly all modern LLMs
Generative Adversarial Network	GAN	Image generation; synthetic data augmentation
Autoencoder	AE	Dimensionality reduction; anomaly detection; denoising

Key Concept: The Transformer architecture, introduced in "Attention Is All You Need" (2017), uses self-attention to process all tokens in parallel. It is the foundation of GPT, BERT, Claude, and most other major modern language models.

Transfer Learning Steps

Step	Action
1	Select a model pre-trained on a large general dataset (e.g., ResNet, BERT)
2	Freeze most layers to preserve learned representations
3	Replace and retrain the final layers on your specific, smaller dataset

10. AWS AI and ML Services

AWS organizes its AI/ML stack into three tiers based on the level of ML expertise required.

┌──────────────────────────────────────────────────────────────┐
│  Tier 1 — AI Services  (no ML expertise needed)              │
│  Rekognition · Comprehend · Polly · Transcribe · Translate   │
├──────────────────────────────────────────────────────────────┤
│  Tier 2 — ML Services  (build and train custom models)       │
│                    Amazon SageMaker                          │
├──────────────────────────────────────────────────────────────┤
│  Tier 3 — Infrastructure  (GPUs and custom chips)            │
│        EC2 GPU instances · AWS Trainium · AWS Inferentia     │
└──────────────────────────────────────────────────────────────┘

10.1 Amazon SageMaker

Amazon SageMaker is AWS's fully managed ML platform for building, training, and deploying models at scale.

Component	Purpose
SageMaker Studio	Web-based IDE for notebooks, experiments, and pipelines
SageMaker Autopilot	AutoML; automatically selects the algorithm and tunes the best model
SageMaker Canvas	No-code ML for business analysts; no programming required
SageMaker Data Wrangler	Visual, low-code data preparation and feature engineering
SageMaker Feature Store	Centralized, reusable repository for ML features
SageMaker Training Jobs	Managed distributed model training at scale
SageMaker Pipelines	CI/CD for ML; automates training and deployment workflows
SageMaker Model Registry	Version and approve models before deployment
SageMaker Clarify	Detect bias in data and models; generate SHAP explanations
SageMaker Model Monitor	Detect data drift and model degradation in production
SageMaker Ground Truth	Managed data labeling with human workforce

SageMaker Deployment Options

Deployment Type	Description	Best Use Case
Real-time Endpoint	Persistent, always-on low-latency endpoint	User-facing applications
Serverless Inference	Auto-scales to zero; pay per request	Intermittent or unpredictable traffic
Async Inference	Queues and processes large payloads asynchronously	Video processing, long documents
Batch Transform	Runs inference on a full dataset at once	Offline scoring, nightly reporting
Multi-Model Endpoint	Hosts many models on a single endpoint	Cost optimization with many models

10.2 AWS Pre-Built AI Services

Vision Services

Service	Key Capabilities
Amazon Rekognition	Object detection, facial recognition, content moderation, text in images
Amazon Textract	Extract text, key-value pairs, and tables from scanned documents
Amazon Lookout for Vision	Industrial defect detection using computer vision

Language Services

Service	Key Capabilities
Amazon Comprehend	Entity recognition, sentiment analysis, key phrases, topic modeling, PII detection
Amazon Comprehend Medical	Extract clinical entities and PHI from unstructured medical text
Amazon Translate	Neural machine translation across 75+ languages
Amazon Polly	Text-to-speech synthesis with natural-sounding voices
Amazon Transcribe	Speech-to-text with custom vocabulary and speaker identification
Amazon Lex	Conversational chatbots and voice assistants (built on the same technology as Alexa)

Search, Forecasting, and Analytics Services

Service	Key Capabilities
Amazon Kendra	Intelligent enterprise search using NLP and ML relevance ranking
Amazon Personalize	Real-time personalization and recommendations
Amazon Forecast	ML-based time-series demand forecasting
Amazon Lookout for Metrics	Anomaly detection in business metrics
Amazon Fraud Detector	Build and deploy ML-based fraud detection models

Exam Tips & Quick Reference

Scenario-to-Answer Mapping

Scenario Keyword / Requirement	Correct Answer
"No ML expertise needed; build a model from business data"	SageMaker Canvas
"Automatically find the best model and hyperparameters"	SageMaker Autopilot
"Label large volumes of training data with human reviewers"	SageMaker Ground Truth
"Detect bias in training data before model training"	SageMaker Clarify (pre-training bias)
"Monitor a deployed model for input distribution changes"	SageMaker Model Monitor
"Classify images without training a custom model"	Amazon Rekognition
"Extract text and tables from PDF invoices"	Amazon Textract
"Analyze customer review sentiment at scale"	Amazon Comprehend
"Build a chatbot that understands user intent via voice or text"	Amazon Lex
"Predict future demand based on historical time-series data"	Amazon Forecast
"Detect anomalies in IoT sensor data"	SageMaker Random Cut Forest
"Prevent overfitting in a deep neural network"	Dropout, L2 regularization, early stopping
"Minimize missed fraud cases (false negatives)"	Optimize for Recall
"Minimize legitimate emails marked as spam (false positives)"	Optimize for Precision

Common Traps

Accuracy on imbalanced data: Accuracy looks good on imbalanced datasets but hides poor minority-class performance. The exam will present an imbalanced scenario and expect F1, Precision, or Recall — not accuracy.
Autopilot vs. Canvas: Autopilot automates model building for practitioners who want ML control. Canvas targets business analysts with no ML background using a no-code interface.
Parameters vs. hyperparameters: Parameters (weights, biases) are learned automatically. Hyperparameters (learning rate, depth) are set before training. The optimizer sets parameters; the data scientist sets hyperparameters.
Validation vs. test set: Validation data is used iteratively during development. Test data is used exactly once at the very end. Tuning on test data is data leakage.

Key Terms — Domain 1

Term	One-Line Definition
Epoch	One complete pass through the entire training dataset
Batch Size	Number of training samples processed before each weight update
Learning Rate	Step size the optimizer uses to update model weights
Gradient Descent	Iterative optimization algorithm that minimizes the loss function
Backpropagation	Algorithm that computes the gradient of loss w.r.t. each weight
Inference	Using a trained model to generate predictions on new, unseen data
Ground Truth	The correct, verified label for a training or evaluation sample
Data Drift	The distribution of input features changes after model deployment
Concept Drift	The relationship between inputs and the target variable changes over time
Overfitting	Model performs well on training data but poorly on unseen data
Underfitting	Model is too simple to capture the underlying patterns in the data
Transfer Learning	Reusing a model pre-trained on one task as the starting point for another
Ensemble	Combining multiple models to produce a stronger, more stable prediction
