AWS Certified AI Practitioner — Domain 1: Fundamentals of AI and Machine Learning

Exam Code: AIF-C01  |  Level: Foundational
Domain Weight: 20%  |  Total Domains: 5  |  Passing Score: 700/1000


Table of Contents

  1. AI, ML, and Deep Learning
  2. Types of Machine Learning
  3. The ML Lifecycle
  4. Key ML Concepts and Terminology
  5. Data Fundamentals for ML
  6. Feature Engineering
  7. Model Training and Evaluation
  8. ML Algorithms Overview
  9. Deep Learning and Neural Networks
  10. AWS AI and ML Services
  11. Exam Tips and Quick Reference

1. AI, ML, and Deep Learning

Artificial Intelligence, Machine Learning, and Deep Learning form a nested hierarchy where each is a subset of the one above it. Understanding where each term begins and ends is the foundation for the entire exam.

1.1 Definitions and Hierarchy

Artificial Intelligence (AI) is the simulation of human intelligence in machines programmed to think, reason, and make decisions. Machine Learning (ML) is a subset of AI where models learn patterns from data without being explicitly programmed with rules. Deep Learning (DL) is a subset of ML using multi-layered neural networks.

┌──────────────────────────────────────────┐
│          Artificial Intelligence         │
│  ┌────────────────────────────────────┐  │
│  │        Machine Learning            │  │
│  │  ┌──────────────────────────────┐  │  │
│  │  │       Deep Learning          │  │  │
│  │  │  ┌────────────────────────┐  │  │  │
│  │  │  │    Generative AI       │  │  │  │
│  │  │  └────────────────────────┘  │  │  │
│  │  └──────────────────────────────┘  │  │
│  └────────────────────────────────────┘  │
└──────────────────────────────────────────┘

Term | Definition
Artificial Intelligence | Machines simulating human-like reasoning and decision-making.
Machine Learning | AI systems that learn from data without being given explicit rules.
Deep Learning | ML using neural networks with many layers to learn complex representations.
Natural Language Processing | AI for understanding and generating human language.
Computer Vision | AI for understanding images and video.

Key Concept: All ML is AI, but not all AI is ML. All Deep Learning is ML, but not all ML is Deep Learning. Generative AI is a subset of Deep Learning.

1.2 Types of AI by Capability

Type | Also Called | Description
Narrow AI | ANI | Designed for one specific task. Every production AI system today is narrow AI.
General AI | AGI | Hypothetical human-level general intelligence. Does not exist today.
Super AI | ASI | Hypothetical AI surpassing human intelligence. Theoretical only.

Exam Tip: Every AWS AI service — Bedrock, SageMaker, Rekognition — is narrow AI. AGI and ASI are conceptual terms, not products.


2. Types of Machine Learning

ML is categorized by how models learn from data. The exam frequently gives a scenario and asks you to identify the correct learning paradigm.

2.1 Supervised Learning

In supervised learning, the model learns from labeled data — datasets where the correct answer is already known. The goal is to learn a mapping from inputs to outputs.

Classification

Predicts a discrete class or category. Two sub-types exist:

Sub-type | Number of Classes | Example
Binary Classification | 2 | Spam vs. not spam
Multi-class Classification | 3 or more | Cat, dog, or bird

Regression

Predicts a continuous numeric value. Examples include house price prediction and temperature forecasting.

2.2 Unsupervised Learning

The model finds patterns in unlabeled data. There is no correct answer to learn from — the model discovers structure on its own.

Sub-type | Goal | Example Algorithms
Clustering | Group similar data points together | K-Means, DBSCAN
Dimensionality Reduction | Reduce the number of features while preserving information | PCA, t-SNE
Association | Find co-occurrence relationships between variables | Market basket analysis
Anomaly Detection | Identify data points that deviate from the norm | Isolation Forest

2.3 Reinforcement Learning

An agent learns by interacting with an environment, receiving rewards for good actions and penalties for bad ones. It learns a policy — a strategy for maximizing cumulative reward over time.

Component | Definition
Agent | The learner or decision-maker.
Environment | The world the agent interacts with.
Action | A choice the agent can make at each step.
State | The current situation the agent observes.
Reward | The feedback signal, positive or negative.
Policy | The learned strategy that maps states to actions.

Exam Tip: If a scenario describes a system learning through trial, reward, and penalty — game playing, robotic control, autonomous driving — the answer is reinforcement learning.
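
The agent, environment, action, state, reward, and policy components can be made concrete with a small sketch. The example below is a minimal tabular Q-learning loop, one common reinforcement learning algorithm (the notes do not name a specific one). The corridor environment, the reward values, and the settings `alpha`, `gamma`, and `epsilon` are all hypothetical choices made for illustration.

```python
import random

# Hypothetical toy environment: a corridor of 5 cells. The agent starts in
# cell 0 and the episode ends when it reaches cell 4 (the goal).
N_STATES = 5
GOAL = N_STATES - 1
ACTIONS = ("left", "right")


def step(state, action):
    """Environment: return (next_state, reward, done) for one action."""
    if action == "right":
        next_state = min(state + 1, GOAL)
    else:
        next_state = max(state - 1, 0)
    if next_state == GOAL:
        return next_state, 1.0, True    # reward for reaching the goal
    return next_state, -0.01, False     # small penalty for every other step


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = {s: {a: 0.0 for a in ACTIONS} for s in range(N_STATES)}
    for _ in range(episodes):
        state = 0
        for _ in range(200):  # cap the episode length
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)                       # explore
            else:
                action = max(ACTIONS, key=lambda a: q[state][a])   # exploit
            next_state, reward, done = step(state, action)
            best_next = 0.0 if done else max(q[next_state].values())
            # Move the estimate toward reward + discounted future value.
            q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
            state = next_state
            if done:
                break
    return q


q_table = train()
# The learned policy: the highest-valued action in each non-goal state.
policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(GOAL)]
print(policy)
```

The printed policy is the mapping from states to actions that the agent learned purely from rewards and penalties; no labeled examples were supplied.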

2.4 Semi-Supervised and Self-Supervised Learning

Type | Data Used | How It Works
Semi-Supervised | Small labeled + large unlabeled | Uses unlabeled data to improve a model trained on limited labels.
Self-Supervised | Unlabeled only | The model generates its own labels from the data structure (e.g., predict the next word). This is the foundation of LLMs like GPT and BERT.
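
To illustrate how self-supervised learning derives labels from the data itself, the sketch below turns raw text into next-word training pairs. The function name `next_word_pairs` and the two-word context window are assumptions for this example; real LLM pipelines work on subword tokens and far longer contexts.

```python
def next_word_pairs(text, context_size=2):
    """Turn raw text into (context, target) training pairs.

    The "label" for each example is simply the word that follows the
    context in the original text, so no human annotation is needed.
    """
    words = text.lower().split()
    pairs = []
    for i in range(context_size, len(words)):
        context = tuple(words[i - context_size:i])
        pairs.append((context, words[i]))
    return pairs


pairs = next_word_pairs("the cat sat on the mat")
for context, target in pairs:
    print(context, "->", target)
```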

3. The ML Lifecycle

The ML lifecycle is an iterative, six-stage process from business problem definition to production monitoring. Projects routinely loop back through earlier stages when performance degrades or requirements change.

3.1 Pipeline Stages

┌──────────────┐    ┌──────────────┐    ┌──────────────┐
│  1. Define   │───►│  2. Collect  │───►│  3. Prepare  │
│   Problem    │    │    Data      │    │    Data      │
└──────────────┘    └──────────────┘    └──────────────┘
                                               │
┌──────────────┐    ┌──────────────┐    ┌──────▼───────┐
│  6. Monitor  │◄───│  5. Deploy   │◄───│  4. Train &  │
│  & Retrain   │    │    Model     │    │   Evaluate   │
└──────────────┘    └──────────────┘    └──────────────┘

Stage | Key Activities
1. Define Problem | Identify business goal, ML problem type, and success metrics.
2. Collect Data | Gather data from databases, APIs, IoT sensors, and web sources.
3. Prepare Data | Perform EDA, cleaning, feature engineering, and data splitting.
4. Train and Evaluate | Select algorithm, train model, tune hyperparameters, evaluate on validation set.
5. Deploy | Package model as an endpoint; run A/B tests.
6. Monitor and Retrain | Detect data drift and model drift; trigger retraining when performance degrades.

Key Concept: The ML lifecycle never truly ends. Production models must be continuously monitored and periodically retrained as real-world data evolves.


4. Key ML Concepts and Terminology

4.1 Bias, Variance, and Regularization

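Bias is the systematic error introduced by a model's simplifying assumptions: how far its predictions are, on average, from the true values. Variance measures how much predictions fluctuate when the model is trained on different datasets. Reducing one tends to increase the other, so every model must balance the two.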

Condition | Bias | Variance | Problem | Fix
Underfitting | High | Low | Model too simple; misses patterns | Add features; increase model complexity
Overfitting | Low | High | Model memorizes noise; fails on new data | Regularization; more data; simpler model
Good Fit | Low | Low | Generalizes well to unseen data | (none needed)

Regularization Techniques

Technique | Mechanism | Effect
L1 (Lasso) | Adds absolute value of weights to loss | Drives some weights to zero; performs feature selection
L2 (Ridge) | Adds squared weights to loss | Shrinks all weights; no feature elimination
Dropout | Randomly deactivates neurons during training | Prevents co-adaptation in deep networks
Early Stopping | Halts training when validation loss stops improving | Avoids over-training without modifying the architecture
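
The L1 and L2 mechanisms in the table amount to adding a penalty term to the loss. The sketch below computes both penalties for a small weight vector. The weights, the unregularized loss value, and the strength `lam` are hypothetical numbers chosen only to show the arithmetic.

```python
def l1_penalty(weights, lam):
    """L1 (Lasso) term: lambda times the sum of absolute weights."""
    return lam * sum(abs(w) for w in weights)


def l2_penalty(weights, lam):
    """L2 (Ridge) term: lambda times the sum of squared weights."""
    return lam * sum(w * w for w in weights)


weights = [0.5, -2.0, 0.0, 1.5]   # hypothetical model weights
data_loss = 0.30                  # hypothetical unregularized loss
lam = 0.1                         # regularization strength (a hyperparameter)

loss_l1 = data_loss + l1_penalty(weights, lam)
loss_l2 = data_loss + l2_penalty(weights, lam)
print(round(loss_l1, 4), round(loss_l2, 4))
```

Because the L2 term squares each weight, the largest weight (-2.0) dominates that penalty, while the L1 term charges every unit of weight equally. That difference is why L1 tends to push small weights all the way to zero.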

4.2 Hyperparameters vs. Parameters

Key Concept: Parameters are learned automatically by the optimizer during training. Hyperparameters are set by the practitioner before training begins and control how learning happens.

Property | Parameters | Hyperparameters
Who sets them | Optimizer (automatic) | Data scientist (manual)
Examples | Weights, biases | Learning rate, number of layers, batch size
When determined | During training | Before training begins

Hyperparameter Tuning Methods

Method | Description | Speed
Grid Search | Exhaustively tries every combination | Slow
Random Search | Randomly samples combinations | Medium
Bayesian Optimization | Uses prior results to guide the next trial | Fast
SageMaker Automatic Model Tuning | Managed Bayesian optimization | Fast
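
The difference between grid search and random search can be sketched in a few lines. Everything here is illustrative: the search space is hypothetical, and `validation_score` is a stand-in for "train a model with these hyperparameters and score it on the validation set".

```python
import itertools
import random

# Hypothetical search space for two hyperparameters.
search_space = {
    "learning_rate": [0.001, 0.01, 0.1],
    "max_depth": [3, 5, 7, 9],
}


def validation_score(params):
    """Stand-in for training and validating a model.

    Purely illustrative: it peaks at learning_rate=0.01, max_depth=5.
    """
    return -abs(params["learning_rate"] - 0.01) * 10 - abs(params["max_depth"] - 5) * 0.1


def grid_search(space):
    """Try every combination in the grid."""
    names = list(space)
    trials = [dict(zip(names, values)) for values in itertools.product(*space.values())]
    return max(trials, key=validation_score), len(trials)


def random_search(space, n_trials, seed=0):
    """Try a fixed budget of randomly sampled combinations."""
    rng = random.Random(seed)
    trials = [
        {name: rng.choice(values) for name, values in space.items()}
        for _ in range(n_trials)
    ]
    return max(trials, key=validation_score), len(trials)


best_grid, grid_trials = grid_search(search_space)
best_random, random_trials = random_search(search_space, n_trials=5)
print(best_grid, grid_trials)
print(best_random, random_trials)
```

Grid search needs 12 trials here (3 x 4) and that count multiplies with every added hyperparameter, whereas random search runs a fixed budget at the risk of missing the best combination.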

5. Data Fundamentals for ML

5.1 Data Types and Quality

Data Type | Description | Examples
Structured | Fixed rows and columns with a defined schema | CSV files, relational databases
Unstructured | No predefined format or schema | Images, audio, video, raw text
Semi-structured | Partial structure without a rigid schema | JSON, XML, application log files
Time Series | Sequential observations indexed by time | Stock prices, IoT sensor readings

Data Quality Dimensions

Dimension | Definition
Accuracy | Data correctly reflects real-world values.
Completeness | No required fields are missing.
Consistency | Data is uniform and coherent across all sources.
Timeliness | Data is current and not stale.
Uniqueness | No duplicate records exist.
Validity | Data conforms to defined formats, types, and ranges.

Train / Validation / Test Split

Split | Purpose | Typical Proportion
Training | Fit model parameters | 60–80%
Validation | Tune hyperparameters; monitor for overfitting | 10–20%
Test | Final, one-time unbiased evaluation | 10–20%

Exam Tip: Test data must remain completely unseen during training and tuning. Using test data to make any development decision invalidates the evaluation and constitutes data leakage.
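
A three-way split can be sketched with the standard library alone. The function name `train_val_test_split` and the 70/15/15 proportions are assumptions picked from within the typical ranges above; libraries such as scikit-learn provide their own splitting utilities.

```python
import random


def train_val_test_split(rows, train_pct=70, val_pct=15, seed=42):
    """Shuffle once, then cut into three non-overlapping subsets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * train_pct // 100
    n_val = len(shuffled) * val_pct // 100
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]   # whatever remains
    return train, val, test


rows = list(range(100))                 # stand-in for 100 labeled examples
train, val, test = train_val_test_split(rows)
print(len(train), len(val), len(test))
```

Shuffling before cutting assumes the rows are independent; time-series data is normally split chronologically instead so that the model never trains on the future.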

5.2 Handling Missing Data and Class Imbalance

Missing Data Strategies

Strategy | When to Use
Deletion | Missing at random; sufficient data remains after removal
Mean / Median / Mode Imputation | Numerical features with a low rate of missingness
Forward / Backward Fill | Time-series data where adjacent values are informative
Predictive Imputation | High-value features with complex missing patterns
Indicator Variable | When the fact of missingness itself carries signal
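
Three of these strategies are simple enough to sketch directly. In the example below, missing values are represented as `None`, and the sample `ages` and `readings` lists are made up for illustration.

```python
def mean_impute(values):
    """Replace None with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def forward_fill(values):
    """Replace None with the most recent observed value (time-series order)."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)   # stays None if no earlier value exists
    return filled


def missing_indicator(values):
    """1 where the value was missing, 0 otherwise."""
    return [1 if v is None else 0 for v in values]


ages = [30, None, 50, 40, None]
readings = [1.0, None, None, 2.5, None]
print(mean_impute(ages))
print(forward_fill(readings))
print(missing_indicator(ages))
```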

Class Imbalance Techniques

Technique | Description
SMOTE | Synthesize new minority-class samples
Undersampling | Randomly remove majority-class samples
Class Weights | Penalize minority-class misclassification more heavily during training
Correct Evaluation | Use F1, Precision, or Recall rather than accuracy for imbalanced datasets
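
As one way to picture class weights, the sketch below uses the inverse-frequency heuristic n_samples / (n_classes * class_count), which is the same formula scikit-learn applies for its "balanced" setting. The 95/5 fraud dataset is hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


labels = ["legit"] * 95 + ["fraud"] * 5   # hypothetical imbalanced dataset
weights = balanced_class_weights(labels)
print(weights)
```

With these weights, one misclassified fraud example costs the model about as much as nineteen misclassified legitimate ones, which counteracts the imbalance during training.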

6. Feature Engineering

Feature engineering is the process of creating, transforming, or selecting input variables to improve model performance. It is often the highest-leverage activity in an ML project.

6.1 Feature Types and Transformations

Numerical Transformations

Transformation | Description | When to Use
Min-Max Scaling | Scales values to the range [0, 1] | Algorithms sensitive to feature magnitude (KNN, SVM)
Standardization (Z-score) | Produces mean = 0 and standard deviation = 1 | Most algorithms; removes scale effects
Log Transform | Compresses right-skewed distributions | Highly skewed numerical features
Binning | Converts a continuous value into a discrete bucket | Capturing non-linear threshold effects
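
The first three transformations reduce to one-line formulas, sketched below with the standard library. The `incomes` list is made-up sample data, and the log transform uses log(1 + x) so that zero values stay defined, which is one common convention rather than the only one.

```python
import math
import statistics


def min_max_scale(values):
    """Map values linearly onto [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Z-score: subtract the mean, divide by the standard deviation."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)   # population standard deviation
    return [(v - mean) / std for v in values]


def log_transform(values):
    """log(1 + x) compresses large values; requires x >= 0."""
    return [math.log1p(v) for v in values]


incomes = [20_000, 35_000, 50_000, 65_000, 80_000]
scaled = min_max_scale(incomes)
z_scores = standardize(incomes)
print(scaled)
print([round(z, 3) for z in z_scores])
```

In practice the minimum, maximum, mean, and standard deviation should be computed on the training split only and then reused on validation and test data, so that no information leaks from the held-out sets.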

Categorical Transformations

Transformation | Description | When to Use
One-Hot Encoding | Creates a binary column per category | Nominal categories with low cardinality
Label Encoding | Assigns an integer to each category | Ordinal categories; tree-based models
Target Encoding | Replaces each category with the mean of the target variable | High-cardinality categoricals
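
One-hot and label encoding can be compared side by side in a short sketch. The helper names and the `colors` sample are assumptions for illustration; categories are sorted alphabetically here only to make the column order deterministic.

```python
def one_hot_encode(values):
    """Return the category order and one binary row per input value."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


def label_encode(values):
    """Return the category-to-integer mapping and the encoded values."""
    mapping = {c: i for i, c in enumerate(sorted(set(values)))}
    return mapping, [mapping[v] for v in values]


colors = ["red", "green", "blue", "green"]
categories, one_hot_rows = one_hot_encode(colors)
mapping, encoded = label_encode(colors)
print(categories, one_hot_rows)
print(mapping, encoded)
```

Label encoding implies blue < green < red, an ordering that does not exist for colors. Distance-based and linear models can be misled by it, which is why one-hot encoding is preferred for nominal categories.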

Text Feature Methods

Method | Description
Bag of Words | Counts the occurrences of each word per document
TF-IDF | Weights words by how frequent they are in a document relative to the corpus
Word Embeddings | Dense vector representations trained on large corpora (Word2Vec, GloVe)
BERT Embeddings | Contextual representations produced by a transformer model
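
Bag of Words and TF-IDF are simple enough to compute by hand. The sketch below uses one plain variant of TF-IDF, term frequency times log(N / document frequency); libraries typically apply smoothed or normalized variants, so exact numbers will differ. The three sample documents are invented.

```python
import math
from collections import Counter


def bag_of_words(doc):
    """Count how often each word occurs in one document."""
    return Counter(doc.lower().split())


def tf_idf(docs):
    """Per-document scores: (count / doc length) * log(N / doc frequency)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Number of documents each word appears in.
    doc_freq = Counter(word for words in tokenized for word in set(words))
    scores = []
    for words in tokenized:
        counts = Counter(words)
        scores.append({
            word: (count / len(words)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return scores


docs = ["the model predicts fraud", "the model predicts churn", "the cat sat"]
scores = tf_idf(docs)
print(bag_of_words(docs[0]))
print({word: round(s, 3) for word, s in scores[0].items()})
```

The word "the" appears in every document and scores zero, while "fraud" appears in only one and scores highest in the first document. Down-weighting ubiquitous words is the purpose of the IDF factor.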

7. Model Training and Evaluation

7.1 Evaluation Metrics

Classification Metrics

The confusion matrix is the foundation for all classification metrics.

                    Predicted Positive    Predicted Negative
Actual Positive  │        TP            │        FN         │
Actual Negative  │        FP            │        TN         │

Metric | Formula | Optimize When
Accuracy | (TP + TN) / All | Classes are balanced
Precision | TP / (TP + FP) | False positives are costly (spam filter)
Recall (Sensitivity) | TP / (TP + FN) | False negatives are costly (cancer detection)
F1 Score | 2 × (P × R) / (P + R) | Classes are imbalanced; both P and R matter
AUC-ROC | Area under the ROC curve | Evaluating overall classifier quality

Exam Tip: For fraud detection and medical diagnosis, prioritize Recall — a missed positive (false negative) is the most costly error. For spam filtering, prioritize Precision — a false positive blocks legitimate email.
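
The formulas above can be checked with a small worked example. The confusion-matrix counts below describe a hypothetical fraud detector scored on 1,000 transactions, 10 of which are fraudulent.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute the standard metrics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts: the detector catches only 2 of the 10 fraud cases.
metrics = classification_metrics(tp=2, fp=1, fn=8, tn=989)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Accuracy comes out at 0.991 even though recall is only 0.200, so the model looks excellent by accuracy while missing 8 of 10 fraud cases. This is the imbalanced-data trap the exam likes to test.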

Regression Metrics

Metric | Description | Sensitive to Outliers
MAE | Mean absolute difference between predicted and actual values | No
MSE | Mean squared difference; amplifies large errors | Yes
RMSE | Square root of MSE; same unit as the target variable | Yes
R² (R-squared) | Proportion of target variance explained by the model; typically 0–1, but can be negative when the model is worse than predicting the mean | Moderate
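
A short worked example shows how MAE, MSE, RMSE, and the coefficient of determination (R²) respond to the same predictions. The `actual` and `predicted` values are invented, with one deliberately large error of 50.

```python
import math


def regression_metrics(actual, predicted):
    """Compute MAE, MSE, RMSE, and R-squared for paired values."""
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)                    # unexplained variation
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)   # total variation
    r2 = 1 - ss_res / ss_tot
    return {"mae": mae, "mse": mse, "rmse": rmse, "r2": r2}


actual = [100, 200, 300, 400]
predicted = [110, 190, 310, 350]   # last prediction is off by 50
results = regression_metrics(actual, predicted)
print({name: round(value, 3) for name, value in results.items()})
```

MAE is 20 while RMSE is about 26.5: the single error of 50 is squared before averaging, which is why MSE and RMSE are listed as sensitive to outliers.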

8. ML Algorithms Overview

8.1 Supervised Algorithms

Algorithm | Type | Key Characteristics
Linear Regression | Regression | Fits a line or hyperplane; highly interpretable
Logistic Regression | Classification | Uses sigmoid function; outputs a probability
Decision Tree | Both | Interpretable path; prone to overfitting without pruning
Random Forest | Both | Ensemble of trees using bagging; reduces variance
XGBoost / LightGBM | Both | Sequential boosting; top performer on tabular data
K-Nearest Neighbors | Both | Classifies by majority vote of K nearest training points
Support Vector Machine | Both | Finds the optimal separating hyperplane with maximum margin
Naive Bayes | Classification | Probabilistic; assumes feature independence; very fast

SageMaker Built-in Algorithms

Algorithm | Problem Type | Primary Use Case
XGBoost | Supervised | General-purpose tabular data; most popular built-in
Linear Learner | Supervised | Classification and regression on large datasets
K-Means | Unsupervised | Clustering
PCA | Unsupervised | Dimensionality reduction
Random Cut Forest | Anomaly Detection | Time-series anomaly detection
DeepAR | Forecasting | Time-series forecasting with deep learning
BlazingText | NLP | Word embeddings and text classification
Object Detection | Computer Vision | Detect and locate objects in images
Image Classification | Computer Vision | Assign images to categories
Factorization Machines | Supervised | Recommendation and click-through prediction

8.2 Unsupervised Algorithms

Algorithm | Type | Key Characteristics
K-Means | Clustering | Iteratively assigns points to K centroids; requires K upfront
DBSCAN | Clustering | Density-based; finds arbitrary shapes; no K required
Hierarchical Clustering | Clustering | Builds a tree (dendrogram) of clusters; no K required
PCA | Dimensionality Reduction | Projects data onto orthogonal axes of maximum variance
t-SNE | Dimensionality Reduction | Non-linear; excellent for 2D visualization of high-dimensional data
Isolation Forest | Anomaly Detection | Isolates anomalies using random feature splits; efficient at scale
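
The K-Means row describes an assign-then-update loop, which the sketch below implements for one-dimensional data. The points and the starting centroids are hypothetical; real implementations handle multiple dimensions and choose initial centroids more carefully.

```python
def k_means_1d(points, centroids, iterations=10):
    """Minimal K-Means on 1-D data: assign points, then move centroids."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster.
        centroids = [
            sum(c) / len(c) if c else centroids[i]   # keep empty clusters in place
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


points = [1.0, 1.5, 2.0, 9.0, 10.0, 11.0]
centroids, clusters = k_means_1d(points, centroids=[0.0, 5.0])   # K = 2 chosen upfront
print(centroids)
print(clusters)
```

No labels are supplied anywhere: the two groups emerge from the distances alone, but the number of clusters (two starting centroids) had to be chosen in advance.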

9. Deep Learning and Neural Networks

A neural network learns by adjusting internal weights and biases to minimize a loss function via backpropagation and gradient descent.
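
Gradient descent itself can be shown on a single weight. In the sketch below the loss is the hypothetical function (w - 3)², and its gradient is written out by hand; in a real network, backpropagation is what computes this gradient for every weight.

```python
def loss(w):
    """Hypothetical loss with its minimum at w = 3."""
    return (w - 3.0) ** 2


def gradient(w):
    """Derivative of the loss with respect to w."""
    return 2.0 * (w - 3.0)


def gradient_descent(w=0.0, learning_rate=0.1, steps=50):
    """Repeatedly step against the gradient to reduce the loss."""
    history = [loss(w)]
    for _ in range(steps):
        w -= learning_rate * gradient(w)
        history.append(loss(w))
    return w, history


final_w, history = gradient_descent()
print(round(final_w, 4), round(history[0], 4), round(history[-1], 8))
```

The learning rate is the hyperparameter that sets the step size: too small and convergence is slow, too large (above 1.0 for this particular loss) and the updates overshoot and diverge.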

9.1 Network Architectures

Activation Functions

Function | Output Range | Primary Use
ReLU | [0, ∞) | Default for hidden layers; computationally efficient
Sigmoid | (0, 1) | Binary classification output layer
Softmax | (0, 1) per class; outputs sum to 1 | Multi-class classification output layer
Tanh | (−1, 1) | Recurrent networks
Leaky ReLU | (−∞, ∞) | Avoids dying ReLU problem in deep networks
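
Each activation function in the table is a short formula, sketched below with the standard library. The 0.01 negative slope for Leaky ReLU is a commonly used default, assumed here for illustration; tanh is available directly as `math.tanh`.

```python
import math


def relu(x):
    """Zero for negative inputs, identity for positive inputs."""
    return max(0.0, x)


def leaky_relu(x, slope=0.01):
    """Like ReLU, but negative inputs keep a small slope instead of zero."""
    return x if x > 0 else slope * x


def sigmoid(x):
    """Squash any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def softmax(xs):
    """Turn a list of scores into probabilities that sum to 1."""
    m = max(xs)                               # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])
print(relu(-2.0), leaky_relu(-2.0), sigmoid(0.0), round(math.tanh(1.0), 3))
print([round(p, 3) for p in probs])
```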

Neural Network Types

Architecture | Abbreviation | Best For
Feedforward Neural Network | FNN / MLP | Tabular data; general classification and regression
Convolutional Neural Network | CNN | Images and video; detects local spatial patterns
Long Short-Term Memory | LSTM | Sequential data; handles long-range dependencies
Transformer | (none) | Text, code, multi-modal; basis of all modern LLMs
Generative Adversarial Network | GAN | Image generation; synthetic data augmentation
Autoencoder | AE | Dimensionality reduction; anomaly detection; denoising

Key Concept: The Transformer architecture, introduced in "Attention Is All You Need" (2017), uses self-attention to process all tokens in parallel. It is the foundation of GPT, BERT, Claude, and every major modern language model.

Transfer Learning Steps

Step | Action
1 | Select a model pre-trained on a large general dataset (e.g., ResNet, BERT)
2 | Freeze most layers to preserve learned representations
3 | Replace and retrain the final layers on your specific, smaller dataset

10. AWS AI and ML Services

AWS organizes its AI/ML stack into three tiers based on the level of ML expertise required.

┌──────────────────────────────────────────────────────────────┐
│  Tier 1 — AI Services  (no ML expertise needed)              │
│  Rekognition · Comprehend · Polly · Transcribe · Translate   │
├──────────────────────────────────────────────────────────────┤
│  Tier 2 — ML Services  (build and train custom models)       │
│                    Amazon SageMaker                          │
├──────────────────────────────────────────────────────────────┤
│  Tier 3 — Infrastructure  (GPUs and custom chips)            │
│        EC2 GPU instances · AWS Trainium · AWS Inferentia     │
└──────────────────────────────────────────────────────────────┘

10.1 Amazon SageMaker

Amazon SageMaker is AWS's fully managed ML platform for building, training, and deploying models at scale.

Component | Purpose
SageMaker Studio | Web-based IDE for notebooks, experiments, and pipelines
SageMaker Autopilot | AutoML; automatically selects the algorithm and tunes the best model
SageMaker Canvas | No-code ML for business analysts; no programming required
SageMaker Data Wrangler | Visual, low-code data preparation and feature engineering
SageMaker Feature Store | Centralized, reusable repository for ML features
SageMaker Training Jobs | Managed distributed model training at scale
SageMaker Pipelines | CI/CD for ML; automates training and deployment workflows
SageMaker Model Registry | Version and approve models before deployment
SageMaker Clarify | Detect bias in data and models; generate SHAP explanations
SageMaker Model Monitor | Detect data drift and model degradation in production
SageMaker Ground Truth | Managed data labeling with human workforce

SageMaker Deployment Options

Deployment Type | Description | Best Use Case
Real-time Endpoint | Persistent, always-on low-latency endpoint | User-facing applications
Serverless Inference | Auto-scales to zero; pay per request | Intermittent or unpredictable traffic
Async Inference | Queues and processes large payloads asynchronously | Video processing, long documents
Batch Transform | Runs inference on a full dataset at once | Offline scoring, nightly reporting
Multi-Model Endpoint | Hosts many models on a single endpoint | Cost optimization with many models

10.2 AWS Pre-Built AI Services

Vision Services

Service | Key Capabilities
Amazon Rekognition | Object detection, facial recognition, content moderation, text in images
Amazon Textract | Extract text, key-value pairs, and tables from scanned documents
Amazon Lookout for Vision | Industrial defect detection using computer vision

Language Services

Service | Key Capabilities
Amazon Comprehend | Entity recognition, sentiment analysis, key phrases, topic modeling, PII detection
Amazon Comprehend Medical | Extract clinical entities and PHI from unstructured medical text
Amazon Translate | Neural machine translation across 75+ languages
Amazon Polly | Text-to-speech synthesis with natural-sounding voices
Amazon Transcribe | Speech-to-text with custom vocabulary and speaker identification
Amazon Lex | Conversational chatbots and voice assistants (same engine as Alexa)

Search, Forecasting, and Analytics Services

Service | Key Capabilities
Amazon Kendra | Intelligent enterprise search using NLP and ML relevance ranking
Amazon Personalize | Real-time personalization and recommendations
Amazon Forecast | ML-based time-series demand forecasting
Amazon Lookout for Metrics | Anomaly detection in business metrics
Amazon Fraud Detector | Build and deploy ML-based fraud detection models

11. Exam Tips and Quick Reference

Scenario-to-Answer Mapping

Scenario Keyword / Requirement | Correct Answer
"No ML expertise needed; build a model from business data" | SageMaker Canvas
"Automatically find the best model and hyperparameters" | SageMaker Autopilot
"Label large volumes of training data with human reviewers" | SageMaker Ground Truth
"Detect bias in training data before model training" | SageMaker Clarify (pre-training bias)
"Monitor a deployed model for input distribution changes" | SageMaker Model Monitor
"Classify images without training a custom model" | Amazon Rekognition
"Extract text and tables from PDF invoices" | Amazon Textract
"Analyze customer review sentiment at scale" | Amazon Comprehend
"Build a chatbot that understands user intent via voice or text" | Amazon Lex
"Predict future demand based on historical time-series data" | Amazon Forecast
"Detect anomalies in IoT sensor data" | SageMaker Random Cut Forest
"Prevent overfitting in a deep neural network" | Dropout, L2 regularization, early stopping
"Minimize missed fraud cases (false negatives)" | Optimize for Recall
"Minimize legitimate emails marked as spam (false positives)" | Optimize for Precision

Common Traps

  • Accuracy on imbalanced data: Accuracy looks good on imbalanced datasets but hides poor minority-class performance. The exam will present an imbalanced scenario and expect F1, Precision, or Recall — not accuracy.
  • Autopilot vs. Canvas: Autopilot automates model building for practitioners who want ML control. Canvas targets business analysts with no ML background using a no-code interface.
  • Parameters vs. hyperparameters: Parameters (weights, biases) are learned automatically. Hyperparameters (learning rate, depth) are set before training. The optimizer sets parameters; the data scientist sets hyperparameters.
  • Validation vs. test set: Validation data is used iteratively during development. Test data is used exactly once at the very end. Tuning on test data is data leakage.

Key Terms — Domain 1

Term | One-Line Definition
Epoch | One complete pass through the entire training dataset
Batch Size | Number of training samples processed before each weight update
Learning Rate | Step size the optimizer uses to update model weights
Gradient Descent | Iterative optimization algorithm that minimizes the loss function
Backpropagation | Algorithm that computes the gradient of loss w.r.t. each weight
Inference | Using a trained model to generate predictions on new, unseen data
Ground Truth | The correct, verified label for a training or evaluation sample
Data Drift | The distribution of input features changes after model deployment
Concept Drift | The relationship between inputs and the target variable changes over time
Overfitting | Model performs well on training data but poorly on unseen data
Underfitting | Model is too simple to capture the underlying patterns in the data
Transfer Learning | Reusing a model pre-trained on one task as the starting point for another
Ensemble | Combining multiple models to produce a stronger, more stable prediction

End of Domain 1. Continue to Domain 2: Fundamentals of Generative AI →
