AWS Certified AI Practitioner — Domain 1: Fundamentals of AI and Machine Learning
Exam Code: AIF-C01 | Level: Foundational
Domain Weight: 20% | Total Domains: 5 | Passing Score: 700/1000
Table of Contents
- AI, ML, and Deep Learning
- Types of Machine Learning
- The ML Lifecycle
- Key ML Concepts and Terminology
- Data Fundamentals for ML
- Feature Engineering
- Model Training and Evaluation
- ML Algorithms Overview
- Deep Learning and Neural Networks
- AWS AI and ML Services
- Exam Tips and Quick Reference
1. AI, ML, and Deep Learning
Artificial Intelligence, Machine Learning, and Deep Learning form a nested hierarchy where each is a subset of the one above it. Understanding where each term begins and ends is the foundation for the entire exam.
1.1 Definitions and Hierarchy
Artificial Intelligence (AI) is the simulation of human intelligence in machines programmed to think, reason, and make decisions. Machine Learning (ML) is a subset of AI where models learn patterns from data without being explicitly programmed with rules. Deep Learning (DL) is a subset of ML using multi-layered neural networks.
┌──────────────────────────────────────────┐
│  Artificial Intelligence                 │
│  ┌────────────────────────────────────┐  │
│  │  Machine Learning                  │  │
│  │  ┌──────────────────────────────┐  │  │
│  │  │  Deep Learning               │  │  │
│  │  │  ┌────────────────────────┐  │  │  │
│  │  │  │  Generative AI         │  │  │  │
│  │  │  └────────────────────────┘  │  │  │
│  │  └──────────────────────────────┘  │  │
│  └────────────────────────────────────┘  │
└──────────────────────────────────────────┘
| Term | Definition |
|---|---|
| Artificial Intelligence | Machines simulating human-like reasoning and decision-making. |
| Machine Learning | AI systems that learn from data without being given explicit rules. |
| Deep Learning | ML using neural networks with many layers to learn complex representations. |
| Natural Language Processing | AI for understanding and generating human language. |
| Computer Vision | AI for understanding images and video. |
Key Concept: All ML is AI, but not all AI is ML. All Deep Learning is ML, but not all ML is Deep Learning. Generative AI is a subset of Deep Learning.
1.2 Types of AI by Capability
| Type | Also Called | Description |
|---|---|---|
| Narrow AI | ANI | Designed for one specific task. Every production AI system today is narrow AI. |
| General AI | AGI | Hypothetical human-level general intelligence. Does not exist today. |
| Super AI | ASI | Hypothetical AI surpassing human intelligence. Theoretical only. |
Exam Tip: Every AWS AI service — Bedrock, SageMaker, Rekognition — is narrow AI. AGI and ASI are conceptual terms, not products.
2. Types of Machine Learning
ML is categorized by how models learn from data. The exam frequently gives a scenario and asks you to identify the correct learning paradigm.
2.1 Supervised Learning
In supervised learning, the model learns from labeled data — datasets where the correct answer is already known. The goal is to learn a mapping from inputs to outputs.
Classification
Predicts a discrete class or category. Two sub-types exist:
| Sub-type | Number of Classes | Example |
|---|---|---|
| Binary Classification | 2 | Spam vs. not spam |
| Multi-class Classification | 3 or more | Cat, dog, or bird |
Regression
Predicts a continuous numeric value. Examples include house price prediction and temperature forecasting.
2.2 Unsupervised Learning
The model finds patterns in unlabeled data. There is no correct answer to learn from — the model discovers structure on its own.
| Sub-type | Goal | Example Algorithms |
|---|---|---|
| Clustering | Group similar data points together | K-Means, DBSCAN |
| Dimensionality Reduction | Reduce the number of features while preserving information | PCA, t-SNE |
| Association | Find co-occurrence relationships between variables | Apriori, FP-Growth (used in market basket analysis) |
| Anomaly Detection | Identify data points that deviate from the norm | Isolation Forest |
2.3 Reinforcement Learning
An agent learns by interacting with an environment, receiving rewards for good actions and penalties for bad ones. It learns a policy — a strategy for maximizing cumulative reward over time.
| Component | Definition |
|---|---|
| Agent | The learner or decision-maker. |
| Environment | The world the agent interacts with. |
| Action | A choice the agent can make at each step. |
| State | The current situation the agent observes. |
| Reward | The feedback signal — positive or negative. |
| Policy | The learned strategy that maps states to actions. |
Exam Tip: If a scenario describes a system learning through trial, reward, and penalty — game playing, robotic control, autonomous driving — the answer is reinforcement learning.
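The components in the table above can be seen working together in a small tabular Q-learning sketch. Everything here is an illustrative assumption rather than part of the exam material: the five-cell corridor environment, the reward values, and the hyperparameters (`alpha`, `gamma`, `epsilon`) are hypothetical choices made to keep the example short.

```python
import random

# Hypothetical toy environment: a corridor of 5 cells (states 0..4).
# The agent starts in cell 0 and is rewarded for reaching cell 4.
N_STATES = 5
ACTIONS = [-1, +1]  # index 0 = move left, index 1 = move right


def step(state, action):
    """Environment: apply the action and return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else -0.01  # small penalty for every extra step
    return next_state, reward, done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning: estimate a value for every (state, action) pair."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Explore with probability epsilon, otherwise exploit the best known action.
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, ACTIONS[a])
            # Move the estimate toward reward + discounted best future value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


q_table = train()
# The policy maps each non-terminal state to its highest-valued action.
policy = ["left" if row[0] > row[1] else "right" for row in q_table[:-1]]
print(policy)
```

The agent is never told that moving right is correct. It infers that from the reward signal alone, which is the distinguishing feature of reinforcement learning.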
2.4 Semi-Supervised and Self-Supervised Learning
| Type | Data Used | How It Works |
|---|---|---|
| Semi-Supervised | Small labeled + large unlabeled | Uses unlabeled data to improve a model trained on limited labels. |
| Self-Supervised | Unlabeled only | The model generates its own labels from the data structure (e.g., predict the next word or a masked word). This is the foundation of LLMs such as GPT (next-word prediction) and BERT (masked-word prediction). |
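To make "the model generates its own labels" concrete, here is a minimal sketch of how raw text can be turned into next-word training pairs without any human labeling. The function name `next_word_pairs` and the whitespace tokenization are simplifying assumptions; real LLMs use subword tokenizers and far longer contexts.

```python
def next_word_pairs(text, context_size=3):
    """Turn raw text into (context, target) pairs with no human labeling."""
    tokens = text.lower().split()
    pairs = []
    for i in range(context_size, len(tokens)):
        context = tokens[i - context_size:i]  # the words the model sees
        target = tokens[i]                    # the word it must predict
        pairs.append((context, target))
    return pairs


corpus = "the model generates its own labels from the data"
pairs = next_word_pairs(corpus)
for context, target in pairs:
    print(context, "->", target)
```

Each pair is a labeled example, yet the "label" is simply the next word already present in the text.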
3. The ML Lifecycle
The ML lifecycle is an iterative, six-stage process from business problem definition to production monitoring. Projects routinely loop back through earlier stages when performance degrades or requirements change.
3.1 Pipeline Stages
┌──────────────┐    ┌──────────────┐    ┌──────────────┐
│  1. Define   │───►│  2. Collect  │───►│  3. Prepare  │
│     Problem  │    │     Data     │    │     Data     │
└──────────────┘    └──────────────┘    └──────────────┘
                                               │
┌──────────────┐    ┌──────────────┐    ┌──────▼───────┐
│  6. Monitor  │◄───│  5. Deploy   │◄───│  4. Train &  │
│   & Retrain  │    │     Model    │    │     Evaluate │
└──────────────┘    └──────────────┘    └──────────────┘
| Stage | Key Activities |
|---|---|
| 1. Define Problem | Identify business goal, ML problem type, and success metrics. |
| 2. Collect Data | Gather data from databases, APIs, IoT sensors, and web sources. |
| 3. Prepare Data | Perform EDA, cleaning, feature engineering, and data splitting. |
| 4. Train and Evaluate | Select algorithm, train model, tune hyperparameters, evaluate on validation set. |
| 5. Deploy | Package model as an endpoint; run A/B tests. |
| 6. Monitor and Retrain | Detect data drift and model drift; trigger retraining when performance degrades. |
Key Concept: The ML lifecycle never truly ends. Production models must be continuously monitored and periodically retrained as real-world data evolves.
4. Key ML Concepts and Terminology
4.1 Bias, Variance, and Regularization
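Bias is the systematic error introduced when a model's assumptions are too simple: it measures how far the model's average prediction is from the true values. Variance measures how much predictions fluctuate when the model is trained on different training datasets. Reducing one tends to increase the other, so every model must balance the two.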
| Condition | Bias | Variance | Problem | Fix |
|---|---|---|---|---|
| Underfitting | High | Low | Model too simple; misses patterns | Add features; increase model complexity |
| Overfitting | Low | High | Model memorizes noise; fails on new data | Regularization; more data; simpler model |
| Good Fit | Low | Low | Generalizes well to unseen data | — |
Regularization Techniques
| Technique | Mechanism | Effect |
|---|---|---|
| L1 (Lasso) | Adds the sum of absolute weight values to the loss | Drives some weights to zero; performs feature selection |
| L2 (Ridge) | Adds the sum of squared weights to the loss | Shrinks all weights; no feature elimination |
| Dropout | Randomly deactivates neurons during training | Prevents co-adaptation in deep networks |
| Early Stopping | Halts training when validation loss stops improving | Avoids over-training without modifying the architecture |
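The L1 and L2 penalty terms and the early-stopping rule from the table can be sketched in a few lines. The function names, the regularization strength `lam`, and the `patience` value are hypothetical choices for illustration; frameworks expose these under their own names.

```python
def l1_penalty(weights, lam):
    """Lasso term added to the loss: lam * sum of absolute weights."""
    return lam * sum(abs(w) for w in weights)


def l2_penalty(weights, lam):
    """Ridge term added to the loss: lam * sum of squared weights."""
    return lam * sum(w * w for w in weights)


def early_stopping_epoch(val_losses, patience=2):
    """Return the epoch index at which training would halt.

    Training stops once validation loss has failed to improve
    for `patience` consecutive epochs.
    """
    best, waited = float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, waited = loss, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch
    return len(val_losses) - 1  # never triggered: ran all epochs


weights = [0.5, -2.0, 0.0, 1.5]
l1 = l1_penalty(weights, lam=0.1)
l2 = l2_penalty(weights, lam=0.1)
stop_at = early_stopping_epoch([0.90, 0.70, 0.65, 0.66, 0.68, 0.64])
print(l1, l2, stop_at)
```

Note how the L2 term is dominated by the largest weight (−2.0 contributes 4.0 of the 6.5 total before scaling), which is why Ridge shrinks large weights hardest.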
4.2 Hyperparameters vs. Parameters
Key Concept: Parameters are learned automatically by the optimizer during training. Hyperparameters are set by the practitioner before training begins and control how learning happens.
| Property | Parameters | Hyperparameters |
|---|---|---|
| Who sets them | Optimizer (automatic) | Data scientist (manual) |
| Examples | Weights, biases | Learning rate, number of layers, batch size |
| When determined | During training | Before training begins |
Hyperparameter Tuning Methods
| Method | Description | Speed |
|---|---|---|
| Grid Search | Exhaustively tries every combination | Slow |
| Random Search | Randomly samples combinations | Medium |
| Bayesian Optimization | Uses prior results to guide the next trial | Fast |
| SageMaker Automatic Model Tuning | Managed tuning service; supports Bayesian optimization, random search, grid search, and Hyperband | Fast |
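The cost difference between grid search and random search comes down to how many trials each one runs. The sketch below enumerates both over a hypothetical search space; the hyperparameter names and values are illustrative assumptions, and no model is actually trained.

```python
import itertools
import random

# Hypothetical search space; names and values are illustrative only.
search_space = {
    "learning_rate": [0.001, 0.01, 0.1],
    "batch_size": [32, 64, 128],
    "num_layers": [2, 4],
}


def grid_search(space):
    """Yield every combination in the search space."""
    names = list(space)
    for values in itertools.product(*(space[n] for n in names)):
        yield dict(zip(names, values))


def random_search(space, n_trials, seed=0):
    """Yield n_trials randomly sampled combinations."""
    rng = random.Random(seed)
    for _ in range(n_trials):
        yield {name: rng.choice(options) for name, options in space.items()}


grid_trials = list(grid_search(search_space))
random_trials = list(random_search(search_space, n_trials=5))
print(len(grid_trials), "grid trials vs", len(random_trials), "random trials")
```

Grid search needs 3 × 3 × 2 = 18 training runs here, and the count multiplies with every added hyperparameter. Random search runs only as many trials as the budget allows.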
5. Data Fundamentals for ML
5.1 Data Types and Quality
| Data Type | Description | Examples |
|---|---|---|
| Structured | Fixed rows and columns with a defined schema | CSV files, relational databases |
| Unstructured | No predefined format or schema | Images, audio, video, raw text |
| Semi-structured | Partial structure without a rigid schema | JSON, XML, application log files |
| Time Series | Sequential observations indexed by time | Stock prices, IoT sensor readings |
Data Quality Dimensions
| Dimension | Definition |
|---|---|
| Accuracy | Data correctly reflects real-world values. |
| Completeness | No required fields are missing. |
| Consistency | Data is uniform and coherent across all sources. |
| Timeliness | Data is current and not stale. |
| Uniqueness | No duplicate records exist. |
| Validity | Data conforms to defined formats, types, and ranges. |
Train / Validation / Test Split
| Split | Purpose | Typical Proportion |
|---|---|---|
| Training | Fit model parameters | 60–80% |
| Validation | Tune hyperparameters; monitor for overfitting | 10–20% |
| Test | Final, one-time unbiased evaluation | 10–20% |
Exam Tip: Test data must remain completely unseen during training and tuning. Using test data to make any development decision invalidates the evaluation and constitutes data leakage.
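A three-way split can be sketched as follows. The function name and the 70/15/15 proportions are assumptions chosen from within the typical ranges in the table above.

```python
import random


def train_val_test_split(rows, val_frac=0.15, test_frac=0.15, seed=42):
    """Shuffle once, then cut the data into three non-overlapping parts."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = round(len(rows) * test_frac)
    n_val = round(len(rows) * val_frac)
    test = rows[:n_test]
    val = rows[n_test:n_test + n_val]
    train = rows[n_test + n_val:]
    return train, val, test


train, val, test = train_val_test_split(range(100))
print(len(train), len(val), len(test))
```

Because the three slices never overlap, no test row can influence training or tuning. For time-series data a random shuffle is usually inappropriate; a chronological cut is the common alternative.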
5.2 Handling Missing Data and Class Imbalance
Missing Data Strategies
| Strategy | When to Use |
|---|---|
| Deletion | Missing at random; sufficient data remains after removal |
| Mean / Median / Mode Imputation | Numerical features with a low rate of missingness |
| Forward / Backward Fill | Time-series data where adjacent values are informative |
| Predictive Imputation | High-value features with complex missing patterns |
| Indicator Variable | When the fact of missingness itself carries signal |
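Three of the strategies above are simple enough to sketch directly. The function names are hypothetical and missing values are represented as `None`, which is an assumption of this example rather than a convention of any particular library.

```python
from statistics import mean, median


def impute_constant(values, strategy="mean"):
    """Replace None with the mean or median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed) if strategy == "mean" else median(observed)
    return [fill if v is None else v for v in values]


def forward_fill(values):
    """Replace None with the most recent observed value.

    A leading None stays None because there is nothing to carry forward.
    """
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def missing_indicator(values):
    """Flag which entries were missing so a model can use that signal."""
    return [1 if v is None else 0 for v in values]


readings = [10.0, None, 14.0, None, 18.0]
print(impute_constant(readings))
print(forward_fill(readings))
print(missing_indicator(readings))
```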
Class Imbalance Techniques
| Technique | Description |
|---|---|
| SMOTE | Synthesize new minority-class samples |
| Undersampling | Randomly remove majority-class samples |
| Class Weights | Penalize minority-class misclassification more heavily during training |
| Correct Evaluation | Use F1, Precision, or Recall — not accuracy — for imbalanced datasets |
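As an illustration of class weights, the sketch below uses one common heuristic, weighting each class inversely to its frequency. The formula `n_samples / (n_classes * class_count)` and the 95/5 fraud dataset are assumptions for this example; other weighting schemes exist.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical imbalanced dataset: 95 legitimate transactions, 5 fraudulent.
labels = ["legit"] * 95 + ["fraud"] * 5
weights = balanced_class_weights(labels)
print(weights)
```

With these weights, misclassifying one fraud example costs the model as much as misclassifying 19 legitimate ones, which offsets the 19:1 imbalance.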
6. Feature Engineering
Feature engineering is the process of creating, transforming, or selecting input variables to improve model performance. It is often the highest-leverage activity in an ML project.
6.1 Feature Types and Transformations
Numerical Transformations
| Transformation | Description | When to Use |
|---|---|---|
| Min-Max Scaling | Scales values to the range [0, 1] | Algorithms sensitive to feature magnitude (KNN, SVM) |
| Standardization (Z-score) | Produces mean = 0 and standard deviation = 1 | Most algorithms; removes scale effects |
| Log Transform | Compresses right-skewed distributions | Highly skewed numerical features |
| Binning | Converts a continuous value into a discrete bucket | Capturing non-linear threshold effects |
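The first three transformations can be written out in a few lines. This is a minimal sketch with hypothetical function names; it assumes the feature is not constant (otherwise the divisions fail) and, for the log transform, that values are non-negative.

```python
import math
from statistics import mean, pstdev


def min_max_scale(values):
    """Rescale to [0, 1]: (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Z-score: (x - mean) / standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def log_transform(values):
    """log(1 + x) compresses large values; assumes non-negative inputs."""
    return [math.log1p(v) for v in values]


# Hypothetical right-skewed feature: one income is far larger than the rest.
incomes = [20_000, 35_000, 50_000, 65_000, 500_000]
print(min_max_scale(incomes))
print(standardize(incomes))
print(log_transform(incomes))
```

After min-max scaling, the single outlier squeezes the other four values into a narrow band near 0. That is a typical signal that a log transform should be applied first.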
Categorical Transformations
| Transformation | Description | When to Use |
|---|---|---|
| One-Hot Encoding | Creates a binary column per category | Nominal categories with low cardinality |
| Label Encoding | Assigns an integer to each category | Ordinal categories, or nominal categories fed to tree-based models |
| Target Encoding | Replaces each category with the mean of the target variable | High-cardinality categoricals |
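The three categorical encodings can be compared side by side on a toy column. The function names and the data are hypothetical. The target-encoding sketch computes means over the full column for brevity; in practice the means are usually computed on training data only, to avoid leaking the target.

```python
def label_encode(values):
    """Map each category to an integer (alphabetical order here)."""
    mapping = {cat: i for i, cat in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping


def one_hot_encode(values):
    """Create one binary column per category."""
    categories = sorted(set(values))
    rows = [[1 if v == cat else 0 for cat in categories] for v in values]
    return rows, categories


def target_encode(values, targets):
    """Replace each category with the mean target seen for that category."""
    sums, counts = {}, {}
    for v, t in zip(values, targets):
        sums[v] = sums.get(v, 0) + t
        counts[v] = counts.get(v, 0) + 1
    return [sums[v] / counts[v] for v in values]


colors = ["red", "green", "blue", "green"]
purchased = [1, 0, 1, 1]  # hypothetical binary target

codes, mapping = label_encode(colors)
one_hot, categories = one_hot_encode(colors)
encoded = target_encode(colors, purchased)
print(codes, one_hot, encoded)
```

Label encoding implies an order (blue < green < red) that does not exist for colors, which is why it is a poor fit for nominal features in linear models.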
Text Feature Methods
| Method | Description |
|---|---|
| Bag of Words | Counts the occurrences of each word per document |
| TF-IDF | Weights each word by its frequency in a document, discounted by how common the word is across the corpus |
| Word Embeddings | Dense vector representations trained on large corpora (Word2Vec, GloVe) |
| BERT Embeddings | Contextual representations produced by a transformer model |
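Bag of Words and TF-IDF are closely related, as the sketch below shows: the per-document word counts are the bag-of-words representation, and TF-IDF rescales them. The variant used here (`tf = count / document length`, `idf = log(N / document frequency)`) is one of several common formulations and is an assumption of this example.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {word: weight} dict per document."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each word appear?
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)  # the bag-of-words representation
        weights.append({
            word: (count / len(tokens)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return weights


docs = ["the cat sat", "the dog barked", "the cat purred"]
rows = tf_idf(docs)
for row in rows:
    print(row)
```

"the" appears in every document, so its weight is zero. "sat" appears in only one document and therefore outweighs "cat", which appears in two.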
7. Model Training and Evaluation
7.1 Evaluation Metrics
Classification Metrics
The confusion matrix is the foundation for all classification metrics.
| | Predicted Positive | Predicted Negative |
|---|---|---|
| Actual Positive | TP | FN |
| Actual Negative | FP | TN |
| Metric | Formula | Optimize When |
|---|---|---|
| Accuracy | (TP + TN) / All | Classes are balanced |
| Precision | TP / (TP + FP) | False positives are costly (spam filter) |
| Recall (Sensitivity) | TP / (TP + FN) | False negatives are costly (cancer detection) |
| F1 Score | 2 × (P × R) / (P + R) | Classes are imbalanced; both P and R matter |
| AUC-ROC | Area under the ROC curve | Evaluating overall classifier quality |
Exam Tip: For fraud detection and medical diagnosis, prioritize Recall — a missed positive (false negative) is the most costly error. For spam filtering, prioritize Precision — a false positive blocks legitimate email.
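The formulas above can be checked with a worked example. The confusion-matrix counts below describe a hypothetical fraud model evaluated on 1,000 transactions, 10 of which are fraudulent; the function name is likewise an assumption.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical results: the model catches only 2 of the 10 fraud cases.
metrics = classification_metrics(tp=2, fp=1, fn=8, tn=989)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Accuracy is 99.1% even though the model misses 8 of 10 fraud cases (recall 0.2). This is why accuracy is misleading on imbalanced data.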
Regression Metrics
| Metric | Description | Sensitive to Outliers |
|---|---|---|
| MAE | Mean absolute difference between predicted and actual values | Less (errors are not squared) |
| MSE | Mean squared difference; amplifies large errors | Yes |
| RMSE | Square root of MSE; same unit as the target variable | Yes |
| R² | Proportion of target variance explained by the model; typically 0–1, but negative when the model fits worse than predicting the mean | Moderate |
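A small worked example shows how one large miss affects each regression metric differently. The data and the function name are hypothetical.

```python
import math


def regression_metrics(actual, predicted):
    """Compute MAE, MSE, RMSE, and R² for paired actual/predicted values."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)                    # unexplained variation
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)   # total variation
    r2 = 1 - ss_res / ss_tot
    return {"mae": mae, "mse": mse, "rmse": rmse, "r2": r2}


actual = [100, 200, 300, 400]
predicted = [110, 190, 310, 300]  # the last prediction is a large miss
scores = regression_metrics(actual, predicted)
print(scores)
```

Three of the four errors are 10 and one is 100. MAE comes out at 32.5, while RMSE is about 50.7 because squaring lets the single large error dominate.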
8. ML Algorithms Overview
8.1 Supervised Algorithms
| Algorithm | Type | Key Characteristics |
|---|---|---|
| Linear Regression | Regression | Fits a line or hyperplane; highly interpretable |
| Logistic Regression | Classification | Uses sigmoid function; outputs a probability |
| Decision Tree | Both | Interpretable path; prone to overfitting without pruning |
| Random Forest | Both | Ensemble of trees using bagging; reduces variance |
| XGBoost / LightGBM | Both | Sequential boosting; top performer on tabular data |
| K-Nearest Neighbors | Both | Predicts from the K nearest training points: majority vote for classification, average for regression |
| Support Vector Machine | Both | Finds the optimal separating hyperplane with maximum margin |
| Naive Bayes | Classification | Probabilistic; assumes feature independence; very fast |
SageMaker Built-in Algorithms
| Algorithm | Problem Type | Primary Use Case |
|---|---|---|
| XGBoost | Supervised | General-purpose tabular data; most popular built-in |
| Linear Learner | Supervised | Classification and regression on large datasets |
| K-Means | Unsupervised | Clustering |
| PCA | Unsupervised | Dimensionality reduction |
| Random Cut Forest | Anomaly Detection | Time-series anomaly detection |
| DeepAR | Forecasting | Time-series forecasting with deep learning |
| BlazingText | NLP | Word embeddings and text classification |
| Object Detection | Computer Vision | Detect and locate objects in images |
| Image Classification | Computer Vision | Assign images to categories |
| Factorization Machines | Supervised | Recommendation and click-through prediction |
8.2 Unsupervised Algorithms
| Algorithm | Type | Key Characteristics |
|---|---|---|
| K-Means | Clustering | Iteratively assigns points to K centroids; requires K upfront |
| DBSCAN | Clustering | Density-based; finds arbitrary shapes; no K required |
| Hierarchical Clustering | Clustering | Builds a tree (dendrogram) of clusters; no K required |
| PCA | Dimensionality Reduction | Projects data onto orthogonal axes of maximum variance |
| t-SNE | Dimensionality Reduction | Non-linear; excellent for 2D visualization of high-dimensional data |
| Isolation Forest | Anomaly Detection | Isolates anomalies using random feature splits; efficient at scale |
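The K-Means row above ("iteratively assigns points to K centroids") can be illustrated with a minimal one-dimensional sketch. Using the first `k` points as initial centroids and a fixed iteration count are simplifying assumptions; practical implementations use smarter initialization and a convergence check.

```python
def k_means_1d(points, k, iterations=10):
    """K-Means on 1-D data; initial centroids are the first k points."""
    centroids = list(points[:k])
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


data = [1.0, 1.2, 0.8, 9.0, 9.5, 10.0]
centroids, clusters = k_means_1d(data, k=2)
print(centroids)
print(clusters)
```

No labels are supplied. The two groups emerge purely from the distances between points, and `k` has to be chosen up front.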
9. Deep Learning and Neural Networks
A neural network learns by adjusting internal weights and biases to minimize a loss function via backpropagation and gradient descent.
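The same loop (predict, measure loss, compute gradients, update weights) can be shown on the simplest possible model, a single weight and bias fitted to a line. This is a sketch under stated assumptions: the data, learning rate, and epoch count are hypothetical, and the gradients are written out by hand, which is what backpropagation automates for multi-layer networks.

```python
def fit_line(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit y = w*x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        preds = [w * x + b for x in xs]
        # Gradients of the MSE loss with respect to w and b.
        grad_w = sum(2 * (p - y) * x for p, y, x in zip(preds, ys, xs)) / n
        grad_b = sum(2 * (p - y) for p, y in zip(preds, ys)) / n
        # Step against the gradient; the learning rate sets the step size.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # generated from y = 2x + 1
w, b = fit_line(xs, ys)
print(round(w, 4), round(b, 4))
```

Here `w` and `b` are parameters (learned), while `learning_rate` and `epochs` are hyperparameters (chosen before training).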
9.1 Network Architectures
Activation Functions
| Function | Output Range | Primary Use |
|---|---|---|
| ReLU | [0, ∞) | Default for hidden layers; computationally efficient |
| Sigmoid | (0, 1) | Binary classification output layer |
| Softmax | (0, 1) per class; outputs sum to 1 | Multi-class classification output layer |
| Tanh | (−1, 1) | Recurrent networks |
| Leaky ReLU | (−∞, ∞) | Avoids dying ReLU problem in deep networks |
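Each activation function in the table is a short formula. The sketch below implements them on plain Python floats. The `slope=0.01` default for Leaky ReLU is a commonly used value, assumed here for illustration.

```python
import math


def relu(x):
    """max(0, x): passes positives through, zeroes out negatives."""
    return max(0.0, x)


def leaky_relu(x, slope=0.01):
    """Like ReLU, but negatives keep a small slope instead of becoming 0."""
    return x if x > 0 else slope * x


def sigmoid(x):
    """Squashes any real number into (0, 1)."""
    return 1 / (1 + math.exp(-x))


def tanh(x):
    """Squashes any real number into (-1, 1)."""
    return math.tanh(x)


def softmax(xs):
    """Turns a list of scores into probabilities that sum to 1."""
    m = max(xs)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


print(relu(-2.0), leaky_relu(-2.0), sigmoid(0.0), tanh(0.0))
probs = softmax([1.0, 2.0, 3.0])
print(probs, sum(probs))
```

Sigmoid yields one probability for a binary decision. Softmax yields one probability per class, and the largest input score always receives the largest probability.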
Neural Network Types
| Architecture | Abbreviation | Best For |
|---|---|---|
| Feedforward Neural Network | FNN / MLP | Tabular data; general classification and regression |
| Convolutional Neural Network | CNN | Images and video; detects local spatial patterns |
| Long Short-Term Memory | LSTM | Sequential data; handles long-range dependencies |
| Transformer | — | Text, code, multi-modal; basis of nearly all modern LLMs |
| Generative Adversarial Network | GAN | Image generation; synthetic data augmentation |
| Autoencoder | AE | Dimensionality reduction; anomaly detection; denoising |
Key Concept: The Transformer architecture, introduced in "Attention Is All You Need" (2017), uses self-attention to process all tokens in parallel. It is the foundation of GPT, BERT, Claude, and every major modern language model.
Transfer Learning Steps
| Step | Action |
|---|---|
| 1 | Select a model pre-trained on a large general dataset (e.g., ResNet, BERT) |
| 2 | Freeze most layers to preserve learned representations |
| 3 | Replace and retrain the final layers on your specific, smaller dataset |
10. AWS AI and ML Services
AWS organizes its AI/ML stack into three tiers based on the level of ML expertise required.
┌──────────────────────────────────────────────────────────────┐
│ Tier 1 — AI Services (no ML expertise needed) │
│ Rekognition · Comprehend · Polly · Transcribe · Translate │
├──────────────────────────────────────────────────────────────┤
│ Tier 2 — ML Services (build and train custom models) │
│ Amazon SageMaker │
├──────────────────────────────────────────────────────────────┤
│ Tier 3 — Infrastructure (GPUs and custom chips) │
│ EC2 GPU instances · AWS Trainium · AWS Inferentia │
└──────────────────────────────────────────────────────────────┘
10.1 Amazon SageMaker
Amazon SageMaker is AWS's fully managed ML platform for building, training, and deploying models at scale.
| Component | Purpose |
|---|---|
| SageMaker Studio | Web-based IDE for notebooks, experiments, and pipelines |
| SageMaker Autopilot | AutoML; automatically selects the algorithm and tunes the best model |
| SageMaker Canvas | No-code ML for business analysts; no programming required |
| SageMaker Data Wrangler | Visual, low-code data preparation and feature engineering |
| SageMaker Feature Store | Centralized, reusable repository for ML features |
| SageMaker Training Jobs | Managed distributed model training at scale |
| SageMaker Pipelines | CI/CD for ML; automates training and deployment workflows |
| SageMaker Model Registry | Version and approve models before deployment |
| SageMaker Clarify | Detect bias in data and models; generate SHAP explanations |
| SageMaker Model Monitor | Detect data drift and model degradation in production |
| SageMaker Ground Truth | Managed data labeling with human workforce |
SageMaker Deployment Options
| Deployment Type | Description | Best Use Case |
|---|---|---|
| Real-time Endpoint | Persistent, always-on low-latency endpoint | User-facing applications |
| Serverless Inference | Auto-scales to zero; pay per request | Intermittent or unpredictable traffic |
| Async Inference | Queues and processes large payloads asynchronously | Video processing, long documents |
| Batch Transform | Runs inference on a full dataset at once | Offline scoring, nightly reporting |
| Multi-Model Endpoint | Hosts many models on a single endpoint | Cost optimization with many models |
10.2 AWS Pre-Built AI Services
Vision Services
| Service | Key Capabilities |
|---|---|
| Amazon Rekognition | Object detection, facial recognition, content moderation, text in images |
| Amazon Textract | Extract text, key-value pairs, and tables from scanned documents |
| Amazon Lookout for Vision | Industrial defect detection using computer vision |
Language Services
| Service | Key Capabilities |
|---|---|
| Amazon Comprehend | Entity recognition, sentiment analysis, key phrases, topic modeling, PII detection |
| Amazon Comprehend Medical | Extract clinical entities and PHI from unstructured medical text |
| Amazon Translate | Neural machine translation across 75+ languages |
| Amazon Polly | Text-to-speech synthesis with natural-sounding voices |
| Amazon Transcribe | Speech-to-text with custom vocabulary and speaker identification |
| Amazon Lex | Conversational chatbots and voice assistants (built on the same technology as Alexa) |
Search, Forecasting, and Analytics Services
| Service | Key Capabilities |
|---|---|
| Amazon Kendra | Intelligent enterprise search using NLP and ML relevance ranking |
| Amazon Personalize | Real-time personalization and recommendations |
| Amazon Forecast | ML-based time-series demand forecasting |
| Amazon Lookout for Metrics | Anomaly detection in business metrics |
| Amazon Fraud Detector | Build and deploy ML-based fraud detection models |
11. Exam Tips and Quick Reference
Scenario-to-Answer Mapping
| Scenario Keyword / Requirement | Correct Answer |
|---|---|
| "No ML expertise needed; build a model from business data" | SageMaker Canvas |
| "Automatically find the best model and hyperparameters" | SageMaker Autopilot |
| "Label large volumes of training data with human reviewers" | SageMaker Ground Truth |
| "Detect bias in training data before model training" | SageMaker Clarify (pre-training bias) |
| "Monitor a deployed model for input distribution changes" | SageMaker Model Monitor |
| "Classify images without training a custom model" | Amazon Rekognition |
| "Extract text and tables from PDF invoices" | Amazon Textract |
| "Analyze customer review sentiment at scale" | Amazon Comprehend |
| "Build a chatbot that understands user intent via voice or text" | Amazon Lex |
| "Predict future demand based on historical time-series data" | Amazon Forecast |
| "Detect anomalies in IoT sensor data" | SageMaker Random Cut Forest |
| "Prevent overfitting in a deep neural network" | Dropout, L2 regularization, early stopping |
| "Minimize missed fraud cases (false negatives)" | Optimize for Recall |
| "Minimize legitimate emails marked as spam (false positives)" | Optimize for Precision |
Common Traps
- Accuracy on imbalanced data: Accuracy looks good on imbalanced datasets but hides poor minority-class performance. The exam will present an imbalanced scenario and expect F1, Precision, or Recall — not accuracy.
- Autopilot vs. Canvas: Autopilot automates model building for practitioners who want ML control. Canvas targets business analysts with no ML background using a no-code interface.
- Parameters vs. hyperparameters: Parameters (weights, biases) are learned automatically. Hyperparameters (learning rate, depth) are set before training. The optimizer sets parameters; the data scientist sets hyperparameters.
- Validation vs. test set: Validation data is used iteratively during development. Test data is used exactly once at the very end. Tuning on test data is data leakage.
Key Terms — Domain 1
| Term | One-Line Definition |
|---|---|
| Epoch | One complete pass through the entire training dataset |
| Batch Size | Number of training samples processed before each weight update |
| Learning Rate | Step size the optimizer uses to update model weights |
| Gradient Descent | Iterative optimization algorithm that minimizes the loss function |
| Backpropagation | Algorithm that computes the gradient of loss w.r.t. each weight |
| Inference | Using a trained model to generate predictions on new, unseen data |
| Ground Truth | The correct, verified label for a training or evaluation sample |
| Data Drift | The distribution of input features changes after model deployment |
| Concept Drift | The relationship between inputs and the target variable changes over time |
| Overfitting | Model performs well on training data but poorly on unseen data |
| Underfitting | Model is too simple to capture the underlying patterns in the data |
| Transfer Learning | Reusing a model pre-trained on one task as the starting point for another |
| Ensemble | Combining multiple models to produce a stronger, more stable prediction |
End of Domain 1. Continue to Domain 2: Fundamentals of Generative AI →