AWSAIF-C01

Domain 2: Fundamentals of GenAI

Topic 2 of 5 · Study notes

AWS Certified AI Practitioner — Domain 2: Fundamentals of Generative AI

Exam Code: AIF-C01 | Level: Foundational
Domain Weight: 24% | Total Domains: 5 | Passing Score: 700/1000

What is Generative AI?
- 1.1 Traditional AI vs. Generative AI
- 1.2 What Generative AI Can Produce
Foundation Models
Large Language Models
- 3.1 How LLMs Are Pre-Trained
- 3.2 LLM Capabilities
The Transformer Architecture
Tokens and Context Windows
- 5.1 Tokenization
- 5.2 Context Window
Embeddings and Vector Databases
- 6.1 What are Embeddings?
- 6.2 Vector Databases
Prompt Engineering
- 7.1 Prompting Techniques
- 7.2 Inference Parameters
Model Customization Techniques
- 8.1 Technique Comparison
- 8.2 RAG vs. Fine-Tuning Decision Guide
Types of Generative AI Models
Amazon Bedrock
- 10.1 Core Features
- 10.2 Bedrock Security and Pricing
Amazon Q
Hallucination and Output Quality
Exam Tips and Quick Reference

1. What is Generative AI?

Generative AI is a subset of artificial intelligence that creates new content — text, images, audio, video, code, or synthetic data — by learning patterns from vast training datasets. Unlike traditional ML, which classifies or predicts, generative AI produces novel outputs.

1.1 Traditional AI vs. Generative AI

Aspect	Traditional ML / AI	Generative AI
Primary task	Classify, predict, detect	Create, generate, converse
Output type	Label, number, decision	Text, image, code, audio
Training data	Task-specific labeled dataset	Massive general-purpose corpus
Flexibility	Narrow, one task	Broad, many tasks
Example	Spam filter, fraud detection	Claude, DALL-E, Amazon Titan

1.2 What Generative AI Can Produce

Modality	Examples
Text	Articles, summaries, code, conversations, translations
Images	Photorealistic images, product visuals, artwork
Audio	Music, speech synthesis, sound effects
Video	Video generation, animation
Code	Write, debug, explain, and refactor source code
Synthetic Data	Training data for other ML models
Multi-modal	Combined text and image understanding or generation

2. Foundation Models

2.1 Definition and Key Characteristics

A Foundation Model (FM) is a large AI model trained on broad, internet-scale datasets using self-supervision, designed to be adapted across a wide range of downstream tasks. The term was introduced by Stanford HAI in 2021.

Characteristic	Description
Scale	Billions to trillions of parameters
Pre-training	Trained on massive, general-purpose datasets
Emergent abilities	Capabilities not explicitly trained for appear at scale
Adaptability	Can be fine-tuned or prompted for specific applications
Few-shot learning	Learns new tasks from only a few examples

Emergent Abilities

Capabilities that appear in large models but are absent in smaller versions include multi-step reasoning, code generation, language translation, in-context (few-shot) learning, and chain-of-thought reasoning.

2.2 Foundation Models vs. Traditional ML Models

Property	Traditional ML Model	Foundation Model
Training data	Task-specific, labeled	Broad, often unlabeled, massive scale
Training cost	Thousands to millions USD	Tens to hundreds of millions USD
Flexibility	One task	Many tasks
Customization	Retrain from scratch	Fine-tune or prompt
Access model	Build your own	API or download weights

2.3 Models Available on Amazon Bedrock

Provider	Model Family	Key Strength
Anthropic	Claude Haiku, Sonnet, Opus	Long context, safety, complex reasoning
Amazon	Titan Text, Titan Embeddings, Titan Image	Native AWS; data not shared for training
Meta	Llama 2, Llama 3	Open weights; strong at code
Mistral AI	Mistral, Mixtral	Efficient; multilingual
AI21 Labs	Jamba	Long context; summarization
Cohere	Command, Embed	Enterprise text generation and embeddings
Stability AI	Stable Diffusion XL	High-quality image generation

Exam Tip: Amazon Titan models are AWS-native. Customer data submitted to Bedrock is never used to train the underlying foundation models — this is a critical security guarantee.

3. Large Language Models

Large Language Models (LLMs) are foundation models specifically pre-trained on massive text corpora to understand and generate human language.

"Large" — billions to trillions of parameters
"Language" — trained primarily on text
"Model" — a statistical representation of language

3.1 How LLMs Are Pre-Trained

LLMs are trained using self-supervised learning on internet-scale text without manual labels.

Pre-training Approach	Style	Example Models
Next token prediction	Predict the next word given all previous words	GPT, Claude, Llama
Masked language modeling	Predict randomly masked words in a sentence	BERT, RoBERTa
RLHF	Fine-tune with human preference rankings to improve helpfulness	ChatGPT, Claude

RLHF — Reinforcement Learning from Human Feedback

1. Pre-train LLM on massive text corpus
         ↓
2. Supervised Fine-Tuning (SFT) on high-quality examples
         ↓
3. Train a Reward Model from human preference rankings
         ↓
4. Optimize LLM with RL (PPO) to maximize reward
         ↓
   Aligned, helpful, and safer model

3.2 LLM Capabilities

Capability	Description
Text generation	Write articles, emails, and creative content
Summarization	Condense long documents into key points
Question answering	Answer questions based on provided context
Translation	Convert text across languages
Code generation	Write, debug, and explain source code
Classification	Categorize text without task-specific training
Information extraction	Pull structured data from unstructured text
Reasoning	Perform multi-step logical problem solving

4. The Transformer Architecture

The Transformer, introduced in "Attention Is All You Need" (Vaswani et al., Google, 2017), is the architecture underlying all modern LLMs. Its key innovation is the self-attention mechanism, which allows every token to attend to every other token in parallel.

4.1 Core Mechanism

Self-Attention

For each token, self-attention computes how much weight to assign to every other token when producing its representation. This creates Query (Q), Key (K), and Value (V) vectors.

Attention(Q, K, V) = softmax(QKᵀ / √d) × V

Multi-Head Attention

Multiple attention heads run in parallel, each learning different types of relationships — syntactic, semantic, co-reference, and positional.

Positional Encoding

Because transformers process all tokens simultaneously (not sequentially), positional encodings are added to each token embedding to convey order information.

4.2 Encoder vs. Decoder vs. Encoder-Decoder

Architecture	Representative Models	Best For
Encoder-only	BERT, RoBERTa	Text understanding — classification, NER, embeddings
Decoder-only	GPT-4, Claude, Llama	Text generation, chat, code
Encoder-Decoder	T5, BART	Sequence-to-sequence — translation, summarization

Key Concept: Most modern chat and code models use the decoder-only architecture. They generate text one token at a time, each new token conditioned on all previous tokens.

4.3 Text Generation and Decoding Strategies

LLMs generate text one token at a time, left to right. At each step the model outputs a probability distribution over the entire vocabulary.

Strategy	Description	Best For
Greedy	Always selects the highest-probability token	Fast; often repetitive
Beam Search	Maintains K candidate sequences simultaneously	Structured, deterministic output
Top-K Sampling	Samples randomly from the K most probable tokens	Creative, varied text
Top-P (Nucleus)	Samples from tokens summing to probability P	Creative with coherence
Temperature Scaling	Scales the logits before sampling	Applied on top of any sampling strategy

5. Tokens and Context Windows

5.1 Tokenization

A token is the basic unit of text that an LLM processes. Tokenization splits raw text into tokens before the model sees it.

Approximation	Value
Characters per token (English)	~4
Words per token	~0.75
Tokens per page of text	~750
Tokens per 100 words	~133

Common Tokenization Methods

Method	Description	Used By
Byte-Pair Encoding (BPE)	Iteratively merges the most frequent character pairs	GPT, Llama
WordPiece	Similar to BPE but uses likelihood instead of frequency	BERT
SentencePiece	Language-agnostic subword tokenization	T5, multilingual models

Key Concept: Token count directly controls both context window usage and API cost. Input tokens and output tokens are each priced separately, with output tokens typically costing more.

5.2 Context Window

The context window is the maximum number of tokens — input plus output — the model can process in a single request.

Model Family	Context Window
Older models	2K–4K tokens
GPT-4 (standard)	8K–128K tokens
Claude 3	Up to 200K tokens
Gemini 1.5 Pro	1M+ tokens

6. Embeddings and Vector Databases

6.1 What are Embeddings?

Embeddings are dense numerical vectors that encode the semantic meaning of text (or other data). Semantically similar content produces vectors that are close together in the embedding space.

"king"   → [ 0.21, -0.40,  0.80,  0.11, ... ]
"queen"  → [ 0.20, -0.39,  0.79,  0.18, ... ]
"apple"  → [-0.32,  0.61, -0.10,  0.92, ... ]

Key Concept: Embeddings enable semantic search — finding documents by meaning rather than exact keyword match. They are the foundation of Retrieval Augmented Generation (RAG).

AWS Embedding Models (via Bedrock)

Model	Provider	Capability
Amazon Titan Embeddings	Amazon	Text embeddings for English and multilingual use
Cohere Embed	Cohere	High-quality multilingual embeddings

6.2 Vector Databases

Vector databases store embeddings and support fast Approximate Nearest Neighbor (ANN) search to find the most similar vectors to a query.

Database	Notes
Amazon OpenSearch (vector engine)	AWS-native; recommended for Bedrock Knowledge Bases
Amazon Aurora PostgreSQL (pgvector)	Relational DB with vector extension
Amazon RDS PostgreSQL (pgvector)	Managed relational DB with vector support
Pinecone	Purpose-built, fully managed vector database
Redis Enterprise	In-memory vector search
MongoDB Atlas	Document database with vector search

7. Prompt Engineering

Prompt engineering is the practice of designing and optimizing input text to get the best possible output from a generative AI model — without modifying model weights.

7.1 Prompting Techniques

Zero-Shot Prompting

Ask the model to perform a task with no examples provided.

Classify the sentiment of this review: "The shipping was late but the product is great."

Few-Shot Prompting

Provide labeled examples before the actual request to guide the model.

"I love this!" → Positive
"Terrible experience." → Negative
"It's okay." → Neutral
"This product changed my life!" →

Chain-of-Thought (CoT) Prompting

Instruct the model to reason step by step before answering. Dramatically improves performance on reasoning and math tasks.

Q: A store has 5 apples. 2 are sold and 3 more are added. How many remain?
A: Let me think step by step.
   Start: 5 apples
   After selling 2: 5 − 2 = 3 apples
   After adding 3: 3 + 3 = 6 apples
   Answer: 6

Other Key Techniques

Technique	Description
Role / Persona Prompting	Assign the model a role: "You are a senior AWS architect…"
System Prompt	Persistent instructions defining behavior, constraints, and persona
RAG Prompting	Inject retrieved document chunks into the prompt as context
ReAct	Model alternates between reasoning and calling external tools
Self-Consistency	Generate multiple CoT outputs and take the majority-vote answer

7.2 Inference Parameters

Parameter	Description	Effect
Temperature	Scales the probability distribution before sampling	0 = deterministic; >1 = more creative and varied
Top-P	Sample from the smallest set of tokens summing to probability P	Lower = more focused
Top-K	Sample from only the K most probable tokens	Lower = more conservative
Max Tokens	Maximum number of tokens in the generated response	Limits output length and cost
Stop Sequences	Strings that immediately halt generation	Controls format and length

Exam Tip: Temperature = 0 produces deterministic, consistent output — ideal for factual Q&A and classification. Temperature > 0 introduces randomness — ideal for creative writing and brainstorming.

8. Model Customization Techniques

When zero-shot and few-shot prompting are insufficient, the model itself can be customized. Techniques vary in cost, complexity, and whether they modify model weights.

8.1 Technique Comparison

Technique	Modifies Weights	Cost	Data Required	Best When
Prompt Engineering	No	Inference only	None	Always start here
RAG	No	Inference + retrieval	Documents / data source	Need current or proprietary information
Fine-Tuning	Yes	Training compute	Labeled examples	Need specific style, format, or domain vocabulary
Continued Pre-Training	Yes	Very high	Large unlabeled corpus	Entirely new domain with unique terminology

Fine-Tuning Methods

Method	Description	Use Case
Full Fine-Tuning	Update all model parameters	Best quality; highest cost
LoRA	Train small low-rank matrices; freeze most weights	Efficient; near-full-fine-tuning quality
QLoRA	LoRA combined with model quantization	Most memory-efficient fine-tuning
Instruction Fine-Tuning	Train on instruction-response pairs	Improve instruction-following behavior
RLHF	Train using human preference rankings	Align model with human values

8.2 RAG vs. Fine-Tuning Decision Guide

Scenario	Recommended Approach
Need up-to-date or real-time information	RAG
Need access to private company documents	RAG
Need to update knowledge frequently	RAG
Need a specific writing style or tone	Fine-Tuning
Need domain-specific vocabulary or terminology	Fine-Tuning
Have very limited labeled training data	RAG

9. Types of Generative AI Models

Model Type	Architecture	How It Works	AWS Example
Autoregressive LLM	Decoder-only transformer	Predicts next token left to right	Claude, Titan Text
Masked LM	Encoder-only transformer	Predicts masked tokens in context	BERT (embeddings, not generation)
Seq2Seq	Full encoder-decoder	Maps an input sequence to an output sequence	Translation, summarization
Diffusion Model	Iterative denoising	Learns to reverse a noise-addition process	Titan Image Generator, Stable Diffusion
GAN	Generator + Discriminator	Two networks compete until generator produces realistic output	Image synthesis
VAE	Encoder + Probabilistic Decoder	Encodes input to a latent distribution; samples to generate	Generative modeling
Multi-modal Model	Combined architectures	Processes and generates across text and images	Claude 3 Vision, Amazon Nova

Key Concept: Diffusion models are the dominant architecture for image generation. They work by learning to reverse a noise-addition process — starting from pure noise and iteratively denoising toward a coherent image conditioned on a text prompt.

10. Amazon Bedrock

Amazon Bedrock is a fully managed service that provides access to high-performing foundation models from multiple providers through a single, unified API — without requiring infrastructure management.

10.1 Core Features

Knowledge Bases (Managed RAG)

Bedrock Knowledge Bases provide a fully managed RAG pipeline.

Data Sources (S3, SharePoint, Confluence, web)
       ↓  Automatic chunking + embedding
Vector Store (OpenSearch Serverless, Aurora pgvector, Pinecone)
       ↓  At inference time
User Query → Embed → Retrieve top-K chunks → Inject into prompt → FM → Response

Supported Vector Store	Type
Amazon OpenSearch Serverless	Recommended; serverless
Amazon Aurora PostgreSQL	Relational with pgvector
Amazon RDS PostgreSQL	Managed relational with pgvector
Pinecone	Third-party, purpose-built
Redis Enterprise	Third-party, in-memory
MongoDB Atlas	Third-party, document DB

Agents

Bedrock Agents enable autonomous, multi-step task execution by orchestrating reasoning, knowledge base queries, and API calls via Lambda functions.

Agent Component	Description
Foundation Model	The reasoning engine for the agent
Instructions	System prompt defining the agent's role and constraints
Action Groups	APIs the agent can call, defined by an OpenAPI schema and backed by Lambda
Knowledge Bases	Document collections the agent can query

Guardrails

Bedrock Guardrails provide customizable safety controls applied at both input and output.

Guardrail Feature	Description
Content Filters	Block harmful content (hate, insults, sexual, violence, misconduct, prompt attacks)
Denied Topics	Block specific topics defined in natural language
Word Filters	Block exact words or phrases; includes a built-in profanity list
PII Redaction	Detect and mask personally identifiable information
Grounding Check	Detect hallucinations by comparing responses to retrieved source material

Model Evaluation

Evaluation Type	Description
Automatic Evaluation	Built-in metrics — accuracy, robustness, toxicity
Human Evaluation	Route responses to human reviewers; compare models side by side

Additional Bedrock Features

Feature	Description
Fine-Tuning	Further train select models (Titan, Llama, Cohere) on your data stored in S3
Continued Pre-Training	Pre-train select Titan models on unlabeled domain-specific data
Provisioned Throughput	Reserve dedicated model capacity for consistent latency; billed hourly
Prompt Caching	Cache static prompt prefixes to reduce cost and latency for repeated prefixes

10.2 Bedrock Security and Pricing

Dimension	Detail
Data privacy	Customer prompts and responses are not used to train AWS base models
Encryption	TLS for data in transit; SSE-KMS for fine-tuning data at rest
Access control	IAM policies at the model ARN level
Network	VPC interface endpoints via AWS PrivateLink available
Audit	All API calls logged in AWS CloudTrail
On-demand pricing	Billed per input token + output token; output tokens typically cost more
Provisioned pricing	Fixed hourly rate; 1-month or 6-month commitment

11. Amazon Q

Amazon Q is AWS's family of generative AI–powered assistants tailored to specific contexts.

Product	Target User	Key Capabilities
Amazon Q Business	Enterprise employees	Connect to 40+ data sources; respects existing access controls; custom plugins
Amazon Q Developer	Developers	Inline code suggestions, generation, security scanning, code transformation, CLI completions
Amazon Q in QuickSight	Business analysts	Natural language queries for BI dashboards: "Show top 10 products by revenue"
Amazon Q in AWS Console	AWS users	In-console assistant for service questions, troubleshooting, and environment queries

Exam Tip: Amazon Q Developer replaced Amazon CodeWhisperer. It adds security scanning, code transformation (e.g., upgrade Java 8 → 17), and CLI completions beyond simple code suggestions.

12. Hallucination and Output Quality

Hallucination occurs when an LLM confidently generates plausible but factually incorrect information.

Hallucination Types

Type	Description
Factual Hallucination	Incorrect facts stated with confidence
Source Hallucination	Cites non-existent papers, links, or documents
Logical Hallucination	Internally inconsistent reasoning

Mitigation Strategies

Strategy	How It Helps
RAG	Grounds responses in retrieved, verifiable facts
Grounding Check (Bedrock Guardrails)	Detects and blocks responses not supported by retrieved context
Lower Temperature	Reduces randomness; model stays closer to likely facts
Require Citations	Prompt the model to cite sources for all factual claims
Human Review	Mandatory for high-stakes or irreversible decisions

LLM Evaluation Metrics

Metric	Description	Used For
ROUGE	Recall-Oriented Understudy for Gisting Evaluation	Summarization quality
BLEU	Bilingual Evaluation Understudy	Translation quality
Perplexity	Measure of model uncertainty over a text sequence	Language modeling quality
LLM-as-Judge	Use a powerful LLM to score outputs from another model	Scalable automated evaluation

Exam Tips & Quick Reference

Scenario-to-Answer Mapping

Scenario Keyword / Requirement	Correct Answer
"Access multiple foundation models via a single API"	Amazon Bedrock
"Ground LLM responses in proprietary company documents"	Bedrock Knowledge Bases (RAG)
"LLM must call external APIs and complete multi-step tasks"	Bedrock Agents
"Filter harmful content from model inputs and outputs"	Bedrock Guardrails
"Customize model behavior without changing model weights"	Prompt engineering
"Customize model for specific domain vocabulary and style"	Fine-tuning via Bedrock
"Responses are confidently wrong"	Hallucination; mitigate with RAG and grounding check
"AI coding assistant that also scans for security vulnerabilities"	Amazon Q Developer
"Enterprise chatbot connected to SharePoint and Confluence"	Amazon Q Business
"Consistent, deterministic responses for factual Q&A"	Temperature = 0
"More creative, varied responses for brainstorming"	Higher temperature (0.7–1.0)
"Reduce cost on repeated prompts with long system prompts"	Bedrock prompt caching
"Reserve model capacity for consistent latency SLA"	Bedrock Provisioned Throughput

Common Traps

RAG vs. Fine-Tuning: RAG is for grounding in current or proprietary data and requires no training. Fine-tuning is for teaching a specific style, tone, or vocabulary and requires training data and compute. The exam frequently conflates them.
Temperature = 0 is not random: Temperature = 0 makes generation deterministic — the model always picks the highest-probability token. It does not mean the model refuses to answer.
Tokens ≠ Words: Tokens are subword units. A single English word may be one or more tokens. API costs are billed per token, not per word.
Amazon Q Developer ≠ Amazon Q Business: Q Developer is for writing code; Q Business is for enterprise knowledge management. They are different products with different pricing.
Bedrock does not train on your data: Customer prompts and responses submitted to Bedrock are never used to train or improve AWS base models — a fundamental privacy guarantee.

Key Terms — Domain 2

Term	One-Line Definition
Foundation Model	A large model pre-trained on broad data and adaptable to many tasks
LLM	A language-specific foundation model trained on massive text corpora
Token	The basic unit of text a model processes (~4 characters in English)
Context Window	The maximum number of tokens a model can process in one request
Hallucination	A model generating confident but factually incorrect information
Embedding	A dense numerical vector encoding the semantic meaning of text
Vector Database	A database optimized for similarity search over embedding vectors
RAG	Grounding LLM responses by retrieving relevant context from a knowledge base
Temperature	A parameter controlling the randomness of token sampling
RLHF	Training technique that uses human preference rankings to align a model
Fine-Tuning	Continuing to train a foundation model on domain-specific data
Guardrails	Configurable filters applied to model inputs and outputs for safety
Agent	An LLM-powered system that plans and executes multi-step tasks using tools
Zero-Shot	Prompting a model to perform a task with no examples provided
Few-Shot	Providing a small number of labeled examples in the prompt
Chain-of-Thought	Prompting the model to reason step by step before answering

End of Domain 2. Continue to Domain 3: Applications of Foundation Models →

Domain 1: Fundamentals of AI and ML

Domain 3: Applications of Foundation Models

Ready to test yourself?

Practice questions for this topic

Start Practicing →