Courses/AIF-C01/Domain 2: Fundamentals of GenAI
Practice questions →
AWSAIF-C01

Domain 2: Fundamentals of GenAI

Topic 2 of 5 · Study notes

AWS Certified AI Practitioner — Domain 2: Fundamentals of Generative AI

Exam Code: AIF-C01  |  Level: Foundational
Domain Weight: 24%  |  Total Domains: 5  |  Passing Score: 700/1000


Table of Contents

  1. What is Generative AI?
  2. Foundation Models
  3. Large Language Models
  4. The Transformer Architecture
  5. Tokens and Context Windows
  6. Embeddings and Vector Databases
  7. Prompt Engineering
  8. Model Customization Techniques
  9. Types of Generative AI Models
  10. Amazon Bedrock
  11. Amazon Q
  12. Hallucination and Output Quality
  13. Exam Tips and Quick Reference

1. What is Generative AI?

Generative AI is a subset of artificial intelligence that creates new content — text, images, audio, video, code, or synthetic data — by learning patterns from vast training datasets. Unlike traditional ML, which classifies or predicts, generative AI produces novel outputs.

1.1 Traditional AI vs. Generative AI

Aspect Traditional ML / AI Generative AI
Primary task Classify, predict, detect Create, generate, converse
Output type Label, number, decision Text, image, code, audio
Training data Task-specific labeled dataset Massive general-purpose corpus
Flexibility Narrow, one task Broad, many tasks
Example Spam filter, fraud detection Claude, DALL-E, Amazon Titan

1.2 What Generative AI Can Produce

Modality Examples
Text Articles, summaries, code, conversations, translations
Images Photorealistic images, product visuals, artwork
Audio Music, speech synthesis, sound effects
Video Video generation, animation
Code Write, debug, explain, and refactor source code
Synthetic Data Training data for other ML models
Multi-modal Combined text and image understanding or generation

2. Foundation Models

2.1 Definition and Key Characteristics

A Foundation Model (FM) is a large AI model trained on broad, internet-scale datasets using self-supervision, designed to be adapted across a wide range of downstream tasks. The term was introduced by Stanford HAI in 2021.

Characteristic Description
Scale Billions to trillions of parameters
Pre-training Trained on massive, general-purpose datasets
Emergent abilities Capabilities not explicitly trained for appear at scale
Adaptability Can be fine-tuned or prompted for specific applications
Few-shot learning Learns new tasks from only a few examples

Emergent Abilities

Capabilities that appear in large models but are absent in smaller versions include multi-step reasoning, code generation, language translation, in-context (few-shot) learning, and chain-of-thought reasoning.

2.2 Foundation Models vs. Traditional ML Models

Property Traditional ML Model Foundation Model
Training data Task-specific, labeled Broad, often unlabeled, massive scale
Training cost Thousands to millions USD Tens to hundreds of millions USD
Flexibility One task Many tasks
Customization Retrain from scratch Fine-tune or prompt
Access model Build your own API or download weights

2.3 Models Available on Amazon Bedrock

Provider Model Family Key Strength
Anthropic Claude Haiku, Sonnet, Opus Long context, safety, complex reasoning
Amazon Titan Text, Titan Embeddings, Titan Image Native AWS; data not shared for training
Meta Llama 2, Llama 3 Open weights; strong at code
Mistral AI Mistral, Mixtral Efficient; multilingual
AI21 Labs Jamba Long context; summarization
Cohere Command, Embed Enterprise text generation and embeddings
Stability AI Stable Diffusion XL High-quality image generation

Exam Tip: Amazon Titan models are AWS-native. Customer data submitted to Bedrock is never used to train the underlying foundation models — this is a critical security guarantee.


3. Large Language Models

Large Language Models (LLMs) are foundation models specifically pre-trained on massive text corpora to understand and generate human language.

  • "Large" — billions to trillions of parameters
  • "Language" — trained primarily on text
  • "Model" — a statistical representation of language

3.1 How LLMs Are Pre-Trained

LLMs are trained using self-supervised learning on internet-scale text without manual labels.

Pre-training Approach Style Example Models
Next token prediction Predict the next word given all previous words GPT, Claude, Llama
Masked language modeling Predict randomly masked words in a sentence BERT, RoBERTa
RLHF Fine-tune with human preference rankings to improve helpfulness ChatGPT, Claude

RLHF — Reinforcement Learning from Human Feedback

1. Pre-train LLM on massive text corpus
         ↓
2. Supervised Fine-Tuning (SFT) on high-quality examples
         ↓
3. Train a Reward Model from human preference rankings
         ↓
4. Optimize LLM with RL (PPO) to maximize reward
         ↓
   Aligned, helpful, and safer model

3.2 LLM Capabilities

Capability Description
Text generation Write articles, emails, and creative content
Summarization Condense long documents into key points
Question answering Answer questions based on provided context
Translation Convert text across languages
Code generation Write, debug, and explain source code
Classification Categorize text without task-specific training
Information extraction Pull structured data from unstructured text
Reasoning Perform multi-step logical problem solving

4. The Transformer Architecture

The Transformer, introduced in "Attention Is All You Need" (Vaswani et al., Google, 2017), is the architecture underlying all modern LLMs. Its key innovation is the self-attention mechanism, which allows every token to attend to every other token in parallel.

4.1 Core Mechanism

Self-Attention

For each token, self-attention computes how much weight to assign to every other token when producing its representation. This creates Query (Q), Key (K), and Value (V) vectors.

Attention(Q, K, V) = softmax(QKᵀ / √d) × V

Multi-Head Attention

Multiple attention heads run in parallel, each learning different types of relationships — syntactic, semantic, co-reference, and positional.

Positional Encoding

Because transformers process all tokens simultaneously (not sequentially), positional encodings are added to each token embedding to convey order information.

4.2 Encoder vs. Decoder vs. Encoder-Decoder

Architecture Representative Models Best For
Encoder-only BERT, RoBERTa Text understanding — classification, NER, embeddings
Decoder-only GPT-4, Claude, Llama Text generation, chat, code
Encoder-Decoder T5, BART Sequence-to-sequence — translation, summarization

Key Concept: Most modern chat and code models use the decoder-only architecture. They generate text one token at a time, each new token conditioned on all previous tokens.

4.3 Text Generation and Decoding Strategies

LLMs generate text one token at a time, left to right. At each step the model outputs a probability distribution over the entire vocabulary.

Strategy Description Best For
Greedy Always selects the highest-probability token Fast; often repetitive
Beam Search Maintains K candidate sequences simultaneously Structured, deterministic output
Top-K Sampling Samples randomly from the K most probable tokens Creative, varied text
Top-P (Nucleus) Samples from tokens summing to probability P Creative with coherence
Temperature Scaling Scales the logits before sampling Applied on top of any sampling strategy

5. Tokens and Context Windows

5.1 Tokenization

A token is the basic unit of text that an LLM processes. Tokenization splits raw text into tokens before the model sees it.

Approximation Value
Characters per token (English) ~4
Words per token ~0.75
Tokens per page of text ~750
Tokens per 100 words ~133

Common Tokenization Methods

Method Description Used By
Byte-Pair Encoding (BPE) Iteratively merges the most frequent character pairs GPT, Llama
WordPiece Similar to BPE but uses likelihood instead of frequency BERT
SentencePiece Language-agnostic subword tokenization T5, multilingual models

Key Concept: Token count directly controls both context window usage and API cost. Input tokens and output tokens are each priced separately, with output tokens typically costing more.

5.2 Context Window

The context window is the maximum number of tokens — input plus output — the model can process in a single request.

Model Family Context Window
Older models 2K–4K tokens
GPT-4 (standard) 8K–128K tokens
Claude 3 Up to 200K tokens
Gemini 1.5 Pro 1M+ tokens

6. Embeddings and Vector Databases

6.1 What are Embeddings?

Embeddings are dense numerical vectors that encode the semantic meaning of text (or other data). Semantically similar content produces vectors that are close together in the embedding space.

"king"   → [ 0.21, -0.40,  0.80,  0.11, ... ]
"queen"  → [ 0.20, -0.39,  0.79,  0.18, ... ]
"apple"  → [-0.32,  0.61, -0.10,  0.92, ... ]

Key Concept: Embeddings enable semantic search — finding documents by meaning rather than exact keyword match. They are the foundation of Retrieval Augmented Generation (RAG).

AWS Embedding Models (via Bedrock)

Model Provider Capability
Amazon Titan Embeddings Amazon Text embeddings for English and multilingual use
Cohere Embed Cohere High-quality multilingual embeddings

6.2 Vector Databases

Vector databases store embeddings and support fast Approximate Nearest Neighbor (ANN) search to find the most similar vectors to a query.

Database Notes
Amazon OpenSearch (vector engine) AWS-native; recommended for Bedrock Knowledge Bases
Amazon Aurora PostgreSQL (pgvector) Relational DB with vector extension
Amazon RDS PostgreSQL (pgvector) Managed relational DB with vector support
Pinecone Purpose-built, fully managed vector database
Redis Enterprise In-memory vector search
MongoDB Atlas Document database with vector search

7. Prompt Engineering

Prompt engineering is the practice of designing and optimizing input text to get the best possible output from a generative AI model — without modifying model weights.

7.1 Prompting Techniques

Zero-Shot Prompting

Ask the model to perform a task with no examples provided.

Classify the sentiment of this review: "The shipping was late but the product is great."

Few-Shot Prompting

Provide labeled examples before the actual request to guide the model.

"I love this!" → Positive
"Terrible experience." → Negative
"It's okay." → Neutral
"This product changed my life!" →

Chain-of-Thought (CoT) Prompting

Instruct the model to reason step by step before answering. Dramatically improves performance on reasoning and math tasks.

Q: A store has 5 apples. 2 are sold and 3 more are added. How many remain?
A: Let me think step by step.
   Start: 5 apples
   After selling 2: 5 − 2 = 3 apples
   After adding 3: 3 + 3 = 6 apples
   Answer: 6

Other Key Techniques

Technique Description
Role / Persona Prompting Assign the model a role: "You are a senior AWS architect…"
System Prompt Persistent instructions defining behavior, constraints, and persona
RAG Prompting Inject retrieved document chunks into the prompt as context
ReAct Model alternates between reasoning and calling external tools
Self-Consistency Generate multiple CoT outputs and take the majority-vote answer

7.2 Inference Parameters

Parameter Description Effect
Temperature Scales the probability distribution before sampling 0 = deterministic; >1 = more creative and varied
Top-P Sample from the smallest set of tokens summing to probability P Lower = more focused
Top-K Sample from only the K most probable tokens Lower = more conservative
Max Tokens Maximum number of tokens in the generated response Limits output length and cost
Stop Sequences Strings that immediately halt generation Controls format and length

Exam Tip: Temperature = 0 produces deterministic, consistent output — ideal for factual Q&A and classification. Temperature > 0 introduces randomness — ideal for creative writing and brainstorming.


8. Model Customization Techniques

When zero-shot and few-shot prompting are insufficient, the model itself can be customized. Techniques vary in cost, complexity, and whether they modify model weights.

8.1 Technique Comparison

Technique Modifies Weights Cost Data Required Best When
Prompt Engineering No Inference only None Always start here
RAG No Inference + retrieval Documents / data source Need current or proprietary information
Fine-Tuning Yes Training compute Labeled examples Need specific style, format, or domain vocabulary
Continued Pre-Training Yes Very high Large unlabeled corpus Entirely new domain with unique terminology

Fine-Tuning Methods

Method Description Use Case
Full Fine-Tuning Update all model parameters Best quality; highest cost
LoRA Train small low-rank matrices; freeze most weights Efficient; near-full-fine-tuning quality
QLoRA LoRA combined with model quantization Most memory-efficient fine-tuning
Instruction Fine-Tuning Train on instruction-response pairs Improve instruction-following behavior
RLHF Train using human preference rankings Align model with human values

8.2 RAG vs. Fine-Tuning Decision Guide

Scenario Recommended Approach
Need up-to-date or real-time information RAG
Need access to private company documents RAG
Need to update knowledge frequently RAG
Need a specific writing style or tone Fine-Tuning
Need domain-specific vocabulary or terminology Fine-Tuning
Have very limited labeled training data RAG

9. Types of Generative AI Models

Model Type Architecture How It Works AWS Example
Autoregressive LLM Decoder-only transformer Predicts next token left to right Claude, Titan Text
Masked LM Encoder-only transformer Predicts masked tokens in context BERT (embeddings, not generation)
Seq2Seq Full encoder-decoder Maps an input sequence to an output sequence Translation, summarization
Diffusion Model Iterative denoising Learns to reverse a noise-addition process Titan Image Generator, Stable Diffusion
GAN Generator + Discriminator Two networks compete until generator produces realistic output Image synthesis
VAE Encoder + Probabilistic Decoder Encodes input to a latent distribution; samples to generate Generative modeling
Multi-modal Model Combined architectures Processes and generates across text and images Claude 3 Vision, Amazon Nova

Key Concept: Diffusion models are the dominant architecture for image generation. They work by learning to reverse a noise-addition process — starting from pure noise and iteratively denoising toward a coherent image conditioned on a text prompt.


10. Amazon Bedrock

Amazon Bedrock is a fully managed service that provides access to high-performing foundation models from multiple providers through a single, unified API — without requiring infrastructure management.

10.1 Core Features

Knowledge Bases (Managed RAG)

Bedrock Knowledge Bases provide a fully managed RAG pipeline.

Data Sources (S3, SharePoint, Confluence, web)
       ↓  Automatic chunking + embedding
Vector Store (OpenSearch Serverless, Aurora pgvector, Pinecone)
       ↓  At inference time
User Query → Embed → Retrieve top-K chunks → Inject into prompt → FM → Response
Supported Vector Store Type
Amazon OpenSearch Serverless Recommended; serverless
Amazon Aurora PostgreSQL Relational with pgvector
Amazon RDS PostgreSQL Managed relational with pgvector
Pinecone Third-party, purpose-built
Redis Enterprise Third-party, in-memory
MongoDB Atlas Third-party, document DB

Agents

Bedrock Agents enable autonomous, multi-step task execution by orchestrating reasoning, knowledge base queries, and API calls via Lambda functions.

Agent Component Description
Foundation Model The reasoning engine for the agent
Instructions System prompt defining the agent's role and constraints
Action Groups APIs the agent can call, defined by an OpenAPI schema and backed by Lambda
Knowledge Bases Document collections the agent can query

Guardrails

Bedrock Guardrails provide customizable safety controls applied at both input and output.

Guardrail Feature Description
Content Filters Block harmful content (hate, insults, sexual, violence, misconduct, prompt attacks)
Denied Topics Block specific topics defined in natural language
Word Filters Block exact words or phrases; includes a built-in profanity list
PII Redaction Detect and mask personally identifiable information
Grounding Check Detect hallucinations by comparing responses to retrieved source material

Model Evaluation

Evaluation Type Description
Automatic Evaluation Built-in metrics — accuracy, robustness, toxicity
Human Evaluation Route responses to human reviewers; compare models side by side

Additional Bedrock Features

Feature Description
Fine-Tuning Further train select models (Titan, Llama, Cohere) on your data stored in S3
Continued Pre-Training Pre-train select Titan models on unlabeled domain-specific data
Provisioned Throughput Reserve dedicated model capacity for consistent latency; billed hourly
Prompt Caching Cache static prompt prefixes to reduce cost and latency for repeated prefixes

10.2 Bedrock Security and Pricing

Dimension Detail
Data privacy Customer prompts and responses are not used to train AWS base models
Encryption TLS for data in transit; SSE-KMS for fine-tuning data at rest
Access control IAM policies at the model ARN level
Network VPC interface endpoints via AWS PrivateLink available
Audit All API calls logged in AWS CloudTrail
On-demand pricing Billed per input token + output token; output tokens typically cost more
Provisioned pricing Fixed hourly rate; 1-month or 6-month commitment

11. Amazon Q

Amazon Q is AWS's family of generative AI–powered assistants tailored to specific contexts.

Product Target User Key Capabilities
Amazon Q Business Enterprise employees Connect to 40+ data sources; respects existing access controls; custom plugins
Amazon Q Developer Developers Inline code suggestions, generation, security scanning, code transformation, CLI completions
Amazon Q in QuickSight Business analysts Natural language queries for BI dashboards: "Show top 10 products by revenue"
Amazon Q in AWS Console AWS users In-console assistant for service questions, troubleshooting, and environment queries

Exam Tip: Amazon Q Developer replaced Amazon CodeWhisperer. It adds security scanning, code transformation (e.g., upgrade Java 8 → 17), and CLI completions beyond simple code suggestions.


12. Hallucination and Output Quality

Hallucination occurs when an LLM confidently generates plausible but factually incorrect information.

Hallucination Types

Type Description
Factual Hallucination Incorrect facts stated with confidence
Source Hallucination Cites non-existent papers, links, or documents
Logical Hallucination Internally inconsistent reasoning

Mitigation Strategies

Strategy How It Helps
RAG Grounds responses in retrieved, verifiable facts
Grounding Check (Bedrock Guardrails) Detects and blocks responses not supported by retrieved context
Lower Temperature Reduces randomness; model stays closer to likely facts
Require Citations Prompt the model to cite sources for all factual claims
Human Review Mandatory for high-stakes or irreversible decisions

LLM Evaluation Metrics

Metric Description Used For
ROUGE Recall-Oriented Understudy for Gisting Evaluation Summarization quality
BLEU Bilingual Evaluation Understudy Translation quality
Perplexity Measure of model uncertainty over a text sequence Language modeling quality
LLM-as-Judge Use a powerful LLM to score outputs from another model Scalable automated evaluation

Exam Tips & Quick Reference

Scenario-to-Answer Mapping

Scenario Keyword / Requirement Correct Answer
"Access multiple foundation models via a single API" Amazon Bedrock
"Ground LLM responses in proprietary company documents" Bedrock Knowledge Bases (RAG)
"LLM must call external APIs and complete multi-step tasks" Bedrock Agents
"Filter harmful content from model inputs and outputs" Bedrock Guardrails
"Customize model behavior without changing model weights" Prompt engineering
"Customize model for specific domain vocabulary and style" Fine-tuning via Bedrock
"Responses are confidently wrong" Hallucination; mitigate with RAG and grounding check
"AI coding assistant that also scans for security vulnerabilities" Amazon Q Developer
"Enterprise chatbot connected to SharePoint and Confluence" Amazon Q Business
"Consistent, deterministic responses for factual Q&A" Temperature = 0
"More creative, varied responses for brainstorming" Higher temperature (0.7–1.0)
"Reduce cost on repeated prompts with long system prompts" Bedrock prompt caching
"Reserve model capacity for consistent latency SLA" Bedrock Provisioned Throughput

Common Traps

  • RAG vs. Fine-Tuning: RAG is for grounding in current or proprietary data and requires no training. Fine-tuning is for teaching a specific style, tone, or vocabulary and requires training data and compute. The exam frequently conflates them.
  • Temperature = 0 is not random: Temperature = 0 makes generation deterministic — the model always picks the highest-probability token. It does not mean the model refuses to answer.
  • Tokens ≠ Words: Tokens are subword units. A single English word may be one or more tokens. API costs are billed per token, not per word.
  • Amazon Q Developer ≠ Amazon Q Business: Q Developer is for writing code; Q Business is for enterprise knowledge management. They are different products with different pricing.
  • Bedrock does not train on your data: Customer prompts and responses submitted to Bedrock are never used to train or improve AWS base models — a fundamental privacy guarantee.

Key Terms — Domain 2

Term One-Line Definition
Foundation Model A large model pre-trained on broad data and adaptable to many tasks
LLM A language-specific foundation model trained on massive text corpora
Token The basic unit of text a model processes (~4 characters in English)
Context Window The maximum number of tokens a model can process in one request
Hallucination A model generating confident but factually incorrect information
Embedding A dense numerical vector encoding the semantic meaning of text
Vector Database A database optimized for similarity search over embedding vectors
RAG Grounding LLM responses by retrieving relevant context from a knowledge base
Temperature A parameter controlling the randomness of token sampling
RLHF Training technique that uses human preference rankings to align a model
Fine-Tuning Continuing to train a foundation model on domain-specific data
Guardrails Configurable filters applied to model inputs and outputs for safety
Agent An LLM-powered system that plans and executes multi-step tasks using tools
Zero-Shot Prompting a model to perform a task with no examples provided
Few-Shot Providing a small number of labeled examples in the prompt
Chain-of-Thought Prompting the model to reason step by step before answering

End of Domain 2. Continue to Domain 3: Applications of Foundation Models →

Ready to test yourself?

Practice questions for this topic

Start Practicing →

AIF-C01 Topics

Topic 2 of 5