AWSAIF-C01

Domain 3: Applications of Foundation Models

Topic 3 of 5 · Study notes

AWS Certified AI Practitioner — Domain 3: Applications of Foundation Models

Exam Code: AIF-C01 | Level: Foundational
Domain Weight: 28% | Total Domains: 5 | Passing Score: 700/1000

Foundation Model Application Patterns
- 1.1 Development Levels
- 1.2 Application Selection Criteria
Retrieval Augmented Generation
AI Agents
- 3.1 Agent Loop and Architecture Patterns
- 3.2 Amazon Bedrock Agents
Text-Based Applications
- 4.1 Summarization
- 4.2 Classification and Information Extraction
Code Generation Applications
Image and Vision Applications
- 6.1 Image Generation
- 6.2 Image Understanding
Conversational AI and Chatbots
- 7.1 Chatbot Types and Memory
Search and Information Retrieval
Document Processing and Analysis
Building GenAI Applications on AWS
- 10.1 AWS Services for GenAI Applications
- 10.2 AWS Architecture Patterns
Model Selection and Evaluation
- 11.1 Choosing the Right Model
- 11.2 Performance and Cost Optimization
Exam Tips and Quick Reference

1. Foundation Model Application Patterns

Foundation models enable a new application development paradigm — natural language replaces explicit logic, and a single model can power many different features.

1.1 Development Levels

The depth of customization and expertise required increases with each level.

Level	Approach	Example
1	Use pre-built AI APIs with no ML knowledge	Amazon Rekognition, Comprehend, Polly
2	Prompt foundation models via API	Amazon Bedrock with simple prompts
3	RAG — ground responses in private knowledge	Bedrock + Knowledge Bases
4	Fine-tune a foundation model on domain data	Bedrock fine-tuning
5	AI Agents — autonomous multi-step workflows	Bedrock Agents + Lambda
6	Train a custom foundation model from scratch	Rare; extremely expensive

1.2 Application Selection Criteria

Factor	Question to Answer
Task complexity	Simple classification or open-ended reasoning?
Data requirements	Is proprietary or real-time data needed?
Latency requirement	Synchronous real-time or asynchronous batch?
Cost constraints	Per-call vs. upfront training investment?
Accuracy requirement	How costly are errors?
Customization need	Does the model need domain adaptation?

2. Retrieval Augmented Generation

Retrieval Augmented Generation (RAG) solves two fundamental LLM limitations: knowledge cutoff dates and the absence of private or proprietary information. At inference time, relevant context is dynamically retrieved from an external knowledge base and injected into the prompt.

2.1 RAG Architecture

RAG operates in two distinct phases:

── INDEXING PHASE (runs offline) ─────────────────────────
  Documents (S3, SharePoint, web, etc.)
       ↓ Document loader
  [Text Splitter / Chunker]
       ↓
  [Embedding Model] → Dense Vectors
       ↓
  [Vector Database]  ← stored for retrieval

── RETRIEVAL PHASE (runs at inference time) ──────────────
  User Query
       ↓
  [Embedding Model] → Query Vector
       ↓
  [Vector DB: ANN Search] → Top-K Chunks
       ↓
  [Prompt Assembly] System + Retrieved Context + Query
       ↓
  [Foundation Model] → Grounded Response

Common RAG Quality Issues and Fixes

Issue	Likely Cause	Fix
Missing context	Document not indexed	Verify ingestion pipeline; check data source sync
Wrong chunk retrieved	Poor chunking or weak embedder	Adjust chunk size; use a stronger embedding model
Hallucination persists	Prompt ignores context	Add "Answer only using the provided context."
Irrelevant results	Poor query-to-document similarity	Apply query rewriting or HyDE
Stale information	Index out of date	Schedule periodic re-syncs

2.2 Chunking Strategies

How documents are split significantly affects retrieval quality. Smaller chunks produce more precise retrieval; larger chunks provide more surrounding context.

Strategy	Description	Best For
Fixed Size	Split at a fixed number of characters or tokens	Simple, uniform documents
Sentence Splitting	Split at sentence boundaries	Narrative articles
Recursive Character	Split at paragraphs first, then sentences, then characters	General-purpose default
Semantic Chunking	Group sentences with similar meaning	Dense, topic-rich documents
Document Structure	Split at headings or HTML tags	Structured documents, Markdown, HTML

Note: A typical starting configuration is 512 tokens per chunk with 10–20% overlap between adjacent chunks to prevent missing context at boundaries.

2.3 Retrieval Methods

Method	Mechanism	Best For
Dense (Semantic)	Embedding vector similarity search	Natural language queries; meaning-based retrieval
Sparse (Keyword)	BM25 / TF-IDF exact or fuzzy keyword match	Known terminology; product codes
Hybrid	Weighted combination of dense and sparse	Best overall quality; recommended for production
Re-ranking	A second cross-encoder model reorders results	Improving precision of top-K results

Advanced RAG Techniques

Technique	Description
HyDE	Generate a hypothetical answer; embed it for retrieval instead of the raw query
Query Rewriting	Rephrase user query to improve retrieval before embedding
Multi-Query	Generate several query variants; retrieve and merge results
Contextual Compression	Trim irrelevant sentences from retrieved chunks before injection

2.4 Amazon Bedrock Knowledge Bases

Bedrock Knowledge Bases provide a fully managed RAG pipeline — no infrastructure to provision or manage.

Supported Data Sources

Data Source	Notes
Amazon S3	PDFs, Word, HTML, TXT, Markdown, CSV
Confluence	Pages and spaces
Microsoft SharePoint	Sites and document libraries
Salesforce	Objects and records
Web Crawler	Public web pages

Setup Flow

Step	Action
1	Create Knowledge Base; select data source
2	Choose embedding model (Titan Embeddings or Cohere Embed)
3	Choose vector store (OpenSearch Serverless recommended)
4	Configure chunking strategy
5	Sync data source to trigger indexing
6	Invoke via Bedrock API or attach to a Bedrock Agent

3. AI Agents

An AI agent is an LLM-powered system that autonomously understands a goal, plans a sequence of steps, calls external tools, observes results, and delivers a final response — often across multiple iterations.

3.1 Agent Loop and Architecture Patterns

ReAct Agent Loop

User Request
     ↓
[LLM: Reason — what step is needed?]
     ↓
[Identify tool or action]
     ↓
[Execute tool call]
     ↓
[Observe result] ──► More steps needed? ──► Yes ──► back to Reason
     ↓ No
[Final response to user]

Agent Architecture Patterns

Pattern	Description	Use Case
Single Agent	One LLM with a set of tools	Simple tool-use workflows
Supervisor + Sub-Agents	Orchestrator delegates to specialized agents	Complex, multi-domain tasks
Multi-Agent	Parallel specialized agents collaborate	Research, writing, review pipeline
Plan-and-Execute	Agent creates a full plan first, then executes each step	Long-horizon, structured tasks

Common Agent Tool Categories

Category	Examples
Database Query	Run SQL against RDS; query DynamoDB
REST API Calls	Call CRM, ERP, or internal microservices
Knowledge Base Search	Semantic retrieval from Bedrock Knowledge Base
Code Execution	Run Python for calculations or data processing
File I/O	Read or write S3 objects
AWS Service Calls	Start EC2 instances; query CloudWatch metrics

3.2 Amazon Bedrock Agents

Bedrock Agents provide a fully managed agent runtime.

Component	Description
Foundation Model	The reasoning engine; Claude or Titan
Instructions	System prompt defining role, goals, and constraints
Action Groups	APIs the agent can call, defined by an OpenAPI schema, backed by AWS Lambda
Knowledge Bases	Document collections the agent can query via RAG
Guardrails	Safety filters applied to inputs and outputs

Key Concept: Bedrock Agents expose a full trace of every reasoning step, action, and observation — essential for debugging agent behavior and satisfying auditability requirements.

4. Text-Based Applications

4.1 Summarization

Abstractive summarization (LLM-generated) produces new prose that captures the meaning of the source. Extractive summarization pulls key sentences verbatim from the source.

Summarization Type	Description
Single-document	Summarize one article, report, or email
Multi-document	Summarize a collection of related documents
Hierarchical (Map-Reduce)	Split → summarize chunks → summarize the summaries; handles documents exceeding the context window

Map-Reduce Pattern for Long Documents

Long Document
     ↓ Split into chunks
[Chunk 1] → [Summary 1]
[Chunk 2] → [Summary 2]   ─► [Final Summary]
[Chunk 3] → [Summary 3]

4.2 Classification and Information Extraction

Traditional ML vs. LLM for Classification

Dimension	Traditional ML	LLM-Based
Training data needed	Yes — hundreds to thousands of examples	No — zero-shot capable
New categories	Requires retraining	Change the prompt
Inference latency	Very fast	Slower
Cost per prediction	Very low	Higher per call

Common Information Extraction Tasks

Task	Extracted Data
Invoice processing	Vendor name, amount, date, line items
Resume parsing	Skills, education, work experience
Medical records	Diagnoses, medications, procedure dates
Contract analysis	Party names, effective dates, termination clauses

5. Code Generation Applications

Amazon Q Developer is the primary AWS service for AI-assisted coding.

Feature	Description
Inline Code Completion	Suggestions appear as you type in the IDE
Code Generation	Generate full functions from natural language comments
Code Explanation	Plain-language explanation of selected code
Code Review	Identify bugs, performance issues, and improvements
Unit Test Generation	Automatically write test cases for existing functions
Documentation Generation	Create docstrings and README files
Security Scanning	Detect OWASP Top 10 vulnerabilities in code
Code Transformation	Upgrade Java 8 to Java 17; migrate legacy codebases
CLI Completions	Suggest and explain terminal commands

Supported Languages

Python, Java, JavaScript, TypeScript, C#, Go, Ruby, PHP, Rust, Kotlin, SQL, Shell, and more.

Exam Tip: Amazon Q Developer replaced Amazon CodeWhisperer. The security scanning and code transformation features are exclusive to Q Developer and not available in the original CodeWhisperer.

6. Image and Vision Applications

6.1 Image Generation

Capability	Description	AWS Service
Text-to-Image	Generate an image from a text prompt	Titan Image Generator, Stable Diffusion (Bedrock)
Image Editing	In-paint, out-paint, or modify regions of an existing image	Titan Image Generator
Image Variation	Generate stylistically similar versions of an image	Titan Image Generator
Background Removal	Remove or replace image backgrounds	Titan Image Generator
AI Watermarking	Embed invisible watermarks to identify AI-generated images	Titan Image Generator

6.2 Image Understanding

Multi-modal LLMs such as Claude 3 Vision and Amazon Nova accept both text and image inputs.

Application	Description
Visual Q&A	"What products are shown in this photo?"
Document Understanding	Process invoices, forms, and charts from images
Accessibility	Auto-generate descriptive alt-text for images
Chart Analysis	Interpret graphs and extract data from visualizations

Amazon Rekognition Key Features

Feature	Description
Object Detection	Identify objects, scenes, and activities in images
Facial Analysis	Detect faces and analyze attributes (age range, emotion, pose)
Facial Recognition	Match faces against a custom collection
Text in Images (OCR)	Extract text visible in photos
Content Moderation	Detect explicit, violent, or unsafe content
Celebrity Recognition	Identify public figures
Custom Labels	Train a custom object or scene detector with your own images
Video Analysis	Apply all above features to video streams or files

7. Conversational AI and Chatbots

7.1 Chatbot Types and Memory

Chatbot Architecture Comparison

Type	Technology	Capability
Rule-based bot	Decision trees / scripts	Predefined, scripted conversation flows
Intent-based bot	Amazon Lex	Understands intents and extracts slot values
LLM chatbot	Amazon Bedrock	Open-ended, contextual conversation
RAG chatbot	Bedrock + Knowledge Base	Grounded in company documents
Agentic chatbot	Bedrock Agents	Can take actions; call APIs; execute workflows

Amazon Lex vs. Amazon Bedrock for Chatbots

Scenario	Use Lex	Use Bedrock
Structured form-filling or slot collection	✓
Voice interface with telephony integration	✓
Open-ended, free-form conversation		✓
Complex multi-step reasoning		✓
Knowledge base Q&A		✓

Conversation Memory Types

LLMs are stateless. The full conversation history must be passed with each request. Different memory strategies manage this at scale.

Memory Type	Description
Buffer Memory	Retain the last N turns verbatim
Summary Memory	Summarize older turns; retain recent turns verbatim
Entity Memory	Track key entities mentioned across the conversation
Vector Memory	Store past exchanges as embeddings; retrieve relevant ones

8. Search and Information Retrieval

Keyword Search vs. Semantic Search

Dimension	Keyword Search	Semantic Search
Matching basis	Exact word or phrase match	Meaning and intent
Query "car"	Returns documents containing "car"	Returns documents about vehicles and automobiles
Technology	BM25, TF-IDF	Embeddings + vector similarity
Strength	Known terminology, product codes	Natural language, ambiguous queries

Amazon Kendra

Amazon Kendra is a fully managed enterprise search service powered by ML.

Feature	Description
40+ native connectors	S3, SharePoint, RDS, Salesforce, ServiceNow, and more
Semantic ranking	ML relevance ranking beyond keyword matching
FAQ matching	Returns exact answers from Q&A document pairs
Access control	Respects source document permissions per user
Incremental learning	Improves ranking from user click-through feedback

Search Service Comparison

Dimension	Kendra	OpenSearch	Bedrock Knowledge Bases
Setup effort	Low	Medium	Low
Search type	Semantic + keyword	Keyword, semantic, and vector	Semantic (RAG)
FM response	Limited	Manual integration	Native
Primary use	Enterprise document search	General-purpose search	GenAI Q&A

9. Document Processing and Analysis

End-to-End Document Pipeline

Raw Documents (PDFs, scans, Word files)
     ↓
[Amazon Textract] → Extracted text, key-value pairs, tables
     ↓
[Amazon Comprehend] → Entities, sentiment, key phrases, PII
     ↓
[Amazon Bedrock] → Summarization, Q&A, classification, generation
     ↓
Structured output stored in RDS, DynamoDB, or S3

Amazon Comprehend Feature Reference

Feature	What It Detects
Entity Recognition	People, organizations, places, dates, quantities
Key Phrase Extraction	Most important phrases in the text
Sentiment Analysis	Positive, Negative, Neutral, or Mixed
Language Detection	Language of the input text
Topic Modeling	Latent topics across a document collection
PII Detection	Personally identifiable information
Custom Classification	User-defined text categories
Custom Entity Recognition	User-defined entity types
Targeted Sentiment	Sentiment directed at specific named entities

Amazon Comprehend Medical

Specialized for healthcare and clinical text:

Detects anatomy, medical conditions, medications, dosages, and test results
Identifies Protected Health Information (PHI) — HIPAA-eligible service
Maps findings to standard ontologies: ICD-10-CM, RxNorm, SNOMED CT

10. Building GenAI Applications on AWS

10.1 AWS Services for GenAI Applications

Service	Role in a GenAI Application
Amazon Bedrock	FM access, RAG, Agents, Guardrails
Amazon SageMaker	Build, train, and deploy custom models
AWS Lambda	Serverless compute for agent actions and API backends
Amazon API Gateway	REST or WebSocket API for the application
Amazon S3	Store documents, training data, and model artifacts
Amazon OpenSearch	Vector search and full-text search
Amazon DynamoDB	Store chat history and user session state
Amazon RDS / Aurora	Relational data storage
AWS Step Functions	Orchestrate complex multi-step GenAI workflows
Amazon CloudWatch	Monitor logs, metrics, and set alarms
Amazon Cognito	User authentication for customer-facing apps
AWS IAM	Fine-grained access control for all services
AWS KMS	Encryption key management

10.2 AWS Architecture Patterns

Pattern 1 — Simple Prompt Application

Best for content generation and basic Q&A with no private data.

User → API Gateway → Lambda → Amazon Bedrock → Response

Pattern 2 — RAG Application

Best for Q&A over private or proprietary documents.

Ingestion:  S3 → Bedrock Knowledge Base (chunk + embed) → OpenSearch

Retrieval:  User Query → Lambda → Bedrock RetrieveAndGenerate API → Response

Pattern 3 — Agentic Chatbot

Best for multi-turn conversations that require external actions.

User → API Gateway → Lambda
                         → Bedrock Agent
                               → Knowledge Base (retrieve context)
                               → Action Group → Lambda → External API
                         → DynamoDB (store chat history)
                    → Response with citations

Pattern 4 — Document Processing Pipeline

Best for automated extraction and analysis at scale.

S3 Upload Event
     → Lambda
           → Textract (extract text and structure)
           → Comprehend (entities, sentiment, PII)
           → Bedrock (summarize or classify)
           → DynamoDB / RDS (store results)
           → SNS (notify downstream systems)

Pattern 5 — Streaming Response

Best for conversational UIs where text appears progressively.

User → API Gateway (WebSocket) → Lambda → Bedrock (streaming) → Tokens streamed in real time

11. Model Selection and Evaluation

11.1 Choosing the Right Model

Decision Factors

Factor	Consideration
Task type	Text generation, classification, summarization, code, vision?
Context length	How long is the combined input and expected output?
Quality requirement	Mission-critical or best-effort?
Latency requirement	Under 500 ms? Under 2 s?
Cost budget	Per-token cost multiplied by expected volume
Language	English-only or multilingual?
Fine-tuning need	Does the model need customization?

Model Size vs. Quality Trade-off

Model Tier	Examples	Best For
Small / Fast	Claude Haiku, Llama 3 8B	Simple classification, routing, low-latency responses
Mid-tier	Claude Sonnet, Llama 3 70B	General Q&A, document analysis, chat
Large / Powerful	Claude Opus	Complex reasoning, multi-step analysis, legal and medical review

Key Concept: Always right-size. Use the smallest model that meets your quality threshold. Overusing large models is the most common cause of unnecessary GenAI cost.

Bedrock Model Evaluation

Evaluation Type	Description
Automatic Evaluation	Built-in metrics — accuracy, robustness, toxicity — run against your test prompts
Human Evaluation	Route sampled responses to human reviewers; compare models side by side

11.2 Performance and Cost Optimization

Latency Optimization

Technique	Effect
Choose a smaller model	Claude Haiku is significantly faster than Opus
Enable streaming	First token appears immediately; reduces perceived latency
Reduce prompt length	Shorter input = faster time to first token
Provisioned Throughput	Dedicated capacity eliminates cold-start variability
Async inference	Offload non-real-time work to asynchronous queues

Cost Optimization

Technique	Mechanism
Right-size the model	Haiku costs a fraction of Opus for simple tasks
Prompt caching	Cache repeated system prompts; pay only on cache miss
Limit max tokens	Set a reasonable upper bound on output length
Batch API	Use Bedrock Batch for large offline workloads
Model routing	Route simple queries to small models; escalate complex ones
Efficient retrieval	Retrieve only what is needed; avoid injecting excessive context

Exam Tips & Quick Reference

Scenario-to-Answer Mapping

Scenario Keyword / Requirement	Correct Answer
"LLM needs to answer questions from internal SharePoint docs"	Bedrock Knowledge Bases (RAG)
"LLM must book a meeting, query a database, and send an email"	Bedrock Agents
"Customer support bot follows a structured script with slots"	Amazon Lex
"Open-ended enterprise chatbot connected to company knowledge"	Bedrock + Knowledge Bases
"Extract structured data from scanned invoices"	Amazon Textract
"Analyze sentiment and entities in customer reviews"	Amazon Comprehend
"Detect objects and moderate content in uploaded images"	Amazon Rekognition
"Find documents by meaning, not just keyword match"	Semantic search; Amazon Kendra or OpenSearch
"AI coding assistant with security vulnerability scanning"	Amazon Q Developer
"Reduce cost of long repeated system prompts"	Bedrock prompt caching
"Model responses too slow for real-time use"	Switch to a smaller model (Haiku); enable streaming
"Generate product images from text descriptions"	Amazon Titan Image Generator
"Large document exceeds context window during summarization"	Map-Reduce summarization pattern

Common Traps

Kendra vs. Bedrock Knowledge Bases: Kendra is enterprise search — it finds documents. Knowledge Bases power RAG — the FM reads retrieved chunks and generates an answer. These solve different problems.
Bedrock Agents vs. simple LLM calls: If a scenario requires calling an external API, executing code, or taking a multi-step action, the answer is Bedrock Agents — not a plain Bedrock model invocation.
Amazon Lex vs. Bedrock for chatbots: Lex handles structured intents and slot-filling (e.g., booking a flight). Bedrock handles open-ended reasoning. The keywords "intent," "slot," or "script" point to Lex.
Textract vs. Comprehend: Textract extracts raw text and structure from documents (OCR). Comprehend analyzes text that is already extracted (NLP). Use them together in a pipeline.
Model size and cost: Larger models are not always better. The exam may ask for the "most cost-effective" approach — the correct answer will use the smallest model that meets the stated requirements.

Key Terms — Domain 3

Term	One-Line Definition
RAG	Grounding FM responses by retrieving relevant context at inference time
Chunking	Splitting source documents into smaller segments for indexing and retrieval
Vector Database	Database optimized for similarity search over embedding vectors
Hybrid Search	Combining dense semantic search with sparse keyword search for best results
Bedrock Agent	An FM-powered system that autonomously plans and executes multi-step tasks
Action Group	A set of APIs a Bedrock Agent can call, defined by an OpenAPI schema
Map-Reduce Summarization	Summarize chunks individually, then summarize the summaries
Re-ranking	A second model pass to reorder retrieved results by relevance
Amazon Kendra	Managed enterprise search using NLP-based relevance ranking
Amazon Personalize	Managed real-time recommendation and personalization service
Amazon Textract	Service that extracts text, forms, and tables from scanned documents
Amazon Comprehend	NLP service for entity, sentiment, key phrase, and PII analysis
Semantic Search	Finding documents by meaning rather than exact word match
Streaming	Returning generated tokens progressively to reduce perceived latency
Provisioned Throughput	Reserved Bedrock model capacity for consistent performance
Prompt Caching	Caching static prompt prefixes to reduce repeated input token costs

End of Domain 3. Continue to Domain 4: Guidelines for Responsible AI →

Domain 2: Fundamentals of GenAI

Domain 4: Guidelines for Responsible AI

Ready to test yourself?

Practice questions for this topic

Start Practicing →