Courses/AIF-C01/Domain 3: Applications of Foundation Models
Practice questions →
AWSAIF-C01

Domain 3: Applications of Foundation Models

Topic 3 of 5 · Study notes

AWS Certified AI Practitioner — Domain 3: Applications of Foundation Models

Exam Code: AIF-C01  |  Level: Foundational
Domain Weight: 28%  |  Total Domains: 5  |  Passing Score: 700/1000


Table of Contents

  1. Foundation Model Application Patterns
  2. Retrieval Augmented Generation
  3. AI Agents
  4. Text-Based Applications
  5. Code Generation Applications
  6. Image and Vision Applications
  7. Conversational AI and Chatbots
  8. Search and Information Retrieval
  9. Document Processing and Analysis
  10. Building GenAI Applications on AWS
  11. Model Selection and Evaluation
  12. Exam Tips and Quick Reference

1. Foundation Model Application Patterns

Foundation models enable a new application development paradigm — natural language replaces explicit logic, and a single model can power many different features.

1.1 Development Levels

The depth of customization and expertise required increases with each level.

Level Approach Example
1 Use pre-built AI APIs with no ML knowledge Amazon Rekognition, Comprehend, Polly
2 Prompt foundation models via API Amazon Bedrock with simple prompts
3 RAG — ground responses in private knowledge Bedrock + Knowledge Bases
4 Fine-tune a foundation model on domain data Bedrock fine-tuning
5 AI Agents — autonomous multi-step workflows Bedrock Agents + Lambda
6 Train a custom foundation model from scratch Rare; extremely expensive

1.2 Application Selection Criteria

Factor Question to Answer
Task complexity Simple classification or open-ended reasoning?
Data requirements Is proprietary or real-time data needed?
Latency requirement Synchronous real-time or asynchronous batch?
Cost constraints Per-call vs. upfront training investment?
Accuracy requirement How costly are errors?
Customization need Does the model need domain adaptation?

2. Retrieval Augmented Generation

Retrieval Augmented Generation (RAG) solves two fundamental LLM limitations: knowledge cutoff dates and the absence of private or proprietary information. At inference time, relevant context is dynamically retrieved from an external knowledge base and injected into the prompt.

2.1 RAG Architecture

RAG operates in two distinct phases:

── INDEXING PHASE (runs offline) ─────────────────────────
  Documents (S3, SharePoint, web, etc.)
       ↓ Document loader
  [Text Splitter / Chunker]
       ↓
  [Embedding Model] → Dense Vectors
       ↓
  [Vector Database]  ← stored for retrieval

── RETRIEVAL PHASE (runs at inference time) ──────────────
  User Query
       ↓
  [Embedding Model] → Query Vector
       ↓
  [Vector DB: ANN Search] → Top-K Chunks
       ↓
  [Prompt Assembly] System + Retrieved Context + Query
       ↓
  [Foundation Model] → Grounded Response

Common RAG Quality Issues and Fixes

Issue Likely Cause Fix
Missing context Document not indexed Verify ingestion pipeline; check data source sync
Wrong chunk retrieved Poor chunking or weak embedder Adjust chunk size; use a stronger embedding model
Hallucination persists Prompt ignores context Add "Answer only using the provided context."
Irrelevant results Poor query-to-document similarity Apply query rewriting or HyDE
Stale information Index out of date Schedule periodic re-syncs

2.2 Chunking Strategies

How documents are split significantly affects retrieval quality. Smaller chunks produce more precise retrieval; larger chunks provide more surrounding context.

Strategy Description Best For
Fixed Size Split at a fixed number of characters or tokens Simple, uniform documents
Sentence Splitting Split at sentence boundaries Narrative articles
Recursive Character Split at paragraphs first, then sentences, then characters General-purpose default
Semantic Chunking Group sentences with similar meaning Dense, topic-rich documents
Document Structure Split at headings or HTML tags Structured documents, Markdown, HTML

Note: A typical starting configuration is 512 tokens per chunk with 10–20% overlap between adjacent chunks to prevent missing context at boundaries.

2.3 Retrieval Methods

Method Mechanism Best For
Dense (Semantic) Embedding vector similarity search Natural language queries; meaning-based retrieval
Sparse (Keyword) BM25 / TF-IDF exact or fuzzy keyword match Known terminology; product codes
Hybrid Weighted combination of dense and sparse Best overall quality; recommended for production
Re-ranking A second cross-encoder model reorders results Improving precision of top-K results

Advanced RAG Techniques

Technique Description
HyDE Generate a hypothetical answer; embed it for retrieval instead of the raw query
Query Rewriting Rephrase user query to improve retrieval before embedding
Multi-Query Generate several query variants; retrieve and merge results
Contextual Compression Trim irrelevant sentences from retrieved chunks before injection

2.4 Amazon Bedrock Knowledge Bases

Bedrock Knowledge Bases provide a fully managed RAG pipeline — no infrastructure to provision or manage.

Supported Data Sources

Data Source Notes
Amazon S3 PDFs, Word, HTML, TXT, Markdown, CSV
Confluence Pages and spaces
Microsoft SharePoint Sites and document libraries
Salesforce Objects and records
Web Crawler Public web pages

Setup Flow

Step Action
1 Create Knowledge Base; select data source
2 Choose embedding model (Titan Embeddings or Cohere Embed)
3 Choose vector store (OpenSearch Serverless recommended)
4 Configure chunking strategy
5 Sync data source to trigger indexing
6 Invoke via Bedrock API or attach to a Bedrock Agent

3. AI Agents

An AI agent is an LLM-powered system that autonomously understands a goal, plans a sequence of steps, calls external tools, observes results, and delivers a final response — often across multiple iterations.

3.1 Agent Loop and Architecture Patterns

ReAct Agent Loop

User Request
     ↓
[LLM: Reason — what step is needed?]
     ↓
[Identify tool or action]
     ↓
[Execute tool call]
     ↓
[Observe result] ──► More steps needed? ──► Yes ──► back to Reason
     ↓ No
[Final response to user]

Agent Architecture Patterns

Pattern Description Use Case
Single Agent One LLM with a set of tools Simple tool-use workflows
Supervisor + Sub-Agents Orchestrator delegates to specialized agents Complex, multi-domain tasks
Multi-Agent Parallel specialized agents collaborate Research, writing, review pipeline
Plan-and-Execute Agent creates a full plan first, then executes each step Long-horizon, structured tasks

Common Agent Tool Categories

Category Examples
Database Query Run SQL against RDS; query DynamoDB
REST API Calls Call CRM, ERP, or internal microservices
Knowledge Base Search Semantic retrieval from Bedrock Knowledge Base
Code Execution Run Python for calculations or data processing
File I/O Read or write S3 objects
AWS Service Calls Start EC2 instances; query CloudWatch metrics

3.2 Amazon Bedrock Agents

Bedrock Agents provide a fully managed agent runtime.

Component Description
Foundation Model The reasoning engine; Claude or Titan
Instructions System prompt defining role, goals, and constraints
Action Groups APIs the agent can call, defined by an OpenAPI schema, backed by AWS Lambda
Knowledge Bases Document collections the agent can query via RAG
Guardrails Safety filters applied to inputs and outputs

Key Concept: Bedrock Agents expose a full trace of every reasoning step, action, and observation — essential for debugging agent behavior and satisfying auditability requirements.


4. Text-Based Applications

4.1 Summarization

Abstractive summarization (LLM-generated) produces new prose that captures the meaning of the source. Extractive summarization pulls key sentences verbatim from the source.

Summarization Type Description
Single-document Summarize one article, report, or email
Multi-document Summarize a collection of related documents
Hierarchical (Map-Reduce) Split → summarize chunks → summarize the summaries; handles documents exceeding the context window

Map-Reduce Pattern for Long Documents

Long Document
     ↓ Split into chunks
[Chunk 1] → [Summary 1]
[Chunk 2] → [Summary 2]   ─► [Final Summary]
[Chunk 3] → [Summary 3]

4.2 Classification and Information Extraction

Traditional ML vs. LLM for Classification

Dimension Traditional ML LLM-Based
Training data needed Yes — hundreds to thousands of examples No — zero-shot capable
New categories Requires retraining Change the prompt
Inference latency Very fast Slower
Cost per prediction Very low Higher per call

Common Information Extraction Tasks

Task Extracted Data
Invoice processing Vendor name, amount, date, line items
Resume parsing Skills, education, work experience
Medical records Diagnoses, medications, procedure dates
Contract analysis Party names, effective dates, termination clauses

5. Code Generation Applications

Amazon Q Developer is the primary AWS service for AI-assisted coding.

Feature Description
Inline Code Completion Suggestions appear as you type in the IDE
Code Generation Generate full functions from natural language comments
Code Explanation Plain-language explanation of selected code
Code Review Identify bugs, performance issues, and improvements
Unit Test Generation Automatically write test cases for existing functions
Documentation Generation Create docstrings and README files
Security Scanning Detect OWASP Top 10 vulnerabilities in code
Code Transformation Upgrade Java 8 to Java 17; migrate legacy codebases
CLI Completions Suggest and explain terminal commands

Supported Languages

Python, Java, JavaScript, TypeScript, C#, Go, Ruby, PHP, Rust, Kotlin, SQL, Shell, and more.

Exam Tip: Amazon Q Developer replaced Amazon CodeWhisperer. The security scanning and code transformation features are exclusive to Q Developer and not available in the original CodeWhisperer.


6. Image and Vision Applications

6.1 Image Generation

Capability Description AWS Service
Text-to-Image Generate an image from a text prompt Titan Image Generator, Stable Diffusion (Bedrock)
Image Editing In-paint, out-paint, or modify regions of an existing image Titan Image Generator
Image Variation Generate stylistically similar versions of an image Titan Image Generator
Background Removal Remove or replace image backgrounds Titan Image Generator
AI Watermarking Embed invisible watermarks to identify AI-generated images Titan Image Generator

6.2 Image Understanding

Multi-modal LLMs such as Claude 3 Vision and Amazon Nova accept both text and image inputs.

Application Description
Visual Q&A "What products are shown in this photo?"
Document Understanding Process invoices, forms, and charts from images
Accessibility Auto-generate descriptive alt-text for images
Chart Analysis Interpret graphs and extract data from visualizations

Amazon Rekognition Key Features

Feature Description
Object Detection Identify objects, scenes, and activities in images
Facial Analysis Detect faces and analyze attributes (age range, emotion, pose)
Facial Recognition Match faces against a custom collection
Text in Images (OCR) Extract text visible in photos
Content Moderation Detect explicit, violent, or unsafe content
Celebrity Recognition Identify public figures
Custom Labels Train a custom object or scene detector with your own images
Video Analysis Apply all above features to video streams or files

7. Conversational AI and Chatbots

7.1 Chatbot Types and Memory

Chatbot Architecture Comparison

Type Technology Capability
Rule-based bot Decision trees / scripts Predefined, scripted conversation flows
Intent-based bot Amazon Lex Understands intents and extracts slot values
LLM chatbot Amazon Bedrock Open-ended, contextual conversation
RAG chatbot Bedrock + Knowledge Base Grounded in company documents
Agentic chatbot Bedrock Agents Can take actions; call APIs; execute workflows

Amazon Lex vs. Amazon Bedrock for Chatbots

Scenario Use Lex Use Bedrock
Structured form-filling or slot collection
Voice interface with telephony integration
Open-ended, free-form conversation
Complex multi-step reasoning
Knowledge base Q&A

Conversation Memory Types

LLMs are stateless. The full conversation history must be passed with each request. Different memory strategies manage this at scale.

Memory Type Description
Buffer Memory Retain the last N turns verbatim
Summary Memory Summarize older turns; retain recent turns verbatim
Entity Memory Track key entities mentioned across the conversation
Vector Memory Store past exchanges as embeddings; retrieve relevant ones

8. Search and Information Retrieval

Dimension Keyword Search Semantic Search
Matching basis Exact word or phrase match Meaning and intent
Query "car" Returns documents containing "car" Returns documents about vehicles and automobiles
Technology BM25, TF-IDF Embeddings + vector similarity
Strength Known terminology, product codes Natural language, ambiguous queries

Amazon Kendra

Amazon Kendra is a fully managed enterprise search service powered by ML.

Feature Description
40+ native connectors S3, SharePoint, RDS, Salesforce, ServiceNow, and more
Semantic ranking ML relevance ranking beyond keyword matching
FAQ matching Returns exact answers from Q&A document pairs
Access control Respects source document permissions per user
Incremental learning Improves ranking from user click-through feedback

Search Service Comparison

Dimension Kendra OpenSearch Bedrock Knowledge Bases
Setup effort Low Medium Low
Search type Semantic + keyword Keyword, semantic, and vector Semantic (RAG)
FM response Limited Manual integration Native
Primary use Enterprise document search General-purpose search GenAI Q&A

9. Document Processing and Analysis

End-to-End Document Pipeline

Raw Documents (PDFs, scans, Word files)
     ↓
[Amazon Textract] → Extracted text, key-value pairs, tables
     ↓
[Amazon Comprehend] → Entities, sentiment, key phrases, PII
     ↓
[Amazon Bedrock] → Summarization, Q&A, classification, generation
     ↓
Structured output stored in RDS, DynamoDB, or S3

Amazon Comprehend Feature Reference

Feature What It Detects
Entity Recognition People, organizations, places, dates, quantities
Key Phrase Extraction Most important phrases in the text
Sentiment Analysis Positive, Negative, Neutral, or Mixed
Language Detection Language of the input text
Topic Modeling Latent topics across a document collection
PII Detection Personally identifiable information
Custom Classification User-defined text categories
Custom Entity Recognition User-defined entity types
Targeted Sentiment Sentiment directed at specific named entities

Amazon Comprehend Medical

Specialized for healthcare and clinical text:

  • Detects anatomy, medical conditions, medications, dosages, and test results
  • Identifies Protected Health Information (PHI) — HIPAA-eligible service
  • Maps findings to standard ontologies: ICD-10-CM, RxNorm, SNOMED CT

10. Building GenAI Applications on AWS

10.1 AWS Services for GenAI Applications

Service Role in a GenAI Application
Amazon Bedrock FM access, RAG, Agents, Guardrails
Amazon SageMaker Build, train, and deploy custom models
AWS Lambda Serverless compute for agent actions and API backends
Amazon API Gateway REST or WebSocket API for the application
Amazon S3 Store documents, training data, and model artifacts
Amazon OpenSearch Vector search and full-text search
Amazon DynamoDB Store chat history and user session state
Amazon RDS / Aurora Relational data storage
AWS Step Functions Orchestrate complex multi-step GenAI workflows
Amazon CloudWatch Monitor logs, metrics, and set alarms
Amazon Cognito User authentication for customer-facing apps
AWS IAM Fine-grained access control for all services
AWS KMS Encryption key management

10.2 AWS Architecture Patterns

Pattern 1 — Simple Prompt Application

Best for content generation and basic Q&A with no private data.

User → API Gateway → Lambda → Amazon Bedrock → Response

Pattern 2 — RAG Application

Best for Q&A over private or proprietary documents.

Ingestion:  S3 → Bedrock Knowledge Base (chunk + embed) → OpenSearch

Retrieval:  User Query → Lambda → Bedrock RetrieveAndGenerate API → Response

Pattern 3 — Agentic Chatbot

Best for multi-turn conversations that require external actions.

User → API Gateway → Lambda
                         → Bedrock Agent
                               → Knowledge Base (retrieve context)
                               → Action Group → Lambda → External API
                         → DynamoDB (store chat history)
                    → Response with citations

Pattern 4 — Document Processing Pipeline

Best for automated extraction and analysis at scale.

S3 Upload Event
     → Lambda
           → Textract (extract text and structure)
           → Comprehend (entities, sentiment, PII)
           → Bedrock (summarize or classify)
           → DynamoDB / RDS (store results)
           → SNS (notify downstream systems)

Pattern 5 — Streaming Response

Best for conversational UIs where text appears progressively.

User → API Gateway (WebSocket) → Lambda → Bedrock (streaming) → Tokens streamed in real time

11. Model Selection and Evaluation

11.1 Choosing the Right Model

Decision Factors

Factor Consideration
Task type Text generation, classification, summarization, code, vision?
Context length How long is the combined input and expected output?
Quality requirement Mission-critical or best-effort?
Latency requirement Under 500 ms? Under 2 s?
Cost budget Per-token cost multiplied by expected volume
Language English-only or multilingual?
Fine-tuning need Does the model need customization?

Model Size vs. Quality Trade-off

Model Tier Examples Best For
Small / Fast Claude Haiku, Llama 3 8B Simple classification, routing, low-latency responses
Mid-tier Claude Sonnet, Llama 3 70B General Q&A, document analysis, chat
Large / Powerful Claude Opus Complex reasoning, multi-step analysis, legal and medical review

Key Concept: Always right-size. Use the smallest model that meets your quality threshold. Overusing large models is the most common cause of unnecessary GenAI cost.

Bedrock Model Evaluation

Evaluation Type Description
Automatic Evaluation Built-in metrics — accuracy, robustness, toxicity — run against your test prompts
Human Evaluation Route sampled responses to human reviewers; compare models side by side

11.2 Performance and Cost Optimization

Latency Optimization

Technique Effect
Choose a smaller model Claude Haiku is significantly faster than Opus
Enable streaming First token appears immediately; reduces perceived latency
Reduce prompt length Shorter input = faster time to first token
Provisioned Throughput Dedicated capacity eliminates cold-start variability
Async inference Offload non-real-time work to asynchronous queues

Cost Optimization

Technique Mechanism
Right-size the model Haiku costs a fraction of Opus for simple tasks
Prompt caching Cache repeated system prompts; pay only on cache miss
Limit max tokens Set a reasonable upper bound on output length
Batch API Use Bedrock Batch for large offline workloads
Model routing Route simple queries to small models; escalate complex ones
Efficient retrieval Retrieve only what is needed; avoid injecting excessive context

Exam Tips & Quick Reference

Scenario-to-Answer Mapping

Scenario Keyword / Requirement Correct Answer
"LLM needs to answer questions from internal SharePoint docs" Bedrock Knowledge Bases (RAG)
"LLM must book a meeting, query a database, and send an email" Bedrock Agents
"Customer support bot follows a structured script with slots" Amazon Lex
"Open-ended enterprise chatbot connected to company knowledge" Bedrock + Knowledge Bases
"Extract structured data from scanned invoices" Amazon Textract
"Analyze sentiment and entities in customer reviews" Amazon Comprehend
"Detect objects and moderate content in uploaded images" Amazon Rekognition
"Find documents by meaning, not just keyword match" Semantic search; Amazon Kendra or OpenSearch
"AI coding assistant with security vulnerability scanning" Amazon Q Developer
"Reduce cost of long repeated system prompts" Bedrock prompt caching
"Model responses too slow for real-time use" Switch to a smaller model (Haiku); enable streaming
"Generate product images from text descriptions" Amazon Titan Image Generator
"Large document exceeds context window during summarization" Map-Reduce summarization pattern

Common Traps

  • Kendra vs. Bedrock Knowledge Bases: Kendra is enterprise search — it finds documents. Knowledge Bases power RAG — the FM reads retrieved chunks and generates an answer. These solve different problems.
  • Bedrock Agents vs. simple LLM calls: If a scenario requires calling an external API, executing code, or taking a multi-step action, the answer is Bedrock Agents — not a plain Bedrock model invocation.
  • Amazon Lex vs. Bedrock for chatbots: Lex handles structured intents and slot-filling (e.g., booking a flight). Bedrock handles open-ended reasoning. The keywords "intent," "slot," or "script" point to Lex.
  • Textract vs. Comprehend: Textract extracts raw text and structure from documents (OCR). Comprehend analyzes text that is already extracted (NLP). Use them together in a pipeline.
  • Model size and cost: Larger models are not always better. The exam may ask for the "most cost-effective" approach — the correct answer will use the smallest model that meets the stated requirements.

Key Terms — Domain 3

Term One-Line Definition
RAG Grounding FM responses by retrieving relevant context at inference time
Chunking Splitting source documents into smaller segments for indexing and retrieval
Vector Database Database optimized for similarity search over embedding vectors
Hybrid Search Combining dense semantic search with sparse keyword search for best results
Bedrock Agent An FM-powered system that autonomously plans and executes multi-step tasks
Action Group A set of APIs a Bedrock Agent can call, defined by an OpenAPI schema
Map-Reduce Summarization Summarize chunks individually, then summarize the summaries
Re-ranking A second model pass to reorder retrieved results by relevance
Amazon Kendra Managed enterprise search using NLP-based relevance ranking
Amazon Personalize Managed real-time recommendation and personalization service
Amazon Textract Service that extracts text, forms, and tables from scanned documents
Amazon Comprehend NLP service for entity, sentiment, key phrase, and PII analysis
Semantic Search Finding documents by meaning rather than exact word match
Streaming Returning generated tokens progressively to reduce perceived latency
Provisioned Throughput Reserved Bedrock model capacity for consistent performance
Prompt Caching Caching static prompt prefixes to reduce repeated input token costs

End of Domain 3. Continue to Domain 4: Guidelines for Responsible AI →

Ready to test yourself?

Practice questions for this topic

Start Practicing →

AIF-C01 Topics

Topic 3 of 5