Domain 3: Applications of Foundation Models
Topic 3 of 5 · Study notes
AWS Certified AI Practitioner — Domain 3: Applications of Foundation Models
Exam Code: AIF-C01 | Level: Foundational
Domain Weight: 28% | Total Domains: 5 | Passing Score: 700/1000
Table of Contents
- Foundation Model Application Patterns
- Retrieval Augmented Generation
- AI Agents
- Text-Based Applications
- Code Generation Applications
- Image and Vision Applications
- Conversational AI and Chatbots
- Search and Information Retrieval
- Document Processing and Analysis
- Building GenAI Applications on AWS
- Model Selection and Evaluation
- Exam Tips and Quick Reference
1. Foundation Model Application Patterns
Foundation models enable a new application development paradigm — natural language replaces explicit logic, and a single model can power many different features.
1.1 Development Levels
The depth of customization and expertise required increases with each level.
| Level | Approach | Example |
|---|---|---|
| 1 | Use pre-built AI APIs with no ML knowledge | Amazon Rekognition, Comprehend, Polly |
| 2 | Prompt foundation models via API | Amazon Bedrock with simple prompts |
| 3 | RAG — ground responses in private knowledge | Bedrock + Knowledge Bases |
| 4 | Fine-tune a foundation model on domain data | Bedrock fine-tuning |
| 5 | AI Agents — autonomous multi-step workflows | Bedrock Agents + Lambda |
| 6 | Train a custom foundation model from scratch | Rare; extremely expensive |
1.2 Application Selection Criteria
| Factor | Question to Answer |
|---|---|
| Task complexity | Simple classification or open-ended reasoning? |
| Data requirements | Is proprietary or real-time data needed? |
| Latency requirement | Synchronous real-time or asynchronous batch? |
| Cost constraints | Per-call vs. upfront training investment? |
| Accuracy requirement | How costly are errors? |
| Customization need | Does the model need domain adaptation? |
2. Retrieval Augmented Generation
Retrieval Augmented Generation (RAG) solves two fundamental LLM limitations: knowledge cutoff dates and the absence of private or proprietary information. At inference time, relevant context is dynamically retrieved from an external knowledge base and injected into the prompt.
2.1 RAG Architecture
RAG operates in two distinct phases:
── INDEXING PHASE (runs offline) ─────────────────────────
Documents (S3, SharePoint, web, etc.)
↓ Document loader
[Text Splitter / Chunker]
↓
[Embedding Model] → Dense Vectors
↓
[Vector Database] ← stored for retrieval
── RETRIEVAL PHASE (runs at inference time) ──────────────
User Query
↓
[Embedding Model] → Query Vector
↓
[Vector DB: ANN Search] → Top-K Chunks
↓
[Prompt Assembly] System + Retrieved Context + Query
↓
[Foundation Model] → Grounded Response
Common RAG Quality Issues and Fixes
| Issue | Likely Cause | Fix |
|---|---|---|
| Missing context | Document not indexed | Verify ingestion pipeline; check data source sync |
| Wrong chunk retrieved | Poor chunking or weak embedder | Adjust chunk size; use a stronger embedding model |
| Hallucination persists | Prompt ignores context | Add "Answer only using the provided context." |
| Irrelevant results | Poor query-to-document similarity | Apply query rewriting or HyDE |
| Stale information | Index out of date | Schedule periodic re-syncs |
2.2 Chunking Strategies
How documents are split significantly affects retrieval quality. Smaller chunks produce more precise retrieval; larger chunks provide more surrounding context.
| Strategy | Description | Best For |
|---|---|---|
| Fixed Size | Split at a fixed number of characters or tokens | Simple, uniform documents |
| Sentence Splitting | Split at sentence boundaries | Narrative articles |
| Recursive Character | Split at paragraphs first, then sentences, then characters | General-purpose default |
| Semantic Chunking | Group sentences with similar meaning | Dense, topic-rich documents |
| Document Structure | Split at headings or HTML tags | Structured documents, Markdown, HTML |
Note: A typical starting configuration is 512 tokens per chunk with 10–20% overlap between adjacent chunks to prevent missing context at boundaries.
2.3 Retrieval Methods
| Method | Mechanism | Best For |
|---|---|---|
| Dense (Semantic) | Embedding vector similarity search | Natural language queries; meaning-based retrieval |
| Sparse (Keyword) | BM25 / TF-IDF exact or fuzzy keyword match | Known terminology; product codes |
| Hybrid | Weighted combination of dense and sparse | Best overall quality; recommended for production |
| Re-ranking | A second cross-encoder model reorders results | Improving precision of top-K results |
Advanced RAG Techniques
| Technique | Description |
|---|---|
| HyDE | Generate a hypothetical answer; embed it for retrieval instead of the raw query |
| Query Rewriting | Rephrase user query to improve retrieval before embedding |
| Multi-Query | Generate several query variants; retrieve and merge results |
| Contextual Compression | Trim irrelevant sentences from retrieved chunks before injection |
2.4 Amazon Bedrock Knowledge Bases
Bedrock Knowledge Bases provide a fully managed RAG pipeline — no infrastructure to provision or manage.
Supported Data Sources
| Data Source | Notes |
|---|---|
| Amazon S3 | PDFs, Word, HTML, TXT, Markdown, CSV |
| Confluence | Pages and spaces |
| Microsoft SharePoint | Sites and document libraries |
| Salesforce | Objects and records |
| Web Crawler | Public web pages |
Setup Flow
| Step | Action |
|---|---|
| 1 | Create Knowledge Base; select data source |
| 2 | Choose embedding model (Titan Embeddings or Cohere Embed) |
| 3 | Choose vector store (OpenSearch Serverless recommended) |
| 4 | Configure chunking strategy |
| 5 | Sync data source to trigger indexing |
| 6 | Invoke via Bedrock API or attach to a Bedrock Agent |
3. AI Agents
An AI agent is an LLM-powered system that autonomously understands a goal, plans a sequence of steps, calls external tools, observes results, and delivers a final response — often across multiple iterations.
3.1 Agent Loop and Architecture Patterns
ReAct Agent Loop
User Request
↓
[LLM: Reason — what step is needed?]
↓
[Identify tool or action]
↓
[Execute tool call]
↓
[Observe result] ──► More steps needed? ──► Yes ──► back to Reason
↓ No
[Final response to user]
Agent Architecture Patterns
| Pattern | Description | Use Case |
|---|---|---|
| Single Agent | One LLM with a set of tools | Simple tool-use workflows |
| Supervisor + Sub-Agents | Orchestrator delegates to specialized agents | Complex, multi-domain tasks |
| Multi-Agent | Parallel specialized agents collaborate | Research, writing, review pipeline |
| Plan-and-Execute | Agent creates a full plan first, then executes each step | Long-horizon, structured tasks |
Common Agent Tool Categories
| Category | Examples |
|---|---|
| Database Query | Run SQL against RDS; query DynamoDB |
| REST API Calls | Call CRM, ERP, or internal microservices |
| Knowledge Base Search | Semantic retrieval from Bedrock Knowledge Base |
| Code Execution | Run Python for calculations or data processing |
| File I/O | Read or write S3 objects |
| AWS Service Calls | Start EC2 instances; query CloudWatch metrics |
3.2 Amazon Bedrock Agents
Bedrock Agents provide a fully managed agent runtime.
| Component | Description |
|---|---|
| Foundation Model | The reasoning engine; Claude or Titan |
| Instructions | System prompt defining role, goals, and constraints |
| Action Groups | APIs the agent can call, defined by an OpenAPI schema, backed by AWS Lambda |
| Knowledge Bases | Document collections the agent can query via RAG |
| Guardrails | Safety filters applied to inputs and outputs |
Key Concept: Bedrock Agents expose a full trace of every reasoning step, action, and observation — essential for debugging agent behavior and satisfying auditability requirements.
4. Text-Based Applications
4.1 Summarization
Abstractive summarization (LLM-generated) produces new prose that captures the meaning of the source. Extractive summarization pulls key sentences verbatim from the source.
| Summarization Type | Description |
|---|---|
| Single-document | Summarize one article, report, or email |
| Multi-document | Summarize a collection of related documents |
| Hierarchical (Map-Reduce) | Split → summarize chunks → summarize the summaries; handles documents exceeding the context window |
Map-Reduce Pattern for Long Documents
Long Document
↓ Split into chunks
[Chunk 1] → [Summary 1]
[Chunk 2] → [Summary 2] ─► [Final Summary]
[Chunk 3] → [Summary 3]
4.2 Classification and Information Extraction
Traditional ML vs. LLM for Classification
| Dimension | Traditional ML | LLM-Based |
|---|---|---|
| Training data needed | Yes — hundreds to thousands of examples | No — zero-shot capable |
| New categories | Requires retraining | Change the prompt |
| Inference latency | Very fast | Slower |
| Cost per prediction | Very low | Higher per call |
Common Information Extraction Tasks
| Task | Extracted Data |
|---|---|
| Invoice processing | Vendor name, amount, date, line items |
| Resume parsing | Skills, education, work experience |
| Medical records | Diagnoses, medications, procedure dates |
| Contract analysis | Party names, effective dates, termination clauses |
5. Code Generation Applications
Amazon Q Developer is the primary AWS service for AI-assisted coding.
| Feature | Description |
|---|---|
| Inline Code Completion | Suggestions appear as you type in the IDE |
| Code Generation | Generate full functions from natural language comments |
| Code Explanation | Plain-language explanation of selected code |
| Code Review | Identify bugs, performance issues, and improvements |
| Unit Test Generation | Automatically write test cases for existing functions |
| Documentation Generation | Create docstrings and README files |
| Security Scanning | Detect OWASP Top 10 vulnerabilities in code |
| Code Transformation | Upgrade Java 8 to Java 17; migrate legacy codebases |
| CLI Completions | Suggest and explain terminal commands |
Supported Languages
Python, Java, JavaScript, TypeScript, C#, Go, Ruby, PHP, Rust, Kotlin, SQL, Shell, and more.
Exam Tip: Amazon Q Developer replaced Amazon CodeWhisperer. The security scanning and code transformation features are exclusive to Q Developer and not available in the original CodeWhisperer.
6. Image and Vision Applications
6.1 Image Generation
| Capability | Description | AWS Service |
|---|---|---|
| Text-to-Image | Generate an image from a text prompt | Titan Image Generator, Stable Diffusion (Bedrock) |
| Image Editing | In-paint, out-paint, or modify regions of an existing image | Titan Image Generator |
| Image Variation | Generate stylistically similar versions of an image | Titan Image Generator |
| Background Removal | Remove or replace image backgrounds | Titan Image Generator |
| AI Watermarking | Embed invisible watermarks to identify AI-generated images | Titan Image Generator |
6.2 Image Understanding
Multi-modal LLMs such as Claude 3 Vision and Amazon Nova accept both text and image inputs.
| Application | Description |
|---|---|
| Visual Q&A | "What products are shown in this photo?" |
| Document Understanding | Process invoices, forms, and charts from images |
| Accessibility | Auto-generate descriptive alt-text for images |
| Chart Analysis | Interpret graphs and extract data from visualizations |
Amazon Rekognition Key Features
| Feature | Description |
|---|---|
| Object Detection | Identify objects, scenes, and activities in images |
| Facial Analysis | Detect faces and analyze attributes (age range, emotion, pose) |
| Facial Recognition | Match faces against a custom collection |
| Text in Images (OCR) | Extract text visible in photos |
| Content Moderation | Detect explicit, violent, or unsafe content |
| Celebrity Recognition | Identify public figures |
| Custom Labels | Train a custom object or scene detector with your own images |
| Video Analysis | Apply all above features to video streams or files |
7. Conversational AI and Chatbots
7.1 Chatbot Types and Memory
Chatbot Architecture Comparison
| Type | Technology | Capability |
|---|---|---|
| Rule-based bot | Decision trees / scripts | Predefined, scripted conversation flows |
| Intent-based bot | Amazon Lex | Understands intents and extracts slot values |
| LLM chatbot | Amazon Bedrock | Open-ended, contextual conversation |
| RAG chatbot | Bedrock + Knowledge Base | Grounded in company documents |
| Agentic chatbot | Bedrock Agents | Can take actions; call APIs; execute workflows |
Amazon Lex vs. Amazon Bedrock for Chatbots
| Scenario | Use Lex | Use Bedrock |
|---|---|---|
| Structured form-filling or slot collection | ✓ | |
| Voice interface with telephony integration | ✓ | |
| Open-ended, free-form conversation | ✓ | |
| Complex multi-step reasoning | ✓ | |
| Knowledge base Q&A | ✓ |
Conversation Memory Types
LLMs are stateless. The full conversation history must be passed with each request. Different memory strategies manage this at scale.
| Memory Type | Description |
|---|---|
| Buffer Memory | Retain the last N turns verbatim |
| Summary Memory | Summarize older turns; retain recent turns verbatim |
| Entity Memory | Track key entities mentioned across the conversation |
| Vector Memory | Store past exchanges as embeddings; retrieve relevant ones |
8. Search and Information Retrieval
Keyword Search vs. Semantic Search
| Dimension | Keyword Search | Semantic Search |
|---|---|---|
| Matching basis | Exact word or phrase match | Meaning and intent |
| Query "car" | Returns documents containing "car" | Returns documents about vehicles and automobiles |
| Technology | BM25, TF-IDF | Embeddings + vector similarity |
| Strength | Known terminology, product codes | Natural language, ambiguous queries |
Amazon Kendra
Amazon Kendra is a fully managed enterprise search service powered by ML.
| Feature | Description |
|---|---|
| 40+ native connectors | S3, SharePoint, RDS, Salesforce, ServiceNow, and more |
| Semantic ranking | ML relevance ranking beyond keyword matching |
| FAQ matching | Returns exact answers from Q&A document pairs |
| Access control | Respects source document permissions per user |
| Incremental learning | Improves ranking from user click-through feedback |
Search Service Comparison
| Dimension | Kendra | OpenSearch | Bedrock Knowledge Bases |
|---|---|---|---|
| Setup effort | Low | Medium | Low |
| Search type | Semantic + keyword | Keyword, semantic, and vector | Semantic (RAG) |
| FM response | Limited | Manual integration | Native |
| Primary use | Enterprise document search | General-purpose search | GenAI Q&A |
9. Document Processing and Analysis
End-to-End Document Pipeline
Raw Documents (PDFs, scans, Word files)
↓
[Amazon Textract] → Extracted text, key-value pairs, tables
↓
[Amazon Comprehend] → Entities, sentiment, key phrases, PII
↓
[Amazon Bedrock] → Summarization, Q&A, classification, generation
↓
Structured output stored in RDS, DynamoDB, or S3
Amazon Comprehend Feature Reference
| Feature | What It Detects |
|---|---|
| Entity Recognition | People, organizations, places, dates, quantities |
| Key Phrase Extraction | Most important phrases in the text |
| Sentiment Analysis | Positive, Negative, Neutral, or Mixed |
| Language Detection | Language of the input text |
| Topic Modeling | Latent topics across a document collection |
| PII Detection | Personally identifiable information |
| Custom Classification | User-defined text categories |
| Custom Entity Recognition | User-defined entity types |
| Targeted Sentiment | Sentiment directed at specific named entities |
Amazon Comprehend Medical
Specialized for healthcare and clinical text:
- Detects anatomy, medical conditions, medications, dosages, and test results
- Identifies Protected Health Information (PHI) — HIPAA-eligible service
- Maps findings to standard ontologies: ICD-10-CM, RxNorm, SNOMED CT
10. Building GenAI Applications on AWS
10.1 AWS Services for GenAI Applications
| Service | Role in a GenAI Application |
|---|---|
| Amazon Bedrock | FM access, RAG, Agents, Guardrails |
| Amazon SageMaker | Build, train, and deploy custom models |
| AWS Lambda | Serverless compute for agent actions and API backends |
| Amazon API Gateway | REST or WebSocket API for the application |
| Amazon S3 | Store documents, training data, and model artifacts |
| Amazon OpenSearch | Vector search and full-text search |
| Amazon DynamoDB | Store chat history and user session state |
| Amazon RDS / Aurora | Relational data storage |
| AWS Step Functions | Orchestrate complex multi-step GenAI workflows |
| Amazon CloudWatch | Monitor logs, metrics, and set alarms |
| Amazon Cognito | User authentication for customer-facing apps |
| AWS IAM | Fine-grained access control for all services |
| AWS KMS | Encryption key management |
10.2 AWS Architecture Patterns
Pattern 1 — Simple Prompt Application
Best for content generation and basic Q&A with no private data.
User → API Gateway → Lambda → Amazon Bedrock → Response
Pattern 2 — RAG Application
Best for Q&A over private or proprietary documents.
Ingestion: S3 → Bedrock Knowledge Base (chunk + embed) → OpenSearch
Retrieval: User Query → Lambda → Bedrock RetrieveAndGenerate API → Response
Pattern 3 — Agentic Chatbot
Best for multi-turn conversations that require external actions.
User → API Gateway → Lambda
→ Bedrock Agent
→ Knowledge Base (retrieve context)
→ Action Group → Lambda → External API
→ DynamoDB (store chat history)
→ Response with citations
Pattern 4 — Document Processing Pipeline
Best for automated extraction and analysis at scale.
S3 Upload Event
→ Lambda
→ Textract (extract text and structure)
→ Comprehend (entities, sentiment, PII)
→ Bedrock (summarize or classify)
→ DynamoDB / RDS (store results)
→ SNS (notify downstream systems)
Pattern 5 — Streaming Response
Best for conversational UIs where text appears progressively.
User → API Gateway (WebSocket) → Lambda → Bedrock (streaming) → Tokens streamed in real time
11. Model Selection and Evaluation
11.1 Choosing the Right Model
Decision Factors
| Factor | Consideration |
|---|---|
| Task type | Text generation, classification, summarization, code, vision? |
| Context length | How long is the combined input and expected output? |
| Quality requirement | Mission-critical or best-effort? |
| Latency requirement | Under 500 ms? Under 2 s? |
| Cost budget | Per-token cost multiplied by expected volume |
| Language | English-only or multilingual? |
| Fine-tuning need | Does the model need customization? |
Model Size vs. Quality Trade-off
| Model Tier | Examples | Best For |
|---|---|---|
| Small / Fast | Claude Haiku, Llama 3 8B | Simple classification, routing, low-latency responses |
| Mid-tier | Claude Sonnet, Llama 3 70B | General Q&A, document analysis, chat |
| Large / Powerful | Claude Opus | Complex reasoning, multi-step analysis, legal and medical review |
Key Concept: Always right-size. Use the smallest model that meets your quality threshold. Overusing large models is the most common cause of unnecessary GenAI cost.
Bedrock Model Evaluation
| Evaluation Type | Description |
|---|---|
| Automatic Evaluation | Built-in metrics — accuracy, robustness, toxicity — run against your test prompts |
| Human Evaluation | Route sampled responses to human reviewers; compare models side by side |
11.2 Performance and Cost Optimization
Latency Optimization
| Technique | Effect |
|---|---|
| Choose a smaller model | Claude Haiku is significantly faster than Opus |
| Enable streaming | First token appears immediately; reduces perceived latency |
| Reduce prompt length | Shorter input = faster time to first token |
| Provisioned Throughput | Dedicated capacity eliminates cold-start variability |
| Async inference | Offload non-real-time work to asynchronous queues |
Cost Optimization
| Technique | Mechanism |
|---|---|
| Right-size the model | Haiku costs a fraction of Opus for simple tasks |
| Prompt caching | Cache repeated system prompts; pay only on cache miss |
| Limit max tokens | Set a reasonable upper bound on output length |
| Batch API | Use Bedrock Batch for large offline workloads |
| Model routing | Route simple queries to small models; escalate complex ones |
| Efficient retrieval | Retrieve only what is needed; avoid injecting excessive context |
Exam Tips & Quick Reference
Scenario-to-Answer Mapping
| Scenario Keyword / Requirement | Correct Answer |
|---|---|
| "LLM needs to answer questions from internal SharePoint docs" | Bedrock Knowledge Bases (RAG) |
| "LLM must book a meeting, query a database, and send an email" | Bedrock Agents |
| "Customer support bot follows a structured script with slots" | Amazon Lex |
| "Open-ended enterprise chatbot connected to company knowledge" | Bedrock + Knowledge Bases |
| "Extract structured data from scanned invoices" | Amazon Textract |
| "Analyze sentiment and entities in customer reviews" | Amazon Comprehend |
| "Detect objects and moderate content in uploaded images" | Amazon Rekognition |
| "Find documents by meaning, not just keyword match" | Semantic search; Amazon Kendra or OpenSearch |
| "AI coding assistant with security vulnerability scanning" | Amazon Q Developer |
| "Reduce cost of long repeated system prompts" | Bedrock prompt caching |
| "Model responses too slow for real-time use" | Switch to a smaller model (Haiku); enable streaming |
| "Generate product images from text descriptions" | Amazon Titan Image Generator |
| "Large document exceeds context window during summarization" | Map-Reduce summarization pattern |
Common Traps
- Kendra vs. Bedrock Knowledge Bases: Kendra is enterprise search — it finds documents. Knowledge Bases power RAG — the FM reads retrieved chunks and generates an answer. These solve different problems.
- Bedrock Agents vs. simple LLM calls: If a scenario requires calling an external API, executing code, or taking a multi-step action, the answer is Bedrock Agents — not a plain Bedrock model invocation.
- Amazon Lex vs. Bedrock for chatbots: Lex handles structured intents and slot-filling (e.g., booking a flight). Bedrock handles open-ended reasoning. The keywords "intent," "slot," or "script" point to Lex.
- Textract vs. Comprehend: Textract extracts raw text and structure from documents (OCR). Comprehend analyzes text that is already extracted (NLP). Use them together in a pipeline.
- Model size and cost: Larger models are not always better. The exam may ask for the "most cost-effective" approach — the correct answer will use the smallest model that meets the stated requirements.
Key Terms — Domain 3
| Term | One-Line Definition |
|---|---|
| RAG | Grounding FM responses by retrieving relevant context at inference time |
| Chunking | Splitting source documents into smaller segments for indexing and retrieval |
| Vector Database | Database optimized for similarity search over embedding vectors |
| Hybrid Search | Combining dense semantic search with sparse keyword search for best results |
| Bedrock Agent | An FM-powered system that autonomously plans and executes multi-step tasks |
| Action Group | A set of APIs a Bedrock Agent can call, defined by an OpenAPI schema |
| Map-Reduce Summarization | Summarize chunks individually, then summarize the summaries |
| Re-ranking | A second model pass to reorder retrieved results by relevance |
| Amazon Kendra | Managed enterprise search using NLP-based relevance ranking |
| Amazon Personalize | Managed real-time recommendation and personalization service |
| Amazon Textract | Service that extracts text, forms, and tables from scanned documents |
| Amazon Comprehend | NLP service for entity, sentiment, key phrase, and PII analysis |
| Semantic Search | Finding documents by meaning rather than exact word match |
| Streaming | Returning generated tokens progressively to reduce perceived latency |
| Provisioned Throughput | Reserved Bedrock model capacity for consistent performance |
| Prompt Caching | Caching static prompt prefixes to reduce repeated input token costs |
End of Domain 3. Continue to Domain 4: Guidelines for Responsible AI →
Ready to test yourself?
Practice questions for this topic