Dynamic Configuration
- AWS AppConfig is the recommended managed service for managing dynamic routing rules, feature flags, compliance policies, and cost thresholds for generative AI applications. It supports zero-code deployments of configuration changes, immediate propagation, and validation guardrails for high-risk updates.
| Feature | AppConfig | Parameter Store |
|---|---|---|
| Prompt versioning | Built-in | Manual |
| Gradual rollout | Yes (canary, linear) | No |
| Rollback | One click | Manual |
| Validation hooks | Yes (JSON schema) | No |
| A/B testing | Yes | No |
| Cost | Slightly higher | Free tier |
| Best for | Frequent changes, safe rollouts | Stable config, simple use cases |
Prompt Versioning
A classic prompt versioning strategy is to store prompt templates in AppConfig with semantic version numbers (e.g., “checkout-prompt v1.2.0”) and have the application fetch the latest version at runtime. This allows for iterative prompt improvements, A/B testing of different prompt versions, and quick rollbacks if a new prompt performs poorly, all without redeploying application code.
flowchart LR
AC["AWS AppConfig\n(prompt templates)"]
subgraph Runtime
L["Lambda / ECS /\nSageMaker"]
end
B["Amazon Bedrock\n(Claude / other model)"]
U["User Input"]
AC -->|fetch template at runtime| L
U -->|inject variables| L
L -->|filled prompt| B
B -->|response| L
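The runtime step in the diagram above can be sketched as follows. This is a minimal illustration, assuming the AppConfig payload is a JSON document with an `active_version` field and a `prompts` map; that layout, the `render_prompt` helper, and the sample templates are all hypothetical, not something AppConfig prescribes.

```python
import json
from string import Template
from typing import Optional

# Hypothetical AppConfig payload: one template per semantic version.
CONFIG_JSON = """
{
  "active_version": "1.2.0",
  "prompts": {
    "1.1.0": "Summarize the cart for $customer.",
    "1.2.0": "Summarize the cart for $customer in $max_words words or fewer."
  }
}
"""

def render_prompt(config: dict, variables: dict, version: Optional[str] = None) -> str:
    """Fill the requested (or currently active) prompt version with user variables."""
    chosen = version or config["active_version"]
    template = Template(config["prompts"][chosen])
    # substitute() raises KeyError when a variable is missing, so a
    # template/variable mismatch fails fast instead of reaching the model.
    return template.substitute(variables)

config = json.loads(CONFIG_JSON)
print(render_prompt(config, {"customer": "Ana", "max_words": 50}))
```

Rolling back in this sketch means changing `active_version` in the configuration; the application code is untouched.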
Data Engineering
Glue
AWS Glue Data Catalog is the AWS-recommended centralized metadata repository for GenAI data pipelines. It supports registration of heterogeneous data sources, custom metadata tagging for attribution, and native integration with other services used in GenAI workloads, including Bedrock, OpenSearch Service, and SageMaker, enabling unified data discovery, lineage tracking, and governance across the GenAI ecosystem.
Glue combines serverless data cataloging, ETL, and built-in data quality capabilities to create end-to-end unstructured data processing pipelines for generative AI use cases without the need to integrate disparate tools.
Textract
Amazon Textract extracts structured data from scanned documents, and Amazon A2I adds human-in-the-loop review of low-confidence outputs. Combining the two is a common pattern for regulated GenAI workloads that process unstructured data.
Amazon Comprehend
Amazon Comprehend is a fully managed natural language processing service that includes pre-trained PII detection/redaction, toxicity detection and prompt safety classification capabilities, removing the need for developers to build, train, and maintain custom PII identification models for GenAI workloads that process customer data.
Multi-FM Routing Architecture
For Amazon Bedrock, centralized routing layers that pull dynamic configuration from a managed service enable organizations to switch between FMs (foundation models), run A/B tests, enforce regional compliance, and optimize cost without modifying client integrations or core application code.
flowchart LR
%% Client layer
A[Client Application] --> B[Amazon API Gateway REST API]
%% API Gateway to Lambda
B --> C[Lambda Router Function]
%% AppConfig integration
C --> D[AWS AppConfig<br/>FM Selection Config]
%% Decision logic
D --> E{Model Routing Logic}
%% Foundation models (Bedrock)
E -->|chat| F1[Amazon Bedrock<br/>Claude]
E -->|summarize| F2[Amazon Bedrock<br/>Nova]
E -->|code| F3[Amazon Bedrock<br/>Llama]
%% Responses
F1 --> G[Response]
F2 --> G
F3 --> G
%% Return path
G --> B
B --> A
Serverless Practices:
- Using Lambda with the AppConfig Agent layer provides low-latency, high-concurrency support for real-time generative AI use cases while reducing operational overhead, as the agent handles configuration caching and updates automatically without custom code.
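A minimal sketch of the Lambda router reading its routing rules through the AppConfig Agent, which serves configuration on a local HTTP endpoint (port 2772 by default). The application, environment, and profile names, the `routes`/`default` layout of the configuration document, and the placeholder model names are assumptions for illustration only.

```python
import json
import urllib.request

# Hypothetical AppConfig application, environment, and configuration profile.
APP, ENV, PROFILE = "genai-app", "prod", "fm-routing"

def appconfig_url(app: str, env: str, profile: str, port: int = 2772) -> str:
    """Local endpoint exposed by the AppConfig Agent Lambda extension."""
    return (f"http://localhost:{port}/applications/{app}"
            f"/environments/{env}/configurations/{profile}")

def load_routing(fetch=None) -> dict:
    """Fetch routing rules from the agent; `fetch` is injectable for testing."""
    fetch = fetch or (lambda url: urllib.request.urlopen(url, timeout=2).read())
    return json.loads(fetch(appconfig_url(APP, ENV, PROFILE)))

def choose_model(routing: dict, task: str) -> str:
    """Pick the model for a task, falling back to the configured default."""
    return routing["routes"].get(task, routing["default"])
```

Because the agent caches and refreshes the configuration in the background, the function can call `load_routing()` on every invocation without adding a network round trip to AppConfig itself.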
Bedrock Guardrails
The table below covers five types of guardrail policies that can be implemented with Amazon Bedrock to ensure responsible and compliant use of generative AI models:
| Guardrail Type | Purpose | What it Blocks | Example Action |
|---|---|---|---|
| Content Filters | Detect harmful/sensitive content | Hate, Insults, Sexual, Violence, Misconduct, and Prompt attack content, each with a configurable filter strength | Refuse or rewrite response |
| Denied Topics | Business policy enforcement | Custom topics that you name and define, such as financial advice, medical diagnosis, or internal data | Hard refusal |
| PII Redaction | Protect personal data | Emails, phone numbers, IDs, and addresses, selected from built-in PII types or matched with custom regex | Mask or remove data |
| Word Filters | Block specific words and phrases | Offensive language, slurs, or company names, entered manually or uploaded from a file | Refuse or rewrite response |
| Grounding Control | Prevent hallucination | Unsupported facts not in the context, based on configurable grounding and relevance score thresholds | Refuse or restrict to sources |
Guardrails can be applied to input prompts or output responses. They can be attached to Bedrock invocations such as InvokeModel and Converse, or called independently through the ApplyGuardrail API. When a guardrail is triggered, the system can either refuse to process the request, sanitize the input/output, or provide a warning message or masked response, depending on the severity and type of violation.
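The independent evaluation path can be sketched as below. The request fields follow the ApplyGuardrail operation of the Bedrock Runtime API as I understand it; the `check_text` helper, the stub client, and the guardrail ID are hypothetical. In a real application the client would be `boto3.client("bedrock-runtime")`.

```python
def check_text(client, guardrail_id: str, version: str, text: str, source: str = "INPUT") -> dict:
    """Evaluate text with ApplyGuardrail and summarize the outcome."""
    response = client.apply_guardrail(
        guardrailIdentifier=guardrail_id,
        guardrailVersion=version,
        source=source,  # "INPUT" for prompts, "OUTPUT" for model responses
        content=[{"text": {"text": text}}],
    )
    intervened = response["action"] == "GUARDRAIL_INTERVENED"
    # When the guardrail intervenes, outputs carry the blocked message or masked text.
    outputs = response.get("outputs", [])
    final_text = outputs[0]["text"] if intervened and outputs else text
    return {"intervened": intervened, "text": final_text}

class StubClient:
    """Stand-in for the real client so the sketch runs without AWS access."""
    def apply_guardrail(self, **kwargs):
        text = kwargs["content"][0]["text"]["text"]
        if "@" in text:  # pretend an email address triggers PII masking
            return {"action": "GUARDRAIL_INTERVENED", "outputs": [{"text": "Contact: {EMAIL}"}]}
        return {"action": "NONE", "outputs": []}

print(check_text(StubClient(), "gr-hypothetical", "1", "Contact: ana@example.com"))
```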
Prompt Engineering Best Practices
Version Control and Collaboration
Amazon Bedrock Prompt Management enables centralized, versioned management of prompt templates and use case-specific variants across teams, with access controls for approval workflows, eliminating the need for custom prompt orchestration infrastructure. We can test prompts directly against foundation models and iterate on prompt design in a collaborative environment, ensuring consistency and quality across applications.
Hallucination Mitigation
To mitigate hallucination in generative AI applications:
- We can use grounding techniques such as retrieval-augmented generation (RAG) to provide the model with relevant context from a trusted knowledge base.
- Configure Bedrock guardrails to set grounding score thresholds that restrict responses to be based on provided context.
- Use Chain-of-Thought with fact verification steps to encourage the model to reason through its response and check facts against the context before finalizing the answer.
RAG Pipeline
Vector Store Options for RAG
There are several services that we can use as a vector store for RAG (retrieval-augmented generation) applications. We need to choose the storage option based on scale, latency, operational overhead, and integration capabilities for generative AI RAG workloads. This includes understanding the trade-offs between fully managed serverless options like OpenSearch Serverless, relational options like Aurora PostgreSQL with pgvector where you still size and tune the database, specialty stores like Neptune Analytics, and object storage-based options like S3 Vectors.
| Option | Type | Management | Latency | Scale | Best For | Metadata Filtering |
|---|---|---|---|---|---|---|
| OpenSearch Serverless | Purpose-built vector + full-text | Fully managed, serverless | Low (ms) | Auto-scales | General RAG, hybrid search (vector + keyword) | Yes: use a filter clause in the k-NN query (e.g. {"term": {"metadata.field": "value"}}) to pre-filter documents before vector scoring |
| Aurora PostgreSQL + pgvector | Relational + vector extension | Managed database (RDS/Aurora); you size and tune the cluster | Low–medium | Vertical + read replicas | Apps already on PostgreSQL, structured + vector queries | Yes: standard SQL WHERE clause on any column alongside the <=> vector distance operator (e.g. WHERE category = 'finance' ORDER BY embedding <=> $1) |
| Neptune Analytics | Graph + vector | Fully managed | Low | Managed cluster | Graph-based RAG, entity relationship queries | Yes: filter by graph properties in openCypher queries before or after vector similarity (e.g. WHERE n.type = 'document') |
| S3 Vectors | Object storage + vector index | Fully managed, serverless | Medium | Massive (exabyte) | Low-cost, infrequent-access, large-scale archival RAG | Yes: attach key-value metadata to each vector at ingest; pass a filter expression in the QueryVectors API call to narrow candidates before ANN search |
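The metadata filtering column describes the same idea in every store: narrow the candidate set by metadata first, then rank the survivors by vector similarity. A store-agnostic sketch of that pre-filter step in plain Python follows; the `filtered_search` helper and the sample documents are hypothetical, and a real vector store would use an index rather than a linear scan.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def filtered_search(query, items, where, k=2):
    """Pre-filter on exact metadata matches, then rank by cosine similarity."""
    candidates = [
        item for item in items
        if all(item["metadata"].get(field) == value for field, value in where.items())
    ]
    ranked = sorted(candidates, key=lambda item: cosine(query, item["vector"]), reverse=True)
    return [item["id"] for item in ranked[:k]]

DOCS = [
    {"id": "a", "vector": [1.0, 0.0], "metadata": {"category": "finance"}},
    {"id": "b", "vector": [0.9, 0.1], "metadata": {"category": "legal"}},
    {"id": "c", "vector": [0.0, 1.0], "metadata": {"category": "finance"}},
]
print(filtered_search([1.0, 0.0], DOCS, {"category": "finance"}))
```

Document "b" is close to the query vector but never scored, because the metadata filter removes it before similarity is computed.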
Bedrock Knowledge Bases Integration
Knowledge Bases for Amazon Bedrock provide native integration with supported vector stores to eliminate custom RAG workflow development, reduce operational overhead, and simplify connecting enterprise data to foundation models.
Bedrock Knowledge Bases is a managed RAG orchestration layer that sits on top of those same vector stores. You point it at an S3 bucket or other data source and a supported vector store, and Bedrock handles:
- automatic chunking & embedding (you can choose from a variety of embedding models and configure chunk size and overlap to optimize for your data and use case)
- syncing new documents
- retrieval + reranking
- injecting context into the prompt before calling the FM
- source attribution to link generated responses to original source documents
Native reranking is a fully managed, built-in feature of Knowledge Bases for Amazon Bedrock that improves retrieval relevance by reordering the initially retrieved chunks based on their contextual match to the user query.
Knowledge Bases also supports granular access controls and cross-account querying, making it well suited to multi-tenant GenAI service use cases where customers retain data ownership.
Metadata Filtering
Bedrock Knowledge Bases supports metadata filtering, which lets users index metadata from S3 data sources, including system attributes like upload date and custom user-defined attributes like content type, and apply filter expressions during retrieval to narrow the search scope. This improves result relevance and reduces latency without modifying core models or architectures.
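A filter expression is passed inside the retrieval configuration of the Retrieve request. The sketch below only builds the request dictionary; the `build_retrieve_request` helper, the `content_type` metadata key, and the knowledge base ID are hypothetical, and the field names reflect my understanding of the `bedrock-agent-runtime` Retrieve operation.

```python
def build_retrieve_request(kb_id: str, query: str, content_type: str, top_k: int = 5) -> dict:
    """Build keyword arguments for a Retrieve call that filters on one metadata key."""
    return {
        "knowledgeBaseId": kb_id,
        "retrievalQuery": {"text": query},
        "retrievalConfiguration": {
            "vectorSearchConfiguration": {
                "numberOfResults": top_k,
                # Only chunks whose metadata matches are considered for retrieval.
                "filter": {"equals": {"key": "content_type", "value": content_type}},
            }
        },
    }

request = build_retrieve_request("KB-HYPOTHETICAL", "What is the refund policy?", "policy")
# With AWS access, this would be sent as:
#   boto3.client("bedrock-agent-runtime").retrieve(**request)
print(request["retrievalConfiguration"])
```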
RetrieveAndGenerate
We often use the Retrieve API and the RetrieveAndGenerate API to connect Bedrock models to the knowledge base. The RetrieveAndGenerate API passes retrieved context directly into the model prompt and returns a response in a single call, while the Retrieve API gives us more control by letting us handle the retrieved context separately before invoking the model.
RetrieveAndGenerate covers query embedding, relevant chunk retrieval, context window optimization, and FM inference, eliminating the need for custom context management logic.
Bedrock Knowledge Bases supports configurable trade-offs between retrieval accuracy and latency via the PerformanceConfig parameter. Setting the latency optimization option is a native, managed way to reduce response times for real-time use cases without requiring custom infrastructure changes.
Bedrock Flows
Amazon Bedrock Flows is a low-code orchestration feature for building generative AI workflows, which supports integration of managed prompts, guardrails, foundation models, and custom logic to streamline end-to-end generative AI application development.
Streaming
The Amazon Bedrock streaming API (InvokeModelWithResponseStream) enables incremental delivery of foundation model generated content as it is produced, rather than waiting for the full response to be complete.
Amazon API Gateway WebSocket APIs provide persistent bidirectional communication between clients and backend services, supporting low-latency delivery of incremental updates without the overhead of stateless HTTP polling. This is the recommended API type for long-running real-time interactions with generative AI applications.
A classic architecture pattern for streaming GenAI applications is to use API Gateway WebSocket APIs with an AWS Lambda integration. The Lambda function invokes the Amazon Bedrock InvokeModelWithResponseStream API and streams partial responses back through the WebSocket connection.
sequenceDiagram
participant Client
participant APIGW as API Gateway<br/>WebSocket API
participant Lambda
participant Bedrock as Amazon Bedrock<br/>InvokeModelWithResponseStream
Client->>APIGW: WebSocket connect ($connect)
APIGW->>Lambda: invoke $connect handler
Lambda-->>APIGW: 200 OK (connection established)
APIGW-->>Client: connection ID assigned
Client->>APIGW: send message (prompt)
APIGW->>Lambda: invoke $default handler<br/>+ connectionId + prompt
Lambda->>Bedrock: InvokeModelWithResponseStream(prompt)
loop streaming chunks
Bedrock-->>Lambda: chunk 1 "The answer..."
Lambda->>APIGW: PostToConnection(connectionId, chunk 1)
APIGW-->>Client: chunk 1
Bedrock-->>Lambda: chunk 2 " is 42..."
Lambda->>APIGW: PostToConnection(connectionId, chunk 2)
APIGW-->>Client: chunk 2
end
Bedrock-->>Lambda: stream complete
Lambda->>APIGW: PostToConnection(connectionId, [DONE])
APIGW-->>Client: [DONE]
Client->>APIGW: WebSocket disconnect ($disconnect)
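The streaming loop in the diagram can be sketched as a small relay function. This is an illustration under assumptions: `relay_stream` is a hypothetical helper, the event stream is taken to be the `body` of an `invoke_model_with_response_stream` response, and `post` is taken to be the `post_to_connection` method of an `apigatewaymanagementapi` client. Chunk bytes are forwarded unparsed; in practice they are model-specific JSON that you would usually decode first.

```python
def relay_stream(event_stream, post, connection_id: str) -> int:
    """Forward each streamed chunk to the WebSocket client, then signal completion."""
    sent = 0
    for event in event_stream:
        chunk = event.get("chunk")
        if not chunk:
            continue  # skip events that carry no content chunk
        post(ConnectionId=connection_id, Data=chunk["bytes"])
        sent += 1
    # The [DONE] marker mirrors the diagram; it is an application-level convention.
    post(ConnectionId=connection_id, Data=b"[DONE]")
    return sent

# Local demonstration with a fake stream and a fake PostToConnection.
frames = []
fake_stream = [{"chunk": {"bytes": b"The answer"}}, {"chunk": {"bytes": b" is 42"}}]
relay_stream(fake_stream, lambda ConnectionId, Data: frames.append(Data), "conn-1")
print(frames)
```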
HTTP API vs REST API
- HTTP API — you just need a fast, cheap proxy to Lambda or HTTP backends with basic auth
- REST API — you need advanced features: throttling, caching, WAF, API keys, request transformation
Inference
CRI
Bedrock Cross-Region Inference (CRI) uses geographic inference profiles to let users access foundation models from their home region, with enforceable guarantees that all inference data remains within the specified geographic boundary, eliminating the need to directly invoke models in other regions and simplifying data residency compliance.
An inference profile is a named handle for CRI, e.g. eu.amazon.nova-pro-v1:0 covers all EU regions that serve the model.
Single model invoke
client.invoke_model(
    modelId="amazon.nova-pro-v1:0",  # no region prefix, just the model ID
    ...
)
CRI-based invoke
client.invoke_model(
    modelId="eu.amazon.nova-pro-v1:0",  # the eu. prefix tells Bedrock to use CRI
    ...
)
Request Batching
Request batching for foundation model inference is a performance optimization technique that groups multiple independent inference requests into a single API call to reduce per-request API overhead, increase effective throughput, and lower overall latency for high-volume real-time workloads.
Bedrock App
AgentCore
Amazon Bedrock AgentCore is a fully managed service that natively manages persistent memory across agent interactions, session-aware reasoning, and built-in IAM integration for access control and session-based permissions. It supports both synchronous and event-driven invocation, with built-in observability and event handling. In event-driven invocation, an event such as an S3 upload, an SQS message, or a scheduled trigger starts the agent, and the caller does not wait for the response. These capabilities are available without custom development, which makes AgentCore a highly scalable choice.
Bedrock AgentCore natively integrates with common AWS services including Lambda, API Gateway, and EventBridge to register and invoke custom tool actions without custom orchestration code, improving scalability and reducing operational overhead.
Bedrock Agents include built-in tracing functionality that captures full interaction logs, FM reasoning steps, and RAG source references out of the box, eliminating the need to build custom audit logging pipelines for compliance use cases.
Bedrock Throughput Routing
Provisioned Throughput provides dedicated model capacity for consistent, predictable low-latency performance at scale. To use purchased Provisioned Throughput, invoke requests must reference the unique provisioned model ARN returned by the CreateProvisionedModelThroughput API, rather than the public base model ID, which defaults to shared on-demand capacity.
The modelId parameter in Bedrock Runtime invoke operations accepts three types of values: public base model IDs, custom fine-tuned model ARNs, and provisioned model ARNs. The value passed to this parameter determines which capacity pool and model instance is used to process the request.
Bedrock Retry
Exponential backoff with jitter is the official recommended retry strategy for Bedrock API calls to avoid thundering herd effects, reduce throttling events, and improve invocation success rates during peak usage.
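A minimal sketch of exponential backoff with full jitter follows. `ThrottledError` is a hypothetical stand-in for whatever throttling exception your SDK raises, and the base delay, cap, and attempt count are illustrative values rather than AWS recommendations.

```python
import random
import time

class ThrottledError(Exception):
    """Hypothetical stand-in for an SDK throttling exception."""

def backoff_delay(attempt: int, base: float = 0.5, cap: float = 20.0, rng=random.random) -> float:
    """Full jitter: a random delay between 0 and min(cap, base * 2**attempt)."""
    return rng() * min(cap, base * (2 ** attempt))

def call_with_retry(fn, max_attempts: int = 5, sleep=time.sleep, rng=random.random):
    """Call fn, retrying throttled calls with jittered exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(backoff_delay(attempt, rng=rng))
```

The randomization is what prevents the thundering herd: clients that were throttled at the same moment retry at different times instead of in lockstep.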
Circuit breaker patterns can be implemented using AWS Step Functions to monitor Bedrock API call success rates and automatically pause or reroute traffic when error thresholds are exceeded, providing an additional layer of protection against cascading failures in generative AI applications.
Bedrock streaming response resilience: resuming a streaming response from the last successfully received chunk, instead of restarting the entire stream, minimizes redundant data transfer and avoids broken response chunks.
Vector Database
Amazon MemoryDB for Redis is an in-memory, Redis-compatible database with native k-NN vector search support. Its Flat index algorithm performs an exact brute-force comparison of the query vector against all stored vectors, delivering exact results (100 percent accuracy), which suits workloads where maximum accuracy is the priority.
Bedrock Intelligent Prompt Routing
This is a native managed Bedrock feature that automatically routes user prompts to the most appropriate foundation model based on prompt content, cost constraints, and performance requirements, eliminating the need for custom routing logic.
Routing high-volume, low-complexity queries to smaller low-cost models and low-volume, high-complexity queries to larger higher-cost models reduces overall inference spend while maintaining output quality.
Evaluation
Human-in-the-loop
For regulated, high-risk use cases like clinical decision support, a hybrid evaluation approach combining automated LLM-as-judge screening and targeted human review delivers the optimal balance of accuracy, cost efficiency, and compliance, as fully automated evaluation may miss critical edge cases while full human review is prohibitively expensive.
Amazon Bedrock provides pre-configured, purpose-built evaluation capabilities for RAG workflows, including metrics for retrieval precision, response relevance, and hallucination detection, which can be used to monitor and optimize the performance of generative AI applications over time.
Bedrock Managed Model Evaluation
This native capability supports evaluation of Bedrock foundation models using custom datasets and pre-built accuracy metrics for common generative AI tasks, eliminating the need to build custom evaluation pipelines and reducing operational overhead.
Bedrock Evaluation
Bedrock Evaluation enables automated, scalable assessment of large language model outputs against custom or built-in quality metrics using judge models, reducing reliance on full manual review.
Evaluation type
Bedrock supports two primary RAG evaluation types:
- Retrieve-only: Assesses only retrieval component performance.
- Retrieve-and-generate: Assesses end-to-end RAG performance including both retrieval and generation, which is required for full system validation.
Metric
- Precision@K: Precision@K (P@K) is a standard retrieval metric that measures the share of relevant chunks among the top K retrieved results for a given query. It answers: “Of the top K results I returned, how many were actually relevant?” Because chunking design directly impacts retrieval relevance, P@K is directly useful for comparing the effectiveness of different chunking strategies.
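As a worked example, Precision@K can be computed in a few lines. The chunk IDs and the relevance set below are made up for illustration; this sketch divides by K, the usual convention, even when fewer than K results were returned.

```python
def precision_at_k(retrieved: list, relevant: set, k: int) -> float:
    """Share of the top-k retrieved chunk IDs that appear in the relevant set."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for chunk_id in retrieved[:k] if chunk_id in relevant)
    return hits / k

retrieved = ["c1", "c7", "c3", "c9"]   # ranked retrieval output (hypothetical)
relevant = {"c1", "c3", "c4"}          # ground-truth relevant chunks (hypothetical)
print(precision_at_k(retrieved, relevant, k=3))
```

Here two of the top three results (c1 and c3) are relevant, so P@3 is 2/3. Running the same queries against two chunking strategies and comparing the averaged P@K gives a direct measure of which one retrieves better.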
Generative AI Evaluation Metrics
There are several official metrics used to understand model performance.
- BERT Score: BERT Score is a standard accuracy metric for summarization that measures semantic similarity between generated outputs and reference summaries.
- RWK Score: Real World Knowledge (RWK) Score measures the factual accuracy of text generation outputs.
BERT Score answers “does this sound like the reference?” while RWK Score answers “is this factually correct?”
Amazon Augmented AI (A2I)
Amazon A2I is a workflow integration capability that routes low-confidence, high-risk, or flagged generative AI outputs to human reviewers for targeted validation, balancing automation efficiency with human oversight to reduce risk in high-stakes use cases.
Deployment
Metric-Gated Canary Deployment with Step Functions
Traffic shifts incrementally (10% → 25% → 50% → 75% → 100%). After each shift, a Lambda function queries CloudWatch and decides whether to continue or roll back.
flowchart TD
A([CI/CD trigger]) --> B[ShiftTrafficLambda\ncanary = 10%]
B --> C[⏱ Wait State\nStep Functions — free idle time]
C --> D[MetricEvaluatorLambda\nquery CloudWatch:\nerror rate · p99 latency · alarms]
D --> E{Quality Gate}
E -- FAIL --> F[RollbackLambda\ncanary = 0%]
E -- PASS\ncanary < 100% --> G[ShiftTrafficLambda\ncanary += next step]
E -- PASS\ncanary = 100% --> H[PromoteLambda\nremove old version]
G --> C
F --> I[SNS: Rollback alert]
H --> J[SNS: Deploy success]
Step Functions orchestrates the loop — Wait states avoid billing Lambda during idle stabilization windows. Each MetricEvaluatorLambda invocation checks error rate (> 1% → rollback), P99 latency (> 2× baseline → rollback), and any composite CloudWatch alarm in ALARM state.
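The decision logic inside the evaluator can be sketched as two pure functions. The thresholds come from the description above; the function names and the way metrics are passed in are hypothetical, and a real evaluator would read these values from CloudWatch.

```python
STEPS = [10, 25, 50, 75, 100]  # canary percentages from the rollout plan

def evaluate_gate(error_rate: float, p99_ms: float, baseline_p99_ms: float, alarm_active: bool) -> str:
    """Return "PASS" or "FAIL" using the three rollback conditions."""
    if error_rate > 0.01:              # error rate above 1%
        return "FAIL"
    if p99_ms > 2 * baseline_p99_ms:   # p99 latency above 2x baseline
        return "FAIL"
    if alarm_active:                   # composite CloudWatch alarm in ALARM state
        return "FAIL"
    return "PASS"

def next_action(current_percent: int, gate: str):
    """Map the gate result to the next state machine action and canary percentage."""
    if gate == "FAIL":
        return ("rollback", 0)
    if current_percent >= 100:
        return ("promote", 100)
    return ("shift", next(step for step in STEPS if step > current_percent))

print(next_action(10, evaluate_gate(0.004, 180.0, 100.0, False)))
```

Keeping the gate free of AWS calls makes it easy to unit test, and the Step Functions Choice state only needs to branch on the returned action.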
Step Functions
AWS Step Functions is a serverless orchestration service that enables developers to build and coordinate complex workflows using visual state machines. It provides native support for error handling, retries, parallel execution, and integration with over 200 AWS services, making it an ideal choice for orchestrating multi-step generative AI applications that require coordination between various components such as data preprocessing, model inference, post-processing, and monitoring. For fixed, deterministic linear workflows, Step Functions is more cost-effective and simpler than agent-based tools.
Step Functions has a payload quota that enforces a hard 256 KB limit on data passed directly between states in a workflow. We can pass references to external storage (e.g., S3 object keys) instead of large payloads to work around this limit for generative AI workflows that involve large inputs or outputs.
ResultSelector and ResultPath are native Step Functions features that let users filter, reshape, and route task outputs between states without custom code.
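The reference-passing workaround can be sketched as a helper that a task runs before returning its result. The `to_state_payload` name, the `inline`/`s3_ref` result shape, and the injected `put_object` callable are hypothetical; in practice `put_object` could be the method of the same name on a boto3 S3 client.

```python
import json

MAX_STATE_BYTES = 256 * 1024  # payload quota between states

def to_state_payload(data, put_object, bucket: str, key: str) -> dict:
    """Return data inline if it fits the quota, otherwise store it and return a reference."""
    body = json.dumps(data).encode("utf-8")
    if len(body) <= MAX_STATE_BYTES:
        return {"inline": data}
    # Too large for state data: write to object storage and pass only the pointer.
    put_object(Bucket=bucket, Key=key, Body=body)
    return {"s3_ref": {"bucket": bucket, "key": key}}

# Local demonstration with an in-memory stand-in for S3.
store = {}
def fake_put(Bucket, Key, Body):
    store[(Bucket, Key)] = Body

result = to_state_payload({"text": "x" * 300_000}, fake_put, "my-bucket", "runs/1/output.json")
print(result)
```

Downstream states then read the object by its key, so the state machine itself only ever carries a few bytes of pointer data. A production version would likely leave some headroom below the limit, since the wrapper fields also count toward the payload size.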
Step Function Callback Pattern
This is the AWS-recommended pattern for adding human approval steps to serverless workflows. It uses the .waitForTaskToken service integration pattern to pause execution until an external process (including human review) returns the task token, eliminating the need for custom state management logic.
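A sketch of both halves of the callback pattern: the Task state that pauses, expressed as a Python dictionary in Amazon States Language form, and the call a reviewer-facing backend makes to resume the execution. The queue URL, the `draft` field, the state names, and both helper functions are hypothetical; `send_task_success` is the Step Functions client operation that returns the token.

```python
import json

def approval_state(queue_url: str, next_state: str) -> dict:
    """Task state that sends the task token to a queue and waits for a callback."""
    return {
        "Type": "Task",
        "Resource": "arn:aws:states:::sqs:sendMessage.waitForTaskToken",
        "Parameters": {
            "QueueUrl": queue_url,
            "MessageBody": {
                "taskToken.$": "$$.Task.Token",  # token taken from the context object
                "draft.$": "$.draft",            # hypothetical field under review
            },
        },
        "Next": next_state,
    }

def approve(sfn_client, task_token: str, reviewer: str) -> None:
    """Resume the paused execution once a human has approved."""
    sfn_client.send_task_success(
        taskToken=task_token,
        output=json.dumps({"approved": True, "reviewer": reviewer}),
    )

print(json.dumps(approval_state("https://example.invalid/review-queue", "Publish"), indent=2))
```

The execution stays paused in the approval state until the token comes back (or a configured timeout fires).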
Native Integration and Intrinsic Functions
Native service integrations allow Step Functions to call AWS services like S3 and Amazon Bedrock directly without wrapping calls in Lambda, reducing operational overhead. Intrinsic functions enable data parsing and transformation directly within Pass states, eliminating the need for external compute for simple data operations.
Features
SageMaker Feature Store is a centralized repository for storing, sharing, and reusing ML features. It solves the problem of teams recomputing the same features independently and ensures training/serving consistency. It has online and offline feature stores, supports feature versioning, and integrates with SageMaker training and inference services.
The feature store solves the train/serve skew — if your training pipeline computes user_avg_spend_30d differently than your inference service, your model performs well offline but poorly in production. A feature store enforces a single computation that both use.
Workflow Orchestration
Step Functions provides native support for parallel execution of independent tasks, which is a critical pattern for reducing end-to-end latency for multi-step generative AI workloads that require multiple foundation model calls.
Audit and Compliance
IAM
Implement least-privilege IAM policies to enforce approval workflows, restrict prompt modification and publishing to authorized approver roles, and enforce consistent quality standards for GenAI content.
SCP
Service Control Policies (SCPs) are a type of policy that you can use to manage permissions in your AWS organization. Explicit denies in SCPs override any permissions allowed at the IAM level, so required resources such as approved Bedrock inference profiles must be explicitly excluded from deny rules to enable access while maintaining broader governance controls.
Bedrock Permissions
Permissions for Bedrock operations can be scoped to specific resources, including per-region model ARNs and CRI inference profiles, allowing administrators to enforce approved access methods such as CRI only, rather than unregulated direct cross-region model access.
Bedrock Guardrails
Every time a guardrail triggers (or even evaluates), Bedrock logs it. This creates a record of what was blocked, when, and why. We can also use custom blocked responses to include unique identifiers or codes in the logs for easier searching and monitoring of specific guardrail events.
Observability
AWS CloudTrail
CloudTrail provides comprehensive logging of API calls, including those to Amazon Bedrock and related services. It can log API actions for Amazon Bedrock, including prompt creation, modification, access, and usage, to meet audit trail requirements for generative AI governance.
Immutable Audit Trails: CloudTrail logs are immutable and stored in S3, ensuring that audit records of Bedrock interactions cannot be tampered with, which is critical for compliance and forensic investigations.
CloudWatch
CloudWatch provides unified observability for managed services. It offers integrated, fully managed capabilities for custom dashboarding, log analysis via Logs Insights, and threshold alerting via CloudWatch Alarms, making it the lowest-overhead monitoring solution for AWS-native services like Bedrock.
Sensitive Data Classification
This core concept covers the use of automated tools like Amazon Macie to discover, classify, and protect sensitive data used in GenAI input and output datasets to meet regulatory data protection requirements for PII, financial data, and other regulated data types.
CloudWatch Anomaly Detection
CloudWatch Anomaly Detection is a fully managed CloudWatch feature that uses machine learning to establish normal baselines for Bedrock token usage and performance metrics, automatically flagging deviations that indicate cost anomalies or unexpected usage without requiring manual static threshold configuration.
CloudWatch Embedded Metric Format
EMF is a format that lets you embed custom metrics directly in log data; CloudWatch automatically extracts them for monitoring and alerting. EMF supports custom dimensions to segment metrics by attributes like user segment or request type, enabling correlation of performance issues to specific user groups or use cases.
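A minimal sketch of an EMF log record builder. The `_aws` envelope with `Timestamp` and `CloudWatchMetrics` follows the Embedded Metric Format specification as I understand it; the namespace, metric name, and dimension used here are hypothetical.

```python
import json
import time

def emf_record(namespace, metric, value, unit, dimensions, timestamp_ms=None) -> str:
    """Build one EMF log line with a single metric and one dimension set."""
    ts = timestamp_ms if timestamp_ms is not None else int(time.time() * 1000)
    record = {
        "_aws": {
            "Timestamp": ts,
            "CloudWatchMetrics": [{
                "Namespace": namespace,
                "Dimensions": [list(dimensions)],  # dimension names only
                "Metrics": [{"Name": metric, "Unit": unit}],
            }],
        },
        metric: value,  # the metric value lives at the top level of the record
    }
    record.update(dimensions)  # dimension values also live at the top level
    return json.dumps(record)

# In Lambda, printing this line to stdout sends it to CloudWatch Logs.
print(emf_record("GenAI/App", "InvocationLatency", 412, "Milliseconds", {"RequestType": "chat"}))
```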
CloudWatch Application Insights
CloudWatch Application Insights helps you monitor your applications that use Amazon EC2 instances along with other application resources.
CloudWatch Logs Insights
With CloudWatch Logs Insights, you can interactively search and analyze your log data in Amazon CloudWatch Logs. You can run queries to respond to operational issues more efficiently and effectively.
Bedrock Invocation Logging
Model invocation logging is a native Bedrock feature that logs model invocation details, including input content, output content, token counts, and invocation metadata, to S3 or CloudWatch Logs with minimal configuration. It supports compliance, auditing, and performance monitoring without custom logging code.
Architecture for Generative AI Applications
The API Gateway REST API
- Auth at the edge — IAM, Cognito, or API keys before any Lambda invocation, so the Lambda doesn’t handle auth logic
- Throttling — prevents Lambda from being overwhelmed if traffic spikes, which matters when each invocation calls a Bedrock FM (expensive + has its own rate limits)
- Stable public endpoint — the client always hits the same URL regardless of what’s behind it; you can swap Lambda versions, add canary deployments, or change routing logic without clients noticing
- Request validation — reject malformed payloads before they reach Lambda, reducing unnecessary Bedrock calls
VPC
Interface VPC endpoints for Amazon Bedrock enable private access to the Bedrock runtime and control plane APIs without routing traffic over the public internet. This is important when processing sensitive data.
Lambda functions deployed to private VPC subnets can access managed AWS services via VPC endpoints without requiring public internet access, which is a standard architecture pattern for secure, compliant GenAI applications. However, we need to ensure that the Lambda execution role has the necessary permissions to use the VPC endpoints and that the security groups and network ACLs (access control lists: stateless firewall rules at the subnet level in a VPC) are configured to allow traffic between the Lambda functions and the endpoints.
flowchart TB
Internet([Internet]) --> IGW
subgraph VPC["VPC (10.0.0.0/16)"]
IGW[Internet Gateway]
subgraph PublicSubnet["Public Subnet (10.0.1.0/24)"]
NATGW[NAT Gateway]
ALB[Application Load Balancer]
end
subgraph PrivateSubnet["Private Subnet (10.0.2.0/24)"]
Lambda[Lambda Function]
ECS[ECS Container]
end
subgraph DataSubnet["Data Subnet (10.0.3.0/24)"]
RDS[(Aurora PostgreSQL\npgvector)]
OpenSearch[(OpenSearch\nServerless)]
end
subgraph VPCEndpoints["VPC Endpoints (PrivateLink)"]
EP_S3[S3 Gateway Endpoint]
EP_Bedrock[Bedrock Interface Endpoint]
EP_SM[Secrets Manager Endpoint]
end
IGW --> ALB
ALB --> Lambda
ALB --> ECS
Lambda -- private traffic --> RDS
Lambda -- private traffic --> OpenSearch
ECS -- private traffic --> RDS
Lambda -- no internet needed --> EP_Bedrock
Lambda -- no internet needed --> EP_S3
Lambda -- fetch secrets --> EP_SM
Lambda -- outbound to internet --> NATGW
ECS -- outbound to internet --> NATGW
NATGW --> IGW
end
EP_Bedrock --> Bedrock[Amazon Bedrock]
EP_S3 --> S3[Amazon S3]
EP_SM --> SecretsManager[Secrets Manager]
- Public subnet — ALB and NAT Gateway have public IPs; these are the only resources exposed to the internet
- Private subnet — Lambda and ECS have no public IPs; inbound only via ALB, outbound via NAT Gateway
- Data subnet — RDS and OpenSearch are fully isolated, reachable only from within the VPC
- VPC Endpoints (PrivateLink) — Lambda calls Bedrock, S3, and Secrets Manager without traffic ever leaving AWS’s network, no NAT Gateway needed for those calls
- NAT Gateway — allows private resources to initiate outbound internet connections (e.g. downloading packages) without being publicly reachable
Physically, an interface VPC endpoint is an ENI (Elastic Network Interface) with a private IP address that AWS creates inside your subnet. When Lambda calls bedrock-runtime.us-east-1.amazonaws.com, private DNS resolves that hostname to the private IP of the ENI instead of a public IP, so the request goes through your VPC’s private network directly to Bedrock.
Without endpoint: bedrock-runtime.us-east-1.amazonaws.com → 52.x.x.x (public)
With endpoint: bedrock-runtime.us-east-1.amazonaws.com → 10.0.2.x (your private subnet)
From the private IP, AWS PrivateLink securely routes the request to the Bedrock service without exposing it to the public internet.
LF-tagging
AWS Lake Formation is a managed service for building, securing, and managing a data lake on S3. Without Lake Formation, securing a data lake means managing S3 bucket policies, IAM policies, Glue catalog permissions, and Athena/Redshift access separately, which becomes a mess at scale. Lake Formation centralizes all of that. Lake Formation LF-tag based access control is purpose-built for fine-grained, cross-account permissions at the table and column level for governed data lakes across multiple AWS accounts.
Storage
S3 Metadata Types
S3 supports three core metadata categories:
- System metadata: auto-managed by S3 for properties like timestamps and object size.
- User-defined metadata: custom key-value pairs that you can assign to objects for application-specific purposes
- Object tags: Searchable, indexable key-value pairs for categorization and access control. These enable flexible, structured metadata implementation for unstructured data assets.
S3 Object Lock
Prevents data from being deleted or overwritten for a set retention period.
- Ensures stored data is tamper-proof for compliance
- Two modes: Governance (admins can override) and Compliance (nobody can override)
- Required by regulations like GDPR, HIPAA that mandate data integrity
Amazon Q
Amazon Q Business is AWS’s fully managed, enterprise-focused AI assistant — think of it as a “ChatGPT for your company’s internal data.”
Amazon Q Business provides native support for access control list (ACL) files for S3 data sources, which map S3 objects and prefixes to IAM Identity Center users and groups to enforce fine-grained access to indexed content at query time, eliminating the need for custom access control logic.
MCP
MCP defines two primary transport types:
- STDIO: For local, colocated MCP servers.
- Streamable HTTP: For remote, network-accessible MCP servers.
When MCP servers handle sensitive user data, service-to-service access controls are insufficient. Per-request end-user authorization, implemented via standards like OAuth 2.1 with identity providers such as Amazon Cognito, is required to enforce least-privilege access for individual users.
Lambda is a supported runtime for MCP servers, paired with Amazon API Gateway HTTP API to provide a managed, secure endpoint for Streamable HTTP transport, with native support for streaming responses and integrated authorization controls.
Event-Driven ML
Event-Driven ML Automation is the practice of using serverless event services like Amazon EventBridge and AWS Lambda to trigger ML workflows in response to data events such as S3 object uploads, enabling continuous model retraining as new data becomes available.
Because S3 cannot invoke Step Functions directly, EventBridge is required as an intermediary to trigger Step Functions workflows from S3 events.
Three common event-driven architectural patterns for generative AI applications include:
- SQS — one worker processes each job (order processing, async tasks)
- SNS — broadcast same message to many subscribers (notifications, alerts)
- EventBridge — route events based on content to the right target (MLOps pipelines, automation)
| Feature | SQS | SNS | EventBridge |
|---|---|---|---|
| Type | Queue (pull) | Topic (push) | Event bus (push) |
| Message consumers | One consumer per message | Multiple subscribers | Multiple targets |
| Delivery model | Consumer polls the queue | Fan-out to all subscribers | Rule-based routing |
| Filtering | No | Basic (attribute-based) | Advanced (content-based JSON rules) |
| Message retention | Up to 14 days | No retention | No retention |
| Ordering | FIFO queues only (standard queues are best-effort) | FIFO topics only | No |
| Replay | No | No | Yes (archive + replay) |
| Dead letter queue | Yes | Yes | Yes |
| Max message size | 256 KB | 256 KB | 256 KB |
| Throughput | Very high | Very high | High |
| External SaaS sources | No | No | Yes (Salesforce, Zendesk, etc.) |
| Schema registry | No | No | Yes |
| Use case | Decoupling, job queues, rate limiting | Fan-out notifications, alerts | Event-driven architecture, routing, automation |
CloudFormation
CloudFormation StackSets
StackSets extend CloudFormation functionality to enable central deployment of CloudFormation stacks across multiple AWS accounts and Regions in an organization, supporting consistent, automated resource provisioning at scale without per-account manual configuration.