Dynamic Configuration

AWS AppConfig is the recommended managed service for managing dynamic routing rules, feature flags, compliance policies, and cost thresholds in generative AI applications. It supports configuration changes without code deployments, immediate propagation, and validation guardrails for high-risk updates.

Feature	AppConfig	Parameter Store
Prompt versioning	Built-in	Manual
Gradual rollout	Yes (canary, linear)	No
Rollback	One click	Manual
Validation hooks	Yes (JSON schema)	No
A/B testing	Yes	No
Cost	Slightly higher	Free tier
Best for	Frequent changes, safe rollouts	Stable config, simple use cases

Prompt Versioning

A classic prompt versioning strategy is to store prompt templates in AppConfig with semantic version numbers (e.g., “checkout-prompt v1.2.0”) and have the application fetch the latest version at runtime. This allows for iterative prompt improvements, A/B testing of different prompt versions, and quick rollbacks if a new prompt performs poorly, all without redeploying application code.

flowchart LR
    AC["AWS AppConfig\n(prompt templates)"]
    subgraph Runtime
        L["Lambda / ECS /\nSageMaker"]
    end
    B["Amazon Bedrock\n(Claude / other model)"]
    U["User Input"]

    AC -->|fetch template at runtime| L
    U -->|inject variables| L
    L -->|filled prompt| B
    B -->|response| L
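
The flow above can be sketched in a few lines. This is a minimal illustration, assuming the AppConfig configuration is a JSON document keyed by semantic version; the JSON layout, the template text, and the `render_prompt` helper are hypothetical and not an AppConfig schema.

```python
import json
from string import Template

# Hypothetical configuration payload, as it might be stored in an AppConfig
# freeform configuration profile. The layout is an assumption for illustration.
CONFIG_JSON = """
{
  "checkout-prompt": {
    "latest": "1.2.0",
    "versions": {
      "1.1.0": "Summarize the cart for $customer.",
      "1.2.0": "Summarize the cart for $customer in a $tone tone."
    }
  }
}
"""


def render_prompt(config, name, variables, version=None):
    """Pick a template version (default: the one marked latest) and fill it in."""
    entry = config[name]
    chosen = version or entry["latest"]
    template = Template(entry["versions"][chosen])
    # substitute() raises KeyError when a variable is missing, which surfaces
    # template/variable mismatches before a broken prompt reaches the model.
    return chosen, template.substitute(variables)


config = json.loads(CONFIG_JSON)
version, prompt = render_prompt(
    config, "checkout-prompt", {"customer": "Ana", "tone": "friendly"}
)
print(version, "->", prompt)
```

Pinning `version="1.1.0"` in the call is one way to express a rollback without touching application code.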

Data Engineering

Glue

AWS Glue Data Catalog is the AWS-recommended centralized metadata repository for GenAI data pipelines. It supports registration of heterogeneous data sources, custom metadata tagging for attribution, and native integration with other GenAI services including Bedrock, OpenSearch Service, and SageMaker, enabling unified data discovery, lineage tracking, and governance across the GenAI ecosystem.

Glue combines serverless data cataloging, ETL, and built-in data quality capabilities to create end-to-end unstructured data processing pipelines for generative AI use cases without the need to integrate disparate tools.

Textract

Amazon Textract handles structured data extraction from scanned documents, and Amazon A2I adds human-in-the-loop review of low-confidence outputs. Combining the two is a common pattern for regulated GenAI workloads that process unstructured data.

Amazon Comprehend

Amazon Comprehend is a fully managed natural language processing service that includes pre-trained PII detection/redaction, toxicity detection and prompt safety classification capabilities, removing the need for developers to build, train, and maintain custom PII identification models for GenAI workloads that process customer data.

Multi-FM Routing Architecture

For Amazon Bedrock, centralized routing layers that pull dynamic configuration from a managed service enable organizations to switch between FMs (foundation models), run A/B tests, enforce regional compliance, and optimize cost without modifying client integrations or core application code.

flowchart LR

%% Client layer
A[Client Application] --> B[Amazon API Gateway REST API]

%% API Gateway to Lambda
B --> C[Lambda Router Function]

%% AppConfig integration
C --> D[AWS AppConfig<br/>FM Selection Config]

%% Decision logic
D --> E{Model Routing Logic}

%% Foundation models (Bedrock)
E -->|chat| F1[Amazon Bedrock<br/>Claude]
E -->|summarize| F2[Amazon Bedrock<br/>Nova]
E -->|code| F3[Amazon Bedrock<br/>Llama]

%% Responses
F1 --> G[Response]
F2 --> G
F3 --> G

%% Return path
G --> B
B --> A
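
The routing decision in the diagram can be sketched as a pure function over the configuration. This is only an illustration: the configuration layout, the `select_model` helper, and the model ID strings are hypothetical placeholders, not real Bedrock model IDs.

```python
# Hypothetical routing configuration, shaped like what the router Lambda might
# load from AppConfig. Model IDs are placeholders.
ROUTING_CONFIG = {
    "default": "model-id-for-chat",
    "routes": {
        "chat": "model-id-for-chat",
        "summarize": "model-id-for-summarize",
        "code": "model-id-for-code",
    },
    # Optional per-region override, e.g. for a regional compliance rule.
    "region_overrides": {"eu-central-1": {"code": "model-id-for-eu-code"}},
}


def select_model(config, task, region=None):
    """Return the model ID for a task, honoring any regional override."""
    overrides = config.get("region_overrides", {}).get(region, {})
    if task in overrides:
        return overrides[task]
    # Unknown tasks fall back to the default model.
    return config["routes"].get(task, config["default"])


print(select_model(ROUTING_CONFIG, "summarize"))
print(select_model(ROUTING_CONFIG, "code", region="eu-central-1"))
```

Because the mapping lives in configuration, switching the model behind `summarize` is a configuration deployment rather than a code change.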

Serverless Practices:

Using Lambda with the AppConfig Agent layer provides low-latency, high-concurrency support for real-time generative AI use cases while reducing operational overhead, as the agent handles configuration caching and updates automatically without custom code.
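
As a rough sketch of how a function might read configuration through the agent: the AppConfig Agent Lambda extension serves configuration over a local HTTP endpoint (port 2772 by default). The application, environment, and profile names below are hypothetical, and `fetch` is injectable only so the sketch can run outside Lambda.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Default local port of the AppConfig Agent Lambda extension.
AGENT_PORT = 2772


def config_url(application, environment, profile, port=AGENT_PORT):
    """Build the local agent URL for one configuration profile."""
    parts = [quote(p, safe="") for p in (application, environment, profile)]
    return (
        "http://localhost:{}/applications/{}/environments/{}/configurations/{}"
        .format(port, *parts)
    )


def load_config(application, environment, profile, fetch=None):
    """Fetch and parse a JSON configuration through the agent.

    `fetch` takes a URL and returns bytes. The default performs a real HTTP
    GET, which only works where the agent is running.
    """
    if fetch is None:
        def fetch(url):
            with urlopen(url, timeout=2) as response:
                return response.read()
    return json.loads(fetch(config_url(application, environment, profile)))


# Hypothetical names, used here only to show the resulting URL.
print(config_url("genai-app", "prod", "fm-routing"))
```

The agent handles polling and caching, so the function can call `load_config` on every invocation without adding a remote round trip each time.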

Bedrock Guardrails

There are 5 types of guardrails that can be implemented with Amazon Bedrock to ensure responsible and compliant use of generative AI models:

Guardrail Type	Purpose	What it Blocks	Example Action
Content Filters	Detect harmful/sensitive content	configure filter strength for Hate, Insults, Sexual, Violence, Misconduct, Prompt attack	Refuse or rewrite response
Denied Topics	Business policy enforcement	Finance advice, medical diagnosis, internal data, we can add topics and their definition	Hard refusal
PII Redaction	Protect personal data	Emails, phone numbers, IDs, addresses, we can add the type or using regex	Mask or remove data
Word Filters	We can specify the words to filter	Offensive language, slurs, company name, we can add them or upload from files	Refuse or rewrite response
Grounding Control	Prevent hallucination	Unsupported facts not in context; configurable grounding score and relevance thresholds	Refuse or restrict to sources

Guardrails can be applied to input prompts or output responses. They can be used with Bedrock invocation APIs such as InvokeModel and Converse, or independently through the ApplyGuardrail API. When a guardrail is triggered, the system can either refuse to process the request, sanitize the input/output, or provide a warning message or masked response, depending on the severity and type of violation.
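
The independent evaluation path can be sketched as follows. This is a minimal example, assuming a boto3 `bedrock-runtime` client is passed in; the guardrail ID is a placeholder, and the stub class exists only so the sketch runs without AWS access.

```python
def check_with_guardrail(client, guardrail_id, guardrail_version, text, source="INPUT"):
    """Evaluate text with ApplyGuardrail and return (allowed, text_to_use).

    `client` is expected to be a boto3 "bedrock-runtime" client.
    """
    response = client.apply_guardrail(
        guardrailIdentifier=guardrail_id,
        guardrailVersion=guardrail_version,
        source=source,  # "INPUT" for prompts, "OUTPUT" for model responses
        content=[{"text": {"text": text}}],
    )
    if response["action"] == "GUARDRAIL_INTERVENED":
        # outputs carries the blocked message or the masked text.
        outputs = response.get("outputs", [])
        return False, outputs[0]["text"] if outputs else ""
    return True, text


class StubGuardrailClient:
    """Stand-in for the boto3 client; its blocking rule is invented for the demo."""

    def apply_guardrail(self, **kwargs):
        text = kwargs["content"][0]["text"]["text"]
        if "ssn" in text.lower():
            return {
                "action": "GUARDRAIL_INTERVENED",
                "outputs": [{"text": "Sorry, I cannot process that request."}],
            }
        return {"action": "NONE", "outputs": []}


stub = StubGuardrailClient()
allowed, text_out = check_with_guardrail(
    stub, "GUARDRAIL_ID_PLACEHOLDER", "1", "What is my SSN?"
)
print(allowed, text_out)
```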

Prompt Engineering Best Practices

Version Control and Collaboration

Amazon Bedrock Prompt Management enables centralized, versioned management of prompt templates and use case-specific variants across teams, with access controls for approval workflows, eliminating the need for custom prompt orchestration infrastructure. We can test prompts directly against foundation models and iterate on prompt design in a collaborative environment, ensuring consistency and quality across applications.

Hallucination Mitigation

To mitigate hallucination in generative AI applications:

Use grounding techniques such as retrieval-augmented generation (RAG) to provide the model with relevant context from a trusted knowledge base.
Configure Bedrock guardrails with grounding score thresholds that restrict responses to the provided context.
Use Chain-of-Thought with fact verification steps to encourage the model to reason through its response and check facts against the context before finalizing the answer.

RAG Pipeline

Vector Store Options for RAG

There are several services that we can use as a vector store for RAG (retrieval-augmented generation) applications. We need to choose a storage option based on scale, latency, operational overhead, and integration capabilities for generative AI RAG workloads. This includes understanding the trade-offs between fully managed serverless options like OpenSearch Serverless, self-managed relational options like Aurora PostgreSQL with pgvector, specialty stores like Neptune Analytics, and object storage-based options like S3 Vectors.

Option	Type	Management	Latency	Scale	Best For	Metadata Filtering
OpenSearch Serverless	Purpose-built vector + full-text	Fully managed, serverless	Low (ms)	Auto-scales	General RAG, hybrid search (vector + keyword)	Yes — use `filter` clause in the KNN query (e.g. `{"term": {"metadata.field": "value"}}`) to pre-filter documents before vector scoring
Aurora PostgreSQL + pgvector	Relational + vector extension	Self-managed (RDS/Aurora)	Low–medium	Vertical + read replicas	Apps already on PostgreSQL, structured + vector queries	Yes — standard SQL `WHERE` clause on any column alongside the `<=>` vector distance operator (e.g. `WHERE category = 'finance' ORDER BY embedding <=> $1`)
Neptune Analytics	Graph + vector	Fully managed	Low	Managed cluster	Graph-based RAG, entity relationship queries	Yes — filter by graph properties in openCypher queries before or after vector similarity (e.g. `WHERE n.type = 'document'`)
S3 Vectors	Object storage + vector index	Fully managed, serverless	Medium	Massive (exabyte)	Low-cost, infrequent-access, large-scale archival RAG	Yes — attach key-value metadata to each vector at ingest; pass a `filter` expression in the `QueryVectors` API call to narrow candidates before ANN search

Bedrock Knowledge Bases Integration

Knowledge Bases for Amazon Bedrock provide native integration with supported vector stores to eliminate custom RAG workflow development, reduce operational overhead, and simplify connecting enterprise data to foundation models.

Bedrock Knowledge Bases is a managed RAG orchestration layer that sits on top of those same vector stores. You point it at an S3 bucket or other data source and a supported vector store, and Bedrock handles:

automatic chunking & embedding (you can choose from a variety of embedding models, and configure chunk size and overlap to optimize for your data and use case)
syncing new documents
retrieval + reranking
injecting context into the prompt before calling the FM
source attribution to link generated responses to original source documents

Native reranking is a fully managed, built-in feature of Knowledge Bases for Amazon Bedrock that improves retrieval relevance by reordering the initially retrieved chunks based on contextual match to the user query.

It supports granular access controls and cross-account querying, making it ideal for multi-tenant GenAI service use cases where customers retain data ownership.

Metadata Filtering

Bedrock Knowledge Bases supports metadata filtering, which lets users index metadata from S3 data sources, including system attributes like upload date and custom user-defined attributes like content type, and apply filter expressions during retrieval to narrow search scope. This improves result relevance and reduces latency without modifying core models or architectures.
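
A filter expression travels inside the retrieval configuration of the request. The sketch below only builds the request dictionary for the `bedrock-agent-runtime` Retrieve operation; the knowledge base ID and the metadata keys (`content_type`, `year`) are hypothetical, and `build_retrieve_request` is an illustrative helper rather than part of any SDK.

```python
def build_retrieve_request(knowledge_base_id, query, filters, top_k=5):
    """Build keyword arguments for the bedrock-agent-runtime Retrieve call.

    `filters` maps metadata keys to required values. A single condition is
    sent as-is; several conditions are combined with andAll.
    """
    conditions = [{"equals": {"key": k, "value": v}} for k, v in filters.items()]
    search_config = {"numberOfResults": top_k}
    if len(conditions) == 1:
        search_config["filter"] = conditions[0]
    elif conditions:
        search_config["filter"] = {"andAll": conditions}
    return {
        "knowledgeBaseId": knowledge_base_id,
        "retrievalQuery": {"text": query},
        "retrievalConfiguration": {"vectorSearchConfiguration": search_config},
    }


request = build_retrieve_request(
    "KB_ID_PLACEHOLDER",
    "What is the refund policy?",
    {"content_type": "policy", "year": 2024},
)
# With boto3 this would be passed as:
#   boto3.client("bedrock-agent-runtime").retrieve(**request)
print(request["retrievalConfiguration"])
```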

RetrieveAndGenerate

We often use the Retrieve API and the RetrieveAndGenerate API to connect Bedrock models to the knowledge base. The RetrieveAndGenerate API allows us to pass retrieved context directly into the model prompt and get a response in a single call, while the Retrieve API gives us more control by allowing us to handle the retrieved context separately before invoking the model.

RetrieveAndGenerate covers query embedding, relevant chunk retrieval, context window optimization, and FM inference, eliminating the need for custom context management logic.

Bedrock Knowledge Bases supports configurable tradeoffs between retrieval accuracy and latency via the PerformanceConfig parameter. Setting the latency optimization option is a native, managed way to reduce response times for real-time use cases without requiring custom infrastructure changes.

Bedrock Flows

Amazon Bedrock Flows is a low-code orchestration feature for building generative AI workflows, which supports integration of managed prompts, guardrails, foundation models, and custom logic to streamline end-to-end generative AI application development.

Streaming

The Amazon Bedrock streaming API (InvokeModelWithResponseStream) enables incremental delivery of foundation model generated content as it is produced, rather than waiting for the full response to be complete.

Amazon API Gateway WebSocket APIs provide persistent bidirectional communication between clients and backend services, supporting low-latency delivery of incremental updates without the overhead of stateless HTTP polling. This is the recommended API type for long-running real-time interactions with generative AI applications.

A classic architecture pattern for streaming GenAI applications is to use API Gateway WebSocket APIs with an AWS Lambda integration. The Lambda function invokes the Amazon Bedrock InvokeModelWithResponseStream API and streams partial responses back through the WebSocket connection.

sequenceDiagram
    participant Client
    participant APIGW as API Gateway<br/>WebSocket API
    participant Lambda
    participant Bedrock as Amazon Bedrock<br/>InvokeModelWithResponseStream

    Client->>APIGW: WebSocket connect ($connect)
    APIGW->>Lambda: invoke $connect handler
    Lambda-->>APIGW: 200 OK (connection established)
    APIGW-->>Client: connection ID assigned

    Client->>APIGW: send message (prompt)
    APIGW->>Lambda: invoke $default handler<br/>+ connectionId + prompt

    Lambda->>Bedrock: InvokeModelWithResponseStream(prompt)

    loop streaming chunks
        Bedrock-->>Lambda: chunk 1 "The answer..."
        Lambda->>APIGW: PostToConnection(connectionId, chunk 1)
        APIGW-->>Client: chunk 1

        Bedrock-->>Lambda: chunk 2 " is 42..."
        Lambda->>APIGW: PostToConnection(connectionId, chunk 2)
        APIGW-->>Client: chunk 2
    end

    Bedrock-->>Lambda: stream complete
    Lambda->>APIGW: PostToConnection(connectionId, [DONE])
    APIGW-->>Client: [DONE]
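
The streaming loop in the diagram can be sketched as below. It assumes a boto3 `bedrock-runtime` client and an `apigatewaymanagementapi` client are passed in; the `[DONE]` marker follows the diagram, the model ID and request body are placeholders, and the stub classes exist only so the sketch runs without AWS access.

```python
import json


def stream_to_client(bedrock, apigw, connection_id, model_id, request_body):
    """Forward each streamed chunk from Bedrock to a WebSocket connection.

    Returns the number of chunks forwarded.
    """
    response = bedrock.invoke_model_with_response_stream(
        modelId=model_id, body=json.dumps(request_body)
    )
    sent = 0
    for event in response["body"]:
        chunk = event.get("chunk")
        if not chunk:
            continue  # ignore events that carry no chunk payload
        # The payload layout inside chunk["bytes"] is model-specific, so this
        # sketch forwards it unparsed and leaves decoding to the client.
        apigw.post_to_connection(ConnectionId=connection_id, Data=chunk["bytes"])
        sent += 1
    apigw.post_to_connection(ConnectionId=connection_id, Data=b"[DONE]")
    return sent


class StubBedrock:
    """Stand-in that yields two fixed chunks."""

    def invoke_model_with_response_stream(self, modelId, body):
        events = [{"chunk": {"bytes": b"The answer"}}, {"chunk": {"bytes": b" is 42"}}]
        return {"body": iter(events)}


class StubApiGateway:
    """Stand-in that records what would be sent to the client."""

    def __init__(self):
        self.sent = []

    def post_to_connection(self, ConnectionId, Data):
        self.sent.append((ConnectionId, Data))


gateway = StubApiGateway()
count = stream_to_client(
    StubBedrock(), gateway, "conn-1", "MODEL_ID_PLACEHOLDER", {"prompt": "hi"}
)
print(count, gateway.sent)
```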
    Client->>APIGW: WebSocket disconnect ($disconnect)

HTTP API vs REST API

HTTP API — you just need a fast, cheap proxy to Lambda or HTTP backends with basic auth
REST API — you need advanced features: throttling, caching, WAF, API keys, request transformation

Inference

CRI

Bedrock Cross-Region Inference (CRI) uses geographic inference profiles to let users access foundation models from their home region, with enforceable guarantees that all inference data remains within the specified geographic boundary, eliminating the need to directly invoke models in other regions and simplifying data residency compliance.

An inference profile is a named handle for CRI, e.g. eu.amazon.nova-pro-v1:0 covers all EU regions that serve the model.

Single model invoke

client.invoke_model(
    modelId="amazon.nova-pro-v1:0",  # no region prefix — just the model ID
    ...
)

CRI-based invoke

client.invoke_model(
    modelId="eu.amazon.nova-pro-v1:0",  # ← the eu. prefix tells Bedrock: use CRI
    ...
)

Request Batching

Request batching for foundation model inference is a performance optimization technique that groups multiple independent inference requests into a single API call to reduce per-request API overhead, increase effective throughput, and lower overall latency for high-volume real-time workloads.

Bedrock App

AgentCore

Amazon Bedrock AgentCore is a fully managed service that natively manages persistent memory across agent interactions, session-aware reasoning, and built-in IAM integration for access control and session-based permissions. It supports both synchronous and event-driven invocation, with built-in observability and event handling. In event-driven invocation, you trigger the agent via an event (e.g., an S3 upload, an SQS message, or a scheduled trigger) and don’t wait for the response. Because these capabilities require no custom development, it is a highly scalable choice.

Bedrock AgentCore natively integrates with common AWS services including Lambda, API Gateway, and EventBridge to register and invoke custom tool actions without custom orchestration code, improving scalability and reducing operational overhead.

Bedrock Agents include built-in tracing functionality that captures full interaction logs, FM reasoning steps, and RAG source references out of the box, eliminating the need to build custom audit logging pipelines for compliance use cases.

Bedrock Throughput Routing

Provisioned Throughput provides dedicated model capacity for consistent, predictable low-latency performance at scale. To use purchased Provisioned Throughput, invoke requests must reference the unique provisioned model ARN returned by the CreateProvisionedModelThroughput API, rather than the public base model ID, which defaults to shared on-demand capacity.

The modelId parameter in Bedrock Runtime invoke operations accepts three types of values: public base model IDs, custom fine-tuned model ARNs, and provisioned model ARNs. The value passed to this parameter determines which capacity pool and model instance is used to process the request.

Bedrock Retry

Exponential backoff with jitter is the officially recommended retry strategy for Bedrock API calls to avoid thundering herd effects, reduce throttling events, and improve invocation success rates during peak usage.
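
A minimal sketch of the strategy, using the "full jitter" variant (wait a random fraction of an exponentially growing cap). `ThrottledError` is a stand-in exception defined here for illustration; with boto3 the equivalent would be a `ClientError` whose error code is `ThrottlingException`. The delay values are arbitrary defaults, not AWS recommendations.

```python
import random
import time


class ThrottledError(Exception):
    """Stand-in for a throttling error raised by the wrapped call."""


def call_with_backoff(operation, max_attempts=5, base_delay=0.5, max_delay=8.0,
                      sleep=time.sleep, rng=random.random):
    """Retry `operation` on ThrottledError with exponential backoff and full jitter."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # The cap doubles each attempt; the actual wait is a random
            # fraction of it, so concurrent clients do not retry in lockstep.
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(rng() * cap)


# Demo: an operation that is throttled twice, then succeeds. sleep and rng are
# replaced so the run is instant and deterministic.
attempts = {"count": 0}


def flaky():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ThrottledError()
    return "ok"


delays = []
result = call_with_backoff(flaky, sleep=delays.append, rng=lambda: 1.0)
print(result, delays)
```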

Circuit breaker patterns can be implemented using AWS Step Functions to monitor Bedrock API call success rates and automatically pause or reroute traffic when error thresholds are exceeded, providing an additional layer of protection against cascading failures in generative AI applications.

Bedrock streaming response resilience: resuming a streaming response from the last successfully received chunk, instead of restarting the entire stream, minimizes redundant data transfer and eliminates broken response chunks.

Vector Database

Amazon MemoryDB for Redis is an in-memory, Redis-compatible database with native k-NN vector search support. The Flat index algorithm performs an exact brute-force comparison of the query vector against all stored vectors, delivering exact results (100 percent recall), which suits workloads with a maximum-accuracy requirement.
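
To make "exact brute-force comparison" concrete, here is a plain-Python sketch of what a flat search does conceptually. It is not the MemoryDB API; the vectors, keys, and function names are invented for illustration, and cosine similarity is just one possible distance metric.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def flat_search(query, vectors, k):
    """Exact k-NN: score every stored vector, then keep the top k."""
    scored = [(cosine_similarity(query, vec), key) for key, vec in vectors.items()]
    scored.sort(reverse=True)
    return [key for _, key in scored[:k]]


store = {
    "doc-a": [1.0, 0.0],
    "doc-b": [0.0, 1.0],
    "doc-c": [0.9, 0.1],
}
nearest = flat_search([1.0, 0.0], store, k=2)
print(nearest)
```

Every query touches every vector, so cost grows linearly with the size of the store. That is the price of exactness compared with approximate indexes such as HNSW.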

Bedrock Intelligent Prompt Routing

This is a native managed Bedrock feature that automatically routes user prompts to the most appropriate foundation model based on prompt content, cost constraints, and performance requirements, eliminating the need for custom routing logic.

Routing high-volume low-complexity queries to smaller low-cost models and low-volume high-complexity queries to larger higher-cost models reduces overall inference spend while maintaining output quality.

Evaluation

Human-in-the-loop

For regulated, high-risk use cases like clinical decision support, a hybrid evaluation approach combining automated LLM-as-judge screening and targeted human review delivers the optimal balance of accuracy, cost efficiency, and compliance, as fully automated evaluation may miss critical edge cases while full human review is prohibitively expensive.

Amazon Bedrock provides pre-configured, purpose-built evaluation capabilities for RAG workflows, including metrics for retrieval precision, response relevance, and hallucination detection, which can be used to monitor and optimize the performance of generative AI applications over time.

Bedrock Managed Model Evaluation

This native capability supports evaluation of Bedrock foundation models using custom datasets and pre-built accuracy metrics for common generative AI tasks, eliminating the need to build custom evaluation pipelines and reducing operational overhead.

Bedrock Evaluation

Bedrock Evaluation enables automated, scalable assessment of large language model outputs against custom or built-in quality metrics using judge models, reducing reliance on full manual review.

Evaluation type

Bedrock supports two primary RAG evaluation types:

Retrieve-only: Assesses only retrieval component performance.
Retrieve-and-generate: Assesses end-to-end RAG performance including both retrieval and generation, which is required for full system validation.

Metric

Precision@K: P@k is a standard retrieval performance metric that measures the share of relevant chunks in the top K retrieved results for a given query, making it directly useful for comparing the effectiveness of different chunking strategies, as chunking design directly impacts retrieval relevance. Precision@K is a ranking metric that answers: “Of the top K results I returned, how many were actually relevant?”
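
As a small worked example of the definition above (the chunk IDs and relevance labels are made up):

```python
def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k retrieved chunk IDs that appear in `relevant`."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for chunk_id in retrieved[:k] if chunk_id in relevant)
    # Dividing by k (not by the number returned) penalizes short result lists.
    return hits / k


retrieved = ["c1", "c7", "c3", "c9", "c4"]   # ranked retrieval output
relevant = {"c1", "c3", "c5"}                # ground-truth relevant chunks

p_at_5 = precision_at_k(retrieved, relevant, 5)
p_at_3 = precision_at_k(retrieved, relevant, 3)
print(p_at_5, p_at_3)
```

Running the same queries against indexes built with different chunking strategies and comparing the averaged scores is one way to use the metric as described.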

Generative AI Evaluation Metrics

There are several official metrics used to understand model performance.

BERT Score: BERT Score is a standard accuracy metric for summarization that measures semantic similarity between generated outputs and reference summaries.
RWK Score: Real World Knowledge (RWK) Score measures the factual accuracy of text generation outputs. Both metrics map directly to accuracy requirements, for example in an ecommerce use case.

BERT Score answers “does this sound like the reference?” while RWK Score answers “is this factually correct?”

Amazon Augmented AI (A2I)

A workflow integration capability that routes low-confidence, high-risk, or flagged generative AI outputs to human reviewers for targeted validation, balancing automation efficiency with human oversight to reduce risk in high-stakes use cases.

Deployment

Metric-Gated Canary Deployment with Step Functions

Traffic shifts incrementally (10% → 25% → 50% → 75% → 100%). After each shift, a Lambda queries CloudWatch and decides whether to continue or rollback.

flowchart TD
    A([CI/CD trigger]) --> B[ShiftTrafficLambda\ncanary = 10%]
    B --> C[⏱ Wait State\nStep Functions — free idle time]
    C --> D[MetricEvaluatorLambda\nquery CloudWatch:\nerror rate · p99 latency · alarms]

    D --> E{Quality Gate}

    E -- FAIL --> F[RollbackLambda\ncanary = 0%]
    E -- PASS\ncanary < 100% --> G[ShiftTrafficLambda\ncanary += next step]
    E -- PASS\ncanary = 100% --> H[PromoteLambda\nremove old version]

    G --> C

    F --> I[SNS: Rollback alert]
    H --> J[SNS: Deploy success]

Step Functions orchestrates the loop — Wait states avoid billing Lambda during idle stabilization windows. Each MetricEvaluatorLambda invocation checks error rate (> 1% → rollback), P99 latency (> 2× baseline → rollback), and any composite CloudWatch alarm in ALARM state.
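
The quality gate logic can be sketched as two small functions. The thresholds and traffic steps come from the description above; the function names and argument shapes are assumptions, and a real MetricEvaluatorLambda would first query CloudWatch for these values.

```python
STEPS = [10, 25, 50, 75, 100]  # traffic percentages from the rollout plan


def evaluate_gate(error_rate, p99_latency_ms, baseline_p99_ms, alarm_states):
    """Return "PASS" or "FAIL" using the thresholds described above."""
    if error_rate > 0.01:                      # error rate above 1%
        return "FAIL"
    if p99_latency_ms > 2 * baseline_p99_ms:   # p99 above 2x baseline
        return "FAIL"
    if any(state == "ALARM" for state in alarm_states):
        return "FAIL"
    return "PASS"


def next_canary_percent(current, gate_result):
    """Next traffic percentage: 0 on failure, otherwise the next step."""
    if gate_result == "FAIL":
        return 0
    later = [step for step in STEPS if step > current]
    return later[0] if later else 100


gate = evaluate_gate(0.002, 900, 500, ["OK"])
print(gate, next_canary_percent(10, gate))
```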

Step Functions

AWS Step Functions is a serverless orchestration service that enables developers to build and coordinate complex workflows using visual state machines. It provides native support for error handling, retries, parallel execution, and integration with over 200 AWS services, making it an ideal choice for orchestrating multi-step generative AI applications that require coordination between various components such as data preprocessing, model inference, post-processing, and monitoring. For fixed, deterministic linear workflows, Step Functions is more cost-effective and simpler than agent-based tools.

Step Functions has a payload quota that enforces a hard 256 KB limit on data passed directly between states in a workflow. We can pass references to external storage (e.g., S3 object keys) instead of large payloads to work around this limit for generative AI workflows that involve large inputs or outputs.

ResultSelector and ResultPath are native Step Functions features that let users filter, reshape, and route task outputs between states without custom code.
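
One way to apply the reference-passing workaround inside a task is sketched below. The `to_state_output` helper, the `sfn-payloads/` key prefix, and the output shape are hypothetical; `put_object` is a caller-supplied function (for example a thin wrapper around the S3 PutObject call) so the sketch runs without AWS access.

```python
import json
import uuid

# Kept below the 256 KB quota to leave headroom for the wrapper keys and any
# other data carried in the state.
SAFE_PAYLOAD_BYTES = 240 * 1024


def to_state_output(result, put_object, bucket, limit=SAFE_PAYLOAD_BYTES):
    """Return `result` inline if small enough, else store it and return a pointer."""
    body = json.dumps(result).encode("utf-8")
    if len(body) <= limit:
        return {"inline": result}
    key = "sfn-payloads/{}.json".format(uuid.uuid4())
    put_object(bucket, key, body)
    return {"s3": {"bucket": bucket, "key": key}}


# Demo with an in-memory stand-in for S3.
stored = {}


def fake_put(bucket, key, body):
    stored[(bucket, key)] = body


small_out = to_state_output({"text": "hi"}, fake_put, "my-bucket")
large_out = to_state_output({"text": "x" * 300_000}, fake_put, "my-bucket")
print(small_out, large_out)
```

The next state then checks whether it received `inline` data or an `s3` pointer and loads the object when needed.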

Step Function Callback Pattern

This is the AWS-recommended pattern for adding human approval steps to serverless workflows. It uses the waitForTaskToken integration pattern to pause execution until an external process (including human review) completes, eliminating the need for custom state management logic.
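
A sketch of both halves of the pattern: the paused task state, written here as a Python dict of Amazon States Language, and the call that resumes the execution. The queue URL, state names, timeout, and message fields are placeholders; `complete_review` assumes a boto3 `stepfunctions` client is passed in, and the stub class exists only so the sketch runs without AWS access.

```python
import json

# Task state that sends the task token to a review queue and then pauses.
APPROVAL_STATE = {
    "WaitForApproval": {
        "Type": "Task",
        "Resource": "arn:aws:states:::sqs:sendMessage.waitForTaskToken",
        "Parameters": {
            "QueueUrl": "QUEUE_URL_PLACEHOLDER",
            "MessageBody": {
                "draft.$": "$.draft",
                "taskToken.$": "$$.Task.Token",  # token the reviewer must return
            },
        },
        "TimeoutSeconds": 86400,  # fail the step if nobody responds in a day
        "Next": "Publish",
    }
}


def complete_review(sfn, task_token, approved, reviewer):
    """Resume the paused execution once a reviewer has decided."""
    if approved:
        return sfn.send_task_success(
            taskToken=task_token,
            output=json.dumps({"approved": True, "reviewer": reviewer}),
        )
    return sfn.send_task_failure(
        taskToken=task_token, error="Rejected", cause="Rejected by " + reviewer
    )


class StubStepFunctions:
    """Stand-in that records calls instead of contacting AWS."""

    def __init__(self):
        self.calls = []

    def send_task_success(self, **kwargs):
        self.calls.append(("success", kwargs))

    def send_task_failure(self, **kwargs):
        self.calls.append(("failure", kwargs))


sfn = StubStepFunctions()
complete_review(sfn, "TOKEN_PLACEHOLDER", approved=True, reviewer="alice")
complete_review(sfn, "TOKEN_PLACEHOLDER", approved=False, reviewer="bob")
print(sfn.calls)
```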

Native Integration and Intrinsic Functions

Native service integrations allow Step Functions to call AWS services like S3 and Amazon Bedrock directly without wrapping calls in Lambda, reducing operational overhead. Intrinsic functions enable data parsing and transformation directly within Pass states, eliminating the need for external compute for simple data operations.

Features

SageMaker Feature Store is a centralized repository for storing, sharing, and reusing ML features. It solves the problem of teams recomputing the same features independently and ensures training/serving consistency. It has online and offline feature stores, supports feature versioning, and integrates with SageMaker training and inference services.

The feature store solves the train/serve skew — if your training pipeline computes user_avg_spend_30d differently than your inference service, your model performs well offline but poorly in production. A feature store enforces a single computation that both use.

Workflow Orchestration

Step Functions provides native support for parallel execution of independent tasks, which is a critical pattern for reducing end-to-end latency for multi-step generative AI workloads that require multiple foundation model calls.

Audit and Compliance

IAM

Implement least-privilege IAM policies to enforce approval workflows, restrict prompt modification and publishing to authorized approver roles, and enforce consistent quality standards for GenAI content.

SCP

Service Control Policies (SCPs) are a type of policy that you can use to manage permissions in your AWS organization. Explicit denies in SCPs override any permissions allowed at the IAM level, so required resources such as approved Bedrock inference profiles must be explicitly excluded from deny rules to enable access while maintaining broader governance controls.

Bedrock Permissions

Permissions for Bedrock operations can be scoped to specific resources, including per-region model ARNs and CRI inference profiles, allowing administrators to enforce approved access methods such as CRI only, rather than unregulated direct cross-region model access.

Bedrock Guardrails

Every time a guardrail triggers (or even evaluates), Bedrock logs it. This creates a record of what was blocked, when, and why. We can also use custom blocked responses to include unique identifiers or codes in the logs for easier searching and monitoring of specific guardrail events.

Observability

AWS CloudTrail

CloudTrail provides comprehensive logging of all API calls, including those to Amazon Bedrock and related services. It can log all API actions for Amazon Bedrock, including prompt creation, modification, access, and usage, to meet audit trail requirements for generative AI governance.

Immutable Audit Trails: CloudTrail logs are immutable and stored in S3, ensuring that audit records of Bedrock interactions cannot be tampered with, which is critical for compliance and forensic investigations.

CloudWatch

CloudWatch offers unified observability for managed services. It provides integrated, fully managed capabilities for custom dashboarding, log analysis via Logs Insights, and threshold alerting via CloudWatch Alarms, making it the lowest-overhead monitoring solution for AWS-native services like Bedrock.

Sensitive Data Classification

This core concept covers the use of automated tools like Amazon Macie to discover, classify, and protect sensitive data used in GenAI input and output datasets to meet regulatory data protection requirements for PII, financial data, and other regulated data types.

CloudWatch Anomaly Detection

CloudWatch Anomaly Detection is a fully managed CloudWatch feature that uses machine learning to establish normal baselines for Bedrock token usage and performance metrics, automatically flagging deviations that indicate cost anomalies or unexpected usage without requiring manual static threshold configuration.

CloudWatch Embedded Metrics Format

EMF is a format that lets you embed custom metrics directly in log data, which is automatically extracted by CloudWatch for monitoring and alerting. EMF supports custom dimensions to segment metrics by attributes like user segment or request type, enabling correlation of performance issues to specific user groups or use cases.
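
A minimal sketch of one EMF log line with custom dimensions. The namespace, metric name, and dimension names are hypothetical; in Lambda, printing the resulting JSON to stdout is enough for it to reach CloudWatch Logs.

```python
import json
import time


def emf_record(namespace, metric_name, value, unit, dimensions, timestamp_ms=None):
    """Build one Embedded Metric Format log line as a JSON string."""
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)
    record = {
        "_aws": {
            "Timestamp": timestamp_ms,
            "CloudWatchMetrics": [
                {
                    "Namespace": namespace,
                    "Dimensions": [list(dimensions)],  # one set of dimension names
                    "Metrics": [{"Name": metric_name, "Unit": unit}],
                }
            ],
        },
        metric_name: value,  # the metric value sits at the top level
    }
    record.update(dimensions)  # so do the dimension values
    return json.dumps(record)


line = emf_record(
    "GenAI/App",
    "ModelLatency",
    412,
    "Milliseconds",
    {"UserSegment": "premium", "RequestType": "chat"},
    timestamp_ms=1700000000000,
)
print(line)
```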

CloudWatch Application Insights

CloudWatch Application Insights helps you monitor your applications that use Amazon EC2 instances along with other application resources.

CloudWatch Logs Insights

With CloudWatch Logs Insights, you can interactively search and analyze your log data in Amazon CloudWatch Logs. You can perform queries to help you respond to operational issues more efficiently and effectively.

Bedrock Invocation Logging

A native Bedrock feature that logs all model invocation details including input content, output content, token counts, and invocation metadata to S3 or CloudWatch Logs with minimal configuration, supporting compliance, auditing, and performance monitoring without custom logging code.

Architecture for Generative AI Applications

The API Gateway REST API

Auth at the edge — IAM, Cognito, or API keys before any Lambda invocation, so the Lambda doesn’t handle auth logic
Throttling — prevents Lambda from being overwhelmed if traffic spikes, which matters when each invocation calls a Bedrock FM (expensive + has its own rate limits)
Stable public endpoint — the client always hits the same URL regardless of what’s behind it; you can swap Lambda versions, add canary deployments, or change routing logic without clients noticing
Request validation — reject malformed payloads before they reach Lambda, reducing unnecessary Bedrock calls

VPC

Interface VPC endpoints for Amazon Bedrock enable private access to Bedrock runtime and control plane APIs without routing traffic over the public internet, which is important when processing sensitive data.

Lambda functions deployed to private VPC subnets can access managed AWS services via VPC endpoints without requiring public internet access, which is a standard architecture pattern for secure, compliant GenAI applications. However, we need to ensure that the Lambda execution role has the necessary permissions to use the VPC endpoints and that the security groups and network ACLs (Access control list: stateless firewall rules at the subnet level in a VPC) are configured to allow traffic between the Lambda functions and the endpoints.

flowchart TB
    Internet([Internet]) --> IGW
    
    subgraph VPC["VPC (10.0.0.0/16)"]
        IGW[Internet Gateway]

        subgraph PublicSubnet["Public Subnet (10.0.1.0/24)"]
            NATGW[NAT Gateway]
            ALB[Application Load Balancer]
        end

        subgraph PrivateSubnet["Private Subnet (10.0.2.0/24)"]
            Lambda[Lambda Function]
            ECS[ECS Container]
        end

        subgraph DataSubnet["Data Subnet (10.0.3.0/24)"]
            RDS[(Aurora PostgreSQL\npgvector)]
            OpenSearch[(OpenSearch\nServerless)]
        end

        subgraph VPCEndpoints["VPC Endpoints (PrivateLink)"]
            EP_S3[S3 Gateway Endpoint]
            EP_Bedrock[Bedrock Interface Endpoint]
            EP_SM[Secrets Manager Endpoint]
        end

        IGW --> ALB
        ALB --> Lambda
        ALB --> ECS
        Lambda -- private traffic --> RDS
        Lambda -- private traffic --> OpenSearch
        ECS -- private traffic --> RDS

        Lambda -- no internet needed --> EP_Bedrock
        Lambda -- no internet needed --> EP_S3
        Lambda -- fetch secrets --> EP_SM

        Lambda -- outbound to internet --> NATGW
        ECS -- outbound to internet --> NATGW
        NATGW --> IGW
    end

    EP_Bedrock --> Bedrock[Amazon Bedrock]
    EP_S3 --> S3[Amazon S3]
    EP_SM --> SecretsManager[Secrets Manager]

Public subnet — ALB and NAT Gateway have public IPs; these are the only resources exposed to the internet
Private subnet — Lambda and ECS have no public IPs; inbound only via ALB, outbound via NAT Gateway
Data subnet — RDS and OpenSearch are fully isolated, reachable only from within the VPC
VPC Endpoints (PrivateLink) — Lambda calls Bedrock, S3, and Secrets Manager without traffic ever leaving AWS’s network, no NAT Gateway needed for those calls
NAT Gateway — allows private resources to initiate outbound internet connections (e.g. downloading packages) without being publicly reachable

Physically, an interface VPC endpoint is an ENI (Elastic Network Interface) with a private IP address that AWS creates inside your subnet. When Lambda calls bedrock-runtime.us-east-1.amazonaws.com, DNS resolves that hostname to the private IP of the ENI instead of the public IP (this requires private DNS to be enabled on the endpoint). So the request goes through your VPC’s private network directly to Bedrock.

Without endpoint: bedrock-runtime.us-east-1.amazonaws.com → 52.x.x.x (public)
With endpoint: bedrock-runtime.us-east-1.amazonaws.com → 10.0.2.x (your private subnet)

The private IP then uses AWS PrivateLink to route the request securely to the Bedrock service without exposing it to the public internet.

LF-tagging

AWS Lake Formation is a managed service for building, securing, and managing a data lake on S3. Without Lake Formation, securing a data lake means managing S3 bucket policies, IAM policies, Glue catalog permissions, and Athena/Redshift access separately, which becomes a mess at scale. Lake Formation centralizes all of that. Lake Formation LF-tag based access control is purpose-built for fine-grained, cross-account permissions at the table and column level for governed data lakes across multiple AWS accounts.

Storage

S3 Metadata Types

S3 supports three core metadata categories:

System metadata: managed automatically by S3 for properties like timestamps and object size.
User-defined metadata: custom key-value pairs that you assign to objects for application-specific purposes.
Object tags: searchable, indexable key-value pairs for categorization and access control.

Together, these enable flexible, structured metadata for unstructured data assets.
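As a rough illustration, the sketch below assembles the keyword arguments for a boto3 `put_object` call that sets user-defined metadata and object tags on upload. The helper name, bucket, key, and values are hypothetical; the `Metadata` and `Tagging` parameter names are the boto3 ones, and `Tagging` is passed as a URL-encoded query string. The block only builds the arguments and does not call AWS.

```python
from urllib.parse import urlencode


def build_put_object_args(bucket, key, body, user_metadata, tags):
    """Assemble put_object keyword arguments with metadata and tags."""
    return {
        "Bucket": bucket,
        "Key": key,
        "Body": body,
        # User-defined metadata: stored with the object as x-amz-meta-* headers.
        "Metadata": user_metadata,
        # Object tags: passed as a URL-encoded query string.
        "Tagging": urlencode(tags),
    }


# Hypothetical document upload for a GenAI ingestion pipeline.
args = build_put_object_args(
    bucket="example-docs-bucket",
    key="contracts/2024/msa.pdf",
    body=b"%PDF-",
    user_metadata={"source-system": "crm", "doc-type": "contract"},
    tags={"classification": "internal", "project": "genai"},
)
print(args["Tagging"])
# With boto3 installed and credentials configured:
#   boto3.client("s3").put_object(**args)
```

System metadata such as size and last-modified time is not passed in; S3 sets it when the object is written.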

S3 Object Lock

Prevents data from being deleted or overwritten for a set retention period.

Ensures stored data is tamper-proof for compliance
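Two modes: Governance (users with the s3:BypassGovernanceRetention permission can override) and Compliance (nobody, including the root user, can override until the retention period expires)
Helps meet the data-integrity requirements of regulations such as GDPR and HIPAA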
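The difference between the two modes can be modeled in a few lines. This is a simplified sketch of the decision S3 makes when someone tries to delete a locked object version; it ignores legal holds, and the function name and dates are hypothetical.

```python
from datetime import datetime, timezone


def can_delete_version(mode, retain_until, now, has_bypass_permission=False):
    """Model whether a locked object version can be deleted right now."""
    if now >= retain_until:
        return True  # retention period has expired
    if mode == "GOVERNANCE":
        # Only callers holding the bypass permission may override.
        return has_bypass_permission
    if mode == "COMPLIANCE":
        return False  # nobody can override before expiry
    raise ValueError(f"unknown Object Lock mode: {mode}")


now = datetime(2025, 1, 1, tzinfo=timezone.utc)
retain_until = datetime(2026, 1, 1, tzinfo=timezone.utc)
print(can_delete_version("GOVERNANCE", retain_until, now, has_bypass_permission=True))
print(can_delete_version("COMPLIANCE", retain_until, now, has_bypass_permission=True))
```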

Amazon Q

Amazon Q Business is AWS’s fully managed, enterprise-focused AI assistant; think of it as a “ChatGPT for your company’s internal data.”

Amazon Q Business provides native support for access control list (ACL) files for S3 data sources. These files map S3 objects and prefixes to IAM Identity Center users and groups, so fine-grained access to indexed content is enforced at query time without custom access-control logic.

MCP

MCP defines two primary transport types:

STDIO: For local, colocated MCP servers.
Streamable HTTP: For remote, network-accessible MCP servers.
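Both transports carry the same JSON-RPC 2.0 messages; what differs is the framing. The sketch below, using only the standard library, frames one request for each transport: a newline-delimited line for STDIO and a POST body with headers for Streamable HTTP. The helper names are hypothetical, and nothing is actually sent.

```python
import json
from typing import Optional


def make_request(request_id: int, method: str, params: Optional[dict] = None) -> dict:
    """Build a JSON-RPC 2.0 request message."""
    message = {"jsonrpc": "2.0", "id": request_id, "method": method}
    if params is not None:
        message["params"] = params
    return message


def frame_for_stdio(message: dict) -> bytes:
    """STDIO: one message per line, written to the server's stdin."""
    # json.dumps escapes newlines inside strings, so the frame is a single line.
    return (json.dumps(message) + "\n").encode("utf-8")


def frame_for_http(message: dict):
    """Streamable HTTP: the message is the body of an HTTP POST."""
    headers = {
        "Content-Type": "application/json",
        # The client accepts either a plain JSON reply or an event stream.
        "Accept": "application/json, text/event-stream",
    }
    return headers, json.dumps(message).encode("utf-8")


request = make_request(1, "tools/list")
print(frame_for_stdio(request))
print(frame_for_http(request)[0])
```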

When MCP servers handle sensitive user data, service-to-service access controls are insufficient. Per-request end-user authorization, implemented via standards like OAuth 2.1 with identity providers such as Amazon Cognito, is required to enforce least-privilege access for individual users.
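A per-request check might look like the sketch below. It assumes the access token has already been validated (signature, issuer, expiry) and decoded into a claims dictionary, and that scopes arrive as a space-separated `scope` claim, as in Cognito access tokens. The tool names and scope strings are hypothetical.

```python
# Hypothetical mapping from MCP tool name to the OAuth scope it requires.
REQUIRED_SCOPES = {
    "read_records": "records/read",
    "delete_record": "records/write",
}


def authorize_tool_call(claims: dict, tool_name: str) -> bool:
    """Allow the call only if this end user's token carries the required scope."""
    required = REQUIRED_SCOPES.get(tool_name)
    if required is None:
        return False  # deny tools with no declared scope by default
    granted = set(claims.get("scope", "").split())
    return required in granted


# Claims from an already-verified token for a read-only user.
claims = {"sub": "user-123", "scope": "openid records/read"}
print(authorize_tool_call(claims, "read_records"))
print(authorize_tool_call(claims, "delete_record"))
```

The check runs on every tool invocation with the caller's own token, so two users of the same MCP server can end up with different permissions.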

AWS Lambda is a supported runtime for MCP servers. Paired with an Amazon API Gateway HTTP API, it provides a managed, secure endpoint for the Streamable HTTP transport, with native support for streaming responses and integrated authorization controls.

Event-Driven ML

Event-Driven ML Automation is the practice of using serverless event services like Amazon EventBridge and AWS Lambda to trigger ML workflows in response to data events such as S3 object uploads, enabling continuous model retraining as new data becomes available.

This requires EventBridge as an intermediary to trigger Step Functions workflows, because S3 event notifications cannot invoke Step Functions directly: S3 publishes the event to EventBridge, and an EventBridge rule starts the state machine.
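The rule that connects the two services is driven by an event pattern. The sketch below shows a pattern for S3 "Object Created" events and a deliberately minimal matcher that handles only exact-value matching, not EventBridge's prefix or numeric operators. The bucket name and object key are hypothetical.

```python
# Event pattern for an EventBridge rule whose target is the state machine.
S3_UPLOAD_PATTERN = {
    "source": ["aws.s3"],
    "detail-type": ["Object Created"],
    "detail": {"bucket": {"name": ["training-data-bucket"]}},
}


def matches(pattern: dict, event: dict) -> bool:
    """Exact-value matching: each leaf lists allowed values; all keys must match."""
    for key, expected in pattern.items():
        actual = event.get(key)
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


# Shape of the event S3 sends to EventBridge on upload (trimmed).
event = {
    "source": "aws.s3",
    "detail-type": "Object Created",
    "detail": {
        "bucket": {"name": "training-data-bucket"},
        "object": {"key": "raw/batch-01.csv"},
    },
}
print(matches(S3_UPLOAD_PATTERN, event))
```

For this to work, the bucket needs EventBridge notifications turned on; the rule then filters which uploads start a retraining run.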

Three common event-driven architectural patterns for generative AI applications include:

SQS — one worker processes each job (order processing, async tasks)
SNS — broadcast same message to many subscribers (notifications, alerts)
EventBridge — route events based on content to the right target (MLOps pipelines, automation)

Feature	SQS	SNS	EventBridge
Type	Queue (pull)	Topic (push)	Event bus (push)
Message consumers	One consumer per message	Multiple subscribers	Multiple targets
Delivery model	Consumer polls the queue	Fan-out to all subscribers	Rule-based routing
Filtering	No	Basic (attribute-based)	Advanced (content-based JSON rules)
Message retention	Up to 14 days	No retention	No retention
Ordering	Standard (best-effort) or FIFO (strict)	FIFO topics only	No
Replay	No	No	Yes (archive + replay)
Dead letter queue	Yes	Yes	Yes
Max message size	256 KB	256 KB	256 KB
Throughput	Very high	Very high	High
External SaaS sources	No	No	Yes (Salesforce, Zendesk, etc.)
Schema registry	No	No	Yes
Use case	Decoupling, job queues, rate limiting	Fan-out notifications, alerts	Event-driven architecture, routing, automation
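The three delivery models in the table can be contrasted with a small in-memory simulation. This is only a sketch of the semantics, not of the services: round-robin assignment stands in for whichever SQS consumer polls first, and the worker, subscriber, and target names are hypothetical.

```python
def queue_deliver(messages, workers):
    """SQS-style: each message is processed by exactly one worker."""
    received = {w: [] for w in workers}
    for i, msg in enumerate(messages):
        received[workers[i % len(workers)]].append(msg)
    return received


def topic_deliver(messages, subscribers):
    """SNS-style: every subscriber receives a copy of every message."""
    return {s: list(messages) for s in subscribers}


def bus_deliver(events, rules):
    """EventBridge-style: each target receives only the events its rule matches."""
    return {target: [e for e in events if rule(e)] for target, rule in rules.items()}


events = [{"type": "upload"}, {"type": "drift"}, {"type": "upload"}]
print(queue_deliver(["a", "b", "c"], ["w1", "w2"]))
print(topic_deliver(["a", "b"], ["email", "slack"]))
print(bus_deliver(events, {
    "retrain-pipeline": lambda e: e["type"] == "upload",
    "alerting": lambda e: e["type"] == "drift",
}))
```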

CloudFormation

CloudFormation StackSets

StackSets extend CloudFormation functionality to enable central deployment of CloudFormation stacks across multiple AWS accounts and Regions in an organization, supporting consistent, automated resource provisioning at scale without per-account manual configuration.
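A StackSet creates one stack instance per account and Region pair. The sketch below enumerates those pairs and assembles the keyword arguments for a boto3 `create_stack_instances` call; the helper names, StackSet name, account IDs, and Regions are hypothetical, and the block does not call AWS.

```python
from itertools import product


def planned_stack_instances(accounts, regions):
    """Each (account, Region) pair gets its own stack instance."""
    return list(product(accounts, regions))


def build_create_stack_instances_args(stack_set_name, accounts, regions):
    """Assemble keyword arguments for create_stack_instances."""
    return {
        "StackSetName": stack_set_name,
        "Accounts": list(accounts),
        "Regions": list(regions),
    }


accounts = ["111111111111", "222222222222"]
regions = ["us-east-1", "eu-west-1"]
print(planned_stack_instances(accounts, regions))
stackset_args = build_create_stack_instances_args("genai-guardrails-baseline", accounts, regions)
# With boto3 installed and credentials configured:
#   boto3.client("cloudformation").create_stack_instances(**stackset_args)
```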

