Dynamic Configuration
- AWS AppConfig is the recommended managed service for managing dynamic routing rules, feature flags, compliance policies, and cost thresholds for generative AI applications. It supports deploying configuration changes without code changes, fast propagation, and validation guardrails for high-risk updates.
Multi-FM Routing Architecture
For Amazon Bedrock, a centralized routing layer that pulls dynamic configuration from a managed service lets organizations switch between foundation models (FMs), run A/B tests, enforce regional compliance, and optimize cost without modifying client integrations or core application code.
flowchart LR
%% Client layer
A[Client Application] --> B[Amazon API Gateway REST API]
%% API Gateway to Lambda
B --> C[Lambda Router Function]
%% AppConfig integration
C --> D[AWS AppConfig<br/>FM Selection Config]
%% Decision logic
D --> E{Model Routing Logic}
%% Foundation models (Bedrock)
E -->|chat| F1[Amazon Bedrock<br/>Claude]
E -->|summarize| F2[Amazon Bedrock<br/>Nova]
E -->|code| F3[Amazon Bedrock<br/>Llama]
%% Responses
F1 --> G[Response]
F2 --> G
F3 --> G
%% Return path
G --> B
B --> A
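The routing step in the diagram can be sketched as a simple lookup. This is a minimal illustration, assuming the AppConfig document is a JSON object that maps task names to model IDs; the config shape, the `default` key, and the model ID strings are placeholders, not real Bedrock identifiers.

```python
import json

# Hypothetical routing document, as it might be stored in AppConfig.
# The model IDs are placeholders, not real Bedrock model identifiers.
ROUTING_CONFIG = json.loads("""
{
  "routes": {
    "chat": "example-claude-model-id",
    "summarize": "example-nova-model-id",
    "code": "example-llama-model-id"
  },
  "default": "example-claude-model-id"
}
""")


def select_model(task: str, config: dict) -> str:
    """Return the model ID configured for a task, or the default route."""
    return config["routes"].get(task, config["default"])


print(select_model("summarize", ROUTING_CONFIG))
```

Because the mapping lives in configuration, changing which model serves `summarize` is a config deployment rather than a code change.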
Serverless Practices:
- Using Lambda with the AppConfig Agent layer provides low-latency, high-concurrency support for real-time generative AI use cases while reducing operational overhead, as the agent handles configuration caching and updates automatically without custom code.
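A minimal sketch of reading configuration through the agent, assuming the AppConfig Agent Lambda layer is attached and listening on its default local port (2772). The application, environment, and profile names used in the usage note are hypothetical.

```python
import json
import urllib.request

# The AppConfig Agent Lambda extension serves cached configuration over a
# local HTTP endpoint; 2772 is its default port.
AGENT_BASE_URL = "http://localhost:2772"


def config_url(application: str, environment: str, profile: str) -> str:
    """Build the local agent URL for one configuration profile."""
    return (
        f"{AGENT_BASE_URL}/applications/{application}"
        f"/environments/{environment}/configurations/{profile}"
    )


def fetch_config(application: str, environment: str, profile: str) -> dict:
    """Read the current JSON configuration from the agent.

    This only works inside a Lambda function that has the agent layer attached.
    """
    url = config_url(application, environment, profile)
    with urllib.request.urlopen(url) as response:
        return json.loads(response.read())
```

For example, `fetch_config("genai-router", "prod", "fm-routing")` would return the routing document as a dict. The function code never calls the AppConfig API directly, since the agent handles polling and caching.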
Bedrock Guardrails
There are 5 types of guardrails that can be implemented with Amazon Bedrock to ensure responsible and compliant use of generative AI models:
| Guardrail Type | Purpose | What it Blocks | Example Action |
|---|---|---|---|
| Content Filters | Detect harmful or sensitive content | Hate, Insults, Sexual, Violence, Misconduct, and Prompt attack content, with a configurable filter strength per category | Refuse or rewrite response |
| Denied Topics | Enforce business policy | Custom topics defined by name and definition, such as financial advice, medical diagnosis, or internal data | Hard refusal |
| PII Redaction | Protect personal data | Emails, phone numbers, IDs, and addresses, selected from built-in PII types or matched with custom regex patterns | Mask or remove data |
| Word Filters | Block specific words and phrases | Offensive language, slurs, or company names, entered manually or uploaded from a file | Refuse or rewrite response |
| Grounding Control | Prevent hallucination | Unsupported facts that are not in the provided context, based on configurable grounding and relevance score thresholds | Refuse or restrict to sources |
Guardrails can be applied to input prompts, output responses, or both. They can be attached to Bedrock invocations such as InvokeModel and Converse, or used for independent evaluation through the ApplyGuardrail API. When a guardrail is triggered, the system can refuse to process the request, sanitize the input or output, or return a warning message or masked response, depending on the severity and type of violation.
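A minimal sketch of the independent evaluation path, assuming `client` is a boto3 `bedrock-runtime` client and that a guardrail already exists. The helper name and the shape of its return value are my own choices for illustration.

```python
def check_with_guardrail(client, guardrail_id, version, text, source="INPUT"):
    """Evaluate text with the ApplyGuardrail API and summarize the outcome.

    `source` is "INPUT" for user prompts or "OUTPUT" for model responses.
    """
    response = client.apply_guardrail(
        guardrailIdentifier=guardrail_id,
        guardrailVersion=version,
        source=source,
        content=[{"text": {"text": text}}],
    )
    blocked = response["action"] == "GUARDRAIL_INTERVENED"
    # When the guardrail intervenes, `outputs` carries the refusal or masked text.
    outputs = [item["text"] for item in response.get("outputs", [])]
    final_text = outputs[0] if blocked and outputs else text
    return {"blocked": blocked, "text": final_text}
```

Calling it once with `source="INPUT"` before the model call and once with `source="OUTPUT"` after it covers both sides of the exchange, even when the model itself is not hosted on Bedrock.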
Prompt Engineering Best Practices
Amazon Bedrock Prompt Management enables centralized, versioned management of prompt templates and use-case-specific variants across teams, with access controls that support approval workflows, eliminating the need for custom prompt orchestration infrastructure. We can test prompts directly against FMs and iterate on prompt design in a collaborative environment, ensuring consistency and quality across applications.
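Prompt templates use `{{variable}}` placeholders that are filled in at invocation time. The sketch below is a local illustration of that substitution only, not the Bedrock API; the function name and the strict missing-variable behavior are assumptions.

```python
import re


def render_prompt(template: str, variables: dict) -> str:
    """Fill {{name}} placeholders, raising if a variable is missing."""

    def replace(match):
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"missing prompt variable: {name}")
        return str(variables[name])

    return re.sub(r"\{\{\s*(\w+)\s*\}\}", replace, template)


print(render_prompt("Summarize {{doc}} in {{n}} bullets.", {"doc": "the report", "n": 3}))
```

Keeping the template text separate from the values is what makes versioning and side-by-side variants practical.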
RAG Pipeline
Vector Store Options for RAG
There are several services that we can use as a vector store for RAG (retrieval-augmented generation) applications. We need to choose the storage option based on scale, latency, operational overhead, and integration capabilities for generative AI RAG workloads. This includes understanding the trade-offs between fully managed serverless options like OpenSearch Serverless, relational options like Aurora PostgreSQL with pgvector (where we manage capacity, schema, and indexes ourselves), specialty stores like Neptune Analytics, and object storage-based options like S3 Vectors.
| Option | Type | Management | Latency | Scale | Best For | Metadata Filtering |
|---|---|---|---|---|---|---|
| OpenSearch Serverless | Purpose-built vector + full-text | Fully managed, serverless | Low (ms) | Auto-scales | General RAG, hybrid search (vector + keyword) | Yes: use a filter clause in the k-NN query (e.g. {"term": {"metadata.field": "value"}}) to pre-filter documents before vector scoring |
| Aurora PostgreSQL + pgvector | Relational + vector extension | Managed database (RDS/Aurora), but you provision capacity and manage schema and indexes | Low–medium | Vertical + read replicas | Apps already on PostgreSQL, structured + vector queries | Yes: use a standard SQL WHERE clause on any column alongside the <=> vector distance operator (e.g. WHERE category = 'finance' ORDER BY embedding <=> $1) |
| Neptune Analytics | Graph + vector | Fully managed | Low | Managed cluster | Graph-based RAG, entity relationship queries | Yes: filter by graph properties in openCypher queries before or after vector similarity (e.g. WHERE n.type = 'document') |
| S3 Vectors | Object storage + vector index | Fully managed, serverless | Medium | Massive (exabyte) | Low-cost, infrequent-access, large-scale archival RAG | Yes: attach key-value metadata to each vector at ingest; pass a filter expression in the QueryVectors API call to narrow candidates before ANN search |
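As one concrete case of the metadata filtering column, here is a sketch that builds an OpenSearch k-NN query body with a pre-filter. The vector field name `embedding` and the metadata field names are hypothetical; sending the body to a collection is left out.

```python
def knn_query(vector, k, field="embedding", filters=None):
    """Build an OpenSearch k-NN query body with an optional term pre-filter.

    `filters` maps metadata field names to the exact values they must match.
    """
    knn_clause = {"vector": list(vector), "k": k}
    if filters:
        knn_clause["filter"] = {
            "bool": {
                "must": [{"term": {name: value}} for name, value in filters.items()]
            }
        }
    return {"size": k, "query": {"knn": {field: knn_clause}}}


body = knn_query([0.1, 0.2, 0.3], k=5, filters={"metadata.category": "finance"})
print(body)
```

Placing the filter inside the `knn` clause is what lets the engine narrow candidates before scoring, rather than discarding results afterwards.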
Bedrock Knowledge Bases Integration
Knowledge Bases for Amazon Bedrock provide native integration with supported vector stores to eliminate custom RAG workflow development, reduce operational overhead, and simplify connecting enterprise data to foundation models.
Bedrock Knowledge Bases is a managed RAG orchestration layer that sits on top of those same vector stores. You point it at an S3 bucket or other data source and a supported vector store, and Bedrock handles:
- automatic chunking & embedding (you can choose from a variety of embedding models, and configure chunk size and overlap to optimize for your data and use case)
- syncing new documents
- retrieval + reranking
- injecting context into the prompt before calling the FM
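To make the chunk size and overlap settings concrete, here is a small fixed-size chunker. It is a simplified local illustration that counts words rather than tokens, and it is not how Bedrock implements chunking internally.

```python
def chunk_words(words, chunk_size, overlap):
    """Split a list of words into fixed-size chunks that share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + chunk_size])
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


for chunk in chunk_words("a b c d e f g h i j".split(), chunk_size=4, overlap=1):
    print(chunk)
```

The overlap repeats the tail of one chunk at the head of the next, so a sentence that straddles a boundary still appears whole in at least one chunk.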
We often use the Retrieve and RetrieveAndGenerate APIs to connect Bedrock models to the knowledge base. The RetrieveAndGenerate API retrieves context, passes it into the model prompt, and returns a response in a single call, while the Retrieve API gives us more control by returning the retrieved context so we can process it before invoking the model ourselves.
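The difference between the two calls can be sketched as follows, assuming `client` is a boto3 `bedrock-agent-runtime` client. The helper names are hypothetical, and the knowledge base ID and model ARN are supplied by the caller.

```python
def retrieve_chunks(client, kb_id, question, top_k=5):
    """Retrieve only: return the matching text chunks for custom handling."""
    response = client.retrieve(
        knowledgeBaseId=kb_id,
        retrievalQuery={"text": question},
        retrievalConfiguration={
            "vectorSearchConfiguration": {"numberOfResults": top_k}
        },
    )
    return [result["content"]["text"] for result in response["retrievalResults"]]


def answer_from_kb(client, kb_id, model_arn, question):
    """RetrieveAndGenerate: retrieval and generation in a single call."""
    response = client.retrieve_and_generate(
        input={"text": question},
        retrieveAndGenerateConfiguration={
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": kb_id,
                "modelArn": model_arn,
            },
        },
    )
    return response["output"]["text"]
```

With `retrieve_chunks` we can rerank, filter, or reformat the context before building our own prompt, while `answer_from_kb` trades that control for less code.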
Bedrock Flows
Amazon Bedrock Flows is a low-code orchestration feature for building generative AI workflows. It supports integration of managed prompts, guardrails, foundation models, and custom logic to streamline end-to-end generative AI application development.
Streaming
The Amazon Bedrock streaming API (InvokeModelWithResponseStream) enables incremental delivery of foundation model generated content as it is produced, rather than waiting for the full response to complete.
Amazon API Gateway WebSocket APIs provide persistent bidirectional communication between clients and backend services, supporting low-latency delivery of incremental updates without the overhead of stateless HTTP polling. This is the recommended API type for long-running, real-time interactions with generative AI applications.
A classic architecture pattern for streaming GenAI applications is to use API Gateway WebSocket APIs with an AWS Lambda integration. The Lambda function calls the Amazon Bedrock InvokeModelWithResponseStream API and pushes partial responses to the client over the WebSocket connection.
sequenceDiagram
participant Client
participant APIGW as API Gateway<br/>WebSocket API
participant Lambda
participant Bedrock as Amazon Bedrock<br/>InvokeModelWithResponseStream
Client->>APIGW: WebSocket connect ($connect)
APIGW->>Lambda: invoke $connect handler
Lambda-->>APIGW: 200 OK (connection established)
APIGW-->>Client: connection ID assigned
Client->>APIGW: send message (prompt)
APIGW->>Lambda: invoke $default handler<br/>+ connectionId + prompt
Lambda->>Bedrock: InvokeModelWithResponseStream(prompt)
loop streaming chunks
Bedrock-->>Lambda: chunk 1 "The answer..."
Lambda->>APIGW: PostToConnection(connectionId, chunk 1)
APIGW-->>Client: chunk 1
Bedrock-->>Lambda: chunk 2 " is 42..."
Lambda->>APIGW: PostToConnection(connectionId, chunk 2)
APIGW-->>Client: chunk 2
end
Bedrock-->>Lambda: stream complete
Lambda->>APIGW: PostToConnection(connectionId, [DONE])
APIGW-->>Client: [DONE]
Client->>APIGW: WebSocket disconnect ($disconnect)
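The streaming loop in the diagram can be sketched as below. It assumes `bedrock` is a boto3 `bedrock-runtime` client and `apigw` is a boto3 `apigatewaymanagementapi` client created with the WebSocket API's connection endpoint URL. The request body format is model-specific, so it is passed in as-is, and the `[DONE]` marker is simply the convention used in the diagram.

```python
import json


def stream_to_client(bedrock, apigw, connection_id, model_id, request_body):
    """Forward each streamed model chunk to one WebSocket connection.

    Returns the number of chunks forwarded (excluding the final marker).
    """
    response = bedrock.invoke_model_with_response_stream(
        modelId=model_id,
        body=json.dumps(request_body),
    )
    forwarded = 0
    for event in response["body"]:
        chunk = event.get("chunk")
        if chunk:
            # The chunk bytes are model-specific JSON; forward them unchanged.
            apigw.post_to_connection(ConnectionId=connection_id, Data=chunk["bytes"])
            forwarded += 1
    # Tell the client that the stream is complete.
    apigw.post_to_connection(ConnectionId=connection_id, Data=b"[DONE]")
    return forwarded
```

In a Lambda handler, `connection_id` would come from `event["requestContext"]["connectionId"]`. Forwarding raw chunks keeps the function model-agnostic, at the cost of the client needing to parse each model's chunk format.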
Evaluation
For regulated, high-risk use cases like clinical decision support, a hybrid evaluation approach that combines automated LLM-as-judge screening with targeted human review balances accuracy, cost efficiency, and compliance: fully automated evaluation may miss critical edge cases, while full human review is prohibitively expensive.
Amazon Bedrock provides pre-configured, purpose-built evaluation capabilities for RAG workflows, including metrics for retrieval precision, response relevance, and hallucination detection, which can be used to monitor and optimize the performance of generative AI applications over time.
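One way to picture the hybrid approach is a triage step after the LLM-as-judge pass. This is a hypothetical sketch: the `judge_score` and `high_risk` fields and the 0.8 threshold are assumptions for illustration, not values from any AWS service.

```python
def triage(results, threshold=0.8):
    """Split judged results into auto-approved IDs and IDs needing human review.

    An item goes to human review if the judge score is below the threshold
    or if the item is flagged as high risk, regardless of its score.
    """
    auto_approved, needs_review = [], []
    for item in results:
        confident = item["judge_score"] >= threshold
        if confident and not item.get("high_risk", False):
            auto_approved.append(item["id"])
        else:
            needs_review.append(item["id"])
    return auto_approved, needs_review


judged = [
    {"id": "a", "judge_score": 0.95},
    {"id": "b", "judge_score": 0.60},
    {"id": "c", "judge_score": 0.99, "high_risk": True},
]
print(triage(judged))
```

Only the low-confidence and high-risk items reach reviewers, which is where the cost saving over full human review comes from.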
Features
Amazon SageMaker Feature Store is a centralized repository for storing, sharing, and reusing ML features. It solves the problem of teams recomputing the same features independently and ensures training/serving consistency. It has online and offline feature stores, supports feature versioning, and integrates with SageMaker training and inference services.
A feature store addresses train/serve skew: if your training pipeline computes user_avg_spend_30d differently than your inference service, your model performs well offline but poorly in production. A feature store enforces a single computation that both use.
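The "single computation" idea can be illustrated with one shared feature function. The definition of `user_avg_spend_30d` here (mean transaction amount over the trailing 30 days) and the input shape are assumptions, since the notes do not define the feature.

```python
from datetime import date, timedelta


def user_avg_spend_30d(transactions, as_of):
    """Average transaction amount in the 30 days up to and including `as_of`.

    `transactions` is a list of (day, amount) tuples for one user.
    """
    window_start = as_of - timedelta(days=30)
    amounts = [amount for day, amount in transactions if window_start < day <= as_of]
    return sum(amounts) / len(amounts) if amounts else 0.0


history = [
    (date(2023, 11, 1), 100.0),  # outside the 30-day window
    (date(2024, 1, 1), 10.0),
    (date(2024, 1, 20), 30.0),
]
print(user_avg_spend_30d(history, as_of=date(2024, 1, 25)))
```

If both the training pipeline and the inference path read the value produced by this one function, they cannot disagree on window boundaries or on how empty histories are handled.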
Audit and Compliance
AWS CloudTrail
CloudTrail logs API calls made to Amazon Bedrock and related services. It can record Amazon Bedrock API actions, including prompt creation, modification, access, and usage, to meet audit trail requirements for generative AI governance.
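A minimal sketch of pulling recent Bedrock activity for an audit review, assuming `cloudtrail` is a boto3 `cloudtrail` client. It reads a single page of results from `lookup_events`, which covers recent management events only; long-term audit trails would normally be delivered to S3 by a trail instead.

```python
def recent_bedrock_events(cloudtrail, start_time, end_time):
    """List (time, user, action) for Bedrock API calls in a time window."""
    response = cloudtrail.lookup_events(
        LookupAttributes=[
            {"AttributeKey": "EventSource", "AttributeValue": "bedrock.amazonaws.com"}
        ],
        StartTime=start_time,
        EndTime=end_time,
    )
    # Only the first page is read here; follow NextToken for complete results.
    return [
        (event["EventTime"], event.get("Username"), event["EventName"])
        for event in response["Events"]
    ]
```

Each tuple answers the basic audit questions of who did what and when.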
IAM
Implement least-privilege IAM policies to enforce approval workflows, restrict prompt modification and publishing to authorized approver roles, and enforce consistent quality standards for GenAI content.
Architecture for Generative AI Applications
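As a rough sketch of what such a policy might look like, the helper below builds an identity policy document for an approver role. The action names reflect my understanding of the Bedrock prompt management actions and should be verified against the service authorization reference; the ARN pattern passed in is hypothetical.

```python
import json


def prompt_approver_policy(prompt_arn_pattern):
    """Build an IAM policy document for a role allowed to modify and publish prompts."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadPrompts",
                "Effect": "Allow",
                # Assumed action names; verify before use.
                "Action": ["bedrock:GetPrompt", "bedrock:ListPrompts"],
                "Resource": "*",
            },
            {
                "Sid": "ModifyAndPublishPrompts",
                "Effect": "Allow",
                # Assumed action names; verify before use.
                "Action": ["bedrock:UpdatePrompt", "bedrock:CreatePromptVersion"],
                "Resource": prompt_arn_pattern,
            },
        ],
    }


policy = prompt_approver_policy("arn:aws:bedrock:us-east-1:111122223333:prompt/*")
print(json.dumps(policy, indent=2))
```

Roles that only consume prompts would get the read statement alone, so publishing stays limited to approvers.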
The API Gateway REST API
- Auth at the edge — IAM, Cognito, or API keys before any Lambda invocation, so the Lambda doesn’t handle auth logic
- Throttling — prevents Lambda from being overwhelmed if traffic spikes, which matters when each invocation calls a Bedrock FM (expensive + has its own rate limits)
- Stable public endpoint — the client always hits the same URL regardless of what’s behind it; you can swap Lambda versions, add canary deployments, or change routing logic without clients noticing
- Request validation — reject malformed payloads before they reach Lambda, reducing unnecessary Bedrock calls
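API Gateway throttling is based on the token bucket algorithm, where the rate is the steady-state refill and the burst is the bucket size. The class below is a local sketch of that algorithm to show how rate and burst interact; it is not API Gateway's implementation, and the time values are passed in explicitly to keep it deterministic.

```python
class TokenBucket:
    """Token bucket: `rate` tokens are added per second, up to `burst` tokens."""

    def __init__(self, rate, burst):
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)  # start with a full bucket
        self.last = 0.0

    def allow(self, now):
        """Return True if a request arriving at time `now` (seconds) is allowed."""
        elapsed = now - self.last
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


bucket = TokenBucket(rate=1, burst=2)
print([bucket.allow(t) for t in (0.0, 0.0, 0.0, 1.0)])
```

The burst lets a short spike through immediately, while the rate caps sustained traffic, which is what protects the downstream Bedrock quota.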
VPC
Interface VPC endpoints for Amazon Bedrock enable private access to the Bedrock runtime and control plane APIs without routing traffic over the public internet. This is important when processing sensitive data.
Lambda functions deployed to private VPC subnets can access managed AWS services via VPC endpoints without requiring public internet access, which is a standard architecture pattern for secure, compliant GenAI applications. However, we need to ensure that the Lambda execution role has the necessary permissions (to attach to the VPC and to call the target services) and that security groups and network ACLs (access control lists: stateless firewall rules at the subnet level in a VPC) allow traffic between the Lambda functions and the endpoints.
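A small sketch of the parameters for creating the Bedrock runtime interface endpoint with the EC2 `create_vpc_endpoint` API. The helper name is hypothetical and the VPC, subnet, and security group IDs are placeholders supplied by the caller.

```python
def bedrock_endpoint_params(region, vpc_id, subnet_ids, security_group_ids):
    """Build keyword arguments for ec2.create_vpc_endpoint (interface type)."""
    return {
        "VpcEndpointType": "Interface",
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.bedrock-runtime",
        "SubnetIds": list(subnet_ids),
        "SecurityGroupIds": list(security_group_ids),
        # Private DNS makes the regular service hostname resolve to the endpoint.
        "PrivateDnsEnabled": True,
    }


params = bedrock_endpoint_params(
    "us-east-1", "vpc-0example", ["subnet-0example"], ["sg-0example"]
)
print(params["ServiceName"])
```

With a boto3 EC2 client this would be used as `ec2.create_vpc_endpoint(**params)`. The security group attached to the endpoint needs to allow inbound HTTPS (port 443) from the Lambda function's security group.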
flowchart TB
Internet([Internet]) --> IGW
subgraph VPC["VPC (10.0.0.0/16)"]
IGW[Internet Gateway]
subgraph PublicSubnet["Public Subnet (10.0.1.0/24)"]
NATGW[NAT Gateway]
ALB[Application Load Balancer]
end
subgraph PrivateSubnet["Private Subnet (10.0.2.0/24)"]
Lambda[Lambda Function]
ECS[ECS Container]
end
subgraph DataSubnet["Data Subnet (10.0.3.0/24)"]
RDS[(Aurora PostgreSQL\npgvector)]
OpenSearch[(OpenSearch\nServerless)]
end
subgraph VPCEndpoints["VPC Endpoints (PrivateLink)"]
EP_S3[S3 Gateway Endpoint]
EP_Bedrock[Bedrock Interface Endpoint]
EP_SM[Secrets Manager Endpoint]
end
IGW --> ALB
ALB --> Lambda
ALB --> ECS
Lambda -- private traffic --> RDS
Lambda -- private traffic --> OpenSearch
ECS -- private traffic --> RDS
Lambda -- no internet needed --> EP_Bedrock
Lambda -- no internet needed --> EP_S3
Lambda -- fetch secrets --> EP_SM
Lambda -- outbound to internet --> NATGW
ECS -- outbound to internet --> NATGW
NATGW --> IGW
end
EP_Bedrock --> Bedrock[Amazon Bedrock]
EP_S3 --> S3[Amazon S3]
EP_SM --> SecretsManager[Secrets Manager]
- Public subnet — ALB and NAT Gateway have public IPs; these are the only resources exposed to the internet
- Private subnet — Lambda and ECS have no public IPs; inbound only via ALB, outbound via NAT Gateway
- Data subnet — RDS and OpenSearch are fully isolated, reachable only from within the VPC
- VPC Endpoints (PrivateLink) — Lambda calls Bedrock, S3, and Secrets Manager without traffic ever leaving AWS’s network, no NAT Gateway needed for those calls
- NAT Gateway — allows private resources to initiate outbound internet connections (e.g. downloading packages) without being publicly reachable
An interface VPC endpoint is implemented as an ENI (Elastic Network Interface) with a private IP address that AWS creates inside your subnet. With private DNS enabled, when Lambda calls bedrock-runtime.us-east-1.amazonaws.com, DNS resolves that hostname to the private IP of the ENI instead of a public IP, so the request goes through your VPC’s private network directly to Bedrock. (The S3 gateway endpoint works differently: it is a route table target rather than an ENI.)
Without endpoint: bedrock-runtime.us-east-1.amazonaws.com → 52.x.x.x (public)
With endpoint: bedrock-runtime.us-east-1.amazonaws.com → 10.0.2.x (your private subnet)
From that private IP, AWS PrivateLink securely routes the request to the Bedrock service without exposing it to the public internet.
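One way to confirm the DNS behavior from inside the VPC is to resolve the service hostname and check whether every returned address is private. This is a diagnostic sketch using only the standard library; the helper names are my own.

```python
import ipaddress
import socket


def is_private(ip: str) -> bool:
    """True if the address is in a private range such as 10.0.0.0/8."""
    return ipaddress.ip_address(ip).is_private


def resolves_privately(hostname: str) -> bool:
    """True if every address the hostname resolves to is private.

    Run this from inside the VPC (for example in the Lambda function itself).
    """
    infos = socket.getaddrinfo(hostname, 443, proto=socket.IPPROTO_TCP)
    addresses = {info[4][0] for info in infos}
    return all(is_private(address) for address in addresses)


print(is_private("10.0.2.15"), is_private("52.94.0.1"))
```

If `resolves_privately("bedrock-runtime.us-east-1.amazonaws.com")` returns False inside the VPC, private DNS is probably not enabled on the endpoint or the endpoint is missing from that subnet's VPC.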
LF-tagging
AWS Lake Formation is a managed service for building, securing, and managing a data lake on Amazon S3. Without Lake Formation, securing a data lake means managing S3 bucket policies, IAM policies, Glue Data Catalog permissions, and Athena/Redshift access separately, which becomes hard to maintain at scale. Lake Formation centralizes all of that. Lake Formation LF-tag based access control is purpose-built for fine-grained permissions at the table and column level for governed data lakes, including across multiple AWS accounts.
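A minimal sketch of an LF-tag based grant, built as keyword arguments for the Lake Formation `grant_permissions` API. The helper name, tag key, tag values, and role ARN are hypothetical examples.

```python
def lf_tag_grant(principal_arn, tag_key, tag_values, permissions=("SELECT",)):
    """Build kwargs granting table permissions on every table matching an LF-tag."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "LFTagPolicy": {
                "ResourceType": "TABLE",
                "Expression": [{"TagKey": tag_key, "TagValues": list(tag_values)}],
            }
        },
        "Permissions": list(permissions),
    }


grant = lf_tag_grant(
    "arn:aws:iam::111122223333:role/AnalystRole",
    tag_key="sensitivity",
    tag_values=["public", "internal"],
)
print(grant["Resource"]["LFTagPolicy"]["Expression"])
```

With a boto3 Lake Formation client this would be applied as `lakeformation.grant_permissions(**grant)`. Because the grant targets a tag expression instead of named tables, newly tagged tables are covered without issuing new grants.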