Dynamic Configuration

  • AWS AppConfig is the recommended managed service for dynamic routing rules, feature flags, compliance policies, and cost thresholds in generative AI applications. It lets you deploy configuration changes without redeploying code, propagate them quickly, and protect high-risk updates with validation guardrails.

Multi-FM Routing Architecture

For Amazon Bedrock, a centralized routing layer that pulls dynamic configuration from a managed service lets organizations switch between foundation models (FMs), run A/B tests, enforce regional compliance, and optimize cost without modifying client integrations or core application code.

flowchart LR

%% Client layer
A[Client Application] --> B[Amazon API Gateway REST API]

%% API Gateway to Lambda
B --> C[Lambda Router Function]

%% AppConfig integration
C --> D[AWS AppConfig<br/>FM Selection Config]

%% Decision logic
D --> E{Model Routing Logic}

%% Foundation models (Bedrock)
E -->|chat| F1[Amazon Bedrock<br/>Claude]
E -->|summarize| F2[Amazon Bedrock<br/>Nova]
E -->|code| F3[Amazon Bedrock<br/>Llama]

%% Responses
F1 --> G[Response]
F2 --> G
F3 --> G

%% Return path
G --> B
B --> A

Serverless Practices:

  1. Using Lambda with the AppConfig Agent layer provides low-latency, high-concurrency support for real-time generative AI use cases while reducing operational overhead, because the agent handles configuration caching and updates automatically without custom code.
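
The routing pattern above can be sketched as follows. This is a minimal illustration, not a reference implementation: the application, environment, and profile names, the shape of the configuration document, and the placeholder model IDs are all assumptions. The localhost endpoint on port 2772 is the default the AppConfig Agent Lambda extension listens on.

```python
import json
import urllib.request

# Hypothetical AppConfig identifiers; replace with your own.
APP, ENV, PROFILE = "genai-router", "prod", "fm-routing"
AGENT_PORT = 2772  # default port of the AppConfig Agent Lambda extension


def agent_url(app: str, env: str, profile: str, port: int = AGENT_PORT) -> str:
    """Build the local URL on which the agent serves cached configuration."""
    return (f"http://localhost:{port}/applications/{app}"
            f"/environments/{env}/configurations/{profile}")


def load_routing_config() -> dict:
    """Fetch the routing document; the agent handles polling and caching."""
    with urllib.request.urlopen(agent_url(APP, ENV, PROFILE), timeout=2) as resp:
        return json.load(resp)


def select_model(config: dict, task: str) -> str:
    """Return the model ID configured for a task, or the configured default."""
    return config.get("routes", {}).get(task, config["default_model"])


# Assumed shape of the configuration document (not an AWS-defined schema).
EXAMPLE_CONFIG = {
    "default_model": "model-id-default",
    "routes": {
        "chat": "model-id-chat",
        "summarize": "model-id-summarize",
        "code": "model-id-code",
    },
}
```

Inside the Lambda router, `select_model(load_routing_config(), task)` would pick the FM per request, so changing a route only requires an AppConfig deployment.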

Bedrock Guardrails

Amazon Bedrock Guardrails offers several policy types for responsible and compliant use of generative AI models. The five covered here are:

| Guardrail Type | Purpose | What it Blocks | Example Action |
|---|---|---|---|
| Content Filters | Detect harmful/sensitive content | Hate, Insults, Sexual, Violence, Misconduct, Prompt attack (filter strength is configurable per category) | Refuse or rewrite response |
| Denied Topics | Business policy enforcement | Finance advice, medical diagnosis, internal data (we add topics together with their definitions) | Hard refusal |
| PII Redaction | Protect personal data | Emails, phone numbers, IDs, addresses (we select built-in types or define regex patterns) | Mask or remove data |
| Word Filters | Block specified words | Offensive language, slurs, company names (we add them manually or upload them from a file) | Refuse or rewrite response |
| Grounding Control | Prevent hallucination | Unsupported facts not in context (we set the grounding score threshold and the relevance threshold) | Refuse or restrict to sources |

Guardrails can be applied to input prompts, output responses, or both. They can be attached to Bedrock invocations such as InvokeModel and Converse, or run as an independent evaluation through the ApplyGuardrail API. When a guardrail is triggered, the system can refuse the request, sanitize the input or output, or return a warning message or masked response, depending on the severity and type of violation.
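
As a rough sketch of the independent-evaluation path, the function below wraps the ApplyGuardrail call. The `client` argument is assumed to be a boto3 `bedrock-runtime` client (for example `boto3.client("bedrock-runtime")`); the function name and the shape of the returned summary are illustrative choices, not part of the API.

```python
def check_text(client, guardrail_id: str, version: str, text: str,
               source: str = "INPUT") -> dict:
    """Evaluate text with ApplyGuardrail and summarise the outcome.

    `client` is passed in (rather than created here) so the function can be
    exercised with a stub in tests.
    """
    response = client.apply_guardrail(
        guardrailIdentifier=guardrail_id,
        guardrailVersion=version,
        source=source,  # "INPUT" for prompts, "OUTPUT" for model responses
        content=[{"text": {"text": text}}],
    )
    intervened = response.get("action") == "GUARDRAIL_INTERVENED"
    # When the guardrail intervenes, `outputs` carries the configured
    # blocked message or the masked text.
    outputs = [item["text"] for item in response.get("outputs", [])]
    final_text = outputs[0] if intervened and outputs else text
    return {"allowed": not intervened, "text": final_text}
```

Calling it with `source="OUTPUT"` on a model response applies the same guardrail to generated text before it is returned to the user.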

Prompt Engineering Best Practices

Amazon Bedrock Prompt Management enables centralized, versioned management of prompt templates and use-case-specific variants across teams, with access controls that support approval workflows, eliminating the need for custom prompt orchestration infrastructure. We can test prompts directly against FMs and iterate on prompt design in a collaborative environment, ensuring consistency and quality across applications.

RAG Pipeline

Vector Store Options for RAG

Several services can act as the vector store for RAG (retrieval-augmented generation) applications. We need to choose among them based on scale, latency, operational overhead, and integration capabilities. This means understanding the trade-offs between fully managed serverless options like OpenSearch Serverless, relational options like Aurora PostgreSQL with pgvector, specialty stores like Neptune Analytics, and object-storage-based options like S3 Vectors.

| Option | Type | Management | Latency | Scale | Best For | Metadata Filtering |
|---|---|---|---|---|---|---|
| OpenSearch Serverless | Purpose-built vector + full-text | Fully managed, serverless | Low (ms) | Auto-scales | General RAG, hybrid search (vector + keyword) | Yes: use a filter clause in the KNN query (e.g. {"term": {"metadata.field": "value"}}) to pre-filter documents before vector scoring |
| Aurora PostgreSQL + pgvector | Relational + vector extension | Managed service with provisioned capacity (RDS/Aurora) | Low–medium | Vertical + read replicas | Apps already on PostgreSQL, structured + vector queries | Yes: standard SQL WHERE clause on any column alongside the <=> vector distance operator (e.g. WHERE category = 'finance' ORDER BY embedding <=> $1) |
| Neptune Analytics | Graph + vector | Fully managed | Low | Managed cluster | Graph-based RAG, entity relationship queries | Yes: filter by graph properties in openCypher queries before or after vector similarity (e.g. WHERE n.type = 'document') |
| S3 Vectors | Object storage + vector index | Fully managed, serverless | Medium | Massive (exabyte) | Low-cost, infrequent-access, large-scale archival RAG | Yes: attach key-value metadata to each vector at ingest; pass a filter expression in the QueryVectors API call to narrow candidates before ANN search |
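
The metadata-filtering column describes the same idea in every store: narrow the candidates by metadata, then rank what remains by vector distance. The sketch below shows that idea in plain Python with a brute-force scan. It is an illustration of the concept only (real stores use approximate nearest-neighbour indexes), and the document layout and function names are assumptions.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity; smaller means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def filtered_search(docs: list[dict], query_vec: list[float],
                    metadata_filter: dict, k: int = 2) -> list[dict]:
    """Pre-filter by exact metadata match, then rank by cosine distance."""
    candidates = [
        d for d in docs
        if all(d["metadata"].get(key) == value
               for key, value in metadata_filter.items())
    ]
    ranked = sorted(candidates,
                    key=lambda d: cosine_distance(d["embedding"], query_vec))
    return ranked[:k]


# Toy corpus with two-dimensional embeddings, purely for illustration.
DOCS = [
    {"id": "a", "embedding": [1.0, 0.0], "metadata": {"category": "finance"}},
    {"id": "b", "embedding": [0.0, 1.0], "metadata": {"category": "finance"}},
    {"id": "c", "embedding": [1.0, 0.1], "metadata": {"category": "legal"}},
]
```

With the query vector `[1.0, 0.0]` and the filter `{"category": "finance"}`, document `c` is excluded before scoring even though it is close to the query, which is the behaviour the filter clauses in the table provide.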

Bedrock Knowledge Bases Integration

Knowledge Bases for Amazon Bedrock provide native integration with supported vector stores to eliminate custom RAG workflow development, reduce operational overhead, and simplify connecting enterprise data to foundation models.

Bedrock Knowledge Bases is a managed RAG orchestration layer that sits on top of those same vector stores. You point it at an S3 bucket or other data source and a supported vector store, and Bedrock handles:

  • automatic chunking & embedding (you can choose from a variety of embedding models and configure chunk size and overlap to suit your data and use case)
  • syncing new documents
  • retrieval + reranking
  • injecting context into the prompt before calling the FM
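
To make the chunk size and overlap settings concrete, here is a small sketch of fixed-size chunking. It counts characters as a stand-in for tokens and is not how Bedrock implements chunking internally; the function name is hypothetical.

```python
def chunk_text(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Split text into fixed-size chunks where neighbours share `overlap` characters."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    # Stop once the previous chunk already reached the end of the text,
    # so no chunk is fully contained in its predecessor.
    last_start_bound = max(len(text) - overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, last_start_bound, step)]
```

A larger overlap keeps sentences that straddle a boundary retrievable from either chunk, at the cost of storing and embedding more text.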

We often use the Retrieve and RetrieveAndGenerate APIs to connect Bedrock models to the knowledge base. RetrieveAndGenerate retrieves context, passes it into the model prompt, and returns a generated response in a single call, while Retrieve returns only the retrieved chunks, giving us more control to process the context before invoking the model ourselves.
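
The difference between the two calls can be sketched like this. The `client` argument is assumed to be a boto3 `bedrock-agent-runtime` client; the knowledge base ID and model ARN are supplied by the caller, and the helper names are illustrative.

```python
def retrieve_chunks(client, kb_id: str, query: str, top_k: int = 5) -> list[str]:
    """Retrieve only: return the text of the top matching chunks."""
    response = client.retrieve(
        knowledgeBaseId=kb_id,
        retrievalQuery={"text": query},
        retrievalConfiguration={
            "vectorSearchConfiguration": {"numberOfResults": top_k}
        },
    )
    return [result["content"]["text"] for result in response["retrievalResults"]]


def answer_from_kb(client, kb_id: str, model_arn: str, query: str) -> str:
    """RetrieveAndGenerate: retrieval and generation in a single call."""
    response = client.retrieve_and_generate(
        input={"text": query},
        retrieveAndGenerateConfiguration={
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": kb_id,
                "modelArn": model_arn,
            },
        },
    )
    return response["output"]["text"]
```

With `retrieve_chunks`, we can filter, rerank, or reformat the chunks ourselves before building the prompt; `answer_from_kb` trades that control for a simpler integration.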

Bedrock Flows

Amazon Bedrock Flows is a low-code orchestration feature for building generative AI workflows. It integrates managed prompts, guardrails, foundation models, and custom logic to streamline end-to-end generative AI application development.

Streaming

The Amazon Bedrock streaming API (InvokeModelWithResponseStream) delivers foundation model output incrementally as it is produced, rather than waiting for the full response to complete.

Amazon API Gateway WebSocket APIs provide persistent bidirectional communication between clients and backend services, supporting low-latency delivery of incremental updates without the overhead of stateless HTTP polling. This is the recommended API type for long-running, real-time interactions with generative AI applications.

A classic architecture pattern for streaming GenAI applications is to use an API Gateway WebSocket API with an AWS Lambda integration. The Lambda function calls the Amazon Bedrock InvokeModelWithResponseStream API and pushes partial responses to the client over the WebSocket connection.

sequenceDiagram
    participant Client
    participant APIGW as API Gateway<br/>WebSocket API
    participant Lambda
    participant Bedrock as Amazon Bedrock<br/>InvokeModelWithResponseStream

    Client->>APIGW: WebSocket connect ($connect)
    APIGW->>Lambda: invoke $connect handler
    Lambda-->>APIGW: 200 OK (connection established)
    APIGW-->>Client: connection ID assigned

    Client->>APIGW: send message (prompt)
    APIGW->>Lambda: invoke $default handler<br/>+ connectionId + prompt

    Lambda->>Bedrock: InvokeModelWithResponseStream(prompt)

    loop streaming chunks
        Bedrock-->>Lambda: chunk 1 "The answer..."
        Lambda->>APIGW: PostToConnection(connectionId, chunk 1)
        APIGW-->>Client: chunk 1

        Bedrock-->>Lambda: chunk 2 " is 42..."
        Lambda->>APIGW: PostToConnection(connectionId, chunk 2)
        APIGW-->>Client: chunk 2
    end

    Bedrock-->>Lambda: stream complete
    Lambda->>APIGW: PostToConnection(connectionId, [DONE])
    APIGW-->>Client: [DONE]
    Client->>APIGW: WebSocket disconnect ($disconnect)
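
The streaming loop in the sequence above might look roughly like this inside the Lambda function. Both clients are assumed to be boto3 clients passed in by the caller: `bedrock` a `bedrock-runtime` client, and `apigw` an `apigatewaymanagementapi` client created with the WebSocket API's endpoint URL. The `[DONE]` sentinel follows the diagram and is a convention of this sketch, not something Bedrock emits.

```python
def stream_to_client(bedrock, apigw, connection_id: str,
                     model_id: str, request_body: bytes) -> int:
    """Forward each streamed chunk to the WebSocket client; return the chunk count."""
    response = bedrock.invoke_model_with_response_stream(
        modelId=model_id,
        body=request_body,  # model-specific JSON payload, already encoded
    )
    sent = 0
    for event in response["body"]:
        chunk = event.get("chunk")
        if chunk:
            # Raw chunk bytes are forwarded unchanged; parsing them is
            # model-specific and left to the client in this sketch.
            apigw.post_to_connection(ConnectionId=connection_id,
                                     Data=chunk["bytes"])
            sent += 1
    # Tell the client that the stream has finished.
    apigw.post_to_connection(ConnectionId=connection_id, Data=b"[DONE]")
    return sent
```

In a real handler the connection ID would come from `event["requestContext"]["connectionId"]`, and error handling for disconnected clients would be needed.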

Evaluation

For regulated, high-risk use cases like clinical decision support, a hybrid evaluation approach that combines automated LLM-as-judge screening with targeted human review balances accuracy, cost efficiency, and compliance: fully automated evaluation may miss critical edge cases, while full human review is prohibitively expensive.

Amazon Bedrock provides pre-configured, purpose-built evaluation capabilities for RAG workflows, including metrics for retrieval precision, response relevance, and hallucination detection, which can be used to monitor and optimize the performance of generative AI applications over time.

Features

SageMaker Feature Store is a centralized repository for storing, sharing, and reusing ML features. It solves the problem of teams recomputing the same features independently and ensures training/serving consistency. It has online and offline stores, supports feature versioning, and integrates with SageMaker training and inference services.

The feature store addresses train/serve skew: if your training pipeline computes user_avg_spend_30d differently than your inference service, your model performs well offline but poorly in production. A feature store enforces a single computation that both use.
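
A minimal sketch of the "single computation" idea: one function defines the feature, and both the training pipeline and the inference service call it. The definition used here (mean amount per transaction in the 30 days up to and including `as_of`) is an assumption for illustration; the point is that there is exactly one such definition.

```python
from datetime import datetime, timedelta


def user_avg_spend_30d(transactions: list[tuple[datetime, float]],
                       as_of: datetime) -> float:
    """Mean transaction amount in the 30 days ending at `as_of` (0.0 if none)."""
    window_start = as_of - timedelta(days=30)
    amounts = [amount for timestamp, amount in transactions
               if window_start < timestamp <= as_of]
    return sum(amounts) / len(amounts) if amounts else 0.0
```

Skew typically appears when one side quietly uses a different window boundary or a per-day rather than per-transaction average; sharing the function (or the stored feature values) removes that ambiguity.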

Audit and Compliance

AWS CloudTrail

CloudTrail logs API calls made to Amazon Bedrock and related services. This includes prompt creation, modification, access, and usage, which helps meet audit trail requirements for generative AI governance.

IAM

Implement least-privilege IAM policies to enforce approval workflows, restrict prompt modification and publishing to authorized approver roles, and maintain consistent quality standards for GenAI content.
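
One way to picture the split between readers and approvers is a pair of policies generated from the same helper. This is a sketch under assumptions: the action names are the Prompt Management actions as we understand them and should be checked against the Service Authorization Reference, and the resource ARN is whatever the caller passes in.

```python
import json

READ_ACTIONS = ["bedrock:GetPrompt", "bedrock:ListPrompts"]
# Assumed write actions for Prompt Management; verify before use.
WRITE_ACTIONS = ["bedrock:CreatePrompt", "bedrock:UpdatePrompt",
                 "bedrock:CreatePromptVersion"]


def prompt_policy(resource_arn: str, can_publish: bool) -> dict:
    """Build an IAM policy document; only approvers get write actions."""
    actions = READ_ACTIONS + (WRITE_ACTIONS if can_publish else [])
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": actions, "Resource": resource_arn}
        ],
    }


def to_json(policy: dict) -> str:
    """Serialize the policy for attaching to a role."""
    return json.dumps(policy, indent=2)
```

Attaching the `can_publish=False` policy to author roles and the `can_publish=True` policy to approver roles keeps publishing behind the approval step.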

Architecture for Generative AI Applications

The API Gateway REST API

  1. Auth at the edge — IAM, Cognito, or API keys before any Lambda invocation, so the Lambda doesn’t handle auth logic
  2. Throttling — prevents Lambda from being overwhelmed if traffic spikes, which matters when each invocation calls a Bedrock FM (expensive + has its own rate limits)
  3. Stable public endpoint — the client always hits the same URL regardless of what’s behind it; you can swap Lambda versions, add canary deployments, or change routing logic without clients noticing
  4. Request validation — reject malformed payloads before they reach Lambda, reducing unnecessary Bedrock calls

VPC

Interface VPC endpoints for Amazon Bedrock enable private access to the Bedrock runtime and control plane APIs without routing traffic over the public internet. This is important when processing sensitive data.

Lambda functions deployed to private VPC subnets can access managed AWS services via VPC endpoints without requiring public internet access, which is a standard architecture pattern for secure, compliant GenAI applications. However, we need to ensure that the Lambda execution role has the permissions it needs for VPC networking and for the services it calls, and that the security groups and network ACLs (access control lists: stateless firewall rules at the subnet level in a VPC) allow traffic between the Lambda functions and the endpoints.

flowchart TB
    Internet([Internet]) --> IGW
    
    subgraph VPC["VPC (10.0.0.0/16)"]
        IGW[Internet Gateway]

        subgraph PublicSubnet["Public Subnet (10.0.1.0/24)"]
            NATGW[NAT Gateway]
            ALB[Application Load Balancer]
        end

        subgraph PrivateSubnet["Private Subnet (10.0.2.0/24)"]
            Lambda[Lambda Function]
            ECS[ECS Container]
        end

        subgraph DataSubnet["Data Subnet (10.0.3.0/24)"]
            RDS[(Aurora PostgreSQL\npgvector)]
            OpenSearch[(OpenSearch\nServerless)]
        end

        subgraph VPCEndpoints["VPC Endpoints (PrivateLink)"]
            EP_S3[S3 Gateway Endpoint]
            EP_Bedrock[Bedrock Interface Endpoint]
            EP_SM[Secrets Manager Endpoint]
        end

        IGW --> ALB
        ALB --> Lambda
        ALB --> ECS
        Lambda -- private traffic --> RDS
        Lambda -- private traffic --> OpenSearch
        ECS -- private traffic --> RDS

        Lambda -- no internet needed --> EP_Bedrock
        Lambda -- no internet needed --> EP_S3
        Lambda -- fetch secrets --> EP_SM

        Lambda -- outbound to internet --> NATGW
        ECS -- outbound to internet --> NATGW
        NATGW --> IGW
    end

    EP_Bedrock --> Bedrock[Amazon Bedrock]
    EP_S3 --> S3[Amazon S3]
    EP_SM --> SecretsManager[Secrets Manager]

  • Public subnet: ALB and NAT Gateway have public IPs; these are the only resources exposed to the internet
  • Private subnet: Lambda and ECS have no public IPs; inbound traffic arrives only via the ALB, outbound traffic leaves via the NAT Gateway
  • Data subnet: RDS and OpenSearch are fully isolated, reachable only from within the VPC
  • VPC Endpoints (PrivateLink): Lambda calls Bedrock, S3, and Secrets Manager without traffic ever leaving AWS’s network, so no NAT Gateway is needed for those calls
  • NAT Gateway: allows private resources to initiate outbound internet connections (e.g. downloading packages) without being publicly reachable

An interface VPC endpoint is physically an ENI (Elastic Network Interface), with a private IP address, that AWS creates inside your subnet. When Lambda calls bedrock-runtime.us-east-1.amazonaws.com and private DNS is enabled on the endpoint, DNS resolves that hostname to the private IP of the ENI instead of a public IP, so the request travels over your VPC’s private network to Bedrock.

Without endpoint: bedrock-runtime.us-east-1.amazonaws.com → 52.x.x.x (public)
With endpoint: bedrock-runtime.us-east-1.amazonaws.com → 10.0.2.x (your private subnet)

From that private IP, AWS PrivateLink securely routes the request to the Bedrock service without exposing it to the public internet.
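
A quick way to check which of the two resolutions applies from inside the VPC is to resolve the hostname and test whether every returned address is private. The sketch below uses only the standard library; the `resolver` parameter is an illustrative hook so the check can be tried without network access.

```python
import ipaddress
import socket


def resolves_privately(hostname: str, resolver=socket.gethostbyname_ex) -> bool:
    """True when the hostname resolves and every address is a private IP."""
    _name, _aliases, addresses = resolver(hostname)
    return bool(addresses) and all(
        ipaddress.ip_address(address).is_private for address in addresses
    )
```

Running `resolves_privately("bedrock-runtime.us-east-1.amazonaws.com")` from the Lambda function should return `True` only when the interface endpoint and its private DNS are in effect.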

LF-tagging

AWS Lake Formation is a managed service for building, securing, and managing a data lake on S3. Without Lake Formation, securing a data lake means managing S3 bucket policies, IAM policies, Glue catalog permissions, and Athena/Redshift access separately, which becomes a mess at scale. Lake Formation centralizes all of that. Lake Formation LF-tag-based access control is purpose-built for fine-grained permissions at the table and column level in governed data lakes that span multiple AWS accounts.
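
As a sketch of what an LF-tag-based grant looks like, the helper below builds the request for the Lake Formation GrantPermissions call. The tag key, tag values, and principal ARN are placeholders chosen by the caller, and the helper name is hypothetical.

```python
def lf_tag_grant_request(principal_arn: str, tag_key: str,
                         tag_values: list[str],
                         permissions: tuple[str, ...] = ("SELECT",)) -> dict:
    """Build GrantPermissions arguments for tables matching an LF-tag expression."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "LFTagPolicy": {
                "ResourceType": "TABLE",
                "Expression": [
                    {"TagKey": tag_key, "TagValues": list(tag_values)}
                ],
            }
        },
        "Permissions": list(permissions),
    }
```

The request would be passed as `boto3.client("lakeformation").grant_permissions(**request)`. Because the grant targets a tag expression rather than named tables, newly tagged tables are covered without another grant.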