Principle of Least Privilege

We start with broad permissions during development and gradually narrow them down to the minimum the application needs to function properly. We grant only the permissions required to perform a task. For example, if an application only needs to read data from an S3 bucket, we should not grant it permissions to write or delete objects in that bucket, and we can narrow down the path or prefix to restrict access further.
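As a sketch, a least-privilege policy for that read-only case might look like the following (the bucket name and prefix are hypothetical):

```python
import json

# Read-only access to one hypothetical bucket, narrowed to one prefix.
# No write or delete actions are granted anywhere.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": "arn:aws:s3:::example-bucket/reports/*",
        },
        {
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": "arn:aws:s3:::example-bucket",
            # Listing is also narrowed to the same prefix.
            "Condition": {"StringLike": {"s3:prefix": ["reports/*"]}},
        },
    ],
}

print(json.dumps(policy, indent=2))
```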

We can use IAM Access Analyzer to generate least-privilege policies based on access activity.

Data Masking

Data masking and anonymization are techniques used to protect sensitive data by replacing it with fictional or scrambled data, for example when dealing with PII or other sensitive information.

Masking obfuscates data: for example, masking all but the last 4 digits of a credit card or Social Security number, or masking passwords. These techniques are supported in AWS Glue DataBrew and Amazon Redshift.

We can also use anonymization to replace sensitive data with random values that have no meaningful relationship to the original data: replacing names with random strings, using a hash function to generate a unique identifier for each record, or encrypting the data with deterministic or probabilistic encryption. Alternatively, we can simply remove the sensitive data if it is not needed for the analysis.
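A minimal sketch of both techniques in plain Python (this illustrates the ideas, not the DataBrew/Redshift implementations):

```python
import hashlib
import secrets

def mask_card(number: str) -> str:
    """Mask all but the last 4 digits of a card number."""
    return "*" * (len(number) - 4) + number[-4:]

def pseudonymize(name: str, salt: str) -> str:
    """Replace a name with a salted hash: the original can't be read back,
    but the same input still maps to the same identifier across records."""
    return hashlib.sha256((salt + name).encode()).hexdigest()[:12]

salt = secrets.token_hex(8)  # keep the salt secret and out of the dataset
print(mask_card("4111111111111111"))   # → ************1111
print(pseudonymize("Jane Doe", salt))
```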

SageMaker Security

  1. Use IAM to set up user accounts with only the permissions they need.
  2. Use MFA
  3. Use SSL/TLS when connecting to anything.
  4. Use CloudTrail to log API and user activity.
  5. Use encryption for data at rest and in transit. SageMaker supports encryption for data stored in S3, EBS volumes, and RDS databases. We can also use KMS to manage encryption keys.
  6. Be careful with PII.

Data protection

Data at rest

  • KMS: AWS Key Management Service.

    KMS keys are accepted by notebooks and all SageMaker jobs: we can use KMS to create and manage encryption keys for data at rest for training, tuning, batch transform, and endpoints.

    The notebooks and everything under /opt/ml/ and /tmp can be encrypted with a KMS key.

  • S3: We can use encrypted S3 buckets for training data and hosting models, and S3 can also use KMS.

Data in transit

  • All traffic supports TLS/SSL encryption.
  • IAM roles are assigned to SageMaker to give it permissions to access resources.
  • Inter-node training communication may be optionally encrypted. This can increase training time and cost for deep learning. It is called inter-container traffic encryption, and it is enabled via the console or API when setting up a training or tuning job.
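The option above can be sketched as the relevant CreateTrainingJob parameters (the job name, role ARN, and instance settings here are made-up placeholders):

```python
# Sketch of a CreateTrainingJob request with inter-container traffic
# encryption enabled for a 2-node distributed job. All names/ARNs are
# hypothetical.
training_job_params = {
    "TrainingJobName": "example-distributed-job",
    "RoleArn": "arn:aws:iam::123456789012:role/ExampleSageMakerRole",
    "EnableInterContainerTrafficEncryption": True,  # encrypt inter-node traffic
    "ResourceConfig": {
        "InstanceType": "ml.p3.2xlarge",
        "InstanceCount": 2,
        "VolumeSizeInGB": 50,
    },
}
# With boto3 this dict would be passed as:
# sagemaker_client.create_training_job(**training_job_params)
print(training_job_params["EnableInterContainerTrafficEncryption"])
```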

SageMaker and VPC

To enhance security, we can set up SageMaker to run within a VPC (Virtual Private Cloud). For even more security we can use a private VPC, in which case we need to set up S3 VPC endpoints; custom endpoint policies and S3 bucket policies can keep this secure.

Notebooks are internet-enabled by default, which can be a security hole. If internet access is disabled, the VPC needs an interface endpoint (PrivateLink) or a NAT Gateway, and must allow outbound connections, for training and hosting to work.

Training and inference containers are also internet-enabled by default. Network isolation is an option, but it also prevents S3 access.

SageMaker and IAM

We can set up permissions for actions such as: CreateTrainingJob, CreateModel, CreateEndpointConfig, CreateTransformJob, CreateHyperParameterTuningJob, CreateNotebookInstance, UpdateNotebookInstance.

There are also predefined policies: AmazonSageMakerReadOnly, AmazonSageMakerFullAccess, AdministratorAccess, DataScientist.

Logging and Monitoring

CloudWatch can log, monitor, and alarm on:

  • Invocations and latency of endpoints.
  • Health of instance nodes (CPU, memory, etc)
  • Ground Truth (active human workers, how much they are doing on our labeling jobs, etc.)

CloudTrail records actions from users, roles, and services within SageMaker. Log files containing this information are delivered to S3 for auditing purposes.

IAM

IAM (Identity and Access Management) is a service that allows us to manage access to AWS resources securely. It is a Global service.

Users: When we create an AWS account, a root account is created; it shouldn’t be used or shared for everyday tasks. Instead, we should create users, which represent people within our organization, and they can be grouped.

Groups: A group can contain only users, and a user can belong to multiple groups. It is also possible that a user is not in any group.

IAM Permissions

Users or Groups can be assigned JSON documents called policies. A policy is a document that defines permissions for an action on a resource. It consists of one or more statements, and each statement includes an effect (allow or deny), an action (the specific AWS service operation), and a resource (the AWS resource to which the action applies).

IAM Policies

IAM policies are inherited from group to user. A user not attached to any group can have inline policies directly attached. For users who are in two groups, the permissions from both groups are combined. If there is a conflict between policies (e.g., one policy allows an action while another denies it), the explicit deny takes precedence over any allow.
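The deny-overrides-allow behavior can be illustrated with a toy evaluator (a simplification of real IAM evaluation, which also handles wildcards, conditions, and multiple policy types):

```python
def evaluate(statements, action, resource):
    """Toy IAM evaluation: an explicit Deny wins over any Allow;
    with no matching statement, the default is an implicit deny."""
    decision = "ImplicitDeny"
    for s in statements:
        if action in s["Action"] and resource == s["Resource"]:
            if s["Effect"] == "Deny":
                return "Deny"          # explicit deny short-circuits
            decision = "Allow"
    return decision

combined = [  # policies merged from two groups the user belongs to
    {"Effect": "Allow", "Action": ["s3:GetObject"], "Resource": "arn:aws:s3:::data/*"},
    {"Effect": "Deny",  "Action": ["s3:GetObject"], "Resource": "arn:aws:s3:::data/*"},
]
print(evaluate(combined, "s3:GetObject", "arn:aws:s3:::data/*"))  # → Deny
```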

A policy consists of:

  • Version: The version of the policy language. The current version is “2012-10-17”.
  • Id: An identifier for the policy (optional).
  • Statement: One or more individual statements (required).

A statement consists of:

  • Sid: An identifier for the statement (optional).
  • Effect: Whether the statement allows or denies access (required). It can be “Allow” or “Deny”.
  • Principal: The account, user, or role to which this policy applies.
  • Action: List of actions this policy allows or denies.
  • Resource: List of resources to which the actions apply.
  • Condition: Optional conditions for when the policy is in effect.
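Putting those fields together, a hypothetical resource-based (bucket) policy might look like the following (the account ID, user, and bucket name are made up):

```python
import json

bucket_policy = {
    "Version": "2012-10-17",            # policy-language version
    "Id": "ExampleBucketPolicy",        # optional policy identifier
    "Statement": [
        {
            "Sid": "AllowReadFromAnalyticsUser",   # optional statement id
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::123456789012:user/analytics"},
            "Action": ["s3:GetObject"],
            "Resource": ["arn:aws:s3:::example-bucket/*"],
            # Optional condition: only allow requests over TLS.
            "Condition": {"Bool": {"aws:SecureTransport": "true"}},
        }
    ],
}
print(json.dumps(bucket_policy, indent=2))
```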

IAM Password Policy

Strong password policies can be enforced for IAM users. We can set requirements for password length, complexity, and rotation. This helps to enhance the security of user accounts and prevent unauthorized access.

We can allow IAM users to change their own passwords, and we can also require users to reset their passwords after a certain period of time. This can help to ensure that passwords are regularly updated and reduce the risk of compromised accounts.

Multi Factor Authentication (MFA) can be enabled for IAM users to provide an additional layer of security. With MFA, users are required to provide a second form of authentication (e.g., a code from a mobile app or a hardware token) in addition to their password when signing in. This helps to protect against unauthorized access even if a user’s password is compromised.

IAM Roles

Some AWS services will need to perform actions on our behalf. To do so, we will assign permissions to AWS services with IAM Roles.

For example, if we want our EC2 instance to access S3 buckets, we can create an IAM role with the necessary permissions and attach it to the instance. This way, the EC2 instance can access the S3 buckets without needing to store AWS credentials on the instance itself, which enhances security.
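A sketch of the trust policy that lets EC2 assume such a role (the permissions policy granting S3 access would be attached to the same role separately):

```python
# Trust policy: who is allowed to assume the role. Here the EC2 service
# principal, so instances with this role attached can obtain credentials.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "ec2.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}
print(trust_policy["Statement"][0]["Principal"]["Service"])
```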

Encryption

We have encryption in flight (e.g., TLS/SSL) and encryption at rest (e.g., with KMS and S3). We can also use client-side encryption to encrypt data before it is sent to AWS services, providing an additional layer of security.

TLS certificates enable encryption (HTTPS), and encryption in flight ensures no MITM (man-in-the-middle) attack can happen. For a login process:

  1. The client initiates a connection to the server and sends the username and password, encrypted with TLS on the client side.
  2. The encrypted data is transmitted securely to the server over the network.
  3. The server receives the encrypted data and decrypts it using its private key (TLS Decryption). Then it verifies the username and password against the stored credentials.

Server-side encryption (SSE) is used to encrypt data at rest. When we upload data to S3, we can specify that it should be encrypted using SSE; S3 then automatically encrypts the data before storing it and decrypts it when we access it. The data is stored in encrypted form thanks to a key (usually a data key) and decrypted before being sent back. The encryption and decryption keys must be managed somewhere, and the server must have access to them.

Client-side encryption is when we encrypt data on the client side before sending it to AWS services. This way, the data is encrypted end-to-end and only the client has access to the encryption keys. The server should not be able to decrypt the data. We could leverage Envelope Encryption.
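The envelope pattern can be sketched as follows. Note the cipher here is a deliberately toy XOR stream standing in for AES, and the KMS calls are only described in comments; this illustrates the key-handling flow, not a real implementation:

```python
import secrets

def xor_stream(data: bytes, key: bytes) -> bytes:
    """Toy cipher (XOR with a repeating key) standing in for AES --
    do NOT use for real data; it only illustrates the envelope pattern."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

# 1. Generate a one-off data key. (KMS GenerateDataKey would return this
#    plus a copy of it encrypted under the KMS key.)
data_key = secrets.token_bytes(32)

# 2. Encrypt the payload locally with the data key.
plaintext = b"card=4111111111111111"
ciphertext = xor_stream(plaintext, data_key)

# 3. Store the ciphertext together with the *encrypted* data key; to
#    decrypt later, first ask KMS to decrypt the data key, then decrypt
#    the payload locally.
assert xor_stream(ciphertext, data_key) == plaintext
print("round trip ok")
```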

AWS KMS

This is the main service for managing encryption keys in AWS. It is fully integrated with IAM for authorization and we can use CloudTrail to audit KMS key usage. It is seamlessly integrated with most AWS services.

We should never store secrets in plaintext, especially in code. KMS key encryption is also available through API calls (SDK, CLI, etc.): encrypted secrets can be stored in the code or in environment variables, and then decrypted at runtime using the KMS API.

There is symmetric (AES-256 keys) and asymmetric (RSA and ECC keys) encryption. Symmetric encryption uses the same key for encryption and decryption, while asymmetric encryption uses a pair of keys (public and private) for encryption and decryption.

AWS services that are integrated with KMS use symmetric CMKs. We can never access the KMS key unencrypted; we must call the KMS API to encrypt and decrypt data.

Asymmetric keys are used for encrypt/decrypt operations. The public key is downloadable, and we cannot access the private key unencrypted. The use case is encryption outside of AWS by users who can’t call the KMS API: the client encrypts data with the public key and uploads the encrypted data to AWS, and then the KMS API is used to decrypt it with the private key.

Types of KMS keys

  • AWS owned keys (free): SSE-S3, SSE-SQS, SSE-DDB.
  • AWS managed keys (free): aws/service-name, e.g., aws/rds, aws/ebs.
  • Customer keys created in KMS: 1 dollar/month.
  • Customer managed keys imported: 1 dollar/month.
  • API call to KMS: 0.03 dollar per 10,000 requests.
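A quick worked example of the request pricing above:

```python
# KMS API requests cost $0.03 per 10,000 requests.
requests_per_month = 2_000_000
cost = requests_per_month / 10_000 * 0.03
print(f"${cost:.2f} per month")  # → $6.00 per month
```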

Automatic key rotation:

  • AWS-managed keys: automatically rotated every 1 year.
  • Customer-managed KMS key: automatic & on-demand (must be enabled).
  • Imported KMS key: only manual rotation possible using alias.

KMS keys are per region and so the same data replicated across regions will be encrypted with different keys.

KMS key policies

Access to KMS keys is controlled with key policies, which are “similar” to S3 bucket policies. The difference is that if we do not have a policy on a KMS key, then no one can use it.

Default KMS key policy: created if we don’t provide a specific KMS key policy. It gives complete access to the key to the root user, i.e., the entire AWS account. It is recommended to create a custom KMS key policy that grants access only to the specific IAM users or roles that need to use the key, following the principle of least privilege. A custom key policy can also allow cross-account access to the KMS key if needed.
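A sketch of roughly what that default key policy looks like (the account ID is a placeholder):

```python
# Default-style key policy: full KMS access for the account root, which
# delegates authorization to the account's IAM policies.
default_key_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Enable IAM policies",
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::123456789012:root"},
            "Action": "kms:*",
            "Resource": "*",
        }
    ],
}
print(default_key_policy["Statement"][0]["Action"])
```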

Copying Snapshots across regions

When we copy snapshots across regions, we follow these steps:

  1. Create a snapshot, encrypted with our own KMS key (customer managed key).
  2. Attach a KMS key policy to authorize cross-account access.
  3. Share the encrypted snapshot.
  4. Create a copy of the Snapshot, encrypt it with a different CMK in our account.
  5. Create a volume from the snapshot.

Macie

Amazon Macie is a security service that uses machine learning to automatically discover, classify, and protect sensitive data in AWS. It can identify and classify sensitive data such as personally identifiable information (PII), financial data, and intellectual property. Macie can also monitor data access patterns and provide alerts for suspicious activity. It is integrated with S3 and can be used to protect data stored in S3 buckets.

When an anomaly is detected, Macie can notify us through Amazon EventBridge, and we can set up rules to trigger actions such as sending an email notification or invoking a Lambda function to investigate further.

AWS Secrets Manager

It is meant for storing secrets, and it can force rotation of secrets every X days. It integrates well with other AWS services. It can replicate secrets across multiple AWS Regions, and Secrets Manager keeps read replicas in sync with the primary secret.

AWS WAF

AWS WAF (Web Application Firewall) is a security service that helps protect web applications from common web exploits (Layer 7: http) and attacks. It allows us to create custom rules to block or allow traffic based on specific conditions, such as IP addresses, HTTP headers, or request patterns. AWS WAF can be used to protect applications hosted on Amazon CloudFront, Application Load Balancer, API Gateway, and AWS App Runner.

It can be deployed on: Application Load Balancer, Amazon CloudFront, API Gateway, AppSync GraphQL API, Cognito User Pool.

We can define Web ACL (Web Access Control List) Rules:

  • IP Set: up to 10,000 IP addresses - use multiple Rules for more IPs.
  • Filter by HTTP header, body, method, or query string; protections from common attacks such as SQL injection and cross-site scripting (XSS).
  • Size constraints, geographic match, rate-based rules (e.g., block IPs that make more than 100 requests in 5 minutes) for DDoS protection.
  • Web ACLs are Regional, except for CloudFront, where they are global.
  • A rule group is a reusable set of rules that you can add to a web ACL.
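The rate-based rule above can be sketched as a sliding-window counter (a toy model of the idea, not WAF’s actual implementation):

```python
from collections import defaultdict, deque

class RateBasedRule:
    """Toy WAF rate-based rule: block an IP once it exceeds `limit`
    requests inside a sliding `window` of seconds."""
    def __init__(self, limit: int = 100, window: float = 300.0):
        self.limit, self.window = limit, window
        self.hits = defaultdict(deque)

    def allow(self, ip: str, now: float) -> bool:
        q = self.hits[ip]
        while q and now - q[0] > self.window:
            q.popleft()                 # drop hits outside the window
        q.append(now)
        return len(q) <= self.limit

rule = RateBasedRule(limit=100, window=300)
# 101 requests from the same IP within 100 seconds: the 101st is blocked.
results = [rule.allow("203.0.113.7", t) for t in range(101)]
print(results[99], results[100])  # → True False
```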

How do we get a fixed IP while using WAF with a load balancer? WAF does not support the Network Load Balancer (Layer 4). We need to use an Application Load Balancer (Layer 7), and then we can get a fixed IP address by using AWS Global Accelerator, with WAF on the ALB.

AWS Shield

AWS Shield is a managed Distributed Denial of Service (DDoS) protection service that safeguards applications running on AWS.

AWS Shield Standard is a free service that is activated for every AWS customer and provides protection from attacks such as SYN/UDP floods, reflection attacks, and other Layer 3 and Layer 4 attacks.

VPC and Subnet Primer

A VPC (Virtual Private Cloud) is a virtual network that we can create in AWS, and it is a regional resource. It is logically isolated from other virtual networks in the AWS cloud. We can define our own IP address range, create subnets, and configure route tables and network gateways.

Subnets allow us to partition the network inside our VPC. A subnet is an Availability Zone resource. A public subnet is accessible from the internet, while a private subnet is not.

To define access to the internet and between subnets, we use Route Tables.
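As a sketch, the standard-library `ipaddress` module can show how a hypothetical VPC CIDR is partitioned into subnets (the CIDR and subnet names are made up):

```python
import ipaddress

# Carve a hypothetical VPC CIDR into four equal subnets -- e.g. one
# public and one private subnet per Availability Zone.
vpc = ipaddress.ip_network("10.0.0.0/16")
subnets = list(vpc.subnets(new_prefix=18))
for name, net in zip(["public-a", "public-b", "private-a", "private-b"], subnets):
    print(f"{name}: {net}")
# → public-a: 10.0.0.0/18, public-b: 10.0.64.0/18, ...
```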

Internet Gateway and NAT Gateway

An Internet Gateway helps our VPC instances connect with the internet; public subnets have a route to the Internet Gateway. This way, a public subnet can access the internet and be accessed from the internet.

If we want a private subnet to access the internet without exposing its instances to the internet, we can use a NAT Gateway (AWS-managed) or a NAT instance (self-managed). The private subnet has a route to the NAT Gateway, and the NAT Gateway has a route to the Internet Gateway. This way, instances in the private subnet can access the internet but cannot be accessed from it.

Network ACL and Security Groups

NACL (Network ACL) is a firewall that controls traffic from and to a subnet. It can have allow and deny rules. A NACL is attached at the subnet level, and its rules only include IP addresses. It is stateless.

A Security Group is a firewall that controls traffic to and from an ENI or an EC2 instance, and it can only have allow rules. The rules can reference IP addresses and other security groups. It is stateful.

VPC Flow Logs capture information about IP traffic going into the interfaces and help to monitor and troubleshoot connectivity issues. They also capture network information from AWS managed interfaces: Elastic Load Balancers, ElastiCache, RDS, Aurora, etc.

VPC Flow Logs data can go to S3, CloudWatch Logs, and Kinesis Data Firehose.

VPC Peering

  • We can connect two VPCs privately using AWS’s network, which makes them behave as if they were in the same network.
  • There must not be any overlapping CIDR blocks between the two VPCs.
  • VPC Peering connection is not transitive. If VPC A is peered with VPC B, and VPC B is peered with VPC C, VPC A does not have access to VPC C unless there is a separate peering connection between A and C.
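The non-transitivity rule can be modeled with a small reachability check over direct peering connections only (the VPC names are placeholders):

```python
# Peering connections as undirected edges; routing works only over a
# direct edge, because peering is not transitive.
peerings = {("vpc-a", "vpc-b"), ("vpc-b", "vpc-c")}

def can_route(src: str, dst: str) -> bool:
    """True only if src and dst are directly peered."""
    return (src, dst) in peerings or (dst, src) in peerings

print(can_route("vpc-a", "vpc-b"))  # → True
print(can_route("vpc-a", "vpc-c"))  # → False (no direct peering A-C)
```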

VPC Endpoints

  • Endpoints allow us to connect to AWS services using a private network instead of the public internet. They provide private access to AWS services from within a VPC.
  • This gives us enhanced security and lower latency when accessing AWS services.
  • VPC Gateway Endpoint: only for S3 and DynamoDB; it is a gateway that we add to our route table.
  • VPC endpoint Interface: for most other AWS services, it is an elastic network interface (ENI) with a private IP address that serves as an entry point for traffic destined to the AWS service.

Site to site VPN and Direct Connect

  • Site-to-Site VPN: It allows us to securely connect our on-premises network to our AWS VPC over the public internet using IPsec VPN tunnels. The connection is automatically encrypted.
  • Direct Connect: a dedicated network connection between our on-premises data center and AWS. It establishes a physical connection to AWS and can provide more consistent network performance and lower latency compared to a VPN connection. It goes over a private network and can take a month to establish.

AWS PrivateLink

This is the most secure and scalable way to expose a service to thousands of VPCs. It does not require VPC peering, an internet gateway, NAT, or route tables. It requires a Network Load Balancer (service VPC) and an ENI (customer VPC).

In the world of AWS, an ENI (Elastic Network Interface) is essentially a virtual network card that we can attach to an EC2 instance. Just like a physical server needs an Ethernet port to connect to a network, our virtual server needs an ENI to communicate with other instances, the internet, or our local database.