S3 Buckets
S3
Advertised as infinitely scaling storage. Many AWS services also use S3 as part of their own service.
You can use S3 for:
- Backup
- Storage
- Disaster recovery
- Archive
- Static website
- Software delivery
- Data lakes and big data analytics
Objects are stored in buckets (think of them as top-level directories). Each bucket name must be globally unique (across all regions and accounts). However, buckets themselves are created in a specific region. Bucket names can't contain uppercase letters or underscores. A sketch of creating a region-bound bucket follows below.
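A minimal sketch using boto3 (the AWS SDK for Python); the bucket name and region below are made-up examples:

```python
import boto3

s3 = boto3.client("s3", region_name="eu-west-1")

# Bucket names are global: this fails if anyone, in any account, owns the name.
# Lowercase letters, numbers, and hyphens only -- no uppercase, no underscores.
s3.create_bucket(
    Bucket="my-notes-bucket-12345",
    CreateBucketConfiguration={"LocationConstraint": "eu-west-1"},
)
```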
Each object stored in a bucket has a key, and the key is the full path. However, S3 has no real concept of directories! What look like folders are just key prefixes. The object key is the prefix (which can look like a folder path) + the actual file name itself.
The value the key maps to is the content of the file itself, with a max size of 5TB. If you're uploading a file larger than 5GB you must use multi-part upload.
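A sketch of an upload, assuming the hypothetical bucket from above and a local file `cat.jpg`. boto3's `upload_file` switches to multi-part automatically past a configurable threshold:

```python
import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")

# The key is the full path: prefix ("images/2024/") + file name ("cat.jpg").
s3.upload_file(
    Filename="cat.jpg",
    Bucket="my-notes-bucket-12345",
    Key="images/2024/cat.jpg",
    # Multi-part upload kicks in automatically past this threshold; files
    # over 5GB (up to 5TB) must be uploaded multi-part anyway.
    Config=TransferConfig(multipart_threshold=100 * 1024 * 1024),  # 100 MB
)
```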
S3 Bucket security
- User-based: Attach IAM policies to a user to specify which S3 API calls that user is allowed to make.
- Resource-based: Has three types
  - Bucket policies: bucket-wide rules set from the S3 console; these also enable cross-account access, letting other AWS accounts reach the bucket
  - Object access control list (ACL): finer-grained, per object
  - Bucket access control list (ACL): less common now
So basically you can attach a policy to a user to give it permission to the S3 bucket, or attach a policy to the bucket itself to control who is able to access it. You can also attach an IAM role to a resource, so that, say, an EC2 instance can access the bucket. You can also use a cross-account bucket policy to grant another account access.
An IAM principal (the "who") can access an S3 object if its IAM permissions allow it OR the resource policy allows it, AND there is no explicit deny.
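A sketch of a cross-account bucket policy; the account ID and bucket name are placeholders:

```python
import json
import boto3

s3 = boto3.client("s3")

# Resource-based rule: grant another account (the "Principal") read access.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "CrossAccountRead",
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::111122223333:root"},
            "Action": ["s3:GetObject"],
            "Resource": "arn:aws:s3:::my-notes-bucket-12345/*",
        }
    ],
}

s3.put_bucket_policy(Bucket="my-notes-bucket-12345", Policy=json.dumps(policy))
```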
You can also encrypt the objects using encryption keys to add another layer of security.
S3 static website hosting
S3 can be used for hosting static websites. The website URL is the bucket's public website endpoint. In order for this to work you must configure the bucket to be publicly accessible (e.g. a bucket policy allowing public reads).
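A sketch of enabling website hosting on the hypothetical bucket (the document names are assumptions):

```python
import boto3

s3 = boto3.client("s3")

# Turn the bucket into a static website; the bucket must also allow
# public reads (e.g. via a bucket policy) for visitors to see it.
s3.put_bucket_website(
    Bucket="my-notes-bucket-12345",
    WebsiteConfiguration={
        "IndexDocument": {"Suffix": "index.html"},
        "ErrorDocument": {"Key": "error.html"},
    },
)
```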
S3 versioning
You can enable versioning for your files in S3 (it is set at the bucket level). Any change to the same key creates a new version for that particular key. For example, if you upload the same file twice, S3 keeps version 1 and adds version 2 rather than overwriting it.
You can roll back to a previous version to recover from accidental corruption or deletion. Objects that existed before you enabled versioning have version ID "null".
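A sketch of turning versioning on and listing versions of a key (bucket and key names are the running examples from above):

```python
import boto3

s3 = boto3.client("s3")

# Enable versioning; from now on every overwrite of a key adds a version.
s3.put_bucket_versioning(
    Bucket="my-notes-bucket-12345",
    VersioningConfiguration={"Status": "Enabled"},
)

# List all versions of a key; objects uploaded before versioning was
# enabled show up with VersionId "null".
resp = s3.list_object_versions(
    Bucket="my-notes-bucket-12345", Prefix="images/2024/cat.jpg"
)
for v in resp.get("Versions", []):
    print(v["VersionId"], v["IsLatest"])
```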
S3 replication
Versioning must be enabled on both the source and destination buckets you are setting replication up between. The buckets can be in different AWS accounts.
The replication occurs asynchronously.
There are two flavors:
- CRR (Cross-Region Replication): compliance, lower-latency access, replication across accounts
- SRR (Same-Region Replication): log aggregation, live replication between production and test accounts
Only new objects are replicated; existing objects won't be. To replicate existing objects you must use S3 Batch Replication.
Deletions that target a specific version ID are not replicated (this protects against malicious deletes). Deleting the object without a version ID only places a delete marker, and delete markers can optionally be replicated.
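A sketch of a replication config. The role ARN and destination bucket are placeholders; the role must allow S3 to read the source and write the destination:

```python
import boto3

s3 = boto3.client("s3")

# Both buckets must already have versioning enabled.
s3.put_bucket_replication(
    Bucket="my-notes-bucket-12345",
    ReplicationConfiguration={
        "Role": "arn:aws:iam::111122223333:role/s3-replication-role",
        "Rules": [
            {
                "ID": "replicate-all",
                "Priority": 1,
                "Status": "Enabled",
                "Filter": {},  # empty filter = whole bucket
                # Replicate delete markers (optional; version-ID deletes never are)
                "DeleteMarkerReplication": {"Status": "Enabled"},
                "Destination": {"Bucket": "arn:aws:s3:::my-replica-bucket"},
            }
        ],
    },
)
```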
S3 storage classes
Durability: the chance of your object being lost. S3 is designed for 99.999999999% (eleven 9s) durability: if you store 10,000,000 objects, on average you lose a single object once every 10,000 years. Durability is the same for all storage classes.
Availability: how readily available the service is. S3 Standard offers 99.99% availability, i.e. down about 53 minutes per year on average.
General purpose
This is used for frequently accessed data. Use it for big data analytics, mobile and gaming applications. No cost for retrieval, only for storage.
Infrequent access
For data that is accessed less frequently, but that needs rapid access when it is.
Lower storage cost than S3 Standard, but there is a cost per retrieval.
Standard-Infrequent Access: use for disaster recovery and backups
One Zone-Infrequent Access: very high durability within a single availability zone, but the data is lost if the AZ is destroyed. Use it for storing secondary backup copies
Glacier
Low-cost storage for archiving and backup.
You pay a low storage price but a high object-retrieval cost.
Glacier Instant Retrieval: gives you millisecond retrieval. Good for data accessed about once a quarter.
Glacier Flexible Retrieval: expedited (1-5 minutes), standard (3-5 hours), or bulk (5-12 hours) to get data back
Glacier Deep Archive: meant for long-term archiving. Standard (12 hours), bulk (48 hours)
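A sketch of requesting a restore from Glacier Flexible Retrieval; the key is a made-up example, and `Tier` maps to the speeds above:

```python
import boto3

s3 = boto3.client("s3")

# Ask S3 to make a Glacier object readable again for 7 days.
# Tier can be "Expedited", "Standard", or "Bulk".
s3.restore_object(
    Bucket="my-notes-bucket-12345",
    Key="archives/2020-logs.tar.gz",
    RestoreRequest={
        "Days": 7,
        "GlacierJobParameters": {"Tier": "Standard"},
    },
)
```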
Intelligent-tiering
Lets you move objects between access tiers automatically based on access patterns. No retrieval charges, but you pay a small monthly monitoring and auto-tiering fee. The tiers are listed below, with an upload sketch after the list.
- Frequent Access tier: the default tier
- Infrequent Access tier: objects not accessed for 30 days
- Archive Instant Access tier: objects not accessed for 90 days
- Archive Access tier (optional): configurable from 90 days to 700+ days
- Deep Archive Access tier (optional): configurable from 180 days to 700+ days
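A sketch of choosing a storage class per object at upload time (bucket, key, and file name are assumptions):

```python
import boto3

s3 = boto3.client("s3")

# Pick the class per object; INTELLIGENT_TIERING lets S3 move the object
# between the tiers above automatically as access patterns change.
with open("metrics.csv", "rb") as f:
    s3.put_object(
        Bucket="my-notes-bucket-12345",
        Key="data/metrics.csv",
        Body=f,
        StorageClass="INTELLIGENT_TIERING",  # or STANDARD_IA, GLACIER, DEEP_ARCHIVE...
    )
```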