S3 Buckets
S3
Advertised as infinitely scaling storage. Many AWS services also use S3 as part of their own service.
You can use S3 for:
- Backup
- Storage
- Disaster recovery
- Archive
- Static website
- Software delivery
- Data lakes and big data analytics
Objects are stored in buckets (think of them as top-level directories). Each bucket name must be globally unique (across all regions and accounts). However, buckets themselves are created in a specific region. Bucket names can't contain uppercase letters or underscores. A sketch of creating a region-bound bucket follows below.
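A minimal sketch using boto3 (the AWS SDK for Python); the bucket name and region below are made-up examples:

```python
import boto3

s3 = boto3.client("s3", region_name="eu-west-1")

# Bucket names are global: this fails if anyone, in any account, owns the name.
# Lowercase letters, numbers, and hyphens only -- no uppercase, no underscores.
s3.create_bucket(
    Bucket="my-notes-bucket-12345",
    CreateBucketConfiguration={"LocationConstraint": "eu-west-1"},
)
```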
Each object stored in a bucket has a key, and the key is the full path. However, S3 has no real concept of directories! What look like folders are just key prefixes. The object key is the prefix (which can look like a folder path) + the actual file name itself.
The value the key maps to is the content of the file itself, with a max size of 5TB. If you're uploading a file larger than 5GB you must use multi-part upload.
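A sketch of an upload, assuming the hypothetical bucket from above and a local file `cat.jpg`. boto3's `upload_file` switches to multi-part automatically past a configurable threshold:

```python
import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")

# The key is the full path: prefix ("images/2024/") + file name ("cat.jpg").
s3.upload_file(
    Filename="cat.jpg",
    Bucket="my-notes-bucket-12345",
    Key="images/2024/cat.jpg",
    # Multi-part upload kicks in automatically past this threshold; files
    # over 5GB (up to 5TB) must be uploaded multi-part anyway.
    Config=TransferConfig(multipart_threshold=100 * 1024 * 1024),  # 100 MB
)
```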
S3 Bucket security
- User-based: Attach IAM policies to a user to specify which S3 API calls that user is allowed to make.
- Resource-based: Has three types
  - Bucket policies: bucket-wide rules set from the S3 console; these also enable cross-account access, letting other AWS accounts reach the bucket
  - Object access control list (ACL): finer-grained, per object
  - Bucket access control list (ACL): less common now
So basically you can attach a policy to a user to give it permission to the S3 bucket, or attach a policy to the bucket itself to control who is able to access it. You can also attach an IAM role to a resource, so that, say, an EC2 instance can access the bucket. You can also use a cross-account bucket policy to grant another account access.
An IAM principal (the "who") can access an S3 object if its IAM permissions allow it OR the resource policy allows it, AND there is no explicit deny.
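A sketch of a cross-account bucket policy; the account ID and bucket name are placeholders:

```python
import json
import boto3

s3 = boto3.client("s3")

# Resource-based rule: grant another account (the "Principal") read access.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "CrossAccountRead",
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::111122223333:root"},
            "Action": ["s3:GetObject"],
            "Resource": "arn:aws:s3:::my-notes-bucket-12345/*",
        }
    ],
}

s3.put_bucket_policy(Bucket="my-notes-bucket-12345", Policy=json.dumps(policy))
```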
You can also encrypt the objects using encryption keys to add another layer of security.
S3 static website hosting
S3 can be used for hosting static websites. The website URL is the bucket's public website endpoint. In order for this to work you must configure the bucket to be publicly accessible (e.g. a bucket policy allowing public reads).
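A sketch of enabling website hosting on the hypothetical bucket (the document names are assumptions):

```python
import boto3

s3 = boto3.client("s3")

# Turn the bucket into a static website; the bucket must also allow
# public reads (e.g. via a bucket policy) for visitors to see it.
s3.put_bucket_website(
    Bucket="my-notes-bucket-12345",
    WebsiteConfiguration={
        "IndexDocument": {"Suffix": "index.html"},
        "ErrorDocument": {"Key": "error.html"},
    },
)
```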
S3 versioning
You can enable versioning for your files in S3 (it is set at the bucket level). Any change to the same key creates a new version for that particular key. For example, if you upload the same file twice, S3 keeps version 1 and adds version 2 rather than overwriting it.
You can roll back to a previous version to recover from accidental corruption or deletion. Objects that existed before you enabled versioning have version ID "null".
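A sketch of turning versioning on and listing versions of a key (bucket and key names are the running examples from above):

```python
import boto3

s3 = boto3.client("s3")

# Enable versioning; from now on every overwrite of a key adds a version.
s3.put_bucket_versioning(
    Bucket="my-notes-bucket-12345",
    VersioningConfiguration={"Status": "Enabled"},
)

# List all versions of a key; objects uploaded before versioning was
# enabled show up with VersionId "null".
resp = s3.list_object_versions(
    Bucket="my-notes-bucket-12345", Prefix="images/2024/cat.jpg"
)
for v in resp.get("Versions", []):
    print(v["VersionId"], v["IsLatest"])
```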
S3 replication
Versioning must be enabled on both the source and destination buckets you are setting replication up between. The buckets can be in different AWS accounts.
The replication occurs asynchronously.
There are two flavors:
- CRR (Cross-Region Replication): compliance, lower-latency access, replication across accounts
- SRR (Same-Region Replication): log aggregation, live replication between production and test accounts
Only new objects are replicated; existing objects won't be. To replicate existing objects you must use S3 Batch Replication.
Deletions that target a specific version ID are not replicated (this protects against malicious deletes). Deleting the object without a version ID only places a delete marker, and delete markers can optionally be replicated.
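A sketch of a replication config. The role ARN and destination bucket are placeholders; the role must allow S3 to read the source and write the destination:

```python
import boto3

s3 = boto3.client("s3")

# Both buckets must already have versioning enabled.
s3.put_bucket_replication(
    Bucket="my-notes-bucket-12345",
    ReplicationConfiguration={
        "Role": "arn:aws:iam::111122223333:role/s3-replication-role",
        "Rules": [
            {
                "ID": "replicate-all",
                "Priority": 1,
                "Status": "Enabled",
                "Filter": {},  # empty filter = whole bucket
                # Replicate delete markers (optional; version-ID deletes never are)
                "DeleteMarkerReplication": {"Status": "Enabled"},
                "Destination": {"Bucket": "arn:aws:s3:::my-replica-bucket"},
            }
        ],
    },
)
```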
S3 storage classes
Durability: the chance of your object being lost. S3 is designed for 99.999999999% (eleven 9s) durability: if you store 10,000,000 objects, on average you lose a single object once every 10,000 years. Durability is the same for all storage classes.
Availability: how readily available the service is. S3 Standard offers 99.99% availability, i.e. down about 53 minutes per year on average.
General purpose
This is used for frequently accessed data. Use it for big data analytics, mobile and gaming applications. No cost for retrieval, only for storage.
Infrequent access
For data that is accessed less frequently, but that needs rapid access when it is.
Lower storage cost than S3 Standard, but there is a cost per retrieval.
Standard-Infrequent Access: use for disaster recovery and backups
One Zone-Infrequent Access: very high durability within a single availability zone, but the data is lost if the AZ is destroyed. Use it for storing secondary backup copies
Glacier
Low-cost storage for archiving and backup.
You pay a low storage price but a high object-retrieval cost.
Glacier Instant Retrieval: gives you millisecond retrieval. Good for data accessed about once a quarter.
Glacier Flexible Retrieval: expedited (1-5 minutes), standard (3-5 hours), or bulk (5-12 hours) to get data back
Glacier Deep Archive: meant for long-term archiving. Standard (12 hours), bulk (48 hours)
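A sketch of requesting a restore from Glacier Flexible Retrieval; the key is a made-up example, and `Tier` maps to the speeds above:

```python
import boto3

s3 = boto3.client("s3")

# Ask S3 to make a Glacier object readable again for 7 days.
# Tier can be "Expedited", "Standard", or "Bulk".
s3.restore_object(
    Bucket="my-notes-bucket-12345",
    Key="archives/2020-logs.tar.gz",
    RestoreRequest={
        "Days": 7,
        "GlacierJobParameters": {"Tier": "Standard"},
    },
)
```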
Intelligent-tiering
Lets you move objects between access tiers automatically based on access patterns. No retrieval charges, but you pay a small monthly monitoring and auto-tiering fee. The tiers are listed below, with an upload sketch after the list.
- Frequent Access tier: the default tier
- Infrequent Access tier: objects not accessed for 30 days
- Archive Instant Access tier: objects not accessed for 90 days
- Archive Access tier (optional): configurable from 90 days to 700+ days
- Deep Archive Access tier (optional): configurable from 180 days to 700+ days
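A sketch of choosing a storage class per object at upload time (bucket, key, and file name are assumptions):

```python
import boto3

s3 = boto3.client("s3")

# Pick the class per object; INTELLIGENT_TIERING lets S3 move the object
# between the tiers above automatically as access patterns change.
with open("metrics.csv", "rb") as f:
    s3.put_object(
        Bucket="my-notes-bucket-12345",
        Key="data/metrics.csv",
        Body=f,
        StorageClass="INTELLIGENT_TIERING",  # or STANDARD_IA, GLACIER, DEEP_ARCHIVE...
    )
```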