Google Cloud Storage
Google Cloud Storage is a scalable, fully-managed, highly reliable, and cost-efficient object/blob store.
It stores unstructured data as sequences of bytes.
It is well suited to images, videos, and other objects or blobs.
Features
- It offers a single unified API
- Optimized price/performance across 4 storage classes with Object Lifecycle Management.
- Access data instantly from any storage class.
- Durability of 99.999999999% (11 nines).
- Availability from 99% to 99.95%, depending on storage class and location type.
- Scalability up to exabytes of data with millisecond responsiveness.
- Data encryption at rest.
- Per-project rate limit on bucket create/delete operations (1 operation every 2 seconds) but no limit on object create/delete operations.
Structure
Buckets
Data is stored as objects in Buckets.
Unlike directories, buckets cannot be nested. Within a bucket, objects live in a flat namespace.
When creating a bucket, the user specifies a name, a geographic location (including the location type) and a default storage class.
The location type can be region, dual-region or multi-region (in order of increasing availability).
Bucket names that contain dots must be valid domain names. They have to be unique within the entire Google Cloud Storage namespace.
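The naming constraints can be checked locally before attempting to create a bucket. The sketch below is a simplified check, not the full rule set: it assumes names of 3 to 63 characters made of lowercase letters, digits, dashes, underscores and dots, starting and ending with a letter or digit. The function name is hypothetical, and global uniqueness can only be verified by the service itself.

```shell
# Simplified, local-only plausibility check for a bucket name.
# Assumption: 3-63 characters, lowercase letters, digits, '-', '_' and '.',
# starting and ending with a letter or digit. Other rules are not covered.
is_plausible_bucket_name() {
  printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[a-z0-9][a-z0-9._-]{1,61}[a-z0-9]$'
}

for candidate in "my-bucket-123" "My_Bucket" "ab"; do
  if is_plausible_bucket_name "$candidate"; then
    echo "$candidate: looks valid"
  else
    echo "$candidate: rejected"
  fi
done
```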
Objects
They are stored in buckets and can be up to 5 TiB in size.
Objects have data and metadata.
There is no limit on the number of objects in a bucket.
Objects are immutable, but you can overwrite objects. A single object can only be overwritten up to once per second (violation returns 503 service unavailable error).
Storage Classes
The default storage class of a bucket can be modified, but the change only applies to new objects.
The user can also specify a storage class for each object when adding it to the bucket, and change it afterwards.
The available storage classes depend on the location of the bucket.
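Setting a storage class per object could look like the sketch below, which uses `gsutil cp -s` at upload time and `gsutil rewrite -s` afterwards. The bucket, file and function names are hypothetical. By default the script only prints the commands (dry run); setting GSUTIL=gsutil in the environment would run them for real.

```shell
# Dry run by default: commands are printed instead of executed.
# Set GSUTIL=gsutil to run them. $GSUTIL is left unquoted on purpose so that
# "echo gsutil" is split into two words (sh/bash behaviour).
GSUTIL="${GSUTIL:-echo gsutil}"
BUCKET="${BUCKET:-example-bucket}"   # hypothetical bucket name

set_object_classes() {
  # Upload a local file directly into the Nearline class.
  $GSUTIL cp -s nearline ./backup.tar "gs://${BUCKET}/backup.tar"
  # Later, move the existing object to Coldline by rewriting it.
  $GSUTIL rewrite -s coldline "gs://${BUCKET}/backup.tar"
}

set_object_classes
```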
From most to least expensive ($/GB):
Standard
For frequently accessed data, such as website content, streaming, interactive workloads or data supporting mobile and gaming applications.
No minimum storage duration.
Nearline
Low-cost, highly durable storage service.
For storing backups and infrequently accessed data (less than once a month).
30-day minimum storage duration, with a cost per data access.
Coldline
Very low-cost, highly durable storage service.
For disaster recovery and very infrequently accessed data (less than once a quarter).
90-day minimum storage duration, with a cost per data access and a higher per-operation cost.
Data is available within milliseconds.
Archive
Lowest cost, highly durable storage service.
Best for long-term digital preservation.
365-day minimum storage duration.
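The access-frequency guidance above can be turned into a small helper. This is only a sketch: the 30-day and 90-day thresholds mirror the "once a month" and "once a quarter" wording, while using 365 days as the Archive cut-off is an assumption borrowed from its minimum storage duration. The function name is hypothetical.

```shell
# Suggest a storage class from the expected number of days between reads.
# Thresholds follow the notes; the 365-day Archive cut-off is an assumption.
suggest_storage_class() {
  days_between_reads="$1"
  if [ "$days_between_reads" -lt 30 ]; then
    echo "STANDARD"
  elif [ "$days_between_reads" -lt 90 ]; then
    echo "NEARLINE"
  elif [ "$days_between_reads" -lt 365 ]; then
    echo "COLDLINE"
  else
    echo "ARCHIVE"
  fi
}

suggest_storage_class 7     # read weekly
suggest_storage_class 45    # read every month and a half
suggest_storage_class 400   # read less than once a year
```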
Pricing
Depending on the storage class, pricing is based on the following components:
- Data storage: the amount of data stored in buckets (storage fee).
- Network usage: the amount of data read from or moved between buckets.
- Operations usage: actions performed on buckets and objects (access fee).
- Retrieval and early deletion fees: apply to the Nearline, Coldline and Archive storage classes.
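Early deletion fees follow from the minimum storage durations: an object removed before the minimum is billed as if it had been stored for the full period. The sketch below computes how many days would still be billed, using the durations listed in these notes. Function names are hypothetical, and no actual prices are assumed.

```shell
# Minimum storage duration in days per class (values from the notes above).
min_storage_days() {
  case "$1" in
    STANDARD) echo 0 ;;
    NEARLINE) echo 30 ;;
    COLDLINE) echo 90 ;;
    ARCHIVE)  echo 365 ;;
    *) echo "unknown storage class: $1" >&2; return 1 ;;
  esac
}

# Days still billed if an object is deleted after age_days of storage.
early_deletion_days() {
  class="$1"
  age_days="$2"
  min_days="$(min_storage_days "$class")" || return 1
  remaining=$(( min_days - age_days ))
  [ "$remaining" -gt 0 ] || remaining=0
  echo "$remaining"
}

early_deletion_days NEARLINE 10    # deleted 20 days before the minimum
early_deletion_days COLDLINE 120   # minimum duration already met
```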
Consistency
Strong consistency for the following operations, for both data and metadata:
- Read after write.
- Read after metadata update.
- Read after delete.
- Bucket listing.
- Object listing.
- Granting access to resources.
Eventually consistent operations:
- Revoking access from resources.
Availability
The Standard storage class offers the highest availability: 99.99% in a single region and >99.99% in dual-regions and multi-regions.
The remaining storage classes offer 99.9% in a single region and 99.95% in dual-regions and multi-regions.
Archive storage class is the only class with no availability SLA.
Object Versioning
When enabled, the user can list archived versions of an object, restore the live version to an older state, or permanently delete an archived version.
Every archived version is identified by a generation number.
It can add storage cost, but this can be managed with Object Lifecycle Management.
Two properties track the version: generation (version of the object's data) and metageneration (version of the object's metadata).
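A typical versioning workflow with gsutil might look like the sketch below: enable versioning, list all versions of an object, then restore an older one by copying it over the live version. The bucket name, object name and generation number are placeholders, and the function name is hypothetical. Commands are only printed unless GSUTIL=gsutil is set.

```shell
# Dry run by default: commands are printed instead of executed.
# Set GSUTIL=gsutil to run them ($GSUTIL is intentionally unquoted).
GSUTIL="${GSUTIL:-echo gsutil}"
BUCKET="${BUCKET:-example-bucket}"   # hypothetical bucket name

versioning_walkthrough() {
  # Turn Object Versioning on for the bucket.
  $GSUTIL versioning set on "gs://${BUCKET}"
  # List every version of an object; each entry ends in #<generation>.
  $GSUTIL ls -a "gs://${BUCKET}/report.csv"
  # Restore an older version by copying it over the live one.
  # The generation number below is a placeholder.
  $GSUTIL cp "gs://${BUCKET}/report.csv#1234567890123456" "gs://${BUCKET}/report.csv"
}

versioning_walkthrough
```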
Object Lifecycle Management
Rules that apply to current and future objects in a bucket.
Each rule contains one action and a set of conditions. The action is triggered only when all conditions are met.
Actions can be tracked with logs and with Pub/Sub notifications for Cloud Storage, enabled at bucket level.
Actions: Delete, SetStorageClass.
Conditions: Age, CreatedBefore, IsLive, MatchesStorageClass, NumberOfNewerVersions (limits the number of versions stored when Object Versioning is enabled).
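A lifecycle configuration is supplied to gsutil as a JSON file. The sketch below writes an illustrative file with two rules and applies it with `gsutil lifecycle set`. In the JSON the conditions use camelCase keys (`age`, `isLive`, `matchesStorageClass`, `numNewerVersions`). The thresholds, file name, bucket name and function name are all assumptions made for the example, and the gsutil command is only printed unless GSUTIL=gsutil is set.

```shell
# Dry run by default: the gsutil command is printed instead of executed.
# Set GSUTIL=gsutil to run it ($GSUTIL is intentionally unquoted).
GSUTIL="${GSUTIL:-echo gsutil}"
BUCKET="${BUCKET:-example-bucket}"   # hypothetical bucket name

# Rule 1: move Standard objects to Nearline once they are 30 days old.
# Rule 2: delete archived versions that have at least 3 newer versions.
write_lifecycle_config() {
  cat > "$1" <<'EOF'
{
  "rule": [
    {
      "action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
      "condition": {"age": 30, "matchesStorageClass": ["STANDARD"]}
    },
    {
      "action": {"type": "Delete"},
      "condition": {"isLive": false, "numNewerVersions": 3}
    }
  ]
}
EOF
}

write_lifecycle_config lifecycle.json
$GSUTIL lifecycle set lifecycle.json "gs://${BUCKET}"
```

Both conditions of a rule must hold before its action runs, so the first rule leaves objects that are already in Nearline or colder classes untouched.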
Retention Policy
A retention policy can be set on a bucket so that objects cannot be deleted or overwritten before the retention period has elapsed.
It applies retroactively to all objects in the bucket.
A retention policy can be locked so that it is permanently set on the bucket: it can no longer be removed or shortened (only increases are allowed), and the bucket cannot be deleted unless every object in it has met the retention period. This is used to comply with regulations.
The user can also place holds on individual objects to prevent them from being deleted or overwritten.
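Retention policies and holds can be managed with the `gsutil retention` command group. The sketch below sets a 30-day policy, reads it back and places a temporary hold on one object. The bucket, object and function names are hypothetical. Locking is irreversible, so that command is left commented out, and everything is printed rather than executed unless GSUTIL=gsutil is set.

```shell
# Dry run by default: commands are printed instead of executed.
# Set GSUTIL=gsutil to run them ($GSUTIL is intentionally unquoted).
GSUTIL="${GSUTIL:-echo gsutil}"
BUCKET="${BUCKET:-example-bucket}"   # hypothetical bucket name

retention_walkthrough() {
  # Require every object to be kept for at least 30 days.
  $GSUTIL retention set 30d "gs://${BUCKET}"
  # Show the current policy.
  $GSUTIL retention get "gs://${BUCKET}"
  # Place, then release, a temporary hold on a single object.
  $GSUTIL retention temp set "gs://${BUCKET}/contract.pdf"
  $GSUTIL retention temp release "gs://${BUCKET}/contract.pdf"
  # Locking cannot be undone, so it is deliberately not run here.
  # $GSUTIL retention lock "gs://${BUCKET}"
}

retention_walkthrough
```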
Static Website Hosting
Buckets can be used to host static websites, or to host static assets for dynamic websites.
- Static websites: the bucket name must match the domain name, and a CNAME record must point that domain to c.storage.googleapis.com.
- Static assets for dynamic websites: the URL is https://storage.googleapis.com/<bucket-name>/<object-name>.
The objects inside the bucket are the webpages of the site. The user can create a load balancer pointing to the bucket, and point the domain to the load balancer with an A record.
The user can make either all files in the bucket public or only some of them. To make all of them public, grant the "Storage Object Viewer" role on the bucket to "allUsers".
The user can set properties for the default page and the error page, i.e. MainPageSuffix and NotFoundPage. To do so, select the "Edit website configuration" option on the bucket.
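The same website settings can be applied from the command line. The sketch below sets the main page and the not-found page with `gsutil web set` and makes the objects publicly readable with `gsutil iam ch`. The domain-style bucket name, the page names and the function name are assumptions for the example. Commands are printed rather than executed unless GSUTIL=gsutil is set.

```shell
# Dry run by default: commands are printed instead of executed.
# Set GSUTIL=gsutil to run them ($GSUTIL is intentionally unquoted).
GSUTIL="${GSUTIL:-echo gsutil}"
BUCKET="${BUCKET:-www.example.com}"   # hypothetical bucket named after the domain

configure_website() {
  # MainPageSuffix = index.html, NotFoundPage = 404.html.
  $GSUTIL web set -m index.html -e 404.html "gs://${BUCKET}"
  # Grant the Storage Object Viewer role to allUsers (public read access).
  $GSUTIL iam ch allUsers:objectViewer "gs://${BUCKET}"
  # Inspect the resulting website configuration.
  $GSUTIL web get "gs://${BUCKET}"
}

configure_website
```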
Authorization
Identity and Access Management (IAM)
Permissions can be set at project level and at bucket level; object-level permissions are set with Access Control Lists (ACLs).
Basic roles: owner, editor and viewer.
ACLs
- Signed URLs: grant temporary access to anyone holding the URL, based on a cryptographic signature with an expiration time.
- Signed Policy Documents: control what can be uploaded to a bucket.
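A signed URL can be produced with `gsutil signurl`, given a service account private key file. The sketch below requests a URL valid for 10 minutes. The key file path, bucket, object and function names are hypothetical, and the command is only printed unless GSUTIL=gsutil is set.

```shell
# Dry run by default: the command is printed instead of executed.
# Set GSUTIL=gsutil to run it ($GSUTIL is intentionally unquoted).
GSUTIL="${GSUTIL:-echo gsutil}"
BUCKET="${BUCKET:-example-bucket}"                 # hypothetical bucket name
KEY_FILE="${KEY_FILE:-service-account-key.json}"   # hypothetical key file path

sign_download_url() {
  # Create a URL that allows downloading one object for 10 minutes.
  $GSUTIL signurl -d 10m "$KEY_FILE" "gs://${BUCKET}/$1"
}

sign_download_url photo.jpg
```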
Cloud Storage Notification
Object Change Notification
They can be used to notify an application via HTTPS when an object is added, updated or deleted in a bucket.
E.g. when a new picture is added to a bucket, an application could be notified to create a thumbnail.
This creates a notification channel that sends notification events to the given application URL for the given bucket.
They are not recommended, since they are slower and more expensive than the other notification mechanisms.
Cloud Pub/Sub notifications
After a notification configuration that specifies the topic, the trigger events and the notification details is added to a bucket, Google Cloud Storage sends a message to a Google Cloud Pub/Sub topic with information about changes to objects in that bucket.
Notifications can take up to 30 seconds to start being sent, and are delivered at least once.
Event types are finalize, metadata update, delete and archive (OBJECT_FINALIZE, OBJECT_METADATA_UPDATE, OBJECT_DELETE and OBJECT_ARCHIVE).
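A notification configuration can be restricted to specific event types with the `-e` flag of `gsutil notification create`. The sketch below subscribes only to the finalize event (OBJECT_FINALIZE) and then lists the configurations on the bucket. The topic, bucket and function names are hypothetical, and the commands are printed rather than executed unless GSUTIL=gsutil is set.

```shell
# Dry run by default: commands are printed instead of executed.
# Set GSUTIL=gsutil to run them ($GSUTIL is intentionally unquoted).
GSUTIL="${GSUTIL:-echo gsutil}"
BUCKET="${BUCKET:-example-bucket}"   # hypothetical bucket name
TOPIC="${TOPIC:-example-topic}"      # hypothetical Pub/Sub topic name

watch_new_objects() {
  # Publish a JSON message only when an object is created or overwritten.
  $GSUTIL notification create -t "$TOPIC" -f json -e OBJECT_FINALIZE "gs://${BUCKET}"
  # List the notification configurations attached to the bucket.
  $GSUTIL notification list "gs://${BUCKET}"
}

watch_new_objects
```

Because delivery is at least once, a subscriber should tolerate receiving the same event more than one time.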
Cloud Functions notifications
The bucket must reside in the same project as the invoked function.
The function is invoked through a Pub/Sub notification.
Use Cases
- Storing and streaming multimedia
- Storage for custom data analytics pipelines
- Archive, backup, and disaster recovery
Command Line Interface: gsutil
- List buckets
gsutil ls
- Create bucket
gsutil mb -p <project-name> -c <storage-class> -l <bucket-location> gs://<bucket-name>
- Deleting bucket and all objects recursively
gsutil rm -r gs://<bucket-name>
- Check bucket size
gsutil du -s gs://<bucket-name>
- Display bucket's location and default storage class
gsutil ls -L -b gs://<bucket-name>
- Changing default storage class of a bucket
gsutil defstorageclass set <storage-class> gs://<bucket-name>
- Listing objects
gsutil ls -r gs://<bucket-name>
- Check lifecycle configuration on a bucket
gsutil lifecycle get gs://<bucket-name>
Copying objects from/to bucket
- Upload objects
gsutil cp <local-object-location> gs://<bucket-name>
- Downloading objects
gsutil cp gs://<bucket-name>/<object-name> <object-destination>
- Copy objects from bucket to bucket
gsutil cp -r gs://<source-bucket-name>/* gs://<destination-bucket-name>
Deleting objects
- Deleting objects
gsutil rm gs://<bucket-name>/<object-name>
- Deleting all objects
gsutil rm -r gs://<bucket-name>/**
Object Metadata
- Display object metadata
gsutil ls -L gs://<bucket-name>/<object-name>
- Updating object metadata
gsutil setmeta -h "<metadata-key>:<metadata-value>" gs://<bucket-name>/<object-name>
Adding notifications
- Adding an object change notification. Create a notification channel that sends notification events to the given application URL for the given bucket
gsutil notification watchbucket [-i <channel-id>] [-t <client-token>] <application-url> gs://<bucket-name>
- Adding a pub/sub notification
gsutil notification create -t <topic-name> -f json gs://<bucket-name>