Google Cloud Storage

From Luis Gallego Hurtado - Not Another IT guy

Google Cloud Storage is a scalable, fully-managed, highly reliable, and cost-efficient object/blob store.

It stores unstructured data as sequences of bytes.

It is well suited to images, videos, and other objects, blobs, and unstructured data in general.

Features

  • It offers a single unified API.
  • Optimized price/performance across 4 storage classes with Object Lifecycle Management.
  • Data can be accessed instantly from any storage class.
  • Durability of 99.999999999% (11 nines).
  • Availability SLA from 99% to 99.95%, depending on storage class and location type.
  • Scales to exabytes of data with millisecond response times.
  • Data encryption at rest.
  • Per-project rate limit on bucket create/delete operations (1 operation every 2 seconds) but no limit on object create/delete operations.
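
To put the durability figure in perspective, here is a back-of-the-envelope sketch. It assumes the 99.999999999% figure is an annual, per-object durability and that object losses are independent of each other; both are simplifying assumptions, and the function name is hypothetical.

```python
from decimal import Decimal


def expected_annual_losses(object_count, durability_percent="99.999999999"):
    """Expected number of objects lost per year, assuming independent losses."""
    # Decimal avoids the rounding noise of computing 1 - 0.99999999999 in floats.
    loss_probability = 1 - Decimal(durability_percent) / 100
    return Decimal(object_count) * loss_probability


# With one billion stored objects, the expectation is a small fraction of one object.
print(expected_annual_losses(1_000_000_000))
```

Under these assumptions, a billion stored objects would lose on average one object every hundred years.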

Structure

Buckets

Data is stored as objects in Buckets.

Unlike directories, buckets cannot be nested. Objects within a bucket live in a flat namespace.

When creating a bucket, the user specifies a name, a geographic location (including location type), and a default storage class.

The location type can be regional, dual-regional, or multi-regional (in order of increasing availability).

Bucket names must be unique within the entire Google Cloud Storage namespace. Names that contain dots must be valid domain names.

Objects

Objects are stored in buckets and can be up to 5 TB in size.

Objects have data and metadata.

There is no limit on the number of objects in a bucket.

Objects are immutable, but they can be overwritten. A single object can be overwritten at most once per second (exceeding this rate returns a 503 Service Unavailable error).
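
Because rapid overwrites of the same object can be rejected, clients usually retry with exponential backoff. The following is a minimal sketch of that pattern, not a Cloud Storage client: `ThrottledError`, `overwrite_with_backoff`, and the `write` callable are hypothetical names standing in for whatever your client raises and calls.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for a rate-limit error returned by the service."""


def overwrite_with_backoff(write, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call write() and retry with exponential backoff plus jitter when throttled."""
    for attempt in range(max_attempts):
        try:
            return write()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # Give up after the last attempt.
            # Wait base_delay * 2^attempt seconds, plus up to one second of jitter.
            sleep(base_delay * (2 ** attempt) + random.random())
```

The `sleep` parameter is injectable only so the behaviour can be exercised without real waiting.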

Storage Classes

The default storage class of a bucket can be modified, but the change only applies to new objects.

The user can also specify a storage class for each object when adding it to the bucket, and change it afterwards.

The available storage classes depend on the location of the bucket.

From most to least expensive per GB stored:

Standard

For frequently accessed data, such as website content, streaming, interactive workloads, or data supporting mobile and gaming applications.

No minimum storage duration.

Nearline

Low-cost, highly durable storage service.

For backups and infrequently accessed data (less than once a month).

30-day minimum storage duration, with a cost per data access.

Coldline

Very low-cost, highly durable storage service.

For disaster recovery and very infrequently accessed data (less than once a quarter).

90-day minimum storage duration, with a cost per data access and a higher per-operation cost.

Data is available within milliseconds.

Archive

Lowest cost, highly durable storage service.

Best for long-term digital preservation.

365-day minimum storage duration.
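
The minimum storage durations above matter when an object is deleted or replaced early. The sketch below assumes that an early deletion is billed as though the object had been stored for the full minimum duration; the table and function names are hypothetical and no actual prices are modelled.

```python
# Minimum storage duration in days for each class, as listed above.
MIN_STORAGE_DAYS = {"STANDARD": 0, "NEARLINE": 30, "COLDLINE": 90, "ARCHIVE": 365}


def billable_storage_days(storage_class: str, days_stored: int) -> int:
    """Days of storage billed, assuming early deletion is charged up to the minimum."""
    return max(days_stored, MIN_STORAGE_DAYS[storage_class])


# A Nearline object deleted after 10 days is still billed for 30 days.
print(billable_storage_days("NEARLINE", 10))
```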

Pricing

Depending on the storage class, pricing is based on the following components:

  • Data storage: data stored in buckets. Storage fee.
  • Network usage: data read from or moved between buckets.
  • Operations usage: actions performed on buckets and objects. Access fee.
  • Retrieval and early deletion fees: apply to the Nearline, Coldline and Archive storage classes.

Consistency

Strong consistency, for both data and metadata, applies to the following operations:

  • Read after write.
  • Read after metadata update.
  • Read after delete.
  • Bucket listing
  • Object listing
  • Granting access to resources.

Eventual Consistency Operations:

  • Revoking access from resources.

Availability

The Standard storage class offers the highest typical availability: 99.99% in a single region and >99.99% in dual-regions and multi-regions.

The other storage classes offer 99.9% in a single region and 99.95% in dual-regions and multi-regions.

Archive storage class is the only class with no availability SLA.

Object Versioning

When enabled, the user can list the archived versions of an object, restore the live version to an older state, or permanently delete an archived version.

Every archived version is given a generation number.

Versioning can add cost, but this can be managed with Object Lifecycle Management.

Two properties track an object's version: generation (the version of the object's data) and metageneration (the version of the object's metadata).

Object Lifecycle Management

Lifecycle rules apply to current and future objects in a bucket.

Each rule contains a set of conditions and one action. The action is triggered only when all of the rule's conditions are met.

Actions can be tracked with logs, and with Pub/Sub notifications for Cloud Storage, enabled at bucket level.

Actions: Delete, SetStorageClass

Conditions: Age, CreatedBefore, IsLive, MatchesStorageClass, NumberOfNewerVersions (limits the number of versions stored if object versioning is enabled).
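
As an illustration of how conditions and actions fit together, the sketch below builds a lifecycle configuration in the JSON form that `gsutil lifecycle set` accepts. The thresholds (30 days, 3 newer versions) and the target class are assumptions chosen for the example, not recommendations.

```python
import json

# Illustrative rules; the thresholds and the target storage class are assumptions.
lifecycle_config = {
    "rule": [
        {
            # Move Standard objects to Nearline once they are 30 days old.
            "action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
            "condition": {"age": 30, "matchesStorageClass": ["STANDARD"]},
        },
        {
            # Delete archived (non-live) versions that have 3 or more newer versions.
            "action": {"type": "Delete"},
            "condition": {"isLive": False, "numNewerVersions": 3},
        },
    ]
}

config_json = json.dumps(lifecycle_config, indent=2)
print(config_json)
```

Saved to a file, a configuration like this can be applied with gsutil lifecycle set <config-file> gs://<bucket-name>. Both conditions in a rule must hold before its action runs.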

Retention Policy

A retention policy can be set on a bucket so that objects cannot be deleted or overwritten before the retention period has elapsed.

It applies retroactively to objects already in the bucket.

A retention policy can be locked, which sets it permanently on the bucket: a regular administrator cannot remove or shorten it (only increases are allowed), and the bucket cannot be deleted unless all objects in it have met the retention period. Locking is used to comply with regulations.

The user can also place additional holds on objects to prevent them from being deleted or overwritten.
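
The interaction between a retention period and a hold can be sketched as a simple check. This is only a model of the rule described above, assuming the retention period is expressed in seconds and counted from the object's creation time; the function and parameter names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def earliest_deletion_time(created: datetime, retention_seconds: int) -> datetime:
    """Moment at which the retention period of an object has elapsed."""
    return created + timedelta(seconds=retention_seconds)


def can_delete(created: datetime, retention_seconds: int, now: datetime,
               on_hold: bool = False) -> bool:
    """An object is deletable only if it has no hold and its retention has elapsed."""
    if on_hold:
        return False
    return now >= earliest_deletion_time(created, retention_seconds)


created = datetime(2024, 1, 1, tzinfo=timezone.utc)
thirty_days = 30 * 24 * 3600
print(can_delete(created, thirty_days, datetime(2024, 1, 15, tzinfo=timezone.utc)))
```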

Static Website Hosting

Buckets can be used to host static websites, or to host static assets for dynamic websites.

  • Static websites: the name of the bucket must match the CNAME record.
  • Static assets for dynamic websites: the URL is storage.googleapis.com followed by the bucket name.

The objects inside the bucket are the webpages of the site. The user can create a load balancer pointing to the bucket, and point the domain to the load balancer with an A record.

The user can make all files in the bucket public, or only some of them. To make all of them public, grant the "Storage Object Viewer" role on the bucket to "allUsers".

The user can set properties for the default page and error pages, e.g. the properties MainPageSuffix and NotFoundPage. To do so, select the "Edit Website configuration" option on the bucket.

Authorization

Identity and Access Management (IAM)

Permissions can be set at project level, at bucket level and at object level.

Roles: owner, editor and viewer.

Based on Access Control Lists (ACLs).

ACLs

  • Signed URLs: give users temporary access through a URL that is signed with a cryptographic key and carries an expiration time.
  • Signed Policy Document: controls what can be uploaded to a bucket.
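
The idea behind a signed, time-limited URL can be illustrated with a toy HMAC scheme. This is explicitly not the signing process Cloud Storage uses; it is a simplified sketch of the general technique, and the secret, the query parameter names, and both function names are hypothetical.

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlparse

SECRET_KEY = b"demo-secret"  # Hypothetical key, for illustration only.


def _signature(path: str, expires: int) -> str:
    """HMAC-SHA256 over the path and the expiration timestamp."""
    message = f"{path}\n{expires}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def sign_url(path: str, ttl_seconds: int, now=None) -> str:
    """Return the path with an expiration time and a signature appended."""
    expires = int(time.time() if now is None else now) + ttl_seconds
    query = urlencode({"expires": expires, "signature": _signature(path, expires)})
    return f"{path}?{query}"


def verify_url(url: str, now=None) -> bool:
    """Accept the URL only if it has not expired and the signature matches."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires = int(params["expires"][0])
        signature = params["signature"][0]
    except (KeyError, ValueError):
        return False
    if (time.time() if now is None else now) > expires:
        return False
    return hmac.compare_digest(signature, _signature(parsed.path, expires))
```

Changing the path or the expiration time invalidates the signature, which is what lets the holder of the URL access one resource for a limited time without having credentials of their own.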

Cloud Storage Notification

Object Change Notification

Object change notifications can be used to notify an application via HTTPS when an object is added, updated, or deleted in a bucket.

E.g. when a new picture is added to a bucket, an application could be notified to create a thumbnail.

Setting one up creates a notification channel that sends notification events to the given application URL for the given bucket.

They are not recommended, since they are slower and more expensive than the other notification types.

Cloud Pub/Sub notifications

After a notification configuration is added to a bucket, specifying the topic, the trigger events and the notification details, Google Cloud Storage sends a message to a Google Cloud Pub/Sub topic with information about changes to objects in that bucket.

It can take up to 30 seconds for notifications to start being sent, and delivery is at least once.

Event types are finalize, metadata update, delete and archive.
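
Because delivery is at least once, a subscriber may see the same notification more than once and should be idempotent. The sketch below deduplicates finalize events using the message attributes `eventType`, `bucketId`, `objectId` and `objectGeneration`. The handler factory is hypothetical, and the in-memory set is an assumption made for brevity; a real subscriber would need a durable store.

```python
def make_finalize_handler(process):
    """Build a handler that runs process() once per finalized object generation."""
    seen = set()  # In-memory only; assumed sufficient for this illustration.

    def handle(attributes: dict) -> bool:
        # Ignore everything except newly finalized objects.
        if attributes.get("eventType") != "OBJECT_FINALIZE":
            return False
        key = (attributes["bucketId"], attributes["objectId"],
               attributes["objectGeneration"])
        if key in seen:
            return False  # Duplicate delivery of an event already handled.
        seen.add(key)
        process(attributes)
        return True

    return handle
```

A thumbnail generator, for example, could pass its resize routine as `process` and safely receive the same message twice.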

Cloud Functions notifications

The bucket must reside in the same project as the invoked function.

The function is invoked through a Pub/Sub notification.

Use Cases

  • Storing and streaming multimedia
  • Storage for custom data analytics pipelines
  • Archive, backup, and disaster recovery

Command Line Interface: gsutil

  • List buckets
gsutil ls
  • Create bucket
gsutil mb -p <project-name> -c <storage-class> -l <bucket-location> gs://<bucket-name>
  • Deleting bucket and all objects recursively
gsutil rm -r gs://<bucket-name>
  • Check bucket size
gsutil du -s gs://<bucket-name>
  • Display bucket's location and default storage class
gsutil ls -L -b gs://<bucket-name>
  • Changing default storage class of a bucket
gsutil defstorageclass set <storage-class> gs://<bucket-name>
  • Listing objects
gsutil ls -r gs://<bucket-name>
  • Check lifecycle configuration on a bucket
gsutil lifecycle get gs://<bucket-name>

Copying objects from/to bucket

  • Upload objects
gsutil cp <local-object-location> gs://<bucket-name>
  • Downloading objects
gsutil cp gs://<bucket-name>/<object-name> <object-destination>
  • Copy objects from bucket to bucket
gsutil cp -r gs://<source-bucket-name>/* gs://<destination-bucket-name>

Deleting objects

  • Deleting objects
gsutil rm gs://<bucket-name>/<object-name>
  • Deleting all objects
gsutil rm -r gs://<bucket-name>/**

Object Metadata

  • Display object metadata
gsutil ls -L gs://<bucket-name>/<object-name>
  • Updating object metadata
gsutil setmeta -h "<metadata-key>:<metadata-value>" gs://<bucket-name>/<object-name>

Adding notifications

  • Adding an object change notification. Create a notification channel that sends notification events to the given application URL for the given bucket
gsutil notification watchbucket -i <channel-id> -t <client-token> <application-url> gs://<bucket-name>
  • Adding a pub/sub notification
gsutil notification create -t <topic-name> -f json gs://<bucket-name>