
Cloud Storage

Cloud Storage is a very handy service when it comes to working with data, especially unstructured data: not transactional data, and not data for analytics. It is an object store, so it just stores and retrieves binary objects without regard to what data is contained in them. It also provides file system compatibility and can make objects look and work as if they were files, so you can copy files in and out of it.

Cloud Storage allows world-wide storage and retrieval of any amount of data at any time. You can use Cloud Storage for a range of scenarios including serving website content, storing data for archival and disaster recovery, or distributing large data objects to users via direct download.
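
For example, thanks to the file system compatibility you can move data with ordinary copy semantics. A minimal sketch (the bucket and file names are placeholders):

gsutil cp report.csv gs://my-bucket/reports/report.csv    # upload a local file
gsutil cp gs://my-bucket/reports/report.csv ./report.csv  # download it again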

What is a bucket?

Buckets are containers for data: objects exist inside of buckets and not apart from them.
All buckets live in a single global namespace, so every bucket name must be globally unique. Because bucket names are global, it is recommended not to put sensitive information in them.
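
A common way to get a globally unique name is to prefix it with your project ID. A sketch (the -my-data suffix is arbitrary):

gsutil mb gs://$DEVSHELL_PROJECT_ID-my-data/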

Bucket properties

Example: in gs://declass/de/modules/O2/script.sh, the bucket name is declass.

How do I secure the data in my bucket?

Controlling access with Cloud IAM and access control lists (ACLs).
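
A quick sketch of both mechanisms (member, bucket, and object names are placeholders):

gsutil iam ch user:alice@example.com:objectViewer gs://my-bucket          # IAM: bucket-level role
gsutil acl ch -u bob@example.com:R gs://my-bucket/reports/report.csv      # ACL: per-object read access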

What is GMEK? Google-managed encryption keys

All data is encrypted at rest as well as in transit, and this cannot be switched off.

Two levels of encryption are used. First, the data is encrypted using a data encryption key (DEK), and then the DEK itself is encrypted using a key encryption key, or KEK. The KEKs are automatically rotated on a schedule, and the current KEK is stored in Cloud KMS, the Key Management Service. You don't have to do anything; this happens automatically.
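
A conceptual sketch of this envelope-encryption pattern using openssl (purely illustrative, not what Cloud Storage actually runs; the openssl passphrase mechanism stands in for direct key use):

openssl rand -out kek.bin 32                                                      # key encryption key
openssl rand -out dek.bin 32                                                      # data encryption key
openssl enc -aes-256-cbc -pbkdf2 -in data.txt -out data.enc -pass file:dek.bin    # encrypt the data under the DEK
openssl enc -aes-256-cbc -pbkdf2 -in dek.bin -out dek.enc -pass file:kek.bin      # wrap the DEK under the KEK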

What is CMEK? Customer-managed encryption keys

I control the creation and existence of the KEK in Cloud KMS.
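
With CMEK you can, for example, set your own Cloud KMS key as a bucket's default. A sketch (the key ring and key names are hypothetical and must already exist in Cloud KMS):

gsutil kms encryption \
  -k projects/$DEVSHELL_PROJECT_ID/locations/us-central1/keyRings/my-ring/cryptoKeys/my-key \
  gs://$DEVSHELL_PROJECT_ID-vcm/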

What is CSEK? Customer-supplied encryption keys

I provide the KEK myself.
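
gsutil accepts a customer-supplied key through its boto configuration. A sketch (the key is generated on the spot purely for illustration; in practice you must keep it safe, since Google cannot recover the data without it):

CSEK=$(openssl rand -base64 32)    # base64-encoded AES-256 key
gsutil -o "GSUtil:encryption_key=$CSEK" cp data.txt gs://my-bucket/data.txt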

Encryption vs. Data Locking

Data locking is different from encryption. Where encryption prevents somebody from understanding the data, locking prevents them from modifying the data.
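
For example, a temporary hold locks a single object against deletion or overwriting until the hold is released. A sketch (bucket and object names are placeholders):

gsutil retention temp set gs://my-bucket/reports/report.csv        # place a temporary hold
gsutil retention temp release gs://my-bucket/reports/report.csv    # release it again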

What is an object?

When an object is stored, Cloud Storage replicates that object. It then monitors the replicas, and if one of them is lost or corrupted, it replaces it automatically with a fresh copy. This is how Cloud Storage gets its many nines of durability.
In a multi-region bucket the objects are replicated across regions; in a single-region bucket, as you might expect, the objects are replicated across zones within that one region.
Objects are stored with metadata, which is information about that object.
In the example gs://declass/de/modules/O2/script.sh, the object name is de/modules/O2/script.sh.
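
You can inspect and edit an object's metadata with gsutil. A sketch (bucket and object names are placeholders):

gsutil stat gs://my-bucket/reports/report.csv                                  # show size, storage class, hashes, etc.
gsutil setmeta -h "Content-Type:text/csv" gs://my-bucket/reports/report.csv    # change a metadata header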

Create a Cloud Storage bucket

Use the environment variable $DEVSHELL_PROJECT_ID, which Cloud Shell sets to your current project ID, and simply append -vcm to the end to form the bucket name.

gsutil mb -p $DEVSHELL_PROJECT_ID \
-c regional \
-l us-central1 \
gs://$DEVSHELL_PROJECT_ID-vcm/

Copy training images into the Cloud Storage bucket

The training images are publicly available in a Cloud Storage bucket. Use the gsutil command-line utility for Cloud Storage to copy the training images into your bucket:

gsutil -m cp -r gs://cloud-training/automl-lab-clouds/* gs://$DEVSHELL_PROJECT_ID-vcm/

Load data from a Cloud Storage bucket into a BigQuery data warehouse

Note that gsutil -m cp ... only copies objects between buckets or between a bucket and your machine; loading bucket data into BigQuery is done with the bq command-line tool instead.
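
A minimal sketch of such a load, assuming a CSV file with a header row (the dataset, table, and file names are hypothetical):

bq load \
  --source_format=CSV \
  --skip_leading_rows=1 \
  --autodetect \
  mydataset.mytable \
  gs://$DEVSHELL_PROJECT_ID-vcm/data.csv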

List folders in Cloud Storage bucket

List all folders in the bucket:

gsutil ls gs://$DEVSHELL_PROJECT_ID-vcm/

List all files in all folders of a Cloud Storage bucket

Use * to list all files in all folders:

gsutil ls gs://$DEVSHELL_PROJECT_ID-vcm/*

Retention policies and retention policy locks

You can add a retention policy to a bucket to specify a retention period: objects in the bucket cannot be deleted or overwritten until they reach the age given by the policy. Locking a retention policy makes it permanent; a locked policy can never be removed or shortened.
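
A sketch of setting and locking a retention policy (the bucket is a placeholder; note that locking is irreversible):

gsutil retention set 30d gs://my-bucket    # objects must be at least 30 days old before deletion
gsutil retention lock gs://my-bucket       # permanently lock the policy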

Directory rename in HDFS is not the same as in Cloud Storage

Cloud Storage is at its core an object store; it only simulates a file directory.
Because Cloud Storage has no concept of directories, a directory rename cannot be a single metadata operation as in HDFS; it is executed object by object. For example:

mv gs://foo/bar gs://foo/bar2

under the hood becomes:

list   gs://foo/bar
copy   gs://foo/bar/baz1, gs://foo/bar/baz2 to gs://foo/bar2/
delete gs://foo/bar/baz1, gs://foo/bar/baz2
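
The same applies to gsutil: moving a prefix is a per-object copy followed by a delete, so it is not atomic and takes time proportional to the number of objects. A sketch (bucket and prefixes are placeholders):

gsutil -m mv gs://my-bucket/logs gs://my-bucket/logs-archive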