Google Cloud Storage

Overview

Cloud Storage is an object storage service designed to handle unstructured data. It stores and retrieves binary objects without analyzing their contents. Although it is not a file system, its tools let you copy files in and out much as you would with one.

Use cases include serving web content, storing archival or backup data, and distributing large files globally.

What is a Bucket?

Buckets are containers for objects. Each bucket name must be unique across the global Cloud Storage namespace. Because bucket names are publicly visible, avoid putting sensitive data in them.
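
Because a name has to satisfy Cloud Storage's naming rules before uniqueness even matters, it can help to check a candidate name locally. The sketch below covers only a subset of the documented rules (length for names without dots, allowed characters, start and end characters, and the reserved "goog" and "google" strings). The function name is_valid_bucket_name is a hypothetical helper, not part of gsutil, and passing this check does not mean the name is available.

```shell
#!/usr/bin/env bash
# Hypothetical helper: checks a SUBSET of the bucket naming rules.
# It cannot tell whether the name is already taken in the global namespace.
is_valid_bucket_name() {
  local LC_ALL=C   # make [a-z] match ASCII lowercase only
  local name="$1"
  local pattern='^[a-z0-9][a-z0-9._-]*[a-z0-9]$'

  # 3 to 63 characters (dotted names have additional rules not checked here).
  (( ${#name} >= 3 && ${#name} <= 63 )) || return 1
  # Lowercase letters, digits, dashes, underscores, dots;
  # must start and end with a letter or digit.
  [[ "$name" =~ $pattern ]] || return 1
  # Must not begin with "goog" or contain "google".
  [[ "$name" != goog* && "$name" != *google* ]] || return 1
  return 0
}

for candidate in declass My-Bucket ab; do
  if is_valid_bucket_name "$candidate"; then
    echo "$candidate: looks valid"
  else
    echo "$candidate: rejected"
  fi
done
```

The check is deliberately conservative: it is meant to catch obvious mistakes before running gsutil mb, not to replace the service's own validation.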

Bucket Properties

Example bucket name: declass

Access Control

Access is managed via IAM roles and Access Control Lists (ACLs).

Encryption Options

Encryption vs. Data Locking

Encryption protects data confidentiality, while locking ensures data immutability. Objects covered by a retention policy cannot be overwritten or deleted until the retention period has elapsed.
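
The retention rule can be pictured as a simple time comparison. The following is a minimal model, assuming the retention period is counted from the object's creation time. The function name can_modify_or_delete and the timestamps are illustrative assumptions, and the sketch says nothing about how the service enforces the rule internally.

```shell
#!/usr/bin/env bash
# Hypothetical model of a retention check (not the service's implementation).
# Arguments: creation time, retention period, current time, all in seconds.
can_modify_or_delete() {
  local created="$1" retention="$2" now="$3"
  # The object stays protected until created + retention has passed.
  (( now >= created + retention ))
}

created=1700000000                 # illustrative creation time (epoch seconds)
retention=$(( 30 * 24 * 3600 ))    # illustrative 30-day retention period

if can_modify_or_delete "$created" "$retention" "$(( created + 86400 ))"; then
  echo "day 1: change allowed"
else
  echo "day 1: change blocked"
fi
```

Note that encryption plays no part in this check: an encrypted object can still be deleted, and a retained object can still be read by anyone with access.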

What is an Object?

Objects are stored with metadata and are automatically replicated for durability. In multi-region buckets, objects are spread across multiple regions; in single-region buckets, they are replicated across zones.

Example object path: de/modules/O2/script.sh
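
An object path like the one above is a single name string, and the slashes are simply characters in that name. As a small illustration, the sketch below splits a gs:// URL into its bucket and object parts with shell parameter expansion. Pairing this object with the example bucket declass is an assumption made for the demonstration only.

```shell
#!/usr/bin/env bash
# Split a gs:// URL into bucket name and object name.
# The bucket/object pairing here is illustrative.
url="gs://declass/de/modules/O2/script.sh"

path="${url#gs://}"     # strip the scheme
bucket="${path%%/*}"    # everything before the first slash
object="${path#*/}"     # everything after it, kept as one flat string

echo "bucket: $bucket"
echo "object: $object"
```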

Retention Policies & Locks

Creating & Managing Buckets

Create a Bucket

gsutil mb -p $DEVSHELL_PROJECT_ID \
-c regional \
-l us-central1 \
gs://$DEVSHELL_PROJECT_ID-vcm/

Copy Files to Your Bucket

gsutil -m cp -r gs://cloud-training/automl-lab-clouds/* gs://$DEVSHELL_PROJECT_ID-vcm/

Load Data into BigQuery

gsutil -m cp ...

Listing Bucket Contents

List Top-Level Folders and Objects

gsutil ls gs://$DEVSHELL_PROJECT_ID-vcm/

List All Files in All Folders

gsutil ls gs://$DEVSHELL_PROJECT_ID-vcm/**

Cloud Storage vs. HDFS

Cloud Storage is an object store with a flat namespace, while HDFS is a true distributed file system with real directories. Cloud Storage only simulates a directory structure by treating the slashes in object names as separators.
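
To make the simulation concrete, the sketch below derives a one-level "directory" listing from a flat list of object names, using a prefix and the slash as delimiter. It is a local model only: list_level is a hypothetical function, and every object name except de/modules/O2/script.sh is invented for the example.

```shell
#!/usr/bin/env bash
# Hypothetical model: build a one-level listing from flat object names.
# Reads object names on stdin; $1 is the prefix (empty, or ending in "/").
list_level() {
  local prefix="$1" name rest
  while IFS= read -r name; do
    [[ "$name" == "$prefix"* ]] || continue   # skip names outside the prefix
    rest="${name#"$prefix"}"
    if [[ "$rest" == */* ]]; then
      echo "${prefix}${rest%%/*}/"            # collapse deeper names into a "folder"
    else
      echo "$name"                            # an object directly at this level
    fi
  done | LC_ALL=C sort -u
}

# Illustrative object names; no directories exist, only these strings.
objects='de/modules/O2/script.sh
de/modules/O2/data.csv
de/readme.txt
top.txt'

printf '%s\n' "$objects" | list_level "de/"
```

In this model a "folder" such as de/modules/ appears only because some object name happens to start with it, which is why an empty folder has no natural representation in a flat namespace.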

gsutil mv gs://foo/bar gs://foo/bar2 simulates directory renaming by copying each object to its new name and then deleting the original, so the operation is not atomic.
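
The cost of such a rename grows with the number of objects under the prefix. The sketch below prints the per-object operations that a prefix rename implies. It only prints a plan and never contacts Cloud Storage. The function name plan_rename and the object names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the per-object work behind a prefix "rename".
# Reads object names on stdin; $1 = bucket, $2 = old prefix, $3 = new prefix.
plan_rename() {
  local bucket="$1" old="$2" new="$3" name
  while IFS= read -r name; do
    [[ "$name" == "$old"/* ]] || continue     # only objects under the old prefix
    echo "copy   gs://$bucket/$name -> gs://$bucket/$new/${name#"$old"/}"
    echo "delete gs://$bucket/$name"
  done
}

# Illustrative object names in a bucket called foo.
printf '%s\n' 'bar/a.txt' 'bar/sub/b.txt' 'other/c.txt' | plan_rename foo bar bar2
```

Each matching object produces one copy and one delete, whereas a directory rename in HDFS is a single metadata operation.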