| Data Type | Raw, unstructured, semi-structured, and structured | Structured and semi-structured |
| Schema | Schema-on-read | Schema-on-write |
| Purpose | Store large volumes of raw data for analytics, ML, and big data workloads | Query and report on curated data |
| Users | Data engineers, data scientists, ML developers | Business analysts, BI users |
| Performance | Depends on external query tools; slower for complex queries | High performance for structured queries |
| Data Quality | Accepts all data, including raw and uncleaned data | Requires clean, validated data that conforms to a schema |
| Cost | Lower storage cost (especially in the cloud) | Higher, due to compute and optimization layers |
| Examples (GCP) | Cloud Storage, Cloud Bigtable | BigQuery, Cloud SQL |
| Scalability | Scales easily for massive raw data volumes | Scales well for structured data, with some limitations |
| Best For | Machine learning, raw data storage, streaming | Business reporting, dashboards, fast queries |
| Data Ingestion | Any data, batch and real-time | Only transformed, cleaned data |
| Storage Format | Supports formats such as Parquet, Avro, JSON, and CSV | Optimized columnar or row formats for analysis |
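
The Schema and Data Quality rows can be illustrated with a small sketch. It is not tied to any GCP service: a plain Python list stands in for the lake, an in-memory SQLite table stands in for the warehouse, and the sample events and the function names (`lake_store`, `lake_total`, `warehouse_load`) are hypothetical, chosen only for illustration.

```python
import json
import sqlite3

# Hypothetical raw events, one JSON document per line, as they might land in a lake.
RAW_EVENTS = [
    '{"customer": "ana", "amount": "19.99", "country": "BR"}',
    '{"customer": "bo", "amount": 5}',
    '{"customer": "cy", "amount": "n/a", "extra": {"source": "mobile"}}',
]


def lake_store(lines):
    """Schema-on-read: keep every record as-is; nothing is validated on write."""
    return list(lines)


def lake_total(stored):
    """Apply a schema only when reading: parse, coerce, skip what does not fit."""
    total = 0.0
    for line in stored:
        record = json.loads(line)
        try:
            total += float(record["amount"])
        except (KeyError, TypeError, ValueError):
            continue  # the bad record stays in the lake; this reader ignores it
    return total


def warehouse_load(lines):
    """Schema-on-write: validate and coerce before insert; reject failing rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (customer TEXT NOT NULL, amount REAL NOT NULL)"
    )
    rejected = []
    for line in lines:
        record = json.loads(line)
        try:
            row = (str(record["customer"]), float(record["amount"]))
        except (KeyError, TypeError, ValueError):
            rejected.append(line)  # never reaches the curated table
            continue
        conn.execute("INSERT INTO sales (customer, amount) VALUES (?, ?)", row)
    conn.commit()
    return conn, rejected


if __name__ == "__main__":
    stored = lake_store(RAW_EVENTS)
    print(len(stored), "records stored in the lake")
    print("lake total computed at read time:", round(lake_total(stored), 2))

    conn, rejected = warehouse_load(RAW_EVENTS)
    count = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
    print(count, "rows loaded into the warehouse,", len(rejected), "rejected")
```

In this sketch the lake accepts all three events and defers the cost of handling the malformed one to every reader, while the warehouse pays that cost once at load time and afterwards serves only validated rows.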