| Data Storage Type | Row-oriented | Columnar | Columnar | Distributed file system |
| Designed For | Streaming, serialization, row-based processing | Analytical queries, big data processing | Analytical queries, big data processing (Hive-centric) | Storing large files reliably across many distributed nodes |
| Compression | Yes (supports several codecs) | Yes (Snappy, Gzip, LZO, etc.) | Yes (Zlib, Snappy, LZO, etc.) | No format-level compression; relies on client- or application-level compression |
| Schema Evolution | Yes, flexible schema evolution | Yes, supports adding/removing columns | Yes, supports schema evolution | No schema enforcement; handles files and blocks |
| Splittable for Parallel Processing | No (not ideal) | Yes | Yes | Yes – file blocks are distributed and processed in parallel |
| Typical Use Cases | Data serialization, message passing, streaming | Data warehousing, analytics, ETL, ML workflows | Hive data warehousing, analytics, ETL | Distributed data storage for Hadoop and other large-scale applications |
| Integration | Kafka, Hadoop, various streaming systems | Spark, Hive, Presto, Impala | Hive, Spark, Hadoop ecosystem | Hadoop, MapReduce, Bigtable (pre-Colossus), data-intensive apps |
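
The "Data Storage Type" row is the distinction that drives most of the other rows, so a small illustration may help. The following is a minimal sketch in plain Python, not the on-disk layout of any real format; the record fields (`user_id`, `country`, `amount`) and the helper names are hypothetical. It shows the same records laid out row by row and column by column, and why an aggregate over a single field only needs one column in the columnar case.

```python
# Toy records; the field names are hypothetical and only for illustration.
records = [
    {"user_id": 1, "country": "DE", "amount": 12.5},
    {"user_id": 2, "country": "US", "amount": 7.0},
    {"user_id": 3, "country": "DE", "amount": 3.25},
]


def to_row_layout(rows):
    """Row-oriented: all fields of one record are kept together."""
    return [tuple(r.values()) for r in rows]


def to_columnar_layout(rows):
    """Columnar: all values of one field are kept together."""
    return {name: [r[name] for r in rows] for name in rows[0]}


row_layout = to_row_layout(records)
col_layout = to_columnar_layout(records)

# Analytical query: total of the "amount" field.
# Row layout: every full record is visited to pick out one field.
total_from_rows = sum(row[2] for row in row_layout)
# Columnar layout: only the "amount" column is read.
total_from_cols = sum(col_layout["amount"])

print(row_layout[0])          # one complete record, convenient for streaming
print(col_layout["country"])  # one complete column, convenient for analytics
print(total_from_rows, total_from_cols)
```

The row layout hands back a whole record at a time, which suits serialization and message passing; the columnar layout keeps similar values adjacent, which is also why column-oriented formats tend to compress well.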