What is Apache ORC?

Apache ORC (Optimized Row Columnar) is a free and open-source, column-oriented data storage format from the Apache Hadoop ecosystem. It is designed to provide efficient storage and fast access for large datasets in distributed computing frameworks.
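
To make "column-oriented" concrete, the following is a minimal conceptual sketch in plain Python. It is not ORC's actual on-disk layout or API. The helper names (`to_columns`, `run_length_encode`) and the sample records are hypothetical and exist only for illustration. The sketch assumes every row has the same keys. It shows two ideas behind columnar formats in general: a query that needs one column can skip the others, and values of a single column often repeat, which suits encodings such as run-length encoding (one of several encodings ORC applies).

```python
from typing import Any


def to_columns(rows: list[dict[str, Any]]) -> dict[str, list[Any]]:
    """Pivot row-oriented records into one list of values per column.

    Assumes every row has the same keys (illustrative simplification).
    """
    columns: dict[str, list[Any]] = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def run_length_encode(values: list[Any]) -> list[tuple[Any, int]]:
    """Collapse consecutive repeated values into (value, count) pairs."""
    encoded: list[tuple[Any, int]] = []
    for value in values:
        if encoded and encoded[-1][0] == value:
            # Same value as the previous run: extend that run by one.
            encoded[-1] = (value, encoded[-1][1] + 1)
        else:
            # New value: start a new run of length one.
            encoded.append((value, 1))
    return encoded


# Hypothetical sample data, stored row by row.
rows = [
    {"country": "DE", "year": 2023, "sales": 120},
    {"country": "DE", "year": 2023, "sales": 95},
    {"country": "FR", "year": 2023, "sales": 80},
    {"country": "FR", "year": 2024, "sales": 110},
]

columns = to_columns(rows)

# An aggregate over one column touches only that column's values.
total_sales = sum(columns["sales"])

# Values within a single column tend to repeat, so they encode compactly.
year_runs = run_length_encode(columns["year"])
```

In this toy layout, summing `sales` never reads the `country` or `year` values, and the four `year` values shrink to two runs. Real ORC files add further structure on top of the same principle.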

Key Features

Use Cases

Apache ORC is widely used in big data processing environments, especially with Apache Hive and Apache Spark, for analytical queries on massive datasets. It is well-suited for:

Comparison with Other Formats

While ORC is one of the popular columnar data formats, other formats serve overlapping purposes, notably Apache Parquet (also columnar) and Apache Avro (row-oriented rather than columnar):