Apache Beam is an open-source, portable programming model for data processing, and it can run in a highly distributed fashion. It's unified, meaning there is a single model: the same pipeline code can work for both batch and streaming data.
Unified - Use a single programming model for both batch and streaming use cases
Portable - Execute pipelines on multiple execution environments
Extensible - Write and share new SDKs, IO connectors, and transformation libraries
Inputs --> Data --> Transforms --> Data --> Outputs
Beam pipelines are written in Java, Python, or Go.
A PCollection can represent either streaming data or batch data.
There is no size limit on a PCollection, whether it is bounded or unbounded.
That's why it's called a PCollection, or parallel collection.
The more data there is, the more it's simply distributed in parallel across more workers.
For streaming data, the PCollection is simply without bounds.
It has no end. Each element inside a PCollection can be individually accessed and processed.
This is how distributed processing of the PCollection is implemented.
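The idea of processing each element independently can be illustrated without Beam at all. The sketch below is a plain-Python analogy, not Beam's implementation: it splits a list into small groups (Beam runners call comparable groups "bundles"), hands each group to a worker thread, and concatenates the results. The function names, the bundle size, and the worker count are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_bundles(elements, bundle_size):
    """Partition a list of elements into consecutive bundles."""
    return [
        elements[i:i + bundle_size]
        for i in range(0, len(elements), bundle_size)
    ]


def process_bundle(bundle, fn):
    """Apply fn to every element of one bundle, independently of the others."""
    return [fn(element) for element in bundle]


def parallel_map(elements, fn, bundle_size=2, workers=4):
    """Process bundles concurrently and concatenate the per-bundle results."""
    bundles = split_into_bundles(elements, bundle_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map keeps bundle order here; a distributed runner makes no
        # such ordering promise for the elements of a PCollection.
        results = pool.map(lambda bundle: process_bundle(bundle, fn), bundles)
        return [item for bundle in results for item in bundle]


if __name__ == "__main__":
    print(parallel_map([1, 2, 3, 4, 5], lambda x: x * x))
```

Because fn only ever sees one element at a time, adding more data just means more bundles, and more bundles can be spread over more workers. In a real deployment the workers are separate machines rather than threads in a single process.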