Are you ready to enter the world of continuous data transmission with ZebraStream? Hopefully, this tutorial will spark some new thoughts and ideas. It walks you through sending and receiving data such as sensor readings, GPS positions, or log lines using the ZebraStream Data API. This mode is useful whenever a remote system must continuously process data generated by frequent events, and it is commonly used in real-time monitoring or tracking applications to avoid issuing a separate request per event. In this tutorial, we'll use curl to demonstrate ZebraStream's capabilities, showcasing its versatility and ease of integration with various frameworks. All of the setup happens on the client side, because ZebraStream already treats all data at the relay as a stream, whether it is finite or infinite.

Example Text Producer

An HTTP PUT upload normally requires the data size to be specified in advance, but HTTP/1.1 streaming mode, also known as chunked transfer encoding, can be used instead. The switch happens entirely on the client side. Chunked encoding is a standard feature of the HTTP protocol, so it is supported consistently by most HTTP libraries. In many frameworks, switching from file to streaming mode simply means replacing a fixed-size input object, such as a string or file, with a stream-like object, such as an iterator or pipe. In this example, we pass nanosecond timestamps from a loop to curl's standard input. The data is then automatically partitioned into chunks and transmitted to the consumer within a single PUT request. Note that the loop in this example must be terminated manually to close the stream; in a real-world setup you would typically add a condition that closes the stream and stops the transfer (a sketch of this follows the example below).

while true; do
date '+%Y-%m-%d %H:%M:%S.%N'
done |
curl --request PUT --upload-file . --no-buffer --http1.1 --expect100-timeout 1800 -H 'Transfer-Encoding: chunked' -H 'Authorization: Bearer Chohph3meephith7caecoohi' 'https://data.zebrastream.io/v0/9b09a7ce-024d-4836-ab95-90b776dfd439/mystream'

The only differences from the basic tutorial are

  • data comes from a process (the while loop) via standard input (special file name .), and is endless
  • we explicitly set chunked transfer encoding using the header (which curl would otherwise do automatically when the upload size is unknown)

That's it!
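
As mentioned above, a real-world producer usually needs a condition that ends the loop and thereby closes the stream. Here is a minimal sketch, assuming a hypothetical sentinel file named stop-stream as the stop condition and a one-second pause purely for pacing; token and stream address are the same placeholders as above:

# Emit one timestamp per second until the sentinel file appears,
# then let the loop end, which closes the stream.
while [ ! -e stop-stream ]; do
  date '+%Y-%m-%d %H:%M:%S.%N'
  sleep 1   # pacing only; not required
done |
curl --request PUT --upload-file . --no-buffer --http1.1 --expect100-timeout 1800 -H 'Transfer-Encoding: chunked' -H 'Authorization: Bearer Chohph3meephith7caecoohi' 'https://data.zebrastream.io/v0/9b09a7ce-024d-4836-ab95-90b776dfd439/mystream'

Creating the file from another shell (touch stop-stream) ends the upload cleanly; any other condition, such as a maximum line count or a time limit, works the same way.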

Example Text Consumer

When processing streams, the consumer sees no difference from a regular file download, except that the data arrives in chunked encoding, which should be handled transparently. Keep in mind that the download continues until the producer closes the stream or the consumer stops reading it. Because streamed data can grow without bound, it is best to either offload it to disk or process it on the fly. In this example, curl connects to the stream address, retrieves the data, and pipes it to pv, which consumes it and displays live throughput statistics.

curl --silent --request GET --no-buffer --http1.1 --header 'Authorization: Bearer kie0EesheiV3Aewoh4utiYah' 'https://data.zebrastream.io/v0/9b09a7ce-024d-4836-ab95-90b776dfd439/mystream' |
pv -ebat --line-mode > /dev/null

The only difference from the basic tutorial is

  • instead of writing the data to a file, it is passed on to a process

That's it!
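
If you want to do more than measure throughput, the same download can feed any line-oriented processing. A minimal sketch that writes each incoming line to disk (the file name readings.log is just an example) while leaving room for direct per-line processing, using the same placeholder token and stream address:

curl --silent --request GET --no-buffer --http1.1 --header 'Authorization: Bearer kie0EesheiV3Aewoh4utiYah' 'https://data.zebrastream.io/v0/9b09a7ce-024d-4836-ab95-90b776dfd439/mystream' |
while read -r line; do
  echo "$line" >> readings.log   # offload to disk ...
  # ... or process each line directly here instead
done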

When Should I Use Streaming Mode?

Streams are especially useful when dealing with large amounts of data that cannot be held in memory at once, or when real-time processing or continuous data transfer is needed. Often, there is no clear distinction between flows that use files or batches and those that stream. For example, message brokers provide consumers with a stream of messages, which can also be viewed as a series of batches, depending on the data modeling. For extremely infrequent events, it is best to close the current stream and open a new one for each event. Even genuinely continuous streams may have to be closed for technical reasons once they reach a certain size or duration. Ultimately, the most appropriate mode depends on a variety of factors, including the specific application requirements and data structure.
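
For the infrequent-event case, each event can simply become its own short transfer: the producer opens the stream, writes one record, and closes it again. A minimal sketch, assuming a hypothetical command produce_event that blocks until the next event and writes it to standard output; token and stream address are the same placeholders used in the producer example:

# One PUT per event: each iteration opens, writes, and closes a stream.
# The upload size is known here, so chunked encoding is not needed.
while produce_event > event.txt; do
  curl --request PUT --upload-file event.txt --http1.1 -H 'Authorization: Bearer Chohph3meephith7caecoohi' 'https://data.zebrastream.io/v0/9b09a7ce-024d-4836-ab95-90b776dfd439/mystream'
done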

Why Should I Select ZebraStream for Streaming?

The most common answers are simplicity, flexibility, and loose coupling. It is common practice to begin with file-based flows and only realize, once limitations arise, that streams would have been the better approach; at that point, switching may require replacing the entire technology stack. In contrast, ZebraStream blurs the line between file-based and streaming approaches, allowing for smooth transitions between the two. Furthermore, the producer and consumer can each handle the data as batches or as a stream, independently of one another, which fits ZebraStream's goal of reducing dependencies between the data-exchanging parties and systems. For example, a producer on a large server may deal with complete files, whereas a consumer in a web app or on an embedded device may process the data as a stream. This flexibility comes on top of ZebraStream's other features, such as the ability to easily switch from a push-based to a pull-based flow.
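
To make the file-versus-stream asymmetry concrete, here is a minimal sketch in which the producer uploads a complete, finite file in one go while the consumer handles the same transfer as a line-by-line stream. The file name sensor-data.csv is an arbitrary example; credentials and stream address are the placeholders used above:

# Producer side: a plain finite upload of an existing file.
curl --request PUT --upload-file sensor-data.csv --http1.1 -H 'Authorization: Bearer Chohph3meephith7caecoohi' 'https://data.zebrastream.io/v0/9b09a7ce-024d-4836-ab95-90b776dfd439/mystream'

# Consumer side: the same data, consumed as a stream of lines.
curl --silent --request GET --no-buffer --http1.1 --header 'Authorization: Bearer kie0EesheiV3Aewoh4utiYah' 'https://data.zebrastream.io/v0/9b09a7ce-024d-4836-ab95-90b776dfd439/mystream' |
while read -r line; do
  echo "processing: $line"
done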