3. Push and Pull | parsimonIT ZebraStream Knowledge Base

This tutorial shows how to implement robust data flows across systems and organizations using triggers and notifications. In a pull-based workflow, the sender waits for the receiver to request data, for instance via a download link. When it receives a request, it delivers the data, for instance by querying a database. In a push-based workflow, the receiver waits for data to be ready. When the sender has data, for instance generated by a user interaction or a sensor, the waiting receiver is notified instantly and can process the data. We demonstrate waiting and notification with ZebraStream using curl as a command-line HTTP client.

Connect API

As we have previously shown, the Management API is for managing stream permissions and the Data API is for transferring data. The Connect API is at a higher level, and serves as an optional entry point to the Data API, implementing waiting and notification. This is useful for the majority of scenarios, in which sender and receiver are not coordinated precisely.

Some of the reasons we decided to keep this functionality outside the Data API are:

Separation of concerns: the waiting process needs only light resources and is often run on different systems, for instance on a shared or embedded system. The waiter can for instance provision or spin up cloud resources just for data processing.
Embrace web technology: HTTP connections must operate over various external infrastructure services that favor short-lived connections. A dedicated endpoint increases reliability and can evolve to support additional protocols, such as WebSocket.
It just makes things clearer and less complex, for us and for our users.

The Connect API is as simple as the Data API and looks quite similar. Its technical documentation can be found at https://connect.zebrastream.io/redoc.

Wait

A waiter is a single long-lasting web request. It uses the same access token that authorizes the data transfer. To initiate a transfer, a waiter can be established on the sender side (pull-based), on the receiver side (push-based), or on both sides (mixed mode). We recommend using the Connect API on both sides, because it is robust to short connection breaks on the waiting side. Once both peers are matched, the waiter request returns the URL of the stream. The client then uses the Data API to transfer the payload over this address, with the same credentials and within the time limits set by the Data API.

Sender Connect

curl --request GET --no-buffer --http1.1 --header 'Authorization: Bearer Chohph3meephith7caecoohi' 'https://connect.zebrastream.io/v0/9b09a7ce-024d-4836-ab95-90b776dfd439/mystream?mode=await-reader'

Receiver Connect

curl --request GET --no-buffer --http1.1 --header 'Authorization: Bearer kie0EesheiV3Aewoh4utiYah' 'https://connect.zebrastream.io/v0/9b09a7ce-024d-4836-ab95-90b776dfd439/mystream?mode=await-writer'

Wait and Transfer Logic

Here is a simple shell snippet to explain the implementation logic for the sender. It uses both the Connect API and the Data API.

# Connection settings
bearer='Chohph3meephith7caecoohi'
path='/9b09a7ce-024d-4836-ab95-90b776dfd439/mystream'
connect_url='https://connect.zebrastream.io/v0'

# Repeat waiter if connection breaks using Connect API
while true; do
    stream_url="$(curl --request GET --fail --no-buffer --http1.1 --header "Authorization: Bearer $bearer" "$connect_url$path?mode=await-reader")"
    test -n "$stream_url" && break
    sleep 1  # brief pause before retrying, to avoid a tight loop on immediate errors
done

# Transfer payload via stream address using Data API
curl --request PUT --upload-file data.json --no-buffer --http1.1 --expect100-timeout 180 --header "Authorization: Bearer $bearer" "$stream_url"
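The receiver side follows the same pattern. The following is a minimal sketch of a receiver counterpart, not a reference implementation. It assumes that the Data API serves the payload in response to a plain GET on the returned stream URL, as the redirect-based download later in this tutorial suggests. The function names wait_for_writer and receive_stream, the one-second pause, and the output file name are illustrative choices and not part of ZebraStream.

```shell
# Connection settings (receiver token and stream path from the examples above)
bearer='kie0EesheiV3Aewoh4utiYah'
path='/9b09a7ce-024d-4836-ab95-90b776dfd439/mystream'
connect_url='https://connect.zebrastream.io/v0'

# Hypothetical helper: wait via the Connect API until a sender is matched,
# then print the stream URL. Retries when the waiter breaks or returns nothing.
wait_for_writer() {
    while true; do
        stream_url="$(curl --request GET --fail --silent --no-buffer --http1.1 \
            --header "Authorization: Bearer $bearer" \
            "$connect_url$path?mode=await-writer")"
        if [ -n "$stream_url" ]; then
            printf '%s\n' "$stream_url"
            return 0
        fi
        sleep 1  # assumed pause to avoid a tight retry loop
    done
}

# Hypothetical helper: download the payload from the stream address.
# Assumes the Data API answers a GET on the stream URL with the data.
# $1 = stream URL, $2 = output file
receive_stream() {
    curl --request GET --fail --silent --no-buffer --http1.1 \
        --header "Authorization: Bearer $bearer" \
        --output "$2" "$1"
}
```

A receiver would then run something like: receive_stream "$(wait_for_writer)" received.json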

Differences between Push and Pull?

With the ZebraStream Connect API, the semantics for push and pull based workflows are almost identical. If the receiver is meant to trigger the data flow, the sender should be waiting first. The receiver can add an optional timeout to the request, if it doesn't want to become a waiter. For push workflows, it should be the other way around. Mixed modes are possible, in which both peers are expected to connect around a particular time point with some tolerated waiting time.
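To make the symmetry concrete, here is a small sketch that builds the Connect API request URL for either peer. The helper name connect_request_url is hypothetical, and we assume here that the timeout parameter is accepted in both modes, which the text above implies but the examples in this tutorial only show for await-writer.

```shell
# Hypothetical helper: build a Connect API URL for one peer.
# $1 = role (sender or receiver), $2 = optional timeout value
connect_request_url() {
    case "$1" in
        sender)   mode='await-reader' ;;  # the sender waits for a reader
        receiver) mode='await-writer' ;;  # the receiver waits for a writer
        *) echo "unknown role: $1" >&2; return 1 ;;
    esac
    url="https://connect.zebrastream.io/v0/9b09a7ce-024d-4836-ab95-90b776dfd439/mystream?mode=$mode"
    # The triggering peer adds a timeout so that it does not become a long-term waiter.
    if [ -n "${2:-}" ]; then
        url="$url&timeout=$2"
    fi
    printf '%s\n' "$url"
}

# Pull: the sender waits without a timeout, the receiver triggers with one.
#   connect_request_url sender
#   connect_request_url receiver 30
# Push: the receiver waits without a timeout, the sender triggers with one.
#   connect_request_url receiver
#   connect_request_url sender 30
```

The only difference between the two workflows in this sketch is which peer passes a timeout. The modes and the stream path stay the same.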

Simplify

We can see that notification using the Connect API adds another request to the data exchange workflow. This is necessary if the waiting process runs in a different environment than the data processing. However, it makes things slightly more complex in simple use cases that rely on existing HTTP functionality for integration. For those cases, we ensure that

Data API and Connect API can still be used together (one-sided waiter)
The Connect API can implicitly redirect a receiver to the Data API, effectively creating a one-step procedure, provided the receiving client follows redirects, as a web browser does
Like the Data API, the Connect API supports passing the access token as a query parameter

Automatic Receiver Redirect

To simplify the data receiver, add redirect=true as a query parameter to the Connect API request. This mode makes sense for pull-based flows that are, for instance, triggered by a single download link. In these cases, we recommend adding a timeout, e.g. by appending timeout=30.

curl --request GET --no-buffer --http1.1 --header 'Authorization: Bearer kie0EesheiV3Aewoh4utiYah' --location-trusted 'https://connect.zebrastream.io/v0/9b09a7ce-024d-4836-ab95-90b776dfd439/mystream?mode=await-writer&redirect=true&timeout=30'

Or with a single download link, passing the access token inline as a query parameter:

curl --location-trusted 'https://connect.zebrastream.io/v0/9b09a7ce-024d-4836-ab95-90b776dfd439/mystream?mode=await-writer&accesstoken=kie0EesheiV3Aewoh4utiYah&redirect=true&timeout=30'

Pasting this link into a web browser causes it to download the data once the sender delivers it. You still need to handle possible connection timeouts.
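On the command line, one way to handle such timeouts is a bounded retry around the redirecting request. The following is a sketch under assumptions: we assume that an expired waiter shows up as a broken connection or a non-success HTTP status, so that curl with --fail exits non-zero. The helper name download_with_retry, the default of five attempts, and the RETRY_DELAY variable are illustrative and not part of ZebraStream.

```shell
# Hypothetical helper: retry a redirecting download a limited number of times.
# $1 = download link, $2 = output file, $3 = maximum attempts (default 5)
download_with_retry() {
    url="$1"
    out="$2"
    max="${3:-5}"
    attempt=1
    while [ "$attempt" -le "$max" ]; do
        # --fail makes curl exit non-zero on HTTP error responses;
        # --location-trusted follows the redirect to the Data API.
        if curl --fail --silent --location-trusted --output "$out" "$url"; then
            return 0
        fi
        attempt=$((attempt + 1))
        sleep "${RETRY_DELAY:-1}"  # assumed pause between attempts
    done
    return 1  # all attempts failed
}
```

It could be called with the download link shown above, for example: download_with_retry "$link" data.json 10, where link holds the full Connect API URL including redirect=true and timeout=30.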

Wrap-up

You are now equipped to build powerful, event-driven data flows with ZebraStream. Turn any local program output into a live, web-accessible data source. We are planning to make the Connect API even more powerful in the future.