If you are exploring various data exchange options or already have expertise in a specific field, you may wonder about the benefits of using ZebraStream. This article discusses similar technologies, highlighting their typical characteristics and focusing on their limitations when used for data exchange.

File and Object Storage

Everyone knows and works with files, and they are still the predominant concept for data exchange. File servers and their protocols, like NFS, SMB, FTP, SFTP, WebDAV, and the like, have been around since the early days of computing. Cloud object storage (like S3) is their modern descendant and shares many of their traits, such as access permissions and object hierarchy. Given our familiarity with this concept, we might not notice its flaws.

Even if the primary data source is a live process or database, it is still very common to export data in the form of one or more files and upload them to a shared storage for exchange with others. Chances are that you have already implemented something involving file upload and download, even if it is just Dropbox for sharing private data. Imagine a typical business scenario: a producer process creates a CSV file daily and uploads it to an SFTP server. The consumer monitors the storage for changes and retrieves the data once it detects them. Sounds simple, right? Let's delve into the details.
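
To make this concrete, here is a minimal sketch of the producer side under the assumptions of this scenario; the server name, credentials, and paths are placeholders, and paramiko is just one common choice of Python SFTP client.

    import csv
    import datetime
    import paramiko  # widely used Python SFTP/SSH client

    def export_and_upload(rows):
        # Export today's data as a CSV file...
        today = datetime.date.today().isoformat()
        local_path = f"/tmp/report-{today}.csv"
        with open(local_path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["timestamp", "value"])
            writer.writerows(rows)

        # ...and upload it to the shared SFTP server for the consumer to pick up.
        transport = paramiko.Transport(("files.example.com", 22))
        transport.connect(username="producer", password="not-a-real-secret")
        sftp = paramiko.SFTPClient.from_transport(transport)
        try:
            sftp.put(local_path, f"/exchange/incoming/report-{today}.csv")
        finally:
            sftp.close()
            transport.close()

Even this tiny script already presumes a running, secured SFTP server with agreed paths and credentials, which is exactly where the following points come in.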

  1. Hosting Overhead: Someone needs to set up the file server, secure it, run updates, and manage access. If the producer and consumer are in different organizations, they need to clarify who is responsible for this. You might opt for a managed cloud service, such as AWS, but there are still management responsibilities and costs associated with it.

  2. Additional Data Cleanup: Exchange files are typically temporary, so they need to be complemented with a cleanup policy. You can set up a time-based mechanism (retention time), let the consumer delete the data once retrieved, or take a storage-is-cheap approach that just dumps the data in an archive. In any case, if you don't want to risk data loss, this policy needs to be clear to the consumer.

  3. Fragile Security: Unintentional data exposure is a risk closely related to infrastructure and cleanup. Some of the largest data breaches in recent years were caused simply by misconfigured S3 storage. For example, breastcancer.org exposed 150 GB of sensitive patient data, including nude images, in 2021, and Pegasus Airlines lost 6.5 terabytes of flight data, employee personal information, and source code with plain-text passwords in 2022. These kinds of incidents can be very expensive for companies these days, not to mention the loss of reputation. While it's easy to blame the IT staff, complex systems naturally come with a higher risk. Encrypting data client-side would make things more complicated and require even more technical expertise.

  4. Complicated Data Flow: Consumers are often data-driven processes that should start as soon as input data is available (push-based flows). To achieve this with a simple file exchange, you need to rely on external mechanisms for notification. One common approach is to have a process running that polls the server for changes; a sketch of such a polling loop follows this list. This is a crude workaround for implementing push-based flows, which requires a polling service and also consumes resources on the file server. Implementing pull-based flows is even more challenging, as the consumer must direct the producer to create and upload data in these scenarios. In practice, implementations often resort to simple time schedules, like syncing once an hour or once a day, which avoids that kind of complexity but slows down the overall process. For time-critical processes, it's essential to have fast and reliable notifications.

  5. No Streaming: Without streaming capability, continuous time-series data, like sensor readings or log lines, must be split into chunks to be turned into files. By the time such a file is uploaded, its entries are already outdated, even if the consumer downloads the file right after the upload. Consumers that rely on instant processing cannot use file servers for exchange but must implement data streams.

  6. Implicit File System: Who remembers those DOS days when file names were restricted in character set and length? This might seem far away, but a file server is a remote file system, effectively a key-value store with some access control. Does any of this sound familiar? The server is case-sensitive, and you start seeing Data.csv, Data.CSV, and data.csv in the same place? Files cannot be uploaded because their names contain UTF-8 special characters or because they are larger than 4 gigabytes? Files cannot be created or deleted because permissions are not correctly inherited in subfolders? I think you know what I mean.
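
As mentioned in point 4, push-style behavior is usually faked with a polling loop. Here is a rough sketch of what that looks like on the consumer side; host, credentials, paths, and the processing stub are again placeholders.

    import time
    import paramiko

    def process(path):
        print("new exchange file:", path)  # hand over to the actual consumer logic

    seen = set()  # remembers which files have already been retrieved

    def poll_once():
        transport = paramiko.Transport(("files.example.com", 22))
        transport.connect(username="consumer", password="not-a-real-secret")
        sftp = paramiko.SFTPClient.from_transport(transport)
        try:
            for name in sftp.listdir("/exchange/incoming"):
                if name.endswith(".csv") and name not in seen:
                    sftp.get(f"/exchange/incoming/{name}", f"/tmp/{name}")
                    seen.add(name)
                    process(f"/tmp/{name}")
        finally:
            sftp.close()
            transport.close()

    while True:
        poll_once()
        time.sleep(300)  # checks every 5 minutes, consuming resources even when nothing changed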

File servers are great for storing data, but they can be complicated when you need to exchange data. If you are building a solution with file storage, keep these points in mind and be ready to add your own logic. On the positive side, you get built-in features like user management, authentication, backup, and logging.

Web APIs

Web APIs are a popular choice for data exchange because they let apps and services share data in a structured way. But when and why should we avoid them? To answer these questions, we need to understand the basics and then look at the details.

Unlike file servers, which accept any type of data, typical web APIs only accept pre-specified, structured data. This makes them a good choice for standardizing incoming and outgoing data in centralized services, which is necessary to build reliable, connected systems. For instance, companies like Stripe, eBay, and Google provide public APIs to interact with their services, and they are the gatekeepers of the implied data standards. Let's explore the challenges that arise when using web APIs for data exchange!

  1. Software Development Overhead: A web API is custom software that needs to be developed and maintained. It's not always clear who, the producer or the consumer, will do this work and bear the associated costs. Developing software also requires specific skills and comes with its own complexity (a bare-bones sketch of such an endpoint follows this list). When choosing this option, it's important to understand the long-term implications.

  2. Hosting Overhead: Web APIs evolved from simple web servers and, like web apps, need to be hosted somewhere. The producer and consumer need to agree on where and how to host the API and which party will cover the costs. The type of data flow also plays a role here, because switching from a push-based flow to a pull-based flow may require a complete redesign.

  3. One-Sided Dominance: The really important question when using web APIs for data exchange is: who sets the standard? Web APIs were designed for a specific use case: one API provider sets the standards, and many API users accept them by using the API. When the number of API clients shrinks to a handful or even one, this concept degenerates. In individual data exchange, compromises have to be made: the consumer knows precisely what data it needs, but the producer can only deliver it the way its API dictates.

  4. Indirect Data Flow: Let's look at data-driven applications. Whether you are a machine learning specialist, a researcher running physical simulations, or an analyst or data scientist: there are specialized computing platforms for all these applications. Suppose you built a web API to consume data from a data producer in a push-based flow. The data has to travel from the API instance to the application platform before you can use it. This results in two consecutive data transfers, with twice the opportunities for things to go wrong and added latency. Pull-based flows seem to have an advantage here because they can be established directly. However, it's likely that you're simply pushing the data locality problem onto the producer.

  5. Data Restrictions: The traditional way web APIs work is by request and response. They weren't designed for large data transfer. Simple HTTP-based APIs mostly use inefficient JSON encoding and may run into problems with too many requests or large data. Newer contenders like GraphQL can solve some of these problems. However, adapting web APIs to the job of data exchange involves quite a bit of knowledge and consideration.

  6. Fragile Security: With web APIs, security is the burden of the hosting party. An API is like the front door to your house: an intruder will probably try it first. So you need to handle authentication, rate limiting, abuse detection, etc., in addition to infrastructure tasks like availability and scaling. Most cloud providers will sell you components for building API services, but to use them, you probably need to hire a cloud engineer. Notably, many big companies have failed to secure their APIs. For example, the major German casino operator Merkur exposed payment and personal data of over a million gamblers through their GraphQL API in March 2025.
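
For a sense of what point 1 entails, below is a bare-bones sketch of a push endpoint using Flask; the route, header name, and payload shape are made up for illustration, and a production API would still need hosting, TLS, rate limiting, monitoring, and so on.

    from flask import Flask, request, abort

    app = Flask(__name__)
    API_KEY = "change-me"  # placeholder; a real deployment needs proper authentication

    @app.route("/v1/measurements", methods=["POST"])
    def receive_measurements():
        if request.headers.get("X-Api-Key") != API_KEY:
            abort(401)
        payload = request.get_json(silent=True)
        if payload is None or "records" not in payload:
            abort(400)  # only the pre-specified, structured format is accepted
        # ... hand the records over to the consuming application ...
        return {"accepted": len(payload["records"])}

    if __name__ == "__main__":
        app.run(port=8080)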

In summary, Web APIs are useful for asymmetric use cases with many clients and clear ownership of the data format, as well as for exposing general logic. However, they may introduce extra costs or require additional skills for data exchange. If you use this approach, clearly define the roles of the data-exchanging partners and be aware that one of the partners will likely have the majority of the workload. On the positive side, you can add custom logic or instant data transformation to the API.

Message Brokers

Message brokers are central components of modern, event-driven architectures, like IoT networks or microservices. Just like file servers and web APIs, they facilitate data transmission between different peers. Message brokers usually follow the publish-subscribe pattern, which connects multiple producers with multiple consumers through a topic. Topics work roughly like shared email inboxes. Data packages are called messages and are typically in a structured format like JSON. Let's see how this concept matches a typical data exchange scenario!

  1. Hosting Overhead: Again, a message broker is centralized infrastructure, often running on top of a cluster. If you're not a cloud engineer, you can choose a managed provider with a nice web interface, but it's not unusual to pay over a thousand EUR per month for a Kafka cluster, which only makes sense for heavy use. If you want to use a message broker for data exchange, make sure that it supports access control for external partners, as these systems often lack suitable permission management because they were mostly designed for internal use.

  2. Data Restrictions: The smallest unit of data exchange is a message, and each message is typically restricted to a few megabytes. This doesn't matter for typical IoT use cases, but once data gets larger, you have to get creative. You can slice data objects or add compression. Large files or binary streams, however, need to be segmented before transmission, and preserving their order can require additional indexing, etc. As a workaround, large data objects are often uploaded to external storage like S3 and only referenced in the messages (a sketch of this follows the list). Great, you have just combined the message broker pattern with the file server pattern!

  3. Data Flow: As the name implies, messages are published, and consumers subscribe to a feed of messages. Message brokers therefore natively support only push-based flows; as with file servers, the other direction requires workarounds. These systems were created for different use cases. For instance, when data processing is computationally expensive, the consumers are split into multiple instances, which can consume the input data in parallel. Similarly, if data input comes from multiple systems, they can all push into the same topic. Just as with web APIs, if the number of producers and consumers approaches one, the concept degenerates.

  4. Complex Architecture: The star-like architecture with the broker in the middle allows adding or removing producers and consumers dynamically, which can be a nice feature for data exchange in rare circumstances, but it generally makes things overly complex. Technical details, like partition management, are exposed to the clients, making the protocol and the integration more difficult. To ease the integration, a web API can be run as a bridge. Great, you have just combined the message broker pattern with the web API pattern!
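
To illustrate the workaround from point 2, here is a sketch of the "reference instead of payload" approach; the bucket, topic, and broker address are placeholders, and kafka-python plus boto3 is just one possible client combination.

    import json
    import boto3
    from kafka import KafkaProducer

    s3 = boto3.client("s3")
    producer = KafkaProducer(bootstrap_servers="broker.example.com:9092")

    def publish_large_object(local_path, key):
        # Step 1: park the actual data in object storage (the file server pattern)...
        s3.upload_file(local_path, "exchange-bucket", key)
        # Step 2: ...and publish only a small pointer to it on the topic.
        message = {"s3_bucket": "exchange-bucket", "s3_key": key}
        producer.send("data-exchange", json.dumps(message).encode("utf-8"))
        producer.flush()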

PubSub sounds like a really nice idea for data exchange, and in theory it is! However, be aware that message brokers require strong alignment of the producer and consumer at the protocol level. In data exchange, a specific broker technology is often pushed by the party that already uses it internally, which is understandable but one-sided and may impose unexpected costs on the exchange partner.

ZebraStream is KISS!

After reviewing the hidden complexities of file servers, web APIs, and message brokers, how does ZebraStream compare? Well, it borrows some good features while keeping things simple and stupid to avoid the overhead. Let's explore the main aspects!

  1. Stupid Simple Hosting: ZebraStream is a centralized infrastructure component and does not eliminate the need for hosting. We are a managed provider, running the necessary infrastructure with usage-based pricing and low fixed costs compared to managed file servers, web APIs, or message brokers.

  2. Stupid Simple Management: A stream is the basic unit of data exchange, without additional context like a server or cluster. To minimize the remaining management overhead, we provide the simple ZebraStream Management API to issue access tokens for individual streams. We just cannot make it any simpler!

  3. Stupid Simple Integration: ZebraStream doesn't reinvent the wheel. It's basically a lean web API that uses encrypted HTTP for transfer and communication. This means you don't have to worry about whether the client system speaks SFTP, S3, or Kafka.

  4. Unrestricted Data and Pure Streaming: ZebraStream only manages the transport layer: it doesn't force your data into a specific format, as web APIs, message brokers, and even some file systems do. It handles data as a byte stream, not as a series of possibly unordered micro-batches like a message broker. This has major implications: First, you don't need to split data into smaller parts, whether because file servers don't support streaming or because of size limits on messages. Second, the delay is shorter because there's no extra data processing. Third, data is never stored, so you don't have to worry about cleanup or the risk of data exposure, as in those S3 incidents. The best part is that for the producer it looks like a file upload, and for the consumer like a file download (see the sketch after this list). This means that clients don't need to be streaming-aware to use streams.

  5. Flexible Data Flow: ZebraStream is symmetric in two ways: First, both producer and consumer connect as clients, similar to a file server or message broker. Second, there is no difference between push-based and pull-based flows. Both, and even hybrid modes, are natively supported using the ZebraStream Connect API. For illustration, you can either replace a message broker with one or more direct streams or deliver a web application for rendering via a stream on request.

  6. Simple and Secure Access: Time-restricted access tokens grant access to the data content and are meant to be issued per device and use case. On top of this, one can implement second-level authorization or end-to-end encryption.
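
To give a feel for points 3 and 4, here is a rough sketch of what both ends can look like with plain HTTP. The stream URL and authorization header below are placeholders, not the documented ZebraStream Data API, so consult the actual documentation for the real endpoints; the two functions would run in separate processes.

    import requests

    STREAM_URL = "https://example.zebrastream.io/my-org/sensor-feed"  # placeholder address
    TOKEN = "token-issued-via-the-management-api"                     # placeholder token

    # Producer process: looks like a plain file upload.
    def produce():
        with open("readings.csv", "rb") as f:
            requests.put(STREAM_URL, data=f,
                         headers={"Authorization": f"Bearer {TOKEN}"})

    # Consumer process (running elsewhere): looks like a plain file download,
    # but the bytes arrive while the producer is still sending.
    def consume():
        with requests.get(STREAM_URL, stream=True,
                          headers={"Authorization": f"Bearer {TOKEN}"}) as r:
            for chunk in r.iter_content(chunk_size=65536):
                pass  # process bytes as they arrive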

Did you notice the overuse of the word simple? We're emphasizing simplicity because it's key. One of the major challenges in data exchange is the need for deep expertise in various areas and alignment of technical details, especially across different teams, organizations, and environments. Using ZebraStream can significantly reduce the time and cost of transporting data.

ZebraStream Restrictions

As with anything, there are trade-offs. The simple design of ZebraStream results in some limitations (or features, depending on the perspective).

  1. No On-Premise Hosting: ZebraStream is a cloud data relay with a very simple interface. If you feel uncomfortable about this dependency, keep in mind that you can always implement the Data API yourself and establish direct connections without having to change the other peer's logic. This means you always have the option to revert to the self-hosted web API pattern. So basically: no lock-in!

  2. Blocking Mode: ZebraStream requires the waiting party to be available when the other side connects (just like a file server, message broker, or web API must be available on request), because the data flows instantly between producer and consumer. This shouldn't be a problem in the modern cloud era. If you find it challenging, reach out to us, and we can explore solution patterns together.

  3. All or Nothing: Being based on HTTP requests and responses, ZebraStream inherits their all-or-nothing principle: there is no built-in checkpointing or strong confirmation that the consumer has received the data. Typically, ZebraStream can detect when the consumer terminates the connection correctly and propagates this information to the producer. However, if the transfer fails or is canceled by either peer, the other side is left with a simple failure notice. The recommended strategy is to treat a failure as an exception and have the producer retry the transmission in these rare cases (a retry sketch follows this list). This also means that chunking large data can still make sense, although it is not strictly required. It is your responsibility to find the sweet spot for chunk size based on your specific use case.

  4. Transport Layer Only: While this provides maximum flexibility, aspects of data encoding, including framing, serialization format, structure, and quality, are left untouched. In contrast, message brokers often come with some kind of (optional) format management and validation features, and web APIs typically enforce serialization and data structure. With ZebraStream, client-side frameworks, using for instance JSON Schema, take over this work.
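
As a closing illustration of point 3, a simple producer-side retry could look like the following; the URL and token are placeholders, and the backoff policy is only an example.

    import time
    import requests

    def send_with_retry(url, token, payload, attempts=3):
        for attempt in range(1, attempts + 1):
            try:
                r = requests.put(url, data=payload, timeout=600,
                                 headers={"Authorization": f"Bearer {token}"})
                r.raise_for_status()
                return  # the consumer closed the connection cleanly; we are done
            except requests.RequestException:
                if attempt == attempts:
                    raise  # give up and surface the failure
                time.sleep(2 ** attempt)  # back off briefly before resending the whole chunk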

I really hope this comparison helps you choose the data exchange technology that is most suitable for your particular use case!