DornerWorks

Communication Protocol Considerations Key to Modular Software Development

Posted on December 3, 2019 by Matt Johnson

In my last post I presented the case for developing embedded systems in small, robust modules by breaking up functionality among several processes or even separate circuit boards. Once logic has been separated into isolated silos, the subsystems must still work together as a whole via some form of communication.

In this post I’ll cover some of the things to consider when designing or choosing inter-process communication protocols based on my experience on a couple different projects.

“In general, it’s far better and more maintainable to have a clean interface that code calls into to interact with another area of the system.”

It’s generally considered bad programming practice to use global variables because they can quickly turn code into a spaghetti mess. Why, then, do developers so often share data between embedded processes through shared memory buffers or files that are accessible at any time, by any code? On certain constrained embedded devices this may be the only viable option, but in general, it’s far better and more maintainable to have a clean interface that code calls into to interact with another area of the system.

It’s easier to develop (and far easier to test) against common interfaces as defined by an interface control document. When designing a system, after separating out subsystems by functionality, it’s crucial to define clean interfaces that each subsystem or process will use. This applies to both processes on the same board and different boards communicating over a bus. In general, there are three main areas to consider when developing or selecting a shared communication strategy: serialization, routing, and transport layers.

Translate data through serialization

Serialization is the process of converting an internal data structure of one system into a message format that can be communicated over a channel to another system in a way that the other system can understand. For one subsystem to understand another, they need to speak the same “language.” Some considerations here are:

  • Version headers: if one part of the system has been upgraded while another part is running an old version, does the message specify which version of the protocol is being used? And are systems backwards compatible with older protocol versions? Including a version number in your message’s header lets a receiver know how to interpret the bytes, or whether the version is supported at all. Make sure to include a version “1” header field in the very first release to make things simpler should the protocol later need to change.
  • Checksums: if data is somehow corrupted in transmission, how is the error detected and mitigated? CRCs or error correction codes should be included in your message to ensure nothing gets lost in translation or transmission. This also guards against a receiver acting on data that is validly formatted but faulty.
  • Packet headers and frame sizes: certain protocols are stream based (like RS-232) while others are packet based (like TCP). If data is streamed, each message on the line must be delimited in some way so that the receiver knows where one message starts and another ends. Likewise, if a message is too large to fit in one packet, the sender must split the message across several packets and the receiver must know which packet ends one message and starts another. One way to implement this is to have a sentinel value that’s always present in the header to detect the start of a message (assuming the value can’t appear in the message body) and to include the size of the message in the header so the receiver knows when to stop reading (assuming that size didn’t get corrupted). Another way is to COBS encode the message so that your data is packed into 0x00-0xFE values, with a reserved value of 0xFF (or another value) acting as a delimiter between messages.
  • Binary or human readable: Packed binary data requires less conversion (better performance) and is generally smaller to transport, but text-based data is less susceptible to minor errors, more compatible when memory storage formats change, and easier to test and debug. Often binary is preferred for performance of real-time data while human readable is preferred for persistent storage accessed infrequently, like user configurations.
  • Endian conversions: if the message is sent from a little-endian processor to a board running a big-endian processor, or vice versa, your multi-byte data types will be misinterpreted because of the byte swapping (assuming your data is binary rather than text based). The sender and receiver should instead interpret messages using a consistent endianness and, if the native endianness differs from the protocol endianness, perform the necessary byte swaps when encoding or decoding messages. Unless conforming to an external standard, it’s best to avoid the overhead of endian swaps if all systems involved share a common endianness.
  • Shared or unique serialization scheme: if your system uses one shared encoding scheme, message types and encoding/decoding routines can be reused among each subsystem, even if the underlying routing or transport layer is different. In certain cases though, another encoding scheme is required in order to conform to some standard or interact with a preexisting system.
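
Several of the points above (a version field, a message size in the header, a trailing CRC, and a fixed big-endian byte order) can be combined into one minimal framing sketch. The field layout, sizes, and function names below are illustrative assumptions, not any particular project’s format:

```python
import struct
import zlib

PROTO_VERSION = 1  # ship a version field from the very first release

def encode_message(msg_type: int, payload: bytes) -> bytes:
    # Big-endian ("network order") header: version, message type, payload length.
    header = struct.pack(">BBH", PROTO_VERSION, msg_type, len(payload))
    body = header + payload
    # Trailing CRC-32 lets the receiver reject corrupted frames.
    return body + struct.pack(">I", zlib.crc32(body))

def decode_message(frame: bytes):
    body, (crc,) = frame[:-4], struct.unpack(">I", frame[-4:])
    if zlib.crc32(body) != crc:
        raise ValueError("CRC mismatch: frame corrupted")
    version, msg_type, length = struct.unpack(">BBH", body[:4])
    if version != PROTO_VERSION:
        raise ValueError(f"unsupported protocol version {version}")
    return msg_type, body[4:4 + length]
```

A receiver that sees an unknown version or a bad CRC can then drop the frame instead of acting on garbage.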

On one project we developed our own custom streamed header/payload format that used code generation scripts to create serialize/deserialize functions (similar to Google’s Protocol Buffers project). For another, we used JSON as a fast, flexible and human readable format that was easy to prototype quickly. For some projects we’ve also used XML (for persistent storage, not message communication), which is human readable but can be validated against a schema. Lastly, on another project we’ve used C#’s reflection and Marshalling plus COBS encoding to serialize Bluetooth data.
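
As a concrete illustration of the COBS framing mentioned above, here is a minimal sketch of the classic formulation, which packs data into non-zero bytes so that 0x00 can serve as the frame delimiter (the 0xFF-delimiter variant described earlier works the same way with the roles of the reserved byte swapped; the function names are hypothetical):

```python
def cobs_encode(data: bytes) -> bytes:
    """Classic COBS: pack data into non-zero bytes; 0x00 becomes the delimiter."""
    out = bytearray()
    idx = 0
    while True:
        block = data[idx:idx + 254]      # a group holds at most 254 data bytes
        zero = block.find(0)
        if zero == -1:
            out.append(len(block) + 1)   # code byte: group length + 1
            out.extend(block)
            idx += len(block)
            if len(block) < 254:
                break                    # final (possibly empty) group
        else:
            out.append(zero + 1)         # code byte encodes where the zero was
            out.extend(block[:zero])
            idx += zero + 1
    return bytes(out)

def cobs_decode(enc: bytes) -> bytes:
    out = bytearray()
    idx = 0
    while idx < len(enc):
        code = enc[idx]
        out.extend(enc[idx + 1:idx + code])
        idx += code
        if code < 0xFF and idx < len(enc):
            out.append(0)                # restore the zero this group replaced
    return bytes(out)
```

Because the encoded output never contains the reserved byte, a receiver can resynchronize after corruption simply by scanning for the next delimiter.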

In each project, once the serialization scheme was designed and in place, it was trivial to add or update messages as development progressed. This allowed us to rapidly prototype each subsystem without needing to constantly fight over changing internal memory layouts of each subsystem, since a common area of code defined the shared serialization language of the various subsystems.

Define communication paths with routing

With multiple subsystems that a message could go to, the next thing to consider is routing, which defines the paths messages take through the system. In this realm, consider:

  • Direct vs brokered: when an application sends a message, does it have direct communication with the other app, or does the message go through some centralized message broker? Direct may be more efficient, but brokered can allow broadcasting or replaying messages in a fire-and-forget manner without each app needing to worry about as many details.
  • Does the sending subsystem impact the routes? Does the sender need to send a copy to each system it wants to reach? Or does it fire and forget, trusting that whoever is listening will get a copy? Or does the type of message imply where the sender should deliver it?
  • Does the receiving subsystem impact the routes? Can a receiver subscribe to messages from whoever sends them? Does the receiver need to know who sent a message and respond differently based on the sender (such as replying to a specific sender)?
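
The brokered, fire-and-forget style can be sketched with a toy in-process broker, a stand-in for a real broker such as MQTT (the class, method, and topic names are made up for illustration):

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

class Broker:
    """Toy in-process publish/subscribe broker (illustrative stand-in for e.g. MQTT)."""

    def __init__(self) -> None:
        self._subs: DefaultDict[str, List[Callable[[bytes], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[bytes], None]) -> None:
        # A receiver opts in to a topic; the sender never needs to know about it.
        self._subs[topic].append(handler)

    def publish(self, topic: str, payload: bytes) -> None:
        # Fire and forget: deliver to every current subscriber of the topic.
        for handler in self._subs[topic]:
            handler(payload)
```

The sender publishes to a topic and moves on; whether zero, one, or ten receivers are subscribed is the broker’s concern, not the sender’s.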

In one project, we defined the subsystem routing defaults as a property of the message type (though the sender could override this) and copies of the messages were sent directly to each application by the sender. On another project we used MQTT’s publish/subscribe framework so receivers subscribed to topics of interest and senders broadcast to the listening receivers via a broker.

By pruning routes so that only certain messages were sent and received by each subsystem, we reduced the interdependence between subsystems and the amount of testing required. By adapting our routing definitions, we could easily and dynamically re-route messages between test applications and the subsystems they tested, verifying that the inputs and outputs of each subsystem behaved correctly.

Move data through the transport layer

At this point our subsystem has a message serialized and knows where it’s going, but in order to actually get there, the system needs some sort of transport layer. This layer can be an existing bus like UART, CAN, Bluetooth or Ethernet, or something internal or custom. Some points to consider when choosing a transport layer are:

  • Throughput: How fast and how much overhead is there to read/write on this channel? Given your serialized data size, how many messages on average can you get from one subsystem to another per second and with what latency?
  • Reliability: How likely is it for data to be lost or corrupted in transmission? Does the transport layer automatically handle resending data or does your application need to handle this manually?
  • Discrete vs continual stream: when sending data, does it need to be broken up into packets and reassembled in the correct order? If so, make sure the receiving end buffers partial messages until a whole message arrives.
  • Sharing the transport layer with others: If two subsystems are sending data between each other and a third tries to send or receive at the same time, the system needs to keep messages from mixing and interfering with each other. This can be partially accomplished by using different ports/topics, header data, and packetization.
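
The buffering point above can be sketched as a small frame assembler that accumulates an arbitrary byte stream and only releases complete messages (a hypothetical helper, assuming each message is preceded by a 2-byte big-endian length field):

```python
class FrameAssembler:
    """Buffers a byte stream and yields complete length-prefixed messages.

    Hypothetical helper assuming a 2-byte big-endian length prefix per message.
    """

    def __init__(self) -> None:
        self._buf = bytearray()

    def feed(self, chunk: bytes) -> list:
        """Append newly received bytes; return any messages now complete."""
        self._buf.extend(chunk)
        messages = []
        while len(self._buf) >= 2:
            length = int.from_bytes(self._buf[:2], "big")
            if len(self._buf) < 2 + length:
                break  # partial message: keep buffering until more bytes arrive
            messages.append(bytes(self._buf[2:2 + length]))
            del self._buf[:2 + length]
        return messages
```

Because the reads from a stream transport can split messages at arbitrary points, the application calls `feed()` with whatever chunk the transport delivered and acts only on the complete messages it returns.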

In Linux, it’s common to use sockets, file pointers, or pipes for inter-process communication. However, using TCP or UDP packets on a local network interface can work well too and can make development easier by being able to sniff traffic or inject new messages dynamically on the target from another PC on the network (though make sure to disable any remote access in the release build for security reasons). One of my projects used UDP packets under Linux while using the same serialization for RS-232 serial communication between different boards: a proxy application acted as a bridge that retransmitted messages from one transport layer to the other.
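
As a sketch of that local-network approach, two loopback UDP sockets can stand in for two subsystems on the same machine (the port is OS-assigned and the payload is arbitrary):

```python
import socket

# Two loopback UDP sockets stand in for two subsystems on the same board.
rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
rx.bind(("127.0.0.1", 0))     # let the OS pick a free port
rx.settimeout(2.0)

tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
tx.sendto(b"hello subsystem", rx.getsockname())

data, sender = rx.recvfrom(1024)
print(data)                   # b'hello subsystem'

tx.close()
rx.close()
```

During development, another tool on the same network can sniff or inject these datagrams; in a release build, bind only to loopback (as here) or disable the interface entirely.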

“Break your large applications down into bite-sized apps then glue them back together with clear communication protocols. They will be stronger than if they stayed in one piece.”

A less traditional approach that worked well on a few of the projects I’ve been on was to use MQTT as implemented by the mosquitto broker (see this post for more details). In addition to the publish/subscribe routing it provided, it automatically added several bonus features, such as delivering retained messages to processes that started listening only after the sender had already sent the message, and notifying other applications that an app exited via a last-will message.

Bringing your modular software development together

Modular embedded applications require each subsystem to communicate well with each other to get the job done. If communication is done right, the advantages of such systems over traditional monolithic applications are immense. Not only can a developer or tester monitor or inject messages between subsystems dynamically, but entire sessions of inter-process communication can be recorded and played back for reliable test case automation. Modular embedded architectures help cut development and testing costs while improving maintainability and reliability.

Break your large applications down into bite-sized apps then glue them back together with clear communication protocols. They will be stronger than if they stayed in one piece.

by Matt Johnson
Embedded Engineer
Matt Johnson is an embedded engineer at DornerWorks.