Understanding Checksums, CRCs, & Hashes in Embedded Software

Oct 31, 2024

Every communication protocol should have a mechanism to determine whether the received data is valid. I once encountered a team that didn’t have such a mechanism. Their device operated in a radiation-intense environment, and the data they received from their sensors was suspect due to an inability to validate the incoming sensor readings. They couldn’t tell if the rapidly changing values were real or if bits were flipping in transit from their sensors! 

Whenever data is being transferred from one computer system to another, developers must include some means to verify the data on the receiver side of the transmission. During transmission, it’s possible for a bit, or several, to flip and change the value of the transmitted data. Obviously, this would then corrupt the data, which could wreak havoc on the system. 

In today’s post, we will examine some of the options available to verify that the transmitted data is valid. We will also include hashes and describe how they differ from checksums and CRCs. 

The differences between checksums, CRCs, and hashes

Before we dive into the nitty-gritty details, let’s define checksums, CRCs, and hashes. 

A checksum is a small value computed from a block of data by an algorithm designed to detect errors that occur naturally or randomly. The sender computes the checksum over the data and sends it along; the receiver recomputes it and compares the two values to verify the data. It’s important to realize that not all checksums are created equal, and different checksums detect different errors. For example, one checksum may detect that a single bit has changed, while a different checksum may also detect several bits changing at once. A matching checksum does not guarantee that the data is free of errors!
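
To make the last point concrete, here is a minimal sketch of the simplest kind of checksum, an 8-bit sum of all the bytes. The function name sum8_checksum is my own for illustration and does not come from any particular library.

```c
#include <stddef.h>
#include <stdint.h>

/* Simple 8-bit additive checksum: add every byte and keep the low 8 bits.
 * Illustrative name and signature, not taken from any library. */
uint8_t sum8_checksum(const uint8_t *data, size_t length)
{
    uint8_t sum = 0;

    for (size_t i = 0; i < length; ++i) {
        sum = (uint8_t)(sum + data[i]);   /* wraps around modulo 256 */
    }
    return sum;
}
```

For the bytes 0x10, 0x20, 0x30 this returns 0x60. Flip one bit so the first byte becomes 0x11 and the result changes to 0x61, so the error is caught. But swap the first two bytes, or corrupt two bytes so the changes cancel (0x11, 0x1F, 0x30), and the result is still 0x60: the checksum matches even though the data is wrong.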

Now, a CRC is a checksum. It’s a specific type of checksum that treats the data as one long binary polynomial, divides it by a fixed generator polynomial, and uses the remainder as the check value. Done bit by bit in software, that division costs noticeably more cycles than a simple sum, which matters on a microcontroller-based embedded system. There are added benefits, though, in that a CRC can detect a wider range of errors than simpler checksums can.
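
As a rough sketch of what that polynomial division looks like in code, here is a bit-by-bit CRC-32 using the common reflected polynomial 0xEDB88320 (the variant used by Ethernet and zlib). The function name crc32_bitwise is hypothetical, and production code would more likely use a lookup table or a hardware peripheral.

```c
#include <stddef.h>
#include <stdint.h>

/* Bit-by-bit CRC-32: reflected polynomial 0xEDB88320, initial value
 * 0xFFFFFFFF, final XOR 0xFFFFFFFF. Illustrative name, not a library API. */
uint32_t crc32_bitwise(const uint8_t *data, size_t length)
{
    uint32_t crc = 0xFFFFFFFFu;

    for (size_t i = 0; i < length; ++i) {
        crc ^= data[i];                        /* bring in the next byte */
        for (int bit = 0; bit < 8; ++bit) {
            if (crc & 1u) {
                crc = (crc >> 1) ^ 0xEDB88320u; /* "subtract" the divisor */
            } else {
                crc >>= 1;
            }
        }
    }
    return crc ^ 0xFFFFFFFFu;
}
```

The inner loop runs eight times per byte, which is where the extra cost comes from. A handy sanity check is the standard test string "123456789", for which this CRC-32 variant gives 0xCBF43926.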

CRCs are so effective that many microcontroller vendors include a hardware CRC calculator, which lets a developer use a CRC without the overhead of calculating it in software. Unfortunately, whether one is included is very hit-and-miss, so you need to read your microcontroller’s datasheet carefully. (They tend to be left out of value-line parts, but I have seen this starting to change.)

Checksums and CRCs are designed to detect random errors, but they are not good at detecting intentional changes to the data. It’s fairly easy to reverse-engineer a checksum used to verify the data integrity of a file or a message. An attacker could then change the data and recalculate the checksum. To protect data against intentional changes, a developer would need to use a cryptographic hash (typically combined with a secret key or a signature, since an attacker can recompute an unkeyed hash just as easily). A cryptographic hash maps input data of any size to a fixed-size digest. It is designed to be one-way: the input cannot practically be recovered from the digest, and a small change in the input produces a very different digest. Hashes are helpful when you want to quickly check the integrity of a “large” amount of data.
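
The attack described above is easy to sketch. The packet layout and function names below (packet_t, packet_seal, packet_is_valid) are made up for illustration, and a simple 8-bit sum stands in for whatever checksum the protocol uses; the same idea applies to a CRC.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAYLOAD_SIZE 4u

/* Hypothetical packet: a fixed payload followed by its checksum. */
typedef struct {
    uint8_t payload[PAYLOAD_SIZE];
    uint8_t checksum;
} packet_t;

/* 8-bit additive checksum over the payload. */
static uint8_t packet_sum8(const uint8_t *data, size_t length)
{
    uint8_t sum = 0;
    for (size_t i = 0; i < length; ++i) {
        sum = (uint8_t)(sum + data[i]);
    }
    return sum;
}

/* Sender side: store the checksum in the packet. Nothing here is secret,
 * so anyone who knows the algorithm can call the equivalent of this. */
void packet_seal(packet_t *packet)
{
    packet->checksum = packet_sum8(packet->payload, PAYLOAD_SIZE);
}

/* Receiver side: recompute and compare. */
bool packet_is_valid(const packet_t *packet)
{
    return packet->checksum == packet_sum8(packet->payload, PAYLOAD_SIZE);
}
```

If a bit flips in transit, packet_is_valid() returns false. If an attacker changes the payload and then calls packet_seal() again, packet_is_valid() returns true, and the receiver has no way to tell the packet was altered.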

When designing a communication protocol for an embedded system, using a hash to protect each packet usually isn’t the correct solution. A CRC such as CRC-32 is built to guarantee detection of certain classes of error, such as short bursts of flipped bits, whereas a hash truncated to 32 bits offers no such guarantee and leaves collisions to chance. Let’s take a look at a common checksum that has been shown to strike a very good balance between error detection and performance.

The Fletcher16 checksum

The Fletcher16 checksum is well suited to embedded systems because it was designed to approach the error detection capabilities of a CRC at a lower computational cost by using only sums. Like a simple checksum, Fletcher16 keeps an 8-bit running sum of the data bytes (reduced modulo 255). However, the algorithm also computes a second sum: the running total of the first sum’s intermediate values.

Once both sums are calculated, they are combined into a final 16-bit checksum. This has two significant benefits over a simple checksum. First, it expands the number of possible checksum values from 256 for a simple 8-bit sum to 65,025 (255 × 255, since each sum is reduced modulo 255), which lowers the chance that a random error still produces a matching checksum from about 0.4% to about 0.0015%. Second, because the second sum effectively weights each byte by its position, the checksum can detect bytes that arrive in the wrong order, which a simple sum cannot.

An example implementation for the Fletcher16 checksum in C can be seen below:

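#include <stdint.h>

uint16_t Fletcher16_Calculate(uint8_t const * Data, uint16_t Bytes)
{
    uint16_t sum1 = 0;   /* running sum of the data bytes, modulo 255 */
    uint16_t sum2 = 0;   /* running sum of the sum1 values, modulo 255 */
    uint16_t index;

    for (index = 0; index < Bytes; ++index)
    {
        sum1 = (sum1 + Data[index]) % 255;
        sum2 = (sum2 + sum1) % 255;
    }

    /* sum2 goes in the high byte, sum1 in the low byte */
    return (uint16_t)((sum2 << 8) | sum1);
}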

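As you can see, this does not require much computational power: the loop performs two additions and two modulo operations per byte. Even for data packets that are a kilobyte in size, the checksum can be calculated in a few microseconds on a fast modern processor. On a small microcontroller, expect it to take longer, depending on the clock speed and whether the core has a hardware divider.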
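
If the modulo operations turn out to be the bottleneck, one common trick is to let the sums grow in wider accumulators and reduce them only once per block. The sketch below is a variation of my own naming (fletcher16_deferred), not the function above; it assumes 32-bit accumulators, for which a block of up to 5802 bytes cannot overflow even if every byte is 0xFF.

```c
#include <stddef.h>
#include <stdint.h>

/* Largest block for which sum2 still fits in 32 bits when every byte
 * is 0xFF and both sums start below 255. */
#define FLETCHER16_BLOCK 5802u

/* Fletcher16 with the modulo deferred to once per block.
 * Illustrative name; produces the same result as reducing on every byte. */
uint16_t fletcher16_deferred(const uint8_t *data, size_t length)
{
    uint32_t sum1 = 0;
    uint32_t sum2 = 0;

    while (length > 0) {
        size_t block = (length > FLETCHER16_BLOCK) ? FLETCHER16_BLOCK : length;
        length -= block;

        for (size_t i = 0; i < block; ++i) {
            sum1 += *data++;   /* no modulo inside the hot loop */
            sum2 += sum1;
        }
        sum1 %= 255u;          /* reduce once per block */
        sum2 %= 255u;
    }
    return (uint16_t)((sum2 << 8) | sum1);
}
```

A few known values make a quick self-test: the ASCII string "abcde" gives 0xC8F0 and "abcdef" gives 0x2057. The order sensitivity is also easy to see: "ab" gives 0x25C3 while "ba" gives 0x26C3, whereas a simple sum would return the same value for both.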

Conclusion

When choosing between a checksum, a CRC, and a hash, select the error detection method that fits the context of your embedded system. All three play valuable roles in ensuring data integrity, but each offers a different level of protection and a different performance trade-off.

For most embedded systems, where performance and memory are constrained, a checksum like Fletcher16 or a CRC such as CRC-32 provides an efficient balance between computational cost and error detection. These methods can safeguard data transmission without introducing significant overhead, making them well suited to real-time systems where timely responses matter.

That said, as the complexity of embedded systems and communication protocols increases, especially in security-sensitive applications, it’s important to understand the limits of these mechanisms. In scenarios where tampering is a concern, integrating cryptographic hashes may be necessary to ensure data integrity and authenticity.

Ultimately, understanding the differences between these techniques and how to implement them effectively is key to building robust, reliable, and secure communication protocols in embedded systems.