Building Endian-Independent Code With C++
In most computers, each byte has its own unique address. Because the address refers to all eight bits of the byte, it's not useful to talk about the order of the bits within the byte. We can discuss the significance of the bits, but not their order.
Since the bytes of multi-byte numbers are individually addressable, we do need to consider the order of their bytes in memory. Two reasonable possibilities exist: big-endian and little-endian byte order.
Big-endian machines store the most-significant byte of a multi-byte number at the lowest-numbered address, with the other bytes following in decreasing order of significance. Little-endian machines store multi-byte numbers in the opposite order: from least-significant to most-significant.
Within a single computer, the byte order doesn't matter because the processor does all its memory loads and stores in the correct order, whichever that may be. The problem arises when we need to move data between machines with different endianness.
We could convert all multi-byte numbers to ASCII representations and transmit those, but that would require extra bandwidth and would complicate generating and parsing the messages. To avoid these costs, most low-level protocols use fixed-size binary fields. If these fields are larger than one byte then their endianness becomes an issue.
This article shows how to use C++ to deal with such mixed-endian systems in a simple and reliable way.
A real-world example
Consider the Internet Protocol. Each Internet datagram begins with an IP header, which consists of a series of fixed-size fields:
| Field Name | Size in bits |
|---|---|
| Version/Header Length | 8 |
| Type of Service | 8 |
| Total Length | 16 |
| Identification | 16 |
| Flags/Fragment Offset | 16 |
| Time to live | 8 |
| Protocol | 8 |
| Header Checksum | 16 |
| Source Address | 32 |
| Destination Address | 32 |
(The header can be extended with option fields, but we'll ignore that for now.)
The fields are transmitted in the order shown. For single-byte fields, the bit order isn't visible outside the hardware since the processor transfers bytes to and from the hardware as complete bytes, just as it does with memory bytes. In fact, the transmitted bit order can differ for various physical media, but this is not visible to the software.
But what about the multi-byte fields? Here we need an agreement on the byte order, or else some systems will interpret the multi-byte fields in the wrong order. For the Internet protocols, we transmit all multi-byte fields most-significant byte first, or in big-endian order. For example, we transmit the source address field as:
| Partial Field | Size in bits |
|---|---|
| Source Address, MSB | 8 |
| Source Address, 2SB | 8 |
| Source Address, 3SB | 8 |
| Source Address, LSB | 8 |
Representing fixed-format data in C/C++
When building a networked system, we'd like to use a single code base to describe the communications packet structures. A common code base avoids nasty copy-paste errors during development and makes maintenance easier. If the system includes multiple CPU types, however, we'll need to make the code base architecture-independent, which includes endian-independence.
In C or C++, fixed-format structures like those in the Internet protocols
are best described with structs. As a first try, we might define the IP
datagram header as:
struct IpHeader { char ver; // version and header length char tos; // type of service short len; // total length of datagram short id; // datagram identification short flags; // flags and fragment offset char ttl; // time to live char proto; // protocol of payload short sum; // header checksum int src; // source address int dst; // destination address };
This looks plausible, but there a few problems:
- while
chars are always 8 bits nowadays,shorts andint's can vary in size on different architectures - we should use unsigned types, since in this case the semantics of the fields are unsigned
- we haven't specified the byte order of the multi-byte fields.
We can easily solve the first two problems using fixed-size types from
stdint.h:
struct IpHeader { uint8_t ver; // version and header length uint8_t tos; // type of service uint16_t len; // total length of datagram uint16_t id; // datagram identification uint16_t flags; // flags and fragment offset uint8_t ttl; // time to live uint8_t proto; // protocol of payload uint16_t sum; // header checksum uint32_t src; // source address uint32_t dst; // destination address };
The stdint.h header comes from C99; it guarantees that the uintN_t types
will be N bits wide on all machines. If used, the "u" prefix indicates an
unsigned type.
The byte-order problem is harder; the rest of this article addresses it.
The traditional C approach to endianness
In our example of the IP header, big-endian machines will see the multi-byte fields in the order they expect, since that's the order they appear in the actual datagrams on the wire. Little-endian machines, however, will need the multi-byte fields byte-swapped before they can be used.
Take the len field for example. A datagram length of 64 bytes shows up in
the datagram as:
| Partial Field | Contents |
|---|---|
| Total Length, MSB | 0x00 |
| Total Length, LSB | 0x40 |
When a big-endian machine reads this two-byte field, it sees 0x0040, the correct number. A little-endian machine loading this field into a register sees 0x4000, which is incorrect. The little-endian machine needs to byte-swap the number before use.
The POSIX standard provides the functions htonl(), htons(), ntohl(),
and ntohs(), where the h means "host", the n means "network", and l
and s represent "long" (32-bits) and "short" (16-bits) respectively.
These routines perform the indicated transformation on all architectures,
whether or not this requires a byte swap. For example, htonl() might be
defined as:
#if BYTE_ORDER == BIG_ENDIAN uint32_t htonl(uint32_t x) { return x; } #else // LITTLE_ENDIAN uint32_t htonl(uint32_t x) { return x << 24 & 0xFF000000 | x << 8 & 0x00FF0000 | x >> 8 & 0x0000FF00 | x >> 24 & 0x000000FF; } #endif
(You can find BYTE_ORDER in <endian.h> on many systems. If not, it's
easy enough to set up something similar yourself.)
As an example, let's say we need to check whether a datagram's source address is in a "class-A" address block. Class-A addresses have their highest bit clear:
bool isSourceClassA(IpHeader* hdr) { return ntohl(hdr->src) & 0x8000000 == 0; }
In this (admittedly trivial) example, passing hdr->src to ntohl() before
use ensures we're testing the proper address bit, regardless of the
endianness of the processor. Note that if we want to modify a header
field, we need an extra step: we'll need to call ntohl(), modify the
value, then finally call htonl() before storing the result.
Problems with the traditional approach
The POSIX byte-swapping functions are fine for big-endian Internet datagrams, but what do we do for a schizophrenic protocol like ISO 11783? This protocol has big-endian headers but little-endian data fields. There's also the occasional 24-bit field we have to deal with as well.
We could start to define more byte-swappers, like these:
uint64_t htole64(uint64_t); uint32_t htole32(uint32_t); uint16_t htole16(uint16_t); . . .
and so on for about nine more variants, not including the 24-bit cases, of
course. Oh, and if we want fixed-endian int's, long's, float's or
double's, we'll need to cast to and from the unsigned integral types
provided by these new byte-swappers. We'd like to eliminate this sort of
casting if we can, in the interest of type-safety, but that would take many
more variants.
Of course, all of these are actually just aliases for either no-op's (where
swapping isn't required) or for one of a small number of byte-swap routines:
swap16(), swap24(), swap32(), and so forth.
But aside from this explosion of routines, there's also the maintenance problem. Every time we work on code which touches a fixed-endian field (either reading or writing it), we must manually remember to use the proper byte-swappers. If we miss just one, the program will be incorrect, but we may not notice until that particular part of the program runs on a machine with an endianness opposite that of the network.
Of course even with these drawbacks, there's been plenty of good network code written over the years. But if we use C++ intelligently, we can do better.
What we'd like to see
Before we move to C++, let's identify some design goals:
- Build a solution for every reasonable data size, optimizing the most common cases (16, 32, 64 bits) if possible.
- Write a single piece of code to handle the whole thing -- avoid error-prone copy-paste coding.
- Avoid the need to remember to use the swapper routines every time we touch a fixed-endian structure.
- Maintain type safety -- avoid the need to explicitly cast to and from the network structure fields.
The last two requirements bear closer examination.
The real problem with the traditional approach is the implicit nature of the data type and endianness of the fields. In other words, the type and endianness of the fields must be held in the programmer's mind instead of being made explicit in the field declarations.
Wouldn't it be better if we could simply declare a packet field as, say, a big-endian 16-bit integer and have the compiler sort out the required byte swapping and type conversions automatically?
To achieve this goal, we need to do two things: automate the byte-swapping and generalize the solution over any desired data type.
Automatic byte swapping
As a concrete example, let's build a BigEndianInt16 data type. This
16-bit integral type will be stored in big-endian order on all machines,
regardless of the machine's endianness. When we access it, however, we'll
automatically do any necessary byte swapping both after reading it and
before writing it. Of course we need only byte-swap on little-endian
machines, but we'll try to hide that decision from the application code.
To demonstrate BigEndianInt16, let's consider a fictitious protocol whose
header includes a 16-bit field to record the number of machines a packet has
passed through. Each machine which handles a packet will increment this hop
count.
class Header { // ... BigEndianInt16 hopCount; // number of hops void updateHopCount(); // add a hop to the count // ... };
When we receive a packet, it will come to us in a buffer pointed to by a
void*. We'll then "map" the header structure onto the buffer so we can
manipulate the header fields:
void receive(void* buf) { Header* hdr = static_cast<Header*>(buf); hdr->updateHopCounter(); // ... }
Using static_cast<>, we can tell the compiler that's it's safe to assume
that buf points to a Header. (We presumably know this since we're
familiar with the code that calls receive().) Once we have a Header, we
can call updateHopCount() on it.
For updateHopCount(), we'd like to write something such as the following
and have it do the right thing, regardless of the endianness of the machine
running the code:
void Header::updateHopCount() { hopCount++; }
BigEndianInt16 will be a class with two versions: one for big-endian
machines and another for little-endian machines. Each version will have the
same signature, by which we mean the same memory layout and public member
functions. Having the same signature, they'll be interchangeable -- we'll
choose the correct version for the target machine based on its endianness:
#if BYTE_ORDER == BIG_ENDIAN class BigEndianInt16 { }; // no byte-swapping needed #else class BigEndianInt16 { }; // byte-swapping on read and write #endif
The version for big-endian machines looks like this:
#if BYTE_ORDER == BIG_ENDIAN class BigEndianInt16 // no byte-swapping needed { int16_t rep; // in big-endian order public: BigEndianInt16() { } BigEndianInt16(int16_t i) : rep(i) { } operator int16_t() const { return rep; } }; #else class BigEndianInt16 { }; // byte-swapping on read and write #endif
The rep field holds the actual contents of the BigEndianInt16 object, in
big-endian byte order. The default constructor (used when mapping onto a
packet from the network) doesn't change rep's contents; it just allows us
to interpret the existing contents of rep as a BigEndianInt16.
The second constructor is special: it's a constructor with a single argument
of a type other than the class type. It's called a conversion constructor
because by creating a BigEndianInt16 from an int16_t, it, in at least
some sense, converts an int16_t to a BigEndianInt16. Since on a
big-endian machine the byte-order is already correct, this particular
conversion constructor just stores its argument in the rep field.
The operator int16_t() conversion operator does the opposite of the
conversion constructor; it converts a BigEndianInt16 to an int16_t.
Note that we don't specify the return value of a conversion operator; it's
assumed to be the type we're converting to. Again, no byte swap is needed.
While we're here, a mention of const is in order. The const on the
conversion operator is a promise that the operator won't modify the state of
the BigEndianInt16 object. In other words, a const member function is a
read-only function. The compiler can check our code against this promise
and can sometimes use it to generate better code.
Now version for little-endian machines:
#if BYTE_ORDER == BIG_ENDIAN class BigEndianInt16 { }; // as shown above #else class BigEndianInt16 // byte-swapping on read and write { int16_t rep; // in big-endian order public: BigEndianInt16() { } BigEndianInt16(int16_t i) : rep(swapInt16(i)) { } operator int16_t() const { return swapInt16(rep); } }; #endif
Since this code will run on a little-endian processor, the argument to the
conversion constructor will be in little-endian byte order. We therefore
need to call swapInt16() to swap the bytes of the argument before storing
them in memory in big-endian byte order. Likewise, the return value of
operator int16_t() must be in little-endian order, so we swap the bytes of
rep before returning them.
The function swapInt16() would look something like this:
int16_t swapInt16(int16_t x) { return x << 8 & 0xFF00 | x >> 8 & 0x00FF; }
Now we're ready to try our example. Unfortunately, we can't use exactly the syntax we wanted, but we can get close. Recall that we wanted to write:
void Header::updateHopCount() { hopCount++; }
We can't use the post-increment operator because the compiler doesn't know
what "++" means when applied to an BigEndianInt16. We can, however, add
one to the hopCount and store it back:
void Header::updateHopCount() { hopCount = hopCount + 1; }
The compiler deals with this code by taking the following steps:
The compiler first observes that the addition operation involves an
int16_t(1) and something else (aBigEndianInt16). The compiler knows what addition means for anint16_t, so it looks for a user-defined conversion fromBigEndianInt16toint16_t. It applies the conversion operatorBigEndianInt16::operator int16_t()tohopCountto obtain anint16_t.The compiler now generates code to add the two integers.
The compiler then observes that the assignment operator requires assigning an
int16_t(the sum) to aBigEndianInt16. It doesn't know what that means, but it does know how to convert anint16_tto aBigEndianInt16by calling the conversion constructor.Finally, the compiler performs the default assignment of class objects, which is a blind byte-by-byte copy of the contents. Since at this point we have two
BigEndianInt16's, that's exactly what we want.
Notice that the conversion operator is called when reading the original
contents of hopCount (swapping the bytes if necessary), and that the
conversion constructor is called before writing the new value to hopCount,
again swapping the bytes if needed. These automatic conversions to and from
user-defined class types are a standard feature of C++.
The byte swaps are now automatic. So long as we declare hopCount as a
BigEndianInt16, it will be stored in memory in big-endian order, but it
will be operated on in the proper host-endian order, regardless of the
endianness of the host system.
Generalizing the field type
We can use the C++ template facility to build a set of data types with
common characteristics. For example, rather than defining BigEndianInt16
or BigEndianDouble, we can generalize to a BigEndian template with a
compile-time argument of int16_t or double:
So instead of many separate types such as
BigEndianInt16 int16Field; BigEndianDouble doubleField;
we can define a BigEndian template and instantiate it for any desired
type:
BigEndian<int16_t> int16Field; BigEndian<double> doubleField;
What's the advantage? Simply that using a template we can write the code
for all big-endian data types once, instead of doing a copy-paste-edit every
time we need a new BigEndian type. With a template, if we need a new
big-endian type we can just create it on the fly.
This time rather than write two BigEndian template classes (one each for
big-endian and little-endian systems), let's write a single template class
and push the optional byte-swapping as far down inside as possible.
The syntax for the declaration of the class template looks like this:
template <typename T> class BigEndian { // details of class, in terms of type "T" };
So what's inside those brackets? First, we need storage for the object itself:
template <typename T> class BigEndian { T rep; // more details, in terms of type "T"; see below };
Whatever type T is, rep will be of that type -- the compiler will
determine the type from the template argument when we instantiate the
template.
Next we'll need a routine to swap the bytes in a T object. Here we'll use
a template function, so the compiler can automatically create the many
variations we need.
Since T can be of any size, we'll use sizeof(T) to find out how many
bytes to swap. Also, since the big-to-little and little-to-big swaps
actually use identical code, we can write a single routine to do either.
We'll add a boolean template argument to indicate which endianness we want,
setting wantBig true if we want big-endian byte order in memory:
template <typename T> T swap(const T& arg, bool wantBig) { #if (BYTE_ORDER == BIG_ENDIAN) == wantBig return arg; // no byte-swapping needed #else // swap bytes T ret; char* dst = reinterpret_cast<char*>(&ret); const char* src = reinterpret_cast<const char*>(&arg + 1); for (size_t i = 0; i < sizeof(T); i++) *dst++ = *--src; return ret; #endif }
If the desired endianness matches the system endianness, we just return the argument without swapping. In the mismatched-endianness case, we'll return the argument with its bytes swapped.
Since a T can be any number of bytes in size, shifts and masks won't work
-- instead we'll treat our T's as character arrays. We can't, however,
simply assign the address of a T to a character pointer; we need a cast.
We'll use reinterpret_cast, which completely turns off type-checking for
the pointer assignment. Now reinterpret_cast should be used only rarely,
but this is one case where it's helpful.
Once we have the pointers and size, we simply copy the bytes in reverse
order from the source to the destination then return the result. The
iteration may appear slow, but a good optimizing compiler (such as gcc)
will unroll the loop into straight-line code, at least for reasonable values
of sizeof(T). Writing the code as a loop has the advantage of being
correct for any T. (If necessary for some other compiler, we can unroll
the loop explicitly for the common cases after first ensuring the general
version works properly.)
Now we can complete the BigEndian template declaration and the
complementary LittleEndian template:
template <typename T> class BigEndian { T rep; public: BigEndian() { } BigEndian(const T& t) : rep(swap(t, true)) { } operator T() const { return swap(rep, true); } }; template <typename T> class LittleEndian { T rep; public: LittleEndian() { } LittleEndian(const T& t) : rep(swap(t, false)) { } operator T() const { return swap(rep, false); } };
It really is that simple. Note the similarity to the BigEndianInt16 class
we defined above. Instead of int16_t, we now have the placeholder T;
the rest of the code is similar, except that both the big-endian and
little-endian templates use swap(), passing in the desired endianness.
Refinements
We could stop here, but there are still a few things to improve.
Extract a base class
First, we can reduce global namespace pollution by making swap() a private
member of the template classes. We could do this by giving both
BigEndian and LittleEndian their own copies of swap(), but we'd like
to avoid such "copy-paste" coding if we can.
Instead, let's factor out a base class template, FixedEndian. While we're
at it, let's pull in everything else possible from the derived classes to
reduce redundancy.
template <typename T, bool wantBig> class FixedEndian { T rep; static T swap(const T& arg) { // Exactly as before. } public: FixedEndian(); FixedEndian(const T& t) : rep(swap(t)) { } operator T() const { return swap(rep); } };
Now we can rewrite BigEndian and LittleEndian in terms of FixedEndian:
template <typename T> class BigEndian : public FixedEndian<T, true> { public: BigEndian(const T& t) : FixedEndian<T, true>(t) { } }; template <typename T> class LittleEndian : public FixedEndian<T, false> { public: LittleEndian(const T& t) : FixedEndian<T, false>(t) { } };
Notice that we can use a template class as a base class. (In this case the
derived class is also a template class, but that's by no means required.)
The constructor syntax is a bit involved, but if you remember that the base
class's full name is FixedEndian<typename T, bool>, it should become
clear.
Notice also that we've moved swap()'s bool wantBig argument to the
template argument list since otherwise operator T() in the base class
wouldn't know the desired endianness. Although not as common as type
arguments, integral template arguments like wantBig are completely legal,
and are handy in situations like these. A bool template argument like
wantBig reduces to a compile-time switch in the much same way #ifdef
does.
Eliminate dependency on BYTE_ORDER
Using a symbol like BYTE_ORDER makes our code dependent on the system
headers, which can vary among different development environments. Though
such dependencies are sometimes necessary, we can avoid BYTE_ORDER with a
few lines of C++ code.
Consider:
union HostEndianness { int i; char c[sizeof(int)]; };
Recall that unlike a struct's fields, a union's fields all begin at the
same memory address, so that what we store in one field we can read back
from another field.
Let's store 1 into the i field of a HostEndianness union. On a
big-endian machine, the four bytes of the c array field will be {0, 0, 0,
1}, but on a little-endian machine, the bytes will be {1, 0, 0, 0}. To
determine the machine's endianness, we simply look at the first character in
the array -- on big-endian machines it will hold 0; on little-endian
machines it will hold 1.
In C++, we can encapsulate all of this logic into member functions of the
union:
union HostEndianness { int i; char c[sizeof(int)]; HostEndianness() : i(1) { } bool isBig() const { return c[0] == 0; } bool isLittle() const { return c[0] != 0; } };
Now we can create a HostEndianness object (which sets i to 1) and invoke
its isBig() method to determine the endianness of the processor. Since we
only need to run a single method on the object, we don't even need to name
the object -- we can just invoke the constructor and call the method on its
output.
FixedEndian::swap() now looks like this:
static T swap(const T& arg) { if (HostEndianness().isBig() == wantBig) return arg; else { T ret; char* dst = reinterpret_cast<char*>(&ret); const char* src = reinterpret_cast<const char*>(&arg + 1); for (size_t i = 0; i < sizeof(T); i++) *dst++ = *--src; return ret; } }
While this may look as if it generates a lot of code, it needn't. A decent
optimizing compiler will completely eliminate the HostEndianness object
and the HostEndianness().isBig() == wantBig expression, leaving only the
boolean result to guide the code generation. Since it knows this boolean at
compile-time, the compiler will generate either a direct reference to arg
or the code to byte-swap arg before use. The result is strong
encapsulation and great generality in the source code with no added cost at
run-time.
The full source code is available here. If you've found this article useful, we'd like to hear from you. Please leave a comment or email the author.