
Endianness: Big vs Little (with Bus Diagrams)

Denny
Figure: A multi-byte word laid out in memory two ways, big-endian above little-endian, with an address line going right.

TL;DR: Endianness is the convention for how a multi-byte value is laid out in memory. Big-endian stores the most-significant byte at the lowest address (network protocols, PowerPC); little-endian stores the least-significant byte at the lowest address (x86, RISC-V default). The choice is invisible until you read or write binary across machines, at which point it becomes everything.

A 32-bit integer takes 4 bytes. Memory is addressed one byte at a time. So when you write a 32-bit value to memory, which byte goes at the lowest address? That choice is endianness. It is one of the few decisions in computer architecture where two reasonable answers persist side by side, and where getting it wrong silently produces wrong numbers.

What “Endianness” Means

Take the 32-bit value 0x12345678. It has four bytes:

  • 0x12: most-significant byte (MSB)
  • 0x34
  • 0x56
  • 0x78: least-significant byte (LSB)

To store this in memory at address A, you have to put four bytes at addresses A, A+1, A+2, A+3. The two main options are:

Big-endian. Store the MSB first.

Address | A    | A+1  | A+2  | A+3
Byte    | 0x12 | 0x34 | 0x56 | 0x78

Little-endian. Store the LSB first.

Address | A    | A+1  | A+2  | A+3
Byte    | 0x78 | 0x56 | 0x34 | 0x12

The same 32-bit value, the same four bytes, completely different memory layouts. If a big-endian machine stores 0x12345678 and a little-endian machine later loads those same four bytes from A..A+3 (say, out of a file or a network buffer) as a 32-bit integer, it reconstructs 0x78563412. Same bytes, wrong number.
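Both layouts can be produced explicitly with shifts, which makes the result independent of the byte order of the machine running the code. A minimal sketch; the helper names store_be32 and store_le32 are hypothetical, not from any library:

```c
#include <stdint.h>

/* Hypothetical helper: write v into buf[0..3] big-endian.
   Shifts act on the numeric value, so the host's byte order is irrelevant. */
void store_be32(uint8_t buf[4], uint32_t v) {
    buf[0] = (uint8_t)(v >> 24);  /* MSB at the lowest address */
    buf[1] = (uint8_t)(v >> 16);
    buf[2] = (uint8_t)(v >> 8);
    buf[3] = (uint8_t)v;          /* LSB at the highest address */
}

/* Hypothetical helper: write v into buf[0..3] little-endian. */
void store_le32(uint8_t buf[4], uint32_t v) {
    buf[0] = (uint8_t)v;          /* LSB at the lowest address */
    buf[1] = (uint8_t)(v >> 8);
    buf[2] = (uint8_t)(v >> 16);
    buf[3] = (uint8_t)(v >> 24);  /* MSB at the highest address */
}
```

Called with 0x12345678, store_be32 fills the buffer with 12 34 56 78 and store_le32 with 78 56 34 12, on any host.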

The names come from a 1980 essay by Danny Cohen, who borrowed from Gulliver’s Travels: the Lilliputians fought a war over which end of a boiled egg to crack first. The point of the analogy was that the choice is arbitrary, and yet groups will commit to it firmly enough to start a fight.

A Bus Diagram of the Read

The endianness convention is enforced by the CPU's load/store unit. When the CPU executes a 32-bit load from address A, it asserts A on the address bus and takes 4 bytes from the data bus.

   CPU                         Memory
+--------+   address=A     +---------+
|  load  | ───────────────>|  byte 0 | A    : 0x12 (big) or 0x78 (little)
| 32-bit |                 |  byte 1 | A+1  : 0x34 (big) or 0x56 (little)
|        |   data[31:0]    |  byte 2 | A+2  : 0x56 (big) or 0x34 (little)
|        | <───────────────|  byte 3 | A+3  : 0x78 (big) or 0x12 (little)
+--------+                 +---------+

On a big-endian CPU, byte 0 (at the lowest address) is routed to the most-significant lane of the 32-bit register (bits 31:24), and byte 3 lands in the least-significant lane (bits 7:0). On a little-endian CPU, byte 0 lands in the least-significant lane and byte 3 in the most-significant.

The REGISTER and RAM component references describe the storage primitives that participate in this transfer. Endianness is purely a wiring convention between them: which byte of the wide register connects to which byte address of memory.

On a 32-bit data bus, all four bytes are presented simultaneously; on a narrower 8-bit bus, the same bytes arrive sequentially over four cycles. The endianness convention applies identically in both cases.
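The lane routing can be mimicked in software. The sketch below models a hypothetical byte-wide bus that delivers one byte per cycle, lowest address first; the only difference between the two conventions is which register lane byte i is routed to.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assemble a 32-bit register value from four bytes that arrive one per
   bus cycle, lowest address first (a hypothetical 8-bit bus model). */
uint32_t load32_bytewide(const uint8_t mem[4], bool big_endian) {
    uint32_t reg = 0;
    for (int i = 0; i < 4; i++) {                /* cycle i fetches address A+i */
        unsigned lane = big_endian ? 3u - (unsigned)i : (unsigned)i;
        reg |= (uint32_t)mem[i] << (8u * lane);  /* route the byte to its lane */
    }
    return reg;
}
```

With memory holding 12 34 56 78, the big-endian setting yields 0x12345678 and the little-endian setting yields 0x78563412.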

Worked Example: 0x12345678 in C

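Consider the following C fragment (the statements belong inside a function such as main()):

#include <stdint.h>
#include <stdio.h>

uint32_t x = 0x12345678;
uint8_t  *p = (uint8_t *)&x;   /* view x's four bytes in address order */
printf("%02x %02x %02x %02x\n", p[0], p[1], p[2], p[3]);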

On a big-endian machine: 12 34 56 78. On a little-endian machine: 78 56 34 12. The pointer p steps through memory byte by byte, so the output is exactly the in-memory layout, which differs by endianness.

This is the standard endianness probe. It is also, incidentally, a common bug source: code that uses pointer-cast tricks to “extract” individual bytes from an integer is endianness-dependent and only works on one host without explicit byte swapping.
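A host-independent way to extract bytes is to shift and mask the integer's value instead of reinterpreting its storage. A minimal sketch (byte_of is a hypothetical helper name):

```c
#include <stdint.h>

/* Return byte n of v, where n = 0 is the least-significant byte.
   n must be 0..3 (shifting a 32-bit value by 32 is undefined).
   Works on any host because it never looks at the memory layout. */
uint8_t byte_of(uint32_t v, unsigned n) {
    return (uint8_t)(v >> (8u * n));
}
```

byte_of(0x12345678, 3) returns 0x12 on every host, whereas p[3] in the probe above depends on the host's byte order.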

Which Architectures Pick Which?

Architecture | Default endianness
x86, x86-64 | Little-endian
RISC-V | Little-endian (big-endian variant exists)
ARM | Bi-endian, configurable; little-endian is overwhelmingly common
PowerPC | Big-endian historically; later cores are bi-endian
MIPS | Bi-endian; defaults vary by SoC
SPARC | Big-endian (V9 is bi-endian)
Motorola 68k | Big-endian
z/Architecture (IBM mainframe) | Big-endian
Network byte order (TCP/IP, IPv4, IPv6 headers) | Big-endian

The bias today is firmly toward little-endian: x86 and ARM (which almost always runs little-endian) together account for the overwhelming majority of CPUs in service. Network protocols stay big-endian for legacy reasons: the TCP/IP byte order was fixed when the protocols were specified in the late 1970s and early 1980s (RFC 791 dates from 1981), and changing it now would break every existing implementation.

Why Two Camps Persist

There are real advantages on each side, none decisive.

The Case for Big-Endian

Hex dumps read the way humans write numbers. 0x12 is the first byte you see in memory, and it is also the first pair of digits you would write on paper. A hex dump of a big-endian struct is directly legible.

Network byte order. The Internet protocols are big-endian by specification (RFC 791, RFC 1700). For IP, TCP, UDP and the protocols built on them, big-endian is the established wire convention, which is why it is also called network byte order.

Sign comparison is direct. The sign bit of a two’s complement integer is in the most-significant byte; a big-endian machine sees it at the lowest address, so a sign-extending load only has to look at the first byte fetched.
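The sign test described here can be sketched on a raw 4-byte buffer. The function name is hypothetical, and the buffer is assumed to hold a 32-bit two's complement integer in the stated byte order:

```c
#include <stdbool.h>
#include <stdint.h>

/* The sign bit is the top bit of the most-significant byte, which sits
   at offset 0 in a big-endian buffer and at offset 3 in a little-endian one. */
bool is_negative(const uint8_t buf[4], bool big_endian) {
    uint8_t msb = big_endian ? buf[0] : buf[3];
    return (msb & 0x80u) != 0;
}
```

In the big-endian case the answer depends only on the first byte fetched, which is the property the paragraph describes.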

The Case for Little-Endian

LSB at the lowest address. The byte at address A is the least-significant byte of the integer stored at A, so a 32-bit pointer and an 8-bit pointer to the same address both point at the least-significant byte. Casting int* to char* (or to short*) does not move the address, and reading the low 8 or 16 bits of a 32-bit value in memory needs no address adjustment; on a big-endian machine the same narrow read would have to add an offset of 3 (or 2).

Arithmetic carry order. Multi-precision addition starts from the least-significant byte and ripples carries upward, toward higher addresses. A little-endian machine reads operands and writes results in the natural memory traversal order. This was a real performance argument on early processors with narrow data paths, and Intel's 8086 (1978) kept the little-endian order of its 8-bit predecessors.
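The carry order shows up clearly in a short multi-precision adder over little-endian byte arrays. This is a sketch rather than an optimised bignum routine, and add_le is a hypothetical name:

```c
#include <stddef.h>
#include <stdint.h>

/* Add two n-byte unsigned integers stored little-endian (byte 0 is the
   least-significant). The loop walks addresses upward, carrying into the
   next byte. Returns the final carry out (0 or 1). */
unsigned add_le(uint8_t *sum, const uint8_t *a, const uint8_t *b, size_t n) {
    unsigned carry = 0;
    for (size_t i = 0; i < n; i++) {           /* lowest address first */
        unsigned t = (unsigned)a[i] + b[i] + carry;
        sum[i] = (uint8_t)t;                   /* keep the low 8 bits */
        carry = t >> 8;                        /* carry into the next byte */
    }
    return carry;
}
```

Adding 1 to 0x0000FFFF (bytes FF FF 00 00) produces bytes 00 00 01 00, that is, 0x00010000, with the carry travelling toward higher addresses.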

Width-agnostic loads. The 16-bit value at address A is the low half of the 32-bit value at A, the 16-bit value at A+2 is its high half, and the 8-bit value at A is its low byte. Narrowing a value in memory to a smaller type never changes the address, so type punning between widths is invisible at the lowest byte. This is part of why x86 software historically tolerated wild casting.
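The narrowing property can be checked on a real host by reading the first one or two bytes stored at a 32-bit value's address. A sketch with hypothetical helper names; memcpy stands in for the pointer casts to stay within C's aliasing rules:

```c
#include <stdint.h>
#include <string.h>

/* Narrow reads at the same address as a 32-bit value. On a little-endian
   host these return the low byte and low half of v; on a big-endian host
   they return the high byte and high half instead. */
uint8_t first_byte(uint32_t v) {
    uint8_t b;
    memcpy(&b, &v, 1);   /* 8-bit read at the address of v */
    return b;
}

uint16_t first_half(uint32_t v) {
    uint16_t h;
    memcpy(&h, &v, 2);   /* 16-bit read at the address of v */
    return h;
}
```

On x86, first_byte(0x12345678) is 0x78 and first_half(0x12345678) is 0x5678; on a big-endian host they would be 0x12 and 0x1234.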

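Neither argument is decisive on technical merit, and big-endian persists wherever protocols and file formats require it. ARM is the architectural compromise: bi-endian, with the operating system choosing the data byte order at boot.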

When Endianness Matters

In day-to-day programming on a single machine, endianness is invisible. The CPU does the byte ordering correctly for its own integer types, the compiler emits the right loads and stores, and the program never has to think about it.

It becomes visible the moment you read or write bytes — not integers — and those bytes have to mean the same thing somewhere else.

Binary file formats. ELF headers, PNG chunks, MP4 boxes, GIF, BMP, ZIP. Some are big-endian (PNG, MP4), some are little-endian (GIF, BMP, ZIP), and some, like ELF, declare their byte order in the header. Reading an ELF header that declares the other byte order without swapping reads back garbage offsets.
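Honoring a format's declared byte order looks like the sketch below. The constants follow the ELF header layout (EI_DATA is byte 5 of e_ident, where 1 means little-endian and 2 means big-endian; e_type is the 16-bit field at offset 16 in both 32- and 64-bit headers). The function name is hypothetical, and the sketch validates nothing else about the header:

```c
#include <stddef.h>
#include <stdint.h>

/* Offsets and codes from the ELF header's e_ident layout. */
enum { EI_DATA = 5, ELFDATA2LSB = 1, ELFDATA2MSB = 2, E_TYPE_OFFSET = 16 };

/* Decode the 16-bit e_type field using the byte order the file declares
   in e_ident[EI_DATA], not the host's. Returns -1 for a short buffer or
   an unknown encoding. */
int elf_read_type(const uint8_t *buf, size_t len) {
    if (len < E_TYPE_OFFSET + 2) return -1;
    const uint8_t *p = buf + E_TYPE_OFFSET;
    switch (buf[EI_DATA]) {
    case ELFDATA2LSB: return p[0] | (p[1] << 8);   /* little-endian file */
    case ELFDATA2MSB: return (p[0] << 8) | p[1];   /* big-endian file */
    default:          return -1;
    }
}
```

The same bytes 03 00 at offset 16 decode to 3 in a little-endian file and to 768 in a big-endian one, which is exactly the "garbage offsets" failure when the declared order is ignored.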

Network protocols. TCP, UDP, IP, ICMP, DNS, NTP — all big-endian. The C library functions htonl, htons, ntohl, ntohs (“host to network long/short” and the inverses) are present specifically to convert between host byte order and network byte order. On a big-endian host these are no-ops; on a little-endian host they swap bytes.

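#include <arpa/inet.h>   // POSIX: htonl, htons, ntohl, ntohs
#include <stdint.h>

void htonl_demo(void) {  // hypothetical function name
    uint32_t host_value = 0x12345678;
    uint32_t net_value  = htonl(host_value);  // 0x78563412 on x86 (little-endian)
                                              // 0x12345678 on big-endian PowerPC
    (void)net_value;                          // silence unused-variable warnings
}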

On both hosts, the bytes of net_value in memory (and on the wire) are 12 34 56 78. What differs is the number each host sees when it reads net_value back as a native integer.
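In practice these functions are usually paired with memcpy to lay out a message in a byte buffer. A minimal sketch for POSIX systems, assuming a hypothetical 6-byte header made of a 16-bit type followed by a 32-bit length:

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Pack the hypothetical header into network byte order. memcpy avoids
   unaligned stores and aliasing problems. */
void pack_header(uint8_t out[6], uint16_t type, uint32_t length) {
    uint16_t t = htons(type);
    uint32_t l = htonl(length);
    memcpy(out, &t, 2);
    memcpy(out + 2, &l, 4);
}

/* Unpack it again into host byte order. */
void unpack_header(const uint8_t in[6], uint16_t *type, uint32_t *length) {
    uint16_t t;
    uint32_t l;
    memcpy(&t, in, 2);
    memcpy(&l, in + 2, 4);
    *type = ntohs(t);
    *length = ntohl(l);
}
```

pack_header(buf, 0x0102, 0x12345678) produces the bytes 01 02 12 34 56 78 on every host, and unpack_header recovers the original values.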

Device protocols. SPI flash, I2C EEPROMs, sensor registers — most pick a byte order in their datasheet and demand the host comply. Many are big-endian; some are mixed.
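Reading such a device usually means assembling the value yourself. A sketch assuming a hypothetical sensor that reports a signed 16-bit reading as two bytes, high byte first (the real order and signedness come from the part's datasheet):

```c
#include <stdint.h>

/* Combine a big-endian byte pair into a signed 16-bit reading, then
   sign-extend without relying on implementation-defined narrowing. */
int32_t sensor_reading(uint8_t hi, uint8_t lo) {
    uint16_t raw = (uint16_t)(((unsigned)hi << 8) | lo);
    return raw >= 0x8000u ? (int32_t)raw - 0x10000 : (int32_t)raw;
}
```

sensor_reading(0xFF, 0x38) returns -200, and sensor_reading(0x01, 0x2C) returns 300.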

Type punning. Casting between integer types of different widths through a pointer or union, expecting bytes at fixed offsets — a pattern that “just works” on one host and silently fails on the other.

Memory-mapped registers in mixed-endian systems. A big-endian CPU controlling a little-endian peripheral (or vice versa) must byte-swap every multi-byte register access, either with explicit swap instructions or with a bus bridge configured to swap in hardware.
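The swap itself is simple. A portable sketch with a hypothetical name; GCC and Clang also provide __builtin_bswap32, which typically compiles to a single instruction on targets that have one:

```c
#include <stdint.h>

/* Reverse the byte order of a 32-bit value using shifts and masks. */
uint32_t swap_bytes32(uint32_t v) {
    return ((v & 0x000000FFu) << 24) |   /* byte 0 -> byte 3 */
           ((v & 0x0000FF00u) << 8)  |   /* byte 1 -> byte 2 */
           ((v & 0x00FF0000u) >> 8)  |   /* byte 2 -> byte 1 */
           ((v & 0xFF000000u) >> 24);    /* byte 3 -> byte 0 */
}
```

swap_bytes32(0x12345678) returns 0x78563412, and applying it twice returns the original value.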

The general rule: as soon as bytes leave the CPU’s exclusive custody, endianness becomes a contract that must be enforced. Inside one process on one host, you can ignore it. Across machines, files, protocols, or peripherals, you cannot.

Mixed-Endian and Middle-Endian

A footnote in the byte-order story: a few historical systems used middle-endian orderings, in which a 32-bit value was stored as two 16-bit halves in big-endian order (high half first), but each half was internally little-endian. The PDP-11 used this layout for some 32-bit data, such as long integers in its C compilers. The result is that 0x12345678 stored as four bytes becomes 0x34, 0x12, 0x78, 0x56, neither pure big nor pure little. Modern systems do not use middle-endian, but it occasionally surfaces in old binary file formats produced by PDP-era tools.
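The PDP-11 ordering can be reproduced with the same shift-based technique. A sketch with a hypothetical function name:

```c
#include <stdint.h>

/* Store a 32-bit value in PDP-11-style middle-endian order:
   the high 16-bit half first, each half stored little-endian. */
void store_pdp32(uint8_t buf[4], uint32_t v) {
    uint16_t high = (uint16_t)(v >> 16);
    uint16_t low  = (uint16_t)v;
    buf[0] = (uint8_t)high;         /* low byte of the high half */
    buf[1] = (uint8_t)(high >> 8);  /* high byte of the high half */
    buf[2] = (uint8_t)low;          /* low byte of the low half */
    buf[3] = (uint8_t)(low >> 8);   /* high byte of the low half */
}
```

For 0x12345678 this writes 34 12 78 56, matching the layout described above.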

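ARM's older "BE-32" big-endian mode was word-invariant: a 32-bit word read the same in either byte order, and big-endian behavior came from remapping byte and halfword addresses within each word. ARMv6 introduced the cleaner, byte-invariant BE-8 mode, which modern big-endian ARM systems use instead.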

Endianness in the Fetch-Decode-Execute Cycle

When a CPU executes an instruction, it has to fetch the instruction word from memory. On a big-endian CPU, the instruction word is laid out top-byte-first; on a little-endian CPU, bottom-byte-first. The decode logic has to match. This is invisible in normal operation — the CPU only ever decodes its own instruction stream, so its own endianness is consistent — but it matters when reading a binary across hosts, e.g., for a debugger or disassembler examining a binary built for a different target.

The full instruction-fetch pipeline is covered in the fetch-decode-execute cycle post; endianness is the last detail that has to align between memory and the instruction decoder.

Why Little-Endian “Feels Wrong” but is Hardware-Friendly

Looking at a hex dump of 0x12345678 on x86 (78 56 34 12) feels backward. The eye reads left to right, the digit pairs appear in reverse order, and the mental conversion is awkward.

But consider what the hardware does. The byte at address A is the byte you most often want to operate on first when doing arithmetic: it is the least-significant byte, where carries originate. A multi-precision adder reads address A, computes a sum and a carry, then moves to address A+1, reads, adds with carry, and continues. Incrementing addresses is the natural mode of most CPUs' address generators. Little-endian aligns the order of memory traversal with the order of arithmetic.

Big-endian has the reverse property: the byte at address A is the most-significant byte, where the sign and the highest place values live, and where arithmetic finishes rather than starts. For carry-propagate arithmetic, big-endian forces you either to start at the highest address and decrement the pointer or to make two separate passes. The wide-bus design of modern CPUs makes this almost free (all four bytes of a 32-bit operand fit on the data bus simultaneously), but on the narrow-bus 8-bit machines where these conventions were established, little-endian was a measurable win.
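For contrast with the upward-walking little-endian case, here is a sketch of the same addition over big-endian byte arrays, starting at the highest address and decrementing (add_be is a hypothetical name):

```c
#include <stddef.h>
#include <stdint.h>

/* Add two n-byte unsigned integers stored big-endian (byte 0 is the
   most-significant). Carries originate in the last byte, so the loop
   starts at the highest address and walks downward. Returns the carry out. */
unsigned add_be(uint8_t *sum, const uint8_t *a, const uint8_t *b, size_t n) {
    unsigned carry = 0;
    for (size_t i = n; i-- > 0; ) {            /* highest address first */
        unsigned t = (unsigned)a[i] + b[i] + carry;
        sum[i] = (uint8_t)t;
        carry = t >> 8;
    }
    return carry;
}
```

Adding 1 to 0x0000FFFF (bytes 00 00 FF FF) yields bytes 00 01 00 00; the result is identical to the little-endian version, only the traversal direction is reversed.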

The aesthetic argument for big-endian is real but secondary. The mechanical argument for little-endian is what produced the modern silicon majority. The same trade-off shaped the ripple-carry adder topology and the byte-order conventions on top of it.

Detecting Endianness at Runtime

A portable byte-order detection in C uses a union or a pointer cast:

#include <stdint.h>

int is_little_endian(void) {
    uint16_t x = 0x0001;
    return *(uint8_t *)&x == 0x01;
}

If the byte at the lower address of x is 0x01, the LSB is at the lowest address, which is little-endian. Otherwise the byte is 0x00 and the host is big-endian.

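A more disciplined alternative is to detect byte order at compile time. GCC and Clang predefine the macros __BYTE_ORDER__, __ORDER_BIG_ENDIAN__ and __ORDER_LITTLE_ENDIAN__; <endian.h> (glibc and others) and <sys/endian.h> (BSD) provide similar macros plus byte-swap helpers; and C23 standardises __STDC_ENDIAN_NATIVE__ in <stdbit.h>. Compile-time detection avoids any runtime overhead and lets the compiler eliminate dead branches.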
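A sketch that combines both approaches: it uses the GCC/Clang predefined macros when they are available and falls back to a runtime probe otherwise. The function name is hypothetical:

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 on a little-endian host, 0 otherwise. */
int host_is_little_endian(void) {
#if defined(__BYTE_ORDER__) && defined(__ORDER_LITTLE_ENDIAN__)
    /* Resolved entirely at compile time on GCC and Clang. */
    return __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__;
#else
    /* Runtime fallback: inspect the byte at the lowest address of x. */
    uint16_t x = 1;
    uint8_t first;
    memcpy(&first, &x, 1);
    return first == 1;
#endif
}
```

With GCC or Clang the #if branch is taken, so the call reduces to a constant the optimiser can fold into any surrounding branch.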

For simulation in DigiSim, you can build a small read-modify-write loop that loads a byte from a low address and places it in the high byte of a register, then verify the result against an expected pattern. It is a hands-on way to feel how the wiring difference affects the value. The "3,500-year journey of binary" post puts this in the longer context of how human-readable digit ordering interacts with machine convenience.

What’s Next

Endianness is one of the byte-level conventions you have to know to read or write binary across machines. The others are the integer encoding itself (two’s complement signed arithmetic) and the floating-point format (IEEE 754). The full instruction execution path that ties bytes back to running code is covered in the upcoming how a microprocessor works — fetch-decode-execute deep dive post.

Open the basic RAM memory system template to see byte-addressed memory in action, then write a 32-bit value across four addresses and read it back with a different byte ordering to feel the wiring difference directly.