Instruction Register and the Decode Stage
TL;DR: The instruction register (IR) holds the binary instruction word fetched from memory. The decode stage splits that word into fields — opcode, register addresses, immediate — and feeds each field to its own decoder so the control unit can generate the right control signals for the execute stage.
The instruction register is the bridge between memory and control. Once the program counter selects an address and memory returns the bits stored there, those bits land in the IR and stay there for the rest of the cycle. Everything that happens next — register reads, ALU operation selection, write-back — depends on which fields of that bit pattern the decoder pulls out.
This post covers what the IR holds, how field extraction works, why fixed-width RISC instructions are easier to decode than x86, and how the IR connects to the control unit. Each part maps to a working circuit in the Sequential Instruction Executor template.
What Does the Instruction Register Hold?
The IR holds the entire binary word that represents one instruction. It is a regular register with one specialized purpose: the decode stage reads its bits, splits them into named groups, and uses each group as an address for a smaller decoder.
For an 8-bit toy CPU, an instruction might look like:
 7 6 5 4 3 2 1 0
┌─┬─┬─┬─┬─┬─┬─┬─┐
│ opcode  │ rd  │
└─────────┴─────┘
  3 bits   5 bits ? (depends on encoding)
For a real 32-bit RISC instruction (RISC-V R-type as the canonical example):
 31        25 24    20 19    15 14   12 11      7 6        0
┌────────────┬────────┬────────┬───────┬─────────┬──────────┐
│   funct7   │  rs2   │  rs1   │ funct3│   rd    │  opcode  │
└────────────┴────────┴────────┴───────┴─────────┴──────────┘
    7 bits     5 bits   5 bits   3 bits  5 bits     7 bits
The IR for that CPU is exactly 32 bits wide. After fetch, every one of those bits is available in parallel — the decoder simply taps wires IR[6:0], IR[11:7], and so on.
DigiSim’s INSTRUCTION_REGISTER component is the latch sitting between memory and the decode logic in the simulator’s CPU template.
How Does the IR Get Loaded?
The IR is loaded from the memory data register (MDR) — the latch that captures the value memory returns on the data bus. The sequence each cycle is:
- The PC drives an address onto the address bus.
- Memory reads that address and places the value on the data bus.
- The MDR latches the data bus.
- On the next clock edge, the IR latches the MDR’s value.
That fourth step is just the IR’s clock-enable being pulled high while every other write-enable in the CPU is held low. Once the IR holds the word, the rest of the cycle reads from it, never from the MDR or the data bus directly. This is what gives the decode stage a stable input even if memory’s data bus changes during execute.
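The four-step handoff can be modelled in a few lines of Python. This is a minimal sketch, not DigiSim's implementation: the `FetchPath` class, its attribute names, and the memory contents are all assumptions made for illustration. It shows that once the IR has latched, later activity on the data bus does not disturb it.

```python
from dataclasses import dataclass

@dataclass
class FetchPath:
    """Toy model of the PC -> memory -> MDR -> IR handoff (illustrative names)."""
    memory: list
    pc: int = 0
    data_bus: int = 0
    mdr: int = 0
    ir: int = 0

    def fetch(self) -> None:
        # Steps 1-2: the PC addresses memory, which drives the data bus.
        self.data_bus = self.memory[self.pc]
        # Step 3: the MDR latches the data bus.
        self.mdr = self.data_bus
        # Step 4: on the next clock edge the IR latches the MDR.
        self.ir = self.mdr

cpu = FetchPath(memory=[0x2A, 0x7F])  # made-up instruction words
cpu.fetch()
cpu.data_bus = 0xFF  # the bus changes later in the cycle
print(hex(cpu.ir))   # the IR still holds the fetched word
```

The decode logic in this model would read only `cpu.ir`, which is why a change on `data_bus` after the fetch has no effect on what gets decoded.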
The full memory side of this handshake is covered in the Fetch-Decode-Execute case study.
Field Extraction: Wires, Not Logic
Splitting the instruction into fields is the easiest part of CPU design: it is just naming groups of wires. There is no arithmetic, no clocking, no logic gate. A 32-bit IR has 32 output wires. The decoder simply renames a contiguous slice of them.
For the RISC-V R-type encoding above:
opcode = IR[6:0] // 7 wires
rd = IR[11:7] // 5 wires
funct3 = IR[14:12] // 3 wires
rs1 = IR[19:15] // 5 wires
rs2 = IR[24:20] // 5 wires
funct7 = IR[31:25] // 7 wires
In a hardware description language this becomes a one-liner. In an actual schematic — like the kind drawn in DigiSim — it is just bundles of wires routed to different consumers.
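In software the same slicing is done with shifts and masks. The sketch below is a Python stand-in for the wire bundles, assuming the R-type layout shown earlier; the function name `extract_r_type` is made up for this example.

```python
def extract_r_type(ir: int) -> dict:
    """Slice a 32-bit RISC-V R-type word into its named fields."""
    return {
        "opcode": ir & 0x7F,          # IR[6:0]
        "rd":     (ir >> 7) & 0x1F,   # IR[11:7]
        "funct3": (ir >> 12) & 0x07,  # IR[14:12]
        "rs1":    (ir >> 15) & 0x1F,  # IR[19:15]
        "rs2":    (ir >> 20) & 0x1F,  # IR[24:20]
        "funct7": (ir >> 25) & 0x7F,  # IR[31:25]
    }

# 0x002081B3 encodes `add x3, x1, x2`
fields = extract_r_type(0x002081B3)
print(fields)
```

Each shift-and-mask pair corresponds to one bundle of wires; nothing is computed beyond selecting which bits to look at.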
The Three Consumers
Each field flows to a different part of the CPU:
- opcode + funct3 + funct7 flow to the control unit. They select which signals fire (RegWrite, ALUSrc, MemRead, Branch) and which ALU operation runs.
- rs1, rs2 flow to the register file as read-port addresses. Two general-purpose registers come out, one per port.
- rd flows to the register file as the write-port address.
- immediate fields (in I-type, S-type, B-type, U-type, J-type RISC-V encodings) flow through a sign-extender into the ALU’s second input.
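To make the control-unit path concrete, here is a rough Python sketch of how opcode, funct3, and funct7 together could select an ALU operation. It covers only four RV32I register-register instructions, and the names `ALU_OPS` and `select_alu_op` are assumptions for illustration rather than part of any real design.

```python
OP = 0b0110011  # RISC-V opcode shared by register-register ALU instructions

# (funct3, funct7) -> ALU operation, for a small subset of RV32I
ALU_OPS = {
    (0b000, 0b0000000): "ADD",
    (0b000, 0b0100000): "SUB",
    (0b110, 0b0000000): "OR",
    (0b111, 0b0000000): "AND",
}

def select_alu_op(ir: int) -> str:
    """Pick the ALU operation from the three control-unit fields."""
    opcode = ir & 0x7F
    funct3 = (ir >> 12) & 0x07
    funct7 = (ir >> 25) & 0x7F
    if opcode != OP:
        raise ValueError("not a register-register ALU instruction")
    op = ALU_OPS.get((funct3, funct7))
    if op is None:
        raise ValueError("encoding not covered by this sketch")
    return op

print(select_alu_op(0x002081B3))  # add x3, x1, x2
print(select_alu_op(0x402081B3))  # sub x3, x1, x2
```

The two example words differ only in funct7, which is why the opcode alone cannot tell ADD from SUB.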
Decode Logic: A Small Decoder Per Field
The opcode is the field that needs real logic, because it must be turned into one-hot control signals. This is exactly the decoder component covered in Decoders and Encoders Driving a 7-Segment Display.
For an opcode field of width n, the decoder produces 2^n output lines, exactly one of which is high. Each output line is the “this is instruction X” signal for a specific instruction.
Worked truth-table fragment for a 3-bit opcode field on a toy CPU:
| Opcode bits | Instruction | One-hot output |
|---|---|---|
| 000 | NOP | is_NOP |
| 001 | LDA | is_LDA |
| 010 | STA | is_STA |
| 011 | ADD | is_ADD |
| 100 | SUB | is_SUB |
| 101 | JMP | is_JMP |
| 110 | BNZ | is_BNZ |
| 111 | HLT | is_HLT |
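The table is a 3-to-8 one-hot decoder. A minimal Python sketch of that behavior follows; the function name `one_hot_decode` and the `NAMES` list are illustrative, with the mnemonics taken from the table above.

```python
def one_hot_decode(opcode: int, width: int = 3) -> list:
    """Return 2**width output lines with exactly one line high."""
    if not 0 <= opcode < (1 << width):
        raise ValueError("opcode does not fit in the field")
    return [1 if line == opcode else 0 for line in range(1 << width)]

NAMES = ["NOP", "LDA", "STA", "ADD", "SUB", "JMP", "BNZ", "HLT"]

lines = one_hot_decode(0b011)
active = NAMES[lines.index(1)]  # name of the single asserted line
print(lines, active)
```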
Each is_X line then drives a small AND/OR mesh that activates the right control signals.
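As a hedged illustration of such a mesh, the Python sketch below ORs is_X lines together into control signals. The signal names (`reg_write`, `mem_read`, and so on) and the assumed instruction semantics (LDA loads from memory, STA stores to memory, ADD and SUB write a register) are guesses made for this example, not the template's actual wiring.

```python
def control_signals(opcode: int) -> dict:
    """Derive control signals from one-hot is_X lines (assumed semantics)."""
    names = ["NOP", "LDA", "STA", "ADD", "SUB", "JMP", "BNZ", "HLT"]
    # One-hot stage: exactly one is_X entry is True for a valid 3-bit opcode.
    is_ = {name: opcode == code for code, name in enumerate(names)}
    # OR stage: each control signal is the OR of the instructions that need it.
    return {
        "reg_write": is_["LDA"] or is_["ADD"] or is_["SUB"],
        "mem_read":  is_["LDA"],
        "mem_write": is_["STA"],
        "alu_sub":   is_["SUB"],
        "halt":      is_["HLT"],
    }

print(control_signals(0b001))  # signals for LDA under the assumptions above
```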
The boolean expressions for the control signals are derived using the same techniques covered in Mastering Sum of Products and minimized with the K-map approach from Visualizing Logic with Karnaugh Maps.
Why Fields Get Their Own Decoders
Register addresses do not need a separate logic decoder — they are passed directly to the register file’s address input, which has its own internal decoder converting the 5-bit field into 32 word-line selects. The IR’s job ends at handing over the right 5 wires.
Immediates do not need a decoder at all. They are extended (zero or sign) and routed to the ALU. The “decode” of an immediate is just rewiring with a sign-extension circuit.
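Sign extension is easy to sketch in Python. The example assumes the RISC-V I-type layout, where the 12-bit immediate sits in IR[31:20]; the helper names are made up for illustration.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit

def i_type_immediate(ir: int) -> int:
    """Extract and sign-extend the I-type immediate, IR[31:20]."""
    return sign_extend(ir >> 20, 12)

print(i_type_immediate(0x00500093))  # addi x1, x0, 5
print(i_type_immediate(0xFFF00093))  # addi x1, x0, -1
```

In hardware the same effect comes from copying the top immediate bit into every higher bit of the ALU input, which again is wiring rather than logic.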
Real-World Encodings: RISC vs CISC
Fixed-Width RISC: Easy
RISC architectures — RISC-V, MIPS, ARM A32, classic SPARC — fix the instruction width at one or two sizes. RISC-V is 32 bits for base instructions, 16 bits for compressed. The decoder always knows where each field starts because it never moves.
Decoding a fixed-width instruction is single-cycle, single-stage, and combinational. A handful of decoders and a few hundred gates produce the full set of control signals.
Variable-Length CISC: Hard
x86 is the canonical messy case. An x86 instruction is 1 to 15 bytes long and consists of:
- Optional prefix bytes (up to four): override defaults like operand size, address size, segment, or repeat behavior. These include 66, 67, F0, F2, F3, segment overrides, and the REX prefix in 64-bit mode.
- Opcode (1 to 3 bytes).
- Optional ModR/M byte specifying addressing mode.
- Optional SIB byte for scaled-index addressing.
- Optional displacement (1, 2, or 4 bytes).
- Optional immediate (1, 2, 4, or 8 bytes).
The IR cannot be a fixed-width register holding “the instruction.” Instead, modern x86 implementations:
- Buffer 16 bytes of instruction stream at a time.
- Run a length-decoder pass that reports where each instruction begins and ends.
- Translate each instruction into one or more micro-operations (μops) — small RISC-like internal instructions.
- Feed μops to an out-of-order execution core that looks essentially RISC inside.
The “instruction register” in an x86 chip is conceptually distributed across the fetch buffer, length-decode, and μop queue stages. The lesson stays the same: the decoder needs to know where each field is before it can do anything else, and variable lengths make that step nontrivial.
ARM Thumb, RISC-V’s compressed extension, and Renesas RX use simpler variable-width schemes — the first few bits of the instruction tell you how long the rest is, so the decoder is still cheap.
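For RISC-V the length check fits in a couple of comparisons on the lowest bits of the first halfword. The Python sketch below handles only the 16-bit and 32-bit cases and deliberately leaves the longer reserved encodings unmodelled; the function name is an assumption for this example.

```python
def rv_instruction_length(first_halfword: int) -> int:
    """Return the instruction length in bytes (16- and 32-bit cases only)."""
    if first_halfword & 0b11 != 0b11:
        return 2  # compressed: low two bits are not 11
    if (first_halfword >> 2) & 0b111 != 0b111:
        return 4  # standard 32-bit: bits [4:2] are not 111
    raise NotImplementedError("encodings longer than 32 bits are not modelled")

print(rv_instruction_length(0x0001))  # low halfword of c.nop
print(rv_instruction_length(0x81B3))  # low halfword of add x3, x1, x2
```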
Hardwired vs Microcoded Decoding
There are two ways to turn IR fields into control signals:
- Hardwired control. A combinational network of gates — exactly the decoders and AND/OR meshes described above. Fast, fixed, hard to change after fabrication. Used in nearly all RISC chips.
- Microcoded control. The opcode field selects an entry in a microcode ROM. Each entry is a wide word containing the control signals for one microinstruction. A microinstruction sequencer steps through several microinstructions per machine instruction. Slower, but it makes complex CISC instructions like x86’s ENTER or LOOPNE much easier to implement.
Real x86 chips combine both: simple instructions are decoded with hardwired logic, while a handful of complex ones (string operations, far calls) drop into microcode. The microcode store is in essence a ROM array indexed by a μop counter that runs in parallel with the main PC.
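A toy model may help show the difference. In the Python sketch below, the ROM contents, the signal names, and the two instructions are all hypothetical; the only point is that one machine instruction expands into a sequence of control words stepped through by a counter.

```python
# Hypothetical microcode ROM: each instruction maps to a list of control
# words, and each control word is the set of signals asserted in that step.
MICROCODE_ROM = {
    "ADD": [{"alu_add", "reg_write"}],
    "LDA": [{"mar_load"}, {"mem_read", "mdr_load"}, {"reg_write"}],
}

def run_microcode(instruction: str):
    """Yield (step, signals) pairs as the sequencer walks the ROM entry."""
    for step, control_word in enumerate(MICROCODE_ROM[instruction]):
        yield step, sorted(control_word)

for step, signals in run_microcode("LDA"):
    print(step, signals)
```

A hardwired design would produce the same signals from gates in one pass; the microcoded version trades extra steps for a table that is easy to extend.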
Putting the IR in Context
The IR sits in a four-register pipeline that defines the fetch and decode stages of a simple CPU:
PC ──▶ MAR ──▶ memory ──▶ MDR ──▶ IR ──▶ decode logic
The PC drives the memory address; the MAR latches it; memory returns a word; the MDR latches it; the IR latches that word on the next clock edge; the decoder pulls fields out combinationally and the control unit fires.
Every other register in this chain is built from the same 4-bit register building block, widened to match the bus. The IR is structurally identical: it is the role in the data flow that makes it the IR, not any specialized hardware.
Common Pitfalls
- Loading IR before MDR is stable. The IR’s clock-enable must fire after memory has returned data and the MDR has captured it. Pipelining these two registers in the same cycle without a clear handoff causes the IR to latch garbage.
- Reusing IR bits during execute. If the execute stage modifies a register that feeds back into the IR’s wiring, the decoder sees the new value mid-cycle. Keep the IR write-disabled for the rest of the cycle after fetch.
- Treating the opcode as the only “decoded” field. Funct fields, mode bits, and prefixes carry decoder-relevant information. Ignore them and instructions that share an opcode become indistinguishable: RISC-V ADD and SUB differ only in funct7, so the CPU does the wrong arithmetic.
- Letting the IR float on reset. On power-up, the IR may contain random bits whose opcode happens to be HLT or, worse, a memory-write instruction. Either reset the IR to a known NOP or hold the control unit in a known state until the first fetch completes.
Build It in DigiSim
Open the Sequential Instruction Executor and find the IR. Stop the clock, manually load a binary value into the IR via the test harness, and watch the control unit’s output lines change in real time as you flip individual bits. Toggling the opcode bits demonstrates how a one-hot decoder routes a single instruction selection to dozens of downstream control signals.
The next steps in this CPU-architecture series cover the program counter (already published) and the upcoming CPU Flags Register: Carry, Zero, Overflow, Sign post, which closes the loop on how the result of one instruction influences whether the next branch is taken. Together those three pieces — PC, IR, flags — are the entire control plane of a simple processor.