Program Counter Explained: The CPU's Bookmark
TL;DR: The program counter (PC) is the register that holds the memory address of the next instruction a CPU will fetch. It increments automatically after every fetch, but branch and jump instructions overwrite it with a new target address — that single piece of hardware is what makes loops, conditionals, and function calls possible.
The program counter is one of the simplest registers in a CPU and also the one that controls everything. Every instruction the processor will ever execute is selected by whatever address the PC happens to be holding at the start of the next fetch cycle. Without it, the processor has no idea where the program is, where it has been, or where it is going next.
This post walks through what the PC stores, how it increments, how branches and jumps redirect it, and how to wire one up from a register, an adder, and a multiplexer. Each component links back to a working circuit you can poke at in the Sequential Instruction Executor template.
What Is the Program Counter?
The program counter is a register dedicated to one job: pointing at the next instruction in memory. On most architectures it is not a general-purpose register that user code reads and writes freely (32-bit ARM, which exposes the PC as R15, is a notable exception). It is a special-purpose register driven by the CPU’s control logic.
Its width determines how much memory the CPU can address for code:
- An 8-bit PC can address $2^{8} = 256$ memory locations.
- A 16-bit PC can address $2^{16} = 65{,}536$ bytes.
- A 32-bit PC can address $2^{32} \approx 4.3$ billion bytes.
- A 64-bit PC can address $2^{64} \approx 1.8 \times 10^{19}$ bytes.
DigiSim’s PROGRAM_COUNTER_8BIT component implements the 8-bit case, which is large enough to run real programs in the simulator while staying small enough to inspect bit-by-bit.
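As a quick sanity check, the relationship between PC width and address count is just a power of two. A minimal Python sketch (the function name `addressable_locations` is made up for illustration):

```python
def addressable_locations(pc_width_bits: int) -> int:
    """Number of distinct addresses a PC of the given width can hold."""
    return 2 ** pc_width_bits


for width in (8, 16, 32, 64):
    print(f"{width}-bit PC: {addressable_locations(width):,} addresses")
```

The 8-bit case prints 256, which is why the whole address space of the toy CPU fits on one screen.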
The PC Sits at the Top of the Fetch Stage
The fetch stage of the fetch-decode-execute cycle performs four steps in sequence:
1. The PC drives its current value onto the address bus.
2. Memory returns the instruction word at that address.
3. The instruction word is latched into the instruction register.
4. The PC updates so that the next fetch reads the next instruction.
Steps 1–3 are addressed in the broader fetch coverage. Step 4 is what this post is about.
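The four fetch steps can be sketched as a single Python function. This is an illustrative model only: the names `fetch`, `address_bus`, and `instruction_register` are hypothetical, and it assumes a 256-word memory with one instruction per address.

```python
def fetch(pc: int, memory: list, width: int = 1) -> tuple:
    """Model one fetch; returns (instruction_register, next_pc)."""
    address_bus = pc                          # 1. PC drives the address bus
    instruction_word = memory[address_bus]    # 2. memory returns the word
    instruction_register = instruction_word   # 3. word is latched into the IR
    next_pc = (pc + width) & 0xFF             # 4. PC advances (8-bit wraparound assumed)
    return instruction_register, next_pc


memory = [0] * 256
memory[0x00], memory[0x01] = 0xA5, 0x3C   # arbitrary placeholder instruction words
ir, pc = fetch(0x00, memory)              # ir holds 0xA5, pc is now 0x01
```

In real hardware steps 1 to 3 and step 4 happen in parallel within a clock cycle; the function runs them in order only because software has to.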
Why Must the PC Increment Automatically?
If the PC stayed put, the CPU would fetch the same instruction forever. Hardware therefore wires an automatic increment into the PC so that, by default, every fetch cycle is followed by a step forward.
The increment is computed by a dedicated adder — typically a small ripple-carry adder like the one detailed in Mastering Binary Addition — whose inputs are the current PC and a constant offset. The output of that adder feeds back into the PC’s data input, and the PC latches it on the next clock edge.
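The adder itself can be modeled one bit at a time. The sketch below is an illustrative Python model of a ripple-carry adder; the function name and the 8-bit default are assumptions, not DigiSim's implementation. The carry out of the top bit is dropped, so the PC wraps around at the end of the address space.

```python
def ripple_carry_add(a: int, b: int, width_bits: int = 8) -> int:
    """Add two unsigned values with a chain of full adders, one per bit."""
    carry = 0
    result = 0
    for i in range(width_bits):
        x = (a >> i) & 1
        y = (b >> i) & 1
        sum_bit = x ^ y ^ carry                  # full-adder sum output
        carry = (x & y) | (carry & (x ^ y))      # full-adder carry output
        result |= sum_bit << i
    return result                                # final carry is discarded


next_pc = ripple_carry_add(0x04, 1)   # current PC = 0x04, constant offset = 1
```

Feeding `next_pc` back in as `a` on the next call mirrors the feedback wire from the adder output to the PC's data input.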
PC + 1 vs PC + 4: Why Instruction Width Matters
The constant added to the PC is the width of an instruction in addressable units, not always the literal value 1.
| Architecture | Instruction width | Increment | Reason |
|---|---|---|---|
| DigiSim 8-bit toy CPU | 1 byte (1 word) | PC + 1 | Each address holds a complete instruction |
| MIPS, classic RISC-V (RV32I) | 4 bytes | PC + 4 | Memory is byte-addressed; instructions are 4 bytes wide |
| ARM (A32) | 4 bytes | PC + 4 | Same reason |
| ARM Thumb / RV32C | 2 bytes (compressed forms) | PC + 2 | Compressed instruction encoding; any 4-byte instruction in the same stream still advances the PC by 4 |
| x86 | 1–15 bytes | PC + (decoded length) | Variable-length instructions; the decoder reports the length |
The principle is the same in every case: after fetching an instruction, advance the PC past the bytes that instruction occupies. RISC architectures get a fixed adder; x86 needs the decoder to feed back the length.
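A small Python sketch of that principle, assuming a lookup table of fixed widths and a decoder-supplied length for everything else. The architecture keys and the function name are hypothetical labels for this example.

```python
from typing import Optional

# Fixed instruction widths in addressable units (hypothetical keys).
FIXED_WIDTHS = {"digisim8": 1, "rv32i": 4, "a32": 4}


def next_sequential_pc(pc: int, arch: str, decoded_length: Optional[int] = None) -> int:
    """Advance the PC past the bytes the current instruction occupies."""
    if arch in FIXED_WIDTHS:
        return pc + FIXED_WIDTHS[arch]           # fixed adder
    if decoded_length is None:
        raise ValueError("variable-length ISA: the decoder must supply the length")
    return pc + decoded_length                   # length fed back from the decoder
```

For example, `next_sequential_pc(0x1000, "rv32i")` yields `0x1004`, while a 3-byte x86 instruction at `0x400000` needs `decoded_length=3` to land on `0x400003`.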
Boolean Description
For an architecture with fixed instruction width $W$, the default next-PC value is:
$$\text{PC}_{\text{next}} = \text{PC} + W$$
In the DigiSim 8-bit case, $W = 1$, so the increment is just a +1 adder.
How Do Branches and Jumps Override the PC?
Sequential execution is the boring case. Real programs need loops, conditionals, function calls, and returns — all of which require the PC to skip somewhere other than the next sequential address.
Two operations cover all of them:
- Jump (unconditional): Replace the PC with a target address regardless of any condition.
- Branch (conditional): Replace the PC with a target address only if a flag — zero, carry, sign, overflow — is in the required state. Otherwise fall through to PC + W. The flag bits live in the flags register, covered in the upcoming companion post on carry, zero, overflow, and sign.
Both operations boil down to writing a non-sequential value into the PC.
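The difference between the two can be written down in a few lines of Python. This is a sketch with made-up names (`select_next_pc`, `kind`, `condition_met`); a real control unit derives these signals from the opcode and the flags register.

```python
def select_next_pc(pc: int, width: int, kind: str,
                   target: int = 0, condition_met: bool = False) -> int:
    """Pick the next PC for a sequential instruction, a jump, or a branch."""
    sequential = pc + width
    if kind == "jump":
        return target                 # unconditional: always take the target
    if kind == "branch" and condition_met:
        return target                 # conditional: take the target only if the flag matches
    return sequential                 # everything else falls through to PC + W
```

A not-taken branch and an ordinary instruction are indistinguishable from the PC's point of view: both produce `pc + width`.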
Where Does the Target Address Come From?
The target is encoded in one of three ways:
- Immediate (PC-relative): The instruction word contains an offset. The next PC is `PC + offset`. RISC-V `BEQ`, ARM `B`, and x86 short jumps all use this form. The advantage is position-independent code.
- Absolute: The instruction word contains a complete target address. This is common in older architectures and in x86 long jumps.
- Register-indirect: The target lives in a general-purpose register or in the ALU output. RISC-V `JALR`, ARM `BX`, and x86 `JMP rax` are examples. This form supports function pointers, virtual dispatch, and computed `goto`.
In every case, the new PC value arrives at the program counter through a multiplexer.
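The three encodings can be sketched as one Python function that resolves a target before it reaches the mux. Everything here is illustrative: the mode strings, `resolve_target`, and the 8-bit offset field are assumptions, and the offset is taken relative to the current PC (real ISAs differ on exactly which PC value is the base).

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` of value as a two's-complement number."""
    sign_bit = 1 << (bits - 1)
    return (value & (sign_bit - 1)) - (value & sign_bit)


def resolve_target(mode: str, pc: int, field, registers=None, offset_bits: int = 8) -> int:
    """Compute a branch or jump target from the instruction's operand field."""
    if mode == "pc_relative":
        return pc + sign_extend(field, offset_bits)   # offset may be negative
    if mode == "absolute":
        return field                                  # field is the full address
    if mode == "register_indirect":
        return registers[field]                       # field names a register
    raise ValueError(f"unknown addressing mode: {mode}")
```

With an 8-bit offset, `resolve_target("pc_relative", 0x10, 0xFE)` gives `0x0E`, because `0xFE` sign-extends to -2. Backward branches like this are how loops are encoded.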
Implementation: Register + Increment + Mux
The hardware structure of the PC is one of the cleanest data paths in a CPU:
                 ┌──────────────┐
PC current ─────▶│  Adder (+W)  │──── PC + W ──┐
                 └──────────────┘              │
                                               ▼
                                       ┌───────────────┐
                                       │   MUX (2:1)   │
target_addr ──────────────────────────▶│  sel = take?  │──── next_PC
                                       └───────────────┘
                                               │
                                               ▼
                                       ┌───────────────┐
clock ────────────────────────────────▶│  PC register  │──── current_PC
                                       └───────────────┘
Three pieces:
- A register to hold the current address. This is just an 8-bit register — the same building block covered in Mastering the 4-bit Register, widened to match the address bus.
- An adder wired with one input as PC and the other as the constant width $W$.
- A 2:1 multiplexer with one input from the adder (default sequential path) and one from the branch target (taken-branch path). The select line is driven by the control unit and is asserted when the current instruction is a jump or a branch whose condition evaluates true. Multiplexers are the topic of The Data Traffic Controller.
On every clock edge the PC latches whatever the mux is currently selecting. That single latch is the entire mechanism that distinguishes “advance” from “branch.”
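Here is a minimal behavioral model of that register + adder + mux structure in Python. The class and method names are hypothetical, and the model assumes an 8-bit PC with W = 1; it is a sketch of the data path, not DigiSim's implementation.

```python
class ProgramCounter:
    """Behavioral model: a register fed by a 2:1 mux fed by a +W adder."""

    def __init__(self, width_bits: int = 8, step: int = 1):
        self.mask = (1 << width_bits) - 1
        self.step = step
        self.value = 0                      # contents of the PC register

    def next_value(self, take: bool, target: int) -> int:
        sequential = (self.value + self.step) & self.mask    # adder output
        return (target & self.mask) if take else sequential  # 2:1 mux

    def tick(self, take: bool = False, target: int = 0) -> int:
        """Clock edge: latch whatever the mux is currently selecting."""
        self.value = self.next_value(take, target)
        return self.value
```

Calling `tick()` repeatedly walks the PC forward one address at a time; calling `tick(take=True, target=0x10)` loads `0x10` instead, after which sequential execution resumes from there.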
Optional Inputs to the Mux
Real designs add more sources to the same mux:
| Source | When selected |
|---|---|
| PC + W | Default — sequential execution |
| PC + offset (PC-relative) | Conditional branches, short jumps |
| Absolute immediate | Long jumps, calls to fixed addresses |
| Register / ALU output | Indirect jumps, function returns, switch dispatch |
| Reset / boot vector | After power-on or external reset |
| Exception / interrupt vector | After a trap |
The select line widens from 1 bit to 3 bits, but the structural idea is unchanged.
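A sketch of the wider mux in Python, using an enum for the select encoding. The specific numbering of the sources is an assumption made for this example; real designs choose whatever encoding suits their control unit.

```python
from enum import IntEnum


class PcSource(IntEnum):
    """Hypothetical select-line encoding for a six-input next-PC mux."""
    SEQUENTIAL = 0     # PC + W
    PC_RELATIVE = 1    # PC + offset
    ABSOLUTE = 2       # absolute immediate
    REGISTER = 3       # register / ALU output
    RESET = 4          # reset / boot vector
    EXCEPTION = 5      # exception / interrupt vector


def pc_mux(select: int, inputs: dict) -> int:
    """Route exactly one of the candidate values to next_PC."""
    return inputs[PcSource(select)]


# Six sources need ceil(log2(6)) = 3 select bits.
select_bits = (len(PcSource) - 1).bit_length()
```

All six candidate values are present at the mux inputs on every cycle; the select lines only decide which one the PC register gets to see.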
Reset Behavior: Where Does the PC Start?
When the CPU is powered on or reset, the PC must hold a known address — otherwise execution begins at random.
There are two common conventions:
- PC = 0: Execution begins at address `0x0000…0`. The DigiSim toy CPU and many embedded microcontrollers use this. ROM is mapped at the bottom of the address space, and the instruction at address 0 is the first instruction of the program.
- Boot vector: A fixed non-zero address. x86 cold-resets to physical address `0xFFFFFFF0` (16 bytes below the top of the 4 GB space, near the BIOS ROM). ARM Cortex-M cores read the initial PC from a reset vector stored at address `0x00000004`. 6502-family chips read the reset vector from `0xFFFC`/`0xFFFD`.
In hardware, “reset” is a wire to the PC’s asynchronous-clear (or synchronous-load) input. When asserted, the PC is forced to the boot value regardless of the mux.
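Both conventions can be sketched in Python. The function names are hypothetical; the first models reset taking priority over the mux, and the second models a 6502-style vector fetch, where the 16-bit start address is stored low byte first at `0xFFFC` and `0xFFFD`.

```python
def next_pc_with_reset(mux_output: int, reset: bool, boot_value: int = 0x00) -> int:
    """Reset has priority over whatever the next-PC mux selected."""
    return boot_value if reset else mux_output


def read_reset_vector_6502(memory) -> int:
    """Assemble the 16-bit little-endian reset vector from 0xFFFC/0xFFFD."""
    return memory[0xFFFC] | (memory[0xFFFD] << 8)


memory = bytearray(0x10000)                    # 64 KB address space
memory[0xFFFC], memory[0xFFFD] = 0x00, 0x80    # made-up vector pointing at 0x8000
start_pc = read_reset_vector_6502(memory)
```

The vector approach costs an extra memory read at boot but lets the program's entry point live anywhere in ROM.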
Worked Example: A Five-Instruction Loop
Consider the following pseudo-program in an 8-bit CPU with $W = 1$:
0x00: LDI R0, 5 ; load 5 into R0
0x01: LDI R1, 0 ; load 0 into R1
0x02: ADD R1, R0 ; R1 = R1 + R0
0x03: DEC R0 ; R0 = R0 - 1, sets Z flag if R0 hits 0
0x04: BNZ 0x02 ; if Z == 0, branch back to 0x02
0x05: HLT
Stepping through PC values:
| Cycle | PC at fetch | Instruction | After fetch | Branch taken? | Next PC |
|---|---|---|---|---|---|
| 1 | 0x00 | LDI R0, 5 | PC + 1 = 0x01 | n/a | 0x01 |
| 2 | 0x01 | LDI R1, 0 | PC + 1 = 0x02 | n/a | 0x02 |
| 3 | 0x02 | ADD R1, R0 | PC + 1 = 0x03 | n/a | 0x03 |
| 4 | 0x03 | DEC R0 (R0=4, Z=0) | PC + 1 = 0x04 | n/a | 0x04 |
| 5 | 0x04 | BNZ 0x02 | PC + 1 = 0x05; target = 0x02 | yes (Z=0) | 0x02 |
| 6 | 0x02 | ADD R1, R0 | … | … | … |
At cycle 5 the mux selects the branch target (0x02) instead of the sequential next address (0x05) because the Z flag is clear. The same one-bit decision drives every loop, every if, and every function call in every CPU on Earth.
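To see the whole loop play out, here is a tiny Python interpreter for the program above. It is a sketch under stated assumptions: the tuple encoding of instructions is invented for this example, registers are 8 bits wide, and only `DEC` updates the Z flag.

```python
PROGRAM = [
    ("LDI", "R0", 5),       # 0x00
    ("LDI", "R1", 0),       # 0x01
    ("ADD", "R1", "R0"),    # 0x02
    ("DEC", "R0", None),    # 0x03
    ("BNZ", 0x02, None),    # 0x04
    ("HLT", None, None),    # 0x05
]


def run(program):
    """Execute the program; return the list of fetched PCs and final registers."""
    regs = {"R0": 0, "R1": 0}
    zero_flag = False
    pc = 0
    trace = []
    while True:
        op, a, b = program[pc]
        trace.append(pc)
        next_pc = (pc + 1) & 0xFF            # default: sequential path
        if op == "LDI":
            regs[a] = b
        elif op == "ADD":
            regs[a] = (regs[a] + regs[b]) & 0xFF
        elif op == "DEC":
            regs[a] = (regs[a] - 1) & 0xFF
            zero_flag = regs[a] == 0
        elif op == "BNZ":
            if not zero_flag:
                next_pc = a                  # mux selects the branch target
        elif op == "HLT":
            break
        pc = next_pc
    return trace, regs


trace, regs = run(PROGRAM)
```

The trace visits 0x00 and 0x01 once, then 0x02 through 0x04 five times, then 0x05. The branch is taken four times and falls through on the fifth pass, leaving R0 = 0 and R1 = 5 + 4 + 3 + 2 + 1 = 15.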
Common Pitfalls
- Off-by-one in PC + W. A common simulator bug is to use `PC + 1` for an architecture whose instructions are 4 bytes wide. The CPU will then fetch the second byte of every instruction as the start of the next instruction and behave as if every program were random data.
- Updating PC before the fetch reads memory. The PC must hold the current address while memory is being read. Update on the clock edge, not combinationally.
- Forgetting the reset. A PC with no reset wire boots to whatever value the flip-flops randomize to on power-up. Always tie the reset.
- Branching to an unaligned address. On RISC architectures with 4-byte instructions, jumping to address `0x1003` instead of `0x1004` is a fault. The PC’s low bits should be tied to zero or a fault should be raised.
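The two ways of handling an unaligned target can be sketched in Python as follows. Both function names are hypothetical, and `width` is assumed to be a power of two.

```python
def check_aligned(target: int, width: int = 4) -> int:
    """Fault-raising policy: reject any target that is not a multiple of width."""
    if target % width != 0:
        raise ValueError(f"misaligned branch target: {target:#x}")
    return target


def force_aligned(target: int, width: int = 4) -> int:
    """Tied-to-zero policy: clear the low bits so the PC is always aligned."""
    return target & ~(width - 1)
```

Note that tying the low bits to zero silently turns `0x1003` into `0x1000`, not `0x1004`, so the program still goes somewhere unintended. That is one reason to prefer raising a fault, at least in a simulator.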
Build It in DigiSim
Open the Sequential Instruction Executor template. You will see:
- A `PROGRAM_COUNTER_8BIT` block at the top of the fetch path.
- An adder hard-wired to add `1` on every clock cycle.
- A mux that selects between `PC + 1` and a target address driven by the control unit.
- A reset switch that forces the PC to `0x00`.
Step the clock manually and watch the PC advance. Then load a program containing a branch instruction and observe the moment the mux flips from “sequential” to “target” — that single bit is control flow in its purest form.
Where the PC Sits in the Bigger Picture
The program counter is one node in the larger CPU data flow. The instruction it points at gets latched into the instruction register, where the decode stage cracks it into opcode and operand fields — covered in the upcoming Instruction Register and the Decode Stage post. The result of arithmetic operations sets the flags the conditional branches consult — covered in the upcoming CPU Flags Register post.
Read the Fetch-Decode-Execute case study for the full loop, then load the Sequential Instruction Executor template and single-step through a branch instruction. Watching the PC jump non-sequentially for the first time is the moment the abstraction stops being a diagram and starts being a circuit.