arithmetic-circuits

Booth Multiplier Explained (With Examples)

Denny Denny
8 min read
A multi-bit Booth multiplier with shift register and accumulator stages, partial products being summed.

TL;DR: Booth’s algorithm multiplies signed binary numbers in two’s complement by scanning adjacent bit pairs of the multiplier and choosing one of three actions per step: add the multiplicand, subtract it, or do nothing. It handles signed values natively and skips runs of identical bits, yielding fewer operations than naive shift-and-add for many real-world inputs.

The textbook way to multiply two binary numbers is shift-and-add: for each 1 in the multiplier, add a left-shifted copy of the multiplicand to a running sum. It works perfectly for unsigned values and produces a correct n × n → 2n-bit product after n iterations. The catch: it doesn’t know what to do when the operands are signed.

For two’s complement multiplication you’d have to detect signs, take absolute values, multiply, and re-apply the sign — a procedure with several edge cases (especially the most negative value, whose absolute value doesn’t fit in the same width). Andrew Booth’s 1951 algorithm sidesteps the entire problem by treating runs of consecutive 1s as a single subtract-then-add operation. It works on two’s complement values directly and, as a bonus, often executes faster.

Why Shift-and-Add Falls Short

Take the unsigned multiplication 13 × 11 in 4-bit binary:

       1101  (multiplicand A = 13)
     × 1011  (multiplier   B = 11)
     ------
       1101  (B0=1, add A)
      11010  (B1=1, add A << 1)
     000000  (B2=0, skip)
   1101000   (B3=1, add A << 3)
   --------
   10001111  (= 143)

Three additions for three 1-bits in B. Now consider 13 × 15, where B = 1111: four additions, of A shifted by 0, 1, 2, and 3. As B’s population count grows, so does the number of additions.

Booth observed: 1111 = 10000 − 1. So A × 1111 = (A << 4) − A. One left-shift plus one subtract instead of four additions. For a run of 1s of length k, the saving is k − 2 operations. Hardware executes billions of multiplications per second; the savings compound.
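The run-of-ones identity is easy to check in a few lines of Python (a throwaway sketch, not part of any multiplier):

```python
# Booth's observation: a run of ones is one shift minus one value.
# 0b1111 == 0b10000 - 1, so A * 0b1111 == (A << 4) - A.
A = 13
assert A * 0b1111 == (A << 4) - A            # four adds collapse to shift + subtract

# General form: k ones starting at bit j multiply by 2**(j + k) - 2**j.
j, k = 2, 3                                  # multiplier 0b11100
assert A * 0b11100 == (A << (j + k)) - (A << j)
```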

Booth’s Algorithm

Pad the multiplier B on the right with an implicit zero, giving a sequence of n+1 bits. Scan adjacent pairs (B_i, B_{i-1}) from least to most significant. At each step, examine the pair and accumulate one of three actions on the partial product P:

| B_i | B_{i-1} | Action | Meaning |
|-----|---------|--------|---------|
| 0 | 0 | P ← P | Inside a run of zeros |
| 0 | 1 | P ← P + (A << i) | End of a run of ones |
| 1 | 0 | P ← P − (A << i) | Start of a run of ones |
| 1 | 1 | P ← P | Inside a run of ones |

After all n bit pairs are processed, P holds the signed 2n-bit product.

The intuition: when you transition from 0 to 1 (reading right to left, so B_{i-1} = 0, B_i = 1), you’ve started a run of 1s. Booth subtracts A at this position, which is equivalent to adding −A, anticipating that the run will continue. When the run ends (transition from 1 to 0, so B_{i-1} = 1, B_i = 0), Booth adds back the current shifted A, completing the “add big shift, subtract small shift” pair that replaces the whole run of additions.
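The recoding can be applied directly in software. Here is a minimal Python sketch (the function name `booth_multiply` is mine, not from any library) that accumulates one signed add per 0→1 or 1→0 transition:

```python
def booth_multiply(a: int, b: int, n: int) -> int:
    """Multiply two n-bit two's complement values via Booth recoding."""
    product = 0
    prev = 0                          # the implicit B_{-1} = 0
    for i in range(n):
        bit = (b >> i) & 1            # Python's >> sign-extends, so negative b works
        if (bit, prev) == (1, 0):     # start of a run of ones: subtract
            product -= a << i
        elif (bit, prev) == (0, 1):   # end of a run of ones: add
            product += a << i
        prev = bit
    return product

assert booth_multiply(-3, 5, 4) == -15
assert booth_multiply(-8, -8, 4) == 64   # most negative 4-bit value, no special case
```

Note that the most negative value needs no special handling: the sign bit’s transition simply produces one final subtract at the right weight.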

The standard implementation runs the recurrence iteratively. State variables: an n-bit accumulator P, the multiplicand A, the multiplier B, and an extra “Booth bit” B_{-1}, initially 0. Each iteration:

  1. Examine (B_0, B_{-1}).
  2. Apply add, subtract, or no-op to P.
  3. Arithmetic-shift the combined {P, B, B_{-1}} register right by one position: the bit shifted out of P’s LSB becomes the new B_{n-1}, and the bit shifted out of B_0 becomes the new B_{-1}.

After n iterations, P concatenated with B is the signed 2n-bit product.

Worked Example: −3 × 5 in 4-Bit Two’s Complement

Operands: A = −3 = 1101, B = 5 = 0101. Expected product: −15 = 11110001 (8-bit two’s complement).

Initial state: a 4-bit accumulator P and 4-bit B with the Booth bit appended.

| Step | P | B | B_{-1} | Action |
|------|---|---|--------|--------|
| 0 | 0000 | 0101 | 0 | (initial) |
| 1 | P + (−A) = 0000 + 0011 = 0011 | 0101 | 0 | (B_0, B_{-1}) = (1, 0): subtract A |
|   | 0001 | 1010 | 1 | arithmetic shift right |
| 2 | P + A = 0001 + 1101 = 1110 | 1010 | 1 | (B_0, B_{-1}) = (0, 1): add A |
|   | 1111 | 0101 | 0 | shift right |
| 3 | P + (−A) = 1111 + 0011 = 0010 | 0101 | 0 | (B_0, B_{-1}) = (1, 0): subtract A |
|   | 0001 | 0010 | 1 | shift right |
| 4 | P + A = 0001 + 1101 = 1110 | 0010 | 1 | (B_0, B_{-1}) = (0, 1): add A |
|   | 1111 | 0001 | 0 | shift right |

Final result: P || B = 11110001 = −15. The product is correct, signed, and computed in four steps.

A couple of details worth tracing:

  • Subtract A here means add −A = 0011 (the two’s complement of 1101). Two’s complement subtraction is just addition of the negated operand, the arithmetic foundation underlying all of Booth.
  • Arithmetic right shift preserves the sign bit: P = 1110 shifts to 1111, not 0111. This is essential for the algorithm to produce a correct signed result.
  • The carry out of each add/subtract is discarded. Only the low n bits of P are retained; the result still fits because two’s complement arithmetic is modular.

Booth Hardware

A minimal Booth multiplier looks like:

  • A multiplicand register holding A (and a precomputed −A, or a circuit to negate on the fly).
  • A product accumulator holding P (high half) and B (low half), typically implemented as one wide shift register holding {P, B, B_{-1}} that shifts right by one each step.
  • An adder/subtractor: one ALU operating on the high half of the accumulator and either A or −A.
  • A 2-bit Booth decoder examining (B_0, B_{-1}) and producing control signals: enable add, enable subtract, or no-op.
  • A counter to drive the FSM through n iterations.

The control unit cycles: decode, conditionally accumulate, shift. After n shifts, the answer sits in the accumulator. A textbook implementation needs n clock cycles per multiplication; pipelined or array implementations process all bits in parallel and complete in one or a few cycles.

Modified Booth (Radix-4)

Standard Booth examines two bits and processes one bit per iteration. Modified Booth (also called radix-4 Booth or Booth-2) examines three bits and processes two per iteration, halving the number of iterations. The encoding:

| B_{i+1} | B_i | B_{i-1} | Action |
|---------|-----|---------|--------|
| 0 | 0 | 0 | 0 |
| 0 | 0 | 1 | +A |
| 0 | 1 | 0 | +A |
| 0 | 1 | 1 | +2A |
| 1 | 0 | 0 | −2A |
| 1 | 0 | 1 | −A |
| 1 | 1 | 0 | −A |
| 1 | 1 | 1 | 0 |

The set of multiples needed is {0, ±A, ±2A}. Multiplication by 2 is a left shift, so the only “real” operation per iteration is an add or subtract — no general multiplication is required. For a 32-bit operand, modified Booth runs in 16 iterations instead of 32, with the same per-step delay. This is the algorithm used inside virtually every commercial CPU multiplier.

Higher radices exist (radix-8, radix-16) but the multiples needed grow non-trivially — 3A doesn’t reduce to a simple shift — so they’re rarely worth the complexity.
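The radix-4 encoding fits in a small lookup table. A Python sketch (the name `booth_radix4` is mine) that scans overlapping 3-bit windows, two bits per step:

```python
def booth_radix4(a: int, b: int, n: int) -> int:
    """Radix-4 (modified) Booth multiply; n-bit two's complement, n even."""
    # (B_{i+1}, B_i, B_{i-1}) window -> required multiple of A
    multiple = {0b000: 0, 0b001: 1, 0b010: 1, 0b011: 2,
                0b100: -2, 0b101: -1, 0b110: -1, 0b111: 0}
    product = 0
    for i in range(0, n, 2):                    # n/2 iterations, 2 bits each
        if i == 0:
            window = (b & 0b11) << 1            # B_1 B_0 with implicit B_{-1} = 0
        else:
            window = (b >> (i - 1)) & 0b111     # B_{i+1} B_i B_{i-1}
        product += multiple[window] * (a << i)  # multiple is 0, ±1, or ±2: shift in hardware
    return product

assert booth_radix4(-3, 5, 4) == -15
```

The window for each iteration overlaps its neighbor by one bit, which is exactly what makes the recoded digits sum back to the original two’s complement value.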

Comparison: Shift-and-Add vs Booth vs Modified Booth

| Algorithm | Iterations (n-bit) | Signed support | Operations per iteration |
|-----------|--------------------|----------------|--------------------------|
| Shift-and-add (unsigned) | n | No | 0 or 1 add |
| Two’s complement shift-and-add | n | Yes (with sign extension) | 0 or 1 add; final step subtracts |
| Booth | n | Yes | 0, 1 add, or 1 subtract |
| Modified Booth | n/2 | Yes | 0, 1 add, or 1 subtract (one of ±A, ±2A) |

Modified Booth’s halving of iterations, combined with native signed support, makes it the standard for fixed-point integer multiplication in CPUs.

Beyond Sequential Multipliers: Wallace and Dadda Trees

Modern high-throughput multipliers don’t iterate at all. They generate all n partial products simultaneously using AND gates (or, for signed operands, modified Booth encoders), then sum them with a tree of carry-save adders. Wallace and Dadda trees are the classic structures, both achieving O(log n) depth. A carry-lookahead adder then sums the two carry-save outputs to produce the product. The whole multiplier completes in a single cycle at modern clock rates.
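The carry-save idea fits in a few lines. This toy 3:2 compressor (not a full tree) reduces three addends to two with no carry propagation at all:

```python
def csa(a: int, b: int, c: int) -> tuple[int, int]:
    """3:2 carry-save compressor: three addends in, two out, no carry chain."""
    s = a ^ b ^ c                            # per-bit sum
    carry = (a & b) | (a & c) | (b & c)      # per-bit majority = carry out
    return s, carry << 1                     # each carry bit has weight 2

# The three partial products of 13 x 11 reduce to two numbers with the
# same total; a Wallace/Dadda tree chains csa() in O(log n) levels and
# one fast adder finishes the job.
s, c = csa(13, 26, 104)
assert s + c == 143
```

Because every output bit depends only on the three input bits at its own position, arbitrarily many `csa` stages can be stacked without any carry ripple until the final adder.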

But these massively parallel designs still use modified Booth at the partial-product generation stage to halve the number of partial products. The algorithmic insight from 1951 is still pulling its weight in 2026.

Building a Booth Multiplier in DigiSim

A working iterative Booth multiplier needs an FSM controller, a 4-bit register acting as the accumulator, a shift register holding the multiplier with the Booth bit appended, an adder/subtractor, and a counter to terminate after n iterations.

A concrete starting point: open the 4-bit shift register SISO template and use it as the basis for the multiplier register. Add the ALU component for the conditional add/subtract, wire in a small FSM for the controller, and you have a complete sequential Booth multiplier. The simulator’s stepper lets you single-step through the worked example above and watch each iteration produce the same intermediate states the table shows.

Common Pitfalls

  • Forgetting the Booth bit. B_{-1} starts at 0 and is essential for the very first iteration’s pair-bit decision. Drop it and the algorithm misfires on the LSB.
  • Logical vs arithmetic right shift. The shift on P must be arithmetic (sign-extending). A logical shift produces a wrong result whenever the accumulator is negative.
  • Sign extension on the partial sum. Some implementations extend A and −A to 2n bits before adding to P and skip the arithmetic-shift trick. Either approach works; mixing them silently breaks correctness.
  • Modified Booth radix confusion. Don’t conflate “examines 3 bits” with “processes 3 bits per step.” Modified Booth examines a 3-bit overlapping window and processes 2 bits per step.
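The shift pitfall is easy to see on the worked example’s accumulator. A Python sketch, treating 4-bit registers as masked integers:

```python
n = 4
P = 0b1110                                   # negative accumulator from step 2 (-2)

logical = P >> 1                             # 0111: sign bit destroyed
sign = P >> (n - 1)                          # replicate the MSB
arithmetic = (P >> 1) | (sign << (n - 1))    # 1111: sign bit preserved

assert logical == 0b0111
assert arithmetic == 0b1111
```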

What’s Next

The next post in this series, Two’s Complement Explained: Signed Binary Arithmetic, covers the signed representation that makes Booth’s subtract-as-negate trick possible — and explains why −A in two’s complement is just “invert all bits and add one.” Following that, Carry-Lookahead Adder: Faster Than Ripple-Carry attacks the addition speed problem with a different algorithmic angle.

To experiment with sequential multiplication in the simulator, open the 4-bit shift register SISO template, extend it into an 8-bit accumulator, attach an ALU for the conditional add/subtract, and step through the −3 × 5 trace from this post. Watching the accumulator evolve cycle-by-cycle is the fastest way to internalize what Booth’s algorithm actually does.