Booth Multiplier Explained (With Examples)
TL;DR: Booth’s algorithm multiplies signed binary numbers in two’s complement by scanning the multiplier two bits at a time and choosing one of three actions per step: add the multiplicand, subtract it, or do nothing. It handles signed values natively and skips runs of identical bits, yielding fewer operations than naive shift-and-add for many real-world inputs.
The textbook way to multiply two binary numbers is shift-and-add: for each 1 in the multiplier, add a left-shifted copy of the multiplicand to a running sum. It works perfectly for unsigned values and produces a correct 2n-bit product after n iterations. The catch: it doesn’t know what to do when the operands are signed.
For two’s complement multiplication you’d have to detect signs, take absolute values, multiply, and re-apply the sign — a procedure with several edge cases (especially the most negative value, whose absolute value doesn’t fit in the same width). Andrew Booth’s 1951 algorithm sidesteps the entire problem by treating runs of consecutive 1s as a single subtract-then-add operation. It works on two’s complement values directly and, as a bonus, often executes faster.
Why Shift-and-Add Falls Short
Take the unsigned multiplication 13 × 11 in 4-bit binary:
```
    1101    (multiplicand A = 13)
  × 1011    (multiplier B = 11)
  ------
    1101    (B0 = 1, add A)
   11010    (B1 = 1, add A << 1)
  000000    (B2 = 0, skip)
 1101000    (B3 = 1, add A << 3)
--------
10001111    (= 143)
```
Three additions for three 1-bits in B. Now consider B = 1111 — four additions, all of A shifted by 0, 1, 2, 3. As B’s population count grows, so does the number of additions.
Booth observed: 1111 = 10000 − 1. So A × 1111 = (A << 4) − A. One left-shift plus one subtract instead of four additions. For a run of 1s of length k, the savings are k − 2 operations. Modern CPUs perform billions of multiplications per second; the savings compound.
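The run-of-ones identity is easy to sanity-check in Python (a quick illustrative snippet, not part of any hardware design):

```python
# Booth's identity: a run of ones from bit i to bit j equals 2**(j+1) - 2**i,
# so multiplying by it is one shift plus one subtract instead of j - i + 1 adds.
A = 13
assert A * 0b1111 == (A << 4) - A              # four adds become shift + subtract
assert A * 0b01111100 == (A << 7) - (A << 2)   # run spanning bits 2..6
```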
Booth’s Algorithm
Pad the multiplier B on the right with an implicit zero, giving a sequence of n + 1 bits. Scan adjacent bit pairs from least to most significant. At each step, examine the pair (Bᵢ, Bᵢ₋₁) and accumulate one of three actions on the partial product P:
| Bᵢ | Bᵢ₋₁ | Action | Meaning |
|---|---|---|---|
| 0 | 0 | none | Inside a run of zeros |
| 0 | 1 | Add A | End of a run of ones |
| 1 | 0 | Subtract A | Start of a run of ones |
| 1 | 1 | none | Inside a run of ones |
After all n bit pairs are processed, P holds the signed 2n-bit product.
The intuition: when you transition from 0 to 1 (reading right to left, so (Bᵢ, Bᵢ₋₁) = (1, 0)), you’ve started a run of 1s. Booth subtracts A · 2ⁱ at this position — which is equivalent to adding −A · 2ⁱ — anticipating that the run will continue. When the run ends (transition from 1 to 0, so (Bᵢ, Bᵢ₋₁) = (0, 1)), Booth adds back the current shifted A · 2ⁱ, completing the “add big-shift, subtract small-shift” pair that replaces the whole run of additions.
The standard implementation runs the recurrence iteratively. State variables: an n-bit accumulator P, the multiplicand A, the multiplier B, and an extra “Booth bit” b₋₁, initially 0. Each iteration:
- Examine the pair (B₀, b₋₁).
- Apply the add, subtract, or no-op to P, the high n bits of the product register.
- Arithmetic-shift the combined register (P, B, b₋₁) right by one position; the bit shifted out of P’s LSB becomes the new MSB of B, and the bit shifted out of B becomes the new b₋₁.
After n iterations, P concatenated with B is the signed 2n-bit product.
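The iteration above can be sketched as a short software model. This is an illustrative Python sketch, not hardware (the function name and the guard-bit choice are mine): P carries one extra bit beyond the n-bit accumulator described above, so that subtracting the most negative multiplicand (−2ⁿ⁻¹) cannot overflow the adder.

```python
def booth_multiply(a: int, b: int, n: int = 4) -> int:
    """Signed n-bit Booth multiply; returns the signed 2n-bit product.

    P carries one guard bit so that subtracting the most negative
    multiplicand (-2**(n-1)) cannot overflow. A and B must fit in n
    signed bits.
    """
    m = n + 1                                 # accumulator width with guard bit
    P = 0                                     # accumulator
    B = b & ((1 << n) - 1)                    # two's-complement multiplier
    q = 0                                     # Booth bit b_-1, initially 0
    for _ in range(n):
        if (B & 1, q) == (1, 0):              # start of a run of ones
            P = (P - a) & ((1 << m) - 1)      #   subtract A
        elif (B & 1, q) == (0, 1):            # end of a run of ones
            P = (P + a) & ((1 << m) - 1)      #   add A
        # Arithmetic right shift of the combined (P, B, q) register.
        q = B & 1
        B = (B >> 1) | ((P & 1) << (n - 1))
        P = (P >> 1) | (P & (1 << (m - 1)))   # replicate P's sign bit
    product = (P << n) | B                    # (m + n)-bit combined value
    if product & (1 << (m + n - 1)):          # interpret as signed
        product -= 1 << (m + n)
    return product
```

booth_multiply(-3, 5) returns −15, matching the worked example below; the guard bit also makes booth_multiply(-8, -8) come out as 64, a case where a plain 4-bit accumulator would overflow.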
Worked Example: −3 × 5 in 4-Bit Two’s Complement
Operands: A = −3 = 1101, B = 5 = 0101. Expected product: −15 = 1111 0001 (8-bit two’s complement).
Initial state. Use a 4-bit accumulator P = 0000 and the 4-bit B = 0101 with a Booth bit b₋₁ = 0 appended:
| Step | P | B | b₋₁ | Action |
|---|---|---|---|---|
| 0 | 0000 | 0101 | 0 | (initial) |
| 1 | 0011 | 0101 | 0 | (B₀, b₋₁) = (1, 0): subtract A |
| | 0001 | 1010 | 1 | shift right (arithmetic) |
| 2 | 1110 | 1010 | 1 | (B₀, b₋₁) = (0, 1): add A |
| | 1111 | 0101 | 0 | shift right |
| 3 | 0010 | 0101 | 0 | (B₀, b₋₁) = (1, 0): subtract A |
| | 0001 | 0010 | 1 | shift right |
| 4 | 1110 | 0010 | 1 | (B₀, b₋₁) = (0, 1): add A |
| | 1111 | 0001 | 0 | shift right |
Final result: P concatenated with B gives 1111 0001 = −15. The product is correct, signed, and computed in four steps.
A couple of details worth tracing:
- “Subtract A” here means add −A = 0011 (the two’s complement of 1101). Two’s complement subtraction is just addition of the negated operand — the arithmetic foundation underlying all of Booth.
- Arithmetic right shift preserves the sign bit. 1110 shifts to 1111, not 0111. This is essential for the algorithm to produce a correct signed result.
- The carry out of each add/subtract is discarded. Only the low n bits of P are retained — the result still fits because of two’s complement’s modular arithmetic.
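All three details can be checked numerically. A small illustrative snippet (the helper name asr4 is mine, chosen for this example):

```python
def asr4(x: int) -> int:
    """Arithmetic right shift of a 4-bit value: the MSB is replicated."""
    return (x >> 1) | (x & 0b1000)

assert asr4(0b1110) == 0b1111    # negative: sign bit shifts back in
assert asr4(0b0110) == 0b0011    # positive: a zero shifts in

# "Subtract A" = add A's two's complement; carries past 4 bits are dropped.
A = 0b1101                        # -3
neg_A = (~A + 1) & 0b1111         # invert all bits and add one -> 0011 (+3)
assert neg_A == 0b0011
assert (0b1111 + neg_A) & 0b1111 == 0b0010   # step 3 of the trace: 1111 - A
```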
Booth Hardware
A minimal Booth multiplier looks like:
- A multiplicand register holding A (and a precomputed −A, or a circuit to negate on the fly).
- A product accumulator holding P (high half) and B (low half) — typically implemented as one wide shift register that also holds the Booth bit and shifts right by one each step.
- An adder/subtractor — one ALU operating on the high half of the accumulator and either +A or −A.
- A 2-bit Booth decoder examining (B₀, b₋₁) and producing control signals: enable add, enable subtract, or no-op.
- A counter to drive the FSM through n iterations.
The control unit cycles: decode, conditionally accumulate, shift. After n shifts, the answer sits in the accumulator. A textbook implementation needs n clock cycles per multiplication; pipelined or array implementations process all bits in parallel and complete in one or a few cycles.
Modified Booth (Radix-4)
Standard Booth examines two bits and processes one bit per iteration. Modified Booth (also called Radix-4 Booth or Booth-2) examines three bits and processes two per iteration, halving the number of iterations. The encoding:
| Action | |||
|---|---|---|---|
| 0 | 0 | 0 | |
| 0 | 0 | 1 | |
| 0 | 1 | 0 | |
| 0 | 1 | 1 | |
| 1 | 0 | 0 | |
| 1 | 0 | 1 | |
| 1 | 1 | 0 | |
| 1 | 1 | 1 |
The set of multiples needed is {−2A, −A, 0, +A, +2A}. Multiplication by 2 is a left shift, so the only “real” operation per iteration is an add or subtract — no general multiplication is required. For a 32-bit operand, modified Booth runs in 16 iterations instead of 32, with the same per-step delay. This is the encoding used inside most commercial CPU multipliers.
Higher radices exist (Radix-8, Radix-16) but the multiples needed grow non-trivially — 3A doesn’t reduce to a simple shift — so they’re rarely worth the complexity.
Comparison: Shift-and-Add vs Booth vs Modified Booth
| Algorithm | Iterations (n-bit) | Signed support | Operations per iter |
|---|---|---|---|
| Shift-and-add (unsigned) | n | No | 0 or 1 add |
| Two’s complement shift-and-add | n | Yes (with sign extension) | 0 or 1 add; last is a subtract |
| Booth | n | Yes | 0, 1 add, or 1 subtract |
| Modified Booth | n/2 | Yes | 0, 1 add, or 1 subtract (one of ±A, ±2A) |
Modified Booth’s halving of iterations, combined with native signed support, makes it the standard for fixed-point integer multiplication in CPUs.
Beyond Sequential Multipliers: Wallace and Dadda Trees
Modern high-throughput multipliers don’t iterate at all. They generate all partial products simultaneously using AND gates (or, for signed, modified Booth encoders), then sum them with a tree of carry-save adders. Wallace and Dadda trees are the classic structures, both achieving O(log n) depth. The final reduction uses a carry-lookahead adder to produce the final sum from the carry-save outputs. The whole multiplier completes in a single cycle at modern clock rates.
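The building block of these trees is the carry-save adder, a 3:2 compressor that turns three addends into two without propagating any carries. A small illustrative model (names mine, not from any library):

```python
def csa(x: int, y: int, z: int) -> tuple[int, int]:
    """3:2 compressor: reduce three addends to a (sum, carry) pair with no
    carry propagation -- every bit position is computed independently."""
    s = x ^ y ^ z                              # per-bit sum
    c = ((x & y) | (x & z) | (y & z)) << 1     # per-bit majority = carries
    return s, c

# One reduction layer: s + c preserves the running total exactly.
parts = [13 << i for i in (0, 1, 3)]   # partial products of 13 * 0b1011
s, c = csa(*parts)
assert s + c == sum(parts) == 13 * 0b1011   # == 143
```

A Wallace or Dadda tree repeats this layer until only two numbers remain, then one carry-lookahead addition finishes the job — which is why tree depth is logarithmic in the number of partial products.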
But these massively parallel designs still use modified Booth at the partial-product generation stage to halve the number of partial products. The algorithmic insight from 1951 is still pulling its weight in 2026.
Building a Booth Multiplier in DigiSim
A working iterative Booth multiplier needs an FSM controller, a 4-bit register acting as the accumulator, a shift register holding the multiplier with the Booth bit appended, an adder/subtractor, and a counter to terminate after n iterations.
A concrete starting point: open the 4-bit shift register SISO template and use it as the basis for the multiplier register. Add the ALU component for the conditional add/subtract, wire in a small FSM for the controller, and you have a complete sequential Booth multiplier. The simulator’s stepper lets you single-step through the worked example above and watch each iteration produce the same intermediate states the table shows.
Common Pitfalls
- Forgetting the Booth bit. b₋₁ starts at 0 and is essential for the very first iteration’s bit-pair decision. Drop it and the algorithm misfires on the LSB.
- Logical vs arithmetic right shift. The shift on P must be arithmetic (sign-extending). A logical shift produces a wrong result for negative accumulators.
- Sign extension on the partial sum. Some implementations sign-extend A and the partial product to 2n bits before adding and skip the arithmetic-shift trick. Either approach works; mixing them silently breaks correctness.
- Modified Booth radix confusion. Don’t conflate “examines 3 bits” with “processes 3 bits per step.” Modified Booth examines a 3-bit overlapping window and processes 2 bits per step.
What’s Next
The next post in this series, Two’s Complement Explained: Signed Binary Arithmetic, covers the signed representation that makes Booth’s subtract-as-negate trick possible — and explains why −x in two’s complement is just “invert all bits and add one.” Following that, Carry-Lookahead Adder: Faster Than Ripple-Carry attacks the addition speed problem with a different algorithmic angle.
To experiment with sequential multiplication in the simulator, open the 4-bit shift register SISO template, extend it into an 8-bit accumulator, attach an ALU for the conditional add/subtract, and step through the trace from this post. Watching the accumulator evolve cycle-by-cycle is the fastest way to internalize what Booth’s algorithm actually does.