您将学到什么

  • Build a 4-bit difference detector using parallel per-bit XOR.
  • Reduce per-bit results to a single "any different" signal via OR.
  • Use an oscilloscope to visualise multiple bit signals in parallel.
  • Recognise the pattern: bitwise parallel XOR + log-depth reduction = equality comparator.
  • Apply this pattern to CPU compares, cache tag matching, and ECC.

工作原理

This circuit extends the single-bit difference detector to multiple bits, with an oscilloscope to visualize the per-bit and aggregate signals over time. The pattern is simple but powerful:

- For each bit pair (Ai, Bi), an XOR gate produces Diff_i = Ai ⊕ Bi. - An OR-reduction across all Diff_i produces Any_Diff = OR of all per-bit differences. - Optionally, AND-reducing inverted Diff_i produces All_Equal = AND of all per-bit XNORs (zero only when at least one bit differs).

The oscilloscope traces show how each XOR responds independently and how the OR-reduction lights up whenever any bit changes. This is exactly the topology of an N-bit equality comparator inside a CPU.

Key property: XOR-based comparison is bitwise parallel. Every bit's comparison happens at the same time — there's no carry chain like in addition. The slowest part is the OR (or AND) tree at the end, which is logarithmic in width.

For 4-bit comparison, the depth is 1 (XOR layer) + ceil(log2(4)) (OR tree) = 3 gate delays — independent of bit width by a logarithmic factor.

真值表

Showing representative 4-bit comparisons. Diff is OR of all per-bit XORs; All-Equal is its inverse.

输入 输出
A3..A0B3..B0 DiffAll-Equal
00 01 0000 vs 0000 — match
01 10 0000 vs 0001 — differ in bit 0
10 10 Any single-bit difference fires Diff
11 01 0001 vs 0001 — match

布尔表达式

Diffi=AiBi\text{Diff}_i = A_i \oplus B_i

Per-bit difference — one XOR per bit.

Any_Diff=i=0N1Diffi\text{Any\_Diff} = \bigvee_{i=0}^{N-1} \text{Diff}_i

OR-reduction across all per-bit differences. High if any bit differs.

All_Equal=i=0N1Diffi=i=0N1(AiBi)\text{All\_Equal} = \bigwedge_{i=0}^{N-1} \overline{\text{Diff}_i} = \bigwedge_{i=0}^{N-1} (A_i \odot B_i)

AND-reduction of per-bit XNORs. High only when every bit matches.

逐步尝试

在上方嵌入式电路中设置输入,然后阅读预期结果并验证。

  1. 1
    A = 0000 B = 0000
    预期: Diff = 0
    您将看到: All A and B switches match (both numbers 0). Every XOR is 0; the OR-reduce is 0; All-Equal lights up.
  2. 2
    A = 0001 B = 0000
    预期: Diff = 1
    您将看到: Flip just A0. The XOR for bit 0 fires; the OR-reduce sees a 1 and fires Diff. All-Equal goes dark.
  3. 3
    A = 1010 B = 1010
    预期: Diff = 0
    您将看到: Both switches set to 10 (decimal) — every bit matches, so the comparator declares equality.
  4. 4
    A = 0101 B = 1010
    预期: Diff = 1
    您将看到: Inputs are bit-by-bit inverses. Every XOR fires; Diff is high, All-Equal is low. Maximum mismatch.

使用的组件

实际应用

CPU compare instructions. "Are these two registers equal?" is implemented as bitwise XNOR followed by an AND-tree. Sets the zero flag in one cycle.

Cache tag matching. When a CPU looks up an address in cache, the address tag is XOR-compared with stored tags. A match (zero XOR result) means cache hit.

Branch prediction comparison. Indirect branches compare predicted vs. actual targets bitwise — the XOR result drives a misprediction signal.

Memory parity / ECC. Per-byte XOR computes a parity bit; per-codeword XOR over a syndrome-mask matrix computes ECC syndromes for single-error detection / correction.

Network checksum offload. Hardware NICs compute Internet Protocol checksums using XOR (and addition) trees over packet bytes. The XOR portion runs in a single clock for entire packets.

常见问题

Why XOR per bit and then OR? Couldn't I subtract and check for zero?
Subtraction would also work but requires an N-bit adder (with carries) — much slower and bigger. XOR is bitwise parallel: every bit's compare is independent, no carry chain. For an equality test, XOR + reduction is much faster than subtraction.
What's the gate-delay depth of an N-bit comparator?
1 XOR delay (parallel per bit) + ceil(log2 N) OR-tree delay. For 4 bits: 1 + 2 = 3 gate delays. For 64 bits: 1 + 6 = 7 gate delays. Comparators stay fast even for very wide buses.
Can I extend to less-than or greater-than comparisons?
Yes but it's harder — magnitude comparison needs ordered logic, not just bitwise. Typically subtract A − B and check the sign bit, or use a chain of priority logic from MSB downward. A pure XOR tree only checks equality.
What's an oscilloscope showing in this circuit?
Multi-channel signal traces over time. Each XOR's output is a separate channel; the OR-reduce is another. Watching the traces while you flip switches shows how XORs respond independently and how the OR fires on the union.
Why is this faster than a sequential bit-by-bit compare?
Sequential compare needs N clock cycles; this combinational compare resolves in O(log N) gate delays — usually well under one clock cycle. Hardware compare is typically combinational because the answer is needed within a cycle.

继续学习