Designing a trivial CPU, part 3: high level hardware

Sat, Oct 4, 2025

Go to the first post in this series.

Introduction

In the first two parts of this series, I talked about picking the instruction set for my trivial processor. So far, however, I haven’t mentioned how the processor actually works to execute those instructions. In the next few posts, I’m going to talk about the design and control of the circuitry in the processor.

The Loop

As I mentioned in part 1, the texts I’ve read have omitted several important steps in describing a CPU, and this is the first one of those. The CPU works by the interaction of four principal components:

the registers
the ALU
the bus(es)
the instruction decoder

Overview of main CPU components

The registers store state, and are the only component which does this. Registers are designed so that their input and output are decoupled: changing the input data to a register doesn’t change the output data, until the register is both enabled for write, and a clock tick happens.
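To make the decoupling concrete, here is a minimal Python sketch of such a register. The class name, the attribute names and the 8-bit width are assumptions for illustration only, not part of the design described so far.

```python
class Register:
    """Register whose output only changes on a clock tick while write-enabled."""

    def __init__(self, width=8):
        self.mask = (1 << width) - 1  # width of 8 bits is an assumption
        self.data_in = 0              # value currently presented on the input
        self.write_enable = False     # control signal from the decoder
        self.data_out = 0             # stored state, visible on the output

    def tick(self):
        # On the clock tick, latch the input only if the register is enabled for write.
        if self.write_enable:
            self.data_out = self.data_in & self.mask


r = Register()
r.data_in = 0x2A
r.tick()                        # tick without enable: output unchanged
after_tick_only = r.data_out
r.write_enable = True
after_enable_only = r.data_out  # enabled, but no tick yet: still unchanged
r.tick()
after_both = r.data_out         # enabled and ticked: input is now stored
```

Only the last step changes the output, which is what lets a register feed the ALU and receive the ALU's result in the same cycle.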

The ALU (arithmetic and logic unit) operates using only combinatorial logic¹, and will take one or more inputs, and generate one or more outputs. In this design, I’m including the memory and I/O interfaces as part of the ALU (because I’m designing for synchronous memory).

Between the registers and the ALU is a bus (or several), which serves to move data from the registers to the ALU inputs, and from the ALU outputs to the registers.

The instruction decoder takes an instruction loaded from memory, and turns that into a set of control signals which connect registers to the ALU, and select options within the ALU.

Moving data with buses

Consider an addition operation: ADD x, y. This takes x and y, adds them together, and puts the result into x. So we need to move x and y to the ALU, and then move the result back to x. But a bus can only move one value at a time. There are some trade-offs we have to make here: at one extreme, we could have three buses (one for each parameter, and one for the result). At the other extreme, we could have a single bus, and use registers to store the parameter values. The first option is faster – it only takes one clock cycle to feed all the parameters to the ALU and get the result out again. The second option is conceptually simpler, but slower – it needs three clock cycles to set the parameters and retrieve the result. The single-bus option also costs more in registers, because it needs several extra ones to hold the parameter values.
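The single-bus extreme can be sketched in Python as three steps over one shared bus. This is only an illustration: the latch names a_latch and b_latch and the 8-bit width are hypothetical, and the order in which the parameters are loaded is an arbitrary choice.

```python
MASK = 0xFF  # assumed 8-bit data path


def add_single_bus(regs, dst, src):
    """ADD dst, src over one shared bus, using two extra operand latches."""
    cycles = 0
    # Cycle 1: dst -> bus -> first operand latch
    bus = regs[dst]
    regs["a_latch"] = bus
    cycles += 1
    # Cycle 2: src -> bus -> second operand latch
    bus = regs[src]
    regs["b_latch"] = bus
    cycles += 1
    # Cycle 3: adder output -> bus -> dst
    bus = (regs["a_latch"] + regs["b_latch"]) & MASK
    regs[dst] = bus
    cycles += 1
    return cycles


regs = {"x": 7, "y": 5, "a_latch": 0, "b_latch": 0}
cycles = add_single_bus(regs, "x", "y")
```

With three buses, all three transfers would happen in the same cycle and neither latch would be needed.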

In practice, I’m going to go with something in between. It’s going to be helpful for other reasons to have a single temporary register, t. If we do that, we can hook it directly into one input parameter of the adder, and into the address lines of the memory/IO unit, and then we only need two buses: one input and one output.
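One plausible way the two-bus arrangement could execute ADD x, y is sketched below in Python. The exact step sequence is an assumption made for illustration (it uses a passthrough to load t), as are the function names and the 8-bit width.

```python
MASK = 0xFF  # assumed 8-bit data path


def passthrough(a):
    return a


def adder(a, t):
    # The second operand is wired straight from the t register, not from a bus.
    return (a + t) & MASK


def add_via_t(regs, dst, src):
    """Hypothetical two-cycle sequence for ADD dst, src using the t register."""
    # Cycle 1: src -> input bus -> passthrough -> output bus -> t
    input_bus = regs[src]
    output_bus = passthrough(input_bus)
    regs["t"] = output_bus
    # Cycle 2: dst -> input bus -> adder (other operand from t) -> output bus -> dst
    input_bus = regs[dst]
    output_bus = adder(input_bus, regs["t"])
    regs[dst] = output_bus
    return regs


regs = add_via_t({"x": 200, "y": 100, "t": 0}, "x", "y")
```

In this sketch the addition takes two cycles rather than one or three, and 200 + 100 wraps around to 44 because the result is truncated to 8 bits.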

ALU Operations

What goes into the ALU? Well, we need to be able to add and subtract things – that’s the adder unit (which can do both operations, with small modifications). We need a simple shifter to handle the right-shifts. And we need something to handle getting data in and out of memory and I/O – the memory unit. The fourth thing, which seems relatively pointless but turns out to be quite useful, is a passthrough, which just connects its input parameter to its output. We can use the passthrough to move data directly between registers without modifying it.
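As a rough Python sketch of three of these units (the memory unit is left out), assuming an 8-bit data path: the adder subtracts using the usual two's-complement trick of inverting the second operand and adding one, which is one common form the "small modifications" can take. The function names and return shapes are illustrative, not taken from the design.

```python
MASK = 0xFF  # assumed 8-bit data path


def adder(a, b, subtract=False):
    """Add or subtract with the same circuit: a - b is computed as a + ~b + 1."""
    if subtract:
        b = ~b & MASK   # invert the second operand...
        carry_in = 1    # ...and inject a carry to complete the two's complement
    else:
        carry_in = 0
    total = a + b + carry_in
    return total & MASK, total >> 8  # (result, carry out)


def shifter(a):
    """Shift right by one bit; the bit that falls off is returned separately."""
    return a >> 1, a & 1


def passthrough(a):
    """Connect the input straight to the output."""
    return a


total, carry = adder(5, 3)
diff, carry_after_sub = adder(5, 3, subtract=True)
half, dropped_bit = shifter(0b00000101)
moved = passthrough(0x42)
```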

Other registers: PC and PSR

We need two more full registers – the program counter and the instruction register – and a few more bits of state – the processor status register.

Program counter

The program counter, pc, is a register which holds the address of the next instruction to execute, and which is normally incremented on every instruction. It may also be incremented while an instruction is being processed, if the instruction takes an 8-bit parameter (such as ADD x, #5 or OUT (k), x). Finally, pc may be set directly by an instruction – in our case, by a JMP instruction.
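The three ways pc can change are sketched below in Python. The opcode values, the 8-bit pc width and the idea that JMP takes its target as an 8-bit parameter are all assumptions made up for this sketch, not the real encoding.

```python
# Hypothetical opcode values, chosen only for this sketch.
NOP, ADD_IMM, JMP = 0x00, 0x01, 0x02
HAS_PARAMETER = {ADD_IMM, JMP}


def next_pc(pc, memory):
    """Return the value of pc after executing the instruction at memory[pc]."""
    opcode = memory[pc]
    pc = (pc + 1) & 0xFF          # normal increment, once per instruction
    parameter = None
    if opcode in HAS_PARAMETER:
        parameter = memory[pc]
        pc = (pc + 1) & 0xFF      # extra increment to step over the parameter
    if opcode == JMP:
        pc = parameter            # JMP sets pc directly
    return pc


# Program: NOP; ADD x, #5; JMP 0
memory = [NOP, ADD_IMM, 5, JMP, 0]
trace = [0]
for _ in range(3):
    trace.append(next_pc(trace[-1], memory))
```

Here trace records pc before each instruction: it advances by one for NOP, by two for the instruction with a parameter, and returns to 0 after the jump.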

Instruction register

The instruction register, ins, is simply a copy of the currently-executing opcode. This is set at the beginning of executing each instruction, and is used by the instruction decoder to control what gets done for that instruction.

Processor status register

The processor status register, psr, consists of three bits of data, which indicate whether the result of the last instruction was negative (the N bit), was zero (the Z bit), or produced a carry (the C bit). The C bit is also used as input to the ADC, SBC and RRC instructions. The psr isn’t connected to the buses, but instead has direct connections to and from the adder and shifter.
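A small Python sketch of how the three bits might be derived from an adder result, and how C feeds back into ADC. It assumes 8-bit values with N taken from the top bit; the function names and the dictionary representation of psr are illustrative only.

```python
MASK = 0xFF  # assumed 8-bit data path


def update_psr(result, carry_out):
    """Derive the N, Z and C bits from an 8-bit result and the carry out."""
    return {
        "N": (result >> 7) & 1,   # top bit set means negative in two's complement
        "Z": int(result == 0),
        "C": carry_out,
    }


def adc(a, b, psr):
    """Add with carry: the current C bit is fed into the adder as carry in."""
    total = a + b + psr["C"]
    result = total & MASK
    return result, update_psr(result, total >> 8)


# Adding 0x00FF + 0x0001 as two 8-bit halves, low byte first.
psr = {"N": 0, "Z": 0, "C": 0}
low, psr_after_low = adc(0xFF, 0x01, psr)
high, psr_after_high = adc(0x00, 0x00, psr_after_low)
```

The carry out of the low byte is what makes the high byte come out as 1, which is the usual reason for feeding C back into the adder.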

Overview

Putting all of this together, we end up with a plan that looks like the picture below:

Detail of CPU design

The seven main registers (of which four are accessible to the programmer) are connected to a bus, which feeds a value from one of the registers into one of four processing units. The processing units feed a result back into the registers. There are special connections from the t register to the adder and the memory/IO unit. There is a separate unit for incrementing the pc register, and another for updating the psr bits.

“Above” this structure, and not shown here, the instruction decoder reads from the ins register and sends control signals to all of the components to ensure that they do things in the right order.

Conclusion

In this post, we’ve seen what the high-level components of the CPU are, and how they relate to each other. In the next post, I’ll start showing how the architecture in this post can be controlled, step by step, to execute the instructions we laid out in the first two parts, and go through some of the options for implementing the instruction decoder.

¹ Well, you could have an ALU that uses clock ticks and state registers internally for some operations, like multiplication. But I’m trying to keep things simple.