## Example: `addi`
```mips
addi $8, $0, 5
```
001000 00000 01000 0000000000000101
- **WB:** result from ALU (5), written to register $8 (specified by rt)
- RegWrite set to 1
- RegDst set to 0
- MemtoReg set to 0
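
The field breakdown of the encoding above can be checked with a short Python sketch. The helper name `decode_i_format` is hypothetical (not part of the lecture); it assumes only the standard MIPS I-format layout of a 6-bit opcode, 5-bit rs, 5-bit rt, and 16-bit immediate.

```python
# Hypothetical helper: split a 32-bit MIPS I-format word into its fields.
def decode_i_format(word: int) -> dict:
    return {
        "opcode": (word >> 26) & 0x3F,  # bits 31-26
        "rs": (word >> 21) & 0x1F,      # bits 25-21
        "rt": (word >> 16) & 0x1F,      # bits 20-16
        "imm": word & 0xFFFF,           # bits 15-0 (before sign extension)
    }

# Encoding of `addi $8, $0, 5`, copied from the slide above.
word = int("001000" "00000" "01000" "0000000000000101", 2)
fields = decode_i_format(word)
print(fields)  # {'opcode': 8, 'rs': 0, 'rt': 8, 'imm': 5}
```

The destination register number comes out of the `rt` field, which is why RegDst is 0 for `addi`.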
# Supporting More Instructions
## Load Instruction
Let's describe what happens for a load operation like
```mips
lw $8, 4($9)
```
100011 01001 01000 0000000000000100
- IF:
- ID:
- EX:
- MEM:
- WB:
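
As a starting point for filling in the stages, here is a minimal Python sketch of the data flow for `lw`. It models registers as a list and data memory as a dictionary; the function names and the sample values (`0x1000`, `42`) are made up for illustration, and control signals are not modeled.

```python
def sign_extend16(imm: int) -> int:
    """Interpret a 16-bit field as a signed two's-complement value."""
    return imm - 0x10000 if imm & 0x8000 else imm

def execute_lw(regs: list, data_mem: dict, rs: int, rt: int, imm: int) -> None:
    base = regs[rs]                    # ID: read the base register (rs)
    addr = base + sign_extend16(imm)   # EX: ALU adds base and sign-extended offset
    value = data_mem[addr]             # MEM: read data memory at that address
    regs[rt] = value                   # WB: write the loaded value into rt

regs = [0] * 32
regs[9] = 0x1000            # assumed base address held in $9
data_mem = {0x1004: 42}     # assumed memory contents
execute_lw(regs, data_mem, rs=9, rt=8, imm=4)   # lw $8, 4($9)
print(regs[8])  # 42
```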
## Exercise: Store Instruction
Describe what happens for a store operation like
```mips
sw $8, 4($9)
```
101011 01001 01000 0000000000000100
- IF:
- ID:
- EX:
- MEM:
- WB:
## Exercise: Branches
Describe what happens for a branch instruction like
```mips
beq $8, $9, 5 # branch ahead 5 instructions if $8 == $9
```
000100 01000 01001 0000000000000101
- IF:
- ID:
- EX:
- MEM:
- WB:
# Performance Issues
## Performance Issues
- Longest delay determines clock period
- Some stages of the datapath are idle waiting for others to finish
- Can improve performance by **pipelining**
# Pipelining
## Pipeline Analogy
- Suppose you need to do four loads of laundry
- Each load of laundry needs to be
1. Washed via the washing machine
2. Dried via the dryer
3. Folded
4. Put away in the closet
- For simplicity, assume that each task takes 30 minutes
## Pipeline Analogy
- How long does it take to complete four loads?
- One approach uses only one stage at a time and does nothing in parallel:

- Notice that the washer is unused 3/4 of the time
## Pipeline Analogy
- Another approach is harnessing parallelism by running independent stages simultaneously

- How much of a speedup does this approach give us?
+ $8/3.5 \approx 2.3\times$ speedup (8 hours sequentially vs. 3.5 hours pipelined)
+ $2n/0.5n = 4\times$ speedup if running continuously
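
The laundry numbers can be reproduced with a small Python sketch. The function names are hypothetical; the only assumptions are the ones stated on the slides (four tasks per load, 30 minutes each) plus the idea that, once the pipeline is full, one load finishes every 30 minutes.

```python
def sequential_hours(loads: int, stages: int = 4, stage_hours: float = 0.5) -> float:
    # Every load runs all stages before the next load starts.
    return loads * stages * stage_hours

def pipelined_hours(loads: int, stages: int = 4, stage_hours: float = 0.5) -> float:
    # The first load needs `stages` steps; each later load finishes one step after.
    return (stages + loads - 1) * stage_hours

seq = sequential_hours(4)    # 8.0 hours
pipe = pipelined_hours(4)    # 3.5 hours
speedup = seq / pipe
print(seq, pipe, round(speedup, 1))  # 8.0 3.5 2.3
```

For a large number of loads the ratio approaches the number of stages, which is where the $4\times$ figure comes from.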
## Pipelined Datapath

## Pipelined Datapath
- Five stages:
1. **IF**: Instruction Fetch
2. **ID**: Instruction Decode
3. **EX**: Execute
4. **MEM**: Memory access
5. **WB**: Write back
## Pipeline Performance
- Assume time for stages is:
+ `$100\text{ps}$` for register read/write
+ `$200\text{ps}$` for other stages

## Without a Pipeline

- Why must the clock be set to `$800\text{ps}$` when some instructions like `beq` could be completed in `$500\text{ps}$`?
+ The clock period is set by the **slowest** instruction: `lw`
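
A rough Python sketch of where the `$800\text{ps}$` and `$500\text{ps}$` figures come from. The stage times are the ones assumed above; the table of which stages each instruction uses is an assumption based on the usual textbook breakdown, and the names `STAGE_PS` and `USES` are made up for this example.

```python
# Stage times from the slide: 100 ps for register read/write, 200 ps otherwise.
STAGE_PS = {"IF": 200, "ID": 100, "EX": 200, "MEM": 200, "WB": 100}

# Assumed: stages that do useful work for each instruction class.
USES = {
    "lw":  ["IF", "ID", "EX", "MEM", "WB"],
    "sw":  ["IF", "ID", "EX", "MEM"],
    "add": ["IF", "ID", "EX", "WB"],
    "beq": ["IF", "ID", "EX"],
}

latency = {name: sum(STAGE_PS[s] for s in stages) for name, stages in USES.items()}
clock_period = max(latency.values())  # single-cycle clock must fit the slowest one

print(latency)       # {'lw': 800, 'sw': 700, 'add': 600, 'beq': 500}
print(clock_period)  # 800
```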
## With a Pipeline

- How much of a speedup does this approach give us?
+ `$2400/1400 \approx 1.7\times$` speedup
+ `$800n/200n = 4\times$` if running continuously
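
Both figures follow from one general expression. As a sketch, assume $n$ instructions, a five-stage pipeline with a `$200\text{ps}$` clock, and no stalls (the totals of 2400 and 1400 are consistent with $n = 3$):

$$
T_\text{single}(n) = 800n, \qquad T_\text{pipe}(n) = 200\,(5 + n - 1) = 200\,(n + 4)
$$

$$
\text{speedup}(n) = \frac{800n}{200\,(n + 4)} = \frac{4n}{n + 4}, \qquad \text{speedup}(3) = \frac{12}{7} \approx 1.7, \qquad \lim_{n \to \infty} \text{speedup}(n) = 4
$$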
## Pipeline Performance
- Does using a pipeline increase the efficiency of executing **individual** instructions?
+ No, it slows them down from `$800\text{ps}$` to `$1000\text{ps}$`
+ Performance benefits come from increased **throughput** due to the parallelism
## Why MIPS is Good for Pipelining
- All MIPS instructions are the **same length**
+ Easy to fetch instruction in cycle 1
+ Easy to decode instruction in cycle 2
- MIPS has only **a few instruction formats**
+ Register fields are always in the same location
+ Easy to decode instructions
# Hazards
## Hazards
- Up until now, we have pretended that each instruction is **independent** of the others and that there are no conflicts
- In reality, instructions often depend on previous ones, which may cause naive pipelining to fail
### Exercise
```mips
add $s0, $t0, $t1
sub $t2, $s0, $t3
```
- In what stage does `add` write its result to `$s0` in the register file?
- What stage does `sub` read from `$s0`?
- Why is this a problem?
- Can you think of any ways to fix this?
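
One way to explore these questions is to lay out which clock cycle each instruction spends in each stage. The Python sketch below assumes one instruction enters the pipeline per cycle with no stalls or other hazard handling; `stage_cycles` is a hypothetical helper, not part of the lecture.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def stage_cycles(issue_index: int) -> dict:
    """Map each stage to the 1-based clock cycle in which the instruction
    occupies it, assuming one instruction issues per cycle and no stalls."""
    return {stage: issue_index + k + 1 for k, stage in enumerate(STAGES)}

add_cycles = stage_cycles(0)  # add $s0, $t0, $t1
sub_cycles = stage_cycles(1)  # sub $t2, $s0, $t3

print("add WB in cycle", add_cycles["WB"])  # add WB in cycle 5
print("sub ID in cycle", sub_cycles["ID"])  # sub ID in cycle 3
```

Comparing the two cycle numbers shows the ordering problem the exercise is asking about.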