# Pipeline Hazards And Cache

---

CS 130 // 2022-11-10

# Announcements

## Installing stuff for C programming

Before you come to class next week: [install guide](../../resources/installing-visual-studio-code)

- install Visual Studio Code (IDE for writing code)
- install C/C++ extensions for VSC
- install a C compiler
- try compiling a C program
- do your best to troubleshoot issues
- bring any remaining issues to class

## Exam 3 has been posted

- Due 11/22/2022 (Tuesday before Thanksgiving)
- I'm giving you a lot of time
- please plan ahead and give yourself time to complete it
- I will cancel class some time between now and then to give you time to work on the exam
- I still don't know if/when I have to report for jury duty
- I will let you know *which day* we have off as soon as I can

## Office Hour Adjustments

I need to cancel office hours *today* and *next Tuesday*

- both cancelations are for very important Drake business
- I'm very sorry!
- I will try to help as much as I can via email/Teams

# Review

## Review Discussion

- what is pipelining?
- what is a pipeline hazard?
- what are the different types of pipeline hazards?
- what are some ways to deal with pipeline hazards?

# Pipeline Hazard Exercises

## Data Hazard Exercise

- Consider the following MIPS code:

```mips
lw $t0, 40($a3)
add $t6, $t0, $t2
sw $t6, 40($a3)
```

- Assuming there is no forwarding implemented, are any stalls necessary?
- How many clock cycles are required to execute these three lines of code without forwarding?

## Data Hazard Exercise

- Consider the following MIPS code:

```mips
lw $t0, 40($a3)
add $t6, $t0, $t2
sw $t6, 40($a3)
```

- Assuming there IS forwarding implemented, are any stalls necessary?
- How many clock cycles are required to execute these three lines of code with forwarding?

# Pipelined CPU Design

## Pipelined Control Complete
### Pipelined Architecture with Hazard Detection and Forwarding
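The data hazard exercises above come down to one check that the hazard detection unit performs every cycle. Below is a minimal C sketch of the classic load-use test for the standard five-stage MIPS pipeline (IF, ID, EX, MEM, WB), plus the usual cycle-count formula for a pipeline with stall cycles added. The struct and function names (`IdExFields`, `IfIdFields`, `load_use_stall`, `pipeline_cycles`) are hypothetical, and the sketch assumes each stall adds exactly one cycle and there is no other overhead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical view of the pipeline-register fields the hazard detection
 * unit reads. Register numbers follow MIPS (e.g., $t0 = 8, $t2 = 10). */
typedef struct {
    bool    mem_read;  /* ID/EX.MemRead: the instruction now in EX is a load */
    uint8_t rt;        /* ID/EX.RegisterRt: the register that load will write */
} IdExFields;

typedef struct {
    uint8_t rs;        /* IF/ID.RegisterRs: first source of the instruction in ID */
    uint8_t rt;        /* IF/ID.RegisterRt: second source of the instruction in ID */
} IfIdFields;

/* Returns true when the instruction in ID needs a value that a load in EX
 * has not read from memory yet. Forwarding alone cannot cover this case,
 * because the loaded data only exists at the end of MEM, so the pipeline
 * must insert a one-cycle stall (bubble). */
bool load_use_stall(IdExFields ex, IfIdFields id) {
    return ex.mem_read && (ex.rt == id.rs || ex.rt == id.rt);
}

/* Total clock cycles for a pipeline with `stages` stages: the first
 * instruction takes `stages` cycles, each later instruction finishes one
 * cycle after the previous one, and each stall adds one more cycle. */
unsigned pipeline_cycles(unsigned stages, unsigned instructions, unsigned stalls) {
    assert(stages > 0 && instructions > 0);
    return stages + (instructions - 1) + stalls;
}
```

Without forwarding, more dependencies cause stalls than just load-use, so the number of stalls you pass to `pipeline_cycles` depends on which case you are analyzing.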
# Caches

## Memory Organization

- When a program uses memory, it tends to use it in *predictable ways*
- As a result, it is possible to speed up memory usage dramatically by creating a **memory hierarchy**
  + Register file is small but it's ridiculously fast
  + SRAM is larger but slower
  + DRAM is larger still but even slower
  + Hard disks are HUGE but also the slowest

## Terminology

- **Cache:**
  + An auxiliary memory from which high-speed retrieval is possible
- **Block:**
  + The minimum unit of information that can either be present or not present in the cache
- **Hit:**
  + CPU finds what it is looking for in the cache
- **Miss:**
  + CPU doesn't find what it's looking for in the cache

## Designing a Cache

- Having a hierarchy of memories to speed up our computations is great, but we also face several design challenges
- If we are looking for the value at memory address `x`, how do we know whether it is already in the cache?

## Direct Mapped Cache

- One idea is to use a **direct mapped cache**, where the address `x` tells us exactly where to look in the cache
- If there are `n` slots in the cache, then we look for `x` at index `x % n` of the cache

## Direct Mapped Cache

- If the number of blocks in the cache is `$n = 2^k$`, then `(x % n)` is simply the last `k` bits of `x`
  + Makes it extremely efficient to find where a block is in the cache
- Since multiple blocks may map to the same index in the cache, how do we know whether the block stored there is the one we're looking for?
  + Include a **tag** that uniquely identifies the block

## Direct Mapped Cache

- Some slots in the cache can sit empty or underused if, by chance, few of the addresses we access map to their index
- How can we improve utilization?
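The direct mapped lookup described above (index = last `k` bits of the block address, then a tag check) can be sketched in C. This is a minimal model, not real hardware: it assumes addresses have already been converted to block addresses (byte offset removed), assumes 8 slots (`k = 3`), and stores only a valid bit and a tag per slot, no data. `Slot`, `dm_index`, `dm_tag`, and `dm_access` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define INDEX_BITS 3                   /* k: assumed, giving 2^3 = 8 slots */
#define NUM_SLOTS  (1u << INDEX_BITS)  /* n = 2^k */

typedef struct {
    bool     valid;  /* false until a block has been stored in this slot */
    uint32_t tag;    /* upper bits of the block address stored here */
} Slot;

/* x % n, computed as the last k bits of the block address. */
uint32_t dm_index(uint32_t block_addr) {
    uint32_t index = block_addr & (NUM_SLOTS - 1);
    assert(index == block_addr % NUM_SLOTS);  /* same as x % n when n = 2^k */
    return index;
}

/* The remaining upper bits tell apart the blocks that share an index. */
uint32_t dm_tag(uint32_t block_addr) {
    return block_addr >> INDEX_BITS;
}

/* Returns true on a hit. On a miss, the new block replaces whatever
 * block was in its slot (there is only one place it can go). */
bool dm_access(Slot cache[NUM_SLOTS], uint32_t block_addr) {
    Slot *slot = &cache[dm_index(block_addr)];
    if (slot->valid && slot->tag == dm_tag(block_addr)) {
        return true;
    }
    slot->valid = true;
    slot->tag = dm_tag(block_addr);
    return false;
}
```

For example, block addresses 22 (`10110`) and 14 (`01110`) share index `110`, so alternating between them misses on every access even while the other 7 slots sit empty, which is exactly the utilization problem raised above.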
## Fully Associative Cache

- The "extreme" alternative to direct mapped caching is **fully associative** caching
  + Any block can be stored in *any* index of the cache

---

- **Advantage**: Every slot in the cache can be used before anything has to be evicted, and therefore fewer cache misses occur
- **Disadvantage**: We need to search the entire cache on every access, so the hit time increases

## Set-Associative Cache

- The compromise between these approaches is the **set-associative** cache
  + The cache's blocks are *grouped* into sets of `n` blocks
  + The block number determines which set to use (block number % number of sets)
- Still requires searching through all `n` blocks in a set
- Can be tuned to strike a decent balance between hit rate and hit time
## Handling a Cache Miss

1. Check the cache for a memory address
   + Results in a miss
2. Fetch corresponding block from RAM or disk
   + Wait for block to be retrieved (stall)
   + Write block to cache
3. Continue pipeline (unstall)
## Multilevel Caches

- Usually, caches are implemented in multiple "levels" for maximal efficiency
  + L1 is smallest/fastest
  + L2 is larger/slower
  + L3 is largest/slowest
- Modern multi-core processors typically have dedicated L1 and L2 caches for each core and a shared L3 cache
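One common way to see why multiple levels pay off is the average memory access time (AMAT): a level's hit time, plus its miss rate times the cost of going to the next level. The C sketch below assumes each `miss_rate` is the *local* miss rate of that level (misses divided by the accesses that reach it). `CacheLevel` and `amat` are hypothetical names, and any numbers you plug in are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* One cache level; the values are whatever you assume for illustration. */
typedef struct {
    double hit_time;   /* cycles to check this level */
    double miss_rate;  /* local miss rate of this level, between 0 and 1 */
} CacheLevel;

/* AMAT = hit_L1 + miss_L1 * (hit_L2 + miss_L2 * (... + miss_Ln * memory_time))
 * Computed from the last level inward toward L1. */
double amat(const CacheLevel levels[], size_t count, double memory_time) {
    double cost = memory_time;  /* cost of missing in every cache level */
    for (size_t i = count; i > 0; i--) {
        const CacheLevel *lv = &levels[i - 1];
        assert(lv->miss_rate >= 0.0 && lv->miss_rate <= 1.0);
        cost = lv->hit_time + lv->miss_rate * cost;
    }
    return cost;
}
```

For instance, with an assumed L1 (1 cycle, 25% misses), L2 (10 cycles, 50% misses), and 100-cycle main memory, this gives 1 + 0.25 × (10 + 0.5 × 100) = 16 cycles, versus 1 + 0.25 × 100 = 26 cycles with L1 alone.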