







# Locality of Reference

- Temporal Locality
  - Programs tend to reference the same memory locations again at a future point in time
  - Due to loops and iteration, programs spend a lot of time in one section of code
- Spatial Locality
  - Programs tend to reference memory locations that are near other recently-referenced memory locations
  - Due to the way contiguous memory is referenced, e.g. an array or the instructions that make up a program
  - Sequential Locality
    - Instructions tend to be accessed sequentially
- Locality of reference does not always hold, but it usually holds
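The effect of locality can be illustrated with a rough sketch that counts how often an access falls in a recently referenced block. The block size of 16 words, the "remember the last 4 blocks" rule, and the name `hit_rate` are assumptions made for illustration only; they do not model any particular machine.

```python
import random
from collections import deque

BLOCK_SIZE = 16      # words per block (assumed for illustration)
RECENT_BLOCKS = 4    # number of recently used blocks remembered (assumed)

def hit_rate(addresses):
    """Fraction of accesses that fall in a recently referenced block."""
    recent = deque(maxlen=RECENT_BLOCKS)
    hits = 0
    for addr in addresses:
        block = addr // BLOCK_SIZE
        if block in recent:
            hits += 1
        else:
            recent.append(block)   # oldest remembered block drops out
    return hits / len(addresses)

# Walking an array: strong spatial locality.
sequential = list(range(1024))
# A small loop executed many times: strong temporal locality.
loop = list(range(32)) * 32
# Addresses scattered over a large range: very little locality.
rng = random.Random(0)
scattered = [rng.randrange(1 << 20) for _ in range(1024)]

for name, trace in [("sequential", sequential), ("loop", loop), ("scattered", scattered)]:
    print(name, round(hit_rate(trace), 4))
```

The sequential and loop traces find almost every access in a recently used block, while the scattered trace almost never does, which is the case where locality of reference does not hold.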



# Dynamic RAM

- Bits stored as charge in semiconductor capacitors
- Charges leak
- Need refreshing even when powered
- Simpler construction
- Smaller per bit
- Less expensive
- Need refresh circuits (every few milliseconds)
- Slower
- Main memory



# Static RAM

- Bits stored as on/off switches via flip-flops
- No charges to leak
- No refreshing needed when powered
- More complex construction
- Larger per bit
- More expensive
- Does not need refresh circuits
- Faster
- Cache

# So you want fast?

- It is possible to build a computer which uses only static RAM (the memory used to build a cache)
- This would be very fast
- This would need no cache
  - How can you cache cache?
- This would cost a very large amount



# Types of ROM

- Written during manufacture
  - Very expensive for small runs
- Programmable (once)
  - PROM
  - Needs special equipment to program
- Read "mostly"
  - Erasable Programmable (EPROM)
    - Erased by UV
  - Electrically Erasable (EEPROM)
    - Takes much longer to write than read
  - Flash memory
    - Erase whole memory electrically



### Cache operation - overview

- CPU requests contents of memory location
- Check cache for this data
- If present, get from cache (fast)
- If not present, read required block from main memory to cache
- Then deliver from cache to CPU
- Cache includes tags to identify which block of main memory is in each cache slot
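The read sequence above can be sketched as a small simulation. This is a minimal sketch, not a description of real hardware: the class name `SimpleCache`, the sizes, and the rule that a block goes into slot `block % NUM_SLOTS` are all assumptions, and the tag here is simply the full block number.

```python
BLOCK_SIZE = 4   # words per block (assumed)
NUM_SLOTS = 8    # number of cache slots (assumed)

class SimpleCache:
    """Toy read-only cache in front of a list that stands in for main memory."""

    def __init__(self, memory):
        self.memory = memory
        self.tags = [None] * NUM_SLOTS   # which memory block each slot holds
        self.data = [None] * NUM_SLOTS   # the cached block contents
        self.hits = 0
        self.misses = 0

    def read(self, address):
        block = address // BLOCK_SIZE
        slot = block % NUM_SLOTS         # placement rule, assumed for this sketch
        if self.tags[slot] == block:
            self.hits += 1               # present: get from cache (fast)
        else:
            self.misses += 1             # not present: read block from main memory
            start = block * BLOCK_SIZE
            self.data[slot] = self.memory[start:start + BLOCK_SIZE]
            self.tags[slot] = block
        # In both cases the word is delivered from the cache.
        return self.data[slot][address % BLOCK_SIZE]

memory = list(range(100, 164))           # 64 words of pretend main memory
cache = SimpleCache(memory)
first = cache.read(5)                    # miss: block 1 is loaded
second = cache.read(6)                   # hit: same block
```

After the two reads, `cache.misses` and `cache.hits` are both 1, since the second address lies in the block fetched by the first.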







## Cache Design

- Size
- Mapping Function
- Replacement Algorithm
- Write Policy
- Block Size
- Number of Caches



## Cache Size

- Cost
  - More cache is expensive
- Speed
  - More cache is faster
  - Only up to a point: there are diminishing returns as the cache increases in size
    - A larger cache can also take longer to search














## Direct Mapping Address: 64K Cache Example

Field widths in bits. Valid and Dirty exist in the cache only; Tag, Line and Word together make up the memory address.

| Valid (cache only) | Dirty (cache only) | Tag t | Line or Slot s | Word w |
|--------------------|--------------------|-------|----------------|--------|
| 1                  | 1                  | 8     | 14             | 2      |

- 2 bit word identifier (4 byte block)
- Need 14 bits to address the cache slot/line
- No two blocks that map to the same line have the same Tag field
- Check contents of cache by finding line and checking Tag
- Also need a Valid bit and a Dirty bit
  - Valid: indicates if the slot holds a block belonging to the program being executed
  - Dirty: indicates if the block has been modified while in the cache; if so, it will need to be written back to memory before the slot is reused for another block
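Splitting an address into these fields is a matter of shifting and masking. The sketch below assumes the 8-bit tag, 14-bit line and 2-bit word layout from this example (24 address bits in total); the function name `split_address` and the sample address are illustrative choices.

```python
TAG_BITS, LINE_BITS, WORD_BITS = 8, 14, 2   # field widths from the example

def split_address(address):
    """Split a 24-bit address into (tag, line, word) fields."""
    word = address & ((1 << WORD_BITS) - 1)                 # lowest 2 bits
    line = (address >> WORD_BITS) & ((1 << LINE_BITS) - 1)  # next 14 bits
    tag = address >> (WORD_BITS + LINE_BITS)                # top 8 bits
    return tag, line, word

tag, line, word = split_address(0x16339C)
print(hex(tag), hex(line), word)   # 0x16 0xce7 0
```

To check the cache, the line field selects the slot and the stored tag in that slot is compared with the tag field of the address.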



# Cache Example

- The website for the textbook includes a link to CAMERA, a cache simulator
- Example for direct-mapped cache

## Direct Mapping pros & cons

- Simple
- Inexpensive
- Fixed location for given block
  - If a program repeatedly accesses 2 blocks that map to the same line, the miss rate is very high, a condition called **thrashing**
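Thrashing is easy to reproduce in a toy direct-mapped model. The sketch below assumes a cache of 4 lines with 16-word blocks, so blocks 0 and 4 share line 0; the sizes and the name `count_misses` are assumptions for illustration.

```python
NUM_LINES = 4     # cache lines (assumed)
BLOCK_SIZE = 16   # words per block (assumed)

def count_misses(addresses):
    """Count misses in a direct-mapped cache for a sequence of addresses."""
    lines = [None] * NUM_LINES           # block number held by each line
    misses = 0
    for addr in addresses:
        block = addr // BLOCK_SIZE
        line = block % NUM_LINES         # fixed location for a given block
        if lines[line] != block:
            misses += 1
            lines[line] = block          # evict whatever was there
    return misses

# Addresses 0 and 64 are in blocks 0 and 4, which both map to line 0,
# so each access evicts the block the next access needs.
conflicting = [0, 64] * 10
# Addresses 0 and 16 are in blocks 0 and 1, which map to different lines,
# so only the first access to each block misses.
friendly = [0, 16] * 10

print(count_misses(conflicting), count_misses(friendly))   # 20 2
```

Both traces touch only two blocks, yet the conflicting trace misses on every access while the other misses twice.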





















# Replacement Policy (1)

- The replacement policy is the technique we use to determine which line in the cache should be thrown out when we want to put a new block in from memory
- Direct mapping
  - No choice
  - Each block only maps to one line
  - Replace that line

### Replacement Algorithms (2) Associative & Set Associative

- Algorithm must be implemented in hardware (speed)
- Least Recently used (LRU)
  - e.g. in 2-way set associative, which of the 2 blocks is LRU?
    - For each slot, have an extra bit, USE. When a slot is accessed, set its USE bit to 1 and set the USE bit of the other slot in the set to 0.
  - For more than 2-way set associative, need a time stamp for each slot, which is expensive
- First in first out (FIFO)
  - Replace block that has been in cache longest
  - Easy to implement as a circular buffer
- Least frequently used (LFU)
  - Replace block which has had fewest hits
  - Need a counter to sum number of hits
- Random
  - Almost as good as LFU and simple to implement
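The USE-bit scheme for LRU in a 2-way set can be sketched in software as follows. Real caches do this in hardware; the class name `TwoWaySet` and the choice to fill empty slots first are assumptions made for this illustration.

```python
class TwoWaySet:
    """One set of a 2-way set-associative cache with a USE bit per slot."""

    def __init__(self):
        self.tags = [None, None]   # tag stored in each of the two slots
        self.use = [0, 0]          # USE bit: 1 marks the most recently used slot

    def access(self, tag):
        """Return True on a hit, False on a miss (after replacing the LRU slot)."""
        if tag in self.tags:
            slot = self.tags.index(tag)
            hit = True
        else:
            if None in self.tags:
                slot = self.tags.index(None)   # fill an empty slot first
            else:
                slot = self.use.index(0)       # USE bit 0 marks the LRU slot
            self.tags[slot] = tag
            hit = False
        # Set the USE bit of the accessed slot and clear the other one.
        self.use = [0, 0]
        self.use[slot] = 1
        return hit

s = TwoWaySet()
results = [s.access(t) for t in "ABACA"]
print(results)   # [False, False, True, False, True]
```

When `C` arrives, `A` has just been used, so `B` is the block replaced and the final access to `A` still hits.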





# Write Back

- Updates initially made in cache only
  - Dirty bit is set when we write to the cache; this indicates the cache is now inconsistent with main memory
- Dirty bit for cache slot is cleared when the update is written back to main memory
- If cache line is to be replaced, write the existing cache line to main memory if dirty bit is set before loading the new memory block
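A minimal sketch of the write-back policy for a single cache line is shown below. It is a toy model under stated assumptions: main memory is a dictionary mapping block number to one value, and the class and method names (`WriteBackLine`, `load`, `write`) are hypothetical.

```python
class WriteBackLine:
    """A single cache line using a write-back policy (toy model)."""

    def __init__(self, memory):
        self.memory = memory   # dict: block number -> value (stand-in for main memory)
        self.block = None      # block currently held in the line
        self.value = None
        self.dirty = False

    def load(self, block):
        """Bring a block into the line, writing back the old block if it is dirty."""
        if self.block == block:
            return
        if self.dirty:
            self.memory[self.block] = self.value   # write existing line to main memory
            self.dirty = False                     # line now matches memory again
        self.block = block
        self.value = self.memory[block]

    def write(self, block, value):
        """Update the cache only and mark the line as dirty."""
        self.load(block)
        self.value = value
        self.dirty = True

mem = {0: 10, 1: 20}
line = WriteBackLine(mem)
line.write(0, 99)
before = mem[0]    # still 10: main memory has not been updated yet
line.load(1)       # replacing the line forces the write-back
after = mem[0]     # now 99
```

Main memory only sees the new value when the dirty line is replaced, which is what keeps repeated writes to the same block cheap.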





# Cache Performance Example

Sample program executes from memory locations 48-95 once. Then it executes from 15-31 in a loop ten times before exiting.

| Event    | Location | Time              | Comment                        |
|----------|----------|-------------------|--------------------------------|
| 1 miss   | 48       | 2500ns            | Memory block 3 to cache slot 3 |
| 15 hits  | 49-63    | 80ns×15=1200ns    |                                |
| 1 miss   | 64       | 2500ns            | Memory block 4 to cache slot 0 |
| 15 hits  | 65-79    | 80ns×15=1200ns    |                                |
| 1 miss   | 80       | 2500ns            | Memory block 5 to cache slot 1 |
| 15 hits  | 81-95    | 80ns×15=1200ns    |                                |
| 1 miss   | 15       | 2500ns            | Memory block 0 to cache slot 0 |
| 1 miss   | 16       | 2500ns            | Memory block 1 to cache slot 1 |
| 15 hits  | 17-31    | 80ns×15=1200ns    |                                |
| 9 hits   | 15       | 80ns×9=720ns      | Last nine iterations of loop   |
| 144 hits | 16-31    | 80ns×144=11,520ns | Last nine iterations of loop   |

Total hits = 213, Total misses = 5
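The hit and miss counts can be checked with a short simulation. The sketch assumes a direct-mapped cache with 4 slots and 16-word blocks, which is inferred from the block and slot numbers in the comments (location 48 is in block 3, and block 4 goes to slot 0); the 80 ns hit time and 2500 ns miss time are taken from the table.

```python
HIT_NS, MISS_NS = 80, 2500       # access times from the table
BLOCK_SIZE, NUM_SLOTS = 16, 4    # inferred from the block/slot numbers (assumption)

def simulate(trace):
    """Return (hits, misses) for a direct-mapped cache over an address trace."""
    slots = [None] * NUM_SLOTS   # block number held by each slot
    hits = misses = 0
    for addr in trace:
        block = addr // BLOCK_SIZE
        slot = block % NUM_SLOTS
        if slots[slot] == block:
            hits += 1
        else:
            misses += 1
            slots[slot] = block
    return hits, misses

# Locations 48-95 once, then locations 15-31 ten times.
trace = list(range(48, 96)) + list(range(15, 32)) * 10
hits, misses = simulate(trace)
total_ns = hits * HIT_NS + misses * MISS_NS
print(hits, misses, total_ns)    # 213 5 29540
```

Under these assumptions the total is 213 × 80 ns + 5 × 2500 ns = 29,540 ns, and the hit ratio is 213/218, or about 97.7%.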



