Memory
Memory Basics - Key Concepts Overview

This overview introduces fundamental memory timing concepts critical for understanding system performance, especially in relation to DRAM (Dynamic Random-Access Memory).

Topics Covered:
DRAM Operation
How DRAM stores data in rows and columns.
Step-by-step memory access protocol.
Memory Addressing
Row Address Strobe (RAS) and Column Address Strobe (CAS) signals.
Timing of accessing a particular cell in memory using row and column coordinates.
Timing Metrics
Access Time: Delay from request to data availability.
Latency: Time delay involved in memory response (e.g., CAS Latency).
Memory Cycle Time: Time between successive memory accesses.
Performance Metrics
Read Rate: Speed at which data is read.
Bandwidth: Total data transfer capacity of the memory per second.
Takeaway: Understanding the timing and structure of memory access helps you grasp how memory efficiency influences overall system performance, which is crucial for optimizing both hardware and software in computing environments.
DRAM Technologies
Fast Page Mode (FPM)
Once a row is activated, multiple columns can be accessed without repeating the row activation.
Reduces row access time for sequential data reads.
Improves performance when accessing data within the same row.
Burst Mode
Enables reading or writing a sequence of data words with a single command.
Data is transferred in rapid succession across multiple clock cycles.
Especially effective for block transfers and cache fills.
Timing & Performance Insights:
Timing diagrams compare traditional access vs. FPM and Burst Mode.
Clock cycle analysis shows reduced latency per read.
Throughput increases as fewer cycles are needed for multiple data transfers.
Final Takeaway: These techniques dramatically improve memory bandwidth and reduce average latency, leading to faster data access and enhanced overall system performance, especially in modern CPUs and memory-intensive applications.
Memory Systems: Key Concepts
1. Historical and Physical Overview
Evolution: Memory has progressed from magnetic cores to semiconductor-based systems (e.g., DRAM, SRAM, flash).
Construction: Memory chips consist of millions (or billions) of microscopic transistors and capacitors.
Integration: Memory is embedded in system architecture, connected to CPUs via buses and managed through controllers.
2. Volatile vs. Non-Volatile Memory
Volatile: Requires power to retain data.
Examples: SRAM (used in CPU caches), DRAM (used in main memory).
Non-Volatile: Retains data even when powered off.
Examples: Flash memory, SSDs, ROM.
3. SRAM vs. DRAM
SRAM: Faster, uses 6 transistors per bit, more expensive. Used for Cache (L1, L2, L3). No refresh needed.
DRAM: Slower, uses 1 transistor + 1 capacitor, cheaper. Used for Main memory (RAM). Requires periodic refresh.
Context: Comparison of Static RAM vs. Dynamic RAM features.
4. Memory Performance Strategies
Memory Hierarchy: Organizes memory into levels (registers → caches → RAM → disk) to balance speed and cost.
Access Time: Faster memory is more expensive and smaller; slower memory is larger but cheaper.
Optimization: Use of caching, pipelining, and predictive access improves performance.
5. Principle of Locality
Temporal Locality: Recently used data is likely to be reused soon.
Spatial Locality: Data near recently accessed locations is likely to be accessed soon.
Implication: Modern systems optimize memory fetches by anticipating access patterns (e.g., prefetching cache lines).
6. Future Outlook
Trends: Growth in non-volatile RAM (e.g., MRAM, ReRAM), stacked memory architectures (e.g., HBM), and processing-in-memory.
Impact: Supports AI, big data, and real-time analytics by handling larger datasets faster.
Understanding Cache Memory: Key Concepts for Performance
1. Background: Von Neumann Architecture Refresher
Von Neumann Bottleneck: In traditional architecture, both data and instructions share the same bus and memory, causing delays.
Limitation: CPUs operate much faster than memory access speeds, leading to idle CPU cycles while waiting for data from RAM.
2. What Is Cache Memory?
A small, high-speed memory located closer to (or inside) the CPU.
Acts as a buffer between the CPU and main memory (RAM).
Stores frequently accessed data or instructions to avoid repeated main memory access.
3. Why Cache Works (Principle of Locality)
Temporal Locality: Recently accessed data is likely to be used again soon.
Spatial Locality: Data located near recently accessed data is likely to be accessed next.
4. Measuring Cache Performance
Hit: When requested data is found in the cache.
Miss: When data is not found and must be fetched from RAM.
Hit Rate: Ratio of cache hits to total requests (higher = better).
Miss Rate: Ratio of misses to total requests.
Average Memory Access Time: Hit time + (Miss rate × Miss penalty).
Context: Standard metrics for evaluating cache efficiency.
5. Multi-Level Cache (L1, L2, L3)
L1 Cache: Closest to the CPU (on-chip), very fast, small (~32–128 KB).
L2 Cache: Larger, slightly slower, also on-chip or near the CPU (~256 KB–1 MB).
L3 Cache: Shared across CPU cores, bigger (~4–50 MB), slower than L2.
Hierarchy Benefit: Each level catches most of the misses from the level above it, greatly reducing how often the CPU must go all the way to RAM.
6. Memory Mapping and Coherency
Memory Errors and Protection
Most common in DRAM due to electrical noise or cosmic rays.
Detection & Correction: Parity bits (detection only) and ECC (Error-Correcting Code), which can correct single-bit errors.
Memory Maps
A schematic showing how address space is allocated to ROM, DRAM, Cache, and I/O.
Memory-mapped I/O: Devices like GPUs are treated as memory locations.
Cache Coherency
A coherency problem occurs when multiple caches hold inconsistent copies of the same memory location (common in multicore CPUs).
Solutions: Write-through caches, MESI protocol, and memory barriers.
7. Ensuring Data Integrity in I/O
Use the volatile keyword in C/C++ to prevent the compiler from optimizing away reads and writes to I/O-mapped addresses.
Mark I/O regions as uncacheable to avoid stale data.
Use interrupts or status register polling to synchronize with device readiness.
Foundation: Von Neumann Architecture
In the Von Neumann model, instructions and data share the same memory and bus. This means:
CPU fetches both data and instructions from a unified memory space.
Access to I/O and memory often occurs through the same addressing mechanism.
Final Takeaway
Modern memory systems are a careful balance between speed, reliability, and correctness. While caches and memory mapping optimize performance, error correction and coherency protocols ensure systems remain stable, accurate, and efficient.