SSD & HDD
Chapter 7 Summary: Storage Technologies and Data Access
Key Concepts & Terms
Storage Hierarchy: The layered structure of memory, from fastest/smallest (registers, cache) to slowest/largest (HDDs, cloud).
Latency: The delay between a request and its response. SSDs have much lower latency than magnetic disks.
Throughput: The amount of data transferred per unit time. Crucial for big data workloads.
Block Device: Storage device (like a hard disk) that reads/writes data in blocks.
Fragmentation: When data is scattered across the disk, reducing read performance.
Wear Levelling: Technique in SSDs that spreads writes evenly to extend lifespan.
TRIM Command: SSD command that clears deleted data blocks to prevent slowdowns.
RAID: Redundant Array of Independent Disks; combines multiple drives for redundancy and performance.
NAS vs. SAN: Network-Attached Storage (file-level access) vs. Storage Area Network (block-level access).
Context: Definitions of core storage terminology and performance metrics.
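The latency and throughput definitions above combine into a simple request-time model: total time is a fixed latency plus size divided by throughput. A minimal Python sketch (the device figures in the comments are rough illustrative assumptions, not benchmarks):

```python
def transfer_time(latency_s: float, throughput_bps: float, size_bytes: int) -> float:
    """Time to service one request: fixed latency plus size / throughput."""
    return latency_s + size_bytes / throughput_bps

# Illustrative (assumed) figures:
#   NVMe SSD: ~0.0001 s latency, ~3 GB/s throughput
#   HDD:      ~0.010 s latency,  ~150 MB/s throughput
ssd = transfer_time(0.0001, 3e9, 4096)   # tiny request: latency dominates
hdd = transfer_time(0.010, 150e6, 4096)
print(f"SSD 4 KiB read: {ssd * 1e3:.3f} ms, HDD 4 KiB read: {hdd * 1e3:.3f} ms")
```

Note how, for small requests, latency dominates the total; throughput only matters once transfers get large.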
From Tapes to SSDs: The Evolution of Storage
1. Magnetic Tapes and Disks
Used sequential access: you had to read in order.
Cheap, durable, and still used for archival backups.
Slow seek times and mechanical failures were common.
2. Hard Disk Drives (HDDs)
Introduced random access.
Faster than tapes, but mechanical parts (platters, arms) make them prone to wear.
Still good for large, low-cost storage.
3. Solid State Drives (SSDs)
Use flash memory with no moving parts, making them faster and more reliable.
Handle random reads/writes efficiently.
Require wear leveling and TRIM to manage memory health and performance.
4. Impact on System Design
Faster boot times and application loads.
OS file systems adapted to optimize for SSDs.
Enabled new computing models (e.g. ultra-thin laptops, real-time analytics).
Key Takeaways: Modern Storage Considerations
1. Access Speed
SSDs outperform HDDs in speed and reliability.
NVMe SSDs (connected via PCIe) are even faster.
2. Reliability and Wear
SSDs have limited write cycles and therefore require wear leveling.
HDDs suffer from head crashes and magnetic degradation over time.
3. Fragmentation
Major issue for HDDs (slows read times).
Less critical for SSDs but still relevant to file system design.
4. Remote Storage / Cloud
Adds network latency but provides scalability and redundancy.
Good for backups and distributed applications, but may not suit real-time workloads.
Choosing Storage for High-Performance Environments
When selecting a data storage solution for performance-critical applications, prioritize these factors:
Speed (Latency + Throughput): Critical for real-time analytics, AI/ML workloads.
Durability / MTBF: Mean Time Between Failures affects uptime in production systems.
Scalability: System must grow with data. Consider RAID/NAS/SAN or cloud hybrids.
Write Endurance: Especially important for SSDs under heavy write loads.
Fragmentation Resistance: Affects long-term performance stability.
Cost Efficiency: Balance between high-speed SSDs and cheaper HDDs or cloud cold storage.
Context: Selection criteria for high-performance storage environments.
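The cost-efficiency factor above can be made concrete with a small helper that prices storage per usable gigabyte, accounting for capacity lost to redundancy. The prices and capacities below are hypothetical examples:

```python
def cost_per_usable_gb(drive_cost: float, raw_gb: float,
                       usable_fraction: float = 1.0) -> float:
    """Price divided by the capacity actually available after redundancy
    or over-provisioning (usable_fraction)."""
    return drive_cost / (raw_gb * usable_fraction)

# Hypothetical prices: a 2 TB SSD vs. an 8 TB HDD, each mirrored (50% usable).
ssd = cost_per_usable_gb(150.0, 2000, 0.5)
hdd = cost_per_usable_gb(140.0, 8000, 0.5)
print(f"SSD: ${ssd:.3f}/GB usable, HDD: ${hdd:.3f}/GB usable")
```

Even with made-up numbers, the shape of the trade-off is clear: HDDs (and cold storage) win on cost per gigabyte, SSDs on speed.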
Final Thoughts
Storage technology has evolved from slow, sequential access models to ultra-fast, resilient, and scalable systems. SSDs revolutionized expectations around latency and reliability, while cloud and network storage brought new flexibility.
When designing high-performance systems, think holistically:
Match storage type to workload.
Monitor performance over time (e.g., wear indicators).
Consider hybrid setups (SSD for cache, HDD for bulk, cloud for archival).
Section 6.2: RAID (Redundant Array of Independent Disks)
RAID is a data storage virtualization technique that combines multiple physical disks into one logical unit to improve performance, fault tolerance, or both.
Key RAID Terminology
Striping: Splitting data across multiple disks to increase read/write speed.
Mirroring: Copying identical data to two or more disks for redundancy.
Parity: Extra data used to reconstruct information in case of disk failure.
RAID Controller: Hardware or software that manages RAID logic.
Context: Fundamental concepts behind RAID implementation.
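Parity in RAID 5 is essentially a byte-wise XOR across the data blocks, which is why any single lost block can be rebuilt from the survivors. A simplified Python sketch (assumes equal-length blocks; a real controller also stripes and rotates the parity):

```python
from functools import reduce

def parity(blocks: list[bytes]) -> bytes:
    """XOR all blocks byte-wise to produce the parity block."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

def reconstruct(surviving: list[bytes], parity_block: bytes) -> bytes:
    """Recover the single missing block: XOR the survivors with the parity."""
    return parity(surviving + [parity_block])

d0, d1, d2 = b"AAAA", b"BBBB", b"CCCC"
p = parity([d0, d1, d2])
assert reconstruct([d0, d2], p) == d1   # d1 "failed" and is rebuilt
```

The same XOR property explains why RAID 5 survives exactly one disk failure: with two blocks missing, the equation has two unknowns and cannot be solved (hence RAID 6's second, independent parity).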
RAID Levels Overview
RAID 0: Striping only (no redundancy). Fast performance. No fault tolerance. Use Case: Video editing, gaming.
RAID 1: Mirroring. High redundancy. 50% storage loss. Use Case: OS drives, critical systems.
RAID 5: Striping + distributed parity. Balanced performance & fault tolerance. Slower writes. Use Case: Web servers, databases.
RAID 6: Like RAID 5, but double parity. Can survive 2 disk failures. Even slower writes. Use Case: High-availability systems.
RAID 10 (1+0): Mirroring + striping. High performance + redundancy. High cost. Use Case: High-transaction systems.
Context: Comparison of RAID configurations, advantages, and ideal use cases.
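The capacity trade-offs in the overview above follow simple formulas. A sketch (assumes all disks are the same size and, for RAID 1, an n-way mirror of one disk's worth of data):

```python
def raid_usable_capacity(level: int, n_disks: int, disk_gb: float) -> float:
    """Usable capacity for common RAID levels, all disks the same size."""
    if level == 0:
        return n_disks * disk_gb          # striping: all raw space, no redundancy
    if level == 1:
        return disk_gb                    # mirroring: one disk's worth survives
    if level == 5:
        return (n_disks - 1) * disk_gb    # one disk's worth spent on parity
    if level == 6:
        return (n_disks - 2) * disk_gb    # two disks' worth spent on parity
    if level == 10:
        return (n_disks // 2) * disk_gb   # mirrored pairs, then striped
    raise ValueError(f"unsupported RAID level: {level}")

# e.g. four 1 TB disks: RAID 5 keeps 3 TB usable, RAID 10 keeps 2 TB.
```

This makes the "50% storage loss" of RAID 1/10 and the "solid balance" of RAID 5 easy to quantify.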
RAID Strategy Takeaways
Redundancy priority? → Use RAID 1, RAID 5, or RAID 6.
Performance priority? → Use RAID 0 or RAID 10.
Mission-critical with budget? → RAID 10 (best of both, but expensive).
Space efficiency + redundancy? → RAID 5 is a solid balance.
Section 6.3: Solid State Drives (SSDs)
SSDs use non-volatile flash memory instead of spinning disks, meaning no moving parts, faster speeds, and lower power usage.
SSD Concepts & Terms
Flash Memory: Type of memory that keeps data without power.
Wear Leveling: Spreads out writes evenly to prevent early cell failure.
TRIM Command: Helps the OS inform the SSD of unused data blocks to clean up.
Garbage Collection: Background process to clear and prepare memory blocks for writing.
Endurance: Number of write/erase cycles before memory cells fail.
Context: Technical mechanics of Solid State Drive longevity and maintenance.
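Wear leveling can be pictured as always steering the next write to the least-worn block, so no cell fails long before the rest. A toy Python simulation of that policy (a real flash translation layer is far more sophisticated, tracking logical-to-physical mappings and hot/cold data):

```python
def wear_level_writes(n_blocks: int, n_writes: int) -> list[int]:
    """Simulate n_writes under a 'write to the least-worn block' policy.
    Returns the per-block erase counts."""
    wear = [0] * n_blocks
    for _ in range(n_writes):
        target = wear.index(min(wear))  # pick the coolest block
        wear[target] += 1
    return wear

counts = wear_level_writes(8, 100)
assert max(counts) - min(counts) <= 1   # wear stays evenly spread
```

Without such a policy, repeated writes to one logical address would exhaust that block's endurance while the rest of the drive stays nearly fresh.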
SSD Challenges and Design Implications
Write Wear: Drives need wear leveling to ensure long-term health.
Write Amplification: SSDs may write more data than the host requested, which shortens lifespan.
No In-Place Updates: Entire blocks must be erased before rewriting, which requires buffer management.
Data Recovery: Recovery is more complex than on HDDs and often requires special tools.
Context: Operational challenges inherent to SSD technology.
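Write amplification is usually quantified as a factor (WAF): bytes physically written to flash divided by bytes the host asked to write. A minimal illustration (the page and block sizes are illustrative assumptions):

```python
def write_amplification(host_bytes: int, flash_bytes: int) -> float:
    """WAF = physical flash writes / host-requested writes.
    Values above 1.0 mean garbage collection and block erases cost extra writes."""
    return flash_bytes / host_bytes

# Worst case for in-place updates: rewriting one 4 KiB page inside a
# 256 KiB erase block can force the whole block to be relocated.
waf = write_amplification(4096, 256 * 1024)
print(f"worst-case WAF: {waf:.0f}x")
```

Over-provisioning, TRIM, and garbage collection all exist to keep the real-world WAF close to 1.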
Design Adjustments Include:
Using over-provisioning (extra hidden space).
Integrating TRIM and garbage collection.
Choosing RAID levels that minimize write amplification.
RAID with SSDs vs. HDDs
Wear Concern: Low for HDDs | High for SSDs.
Performance Gain from RAID: Significant for HDDs | Already high without RAID for SSDs.
RAID 5/6 Viability: OK for HDDs | Can increase wear without proper TRIM support for SSDs.
Best Practice: RAID 10 or 1 for HDDs | Avoid parity-heavy setups like RAID 5 for SSDs unless wear handled.
Context: Comparison of RAID effectiveness across different physical media.
Final Analysis: Choosing the Right Setup
For Redundancy (e.g., critical financial data)
RAID 1 or 6 (HDDs): safer against disk failure.
RAID 10 (SSDs): combines speed, redundancy, and lower wear.
For Performance (e.g., real-time data pipelines)
RAID 0 or RAID 10 (SSDs): SSDs already reduce latency; striping boosts throughput.
Avoid RAID 5/6 with SSDs unless wear-leveling and TRIM are explicitly supported.
Summary
SSDs excel in speed, but require smarter wear handling and garbage collection.
RAID provides tailored solutions for redundancy, speed, or both; choose based on workload and failure tolerance.
For modern high-performance systems, combining SSDs with RAID 10 offers both reliability and throughput, with proper management of SSD-specific constraints.
Section 6.4: Optical Memory (CDs, DVDs, Blu-ray)
Key Terms & Definitions
Pits and Lands: Physical indentations (pits) and flat areas (lands) on optical discs that represent binary data when read by a laser.
Track Spiral: Data on an optical disc is stored in a single long spiral track, unlike circular tracks on magnetic disks.
Laser Beam Reading: A laser reflects off the disc; differences in reflection (pit vs land) are interpreted as binary data.
Write Once, Read Many (WORM): Optical media types that can be written to once (e.g., CD-R), but read repeatedly.
Phase Change Technology: Used in rewritable discs (e.g., CD-RW, DVD-RW), which change state between crystalline and amorphous to store data.
Context: Mechanics of optical data storage and retrieval.
Characteristics of Optical Memory
Cheap and portable: good for media distribution and archival.
Relatively slow: poor random access speed compared to SSDs/HDDs.
Durable: less susceptible to magnetic interference or mechanical shock.
Low capacity compared to modern flash or hard drives.
Mostly read-oriented today (movies, backups, installations).
Section 6.5: Magnetic Tape
Terms & Definitions
Sequential Access: Data is read in the order it was written, not randomly accessible like a disk.
Reel-to-Reel Storage: Tape is wound between spools during reading/writing.
Data Density: The amount of data that can be stored per inch of tape.
Tape Library: Automated system with robotic arms to retrieve and load tapes. Often used for archival data.
LTO (Linear Tape-Open): A modern magnetic tape storage standard used in enterprises.
Context: Key terminology for enterprise-level tape storage systems.
Magnetic Tape Advantages
Extremely high capacity (tens of TBs per cartridge today, with lab demonstrations reaching the hundreds).
Inexpensive per gigabyte, which makes it great for archival and backup.
Excellent for cold storage (data that's rarely accessed).
Drawback: Very slow access time.
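The slow-access drawback follows directly from sequential access: the tape must physically wind to the target position, so access time grows linearly with distance rather than being near-constant as on a disk. A back-of-the-envelope model (the tape length and wind speed are rough assumptions, not LTO specifications):

```python
def tape_seek_time(current_m: float, target_m: float,
                   speed_m_per_s: float) -> float:
    """Sequential media must wind to the target position: access time is
    simply distance / wind speed."""
    return abs(target_m - current_m) / speed_m_per_s

# Assumed figures: a ~1000 m tape winding at ~10 m/s.
worst_case = tape_seek_time(0, 1000, 10)   # far end of the tape
print(f"worst-case positioning: {worst_case:.0f} s")
```

Compare that with millisecond-scale HDD seeks or microsecond-scale SSD accesses, and the "cold storage only" role of tape is obvious.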
Related Articles: Key Insights
ITPro: Tape Storage's Comeback
In 2023, 152.9 exabytes of tape shipped, the highest ever.
AI and compliance (e.g., GDPR) are driving demand for long-term storage.
Tapes are being embraced again for their cost, longevity, and security benefits.
ACM: The Data Storage Crisis
The volume of data is outpacing current storage density growth.
Researchers are exploring new materials, physics, and optical tricks to expand data storage (e.g., using phase transitions or atomic-scale bits).
Suggests the need for hybrid storage strategies: not just speed, but lifespan and cost must be balanced.
TechXplore: Wafer-Scale Accelerators
AI workloads are driving the adoption of wafer-scale chips (like Cerebras WSE-3).
These require massive, fast-access memory bandwidth, highlighting the limits of traditional storage when used for AI training.
Suggests a clear divide between "hot" and "cold" data: fast RAM/SSD for real-time, massive tapes for bulk retention.
Modern Implications: Key Questions Answered
How have advancements (tapes → SSDs) changed our thinking?
Speed: Sequential, slow (Then) | Instant random access (Now).
Reliability: Good for archiving (Then) | Wear-out risk for SSDs (Now).
Storage Management: Manual, mechanical (Then) | Software-controlled, smart caching (Now).
Access Model: Archival (Then) | Real-time & mobile (Now).
Context: Historical shift in storage paradigms from tape to flash memory.
Modern systems now tier storage:
SSDs/DRAM for real-time processing
HDDs for bulk/active data
Tapes/cloud cold storage for rarely-used backups
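A tiering decision like this can be sketched as a simple policy keyed on access frequency. The thresholds below are made-up illustrations, not recommendations:

```python
def choose_tier(accesses_per_day: float) -> str:
    """Toy tiering policy: map access frequency to a storage tier.
    Thresholds are illustrative assumptions."""
    if accesses_per_day >= 100:
        return "SSD/DRAM"        # hot: real-time processing
    if accesses_per_day >= 1:
        return "HDD"             # warm: bulk/active data
    return "tape/cloud cold"     # cold: rarely-used backups

assert choose_tier(500) == "SSD/DRAM"
assert choose_tier(0.01) == "tape/cloud cold"
```

Real tiering systems add migration costs, data age, and object size to the decision, but the principle is the same: put each byte on the cheapest medium that still meets its access-time requirement.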
How should storage be chosen for HPC (High Performance Computing)?
Priorities for HPC:
Latency and IOPS (SSDs)
Concurrent read/write (RAID or NVMe SSD arrays)
Durability of writes (SSDs with wear-leveling)
Cheap backup/archiving (tapes, cloud cold storage)
So you'd balance performance, cost, redundancy, and lifespan depending on:
Volume
Access frequency
Failure impact
Final Thoughts
Optical media is declining but still relevant for distribution and backups.
Magnetic tape is back in demand for its role in cheap, high-capacity archival.
Storage now involves strategic layering: using the right medium for the right use case.
New AI hardware (like wafer-scale accelerators) emphasizes the growing gap between compute speed and storage throughput.