Next Gen NVMe SD Card Review: The SM2708 Controller Serves it Hot and Fast
by Ganesh T S on September 9, 2021 9:00 AM EST
Simulating Extended Usage
The performance of memory cards tends to go down over time as wear and tear on the NAND takes its toll. In order to simulate long-term usage, we subject the card to heavy traffic - similar to what one might do with direct-attached storage devices such as external drives. This traffic is also monitored to estimate performance consistency and relative performance numbers. Thanks to the exposure of the SD Express card as a standard NVMe device, the internal temperature of the SD Express card is also monitored.
AnandTech DAS Suite
Usage scenarios for memory cards may involve transfer of large amounts of photos and videos. Other usage scenarios include the use of the unit as a download or install location for games in portable game consoles, and importing files directly from it into a multimedia editing program such as Adobe Photoshop (for quick edits). Some users may even opt to boot an OS off a memory card in single-board computers.
The AnandTech DAS Suite tackles the first use-case. The evaluation involves processing four different workloads:
- AV: Multimedia content with audio and video files totalling 24.03 GB over 1263 files in 109 sub-folders
- Home: Photos and document files totalling 18.86 GB over 7627 files in 382 sub-folders
- BR: Blu-ray folder structure totalling 23.09 GB over 111 files in 10 sub-folders
- ISOs: OS installation files (ISOs) totalling 28.61 GB over 4 files in one folder
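The four data sets differ far more in file count than in total size, which is what makes them stress the card differently. As a rough illustration, the sketch below derives the average file size per workload from the figures in the list above. It assumes 1 GB = 1000 MB, and the dictionary and function names are made up for this example.

```python
# Workload figures copied from the list above.
# Assumption: "GB" is treated as 1000 MB for this back-of-the-envelope estimate.
WORKLOADS = {
    "AV":   {"size_gb": 24.03, "files": 1263, "folders": 109},
    "Home": {"size_gb": 18.86, "files": 7627, "folders": 382},
    "BR":   {"size_gb": 23.09, "files": 111,  "folders": 10},
    "ISOs": {"size_gb": 28.61, "files": 4,    "folders": 1},
}


def average_file_size_mb(workload):
    """Average file size in MB for one workload entry."""
    return workload["size_gb"] * 1000 / workload["files"]


def summarize(workloads):
    """Map each workload name to its average file size in MB (2 decimals)."""
    return {name: round(average_file_size_mb(w), 2) for name, w in workloads.items()}


if __name__ == "__main__":
    for name, mb in summarize(WORKLOADS).items():
        print(f"{name}: about {mb} MB per file")
```

By this estimate the Home set averages only a few MB per file while the ISOs set averages several GB per file, so the former leans on small-file handling and the latter on sustained sequential throughput.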
Each data set is first placed in a 29GB RAM drive, and a robocopy command is issued to transfer it to the memory card (formatted in exFAT).
robocopy /NP /MIR /NFL /J /NDL /MT:32 $SRC_PATH $DEST_PATH
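For readers who want to script a similar transfer, the sketch below assembles the same command line from Python and annotates each switch. It is a minimal illustration, not the harness used for this review: the function names and the example paths are hypothetical, and `run_transfer` only works on Windows, where robocopy is available.

```python
import subprocess

# Switches taken from the command line above, with their documented meanings.
ROBOCOPY_SWITCHES = [
    "/NP",     # do not show per-file progress percentages
    "/MIR",    # mirror the source tree (copy subfolders, purge extras at the destination)
    "/NFL",    # do not log file names
    "/J",      # copy using unbuffered I/O (intended for large files)
    "/NDL",    # do not log directory names
    "/MT:32",  # multithreaded copy with 32 threads
]


def build_robocopy_command(src_path, dest_path):
    """Return the argument list in the same order as the command shown above."""
    return ["robocopy", *ROBOCOPY_SWITCHES, src_path, dest_path]


def run_transfer(src_path, dest_path):
    """Run the copy (Windows only). Robocopy exit codes of 8 or higher signal failures."""
    result = subprocess.run(build_robocopy_command(src_path, dest_path))
    return result.returncode < 8


if __name__ == "__main__":
    # Hypothetical paths: R: as the RAM drive, E: as the memory card.
    print(" ".join(build_robocopy_command(r"R:\AV", r"E:\AV")))
```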
Upon completion of the transfer (write test), the contents from the unit are read back into the RAM drive (read test) after a 10-second idling interval. This process is repeated three times for each workload. Read and write speeds, as well as the time taken to complete each pass, are recorded. Whenever possible, the temperature of the memory card is recorded during the idling intervals. Bandwidth for each data set is computed as the average of all three passes.
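The write, idle, read sequence described above can be sketched as a small loop. This is an illustration under stated assumptions rather than the actual test harness: the copy operations are passed in as callables, the idle interval is applied only between the write and the read of each pass, and the per-workload figure is taken as the arithmetic mean of the per-pass bandwidths. All names are hypothetical.

```python
import time


def bandwidth_mbps(bytes_moved, seconds):
    """Bandwidth in MBps (1 MB = 1e6 bytes) for one transfer."""
    return bytes_moved / 1e6 / seconds


def run_workload(copy_to_card, copy_from_card, dataset_bytes,
                 passes=3, idle_seconds=10,
                 sleep=time.sleep, clock=time.perf_counter):
    """Time `passes` write/read cycles and average the per-pass bandwidths.

    `copy_to_card` and `copy_from_card` are caller-supplied callables
    (for example, wrappers around a robocopy invocation).
    """
    write_bw, read_bw = [], []
    for _ in range(passes):
        start = clock()
        copy_to_card()                      # write test: RAM drive -> card
        write_bw.append(bandwidth_mbps(dataset_bytes, clock() - start))

        sleep(idle_seconds)                 # idle interval; a temperature sample could be taken here

        start = clock()
        copy_from_card()                    # read test: card -> RAM drive
        read_bw.append(bandwidth_mbps(dataset_bytes, clock() - start))

    return {
        "write_mbps": sum(write_bw) / passes,
        "read_mbps": sum(read_bw) / passes,
        "write_passes": write_bw,
        "read_passes": read_bw,
    }
```

Injecting `sleep` and `clock` keeps the loop testable without real transfers or real waiting.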
The reads for most passes are well above 500 MBps. However, in the midst of heavy writes (going beyond the ~5.5 GB SLC cache), the speeds drop below 100 MBps, as evident in the above graphs.
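One way to locate the point where writes fall off the SLC cache is to accumulate the data written until the instantaneous bandwidth first dips below a threshold. The sketch below assumes bandwidth samples taken at a fixed interval and uses the 100 MBps mark mentioned above as the threshold. The function name is hypothetical and the sample trace in the usage block is synthetic, not measured data.

```python
def find_cache_exhaustion(samples_mbps, interval_s=1.0, threshold_mbps=100.0):
    """Estimate the GB written before write speed first drops below the threshold.

    `samples_mbps` holds instantaneous write bandwidth readings (MBps) taken
    every `interval_s` seconds. Returns None if the speed never drops.
    """
    written_mb = 0.0
    for mbps in samples_mbps:
        if mbps < threshold_mbps:
            return written_mb / 1000.0
        written_mb += mbps * interval_s
    return None


if __name__ == "__main__":
    # Synthetic trace for illustration only: a fast burst followed by a slow tail.
    trace = [550] * 10 + [80] * 5
    print(find_cache_exhaustion(trace))
```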
Aspects influencing the performance consistency include SLC caching and thermal throttling / firmware caps on access rates to avoid overheating. This is important for certain use-cases, as the last thing users want to see when copying over large amounts of data is the transfer rate going down to USB 2.0 speeds. The graphs below present the recorded instantaneous bandwidth numbers and temperatures (where applicable) while processing the AnandTech DAS Suite.
AnandTech DAS Suite - Performance Consistency
The DAS suite is started soon after the completion of the CrystalDiskMark and fio workloads, and the SLC cache is already exhausted without enough time for complete reclamation. In addition, the time taken by the host to queue up 5.5GB+ of writes is quite small, so we do not even see the SLC caching burst in these workloads. The more worrisome aspect is the temperature - staying at around 99C - 100C throughout the test (temperature monitoring reports 0 in the graph for 100C+ temperatures). We will cover the thermal aspects in detail further down in the review.
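Because a reading of 0 in the log stands for 100C or more rather than an actual 0C, anyone post-processing such a log needs to treat 0 as a sentinel. The sketch below shows one way to do that. The function names are hypothetical and the readings in the usage block are made up for illustration.

```python
def normalize_temperatures(readings_c, sentinel=0):
    """Tag sentinel readings (logged as 0) as 'at or above 100C' instead of 0C."""
    cleaned = []
    for reading in readings_c:
        if reading == sentinel:
            # The monitor reports 0 once the card reaches 100C or more.
            cleaned.append({"temp_c": None, "at_or_above_100c": True})
        else:
            cleaned.append({"temp_c": reading, "at_or_above_100c": False})
    return cleaned


def fraction_at_or_above_100c(readings_c, sentinel=0):
    """Share of samples where the monitor reported the sentinel value."""
    if not readings_c:
        return 0.0
    flagged = sum(1 for reading in readings_c if reading == sentinel)
    return flagged / len(readings_c)


if __name__ == "__main__":
    log = [98, 99, 0, 0, 99]  # made-up readings for illustration
    print(normalize_temperatures(log))
    print(fraction_at_or_above_100c(log))
```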
PCMark 10 Storage Bench
There are a number of storage benchmarks that can subject a device to artificial access traces by varying the mix of reads and writes, the access block sizes, and the queue depth / number of outstanding data requests. More serious benchmarks, however, actually replicate access traces from real-world workloads to determine the suitability of a particular device for a particular workload. Real-world access traces may be used for simulating the behavior of computing activities that are limited by storage performance. Examples include booting an operating system or loading a particular game from the disk.
PCMark 10's storage bench (introduced in v2.1.2153) includes four storage benchmarks that use relevant real-world traces from popular applications and common tasks to fully test the performance of the latest modern drives:
- The Full System Drive Benchmark uses a wide-ranging set of real-world traces from popular applications and common tasks to fully test the performance of the fastest modern drives. It involves a total of 204 GB of write traffic.
- The Quick System Drive Benchmark is a shorter test with a smaller set of less demanding real-world traces. It subjects the device to 23 GB of writes.
- The Data Drive Benchmark is designed to test drives that are used for storing files rather than applications. These typically include NAS drives, USB sticks, memory cards, and other external storage devices. The device is subjected to 15 GB of writes.
- The Drive Performance Consistency Test is a long-running and extremely demanding test with a heavy, continuous load for expert users. In-depth reporting shows how the performance of the drive varies under different conditions. This writes more than 23 TB of data to the drive.
Despite the data drive benchmark appearing most suitable for testing direct-attached storage, we opt to run the full system drive benchmark as part of our evaluation flow. This allows for simulation of extended usage on the memory card.
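To put the write volumes in perspective, the short sketch below expresses each benchmark's write traffic as a multiple of the roughly 5.5 GB SLC cache observed in the DAS suite runs. The cache size is approximate and the variable names are made up for this example. The consistency test is left out because only a lower bound (more than 23 TB) is given for it.

```python
# Approximate SLC cache size observed in the DAS suite runs (about 5.5 GB).
SLC_CACHE_GB = 5.5

# Write volumes quoted in the benchmark list above, in GB.
WRITE_VOLUME_GB = {
    "Full System Drive": 204,
    "Quick System Drive": 23,
    "Data Drive": 15,
}


def cache_multiples(write_gb, cache_gb=SLC_CACHE_GB):
    """How many times over the write traffic exceeds the SLC cache size."""
    return write_gb / cache_gb


if __name__ == "__main__":
    for name, gb in WRITE_VOLUME_GB.items():
        print(f"{name}: {gb} GB written, about {cache_multiples(gb):.1f}x the SLC cache")
```

The full system drive benchmark writes well over ten times as much data as the data drive benchmark, which is why it serves better as a stand-in for extended usage.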
The Full System Drive Benchmark comprises 23 different traces. For the purpose of presenting results, we classify them under five different categories:
- Boot: Replay of storage access trace recorded while booting Windows 10
- Creative: Replay of storage access traces recorded during the start up and usage of Adobe applications such as Acrobat, After Effects, Illustrator, Premiere Pro, Lightroom, and Photoshop.
- Office: Replay of storage access traces recorded during the usage of Microsoft Office applications such as Excel and PowerPoint.
- Gaming: Replay of storage access traces recorded during the start up of games such as Battlefield V, Call of Duty Black Ops 4, and Overwatch.
- File Transfers: Replay of storage access traces (Write-Only, Read-Write, and Read-Only) recorded during the transfer of data such as ISOs and photographs.
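The grouping can be written down as a simple lookup table. The sketch below uses the trace identifiers exactly as they are listed in the result sub-sections later on this page. Note that `exc` appears in both the Creative and the Office lists there, so the helper returns every category a trace is listed under. The dictionary and function names are hypothetical.

```python
# Trace identifiers as listed in the result sub-sections of this page.
TRACES_BY_CATEGORY = {
    "Boot": ["boo"],
    "Creative": ["sacr", "saft", "sill", "spre", "slig", "sps",
                 "aft", "exc", "ill", "ind", "psh", "psl"],
    "Office": ["exc", "pow"],
    "Gaming": ["bf", "cod", "ow"],
    "File Transfers": ["cp1", "cp2", "cp3", "cps1", "cps2", "cps3"],
}


def categories_for(trace):
    """Return every category (sorted) in which the given trace is listed."""
    return sorted(cat for cat, names in TRACES_BY_CATEGORY.items() if trace in names)


def unique_traces():
    """Return the distinct trace identifiers across all categories."""
    return sorted({name for names in TRACES_BY_CATEGORY.values() for name in names})


if __name__ == "__main__":
    print(categories_for("exc"))
    print(len(unique_traces()))
```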
PCMark 10 also generates an overall score, bandwidth, and average latency number for quick comparison of different drives. The sub-sections in the rest of the page reference the access traces specified in the PCMark 10 Technical Guide.
Booting Windows 10
The read-write bandwidth recorded for each drive in the boo access trace is presented below.
In SD Express mode, the numbers are equivalent to those of a high-end SATA SSD. This is expected as the boot workload is mostly read-intensive, and the numbers for read workloads are not influenced by the SLC caching effects.
Creative Workloads
The read-write bandwidth recorded for each drive in the sacr, saft, sill, spre, slig, sps, aft, exc, ill, ind, psh, and psl access traces is presented below.
The SD Express card performance is equivalent to that of a 256GB-class SATA SSD in both read- and write-intensive workloads.
Office Workloads
The read-write bandwidth recorded for each drive in the exc and pow access traces is presented below.
Gaming Workloads
The read-write bandwidth recorded for each drive in the bf, cod, and ow access traces is presented below.
Gaming workloads are read-intensive, and the SD Express mode has no problems in delivering results similar to those of high-end SATA SSDs and low-end DRAM-less PCIe 3.0 x2 NVMe SSDs in its performance class.
File Transfer Workloads
The read-write bandwidth recorded for each drive in the cp1, cp2, cp3, cps1, cps2, and cps3 access traces is presented below.
Simultaneous reads and writes tend to bring down the performance to SATA level for the SD Express mode, but, thanks to the sequential nature of the workloads, the numbers are quite good for a memory card.
PCMark 10 reports an overall score based on the observed bandwidth and access times for the full workload set. The score, bandwidth, and average access latency for each of the drives are presented below.
Equivalent numbers for external flash drives can be found in this review featuring both NVMe and SATA SSDs behind a bridge chip. The numbers in the SD Express mode closely track the ADATA SC680 960GB sample - which happens to be a SATA SSD using the Silicon Motion SM2259XT DRAM-less controller but with more flash packages. Given that the reference design sampled to us only uses two packages and has 25% of the capacity (not much parallelism to exploit), it is only the NVMe interface / SD Express operation that allows the memory card to reach the SC680's performance level.