At the Intel Developer Forum this week in San Francisco, Intel is sharing a few more details about its plans for Optane SSDs based on 3D XPoint memory.

The next milestone in 3D XPoint's journey to becoming a real product will be a cloud-based testbed for Optane SSDs. Intel will be giving enterprise customers free remote access to systems equipped with Optane SSDs so that they can benchmark how their software runs with 3D XPoint-based storage and optimize it to take better advantage of the faster storage. By offering cloud-based access before even sampling Optane SSDs, Intel can keep 3D XPoint out of the hands of its competitors longer and perhaps make better use of limited supply, while still enabling the software ecosystem to begin preparing for the revolution Intel is planning. However, this won't do much for customers who want to integrate and validate Optane SSDs with their existing hardware platforms and deployments.
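
For a sense of the kind of measurement customers might run against the testbed, here is a minimal Python sketch of a 4 KiB random-read latency probe. The device path is a placeholder, and a serious evaluation would use a purpose-built tool such as fio with O_DIRECT to take the page cache out of the picture; this is only meant to show the shape of the exercise.

# latency_probe.py: rough 4 KiB random-read latency probe (illustrative only)
import os
import random
import statistics
import time

DEVICE = "/dev/nvme0n1"   # placeholder path to the Optane block device
BLOCK = 4096              # 4 KiB reads
SAMPLES = 10_000

fd = os.open(DEVICE, os.O_RDONLY)
size = os.lseek(fd, 0, os.SEEK_END)   # block devices report their size via lseek

latencies = []
for _ in range(SAMPLES):
    offset = random.randrange(size // BLOCK) * BLOCK
    start = time.perf_counter()
    os.pread(fd, BLOCK, offset)       # one small random read
    latencies.append(time.perf_counter() - start)
os.close(fd)

latencies.sort()
print(f"median: {statistics.median(latencies) * 1e6:.1f} us")
print(f"p99:    {latencies[int(len(latencies) * 0.99)] * 1e6:.1f} us")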

The cloud-based Optane testbed will be available by the end of the year, suggesting that we might not be seeing any Optane SSDs in the wild this year. At the same time, the testbed would only be worth providing if its performance characteristics are going to be pretty close to those of the final Optane SSD products. Having announced the Optane testbed like this, Intel will probably be encouraging its partners to share their performance findings with the public, so we should at least get some semi-independent testing results in a few months' time.

In the meantime, Intel and ScaleMP will be demonstrating a use that Optane SSDs will be particularly well-suited for. ScaleMP's vSMP Foundation software family provides virtualization solutions for high performance computing applications. One of their specialties is providing VMs with far more virtual memory than the host system has DRAM, by transparently using NVMe SSDs (or even the DRAM and NVMe storage of other systems connected via InfiniBand) to cache what doesn't fit in local DRAM. The latency advantages of 3D XPoint will make Optane SSDs far better swap devices than any flash-based SSDs, and the benefits should still be apparent even when some of that 3D XPoint memory is at the far end of an InfiniBand link.

ScaleMP and Intel have previously demonstrated that flash-based NVMe SSDs can be used as a cost-effective alternative to building a server with extreme amounts of DRAM, and with a performance penalty that can be acceptably small. With Optane SSDs that performance penalty should be significantly smaller, widening the range of applications that can make use of this strategy.  
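
To illustrate the underlying idea (not ScaleMP's actual mechanism, which works transparently at the hypervisor level), the Python sketch below demand-pages a dataset larger than local DRAM out of a file assumed to sit on an Optane SSD; the backing path and sizes are made up for the example.

# Illustrative only: map a 64 GiB region backed by a file on a fast SSD, so
# pages fault in from the device on demand instead of having to fit in DRAM.
import mmap
import os

BACKING_FILE = "/mnt/optane/bigbuf.bin"   # hypothetical mount point for an Optane SSD
SIZE = 64 * 1024**3                       # 64 GiB, assumed larger than local DRAM

fd = os.open(BACKING_FILE, os.O_RDWR | os.O_CREAT, 0o600)
os.ftruncate(fd, SIZE)                    # sparse file; blocks allocate as pages are touched

buf = mmap.mmap(fd, SIZE)                 # MAP_SHARED by default: the file is the backing store
buf[0:8] = b"\x01" * 8                    # touching a page triggers device I/O only when
buf[SIZE - 8:SIZE] = b"\x02" * 8          # it is not already resident in DRAM

buf.close()
os.close(fd)

The lower the device's read latency, the cheaper each page fault becomes, which is exactly where 3D XPoint is supposed to shine.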

Intel will also be demonstrating Optane SSDs used to provide read caching for cloud application or database servers running on Open Compute hardware platforms.
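
As a rough sketch of what such a read cache does (the paths below are invented for the example; real caching solutions work at the block layer and also handle eviction, write policy and crash consistency), hot blocks from a slow capacity tier can be staged on the faster Optane device:

# Toy read-through block cache: 4 KiB blocks read from a slow backing store are
# copied to a file on a fast (Optane-class) device and served from there next time.
import os

BLOCK = 4096
SLOW = os.open("/mnt/capacity/db.dat", os.O_RDONLY)            # hypothetical capacity tier
FAST = os.open("/mnt/optane/cache.dat", os.O_RDWR | os.O_CREAT, 0o600)
index = {}   # block number -> offset of its copy in the cache file

def read_block(block_no: int) -> bytes:
    if block_no in index:                                      # hit: read from the Optane cache
        return os.pread(FAST, BLOCK, index[block_no])
    data = os.pread(SLOW, BLOCK, block_no * BLOCK)             # miss: read from the slow tier
    cache_off = len(index) * BLOCK
    os.pwrite(FAST, data, cache_off)                           # stage the block for next time
    index[block_no] = cache_off
    return data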

Comments

  • emn13 - Wednesday, August 17, 2016

    billions sounds like a really low estimate. Software often uses specific memory locations as counters; and if you're unlucky and caching won't do, then you might see up to around 30 million writes *a second* to the same location. That's perhaps a bit of a corner case; but DRAM has no way of dealing with write-amplification; it's pretty much direct access with some fairly static address remapping (TLB). That's what makes rowhammer bugs possible!

    You could hit a trillion writes a day, and I bet there's some workload out there running on a real server today that's written orders of magnitude more than that *in practice* to a single memory location.
  • emn13 - Wednesday, August 17, 2016

    Its speed is not 4x to 8x slower, at least not by any metric Intel or Micron have ever claimed. DRAM has a latency of 4-12ns (nano, not micro!), and it doesn't look like XPoint is anywhere near that. Also, realize that practical memory latencies *including* controller overheads are only a small multiple of that; 20-30ns is normal. Those great XPoint numbers all exclude the controller; as soon as it's included, latency (necessarily) skyrockets, since it doesn't sound like XPoint can be directly addressed as simply as DRAM.

    Intel hasn't released a lot of data on this stuff, so it's hard to be sure, but my guess is that optane will be around 1000x slower than DRAM in practice via the NVMe interface. And that no matter what they do, they'll be around 10x slower - a very, very significant difference.

    And don't forget that DRAM isn't standing still either; HBM2 isn't nearly as uncertain as XPoint, and that's already slated to deliver 1TB/s bandwidth at even lower latencies, and with less power consumption than DDR4.

    I'm not expecting miracles from XPoint. It's going to be one hell of a fast SSD, and it'll be brilliant for swap and cache, but it's not going to dramatically change computer architecture. DRAM isn't going away; nor are larger cheaper SSDs.

    Unless the pricing can be really competitive, it's likely to remain a small, niche product.
  • MrSpadge - Tuesday, August 16, 2016

    I see this mentioned pretty much anytime a new non-volatile memory is being talked about. But usually not from the companies themselves. Replacing DRAM is only going to happen if we can find something faster (which is cheap enough), or for some very low end applications.
  • wumpus - Thursday, August 18, 2016

    You'll notice that DRAM is still even in Intel's marketing drawings. Best guess it will start out in enterprise 32G (or more) DRAM "caches" and allow terabytes of xpoint to be used as "memory".

    Check the latency figures in the recent POWER8 article: main memory latency was 90ns. While xpoint latency might be roughly that by itself (never mind navigating all the way to the registers), that means it should hold up as long as you don't wear it out (and with enterprise-sized arrays and any kind of wear leveling it *can't* be worn out: the system simply can't throw that many writes at it over its lifetime). Give it a decent DRAM cache and watch it run.

    What I really want to see is an 8G HBM (or whatever size HBM will be by the time xpoint is made in volume). That should make a great front to xpoint and likely allow plenty* more threads to run (because HBM can supply a ton of outstanding requests. You won't get "high bandwidth" with just a few threads).
    * think replacing the wasted space of graphics parts in K chips with cores, not slapping hundreds of ARM cores down. Amdahl's law will remain in effect as long as these things do general purpose computing and single thread performance is important (although cranking up main memory access should change things a bit).
  • FunBunny2 - Tuesday, August 16, 2016

    mmap() on steroids will rule the earth. That will power real RDBMS systems.
  • fangdahai - Tuesday, August 16, 2016

    XPOINT is far more expensive than NAND, much slower than DRAM, and its endurance is just very, very poor compared to DRAM.

    I don't see how it could be useful, except for BIG DATA and super computers.
  • ddriver - Tuesday, August 16, 2016

    but...but.... it is 1000x better! It must be great really ;)
  • nandnandnand - Tuesday, August 16, 2016

    Store the entire OS and applications in XPoint, without needing constant writes, then see how bad it is.

    Hint: XPoint is useful if you use it right.
  • jwcalla - Tuesday, August 16, 2016

    32 GB of DRAM is cheap.
  • Zertzable - Wednesday, August 17, 2016

    Yes, but this targets a market where we're talking about 256GB, 512GB or even 1TB of memory.
