Some of you may remember AMD announcing its "Torrenza" technology 10 years ago. The idea was to offer a fast and coherent interface between the CPU and various types of "accelerators" (via HyperTransport). It was one of the first initiatives to enable "heterogeneous computing".

We now have several technologies that could be labeled "heterogeneous computing", the most popular form being GPU computing. There have also been encryption, compression, and network accelerators, but the advantages of those accelerators were never really clear, as shifting data back and forth to the CPU was in many cases less efficient than letting the CPU process it with optimized instructions. In the professional world, heterogeneous computing was mostly limited to HPC; in the consumer world it was a "nice to have".

But times are changing. The sensors of the Internet of Things, the semantic web and the good old WWW are creating a massive and exponentially growing flood of data that cannot be stored and analyzed by traditional means. Machine learning offers a way of classifying all that data and finding patterns "automatically". As a result, we have witnessed a "machine learning renaissance", with quite a few breakthroughs. Google had to deal with this flood years before most other companies, and has released some of the Google Brain team's AI breakthroughs to the open source world, one example being TensorFlow. And when Google releases important technology into the open source world, we know we have to pay attention. When Google published the Google File System and BigTable papers, for example, the big data revolution with Hadoop, HDFS and NoSQL databases erupted shortly afterwards.

Big data thus needs big brains: we need more processing power than ever. With Moore's Law winding down (the end of classic CMOS scaling), we cannot expect much from process technology advancements. That processing power has to come from ASICs (see Google's TPU), FPGAs (see Microsoft's Project Catapult) and GPUs.

Those accelerators need a new "Torrenza technology": a fast, coherent interconnect to the CPU. NVIDIA was first with NVLink, but an open standard would be even better, and IBM, for its part, was willing to share its CAPI interface.

To that end, Google, AMD, Xilinx, Micron and Mellanox have joined forces with IBM to create a "coherent high performance bus interface" based on a new bus standard called the Open Coherent Accelerator Processor Interface (OpenCAPI). Capable of a 25 Gbit/s per lane data rate, OpenCAPI outperforms the current PCIe specification, which offers a maximum data transfer rate of 8 Gbit/s per PCIe 3.0 lane. We expect the total bandwidth to be a lot higher for quite a few OpenCAPI devices, as multiple OpenCAPI lanes will be bundled together.
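To get a rough feel for what lane bundling means for aggregate bandwidth, here is a back-of-the-envelope sketch. The 25 Gbit/s and 8 Gbit/s per-lane figures come from the announcement; the lane counts below are illustrative assumptions, not published OpenCAPI link configurations.

```python
# Back-of-the-envelope raw bandwidth comparison. Per-lane rates are from
# the article; the lane counts are hypothetical examples, and encoding
# overhead is ignored.

def link_bandwidth_gbit(per_lane_gbit: float, lanes: int) -> float:
    """Raw aggregate bandwidth of a link made of `lanes` bundled lanes."""
    return per_lane_gbit * lanes

pcie3_x16 = link_bandwidth_gbit(8, 16)    # a common PCIe 3.0 x16 slot
opencapi_x8 = link_bandwidth_gbit(25, 8)  # hypothetical 8-lane OpenCAPI link

print(f"PCIe 3.0 x16: {pcie3_x16} Gbit/s")   # 128 Gbit/s
print(f"OpenCAPI x8:  {opencapi_x8} Gbit/s") # 200 Gbit/s
```

Even with half the lanes, the faster signaling rate puts the hypothetical OpenCAPI link well ahead in raw throughput.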

It is a win-win for everybody besides Intel. It is clear now that IBM's OpenPOWER initiative is gaining a lot of traction and that IBM is deadly serious about offering an alternative to the Intel-dominated datacenter. IBM will implement the OpenCAPI interface in its POWER9 servers in 2017. Those POWER9 systems will not only have a very fast interface to NVIDIA GPUs (via NVLink), but also to Google's ASICs and Xilinx FPGA accelerators.

Meanwhile, this benefits AMD, as the company gets access to an NVLink alternative for linking its Radeon GPUs to the upcoming Zen-based server processors. Micron can attach faster (and more profitable than commodity DRAM) memory to the CPU. Mellanox can do the same for networking. OpenCAPI is even more important for Xilinx FPGAs, as a coherent interface can make FPGAs attractive for a much wider range of applications than today.

And guess what: Dell EMC joined this new alliance just a few days ago. Intel has to come up with an answer...

Update: courtesy of commenter Yojimbo: "NVIDIA is a member of the OpenCAPI consortium, at the 'contributor level', which is the same level Xilinx has. The same is true for HPE (HP Enterprise)."

This is even bigger than we thought. Probably the biggest announcement in the server market this year.


Source: OpenCAPI




  • SarahKerrigan - Friday, October 14, 2016 - link

    The major difference between CAPI and PCIe is that CAPI is fully coherent and requires no additional address translation between the CPU and the attached device. This significantly reduces latency. Today's CAPI actually just uses the PCIe physical layer, but runs its own protocol on top of it, so it's not just about bandwidth. Some cards, as far as I know, can autodetect CAPI and switch into CAPI mode when it's available, falling back to plain PCIe when it's not; afaik Mellanox's CAPI-compatible HBAs work like this.

    That being said, I expect OpenCAPI (which IBM called New CAPI in the P9 presentation) to be a net bandwidth win too, via having more lanes per slot - since it appears, per the POWER9 presentation, to be using the same physical layer as NVLink 2.0.
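The programming-model difference SarahKerrigan describes can be sketched in toy form. Nothing below is a real CAPI or PCIe API; it is a plain-Python simulation contrasting "copy in, compute, copy out" with operating directly on shared, coherent memory.

```python
# Toy illustration of coherent vs. non-coherent accelerator models.
# All names here are hypothetical; this only models data movement.

host_buffer = [1, 2, 3, 4]

def noncoherent_scale(host, factor):
    """Non-coherent device: stage data in, compute, copy results back.
    Two extra transfers (and translations) per operation."""
    device_copy = list(host)                         # DMA host -> device
    device_copy = [x * factor for x in device_copy]  # compute on device
    return list(device_copy)                         # DMA device -> host

def coherent_scale(shared, factor):
    """Coherent device: CPU and accelerator share one address space,
    so the device mutates the buffer in place; no staging copies."""
    for i in range(len(shared)):
        shared[i] *= factor

result = noncoherent_scale(host_buffer, 2)  # host_buffer itself unchanged
coherent_scale(host_buffer, 2)              # host_buffer updated in place
```

The latency win comes from skipping the staging copies and translation steps on every transaction, which matters far more for fine-grained, pointer-chasing workloads than for bulk streaming.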
  • fanofanand - Friday, October 14, 2016 - link

    Thank you macfan and Sarah, I appreciate the insight! With the ability to couple lanes together, I can see where this would be an order of magnitude faster, and if they get a latency reduction then yeah this is a big deal.
  • p1esk - Friday, October 14, 2016 - link

    What does it mean "CAPI is fully coherent"?
  • Yojimbo - Friday, October 14, 2016 - link

    This article fails to mention the fact that NVIDIA is a member of the OpenCAPI consortium, at the "contributor level", which is the same level Xilinx has, and seems to be a level below "board level", which is the level of the founders of the consortium: AMD, Google, IBM, Mellanox, and Micron. HPE is also a contributor level member, and Dell EMC is an "observer level" member. Since NVIDIA has the lion's share of the accelerator market, it's a rather significant fact. It's quite glaring that NVIDIA is not a member of the CCIX consortium.

    The CCIX consortium contains many of the same players, but seems to have formed around Xilinx. The OpenCAPI consortium presumably formed around IBM. I haven't tried to figure out what the difference of their proposals are, yet.
  • JohanAnandtech - Friday, October 14, 2016 - link

    Thanks, a very good addition. I updated the article.
  • dave102 - Friday, October 14, 2016 - link

    It is pretty clear the author of this article does not know what he is talking about. These players are all individual contributors to the data center space and thus must come up with an open interconnect to allow them to play together at high enough performance. The claim that Intel must "catch-up" is total bullshit. Intel has 97% market share in the data center.

    Intel has a fabric product in omnipath, and guess what, it's integrated. They have an HBM solution, and it's integrated. They have a GPU competitor in the knights series, and guess what it's bootable and coherent with all of the products I just mentioned. The fact that all of these come standard on a Xeon or Xeon Phi makes the argument that they are "behind" for not coming up with a PCIe next generation standard totally irrelevant.

    Way to miss the boat on this one Johan.
  • JohanAnandtech - Friday, October 14, 2016 - link

    The point is that OpenCAPI has wide industry support, while OmniPath is mostly Intel networking and MICs. So Intel has to answer this, despite having OmniPath. Intel excels in CPU design, but it is very unlikely they will have the best solution for every accelerator. In fact, as far as I know, Tesla is a lot more successful than Phi. And I do not know who ran over your cat today, but can we please keep the tone a bit more civil? Immediately stating that "I don't know what I am talking about" is not really a good way to start a discussion.
  • name99 - Friday, October 14, 2016 - link

    The correct analogy would be to something like what happened in the 90s as more and more tasks moved off the mainframe and onto servers. IBM was in the position you give for Intel today --- solutions to all sorts of things, with names like SAA and SNA, that today mean something to only a few cognoscenti.

    Having a portfolio of solutions is not a panacea. What matters includes things like total system cost, and the ability to mix and match alternative plugin components.
    One would expect that Intel is as aware of this history as IBM is (though sometimes I wonder, given the way they behave). Which means the outcome is not necessarily as determined as twenty years ago.
    BUT Intel has definite weaknesses. Their on-going fragmentation of the user base does them little favor (what's value in supposedly having binary compatibility between CPUs if the most relevant part of the ISA, the various AVX flavors, are not so compatible?) And they're going to become ever more vulnerable on cost. Not from IBM at the high end, but with a thousand cuts as Chinese POWER clones and ARM servers come online.
    (Remember, the Chinese government can't be sold high-end Xeons:
    which means they are GOING to develop their own equivalents in this performance sector, regardless of the startup costs. And those startups are likely going to use ARMv8 or POWER ISA --- they're damn well not going to use x86 for both technical and business reasons. We've already seen the very first fruits of this with Phytium's initial offerings --- which are very much learning chips, not the end-point.)

    People analyzing this effort as a waste of time are living in the last ten years. They need to broaden their horizons both backwards (to how Intel got to its position in the first place) and forwards (not to next quarter, but to say the 2020 timeframe).
  • Michael Bay - Saturday, October 15, 2016 - link

    Thousand cuts by a lame duck must be made a new meme.
    And should POWER become something a little more tangible than IBM bullshit marketing, you can bet your everything at it being blocked for sale just as well.
