It has been a full seven months since AMD released detailed information about its Opteron A1100 server CPU, and twenty-two months since its announcement. Today, at the Hot Chips conference in Cupertino, CA, AMD revealed the final pieces of its ARM-powered server strategy, headlined by the A1100.

The Case for Low Power Server CPUs

Before we discuss the new Opteron A1100 details, let us review why AMD designed an ARM-powered CPU in the first place. It all comes down to the devices and services we now take for granted: cell phones, tablets, cloud storage, and cloud services. AMD presented a slide about a year ago that summed it up nicely.

The number of internet users is growing by 8 to 12% every year. Apple, Google, Microsoft, Facebook, you name it, all invest huge sums of money into server farms to provide the services we have come to rely on. This trend gains more momentum as software companies like Microsoft try to emulate the success of Apple and Google by selling hardware (Apple's model) and providing ad-supported free services (Google's model).

Building the infrastructure to support all these devices and users is a massive undertaking. Typically, companies buy traditional high-powered servers (read: Intel Xeon) and partition their computing power among many tasks as needed. However, this isn't always the best strategy. For IO tasks, you are always bottlenecked by something other than the CPU, so there is no reason to throw high-cost, high-power CPUs at the problem. For webserver tasks, response time is paramount. However, with the huge number of users connecting, webservers have become an 'embarrassingly parallel' problem you can address with multi-core CPUs - as long as there is enough muscle behind each core.
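
To make the 'embarrassingly parallel' point concrete, here is a minimal sketch (our own illustration, not AMD's software; the port number is arbitrary) of a web workload in which every request is independent, so throughput scales almost linearly with core count as long as each core is fast enough to keep individual response times low.

```python
# Minimal sketch of an "embarrassingly parallel" web workload.
# Each request is handled in its own process, spread across cores by the OS
# scheduler; requests never contend with each other, so adding cores adds
# throughput - provided per-core performance keeps response times acceptable.
import os
from http.server import HTTPServer, BaseHTTPRequestHandler
from socketserver import ForkingMixIn


class ForkingHTTPServer(ForkingMixIn, HTTPServer):
    """Forks a child process per incoming request."""


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = f"served by pid {os.getpid()}\n".encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    # Port 8080 is an arbitrary example value.
    ForkingHTTPServer(("0.0.0.0", 8080), Handler).serve_forever()
```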

The ‘enough muscle’ issue has hindered previous low-power, high-density webserver attempts. When we tested the Calxeda ARM compute cluster, there were only certain edge cases where it was more efficient than a dual-core Xeon server running virtual machines. Calxeda itself admitted that its processors, built on ARM Cortex A9 cores, represented the early adopter phase of ARM-powered webservers. Calxeda stated it wouldn't be until ARMv8 (which supports virtualization) and Cortex A57 that ARM-based servers would ‘cross the chasm’ and enter the mainstream.

With the Opteron A1100, AMD skipped the early adopter phase and chose something with a higher chance of initial success.

Meet the A1100: CPUs and IO

There are three types of ARM licenses: POP, processor, and architecture. POP stands for Processor Optimization Pack, and a POP license provides the licensee with everything needed to send a chip to the fab. A processor license provides the details of an ARM-designed core, such as the Cortex A9, so you can implement it in your own SoC, but you are not allowed to customize it. Finally, there is the ultimate license: an architecture license. An architecture license provides the full details of the ARM instruction set architecture (ISA) and CPU implementation so a licensee can design its own custom CPU core using the ARM ISA however it sees fit.

AMD is both a processor and an architecture licensee. If AMD decides it can be competitive by shipping an SoC with an ARM-designed CPU (processor license), it can do so without the effort of designing its own CPU. If AMD wants to differentiate itself with a custom-designed CPU using the ARM ISA, it can use its architecture license to do that, similar to Qualcomm’s Krait CPU cores. AMD has decided to do both. Today we discuss its processor license.

AMD’s first SoC containing an ARM CPU is code-named Seattle, the Opteron A1100. Seattle features no fewer than eight 64-bit ARMv8 Cortex A57 cores. Depending on availability, this could be the first Cortex A57 CPU to hit any market, not just the server market. AMD will follow up in 2015 with a lower-power version that is pin-compatible with an x86 counterpart; both parts fall under Project Skybridge. In 2016, AMD will leverage its architecture license and ship K12, a fully custom CPU design using the ARMv8 ISA.

Each pair of Cortex A57s in the A1100 shares a 1MB L2 cache (4MB of L2 in total), and all cores roll up to a shared 8MB L3 cache. To address the server market, all caches are ECC protected except for the L1 instruction cache, which is parity protected instead. Instruction cache protection is not as critical: a corrupted line can simply be re-fetched, costing only a pipeline stall. AMD utilizes ARM bus interfaces and debugging support throughout the design. The Cortex A57 also implements cryptography extensions, which ARM quotes as accelerating workloads like HTTPS by 3-10x over previous ARM designs.
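
As a quick sanity check of those cryptography extensions, here is a small sketch of our own (assuming Linux on an ARMv8 machine, where the kernel reports the optional AES and SHA instructions as hwcap flags in /proc/cpuinfo) that tests whether they are present:

```python
# Sketch: check whether the optional ARMv8 Cryptography Extensions are reported.
# On arm64 Linux these show up as "aes", "pmull", "sha1", and "sha2" in the
# "Features" line of /proc/cpuinfo.
def arm_crypto_features(path="/proc/cpuinfo"):
    wanted = {"aes", "pmull", "sha1", "sha2"}
    with open(path) as f:
        for line in f:
            if line.lower().startswith("features"):
                present = set(line.split(":", 1)[1].split())
                return {feat: feat in present for feat in wanted}
    # No "Features" line found (e.g. not an ARM system): report nothing present.
    return {feat: False for feat in wanted}


if __name__ == "__main__":
    for feature, available in arm_crypto_features().items():
        print(f"{feature}: {'yes' if available else 'no'}")
```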

The SoC has a dual-channel (2x64-bit) DDR3/4 interface supporting up to 128GB of 1866MHz memory. Just like the caches, the memory path also supports ECC of the single-bit-error-correct / double-bit-error-detect variety. Registered (RDIMM), unbuffered (UDIMM), and small-outline (SODIMM) memory modules are supported by the A1100 SoC, but actual motherboards will likely support only one type of memory. The same goes for DDR3 vs. DDR4.
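
For a sense of scale, a back-of-the-envelope calculation of our own (not an AMD figure) puts the theoretical peak bandwidth of that interface at roughly 30GB/s with both channels populated at 1866:

```python
# Back-of-the-envelope peak bandwidth for the A1100's memory interface,
# assuming both channels are populated and running at DDR3/DDR4-1866.
channels = 2
bus_width_bytes = 64 // 8        # each channel is 64 bits wide
transfers_per_second = 1866e6    # 1866 MT/s
peak_bytes_per_second = channels * bus_width_bytes * transfers_per_second
print(f"Theoretical peak: {peak_bytes_per_second / 1e9:.1f} GB/s")  # ~29.9 GB/s
```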

As the A1100 is an SoC, it integrates IO directly into the single chip instead of relying on an off-chip IO hub. Integrated components include eight SATA 3 (6Gb/s) ports, two 10Gbit Ethernet (10GBASE-KR) ports, one 10/100/1000 Ethernet port, eight lanes of PCI Express Gen 3 (configurable as x8, x4/x4, or x4/x2/x2), plus I2C, SPI, and UART interfaces. The breadth of storage IO (eight SATA 3 ports) combined with the two 10Gbit Ethernet ports is particularly interesting, as it hints at how AMD will position the Opteron A1100 in the market. More on this later.

AMD’s Special Sauce: A1100’s Co-Processors
Comments

  • Gigaplex - Monday, August 11, 2014 - link

    I'm not sure why you'd want Storage Spaces. The marketing sounds great, but there are many complaints all over the web that it just doesn't work as well as advertised. Parity mode in particular has unusably slow write speeds, and you can't expand the pool in an ad hoc fashion as was originally advertised - you have to add a collection of drives simultaneously which effectively just builds a second pool. And don't even ask about rebalancing. Conventional RAID5 is actually more flexible in practice.
  • hechacker1 - Tuesday, August 12, 2014 - link

    The latest Server 2012 updates have largely fixed it. Yes, it's slow, but that's for data consistency. With SSD tiering enabled, I can max out my network connection. Or you can enable the RAM write cache, if you have a UPS.

    The biggest problem is rebalancing, which it doesn't do. It also can't shrink a volume to safely remove a drive. But a lot of other RAID schemes don't support that either.
  • easp - Tuesday, August 12, 2014 - link

    Keep in mind that Microsoft has quite a few servers of its own for Azure, Bing, etc. That's the initial target market for this stuff: orgs that run their own software on big infrastructure.
  • En1gma - Monday, August 11, 2014 - link

    USB?
  • davegraham - Monday, August 11, 2014 - link

    Stephen, you also have to remember that AMD has separate divisions now (retail and embedded). The embedded side absolutely could and will integrate into SeaMicro. ;) So, just remember, a dev kit isn't anything more than that... a dev kit. The final integration stages remain to be seen.
  • Stephen Barrett - Monday, August 11, 2014 - link

    I agree with you completely. I was just hoping they would make a big announcement on that simultaneously instead of the weak reference system shown.
  • BMNify - Monday, August 11, 2014 - link

    AMD seems to have killed mass consumer/prosumer/SME uptake before it even starts, as these reference-design 2U kits are being sold for 3000+ USD.

    Selling your initial reference boards at too high a price massively limits AMD's ability to mass-produce these SoCs quickly at a good mass-consumer price ($200 per chip for testing and OEM product development, etc.), so there is no initial mass uptake and far longer timescales to reach parity with other mass-produced consumer hardware. Remember, it's an untested SoC, and AMD needs to prove its viability PDQ to make its ROI back and pay its bills...
  • iwod - Monday, August 11, 2014 - link

    I must have missed it - why is it not targeting the webserver application market?

    I find myself asking the same question whenever there is an ARM server SoC article. What exactly does the A1100 do better than Intel's Airmont-based server SoCs? Yes, Airmont, the "14nm" version of Silvermont, compared to the 28nm A1100.
  • Wilco1 - Monday, August 11, 2014 - link

    We can only compare it with Avoton for now, as details for Airmont are not known yet. It should be a good deal faster than Avoton (especially given it has 3x the amount of L2/L3 cache) and at better perf/W. Whether Airmont increases performance is unclear - power is likely reduced - but 20nm or 16nm versions of Seattle might be available next year as well, and those would significantly increase clock speed.
  • tuxRoller - Monday, August 11, 2014 - link

    Wow. If this had a video accelerator, it would be a tremendous SoC for a media server/NAS.
    All those SATA ports, 10GbE ports (!!!), ECC RAM, and better performance per clock than Silvermont when using the new ISA.
