As the semiconductor industry continues to evolve, Arm stands at the forefront of innovation for its core and IP architecture, especially in the mobile space, by pushing the boundaries of technology to deliver cutting-edge solutions for end users. For 2024, Arm's year-on-year strategic advancements focus on enhancing last year's Armv9.2 architecture with a new twist. Arm has rebranded and re-strategized its efforts by introducing Arm Compute Subsystem (CSS), the direct successor to last year's Total Compute Solutions (TSC2023) platform.

Arm is also transitioning its latest IP and Cortex core designs, including the largest Cortex X925, the middle Cortex A725, and the refreshed and smaller Cortex A520 to the more advanced 3 nm process technology. Arm promises that the 3 nm process node will deliver unprecedented performance gains compared to last year's designs, power efficiency and scalability improvements, and new front and back-end refinements to its Cortex series of cores. Arms' new solutions look to power the next-generation mobile and AI applications as Arm, along with its complete AArch64 64-bit instruction execution and approach to solutions geared towards mobile and notebooks, look set to redefine end users' expectations within the Android and Windows on Arm products.

Arm Compute Subsystem (CSS): CSS is the New TCS

The introduction of Arm Compute Subsystem (CSS) marks a significant milestone in Arm's strategy to deliver a holistic and full-rounded computing solution for partners to implement in their new yearly cycle of mobile devices. CSS is a comprehensive platform that integrates hardware, software, and tools to optimize the performance and efficiency of client devices. It is designed to provide a seamless computing experience across various devices, from smartphones and tablets to laptops and even desktop PCs.

When launched last year, the Armv9.2 architecture represented a significant step forward in Arm's roadmap. Still, this year, it's about building on the successes of its predecessors while introducing a host of new features and improvements. One of the key highlights of the revamped Armv9.2 family is the use of enhanced security features, which include memory tagging extensions (MTE) and confidential compute architecture (CCA). These features provide robust protection against various security threats, making devices more secure.

CSS leverages the latest Armv9.2 cores designed for 2024, including the high-performance Cortex X925, the balanced Cortex A725, and the power-efficient and refreshed Cortex A520. These cores are complemented by Arm's new Immortalis G925 GPU, designed to deliver exceptional graphics performance and efficiency in a mobile-sized package. Together, these components form the foundation of what is now called the CSS platform, which aims to provide a powerful and versatile computing solution for modern devices in the mobile sphere.

One of CSS's key features is its robust scalability for different markets, such as mobile and notebooks. The platform is designed to scale across different device form factors and performance requirements, making it suitable for many tasks and applications. Whether it's high-end gaming, professional content creation, or everyday productivity tasks, CSS can be tailored to meet the needs of various use cases.

Arm's Arm Compute Subsystem (CSS) platform represents a significant step forward in IP design and architectural improvements, offering multiple and claiming serious levels of enhancements in performance and efficiency. With the introduction of the second-generation Armv9.2 Cortex CPU cluster, including the new Cortex-X925 (big), Cortex-A725 (middle), and a refreshed Cortex-A520 (little) cores, the CSS platform is designed to deliver the ultimate in mobile computing performance when licensed out to their partners.

Additionally, the CSS platform includes a comprehensive reference software stack for Android, optimized AI backed by new Arm computer vision libraries (KleidiAI and KleidiCV), and robust tooling environments through Arm Performance Studio. This typically holistic approach ensures that Arm's physical implementations achieve speeds greater than 3.6 GHz and offer optimal power, performance, and area (PPA) metrics on the 3 nm node. Speaking of the 3 nm mode, Arm stated that TSMC and Samsung 3 nm are the key options for their CSS core cluster, although it's most likely to be a case of getting fab allocations with TSMC as we are unsure if any will use Samsung over TSMC.

In addition to security enhancements, Armv9.2 on 3 nm also promises substantial performance improvements, especially with the new big core, the Cortex X925, which Arm believes is the new IPC king of the mobile. The architecture has been optimized for higher clock speeds and better levels of efficiency, which in turn should deliver more compute power per watt. This is achieved through several architectural innovations, including wider execution pipelines, improved branch prediction, and enhanced out-of-order execution capabilities. These enhancements boost the cores' Instructions Per Cycle (IPC), ensuring they can easily handle the most demanding workloads.

Transitioning to 3 nm Process Technology

The move to 3 nm process technology represents a significant leap in semiconductor manufacturing, offering substantial improvements in performance, power consumption, and chip density. This transition allows Arm to deliver more powerful and efficient processors capable of handling the most demanding applications efficiently.

One of the primary benefits of the 3 nm process is its ability to pack more transistors into a smaller area, resulting in higher performance and lower power consumption. This is crucial for mobile and portable devices, where battery life and thermal management are critical considerations. The 3 nm process also enables Arm to push out higher clock speeds on the Cortex X925 core, up to 3.8 GHz, to be exact. This enables faster and more responsive computing experiences and pushes overall IPC performance above and beyond what was already achievable.

The combination of the updated Armv9.2 architecture, the new CSS platform, and the jump to the 3 nm process technology, as Arm claims, is designed to roll out significant performance and efficiency enhancements across the board. This should theoretically enable various implementations of their reference CPU Core Cluster designs for devices of all ilks, with two Cortex X cores being the go-to norm now, as opposed to just one from last year's reference design. Benchmarks and real-world tests conducted and presented by Arm, which should be taken with a grain of salt, show substantial gains in single-threaded and multi-threaded performance, making these new solutions ideal for various applications. Arm is even claiming single-threaded IPC leadership with the largest core, the Cortex X925, surpassing what both Intel and AMD are capable of, which is a bold claim.

Regarding power efficiency, the new cores are designed to deliver more compute power per watt, reducing energy consumption and extending battery life. This is particularly important for mobile devices, where users demand long battery life without compromising performance. The improved power efficiency also translates to better thermal management, ensuring that devices remain cool and responsive even under heavy workloads.

In addition to performance and efficiency improvements, the new solutions also bring enhanced security and AI capabilities. The Armv9.2 architecture's memory tagging extensions (MTE) and confidential compute architecture (CCA) provide robust protection against various security threats, ensuring that data and applications remain secure.

The enhanced AI capabilities of the new cores and GPUs are also noteworthy. With AI's increasing importance in modern applications, the new solutions are designed to accelerate AI workloads, delivering faster and more efficient AI processing. This is achieved through dedicated AI accelerators and optimizations that leverage the full potential of the new architecture and process technology.

Process technology migration to 3 nm presents many opportunities and challenges for semiconductor manufacturing. For soft IP, bigger and more complex microarchitectures need stronger voltage regulation and mitigation to ensure stability and performance. The key objective is to optimize the right PPA (Power, performance, area) on the target node. For physical IP, process complexity brings its own challenges, including scaling limitations and the requirement to support a wider dynamic voltage and frequency scaling (DVFS) spectrum. Furthermore, with extreme power density, this should mitigate thermal issues, and ensuring things are running efficiently, which is very important in a mobile device

To address these challenges, Arm takes a holistic view of RTL and physical implementation co-development. This ensures that their compute IP can meet performance expectations while overcoming the challenges of advanced process technologies.

Advancements in Armv9.2, CSS, and 3 nm technology open up new possibilities for various applications, including developers accessing the new Arm Kleidi libraries. In the mobile space, these solutions enable more powerful and efficient smartphones and tablets to handle complex tasks such as AI-powered photography, gaming, and productivity.

The new solutions deliver desktop-class performance in portable form factors for the PC market, making them ideal for laptops and 2-in-1 devices. The improved performance and efficiency also benefit professional content creation, allowing for faster rendering, editing, and multitasking.

In the AI and machine learning space, the new solutions provide the compute power needed for advanced AI applications, from natural language processing and computer vision to autonomous systems and robotics. The enhanced AI capabilities ensure these applications run efficiently and effectively, delivering faster and more accurate results.

As Arm continues to push the boundaries of semiconductor technology, the focus on enhancing the Armv9.2 architecture, introducing the CSS platform, and transitioning to the 3 nm process technology marks a significant step forward. These advancements substantially improve performance, power efficiency, and security, enabling a new generation of devices that can easily handle the most demanding applications.

Combining these technologies provides a powerful and versatile computing solution that can scale across different device form factors and use cases. Whether it's high-end gaming, professional content creation, or everyday productivity tasks, Arm's latest solutions are designed to deliver the best possible computing experience.

Good Hardware Benefits From Good Software

Arm's hardware advancements are bolstered by a sophisticated software ecosystem designed to exploit the full potential of its processors. At the heart of this ecosystem are the new Kleidi libraries, which play a crucial role in optimizing artificial intelligence (AI) and computer-based applications. These libraries provide developers with tools tailored to maximize the performance and efficiency of Arm's latest cores.

KleidiAI is a key component that focuses on accelerating AI workloads. It includes a comprehensive set of computational kernels optimized for Arm's architecture, enabling efficient execution of various AI tasks such as machine learning, natural language processing, and data analytics. By offering highly optimized routines for common AI operations, KleidiAI allows developers to achieve significant performance gains while maintaining energy efficiency. This is increasingly important as AI applications become more prevalent in mobile devices, smart home systems, and industrial automation.

KleidiCV, on the other hand, targets computer vision workloads. This library offers optimized functions for tasks such as image processing, object detection, and scene recognition. Integrating KleidiCV with Arm's architecture ensures that applications can handle visual data quickly and efficiently, making it ideal for use in augmented reality, autonomous vehicles, and intelligent surveillance systems. By leveraging these optimized libraries, developers can build sophisticated applications that run smoothly on Arm-based hardware, fully utilizing the performance and power efficiency improvements provided by the 3 nm process technology.

In addition to the Kleidi libraries, Arm provides a robust set of development tools and platforms. The Arm Compute Subsystem (CSS) platform includes reference software stacks and performance optimization tools like Arm Performance Studio, which offers detailed insights into application performance and helps developers fine-tune their software for maximum efficiency. This comprehensive support system ensures that developers can quickly and effectively bring innovative applications to market, taking full advantage of Arm's latest architectural advancements.

Over the next few pages, we'll break down Arm's improvements within its 2024 CPU cluster, including the new Cortex X925 and Cortex A725 cores and refinements made with the smallest core, the Cortex A520.

Arm Cortex X925: Leading The Way in Single-Threaded IPC
Comments Locked

55 Comments

View All Comments

  • StormyParis - Wednesday, May 29, 2024 - link

    Do these processors have anything to prevent exploits such as RowHammer etc ... ? Those & variants have been a big story, then disappeared, but we were never told about an actual solution ?
  • GeoffreyA - Wednesday, May 29, 2024 - link

    Are these companies so lame with their AI desperation?
  • abufrejoval - Wednesday, May 29, 2024 - link

    Memory tagging extensions have been around since ARM 8.5. When you say ARM 9.2 MTE, does that mean they have been significantly upgraded e.g. in the direction of what CHERI does for RISC-V?

    I've been trying to find out if ARM has an "AVX-512" issue with their different big/middle/small designs, too. That is if these distinct cores might actually differ in the range of instruction set extensions they support. And I can't get a clear picture, either.

    So if say the big cores support some clever new vector formats for AI and the middle or small cores won't, how will apps and OS deal with the issue?
  • GeoffreyA - Wednesday, May 29, 2024 - link

    I don't know enough about ARM to comment, but should think that there are compatibility issues, with instructions, spanning different models and generations. Perhaps there's a feature-level type of method?
  • Findecanor - Wednesday, May 29, 2024 - link

    ARM MTE is much cruder than CHERI. It can be described as "memory colouring": Every allocation in memory is tagged with one of 16 colours. Two adjacent allocations can't have the same colour. When you use a pointer the colour bits in otherwise unused top bits of the pointer have to match the colour of the allocation it points into.

    With SVE both E and P cores need to have the same vector length, yes. The vector length is usually no larger than a cache line which have to be the same size anyway.

    I don't know specifically about SME but many extensions have to first be enabled by the OS on a core to be available to user-mode programs. If not all cores have an extension, the OS may choose to not enable it on any.
  • mode_13h - Thursday, May 30, 2024 - link

    > The vector length is usually no larger than a cache line which have to be the same size anyway.

    Cache lines are usually 64 bytes, which is 512 bits. Presumably, the A520 has only SVE2 @ 128 bits. So, don't let ARM off the hook *that* easily!
  • eastcoast_pete - Wednesday, May 29, 2024 - link

    That is very much something I am wondering, too. Reports/rumors have it that, for example, Qualcomm chose not to enable SVE in the big cores of their SD 8 Gen3. Qualcomm isn't exactly forthcoming with information about that, not that I would expect them to comment.
  • name99 - Wednesday, May 29, 2024 - link

    That's a silly response. It's like being present at the birth of Mac or Windows and saying "why are these stupid hardware companies trying so hard to make their chips run graphics fast?"

    The hope of LLMs is that they will provide a substantial augmentation to existing UI. So instead of having to understand a complicated set of Photoshop commands, you'll be able to say something like "Highlight the subject of the photo. Now move it about an inch left. Now remove that power line in the background".
    This is not a trivial task; it requires substantial replumbing on existing apps, along with a fair degree of rethinking app architecture. Well, no-one said it would be easy to convert Visicalc to Excel...

    But that is where things are headed. And because ARM (and Apple, and QC, and MS) are not controlled by idiots who think only in terms of tweets and snark, each of these companies is moving heaven and earth to ensure that they will not be irrelevant during this shift.

    (Oh, you thought the entire world consisted of LLMs answering questions did you? Strange how the QUESTION-ANSWERING COMPANY, ie Google, has created that impression...
    Try thinking independently for once. Everything in the world happens along a dozen dimensions at once. All it takes to be a genius is to be able to hold *more than one* dimension in your head simultaneously.)
  • FunBunny2 - Wednesday, May 29, 2024 - link

    Well, no-one said it would be easy to convert Visicalc to Excel..

    well... Mitch did it, in assembler at first, and called it Lotus 1-2-3
  • GeoffreyA - Thursday, May 30, 2024 - link

    You can go on with your ad hominen and air of superiority; it won't change that these companies are tripping over themselves, in insecurity and desperation, to grab dollars from the AI pot or not be left behind.

    You make assumptions about my ideas based on one sentence. In fact, AI is quite interesting to me. Not how it's going to help someone in Photoshop or Visual Studio, but where LLMs eventually lead; whether they end up being the language faculty in strong AI, or more; what's missing from today's LLMs (state, being trained in real-time, connecting to the sense modalities, using little power in a small space, etc.); and whether consciousness, the last great riddle, will ever be solved, how, and the moral implications. That's of interest to me.

    But when one see Microsoft and Intel making an "AI PC," or AMD calling their CPU "Ryzen AI," and so on, it is little about true AI and more about money, checklists, and the bandwagon. Independence of thought is about seeing past fashion and the in-thing. And no thank you: I have got no desire, nor the insecurity, to want to be a genius.

Log in

Don't have an account? Sign up now