The Challenges with Computing Today
The computing industry is approaching a formidable obstacle course where anyone wishing to drive advances in computing technology must carefully negotiate several key trade-offs. First, reducing power consumption is increasingly critical across all segments of computing. Consumers want improved battery life, size, and weight for their laptops, tablets, and smartphones. Likewise, data center power demands and cooling costs continue to rise.
At the same time, we demand constantly improving performance to enable compelling new user experiences. We want to access our devices through more natural interfaces (speech and gesture), and we want devices to manage ever-expanding volumes of data (home movies, pictures, and a world of content available in the cloud).
Delivering these new user experiences also depends on programmer productivity. It must be easy for software developers to tap into new capabilities by using familiar, powerful programming models.
Finally, it is increasingly important that software be supported across a broad spectrum of devices. Developers cannot sustain today’s trend of rewriting code for an ever-expanding number of different platforms.
To navigate this complex set of requirements, the computer industry needs a different approach – a more efficient approach to computer architecture. We need an approach that promises to deliver improvement across all four of these vectors: power, performance, programmability, and portability.
Introducing Heterogeneous System Architecture (HSA)
Since their earliest days, computers have contained central processing units (CPUs) designed to run general programming tasks very well. But over the last couple of decades, mainstream computer systems have typically included other processing elements as well. The most prevalent is the graphics processing unit (GPU), originally designed to perform specialized graphics computations in parallel. Over time, GPUs have become more powerful and more generalized, allowing them to be applied to general-purpose parallel computing tasks with excellent power efficiency.
Today, a growing number of mainstream applications require the high performance and power efficiency achievable only through such highly parallel computation. But current CPUs and GPUs have been designed as separate processing elements and do not work together efficiently – and are cumbersome to program. Each has a separate memory space, requiring an application to explicitly copy data from CPU to GPU and then back again.
A program running on the CPU queues work for the GPU using system calls through a device driver stack managed by a completely separate scheduler. This introduces significant dispatch latency, with overhead that makes the process worthwhile only when the application requires a very large amount of parallel computation. Further, a program running on the GPU cannot directly generate work items today, either for itself or for the CPU.
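The break-even reasoning above can be made concrete with a toy cost model. The sketch below is an illustration only: the `OffloadCosts` structure, its field names, and every number used with it are assumptions invented for this example, not measurements and not part of any HSA or driver interface.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical cost model (all fields are assumptions for illustration).
struct OffloadCosts {
    double dispatch_overhead_us;  // fixed cost of queuing work through the driver stack
    double copy_us_per_item;      // cost of copying one item to the GPU and back
    double cpu_us_per_item;       // CPU compute time per item
    double gpu_us_per_item;       // GPU compute time per item
};

// Time to process n items entirely on the CPU.
double cpu_time_us(const OffloadCosts& c, std::uint64_t n) {
    return c.cpu_us_per_item * static_cast<double>(n);
}

// Time to process n items on the GPU, including dispatch and copies.
double gpu_time_us(const OffloadCosts& c, std::uint64_t n) {
    return c.dispatch_overhead_us +
           (c.copy_us_per_item + c.gpu_us_per_item) * static_cast<double>(n);
}

// Smallest item count for which offloading is strictly faster than the CPU.
// Returns 0 when offloading never pays off under this model.
std::uint64_t break_even_items(const OffloadCosts& c) {
    assert(c.dispatch_overhead_us >= 0.0);
    const double saving_per_item =
        c.cpu_us_per_item - (c.copy_us_per_item + c.gpu_us_per_item);
    if (saving_per_item <= 0.0) {
        return 0;  // per-item GPU cost is not lower, so the overhead is never recovered
    }
    // Offload wins when n > overhead / saving_per_item.
    return static_cast<std::uint64_t>(
               std::floor(c.dispatch_overhead_us / saving_per_item)) + 1;
}
```

With an assumed dispatch overhead of 100 microseconds, copy and GPU costs of 0.25 microseconds per item each, and a CPU cost of 1 microsecond per item, `break_even_items` returns 201. Dropping the assumed overhead to 10 microseconds and the copy cost to zero lowers the threshold to 14 items, which shows in miniature why lower dispatch and copy costs make smaller parallel workloads worth offloading.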
To fully exploit the capabilities of parallel execution units, it is essential for computer system designers to think differently. The designers must re-architect computer systems to tightly integrate the disparate compute elements on a platform into an evolved central processor while providing a programming path that does not require fundamental changes for software developers. This is the primary goal of the new HSA design.
HSA creates an improved processor design that exposes the benefits and capabilities of mainstream programmable compute elements, working together seamlessly. With HSA, applications can create data structures in a single unified address space and can initiate work items on the hardware most appropriate for a given task. Sharing data between compute elements is as simple as sending a pointer. Multiple compute tasks can work on the same coherent memory regions, utilizing barriers and atomic memory operations as needed to maintain data synchronization (just as multi-core CPUs do today).
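The difference between separate memory spaces and a unified address space can be sketched in ordinary C++. This is an analogy, not HSA code: everything runs on the CPU, the `device` vector merely stands in for GPU memory, and the function names (`scale_kernel`, `run_with_copies`, `run_with_shared_pointer`) are hypothetical and do not correspond to any HSA runtime call.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for a kernel running on an accelerator:
// multiplies every element in place by `factor`.
void scale_kernel(float* data, std::size_t n, float factor) {
    for (std::size_t i = 0; i < n; ++i) {
        data[i] *= factor;
    }
}

// Separate-memory model: copy the data into a "device" buffer, run the
// kernel there, then copy the result back. Returns the bytes copied.
std::size_t run_with_copies(std::vector<float>& host, float factor) {
    std::vector<float> device(host);                      // host -> device copy
    scale_kernel(device.data(), device.size(), factor);
    host = device;                                        // device -> host copy
    return 2 * host.size() * sizeof(float);
}

// Unified-address-space model: hand the kernel a pointer to the data
// where it already lives. Returns the bytes copied, which is zero.
std::size_t run_with_shared_pointer(std::vector<float>& host, float factor) {
    scale_kernel(host.data(), host.size(), factor);
    return 0;
}
```

Both functions leave the same result in `host`; only the amount of data moved differs. In a real heterogeneous system the two sides would also need the barriers and atomic operations mentioned above to coordinate concurrent access, which this single-threaded sketch leaves out.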
The HSA team at AMD analyzed the performance of Haar Face Detect, a widely used multi-stage video analysis algorithm that identifies faces in a video stream. The team compared a CPU/GPU implementation in OpenCL™ against an HSA implementation. The HSA version shares data between CPU and GPU without memory copies or cache flushes, and it assigns each part of the workload to the most appropriate processor with minimal dispatch overhead. The net result was a 2.3x relative performance gain at a 2.4x reduced power level*. This level of performance is not possible using only a multicore CPU, only a GPU, or even a combined CPU and GPU with today’s driver model. Just as important, it is done using simple extensions to C++, not a totally different programming model.
* Test system configuration:
- 4 GB RAM; Windows 7 (64-bit); OpenCL™ 1.1
- APU: AMD A10 4600M with Radeon™ HD Graphics
- CPU: 4 cores @ 2.3 GHz (turbo 3.2 GHz)
- GPU: AMD Radeon HD 7660G, 6 compute units, 685MHz
Taking HSA to the Industry
To reach beyond mere niche adoption, it is essential to provide a deployment path beyond the realm of any single hardware vendor. The ultimate goal for software developers is “write once, run everywhere”, which requires a unified installed base across all platforms and devices. This is the HSA vision. Thus, the HSA Foundation (HSAF) was formed as an open industry standards body to unify the computing industry around a common approach. The founding members of the HSA Foundation were announced at the 2012 AFDS event: AMD, ARM, Imagination Technologies, MediaTek, and Texas Instruments.
The HSA Foundation aims to help system designers integrate different kinds of computing elements (such as CPUs and GPUs) in a way that eliminates the inefficiencies of sharing data and sending work items between them. The HSA design allows multiple hardware solutions to be exposed to software through a common standard low-level interface layer, called HSA Intermediate Language (HSAIL). HSAIL provides a single target for low-level software and tools. HSAIL is sufficiently flexible and yet low-level enough to allow each hardware vendor to efficiently map to its individual underlying hardware design. And HSAIL frees the programmer from the burden of tailoring a program to a specific hardware platform – the same code runs on target systems with different CPU/GPU configurations.
Transparent to Software
A key to the success of HSA is its ability to simplify the process of getting applications to run on the architecture. As seen in the past, it is not sufficient to ask application vendors to change their software to fit a new kind of hardware – that path leads to niche success at best. This is especially true for proprietary (non-standard) platforms. To reach the mainstream, it must be easy for everyone to participate. The HSA approach is simple: bring the hardware to the application programmer. HSA includes the hardware, interfaces, common intermediate language, and standard runtime components to do all the necessary work. HSA maintains memory coherency and manages work queues under the hood, without exposing the underlying system complexity to the application developer.
This means providing mainstream programming languages and libraries that target HSA, giving millions of developers (along with their existing code) a transparent path to benefit directly from the efficiencies of HSA. AMD is starting this process by delivering HSA-optimized programming tools for today’s most widely available heterogeneous languages: OpenCL™ and C++ AMP. Going forward, AMD and the other HSAF members will expand the set of developer tools to encompass many other languages and libraries across multiple software domains and segments.
Getting Involved with HSA
HSA is all about delivering new, improved user experiences through advances in computing architectures that deliver improvements across all four key vectors: improved power efficiency; improved performance; improved programmability; and broad portability across computing devices.
To achieve this vision, the HSA Foundation is open to contributions from like-minded professionals across the computing industry – IHVs, OEMs/ODMs, OSVs, language and tools vendors, library and middleware vendors, and application vendors – who want to help realize the next era in computer system architecture and innovation. For information on HSA, HSAF, foundation membership, and contact details, please visit the HSA Foundation website.