Sandy Bridge: the Next Generation of Intel Core i Processors
by Mark W. Hibben
Something for Both Mac and PC Users to Look Forward to in 2011
Since Intel introduced the Nehalem architecture in late 2008 with the first Core i7 and Xeon 5500 series processors, I've been underwhelmed by their subsequent offerings. Core i7 in the LGA 1366 form factor offered the highest-performance desktop processing; the subsequent, less expensive LGA 1156 variants may have offered a better price/performance point, but nothing revolutionary. For 2011, Intel will offer yet another evolution of Nehalem on a 32 nm process, in yet another form factor, LGA 1155 (and yes, that one missing contact will require a different socket and motherboard). Although technically evolutionary, the changes are extensive and the performance boost impressive. Imagine a new slender MacBook Air with the processing capability of a current-generation Core i7 desktop that still maintains its great battery life. No wonder rumors are flying about Apple adopting Sandy Bridge for its MacBooks.
The SOC Goes Mainstream
The systems-on-a-chip (SOCs) that power Android and Apple smartphones and tablets combine many of the functions that would be handled by a chip set in a desktop or even a notebook computer. Most importantly, the graphics processing unit (GPU), normally contained on a separate graphics card, is integrated onto the SOC silicon die. Intel is borrowing this SOC approach of integrated graphics for Sandy Bridge in order to produce a high-performance, low-power processor for mobile and even desktop applications. In addition, Intel has completed the consolidation of traditional North Bridge functions, such as the memory and PCIe interfaces, onto the processor die, a process that began with Nehalem, and has added a so-called System Agent that performs power, memory, and cache management for the SOC. In effect, the System Agent serves as the executive for the rest of the processor, powering other sections of the SOC up or down as needed. Even the display driver electronics, logically part of the GPU, are physically located in and directly managed by the System Agent. The rationale for this turns out to be power efficiency.
When the display image isn't changing, no graphics processing is required, so the System Agent can power the GPU down while taking on the mundane task of refreshing the screen itself.
On-Board GPU
Heretofore, Intel’s on-board GPU offerings have been utilitarian and uninspiring. Their graphics capability was adequate for network appliances, netbooks, and low-end notebooks, but hardly competitive with dedicated GPUs, even in mobile form. It remains to be seen how successful this latest version of built-in graphics will be for Intel, but the new approach has much to recommend it. Intel claims roughly 2x the throughput per execution unit (EU) in the GPU, taking pains to point out that “number of execution units isn’t everything”. I suppose this is by way of preparing us for the fact that the number of EUs in the GPU will not be as large as in a standalone GPU. Fair enough. But the GPU enjoys architectural advantages not available to a separate part: it sits behind the last-level cache (LLC), with which it communicates via the new Ring Based Interconnect (RBI), which provides each processor core, and the GPU, with a 96 GB/s connection.
In effect, the GPU operates as the equivalent of another processor core, though Intel acknowledges that the bandwidth required by the GPU is often equivalent to that of all the other cores put together. Graphics performance also benefits from the new AVX media instructions discussed in the processor enhancements section.
Processor Enhancements
The processor core architecture has been significantly revised to provide more performance while reducing, or at least not increasing, power consumption. Perhaps the most interesting change is the decoded uop cache, intended to allow the x86 decode pipeline to be powered down during repetitive operations. This speaks to a disadvantage that Intel Architecture processors have versus their RISC counterparts, such as the ARM processors at the core of iPad and smartphone SOCs: Intel acknowledges that the logic that decodes traditional x86 machine code into the micro-ops (uops) Sandy Bridge executes internally consumes a lot of power. Intel has developed a clever mitigation in the decoded uop cache. When a repetitive section of x86 code has been decoded once, the uop cache allows the cached uops to continue executing while the x86 decode pipeline is turned off. Does this approach completely eliminate the power penalty of maintaining x86 compatibility? We'll have to see.
Another significant enhancement to the processor is the implementation of the 256-bit Intel AVX floating-point vector instructions, which feed directly into graphics and media performance. To handle the new instructions, the floating-point data paths were widened to 32 bytes (256 bits) while maintaining roughly the same die area as the previous 128-bit FP units.
Also supporting the higher-performance processor core is a modification to the memory interface that doubled the bandwidth available for memory load operations.
As mentioned above, the System Agent performs active power management for the chip, dynamically allocating power between the core/LLC zone and the GPU zone. Intel's engineers observe that usually one zone or the other requires higher power, but rarely both at once.
In addition to dynamic power allocation between SOC zones, the System Agent employs more sophisticated C-state algorithms that exploit idle states to maximize power savings.
As icing on the power management cake, there is even a “Next Generation” Turbo Boost technology that allows the processor to exceed its thermal design power (TDP) limit for as long as 30 seconds.
This accelerated turbo boost is intended to enhance “responsiveness” in performing brief but computationally intensive tasks such as the initial launch of an application.
Sandy Bridge Implications
With Sandy Bridge, Intel is proclaiming to the world that it knows how to play in the mobile SOC space. There is much here for ARM processor vendors and OEMs such as Apple, Motorola, Samsung and HTC to be concerned about. For the better part of a year, it’s been apparent that a convergence (or perhaps a collision course) was underway in which the low-power RISC-based ARM SOCs became more computationally powerful (usually by adding processing cores) while the high-power CISC-based Intel processors became more energy efficient. Kind of like Ents and trees. Next year should see Intel enter the War for Mobile Internet Supremacy in earnest, with their Moorestown SOCs. Moorestown and Sandy Bridge will offer a unifying architecture that will be very appealing to vendors such as Apple, since it could allow Intel processors (all x86 based) to be promulgated across a broad product line of “mobile” devices including tablets, phones, and notebooks.