Sunday, August 1, 2010

Intel's G965 Express chipset

By: Geoff Gasior

AS AMD AND NVIDIA trade blows in a seemingly perpetual but always animated
battle for graphics dominance, it's easy to forget that the 800-pound gorilla sitting in the corner still commands the lion's share of the market.
This unlikely king of the jungle has risen to power not on the strength
of ultra-high-end GPUs strapped to elaborate cooling systems, nor on the
back of popular mid-range products that offer unparalleled value for money.
No, it's the ubiquity of Intel's integrated graphics chipsets that has allowed it to carve out the largest share of the desktop graphics market.

The latest addition to Intel's integrated graphics arsenal is the Graphics
Media Accelerator X3000, which can be found in the company's G965 Express chipset. This isn't your average integrated graphics core, though. Intel went
all out with the X3000, crafting a graphics core with a unified shader
architecture that sports eight Shader Model 3.0-compliant scalar execution
units and a blistering 667MHz clock speed. Combine that with a Clear Video processing engine and support for HDMI output with HDCP, and you have
quite an attractive graphics proposition for budget systems.

Can the X3000-equipped G965 Express hold its own against competing
chipsets from AMD and Nvidia? Has Intel produced its first truly
competitive integrated graphics core? Read on to find out.




A unified approach to integrated graphics
Intel's GMA X3000 graphics core sits at the heart of the G965 Express north bridge, and it's quite a departure from IGPs of old. Like the G80 graphics processor that powers Nvidia's high-end GeForce 8800 series, the X3000 has a unified shader architecture populated with eight scalar execution units that can perform both pixel and vertex operations. In such an architecture, dynamic load balancing can ensure the most efficient use of the chip's execution units based on the demands of a given scene, be it biased toward pixel shading calculations, vertex calculations, or a balance of the two.
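
To make the load-balancing argument concrete, here is a minimal sketch in Python. It is a toy model rather than a description of how the X3000 schedules work: the workload sizes, the 2/6 fixed split, and the function names are all our own assumptions, and scheduling overhead is ignored entirely.

```python
import math

def fixed_split_cycles(vertex_ops, pixel_ops, vertex_units, pixel_units):
    """Cycles for a design with dedicated units: the slower side sets the pace."""
    return max(math.ceil(vertex_ops / vertex_units),
               math.ceil(pixel_ops / pixel_units))

def unified_cycles(vertex_ops, pixel_ops, units):
    """Cycles for an idealized unified pool: any unit can take any operation."""
    return math.ceil((vertex_ops + pixel_ops) / units)

# Hypothetical scenes, expressed as counts of scalar operations.
scenes = {
    "pixel-heavy": (100, 1500),
    "balanced": (800, 800),
}

for name, (vertex_ops, pixel_ops) in scenes.items():
    fixed = fixed_split_cycles(vertex_ops, pixel_ops, vertex_units=2, pixel_units=6)
    unified = unified_cycles(vertex_ops, pixel_ops, units=8)
    print(f"{name}: fixed split {fixed} cycles, unified pool {unified} cycles")
```

In this idealized model the unified pool never needs more cycles than the fixed split, because no unit sits idle while the other kind of work is still pending.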

Intel says it designed the GMA X3000 to be compliant with DirectX 10's Shader Model 4.0. That said, its status as a DX10-compliant part is questionable. For now, the GMA X3000's internal architecture manifests itself as a DirectX 9-class part that's fully compliant with the Shader Model 3.0 spec. Vertex texture fetch, instancing, and flow control are all implemented in hardware. 32 bits of floating point precision are available throughout, and shader programs are supported up to 512 instructions in length.

Integrated graphics processors typically lack dedicated vertex processing hardware, instead preferring to offload those calculations onto the CPU. As a unified architecture, the GMA X3000 is capable of performing vertex processing operations in its shader units, but it doesn't do so with Intel's current video drivers. Intel has a driver in the works that implements hardware vertex processing (which we saw in action at GDC), but it's not yet ready for public consumption.

Intel says the question of DirectX 10 support for the GMA X3000 is a driver issue, as well. Intel could release a driver to enable DX10 support, but may never do so. Although this may sound like a brewing scandal at first blush, it's almost assuredly not.

Intel hasn't sold the G965 as a DX10-ready solution, and even if the IGP could replicate the behavior and produce the output required to meet the DX10 specification, it's probably not powerful enough to do so in real time.

Given that, we would be surprised to see Intel release DirectX 10 drivers for the GMA X3000 to the public. When addressing the DX10 question, Intel simply points out that this shader architecture is a good basis for future products with proper DX10 support.

Here's a quick look at how the GMA X3000 compares with the current DX9-class competition, with some caveats to follow:



The first caveat we should mention involves shader execution units, which we've not even included in the table above because simple comparisons between the GMA X3000 and the others are tricky. The eight shader execution units in the GMA X3000 may sound like a lot, but those execution units are scalar—they can only operate on one pixel component at a time. A typical pixel has four components (red, green, blue, and alpha), so the GMA X3000 can really only process two complete pixels per clock cycle. The GeForce 6150 has two traditional pixel shader processors, so it can handle just as many pixels per clock, and the Radeon X1250 IGP in the AMD 690G has four pixel shader processors, for twice the per-clock capacity.
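
The per-clock arithmetic behind that comparison is simple enough to sketch in a few lines of Python. The unit counts come from the paragraph above; treating each traditional pixel shader processor as one complete pixel per clock is the same simplification the text makes, and the function name is ours.

```python
# A typical pixel has four components: red, green, blue, and alpha.
COMPONENTS_PER_PIXEL = 4

def scalar_pixels_per_clock(scalar_units, components=COMPONENTS_PER_PIXEL):
    """Complete pixels per clock when each unit handles one component per clock."""
    return scalar_units / components

# Traditional pixel shader processors are counted as one full pixel per clock each.
pixels_per_clock = {
    "GMA X3000": scalar_pixels_per_clock(8),
    "GeForce 6150": 2.0,
    "Radeon X1250": 4.0,
}

for name, rate in pixels_per_clock.items():
    print(f"{name}: {rate:g} pixels per clock")
```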

These things get even more complex when you look under the covers, and a whole host of qualifications and mitigating circumstances becomes apparent. For instance, the GMA X3000's scalar architecture could allow it to allocate execution resources more efficiently than the two more traditional architectures, giving it a performance edge. On the flip side, the individual pixel shader processors in the Nvidia and AMD IGPs are relatively rich in both programmable and special-purpose execution resources, and they may deliver more FLOPS per clock than the GMA X3000, depending on the instruction mix. Also, according to an intriguing discussion here, Intel looks to be using the GMA X3000's execution units to handle triangle setup, a chore assigned to dedicated hardware in the other IGPs. Sharing can be good, but too much sharing can drift into pinko-commie excess. Sharing execution resources with both vertex shading and triangle setup could overtax the X3000's pixel shading capacity.
Then again, the chip does have more clock cycles to work with. Running at 667MHz, the GMA's graphics core is clocked a full 40% higher than the Radeon X1250 and close to 30% higher than the fastest GeForce 6100.

We expect, though, that not all of the GMA X3000 runs at 667MHz, as the strange numbers in the "pixels per clock" and "textures per clock" entries in the table above suggest. Intel says the G965 can compute two raster operations per clock maximum, but only for clears. For any other 3D raster op, it's limited to 1.6 pixels per clock. Similarly, it can process depth operations at 4 pixels per clock, but is limited to 3.2 pixels per clock for single, bilinear-filtered textures. What we may be seeing here is the result of different clock domains for the shader processors and the IGP's back end; the GeForce 8800 has a similar arrangement. Whatever the case, these numbers work out to theoretical fill rates of 1067 Mpixels/s and 2133 Mtexels/s. That puts the G965 ahead of the AMD 690G (1600 Mtexels/s) and the GeForce 6150 (950 Mtexels/s) in peak texturing capacity.
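
Those fill rates are just per-clock throughput multiplied by clock speed, as the short Python sketch below shows. One assumption on our part: we take the nominal 667MHz to be 2000/3 MHz (about 666.67MHz), since that is the value that reproduces the 2133 Mtexels/s figure after rounding. The function and variable names are ours.

```python
# Assumption: the quoted 667MHz is a rounded 2000/3 MHz.
CORE_CLOCK_MHZ = 2000 / 3

def fill_rate(per_clock, clock_mhz=CORE_CLOCK_MHZ):
    """Peak rate in millions of operations per second."""
    return per_clock * clock_mhz

pixel_fill = fill_rate(1.6)  # 3D raster ops other than clears
texel_fill = fill_rate(3.2)  # single, bilinear-filtered textures

# Competing peak texel rates quoted in the article, in Mtexels/s.
competitors = {"AMD 690G": 1600, "GeForce 6150": 950}

print(f"G965: {round(pixel_fill)} Mpixels/s, {round(texel_fill)} Mtexels/s")
for name, rate in competitors.items():
    print(f"G965 vs {name}: {texel_fill / rate:.2f}x the peak texel rate")
```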

The X3000 looks impressive in the output department, as well, packing support for DVI, HDMI, and VGA outputs alongside a TV encoder. Additional outputs are also supported via the chip's sDVO (Serial Digital Video Output) interface, although motherboard makers will ultimately decide which of the X3000's various output options will be made available to end users.

Complementing the X3000's generous assortment of video outputs is a Clear Video processing engine that offers advanced de-interlacing algorithms and a measure of color correction. Clear Video can also accelerate VC-1 high-definition video decoding, allowing it to shoulder some of the burden associated with WMV HD video playback. Hardware assist is supported for high-definition MPEG2 video playback, as well.

Dynamic Video Memory Technology (DVMT) rounds out the X3000's feature set, enabling the chip to dynamically allocate system memory as needed. DVMT works by dedicating a small portion (in this case 1MB or 8MB, configured through the BIOS) of system memory to the graphics core at all times. Users can then elect to cordon off an additional chunk of system memory to the graphics core or allow DVMT to allocate additional video memory as needed on its own.
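
The two DVMT modes described above can be modeled roughly as follows. This Python sketch is purely illustrative: the class, its field names, and the example sizes are hypothetical, and it ignores any upper limit that the driver or BIOS may place on dynamic allocation.

```python
from dataclasses import dataclass

@dataclass
class DvmtConfig:
    pre_allocated_mb: int    # always reserved; the article cites 1MB or 8MB
    fixed_extra_mb: int = 0  # hypothetical: extra chunk the user cordons off
    dynamic: bool = True     # hypothetical flag: let DVMT grow on demand

def graphics_memory_mb(cfg, demand_mb):
    """System memory held by the graphics core for a given demand (toy model)."""
    if cfg.pre_allocated_mb not in (1, 8):
        raise ValueError("the article describes only 1MB or 8MB pre-allocation")
    floor = cfg.pre_allocated_mb + cfg.fixed_extra_mb
    if cfg.dynamic:
        # Grow beyond the reserved floor only when the workload asks for more.
        return max(floor, demand_mb)
    return floor

dynamic_cfg = DvmtConfig(pre_allocated_mb=8)
fixed_cfg = DvmtConfig(pre_allocated_mb=8, fixed_extra_mb=120, dynamic=False)

for demand in (0, 64, 200):
    print(f"demand {demand}MB: dynamic {graphics_memory_mb(dynamic_cfg, demand)}MB, "
          f"fixed {graphics_memory_mb(fixed_cfg, demand)}MB")
```

The trade-off the sketch highlights is that a fixed reservation removes memory from the system even when graphics demand is low, while dynamic allocation only claims it when needed.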
