GeForce 9800 GT series. Determining the product series of NVIDIA video cards

| | G80 | G84 | G86 | G92 | G94 | G96 |
|---|---|---|---|---|---|---|
| Technology (nm) | 90 | 80 | 80 | 65/55 | 65/55 | 65/55 |
| Transistors (M) | 681 | 289 | 210 | 754 | 505 | 314 |
| Universal processors | 128 | 32 | 16 | 128 | 64 | 32 |
| Texture blocks | 32 | 16 | 8 | 64 | 32 | 16 |
| Blending blocks | 24 | 8 | 8 | 16 | 16 | 8 |
| Memory bus | 384-bit (64x6) | 128-bit (64x2) | 128-bit (64x2) | 256-bit (64x4) | 256-bit (64x4) | 128-bit (64x2) |
| System bus | PCI-Express 16x | PCI-Express 16x | PCI-Express 16x | PCI-Express 2.0 16x | PCI-Express 2.0 16x | PCI-Express 2.0 16x |
| Interfaces | TV-Out, TV-In (requires capture chip), 2 x DVI Dual Link, HDTV-Out | the same, plus HDMI | the same, plus HDMI | the same, plus HDMI and DisplayPort | the same, plus HDMI and DisplayPort | the same, plus HDMI and DisplayPort |

Common to all chips:

  • Memory types: DDR, GDDR2, GDDR3, GDDR4
  • RAMDAC: 2 x 400 MHz
  • Vertex and pixel shaders: version 4.0
  • Pixel and vertex calculation accuracy: FP32
  • Texture formats: FP32, FP16, I8, DXTC/S3TC, 3Dc
  • Rendering formats: FP32, FP16, I8, I10
  • MRT: supported
  • Antialiasing: TAA (transparency antialiasing), CSAA 2x-16x
  • Z generation: double rate in colorless (Z-only) mode
  • Stencil buffer: two-sided
  • Shadow technology: hardware shadow maps, geometry shadow optimization

Specifications of reference cards based on the G8x/G9x families

| Card | Chip | Bus | ALU/TMU blocks | Core clock (MHz) | Memory clock (MHz) | Memory (MB) | Bandwidth, GB/s (bus, bits) | Texel rate (Mtex/s) | Fill rate (Mpix/s) |
|---|---|---|---|---|---|---|---|---|---|
| GeForce 8500 GT | G86 | PEG16x | 16/8 | 450 | 400 (800) | 256 DDR2 | 12.8 (128) | 3600 | 3600 |
| GeForce 8600 GT | G84 | PEG16x | 32/16 | 540 | 700 (1400) | 256 GDDR3 | 22.4 (128) | 8600 | 4300 |
| GeForce 8600 GTS | G84 | PEG16x | 32/16 | 675 | 1000 (2000) | 256 GDDR3 | 32.0 (128) | 10800 | 5400 |
| GeForce 8800 GTS 320MB | G80 | PEG16x | 96/24 | 500 | 800 (1600) | 320 GDDR3 | 64.0 (320) | 12000 | 10000 |
| GeForce 8800 GTS 640MB | G80 | PEG16x | 96/24 | 500 | 800 (1600) | 640 GDDR3 | 64.0 (320) | 12000 | 10000 |
| GeForce 8800 GTX | G80 | PEG16x | 128/32 | 575 | 900 (1800) | 768 GDDR3 | 86.4 (384) | 18400 | 13800 |
| GeForce 8800 Ultra | G80 | PEG16x | 128/32 | 612 | 1080 (2160) | 768 GDDR3 | 104.0 (384) | 19600 | 14700 |
| GeForce 8800 GT 256MB | G92 | PEG16x | 112/56 | 600 | 700 (1400) | 256 GDDR3 | 44.8 (256) | 33600 | 9600 |
| GeForce 8800 GT 512MB | G92 | PEG16x | 112/56 | 600 | 900 (1800) | 512 GDDR3 | 57.6 (256) | 33600 | 9600 |
| GeForce 8800 GTS 512MB | G92 | PEG16x | 128/64 | 650 | 1000 (2000) | 512 GDDR3 | 64.0 (256) | 41600 | 10400 |
| GeForce 8800 GS | G92 | PEG16x | 96/48 | 550 | 800 (1600) | 384 GDDR3 | 38.4 (192) | 26400 | 6600 |
| GeForce 9400 GT | G96 | PEG16x | 16/8 | 550 | 800 (1600) | 256/512 GDDR2 | 25.6 (128) | 4400 | 4400 |
| GeForce 9500 GT | G96 | PEG16x | 32/16 | 550 | 800 (1600) | 256/512 GDDR2/GDDR3 | 25.6 (128) | 8800 | 4400 |
| GeForce 9600 GSO | G92 | PEG16x | 96/48 | 550 | 800 (1600) | 384 GDDR3 | 38.4 (192) | 26400 | 6600 |
| GeForce 9600 GT | G94 | PEG16x | 64/32 | 650 | 900 (1800) | 512 GDDR3 | 57.6 (256) | 20800 | 10400 |
| GeForce 9800 GT | G92 | PEG16x | 112/56 | 600 | 900 (1800) | 512 GDDR3 | 57.6 (256) | 33600 | 9600 |
| GeForce 9800 GTX | G92 | PEG16x | 128/64 | 675 | 1100 (2200) | 512 GDDR3 | 70.4 (256) | 43200 | 10800 |
| GeForce 9800 GTX+ | G92 | PEG16x | 128/64 | 738 | 1100 (2200) | 512/1024 GDDR3 | 70.4 (256) | 47200 | 11800 |
| GeForce 9800 GX2 | 2xG92 | PEG16x | 2x(128/64) | 600 | 1000 (2000) | 2x512 GDDR3 | 2x64.0 (2x256) | 76800 | 19200 |
| GeForce GTS 250 | G92 | PEG16x | 128/64 | 738 | 1100 (2200) | 512/1024 GDDR3 | 70.4 (256) | 47200 | 11800 |
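
The derived columns above (bandwidth, texel rate, fill rate) follow directly from the unit counts and clock speeds. As a sanity check, here is a minimal Python sketch (our illustration, not part of the original data) that reproduces the GeForce 8800 GTX row:

```python
# Recompute the derived columns of the table from unit counts and clocks.
# Values for the GeForce 8800 GTX are used as the example.

def texel_rate_mtex(tmus: int, core_mhz: int) -> int:
    """Peak texture sampling rate: one bilinear texel per TMU per clock."""
    return tmus * core_mhz

def fill_rate_mpix(rops: int, core_mhz: int) -> int:
    """Peak fill rate: one pixel per blending (ROP) unit per clock."""
    return rops * core_mhz

def bandwidth_gb_s(bus_bits: int, effective_mhz: int) -> float:
    """Memory bandwidth: bus width in bits times effective rate, in GB/s."""
    return bus_bits * effective_mhz * 1e6 / 8 / 1e9

print(texel_rate_mtex(32, 575))   # 18400 Mtex/s
print(fill_rate_mpix(24, 575))    # 13800 Mpix/s
print(bandwidth_gb_s(384, 1800))  # 86.4 GB/s
```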

Details: G80, GeForce 8800 family

G80 Specifications

  • The official name of the chip is GeForce 8800
  • Code name G80
  • 90 nm technology
  • 681 million transistors
  • Unified architecture with an array of shared processors for stream processing of vertices and pixels, as well as other possible types of data
  • Hardware support for the latest DirectX 10 innovations, including the new shader model - Shader Model 4.0, geometry generation and recording intermediate data from shaders (stream output)
  • 384-bit memory bus, 6 independent controllers 64-bit wide, GDDR4 support
  • Core frequency 575 MHz (GeForce 8800 GTX)
  • 128 scalar floating-point ALUs (integer and floating formats, IEEE 754 32-bit precision FP support, MAD+MUL without clock loss)
  • ALUs operate at more than double the frequency (1.35 GHz for 8800 GTX)
  • 32 texture units, support for FP16 and FP32 components in textures
  • 64 bilinear filtering units (allowing free, true trilinear filtering, as well as halving the speed drop under anisotropic filtering)
  • Scheduling block (granularity): 8x4 (32) pixels
  • 6 wide ROP blocks (24 pixels) with support for antialiasing modes up to 16 samples per pixel, including with FP16 or FP32 frame buffer format (i.e. HDR+AA possible). Each block consists of an array of flexibly configurable ALUs and is responsible for generating and comparing Z, MSAA, and blending. Peak performance of the entire subsystem is up to 96 MSAA samples (+ 96 Z) per clock cycle, in the mode without color (Z only) - 192 samples per cycle.
  • All interfaces are provided on an external additional NVIO chip (2 RAMDAC, 2 Dual DVI, HDMI, HDTV)
  • Very good architectural scalability: memory/ROP controllers (6 in total) and shader units (8 TMU+ALU units in total) can be disabled or removed one at a time

GeForce 8800 GTX reference card specifications

  • Core frequency 575 MHz
  • Universal processor frequency 1350 MHz
  • Number of texture blocks - 32, blending blocks - 24
  • Memory capacity 768 megabytes
  • Memory bandwidth 86.4 gigabytes per second.
  • Theoretical maximum fill rate is 13.8 gigapixels per second.
  • Theoretical texture sampling speed is 18.4 gigatexels per second.
  • SLI connector
  • PCI-Express 16x bus
  • RRP $599

GeForce 8800 GTS reference card specifications

  • Core frequency 500 MHz
  • Universal processor frequency 1200 MHz
  • Number of universal processors 96
  • Number of texture blocks - 24, blending blocks - 20
  • Memory type GDDR3, 1.1 ns (standard frequency 2*900 MHz)
  • Memory capacity 640 megabytes
  • Theoretical maximum fill rate is 10.0 gigapixels per second.
  • Theoretical texture sampling speed is 12.0 gigatexels per second.
  • Two DVI-I connectors (Dual Link, supports output resolutions up to 2560x1600)
  • SLI connector
  • PCI-Express 16x bus
  • TV-Out, HDTV-Out, HDCP support
  • RRP $449

Architecture

We have been waiting a long time for the transition to unified graphics architectures, and now we can state a fact: with the advent of the GeForce 8800 this transition has happened, and the critical point has been passed. What follows is a gradual descent of similar architectures into the mid-range and budget segments and their further development, up to a merger with multi-core processor architectures in the long term. So, let's get acquainted with the first unified architecture from NVIDIA:

We have the entire diagram of the chip in front of us. The chip consists of 8 universal computing units (shader processors), and although NVIDIA talks about 128 processors, counting each ALU as one, this is somewhat misleading: the unit of command execution is the processor block, in which 4 TMUs and 16 ALUs are grouped. In total we therefore have 128 ALUs and 32 TMUs, but the execution granularity is 8 blocks, each of which at any given moment can do its own thing - for example, execute part of a vertex, pixel or geometry shader over a block of 32 pixels (or a block of the corresponding number of vertices and other primitives). All branches, transitions, conditions, etc. apply to a block as a whole, so it is most logical to call it a shader processor, albeit a very wide one.

Each such processor is equipped with its own first-level cache, which now stores not only textures but also any other data the shader processor may request. It is important to understand that the main stream of data being processed - the pixels or vertices moving in a circle under the control of the dispatcher (the block marked Thread Processor in the diagram) - is not cached but flows through, which is the main beauty of today's graphics architectures: the absence of completely random access at the level of processed primitives.

In addition to the control unit and the 8 compute shader processors, there are 6 ROP units that perform visibility detection, frame buffer writes and MSAA (shown in blue, next to the L2 cache blocks), grouped with the memory controllers, write queues and the second-level cache.

Thus, we get a very wide architecture (8 blocks processing portions of 32 pixels each) capable of smoothly scaling in both directions. Adding or removing memory controllers and shader processors scales the throughput of the entire system accordingly, without unbalancing it or creating bottlenecks. This is a logical and beautiful solution that implements the main advantage of the unified architecture: automatic balancing and high efficiency in the use of available resources.

In addition to shader blocks and ROPs, there is a set of control and administrative blocks:

  • Blocks that launch data of certain formats for execution (Vertex, Geometry and Pixel Thread Issue) - a kind of gatekeepers that prepare data for the number crunchers in the shader processors in accordance with the data format, the current shader and its state, branching conditions, etc.
  • Setup/Raster/ZCull - the block that turns vertices into pixels; triangle setup, rasterization into blocks of 32 pixels and preliminary per-block HSR are performed here.
  • Input Assembler - the block that fetches geometric and other initial data from system or local memory, assembling the initial data structures from the streams that enter our "carousel" from outside. At the end, after many circles under the control of the vertex, geometry and pixel shaders and the blending settings, we get ready-made (and antialiased, if necessary) pixels from the ROP blocks.

By the way, a small digression: it is clear that in the future these blocks will become more general in nature and will no longer be tied to specific types of shaders. That is, they will simply turn into universal blocks that launch data for calculation and convert formats - for example, from one shader to another, from vertex to pixel, etc. This will not introduce any fundamental changes to the architecture; the diagram will look and work almost the same, except with fewer special "gray" blocks. Even now, all three Thread Issue blocks are most likely physically one block with common functionality and contextual additions:

Shader processor and its TMU/ALU

So, each of the 8 shader units contains 16 scalar ALUs, which potentially allows their load efficiency to reach 100% regardless of the shader code. The ALUs operate at double the frequency and thus match or exceed (depending on the shader operations) 8 old-style four-way vector ALUs (G70) at the same base core frequency. NVIDIA provides the following peak performance calculation:

However, it is only valid in the case most favorable to the new design, when a MAD and a MUL are issued together. In real life this advantage should be divided by roughly one and a half. But in any case, these scalar ALUs, thanks to their higher clock speed and their number, will outperform all previously existing chips, with the possible exception of an SLI configuration of the G71 running shaders that do not favor the new architecture.
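
For clarity, here is the same arithmetic as a small Python sketch (our restatement of the calculation described above, counting a MAD as 2 FLOPs plus 1 for the co-issued MUL):

```python
# Peak shader throughput of the GeForce 8800 GTX as NVIDIA counts it,
# and the rougher "divide by one and a half" estimate from the text.
alus = 128
shader_clock_mhz = 1350
flops_per_clock = 3  # MAD (2 FLOPs) + co-issued MUL (1 FLOP)

peak_gflops = alus * shader_clock_mhz * flops_per_clock / 1000
print(peak_gflops)        # 518.4 GFLOPS, the marketing peak
print(peak_gflops / 1.5)  # ~345.6 GFLOPS, a more realistic estimate
```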

Interestingly, the precision of all ALUs is FP32, and given the new architecture we do not foresee any benefit from reduced-precision FP16 shaders. Another interesting point is support for calculations in integer format, which is required to implement SM4. The arithmetic implementation complies with the IEEE 754 standard, which makes it suitable for serious non-game calculations: scientific, statistical, economic, etc.

Now about the interaction of texture units and ALUs within one shader unit:

The operation of sampling and filtering textures does not require ALU resources and can now be performed completely in parallel with mathematical calculations. Generation of texture coordinates (A in the diagram) still takes up some ALU time. This is logical if we want to use the chip's transistors to 100%: generating texture coordinates requires standard floating-point operations, and it would be imprudent to dedicate separate ALUs to it.

The texture modules themselves have the following configuration:

There are 4 texture addressing (TA) modules, which determine the exact sampling address by coordinates, and twice as many bilinear filtering (TF) modules. Why so? This allows, at moderate transistor cost, either free, true trilinear filtering or halving the speed drop under anisotropic filtering. Speed at regular resolutions, with regular filtering and without AA, has long been meaningless - the previous generation of accelerators copes well in such conditions. The new chip supports both FP16 and FP32 texture formats, as well as sRGB gamma correction at the input (TMU) and output (ROP).

Here are the specifications of the shader model of the new processors that meet the SM4 requirements:

There are significant quantitative and qualitative changes: fewer and fewer restrictions on shaders, more and more in common with the CPU. So far there is no real random access (such an operation appeared in SM4 - the Load Op item in the diagram - but its effectiveness for general purposes is still doubtful, especially in the first implementations), but there is no doubt that this aspect will soon be developed, just as support for FP formats has developed over these 5 years: from the first samples in the NV30 to a total, end-to-end FP32 pipeline in all modes now in the G80.

As we remember, in addition to 8 shader units, there are 6 ROP units:

The diagram shows two separate paths for Z and C, but in reality it is a single set of ALUs that splits into two groups when processing color pixels, or acts as one group in Z-only mode, thus doubling throughput. Nowadays there is no point in counting individual pixels - there are already enough of them; it is more important to count how many MSAA samples can be processed per clock cycle. Accordingly, with MSAA 16x the chip can produce 6 full pixels per clock cycle, with 8x - 12, etc. Interestingly, scalability of work with the frame buffer is excellent: each ROP unit works with its own memory controller and does not interfere with its neighbors.
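
This arithmetic is easy to sketch (our illustration): with a fixed budget of MSAA samples per clock, the number of complete pixels output per clock falls as the AA level grows.

```python
# G80 ROP subsystem: 96 color+Z MSAA samples per clock (192 in Z-only mode).
G80_SAMPLES_PER_CLOCK = 96

for msaa in (4, 8, 16):
    print(f"MSAA {msaa}x: {G80_SAMPLES_PER_CLOCK // msaa} pixels/clock")
# MSAA 4x: 24, MSAA 8x: 12, MSAA 16x: 6 - matching the figures above
```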

And finally, there is full support for FP32 and FP16 frame buffer formats along with antialiasing, now there are no restrictions on the imagination of developers, and HDR throughout the entire pipeline does not require changing the overall sequence of frame construction, even in AA mode.

CSAA

A new anti-aliasing method has also appeared - CSAA. A detailed study of it will appear on the site soon; for now we note that this method is in many ways similar to ATI's approach and likewise deals with pseudo-stochastic patterns and the spreading of samples across adjacent geometric zones (the pixel is smeared: from the AA point of view, pixels do not have a sharp boundary but seem to blend into one another, covering a certain area). Moreover, the colors of the samples and their depth are stored separately from information about their location, so there can be 16 samples per pixel but, for example, only 8 calculated depth values, which further saves bandwidth and clock cycles.

It is known that classic MSAA in modes above 4x becomes very memory-hungry, while quality grows less and less. The new method corrects this, allowing 16x anti-aliasing that is noticeably better than 16x MSAA at a computational cost comparable to 4x MSAA.
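
A rough storage comparison illustrates the savings (our own estimate; the per-sample byte sizes and the bit cost of coverage information are assumptions, not published figures):

```python
# Approximate framebuffer cost per pixel: full color+Z samples are expensive,
# coverage-only samples are nearly free.
def bytes_per_pixel(color_z_samples: int, coverage_only_samples: int = 0) -> float:
    color_z = color_z_samples * (4 + 4)     # assume 4B RGBA8 + 4B depth/stencil
    coverage = coverage_only_samples * 0.5  # assume ~4 bits of coverage each
    return color_z + coverage

print(bytes_per_pixel(16))     # 128.0 - classic 16x MSAA
print(bytes_per_pixel(8, 16))  # 72.0  - CSAA: 16 coverage samples, 8 color/Z
print(bytes_per_pixel(4))      # 32.0  - 4x MSAA, for comparison
```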

NVIO

Another innovation in the G80 is interfaces placed outside the main accelerator chip. A separate chip called NVIO is now responsible for them:

This chip integrates:

  • 2 * 400 MHz RAMDAC
  • 2 * Dual Link DVI (or LVDS)
  • HDTV-Out

The output subsystem looks like this:

The accuracy is always 10 bits per component. Of course, in the mid-range segment, and especially in budget solutions, a separate external chip may well not survive, but for expensive cards such a solution has more advantages than disadvantages. Interfaces occupy a significant area of the chip, are highly sensitive to interference, and require special power supply. By moving these problems to an external chip, you gain in output signal quality and configuration flexibility, and you avoid complicating the design of an already complex chip to accommodate optimal operating modes for on-chip RAMDACs.

Details: G84/G86, GeForce 8600 and 8500 families

G84 Specifications

  • The official name of the chip is GeForce 8600
  • Code name G84
  • 80 nm technology
  • 289 million transistors
  • Core clock up to 675 MHz (GeForce 8600 GTS)
  • ALUs operate at more than double the frequency (1.45 GHz for GeForce 8600 GTS)
  • 16 texture units, support for FP16 and FP32 components in textures
  • 16 bilinear filtering blocks (unlike the G80, there is no free trilinear filtering and no reduced-cost anisotropic filtering)
  • Possibility of dynamic branches in pixel and vertex shaders
  • Record results from up to 8 frame buffers simultaneously (MRT)

GeForce 8600 GTS reference card specifications

  • Core frequency 675 MHz
  • Universal processor frequency 1450 MHz
  • Memory type GDDR3
  • Memory capacity 256 megabytes
  • Memory bandwidth 32.0 gigabytes per second.
  • Theoretical maximum fill rate is 5.4 gigapixels per second.
  • Theoretical texture sampling speed is 10.8 gigatexels per second.
  • Power consumption up to 71 W
  • SLI connector
  • PCI-Express 16x bus
  • TV-Out, HDTV-Out, HDCP support
  • Recommended price $199-229

GeForce 8600 GT reference card specifications

  • Core frequency 540 MHz
  • Universal processor frequency 1180 MHz
  • Number of universal processors 32
  • Number of texture blocks 16 (see synthetics), blending blocks 8
  • Memory type GDDR3
  • Memory capacity 256 megabytes
  • Memory bandwidth 22.4 gigabytes per second.
  • Theoretical maximum fill rate is 4.3 gigapixels per second.
  • Theoretical texture sampling speed is 8.6 gigatexels per second.
  • Power consumption up to 43 W
  • SLI connector
  • PCI-Express 16x bus
  • Recommended price $149-159

G86 Specifications

  • The official name of the chip is GeForce 8500
  • Code name G86
  • 80 nm technology
  • 210 million transistors
  • Unified architecture with an array of shared processors for stream processing of vertices and pixels, as well as other types of data
  • Hardware support for DirectX 10, including the new shader model Shader Model 4.0, geometry generation and recording intermediate data from shaders (stream output)
  • 128-bit memory bus, two independent 64-bit wide controllers
  • Core clock up to 450 MHz (GeForce 8500 GT)
  • ALUs operate at double frequency (900 MHz for GeForce 8500 GT)
  • 16 scalar floating-point ALUs (integer and floating formats, IEEE 754 32-bit precision FP support, MAD+MUL without clock loss)
  • 8 texture units, support for FP16 and FP32 components in textures
  • 8 bilinear filtering blocks (unlike the G80, there is no free trilinear filtering and no reduced-cost anisotropic filtering)
  • Possibility of dynamic branches in pixel and vertex shaders
  • 2 wide ROP blocks (8 pixels) with support for antialiasing modes up to 16 samples per pixel, including with FP16 or FP32 frame buffer format. Each block consists of an array of flexibly configurable ALUs and is responsible for generating and comparing Z, MSAA, and blending. Peak performance of the entire subsystem up to 32 MSAA samples (+ 32 Z) per clock, in Z only mode 64 samples per clock
  • Record results from up to 8 frame buffers simultaneously (MRT)
  • All interfaces (two RAMDAC, two Dual DVI, HDMI, HDTV) are integrated on the chip (unlike the GeForce 8800, where they are placed on the external additional NVIO chip)

GeForce 8500 GT reference card specifications

  • Core frequency 450 MHz
  • Universal processor frequency 900 MHz
  • Effective memory frequency 800 MHz (2*400 MHz)
  • Memory type DDR2
  • Memory capacity 256/512 megabytes
  • Memory bandwidth 12.8 gigabytes per second.
  • Theoretical maximum fill rate is 3.6 gigapixels per second.
  • Theoretical texture sampling speed is 3.6 gigatexels per second.
  • Power consumption up to 40 W
  • Two DVI-I Dual Link connectors, supports output resolutions up to 2560x1600
  • SLI connector
  • PCI-Express 16x bus
  • TV-Out, HDTV-Out, optional HDCP support
  • Recommended price $89-129

G84 and G86 architecture

Already from the specifications it is clear that the G84 is something between one quarter and one third of the G80, the flagship of the line. In the number of universal processors it is a quarter; in the number of ROP units and memory controllers it is a third. It is more complicated with texture units: it seems to be neither a quarter nor a half; we will talk about this below. The G86, in turn, is something interesting: in computing power it is only 1/8 of the G80, while in ROPs it is still the same 1/3. Obviously, NVIDIA is in no hurry to release computationally fast low-end chips.

The main question here is whether this quarter and this 1/8 will be enough to compete with current solutions and future AMD chips. Did NVIDIA cut the number of blocks too much? Moreover, it cannot be said that either chip is small in transistor count: the G84 has almost half the G80's transistors, and the G86 almost a third. The solution looks like a compromise: had they kept half of the G80's blocks, the chip would have been too expensive to produce and would have competed with NVIDIA's own GeForce 8800 GTS.

In the near future it will most likely become possible to make more productive chips for the middle and lower price ranges on 65 nm technology, but for now this is what we have. We will look at the performance of the new chips in synthetic and game tests, but we can already say that the G84 and G86 may not be too fast due to the small number of ALUs; most likely they will be roughly on par with current solutions at similar prices.

We will not dwell on the architecture of the G84 and G86 in too much detail; there are few changes compared to the G80, and everything said in the GeForce 8800 review, adjusted for the quantitative characteristics, remains valid. Still, we will describe the main points worth attention and present several slides dedicated to the architectural specifics of the new chips.

The G80 consists of eight universal computing units (shader processors); NVIDIA prefers to talk about 128 processors. The command execution unit, apparently, is an entire processor unit in which 4 TMUs and 16 ALUs are grouped. Each of the blocks at one moment can execute part of a vertex, pixel or geometry shader over a block of 32 pixels, vertices or other primitives, and can also perform physical calculations. Each processor has its own level 1 cache, which stores textures and other data. In addition to the control unit and compute shader processors, there are six ROP units that perform visibility detection, writing to the frame buffer and MSAA, grouped with memory controllers, write queues and a second-level cache.

This architecture is capable of scaling in both directions, which is exactly what was done in the new solutions. We already mentioned this beautiful design, which implements the main advantage of the unified architecture - automatic balancing and high efficiency in the use of available resources - in the GeForce 8800 article. It was also assumed that a mid-range solution would consist of half the computing units, and that a budget solution would be based on two shader processors and one ROP. Unfortunately, while the GeForce 8800 had eight processors, making up 32 TMUs and 128 ALUs, the new chips have had their numbers cut more than we originally expected. Apparently, the G84 diagram looks like this:

That is, everything remains unchanged except the number of blocks and memory controllers. There are some minor changes to the texture blocks, noticeable in this image, but we will talk about that later. It is curious where so many transistors went if only 32 processors were left in the G84: it has almost half the transistors of the G80, with a significantly reduced number of memory channels, ROPs and shader processors. And the G86 has a lot of transistors with only 16 processors...

It is also interesting how well the load will be balanced in real applications between the execution of vertex, pixel and geometry shaders, since the number of universal execution units is now significantly smaller. Moreover, the unified architecture itself poses new challenges for developers, who will have to think about how to use the shared power effectively between vertex, pixel and geometry shaders. Let's give a simple example, focusing on pixel calculations: an increase in the load on vertex processing in a traditional architecture would not lead to a drop in pixel performance, but in a unified architecture it changes the balance and reduces the resources available for pixel calculations. We will definitely look into performance; for now let us continue studying the architectural changes in the G84 and G86.

Shader processor and TMU/ALU

The scheme of the shader units and an assessment of their peak computing performance were given in the G80 article; for the G84 and G86 the scheme has not changed, and their performance is easy to recalculate. The ALUs in these chips also operate at double the frequency and are scalar, which allows for high efficiency. There are no functional differences: the accuracy of all ALUs is FP32, there is support for calculations in integer format, and the implementation complies with the IEEE 754 standard, which is important for scientific, statistical, economic and other calculations.

But the texture modules have changed compared to the G80: NVIDIA assures that architectural changes were made in the new chips to increase the performance of the unified processors. In the G80, each texture unit could calculate four texture addresses and perform eight texture filtering operations per clock cycle. In the new chips the first number is claimed to have doubled, so a unit is capable of twice the number of texture samples. That is, the G84 and G86 texture modules have the following configuration (for comparison, the diagram of the G80 unit is shown on the left):

According to NVIDIA, each unit now has eight texture addressing (TA) modules, which determine the exact sampling address by coordinates, and exactly as many bilinear filtering (TF) modules. The G80 had four TA and eight TF modules, which made it possible to provide "free" trilinear filtering at reduced transistor cost, or to halve the speed drop under anisotropic filtering - useful specifically for top-level accelerators, where anisotropic filtering is almost always enabled by users. We will check the correctness of this information in the practical part; be sure to look at the analysis of the corresponding synthetic tests, as they contradict this data.

All other functionality of the texture units is the same; the FP16/FP32 and other texture formats are supported. However, while on the G80 FP16 texture filtering also ran at full speed thanks to the doubled number of filtering units, this is no longer the case in the mid- and low-end solutions (again, provided the changes described above actually exist).

ROP blocks, framebuffer writes, anti-aliasing

The ROP blocks, of which the G80 had six and the new chips have two, are unchanged:

Each block processes four pixels (16 subpixels), for a total of 8 pixels per clock for color and Z. In Z-only mode, twice as many samples are processed per clock. At MSAA 16x the chip can produce two pixels per clock cycle, at 4x - eight, etc. As in the G80, FP32 and FP16 frame buffer formats are fully supported together with antialiasing.

The new anti-aliasing method known from the GeForce 8800 is Coverage Sampled Antialiasing (CSAA), which was described in detail in the corresponding material:

Briefly, the essence of the method is that sample colors and depth are stored separately from information about their location: one pixel can have 16 coverage samples but only 8 calculated color and depth values, which saves bandwidth and clock cycles. CSAA gets by with transmitting and storing a single color or Z value per group of subpixels, refining the average value of the screen pixel with more detailed information about how that pixel overlaps the edges of triangles. As a result, the new method produces a 16x anti-aliasing mode noticeably better in quality than MSAA 4x, at a computational cost comparable to it. And in the rare cases where the CSAA method fails, the result is ordinary MSAA of a lower degree rather than no anti-aliasing at all.

PureVideo HD

Let's move on to the most interesting changes. It turns out that the G84 and G86 have innovations that set them apart even from the G80! This concerns the built-in video processor, which in the new chips received expanded PureVideo HD support. It is stated that these chips completely offload the central processor when decoding all common types of video data, including the "heaviest" H.264 format.

The G84 and G86 use a new model of programmable PureVideo HD video processor, more powerful than the one in the G80 and including the so-called BSP engine. The new processor supports decoding of the H.264, VC-1 and MPEG-2 formats at resolutions up to 1920x1080 and bitrates up to 30-40 Mbps; it does all the work of decoding CABAC and CAVLC data in hardware, which allows playing all existing HD DVD and Blu-ray discs even on medium-power single-core PCs.

The video processor in the G84/G86 consists of several parts:

  • the second-generation Video Processor (VP2), which performs IDCT, motion compensation and deblocking for the MPEG-2, VC-1 and H.264 formats, with hardware decoding of a second stream supported;
  • the bitstream processor (BSP), which performs CABAC and CAVLC statistical decoding for the H.264 format - some of the most time-consuming calculations;
  • the AES128 engine for protected-content decoding, whose purpose is clear from its name: it decrypts the video data used for copy protection on Blu-ray and HD DVD discs.

This is what the differences in the degree of hardware video-decoding support look like on different video chips:

The tasks performed by the video chip are highlighted in blue, those performed by the central processor in green. As you can see, where the previous generation helped the processor only with some tasks, the new video processor in the latest chips does everything itself. We will check the effectiveness of these solutions in future materials on hardware video decoding; NVIDIA's own figures are as follows: with a modern dual-core processor and software decoding, playing Blu-ray and HD DVD discs consumes up to 90-100% of CPU time; with hardware decoding on a previous-generation video chip in the same system, up to 60-70%; and with the new engine developed for the G84 and G86, only 20%. This, of course, does not quite look like the claimed full hardware decoding, but it is still very, very effective.

At the time of the announcement, the new PureVideo HD features only worked in the 32-bit version of Windows Vista; PureVideo HD support in Windows XP was promised only for the summer. As for the quality of video playback, post-processing, deinterlacing, etc., NVIDIA improved it back in the GeForce 8800, and the new chips are no different in this regard.

CUDA, non-game and physics computing

The GeForce 8800 article mentioned that the increased peak floating-point performance of the new accelerators and the flexibility of the unified shader architecture have become sufficient for calculating physics in games and even for more serious tasks: mathematical and physical modeling, economic and statistical models and calculations, image recognition and processing, scientific graphics and much more. For this purpose a special computing-oriented API was released, convenient for adapting and developing programs that offload calculations to the GPU: CUDA (Compute Unified Device Architecture).
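
To make the programming model concrete, here is a plain-Python sketch (our illustration, not actual CUDA code; all names are ours) of the data-parallel style CUDA exposes: one scalar kernel applied independently to every element, which is exactly the kind of work the scalar ALUs described above spread out in parallel.

```python
# SAXPY as an element-wise "kernel": out[i] = a * x[i] + y[i].
# On a GPU, each iteration would run as a separate thread; here we just loop.
def saxpy_kernel(i, a, x, y, out):
    out[i] = a * x[i] + y[i]  # a single MAD per element - ideal for these ALUs

n = 8
x = list(range(n))
y = [1.0] * n
out = [0.0] * n
for i in range(n):  # a GPU executes these iterations in parallel
    saxpy_kernel(i, 2.0, x, y, out)
print(out)  # [1.0, 3.0, 5.0, 7.0, 9.0, 11.0, 13.0, 15.0]
```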

More information about CUDA can be found in the G80 article; here we will focus on another recent trend - support for physics calculations on the GPU, a technology NVIDIA calls Quantum Effects. It is declared that all new-generation video chips, including the G84 and G86 considered today, are well suited for calculations of this kind, allowing part of the load to be moved from the CPU to the GPU. Specific examples include simulations of smoke, fire, explosions, hair and clothing dynamics, fur and liquids, and much more. One thing is worth noting, though: so far we have only been shown pictures from test applications with large numbers of physical objects calculated by video chips, and there is not even a hint of games with such support yet.

Support for external interfaces

As we remember, the GeForce 8800 somewhat surprised us with another unexpected innovation: a chip supporting external interfaces placed outside the main one. In top-end video cards these tasks are handled by a separate chip called NVIO, which integrates two 400 MHz RAMDACs, two Dual Link DVI (or LVDS) and HDTV-Out. Even then we assumed that a separate external chip was unlikely to survive in the middle and lower segments, and that is what happened: in the G84 and G86, support for all these interfaces is built into the chip itself.

The GeForce 8600 GTS has two Dual Link DVI-I outputs with HDCP support; it is the first video card on the market with such capabilities (HDCP and Dual Link together). As for HDMI, support for this connector is fully implemented in hardware and can be provided by manufacturers on specially designed cards. For the GeForce 8600 GT and 8500 GT, HDCP and HDMI support is optional, but it may well be implemented by individual manufacturers in their products.

Details: G92, GeForce 8800 family

G92 Specifications

  • Chip codename G92
  • 65 nm technology
  • 754 million transistors (more than G80)
  • Unified architecture with an array of shared processors for stream processing of vertices and pixels, as well as other types of data
  • Core frequency 600 MHz (GeForce 8800 GT)
  • ALUs operate at more than double the frequency (1.5 GHz for GeForce 8800 GT)
  • 112 (this is for the GeForce 8800 GT, but probably 128 in total) scalar floating-point ALUs (integer and floating formats, support for FP 32-bit precision within the IEEE 754 standard, MAD+MUL without clock loss)
  • 56 (64) texture addressing units with support for FP16 and FP32 components in textures (see explanation below)
  • 56 (64) bilinear filtering units (like G84 and G86, no free trilinear filtering and more efficient anisotropic filtering)
  • Possibility of dynamic branches in pixel and vertex shaders
  • Record results from up to 8 frame buffers simultaneously (MRT)
  • All interfaces (two RAMDAC, two Dual DVI, HDMI, HDTV) are integrated on the chip (unlike the GeForce 8800, where they are placed on the external additional NVIO chip)

GeForce 8800 GT 512MB reference card specifications

  • Core frequency 600 MHz
  • Effective memory frequency 1.8 GHz (2*900 MHz)
  • Memory type GDDR3
  • Memory capacity 512 megabytes
  • Power consumption up to 110 W
  • Two DVI-I Dual Link connectors, supports output resolutions up to 2560x1600
  • SLI connector
  • PCI Express 2.0 bus
  • TV-Out, HDTV-Out, HDCP support
  • RRP $249

GeForce 8800 GT 256MB reference card specifications

  • Core frequency 600 MHz
  • Universal processor frequency 1500 MHz
  • Number of universal processors 112
  • Number of texture blocks 56, blending blocks 16
  • Effective memory frequency 1.4 GHz (2*700 MHz)
  • Memory type GDDR3
  • Memory capacity 256 megabytes
  • Memory bandwidth is 44.8 gigabytes per second.
  • Theoretical maximum fill rate is 9.6 gigapixels per second.
  • Theoretical texture sampling speed up to 33.6 gigatexels per second.
  • Power consumption up to 110 W
  • Two DVI-I Dual Link connectors, supports output resolutions up to 2560x1600
  • SLI connector
  • PCI Express 2.0 bus
  • TV-Out, HDTV-Out, HDCP support
  • RRP $199

GeForce 8800 GTS 512MB reference card specifications

  • Core frequency 650 MHz
  • Number of universal processors 128
  • Effective memory frequency 2.0 GHz (2*1000 MHz)
  • Memory type GDDR3
  • Memory capacity 512 megabytes
  • Memory bandwidth 64.0 gigabytes per second.
  • Theoretical texture sampling speed up to 41.6 gigatexels per second.
  • Two DVI-I Dual Link connectors, supports output resolutions up to 2560x1600
  • SLI connector
  • PCI Express 2.0 bus
  • TV-Out, HDTV-Out, HDCP support
  • Recommended price $349-399

G92 chip architecture

Architecturally, the G92 is not very different from the G80. From what we know, it is essentially the flagship of the line (G80) transferred to a new technological process, with minor changes. NVIDIA's materials indicate that the chip has 7 large shader units and, accordingly, 56 texture units, as well as four wide ROPs; however, the transistor count raises suspicions that something is left unsaid. The initially announced solutions do not use all the blocks that physically exist in the chip; their number in the G92 is greater than what is active in the GeForce 8800 GT. Part of the increased complexity is explained by the inclusion of the previously separate NVIO chip and a new-generation video processor; the more complex TMU units also added transistors, and the caches were likely enlarged to increase the efficiency of the 256-bit memory bus.

This time, in order to compete with the corresponding AMD chips, NVIDIA decided to leave a fairly large number of blocks in the mid-range chip. Our assumption from the G84/G86 review - that much more powerful chips for the mid-price range would be released on 65 nm technology - was confirmed. There are few architectural changes in the G92 chip, and we will not dwell on them in detail. Everything said above about the GeForce 8 series solutions remains in force; we will repeat only the main points about the architectural specifics of the new chip.

For the new solution, NVIDIA provides the following diagram in its documents:

That is, of all the changes there is only the reduced number of blocks and some changes in the TMUs, described below. As noted above, there are doubts that this is physically the case, but we give the description based on what NVIDIA writes. The G92 consists of seven universal computing units (shader processors); NVIDIA traditionally speaks of 112 processors (at least in the first GeForce 8800 GT solutions). Each of the blocks, grouping 8 TMUs and 16 ALUs, can execute part of a vertex, pixel or geometry shader over a block of 32 pixels, vertices or other primitives, and can also perform other (non-graphics) calculations. Each processor has its own first-level cache, which stores textures and other data. In addition to the control unit and the compute shader processors, there are four ROP units performing visibility detection, frame buffer writes and MSAA, grouped with the memory controllers, write queues and the second-level cache.

General Purpose Processors and TMUs

The diagram of the shader units and an assessment of their peak computing performance were given in the G80 article; for the G92 the diagram has not changed, and performance is easy to recalculate from the change in clock frequency. The ALUs in the chip operate at more than double the frequency and are scalar, which allows for high efficiency. Functional differences are still unclear: it is unknown whether FP64 precision is available in this chip or not. There is definitely support for calculations in integer format, and the implementation of all calculations complies with the IEEE 754 standard, which is important for scientific, statistical, economic and other calculations.

The texture units in the G92 are not the same as in the G80; they follow the TMU design of the G84 and G86, in which architectural changes were made to increase performance. Recall that in the G80 each texture unit could calculate four texture addresses and perform eight texture filtering operations per clock, while the G84/G86 TMUs are capable of twice as many texture samples. That is, each unit has eight texture addressing (TA) modules, which determine the exact sampling address by coordinates, and exactly as many bilinear filtering (TF) modules:

Do not think that the 56 units of the GeForce 8800 GT will be stronger in real applications than the 32 units of the GeForce 8800 GTX. With trilinear and/or anisotropic filtering enabled, the latter are faster, since they can do a little more texture filtering work per clock. We will check this in the practical part by analyzing the results of the corresponding synthetic tests. All other texture unit functionality is unchanged; the FP16, FP32 and other texture formats are supported.
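
A small sketch of the arithmetic behind this claim (our illustration; it assumes trilinear filtering costs two bilinear operations per texel unless spare filtering units make it free):

```python
# Peak trilinear texel rate, limited either by the addressing (TA) units or
# by half the bilinear filtering (TF) throughput.
def trilinear_mtex(ta_units: int, tf_units: int, core_mhz: int) -> int:
    return min(ta_units, tf_units // 2) * core_mhz

print(trilinear_mtex(56, 56, 600))  # 16800 Mtex/s - GeForce 8800 GT (G92)
print(trilinear_mtex(32, 64, 575))  # 18400 Mtex/s - GeForce 8800 GTX (G80)
```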

ROP blocks, framebuffer writes, anti-aliasing

The ROP blocks themselves have not changed either, but their number has: the G80 had six, while the new solution has four, to reduce the cost of producing the chips and the PCBs for the video cards. The cut may also serve to avoid creating too much competition for the existing top-end solutions.

Each block processes four pixels (16 subpixels), for a total of 16 pixels per clock for color and Z. In Z-only mode, twice as many samples are processed per clock. At MSAA 16x the chip can produce four pixels per clock cycle, at 4x - sixteen, etc. As in the G80, the FP32 and FP16 frame buffer formats are fully supported together with antialiasing.

The new anti-aliasing method, Coverage Sampled Antialiasing (CSAA), known from previous chips in the series, is supported. Another innovation is that the GeForce 8800 GT updates the transparency antialiasing algorithm. The user previously had two options: multisampling (TRMS) and supersampling (TRSS); the first was very fast but did not work effectively in all games, while the second was high quality but slow. The GeForce 8800 GT introduces a new multisampling method for translucent surfaces that improves both its quality and performance. This algorithm gives almost the same quality improvement as supersampling but remains fast: only a few percent slower than the mode with transparency anti-aliasing disabled.

PureVideo HD

One of the expected changes in the G92 is the built-in second-generation video processor known from the G84 and G86, with expanded PureVideo HD support. It is already known that this version of the video processor almost completely offloads the CPU when decoding all common types of video data, including the "heavy" H.264 and VC-1 formats.

Like the G84/G86, the G92 uses the new model of programmable PureVideo HD video processor, which includes the so-called BSP engine. It supports decoding of the H.264, VC-1 and MPEG-2 formats at resolutions up to 1920x1080 and bitrates up to 30-40 Mbps, performing the CABAC and CAVLC decoding work in hardware, which allows playing all existing HD DVD and Blu-ray discs even on medium-power single-core PCs. VC-1 decoding is not as efficient as H.264, but it is still supported by the new processor.

You can read more about the second generation video processor in the part dedicated to the G84 and G86 chips. The performance of modern video solutions was partially tested in the latest material on the study of the effectiveness of hardware video decoding.

PCI Express 2.0

Among the real innovations in the G92 is support for the PCI Express 2.0 bus. The second version of PCI Express doubles the standard bandwidth from 2.5 Gbps to 5 Gbps per lane, so an x16 connector can transfer data at up to 8 GB/s in each direction, as opposed to 4 GB/s for version 1.x. Importantly, PCI Express 2.0 is compatible with PCI Express 1.1: old video cards will work in new motherboards, and new video cards will remain functional in boards without 2.0 support - provided there is sufficient external power, and without the increased interface bandwidth, of course.

To ensure backward compatibility with existing PCI Express 1.0 and 1.1 solutions, the 2.0 specification supports both 2.5 Gbps and 5 Gbps transfer rates. Backward compatibility allows legacy 2.5 Gbps devices to be used in 5 Gbps slots (operating at the lower speed), while a device built to the 2.0 specification can run at both 2.5 Gbps and 5 Gbps. In theory compatibility is good, but in practice problems may arise with some combinations of motherboards and expansion cards.
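
The bandwidth figures follow from the line rate, the lane count and the 8b/10b encoding used by PCI Express 1.x/2.0; a minimal sketch:

```python
# PCIe 1.x/2.0 encode 8 data bits in every 10 transferred bits.
def pcie_gb_per_s(gt_per_s: float, lanes: int) -> float:
    return gt_per_s * lanes * (8 / 10) / 8  # GT/s -> usable GB/s per direction

print(pcie_gb_per_s(2.5, 16))  # 4.0 GB/s - PCI Express 1.x x16
print(pcie_gb_per_s(5.0, 16))  # 8.0 GB/s - PCI Express 2.0 x16
```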

Support for external interfaces

As one would expect, the additional NVIO chip found on GeForce 8800 boards, which handled the external interfaces outside the main chip (two 400 MHz RAMDACs, two Dual Link DVI (or LVDS), HDTV-Out), has in this case been absorbed: support for all of these interfaces is built into the G92 itself.

GeForce 8800 GT video cards usually have two Dual Link DVI-I outputs with HDCP support. As for HDMI, support for this connector is fully implemented in hardware; it can be provided by manufacturers on specially designed cards, which may be released a little later. An HDMI connector on the video card is entirely optional, though: it can be successfully replaced by a DVI-to-HDMI adapter, which is included with most modern video cards.

Unlike AMD's RADEON HD 2000 series video cards, the GeForce 8800 GT does not contain the built-in audio chip needed to transmit audio over DVI via an HDMI adapter. The ability to carry video and audio over a single connector is in demand primarily on mid- and low-end cards installed in small media-center cases, and the GeForce 8800 GT is hardly suited to that role.

Details: G94, GeForce 9600 family

G94 Specifications

  • Chip codename G94
  • 65 nm technology
  • 505 million transistors
  • Unified architecture with an array of shared processors for stream processing of vertices and pixels, as well as other types of data
  • Hardware support for DirectX 10, including shader model Shader Model 4.0, geometry generation and recording intermediate data from shaders (stream output)
  • 256-bit memory bus, four independent 64-bit wide controllers
  • Core frequency 650 MHz (GeForce 9600 GT)
  • ALUs operate at more than double the frequency (1.625 GHz for GeForce 9600 GT)
  • 64 scalar floating point ALUs (integer and floating point formats, IEEE 754 32-bit precision FP support, MAD+MUL without clock loss)
  • 32 texture addressing units with support for FP16 and FP32 components in textures
  • 32 bilinear filtering units (as in the G84 and G92, this gives an increased number of bilinear samples, but without free trilinear filtering or reduced-cost anisotropic filtering)
  • Possibility of dynamic branches in pixel and vertex shaders
  • 4 wide ROP blocks (16 pixels) with support for antialiasing modes up to 16 samples per pixel, including with FP16 or FP32 frame buffer format. Each block consists of an array of flexibly configurable ALUs and is responsible for generating and comparing Z, MSAA, and blending. Peak performance of the entire subsystem up to 64 MSAA samples (+ 64 Z) per clock, in Z only mode 128 samples per clock
  • Record results from up to 8 frame buffers simultaneously (MRT)

GeForce 9600 GT reference card specifications

  • Core frequency 650 MHz
  • Universal processor frequency 1625 MHz
  • Number of universal processors 64
  • Number of texture blocks 32, blending blocks 16
  • Effective memory frequency 1.8 GHz (2*900 MHz)
  • Memory type GDDR3
  • Memory capacity 512 megabytes
  • Memory bandwidth is 57.6 gigabytes per second.
  • Theoretical maximum fill rate is 10.4 gigapixels per second.
  • Theoretical texture sampling speed up to 20.8 gigatexels per second.
  • Two DVI-I Dual Link connectors, supports output resolutions up to 2560x1600
  • SLI connector
  • PCI Express 2.0 bus
  • Power consumption up to 95 W
  • Recommended price $169-189

G94 architecture

From an architectural point of view, the G94 differs from the G92 only quantitatively: it has fewer execution units (ALUs and TMUs). And there are not many differences from the G8x either. As written in previous materials, the G9x line is a slightly modified G8x line, transferred to a new process technology with minor architectural changes. The new mid-range chip has 4 large shader units (64 ALUs in total) and 32 texture units, as well as four wide ROPs.

So, there are few architectural changes in the chip; almost all of them are described above, and everything previously said about the earlier solutions remains valid. Here we present only the main diagram of the G94 chip:

The texture units in the G94 are exactly the same as in the G84/G86 and G92; they can fetch twice as many bilinearly filtered samples from textures as the G80. But the 32 texture units of the GeForce 9600 GT will outperform the 32 units of the GeForce 8800 GTX in real applications only because of the higher GPU clock speed, and only when trilinear and anisotropic filtering are disabled - which is extremely rare, occurring only in algorithms that use unfiltered samples, for example parallax mapping.
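
The arithmetic behind this comparison is easy to sketch (our illustration, using the unit counts from the specifications above and assuming trilinear costs two bilinear operations unless extra filtering units cover it):

```python
# Peak texel rate with and without trilinear filtering.
def mtex(ta: int, tf: int, core_mhz: int, trilinear: bool = False) -> int:
    per_clock = min(ta, tf // 2) if trilinear else min(ta, tf)
    return per_clock * core_mhz

# Bilinear: the 9600 GT's clock advantage shows.
print(mtex(32, 32, 650), mtex(32, 64, 575))              # 20800 vs 18400
# Trilinear: the G80's doubled filtering units win.
print(mtex(32, 32, 650, True), mtex(32, 64, 575, True))  # 10400 vs 18400
```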

Another advantage of the G9x, and of the GeForce 9600 GT in particular, NVIDIA considers to be a new compression technology implemented in the ROP units, which by their estimates works 15% more efficiently than in previous chips. Apparently, these are exactly the architectural modifications in the G9x intended to make the 256-bit memory bus more efficient compared to the 320/384-bit buses we wrote about earlier. Naturally, the difference in real applications will not be that large; even by NVIDIA's own account, the gain from the ROP innovations is most often only about 5%.

Despite all the changes in the G9x architecture that add complexity to the chip, which we will talk about below, the transistor count remains quite large. This GPU complexity is probably explained by the inclusion of the previously separate NVIO chip, the new-generation video processor, the more complex TMU and ROP units, as well as other hidden modifications: changed cache sizes, etc.

PureVideo HD

The G94 has the same second-generation video processor known from the G84/G86 and G92, with improved PureVideo HD support. It almost completely offloads the CPU when decoding the most common video data types, including H.264, VC-1 and MPEG-2, at resolutions up to 1920x1080 and bitrates up to 30-40 Mbps, doing the decoding work entirely in hardware. Although VC-1 decoding is not as efficient as H.264, and a small part of the process uses the central processor, it still allows playing all existing HD DVD and Blu-ray discs even on average computers. You can read more about the second-generation video processor in our reviews of the G84/G86 and G92, links to which are given at the beginning of the article.

We should also note the software improvements to PureVideo HD, timed to coincide with the GeForce 9600 GT release. The latest innovations include dual-stream decoding and dynamic enhancement of contrast and color saturation. These changes are not exclusive to the GeForce 9600 GT: starting with ForceWare 174, new driver versions bring them to all chips that support full hardware acceleration with PureVideo HD. Besides the card we are considering today, this list includes the GeForce 8600 GT/GTS, GeForce 8800 GT and GeForce 8800 GTS 512.

Dynamic contrast enhancement is quite common in consumer electronics, TVs and video players, and can improve images with sub-optimal exposure (the shutter speed and aperture combination). After each frame is decoded, its histogram is analyzed; if the frame has poor contrast, the histogram is recalculated and applied to the image. Here is an example (the initial image on the left, the processed one on the right):
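
In code, the per-frame adjustment could look roughly like this (a deliberately simplified numpy sketch of histogram stretching; the actual PureVideo HD algorithm is not public, and the thresholds here are arbitrary):

```python
import numpy as np

def enhance_contrast(frame: np.ndarray) -> np.ndarray:
    """Stretch the luma histogram of an 8-bit frame if its contrast is poor."""
    lo, hi = np.percentile(frame, (1, 99))  # ignore outlier pixels
    if hi - lo < 200:                       # frame uses only part of the range
        frame = (frame.astype(np.float32) - lo) * 255.0 / max(hi - lo, 1)
    return np.clip(frame, 0, 255).astype(np.uint8)

dull = np.random.randint(80, 170, (1080, 1920), dtype=np.uint8)  # low-contrast frame
out = enhance_contrast(dull)
print(dull.min(), dull.max(), "->", out.min(), out.max())  # e.g. 80 169 -> 0 255
```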

Much the same applies to the dynamic color-saturation enhancement introduced in PureVideo HD. Home appliances have long used image-enhancing algorithms, unlike computer monitors, which reproduce everything as-is, which in many cases makes the picture look too dull and lifeless. Automatic balancing of the color components of the video, also recalculated every frame, improves the perceived picture by slightly adjusting the saturation of its colors:

Dual-stream decoding speeds up the decoding and post-processing of two different video streams simultaneously. This is useful in output modes such as picture-in-picture, used on some Blu-ray and HD DVD discs (for example, the second picture may show the film's director commenting on the scenes in the main window); the editions of the films WAR and Resident Evil: Extinction offer such capabilities.

Another useful innovation in the latest version of PureVideo HD is the ability to run the Aero shell in Windows Vista while playing hardware-accelerated video in windowed mode, which was not previously possible. It is hard to say this worries users very much, but it is a nice feature.

Support for external interfaces

Support for external interfaces on the GeForce 9600 GT is similar to the GeForce 8800 GT, with the possible exception of integrated DisplayPort support. The additional NVIO chip found on GeForce 8800 (G80) boards, which handled the external interfaces outside the main chip, has in the G94 likewise been integrated into the chip itself.

The reference GeForce 9600 GT video cards have two Dual Link DVI outputs with HDCP support. HDMI and DisplayPort support is implemented in hardware on the chip, and these ports can be provided by NVIDIA partners on specially designed cards. Moreover, NVIDIA states that, unlike the G92, DisplayPort support is now built into the chip and no external transmitters are required. In general, HDMI and DisplayPort connectors on a video card are optional; they can be replaced with simple adapters from DVI to HDMI or DisplayPort, which are sometimes included with modern video cards.

Details: G96, GeForce 9400 and 9500 families

G96 Specifications

  • Chip codename G96
  • 65 nm technology
  • 314 million transistors
  • Unified architecture with an array of shared processors for stream processing of vertices and pixels, as well as other types of data
  • Hardware support for DirectX 10, including shader model Shader Model 4.0, geometry generation and recording intermediate data from shaders (stream output)
  • 128-bit memory bus, two independent 64-bit wide controllers
  • Core frequency 550 MHz
  • ALUs operate at more than double the frequency (1.4 GHz)
  • 32 scalar floating-point ALUs (integer and floating formats, IEEE 754 32-bit precision FP support, MAD+MUL without clock loss)
  • 16 texture addressing units with support for FP16 and FP32 components in textures
  • 16 bilinear filtering units (as in the G92, this gives an increased number of bilinear samples, but without free trilinear filtering or reduced-cost anisotropic filtering)
  • Possibility of dynamic branches in pixel and vertex shaders
  • 2 wide ROP blocks (8 pixels) with support for antialiasing modes up to 16 samples per pixel, including with FP16 or FP32 frame buffer format. Each block consists of an array of flexibly configurable ALUs and is responsible for generating and comparing Z, MSAA, and blending. Peak performance of the entire subsystem up to 32 MSAA samples (+ 32 Z) per clock, in Z only mode 64 samples per clock
  • Record results from up to 8 frame buffers simultaneously (MRT)
  • All interfaces (two RAMDAC, two Dual DVI, HDMI, DisplayPort) are integrated on the chip

GeForce 9500 GT reference card specifications

  • Core frequency 550 MHz
  • Number of universal processors 32
  • Number of texture blocks 16, blending blocks 8
  • Effective memory frequency 1.6 GHz (2*800 MHz)
  • Memory type GDDR2/GDDR3
  • Memory capacity 256/512/1024 megabytes
  • Theoretical texture sampling speed up to 8.8 gigatexels per second.
  • Two DVI-I Dual Link connectors, supports output resolutions up to 2560x1600
  • SLI connector
  • PCI Express 2.0 bus
  • TV-Out, HDTV-Out, HDMI and DisplayPort support with HDCP

GeForce 9400 GT reference card specifications

  • Core frequency 550 MHz
  • Universal processor frequency 1400 MHz
  • Number of universal processors 16
  • Number of texture blocks 8, blending blocks 8
  • Effective memory frequency 1.6 GHz (2*800 MHz)
  • Memory type GDDR2
  • Memory capacity 256/512 megabytes
  • Memory bandwidth 25.6 gigabytes per second (derived in the sketch after this list).
  • Theoretical maximum fill rate is 4.4 gigapixels per second.
  • Theoretical texture sampling speed up to 4.4 gigatexels per second.
  • Two DVI-I Dual Link connectors, supports output resolutions up to 2560x1600
  • SLI connector
  • PCI Express 2.0 bus
  • TV-Out, HDTV-Out, HDMI and DisplayPort support with HDCP
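
Both G96-based cards quote the same 25.6 GB/s bandwidth figure, and it is easy to check where it comes from. A minimal sketch, assuming theoretical bandwidth is simply the bus width in bytes multiplied by the effective (DDR) transfer rate:

    # Theoretical memory bandwidth from bus width and effective clock (sketch).
    def bandwidth_gb_s(bus_bits: int, effective_mhz: float) -> float:
        # bytes per transfer * transfers per second, expressed in GB/s
        return bus_bits / 8 * effective_mhz * 1e6 / 1e9

    print(bandwidth_gb_s(128, 1600))  # 25.6, GeForce 9400 GT / 9500 GT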

G96 architecture

Architecturally, the G96 is exactly half of the G94, which in turn differs from the G92 only in quantitative characteristics. The G96 has half as many of every execution unit: ALUs, TMUs and ROPs. The new video chip is aimed at the lowest price range and carries two large shader units (32 ALUs in total), 16 texture units and eight ROPs. The memory bus is also narrowed from the 256 bits of the G94 and G92 to 128 bits. All hardware capabilities remain unchanged; the differences are purely in performance.

Details: G92b, GeForce GTS 200 family

GeForce GTS 250 reference video card specifications

  • Core frequency 738 MHz
  • Universal processor frequency 1836 MHz
  • Number of universal processors 128
  • Number of texture blocks 64, blending blocks 16
  • Effective memory frequency 2200 (2*1100) MHz
  • Memory type GDDR3
  • Memory capacity 512/1024/2048 megabytes
  • Memory bandwidth 70.4 GB/s
  • Theoretical maximum fill rate is 11.8 gigapixels per second.
  • Theoretical texture sampling speed up to 47.2 gigatexels per second (see the rate sketch after this list).
  • Two DVI-I Dual Link connectors, supports output resolutions up to 2560x1600
  • Dual SLI connector
  • PCI Express 2.0 bus
  • TV-Out, HDTV-Out, HDCP, HDMI, DisplayPort support
  • Power consumption up to 150 W (one 6-pin connector)
  • Two-slot version
  • RRP $129/$149/$169
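
The texel and fill rate figures in these lists follow directly from unit counts multiplied by the core clock. A small sketch with the GTS 250 values from above (the same arithmetic reproduces the figures for the other cards):

    # Texel and pixel rates from unit counts and core clock (sketch).
    core_mhz = 738
    tmus, rops = 64, 16
    print(f"{tmus * core_mhz / 1000:.1f} Gtex/s")  # 47.2
    print(f"{rops * core_mhz / 1000:.1f} Gpix/s")  # 11.8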

In general, this “new” video card based on the 55 nm G92 chip is no different from the GeForce 9800 GTX+. The release of the new model is partially justified by fitting not 512 megabytes of video memory, as on the 9800 GTX+, but a full gigabyte, which strongly affects performance in heavy modes: maximum quality settings and high resolutions with full-screen antialiasing enabled. Two-gigabyte versions also exist, but those are more a marketing advantage than a real one.

Under such conditions, the higher-memory versions of the GeForce GTS 250 really should be noticeably faster than the GeForce 9800 GTX+ thanks to the increased memory capacity, and some of the newest games will benefit even at less than the highest resolutions. All would be well, except that some card manufacturers released a GeForce 9800 GTX+ with a gigabyte of memory even earlier...

Producing the G92b video chip on 55 nm process technology, together with a noticeably simplified PCB design, allowed NVIDIA to create a solution similar in characteristics to the GeForce 9800 GTX but with a lower price, lower power consumption and less heat. The GeForce GTS 250 now needs only one 6-pin PCI-E power connector on the board. Those are all the main differences from the 9800 GTX+.

NVIDIA's 9800 GT video card is a logical continuation of the 8800 GT board: the two products have almost identical parameters (a small side-by-side check follows the parameter list below). The main difference of the 9800 GT from the previous model is support for HybridPower technology; there are no other improvements. The card's GPU is labeled G92-270, the 8800 GT carried a similar marking, and the die is still revision A2. The clock frequencies of the 9800 GT remained at the same level: 601/1512 MHz.

9800 GT Specifications

Technically, the 9800 GT has not undergone any major changes since the 8800.

Video card parameters:

  • GPU: G92.
  • Video memory: 512 MB.
  • Memory bus: 256-bit.
  • GPU frequency: 601/1512 MHz.
  • Texture blocks: 56.
  • ROP blocks: 16.
  • Effective memory frequency: 1800 MHz.
  • Universal processors (cores): 112.
  • Supported unique technologies: HybridPower.
  • System bus and other interfaces: PCI-E 2.0 x16, 2x DVI, S-Video; HDMI is supported via an adapter.
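
To make the "almost identical parameters" claim concrete, here is a minimal side-by-side in Python. The values are the ones quoted in this review, not an official comparison table:

    # 8800 GT vs 9800 GT, parameters as quoted in this review (sketch).
    specs_8800gt = dict(gpu="G92", shaders=112, tmus=56, rops=16,
                        bus_bits=256, mem_mb=512, hybrid_power=False)
    specs_9800gt = dict(specs_8800gt, hybrid_power=True)  # the one real change
    diff = [k for k in specs_8800gt if specs_8800gt[k] != specs_9800gt[k]]
    print(diff)  # ['hybrid_power']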

What tasks can the 9800 GT video card solve?

The card copes well with games of the previous generation. If the user does not chase new releases, the 9800 GT will undoubtedly suit him. The card's characteristics let it easily run games such as The Witcher 2, S.T.A.L.K.E.R., Crysis 2, Dead Space 3 and others. Fallout: New Vegas, incidentally, also runs without problems on this board; the fourth installment of the legendary series, however, will no longer launch.

Nor will the card handle most modern shooters and racing simulators released after 2013; there are exceptions, but they are rare. For working with graphics and video, and for watching high-definition movies, the user remains quite comfortable. Unless a person is a professional photographer or 3D designer who needs maximum speed, the 9800 GT will serve quite well.

Pros and cons of a video card

The board in question has a number of advantages that keep it relevant, though the solution also has its drawbacks.

What advantages does the 9800 GT have? The model's characteristics suggest quite a few.

  • The video card supports SLI mode. You can buy up to four boards at once and combine them into a group, thereby achieving a significant increase in performance.
  • The board supports PhysX technology, which reproduces additional special effects in games. Note that enabling it noticeably reduces the card's overall performance; to offset this, the manufacturer recommends an additional dedicated PhysX accelerator to complement the main board.
  • Using special utilities, the stock performance of the Nvidia 9800 GT can be raised by 5-15%. The exact figure depends on the user's goals and on the capabilities of the card's cooling system. When overclocking, watch the device's operating temperature carefully to prevent overheating and, as a result, failure.

Flaws:

  • it is an outdated solution;
  • it has limited efficiency in general-purpose computing;
  • smooth playback of Blu-ray discs and HD video from the Internet depends heavily on the central CPU in addition to the graphics processor;
  • performance is low: the board's specifications do not allow running games released after 2013;
  • relatively high power consumption;
  • insufficient performance when working with additional PhysX effects.

Before the release of the 9800 GT, many analysts and journalists believed that news of this graphics adapter was a fabrication. After the official release the picture became clearer: many had been ready to hand the card flagship laurels in advance, but NVIDIA's engineers had once again assigned a new number to old architectural solutions.

GeForce 9800 GT. Video card characteristics

The graphics accelerator is an almost complete copy of its predecessor, and some tests show that the previous generation remains the more productive. The new product carries the same G92 processor, and even the process technology is unchanged: it remains 65 nm, although many expected the 9800 GT to move to 55 nm. GPU frequencies have not changed either.

On the official website page presenting the GeForce 9800 GT, the characteristics are as follows:

  • GPU: G92, 112 universal processors, 56 texture units.
  • Video memory: GDDR3, 512 MB.
  • Memory bus width: 256 bits.
  • GPU frequency: 600 MHz.
  • Shader unit frequency: 1500 MHz.
  • Memory frequency: 1800 MHz effective (900 MHz actual).
  • Ports: 2x DVI-I, TV-Out.

The only thing that distinguishes the video card in question from the 8800 GT is support for HybridPower technology. It allows automatic switching between the integrated graphics and the discrete card, which makes it possible to reduce power consumption and heat output.

This update can hardly be considered important for the GeForce 9800 GT: it does not change the characteristics, and besides, HybridPower only works if one condition is met. The motherboard must also support this technology and have an integrated graphics core.

Equipment

The video card comes in a fairly large box decorated predominantly in blue.

Inside it you can find the following:

  • The video card itself.
  • An S-Video-to-RCA ("tulip") adapter set.
  • An additional power cable.
  • A disc with drivers and software.
  • In some revisions, the game Civilization IV.
  • A user's guide.

The texts printed on the packaging are mostly advertising. Still, among the odes to the technologies used in the GeForce 9800 GT, the technical specifications state directly that the graphics accelerator is based on the 8800 GT. Such honesty is commendable.

The generous set of adapters and power cords shows that NVIDIA cares about its customers: if you need to connect non-standard equipment or several monitors, you will not have to buy extra components, just take them out of the box.

Design

Comparing the GeForce 8800 GT and the GeForce 9800 GT, their characteristics coincide far more closely than their designs. In general outline one board resembles the other, but they cannot be called identical.

Both video cards share the same dimensions, the same power connector location, SLI contacts covered with a rubber plug, and the same placement of the GPU and the memory chips. The memory chips are manufactured by Samsung and rated at 1 ns.

The layout of the remaining circuitry is completely different. The engineers also decided not to economize and used top-quality components. The card carries solid capacitors, whose service life is far longer than that of traditional electrolytic ones, and chokes with ferrite cores, which likewise outlast standard parts.

However well the engineers did with the component selection, the number of ports for connecting output devices clearly falls short of the top models: there are two DVI-I connectors and one TV-Out. This shortcoming, though, is compensated by the bundled adapters.

Cooling system

On closer inspection, the 9800 GT's performance does not seem impressive. However, there is one element of the graphics accelerator that can outclass many other video cards: the cooler. First of all, the installed cooling system is made by the well-known South Korean company Zalman, one of the market leaders.

It is made as simple as possible, yet works very quietly and very efficiently. The base, coated with thermal paste, touches the surface of the GPU; a pair of six-millimeter copper heat pipes bent into a U shape pass through it, with thin aluminum fins mounted on top.

The design is logically completed by a low-speed 8 cm fan whose rotation speed is selected automatically according to the load on the graphics core. The cooling system's only drawback is that it blocks the adjacent PCI Express slot; if you need to install two video cards at once, you will have to think about replacing the cooler.

Overclocking

Despite its modest specifications, the 9800 GT can be overclocked using the bundled software. The GamerHUD application changes frequencies from within the running operating system, without extra reboots. The program also lets you adjust the voltage supplied to the GPU, although using this feature is not recommended, lest the video processor fail.

After overclocking to 700 MHz on the GPU, 1700 MHz on the shader unit and 2000 MHz on the memory, the GeForce 9800 GT continues to operate stably. The temperature rises only slightly, for which the cooling system deserves the credit.
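
For reference, the relative gains from that overclock are easy to compute. A quick sketch using the stock clocks quoted earlier in this review:

    # Overclocking gains relative to stock clocks (values from this review).
    stock = {"core": 600, "shader": 1512, "memory": 1800}  # MHz
    oc    = {"core": 700, "shader": 1700, "memory": 2000}  # MHz
    for name in stock:
        print(f"{name}: +{(oc[name] / stock[name] - 1) * 100:.1f}%")
    # core: +16.7%, shader: +12.4%, memory: +11.1%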

NVIDIA GeForce 9800 GT

The NVIDIA GeForce 9800 GT video card is built on the 65 nm process technology around the G92-270 (G92) graphics processor. The card supports DirectX 10. NVIDIA fitted 512 megabytes of GDDR3 memory, connected through a 256-bit memory interface.
The GPU operates at 600 MHz. The card has 112 CUDA cores; the memory runs at an effective 1800 MHz, giving 57.6 GB/s of bandwidth.

The power consumption of the video card is 105 W, and the recommended power supply is 400 W.

NVIDIA GeForce 9800 GT supports Microsoft DirectX 10 and OpenGL 3.3.

Characteristics of the NVIDIA GeForce 9800 GT video card

Technologies and capabilities:
CUDA: Yes
SLI: Yes
PhysX: Yes
3D Vision: Yes
3D games: Yes
DirectX: 10
OpenGL: 3.3
Bus: PCI-Express 2.0 x16
OS support: Microsoft Windows 7-10, Linux, FreeBSD x86

Please note: the table shows the reference characteristics of the video card; they may differ from one manufacturer to another.

Download drivers for NVIDIA GeForce 9800 GT video card:

For Windows 10: 342.01 WHQL (32-bit and 64-bit)
For Windows 7/8/8.1: 342.01 WHQL (32-bit and 64-bit)

Driver information:

Driver version: 340.52 WHQL
Published: July 29, 2014
Driver language: Russian
Size: 220 MB
CUDA Toolkit: 6.5
Driver information: Release Notes (v340.52) (PDF)

GeForce Experience

Download the driver for the NVIDIA GeForce 9800 GT video card from the official website!

Or use the GeForce Experience program - it will automatically select the necessary driver for your video card.

Frequently asked questions and answers regarding the NVIDIA GeForce 9800 GT video card:

Question: What series is this video card? Answer: Desktop.
Question: Which DirectX does it support? Answer: The card supports DirectX 10.
Question: What is the power consumption of the video card? Answer: Maximum power consumption is 105 W.
Question: What power supply is needed for the video card? Answer: A 400 W PSU is recommended.
Question: Are there additional power connectors? Answer: Two 6-pin.
Question: What is the maximum permissible temperature? Answer: No more than 105 °C.
Question: Where can I download the driver? Answer: From the official NVIDIA website or via GeForce Experience (see the section above).