GeForce 400 Series

The GeForce 400 series is a series of desktop graphics processors by the company Nvidia. For the first time, all graphics processors of this series support Shader Model 5.0 (SM 5.0) under DirectX 11, as well as OpenCL, CUDA and PhysX.


History

With the GeForce 400 series, Nvidia introduced support for Shader Model 5.0 under DirectX 11. Originally, the series was to be presented in the fourth quarter of 2009, coinciding with the launch of Windows 7 (with which DirectX 11 was released). Unlike its competitor AMD, Nvidia had decided to introduce a completely new architecture, which required more development time. As a result, AMD was able to bring the entire Radeon HD 5000 series line-up to market before development of the first graphics processor of the GeForce 400 series was complete. This processor was the GF100, manufactured in a 40 nm process and the first GPU based on the Fermi architecture. Nvidia presented the GPU, consisting of around three billion transistors, on 18 January 2010, without, however, presenting corresponding graphics cards. As a result of architectural changes, the number of stream processors per shader cluster increased from 24 on the preceding GT200 to 32; since the GF100 has a total of 16 shader clusters, 512 stream processors are available. The new shader-to-TMU ratio of 8:1 reduced the number of texture units from 80 to 64. In addition, the GF100 has 48 raster operation processors (ROPs), which are divided into six partitions. Each ROP partition is attached to a 64-bit memory controller for GDDR5 memory, resulting in a 384-bit memory interface. This allows memory configurations of 1.5 GB and 3 GB; theoretically, an expansion to 6 GB is possible.
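The GF100 totals follow directly from the cluster layout described above; a minimal illustrative calculation (all values taken from the text):

```python
# Illustrative calculation of GF100 totals from the cluster layout
# described in the text (not vendor code).

shader_clusters = 16
sps_per_cluster = 32
stream_processors = shader_clusters * sps_per_cluster  # 512

rop_partitions = 6
rops_per_partition = 8
rops = rop_partitions * rops_per_partition  # 48

controller_width_bits = 64  # one 64-bit controller per ROP partition
memory_interface_bits = rop_partitions * controller_width_bits  # 384

print(stream_processors, rops, memory_interface_bits)  # 512 48 384
```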

In early February 2010, Nvidia announced that the first graphics cards based on the GF100 would be named GeForce GTX 470 and GTX 480; previously it had been widely expected that the cards would be assigned to the GeForce 300 series. The official launch took place on 27 March 2010. Although the GeForce GTX 480 turned out to be the fastest single-GPU card on the market at launch, it was criticized in the trade press for its high power consumption, with which it set a new negative record. Its temperature and noise levels were also criticized. Observers were surprised that the GF100 graphics processor on the GeForce GTX 480 was operated with one shader cluster disabled, which is probably due to problems with the 40 nm manufacturing process at TSMC. On the GeForce GTX 470, by contrast, two shader clusters are disabled, so it reached the performance of AMD's competing Radeon HD 5870, which had been presented six months earlier. Since the GTX 470 showed better power consumption and noise figures than the GeForce GTX 480, it received the better reviews in the press. Both models officially went on sale on 12 April 2010, followed on 31 May 2010 by the GeForce GTX 465. This card continued to use a GF100 GPU, but with five shader clusters disabled and a 256-bit memory interface; in terms of performance, the GeForce GTX 465 thus placed between AMD's competing Radeon HD 5830 and HD 5850.

On 12 July 2010, Nvidia presented the GeForce GTX 460, the first card based on the GF104 GPU. Compared to the GF100, Nvidia halved the number of shader clusters on the GF104 and reduced the ROPs to four partitions, so that at most a 256-bit memory interface can be implemented. At the same time, there are now 48 instead of 32 stream processors per cluster, and the number of TMUs and SFUs per cluster doubled. Since the GF104 is not intended for products of the Quadro and Tesla series, Nvidia reduced its GPU-computing features: double-precision computing power was severely curtailed, which is, however, irrelevant for 3D applications. This saved around one billion transistors compared to the GF100 and contributed to the fact that power consumption, heat and noise on the GTX 460 were again significantly lower than was the case with the criticized GF100-based cards. Nvidia introduced the GTX 460 in two memory configurations: 768 MB and 1024 MB of VRAM. The 768 MB version placed, in terms of performance, between the Radeon HD 5830 and the GeForce GTX 465; the 1024 MB version placed ahead of the GeForce GTX 465, although the naming suggests otherwise. Despite the official price recommendations of 199 and 229 US$ at release, Nvidia assigned the card to the high-end sector with the "GTX" designation. For the OEM market, Nvidia brought out a version of the GeForce GTX 460 with reduced clock rates, a "Second Edition" or "Special Edition" in which an additional shader cluster was disabled, and a second version of the 1024 MB variant (referred to below as GTX 460 v2) with increased clock rates but a memory interface reduced to 192 bits. On this variant the memory controllers are therefore populated asynchronously, as is also the case with the GeForce GTX 550 Ti.

On 13 September 2010, Nvidia introduced the GeForce GTS 450. It uses the GF106 graphics processor, which, with 192 stream processors and 32 texture units, is broadly a halved version of the GF104 GPU. Although the GF106 has three ROP partitions, which would make a 192-bit memory interface possible, Nvidia uses only 128 bits on the GTS 450. The memory in the reference design is therefore 1024 MB; 512 and 2048 MB configurations are also possible. In terms of 3D performance, the GeForce GTS 450, which like the GeForce GTX 460 was praised in the trade press for its low noise, roughly reaches the Radeon HD 5750. In a direct comparison with that card, the GTS 450 achieved better idle power figures, whereas the Radeon consumes less power under load. For the OEM market, Nvidia released an adapted version of the GeForce GTS 450 with one shader cluster disabled but with memory increased to 1536 MB. Another GTS 450 variant for the OEM sector is the GeForce GT 440, which also has to do without one shader cluster and uses DDR3 instead of GDDR5 memory. In February 2011, Nvidia also brought a GT 440 to the retail market, but this card was based on the GF108 graphics processor of the GeForce GT 430.

On 11 October 2010, Nvidia introduced the GeForce GT 430. It was based on the GF108 graphics processor, a halved version of the GF106 GPU, which placed the GT 430, in terms of 3D performance, between AMD's competing Radeon HD 5550 and HD 5570. The card, produced in low-profile format, was thus aimed primarily at "casual gamers" and at use in multimedia PCs and HTPCs. Compared to AMD's competing cards it has higher power consumption under load, but has advantages when playing Blu-ray media. Already on 3 September 2010, coinciding with the launch of a number of products in the GeForce 400M series, Nvidia had listed a GeForce GT 420 for the OEM market on its website. It was removed from the site again a few days later, so it was initially unclear whether the card had really "launched". Since the presentation of the GeForce GT 430, the GT 420 has been listed again.

Technology

Fermi architecture

With the GeForce 400 series, Nvidia used the newly developed "Fermi" architecture for the first time; it is also used on the Quadro and Tesla cards. Fermi is the successor to the unified shader architecture of the G80 graphics processor. The primary improvements concern support for DirectX 11 as well as extended capabilities in the field of GPU computing.

Graphics processors based on the Fermi architecture consist primarily of "Graphics Processing Clusters" (GPCs). In addition to a raster engine, each GPC houses four shader clusters, or "streaming multiprocessors". Each shader cluster in turn has 32 to 48 stream processors, four to eight texture units and a "PolyMorph Engine". Added to this are 16 load/store units per cluster, which calculate the source and destination addresses of 16 threads per clock and write the results to the cache or VRAM. Furthermore, four to eight "Special Function Units" (SFUs) are available for sine and cosine computation. Each SFU can execute one instruction per thread per clock, so eight clocks are required for a warp.
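The SFU timing stated above can be checked with a minimal sketch, assuming the lower bound of four SFUs per cluster and a warp width of 32 threads:

```python
# Illustrative check of the SFU timing described in the text
# (not GPU code; four SFUs is the lower bound given there).

warp_size = 32
sfus_per_cluster = 4
instructions_per_sfu_per_clock = 1

clocks_per_warp = warp_size // (sfus_per_cluster * instructions_per_sfu_per_clock)
print(clocks_per_warp)  # 8
```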

In the " CUDA cores " ( an allusion to the CUDA API from Nvidia ) is easy - scalar stream processors, which is still from a full fledged " Arithmetic Logic Unit " (ALU ) and a "Floating Point Unit " ( FPU) put together. To improve the GPU computing capabilities have the GPUs " Fermi architecture " as the first ever with a complete support for C and are, just like the Radeon HD 5000 series from AMD, with the IEEE -754 -2008- standard fully compatible. The latter was necessary in order to improve the double-precision capabilities (Convert double-precision ), which to use against MAD accurate FMA ( Fused Multiply - Add). Each stream processor per clock a Fused Multiply -ADD (FMA ) to calculate whether it is a single-precision or double-precision operation. In contrast to the previous generation, multiplication operations (MUL ) on the " Fermi " architecture no longer possible.

On the G80 and GT200, the texture units were grouped into so-called "Texture Processing Clusters". The Fermi architecture eliminates these clusters entirely; instead, each shader cluster has four to eight texture units of its own. As a result, the ratio of shaders to TMUs worsened to 8:1 or 6:1 (previously 2:1 and 3:1, respectively), but a dedicated 12 KB L1 texture cache is now available per shader cluster.

The raster operation processors (ROPs) were partially reorganized in the Fermi architecture. As before, they are grouped into partitions, each of which is attached to a memory controller; up to eight ROPs may now be present per partition. A ROP can output one 32-bit integer pixel per clock, one 16-bit floating-point pixel every two clocks, or one 32-bit floating-point pixel every four clocks. The maximum number of pixels that can be processed is limited, however, by the fact that each shader cluster can pass only two (GF100) or four (GF104, GF106 and GF108) pixels per clock to the ROPs. In the Fermi models published so far, the full number of ROPs can therefore only be used when processing 16- and 32-bit floating-point pixels, which limits the maximum pixel fill rate. With formats larger than 32 bits per pixel, not all ROPs can be fully utilized either, owing to the different assignment of the data paths. This restriction does not apply to the Z fill rate, however.
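The throughput limit described above can be illustrated with a minimal calculation, using the GF100 as an example (counts taken from the text):

```python
# Illustrative sketch of the pixel-throughput bottleneck on the GF100
# for 32-bit integer pixels (one pixel per ROP per clock).

sm_count = 16            # shader clusters on the GF100
px_per_sm_per_clock = 2  # pixels each cluster can hand to the ROPs
rops = 48                # raster operation processors

rop_limit = rops * 1                       # 48 pixels per clock
sm_limit = sm_count * px_per_sm_per_clock  # 32 pixels per clock

effective = min(rop_limit, sm_limit)
print(effective)  # 32 -> the shader clusters, not the ROPs, are the bottleneck
```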

To improve GPU-computing capabilities, the Fermi architecture has an L1 and an L2 cache in addition to the shared memory. Each shader cluster has 76 KB of cache, of which 12 KB is an L1 texture cache dedicated to the texture units. The remaining 64 KB is freely configurable: either 48 KB can be assigned to the L1 cache and 16 KB to the shared memory, or vice versa. In addition, the Fermi architecture has a global L2 cache laid out as a "unified cache"; it is 128 KB per memory controller, for a total of 768 KB on the GF100 (GT200: 256 KB). The unified design makes it possible to dispense with the L2 texture cache, the ROP cache and the on-chip FIFOs of previous architectures. The L2 cache handles all load, store and texture requests, and all units can now access it simultaneously.
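The cache budgets above can be cross-checked with a minimal sketch (values taken from the text):

```python
# Illustrative check of the per-cluster cache budget and the global
# L2 size described in the text.

total_per_cluster_kb = 76
l1_texture_kb = 12
configurable_kb = total_per_cluster_kb - l1_texture_kb  # 64

# the two allowed splits of the remaining 64 KB
configs = [
    {"l1_cache": 48, "shared_memory": 16},
    {"l1_cache": 16, "shared_memory": 48},
]
for cfg in configs:
    assert cfg["l1_cache"] + cfg["shared_memory"] == configurable_kb

# global L2: 128 KB per memory controller, six controllers on the GF100
l2_total_kb = 6 * 128
print(configurable_kb, l2_total_kb)  # 64 768
```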

Nvidia also reorganized the rendering pipeline in the Fermi architecture. The GPU first receives commands from the CPU over the so-called host interface. The "GigaThread Engine" then copies the data from system memory into the video memory and divides it into thread blocks. These are passed on by the "Graphics Processing Clusters", or their raster engines, to the shader clusters, which Nvidia now refers to as "streaming multiprocessors". Each block is then divided into warps of 32 threads each; a shader cluster can keep up to 48 warps in flight, from which work is forwarded to the stream processors.
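The division of a block into warps can be sketched as follows; the warp size of 32 and the limit of 48 warps per cluster come from the text, while the block sizes are made-up examples:

```python
# Illustrative sketch of dividing a thread block into warps
# (warp size and per-cluster warp limit as stated in the text).
import math

WARP_SIZE = 32
MAX_WARPS_PER_CLUSTER = 48

def warps_for_block(threads_in_block):
    # a partially filled last warp still occupies a full warp slot
    return math.ceil(threads_in_block / WARP_SIZE)

print(warps_for_block(256))  # 8 warps
print(warps_for_block(100))  # 4 warps (last warp only partially filled)
print(MAX_WARPS_PER_CLUSTER * WARP_SIZE)  # 1536 threads in flight at most
```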

GPUs

Naming

The GeForce 400 series uses the naming scheme first introduced with the desktop GeForce 200 series. All graphics chips are identified by a letter abbreviation classifying the performance sector, as well as a three-digit number that generally begins with a "4" (for GeForce 400). The last two digits serve for further differentiation within each performance sector.

Abbreviation:

  • GT or no prefix - low budget
  • GTS - Mainstream
  • GTX - high-end and performance

Owing to the general decline in market prices, as well as currency fluctuations, Nvidia's original classifications no longer apply in every case.

Model data

Notes:

  • The specified clock rates are those recommended or set by Nvidia. However, the final determination of clock rates lies in the hands of the respective graphics card manufacturers, so it is quite possible that graphics card models with different clock rates exist or will exist.
  • The stated date is that of the public presentation, not the date of market availability of the models.

Performance

The following theoretical performance data are given for each model:

Notes:

  • The performance figures given above for the computing power of the stream processors, the pixel fill rate, the texel fill rate and the memory bandwidth are theoretical maximum values. The overall performance of a graphics card depends, among other things, on how well the available resources can be exploited and utilized. In addition, there are other factors, not listed here, that affect performance.
  • The computing power of the stream processors is not directly comparable with that of the Radeon series, since it is based on a different architecture, which scales differently.
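How such theoretical maxima are derived can be sketched with a minimal calculation, using commonly cited GeForce GTX 480 reference values as an example (480 stream processors, 1401 MHz shader clock, 384-bit interface, GDDR5 at 924 MHz, i.e. 3696 MT/s effective; these specific figures are assumptions taken from reference specifications, not from the text above):

```python
# Illustrative derivation of theoretical maxima for the GTX 480
# (reference values assumed; actual cards may be clocked differently).

stream_processors = 480
shader_clock_mhz = 1401
flops_per_sp_per_clock = 2  # one fused multiply-add counts as two operations

gflops = stream_processors * shader_clock_mhz * flops_per_sp_per_clock / 1000
print(round(gflops, 2))  # 1344.96 single-precision GFLOPS

interface_bits = 384
effective_transfer_mts = 3696  # GDDR5: four transfers per memory clock
bandwidth_gbs = interface_bits / 8 * effective_transfer_mts / 1000
print(round(bandwidth_gbs, 1))  # 177.4 GB/s
```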

Power consumption data

The measured values listed in the table refer to the power consumption of the graphics card alone, for cards that comply with the Nvidia reference design. Measuring these values requires special equipment; depending on the measurement technique used and the given measurement conditions, including the program with which the 3D load is generated, the values may vary between different setups. Therefore, ranges of measured values are given here, each representing the lowest and highest values reported by different sources.

Far more common than measuring the consumption of the graphics card alone is determining the power consumption of a whole system. To this end, a reference system is assembled into which the various graphics cards are installed; the measurement is then taken at the wall outlet using an energy cost meter or a similar device. However, the significance of such measurements is limited: it is not clear how much of the consumption comes from the graphics card and how much is attributable to the rest of the PC system. The difference in consumption between idle and 3D load depends not only on the program with which the load was generated; the utilization and efficiency of the rest of the PC system, including the power supply, motherboard and processor, also affect the measured difference. Since the tested systems usually differ from one's own home PC, the values given there cannot be transferred to one's own system. Only data from otherwise identical systems are (conditionally) suitable for comparison with each other. Because of this dependence on the overall system, such measurement values are not listed in the table here. But since they can give a better picture of the practical power consumption of a concrete system with a certain graphics card, pages that undertake such measurements are listed under the external links.
