Tesla (microarchitecture)

Tesla is a family of highly parallel processors, also called stream processors, from the company Nvidia. The GPU-based processors can be programmed through Nvidia's in-house CUDA API as well as through OpenCL. The product competes directly with AMD's FireStream.

The first cards, based on the G80 GPU, were introduced in mid-2007. Tesla cards with the GT200 graphics chip followed a year later; that chip is also used in desktop graphics cards of the GeForce 200 series.

Under the code name "Fermi", Nvidia presented its next-generation graphics processor on 30 September 2009 at its own "GPU Technology Conference". The processor is used in products such as Tesla and Quadro cards and, in modified form (e.g. with reduced double-precision capability), in the GeForce 400 series. At the Supercomputing 09 trade fair, Nvidia announced Tesla cards based on the Fermi GPU for the second and third quarters of 2010.

Technology

G80

The G80 graphics processor was the first Nvidia processor based on the newly developed unified shader architecture. The G80 had been used on the GeForce 8800 GTX and GTS graphics cards since the end of 2006, and Nvidia presented the first Tesla models in mid-2007. These mainly use the A3 stepping of the G80, the same stepping used on the GeForce 8800 Ultra.

GT200

The GT200 was the second graphics processor that Nvidia used in the Tesla series. Unlike with the G80, Nvidia planned for use in the Tesla models from the beginning (hence the T in the identifier) and implemented double-precision capability through 30 additional MADD units conforming to the IEEE 754R specification, which would not have been necessary for GeForce graphics cards.

Fermi

The Fermi core is manufactured in a 40 nm process and has about three billion transistors. In contrast to its predecessor, the GT200, it is to a large extent a new design based on the unified shader architecture of the G80 graphics processor. Fermi is divided into 16 shader clusters, each with 32 stream processors, for a total of 512 stream processors. Each cluster has 16 "load/store" units as well as four separate "special function units" for functions such as sine and cosine. The Fermi core also provides six 64-bit memory controllers for GDDR5 memory, resulting in a 384-bit memory interface. This allows memory configurations of 1.5 GB, 3 GB and 6 GB. The memory controllers can now also handle ECC memory, which provides its own error correction.
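The totals quoted above follow from simple multiplication, which the following Python sketch reproduces. The constant names are illustrative only, and the per-controller memory figures rest on the assumption (not stated in the text) that memory is split evenly across the six controllers.

```python
# Figures quoted in the text for the Fermi core.
SHADER_CLUSTERS = 16
STREAM_PROCESSORS_PER_CLUSTER = 32
MEMORY_CONTROLLERS = 6
CONTROLLER_WIDTH_BITS = 64
MEMORY_SIZES_GB = [1.5, 3.0, 6.0]

# 16 clusters x 32 stream processors each.
total_stream_processors = SHADER_CLUSTERS * STREAM_PROCESSORS_PER_CLUSTER

# Six 64-bit controllers side by side form the full memory interface.
memory_interface_bits = MEMORY_CONTROLLERS * CONTROLLER_WIDTH_BITS

# Assumption: each configuration is spread evenly over all controllers.
memory_per_controller_mb = [
    size_gb * 1024 / MEMORY_CONTROLLERS for size_gb in MEMORY_SIZES_GB
]

print("stream processors:", total_stream_processors)
print("memory interface (bits):", memory_interface_bits)
print("MB per controller:", memory_per_controller_mb)
```

Under that even-split assumption, the three memory sizes correspond to 256 MB, 512 MB and 1024 MB per controller.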

Nvidia considers GPU computing increasingly important, which is why many architectural changes were made to the Fermi core to improve performance in this area. Fermi is thus the first graphics processor to offer complete support for C++, and it is fully compatible with the IEEE 754-2008 standard (previously IEEE 754-1985). The latter was necessary in order to improve double-precision capability by using FMA (fused multiply-add), which is more accurate than MAD. Each shader cluster of the Fermi core can execute 16 double-precision operations per clock cycle, so Fermi can execute a total of 256 double-precision operations per clock cycle, whereas only 30 were possible on the GT200. Also to improve GPU computing capability, the Fermi GPU has an L1 and an L2 cache in addition to shared memory.
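To illustrate why a fused multiply-add can be more accurate than a separate multiply and add, here is a minimal Python sketch. It is not Nvidia's implementation: the function names `mad` and `fma` are illustrative, the fused operation is emulated with exact rational arithmetic followed by a single rounding, and the unfused version assumes round-to-nearest after the multiply (real MAD hardware may round or truncate differently). The last lines repeat the per-clock arithmetic from the text.

```python
from fractions import Fraction


def mad(a, b, c):
    # Unfused: the product is rounded to double, then the sum is rounded again.
    return a * b + c


def fma(a, b, c):
    # Fused (emulated): compute a*b + c exactly, then round once to double.
    return float(Fraction(a) * Fraction(b) + Fraction(c))


# Inputs chosen so that the exact product, 1 - 2**-60, does not fit in a double.
a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

unfused = mad(a, b, c)  # the product rounds to 1.0, so the small term is lost
fused = fma(a, b, c)    # the small term survives the single rounding

print("multiply then add:", unfused)
print("fused multiply-add:", fused)

# Per-clock double-precision figures quoted in the text.
fermi_dp_per_clock = 16 * 16   # 16 clusters x 16 operations each
gt200_dp_per_clock = 30
print("Fermi / GT200 per clock:", fermi_dp_per_clock / gt200_dp_per_clock)
```

With these inputs the unfused result is exactly zero, while the fused result keeps the value -2**-60. The per-clock ratio between the two chips comes out at roughly 8.5.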

Processors

Model data
