CUDA (formerly an acronym for Compute Unified Device Architecture) is a technology developed by Nvidia that allows programmers to write portions of a program that are executed by the graphics processing unit (GPU) on the graphics card. The GPU thereby provides additional processing capacity; for highly parallelizable program sections (high data parallelism), the GPU generally operates significantly faster than the CPU. CUDA is used primarily for scientific and technical computations.

Technical details

The GPU, previously used only for graphics calculations, is employed via the CUDA API as a coprocessor. Application examples include solving seismological or geological problems and simulating electromagnetic fields. CUDA is used, among others, in the SETI@home project as part of the Berkeley Open Infrastructure for Network Computing (BOINC). In general, it can be applied efficiently wherever algorithms can be highly parallelized.

CUDA can be used with graphics cards from the GeForce 8 series onward and with Quadro cards from the Quadro FX 5600 onward. Nvidia's Tesla cards are optimized for high-performance computing and are addressed mainly through CUDA, but they also support open standards such as OpenCL. Some of them even lack monitor connectors.

Since acquiring Ageia, Nvidia has continued to develop the PhysX technology and has rewritten it to use CUDA. PhysX is used in many recent games.

In October 2012, Nvidia released CUDA version 5.0.


Programmers currently use C for CUDA (C with Nvidia extensions). Wrappers also exist for the programming languages Perl, Python, Java, Fortran, and .NET, as well as bindings to MATLAB and Mathematica. Nvidia built CUDA with the optimizing C compiler Open64. Since the Fermi architecture, C++ can also be used.
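The C-for-CUDA extensions mentioned above can be illustrated with a minimal sketch (an illustrative vector-addition program; all names are chosen for the example):

```cuda
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

// Kernel: runs once per GPU thread ("__global__" is a CUDA extension to C).
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n)                                      // guard against overrun
        c[i] = a[i] + b[i];
}

int main(void) {
    const int n = 1024;
    size_t bytes = n * sizeof(float);
    float *h_a = malloc(bytes), *h_b = malloc(bytes), *h_c = malloc(bytes);
    for (int i = 0; i < n; ++i) { h_a[i] = (float)i; h_b[i] = 2.0f * i; }

    // Allocate device memory and copy the inputs across the bus.
    float *d_a, *d_b, *d_c;
    cudaMalloc(&d_a, bytes); cudaMalloc(&d_b, bytes); cudaMalloc(&d_c, bytes);
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    // Launch: the <<<blocks, threads>>> syntax is another Nvidia extension.
    vecAdd<<<(n + 255) / 256, 256>>>(d_a, d_b, d_c, n);

    cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);
    printf("c[10] = %f\n", h_c[10]);

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    free(h_a); free(h_b); free(h_c);
    return 0;
}
```

Each thread handles one element, which is exactly the kind of high data parallelism at which the GPU outperforms the CPU; compiled with nvcc, the host portion is ordinary C.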


Examples of other GPGPU solutions:

  • OpenCL is an open standard initiated by the Khronos Group that works with graphics cards from all vendors and is available for most operating systems.
  • DirectCompute is an interface for GPGPU integrated into the DirectX API.


One of the first programs to support CUDA was the Folding@home client, which multiplies the speed of biochemical calculations. It was followed on 17 December 2008 by the SETI@home client, which speeds up the search for extraterrestrial life by a factor of 10. Nvidia released the software "Badaboom", a video converter that can convert video up to 20 times faster than a CPU-based computation. Other programs that use CUDA include "TMPGEnc", Sorenson Squeeze 7, Adobe Photoshop from CS4 onward (where the application of filters is accelerated), Adobe Premiere Pro from CS5.5 onward, and Mathematica from version 8 onward.

Criticism and disadvantages

Graphics processing units (GPUs) are processors with an application-specific design. They therefore support exotic data types such as 9-bit or 12-bit fixed point, but often do without the register widths of 32, 48, 64, or 80 bits etc. that are customary in general-purpose CPUs and FPUs. As a result, calculations such as IEEE 754 double precision (64-bit) are often not provided in the GPU's instruction set and must be emulated by relatively expensive software. GPUs are therefore particularly suited to computing data types that work with relatively small bit widths.
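This limitation is visible in early CUDA hardware: on devices below compute capability 1.3 there is no native double-precision unit, and the nvcc compiler silently demotes double to float unless a suitable target architecture is requested. A sketch of where the difference shows up (illustrative fragment, not a complete program):

```cuda
// Intended compilation: nvcc -arch=sm_13 precision.cu
// Without a compute-capability-1.3+ target, nvcc demotes double to float,
// and sums like the one below lose their low-order bits.
__global__ void addSmall(double *out) {
    // 1.0 + 1e-9 is representable in IEEE 754 double precision,
    // but rounds to exactly 1.0 in single precision.
    *out = 1.0 + 1e-9;
}
```

Only with hardware that implements double precision natively (see the Fermi generation below compute capability terms) does such a kernel return the mathematically expected result without software emulation.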

To date (2010), the first manufacturers are already shipping advanced GPUs that, in addition to the data types needed for graphics, provide universal data types and operations, for example for directly computing IEEE 754-compliant results. As one of the leading manufacturers, Nvidia currently provides, with the Fermi generation, GPUs that natively support both 32-bit integers and single- and double-precision floating-point formats (float/double).

Another disadvantage is the connection to the computer architecture: in current GPUs it is usually made via PCIe and, compared with directly attached processors, entails higher latencies and lower I/O throughput. Offloading therefore pays off only for functions with sufficient computational effort, particularly where the GPU's instruction set is better suited to the task (e.g. for large matrices).
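The transfer overhead described above can be measured directly with CUDA events; the following sketch (illustrative buffer size, names chosen for the example) times a host-to-device copy. If a kernel's compute time does not clearly exceed this copy time, offloading does not pay off:

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

int main(void) {
    const size_t bytes = 64 << 20;          // 64 MiB test buffer
    float *h, *d;
    cudaMallocHost(&h, bytes);              // pinned host memory for fast DMA
    cudaMalloc(&d, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Time the PCIe transfer alone.
    cudaEventRecord(start);
    cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("host-to-device copy of 64 MiB: %.2f ms\n", ms);

    cudaFree(d);
    cudaFreeHost(h);
    return 0;
}
```

In practice this also motivates keeping intermediate results resident on the device across several kernel launches rather than copying them back and forth for each step.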

Furthermore, the tight binding to a single manufacturer is criticized. Using CUDA, in contrast to libraries for CPUs with MMX or SSE extensions (which run on practically all CPUs from the various manufacturers of x86 processors), ties a program to the GPU vendor Nvidia and thus to the presence of Nvidia hardware. OpenCL is universal: implementations exist for GPUs from Nvidia, AMD (formerly ATI), VIA, S3, and others. In addition, CPU support for x86 processors with the SSE3 extensions has been implemented, and IBM offers an OpenCL implementation for the Power architecture and the Cell Broadband Engine. OpenCL's broader approach, however, results in a noticeable performance disadvantage compared with CUDA on identical Nvidia hardware; depending on the problem, slowdowns of between 5 and 50% have been observed when using OpenCL.