Intel Tera-Scale

The Terascale processor from Intel is a research project to develop a microprocessor with hundreds of cores. Such an architecture is referred to as "many-core", by analogy with multi-core architectures.

The Terascale processor is organized into tiles, most of which perform general computing tasks. The processor has 100 million transistors, and each tile contains about 1.2 million of them.
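The two transistor counts above imply a rough upper bound on the number of tiles. The calculation below is only a sketch: it assumes that every transistor belongs to a tile, which ignores any logic outside the tiles, and the variable names are illustrative.

```python
# Figures quoted in the text above.
TOTAL_TRANSISTORS = 100_000_000
TRANSISTORS_PER_TILE = 1_200_000

# Assumption: all transistors sit inside tiles, so this is an upper bound.
max_tiles = TOTAL_TRANSISTORS // TRANSISTORS_PER_TILE
print(max_tiles)  # 83
```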

Structure of the tiles

Each tile has a Processing Engine (PE) and a crossbar switch. The processing engine carries out the computing tasks with the help of two FMAC (floating-point multiply-accumulate) units and one floating point unit. In addition, the processing engine has 5 kB of local storage. The crossbar switch is used for communication with neighboring tiles.
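To illustrate what communication with neighboring tiles could look like, the sketch below lists the neighbors of a given tile. It assumes that the tiles are arranged in a rectangular grid and that each switch links only to the tiles directly north, south, west and east. The text does not state the layout, so the grid dimensions and the function name `neighbors` are hypothetical.

```python
def neighbors(row, col, rows, cols):
    """Return grid coordinates of the tiles adjacent to (row, col).

    Assumption: a rows x cols grid with links in four directions only.
    """
    candidates = [(row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1)]
    # Keep only coordinates that fall inside the grid.
    return [(r, c) for r, c in candidates if 0 <= r < rows and 0 <= c < cols]


# A corner tile has two neighbors, an interior tile has four.
print(neighbors(0, 0, 4, 4))
print(neighbors(1, 1, 4, 4))
```

Under this assumption a message to a distant tile would have to be forwarded from switch to switch, which is why the number of hops grows with the size of the grid.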

Some additional tiles are optimized for specific tasks such as the processing of high definition video, encryption, digital signal processing, physics acceleration or 3D computer graphics. Within their area of responsibility, these specialized tiles work more efficiently, that is, faster and with less energy, than non-specialized tiles.

Memory configuration

One problem with Terascale is that connecting the processor to memory is very difficult because of the high number of cores: the cores have to share the data connection, and their accesses to memory must be coordinated. Intel uses a hierarchical cache for this purpose. Each core gets its own L1 cache of 16 kB to 64 kB. The L2 cache of 256 kB to 1 MB is shared by a small group of cores. The L3 cache is available to all core groups within the processor.
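The lookup order in such a hierarchy can be sketched as follows. This is a simplified model, not Intel's design: caches are plain sets without capacity limits or eviction, a miss fills every level on the way back to the core, and the names `CacheLevel`, `build_hierarchy` and `load` are hypothetical.

```python
class CacheLevel:
    """A named set of cached addresses (no eviction, for illustration only)."""

    def __init__(self, name):
        self.name = name
        self.lines = set()


def build_hierarchy(num_cores, cores_per_group):
    """Private L1 per core, one L2 per group of cores, one L3 for the chip."""
    l1 = [CacheLevel(f"L1[core {c}]") for c in range(num_cores)]
    num_groups = (num_cores + cores_per_group - 1) // cores_per_group
    l2 = [CacheLevel(f"L2[group {g}]") for g in range(num_groups)]
    l3 = CacheLevel("L3")
    return l1, l2, l3


def load(core, address, l1, l2, l3, cores_per_group):
    """Return the name of the level that served the request."""
    path = [l1[core], l2[core // cores_per_group], l3]
    for i, level in enumerate(path):
        if address in level.lines:
            hit = level.name
            break
    else:
        # No cache level holds the address.
        hit = "memory"
        i = len(path)
    # Copy the line into every level closer to the core than the hit.
    for level in path[:i]:
        level.lines.add(address)
    return hit


l1, l2, l3 = build_hierarchy(num_cores=8, cores_per_group=4)
print(load(0, 0x40, l1, l2, l3, 4))  # memory: nothing is cached yet
print(load(0, 0x40, l1, l2, l3, 4))  # L1[core 0]
print(load(1, 0x40, l1, l2, l3, 4))  # L2[group 0]: same group as core 0
print(load(5, 0x40, l1, l2, l3, 4))  # L3: core 5 is in a different group
```

The last two calls show the point of the sharing: a core in the same group finds the data in the shared L2, while a core in another group still avoids a trip to memory thanks to the L3.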

In addition, Terascale uses an L4 cache made of DRAM, which is not located on the processor die but is manufactured on a die of its own. The L4 cache is then placed next to the processor or stacked with it in a multi-chip package (MCP). Programs are also given a QoS prioritization so that memory can be reserved for critical applications. How much memory an application may claim is determined dynamically by a resource monitor, and the operating system can move applications to the optimal cache units.
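The text does not describe how the resource monitor decides. As one possible illustration, the sketch below uses a strict-priority policy: requests are served from the highest priority downward until the capacity is used up. The policy, the function `allocate_cache` and the application names are all assumptions made for this example.

```python
def allocate_cache(capacity_kb, requests):
    """Grant cache space in priority order until the capacity runs out.

    requests: list of (app_name, priority, wanted_kb); higher priority first.
    Returns a dict mapping app_name to the granted amount in kB.
    """
    grants = {}
    remaining = capacity_kb
    # sorted() is stable, so equal priorities keep their original order.
    for name, priority, wanted_kb in sorted(requests, key=lambda r: -r[1]):
        granted = min(wanted_kb, remaining)
        grants[name] = granted
        remaining -= granted
    return grants


requests = [
    ("batch_job", 1, 600),
    ("video_decoder", 3, 512),   # treated as the critical application
    ("background_scan", 0, 300),
]
print(allocate_cache(1024, requests))
```

Here the critical application receives its full request, the next application gets what is left, and the lowest-priority one gets nothing. A real monitor would re-run such a decision as demand changes, which is the "dynamic" part mentioned above.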

Speed

The Terascale processor achieves more than one teraflops, a speed comparable to that of the ASCI Red supercomputer from 1996, which was made up of 10,000 Pentium Pro processors with a 200 MHz clock frequency and drew a total of 500 kilowatts of electrical power.
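The ASCI Red figures above can be turned into per-watt and per-processor numbers for comparison. The calculation is a rough sketch that assumes ASCI Red delivers exactly one teraflops; the text only says the speeds are comparable.

```python
# Figures quoted in the text above.
ASCI_RED_FLOPS = 1e12          # assumption: exactly one teraflops
ASCI_RED_POWER_W = 500_000     # 500 kilowatts
ASCI_RED_PROCESSORS = 10_000

flops_per_watt = ASCI_RED_FLOPS / ASCI_RED_POWER_W
flops_per_processor = ASCI_RED_FLOPS / ASCI_RED_PROCESSORS

print(f"{flops_per_watt / 1e6:.0f} MFLOPS per watt")            # 2 MFLOPS per watt
print(f"{flops_per_processor / 1e6:.0f} MFLOPS per processor")  # 100 MFLOPS per processor
```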
