Bulldozer (microarchitecture)

Bulldozer is a microarchitecture developed by AMD for x86 processors with the 64-bit extension, and the successor to AMD K10. The first processor models based on Bulldozer were introduced under the brand name AMD FX in October 2011. Its most important architectural feature is so-called "clustered multithreading" (CMT); some elements were also carried over from the AMD K10 architecture. The Bulldozer architecture, including its Piledriver refinement, is succeeded by the Steamroller architecture.

Architectural features

Bulldozer is a completely new development and represents AMD's largest microarchitecture change since the introduction of AMD64 in 2003. In contrast to AMD K10, Bulldozer is built from modules. A module has two 128-bit floating-point units (FPUs) that can be combined as needed into one 256-bit wide floating-point unit. Alongside the FPU, each module contains two integer clusters, each with two ALUs and two AGUs ("address generation units"). Each module also has an L2 cache that is shared by all units of the module. Operating systems recognize a module as two logical cores. A Bulldozer-based die houses up to four modules, so a maximum of eight threads can be processed simultaneously. The thread count per module corresponds to that of an Intel processor core with Hyper-Threading.
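The resource counts described above can be tallied in a small model. This is only an illustrative sketch: the `Module` class, its field names, and the `die_totals` helper are hypothetical, and the numbers are simply the ones given in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Module:
    """Hypothetical model of one Bulldozer module, using the counts from the text."""
    integer_clusters: int = 2   # one integer cluster per thread
    alus_per_cluster: int = 2
    agus_per_cluster: int = 2
    fpu_pipes_128bit: int = 2   # can be paired into one 256-bit unit

    @property
    def logical_cores(self) -> int:
        # The operating system sees each integer cluster as one logical core.
        return self.integer_clusters


def die_totals(modules: int, module: Module = Module()) -> dict:
    """Sum the per-module resources over all modules on one die."""
    clusters = modules * module.integer_clusters
    return {
        "threads": modules * module.logical_cores,
        "alus": clusters * module.alus_per_cluster,
        "agus": clusters * module.agus_per_cluster,
        "fpu_pipes_128bit": modules * module.fpu_pipes_128bit,
    }


# A fully equipped die with four modules.
print(die_totals(4))
```

For four modules the tally gives eight threads, sixteen ALUs, sixteen AGUs, and eight 128-bit floating-point pipelines, which matches the eight-thread maximum stated above.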

Module

The clustered integer-core architecture that AMD introduced with Bulldozer was originally developed by DEC and first introduced in 1996 with the Alpha 21264 RISC CPU.

The module is a compromise between a true dual core, where each thread has all the functional units of a processor core to itself, and a single core with SMT. The concept saves die area compared with an ordinary dual core. A module consists of several units, some present once and some duplicated, which also share certain resources. It has two integer units and one 256-bit floating-point unit, which can be split into two 128-bit FPUs if necessary. The fetch and decode units are also present only once and distribute the load to the respective units. A module has a 2 MB shared L2 cache, a 16 KB 4-way L1 data cache per integer cluster, and a 64 KB 2-way L1 instruction cache. The two independent integer clusters are each equipped with two ALUs and two AGUs, which allows a maximum of four arithmetic and memory operations per module and clock. Each module has two symmetrical 128-bit FMAC floating-point pipelines that can be combined into one 256-bit wide unit when needed and can thus be used for an FMA instruction. Unlike a separate multiply-add sequence, FMA rounds the result only once, after the complete calculation. All modules of a CPU share the L3 cache, if present, and the dual-channel memory interface.
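The difference between rounding once and rounding twice can be made visible with a small emulation. The sketch below does not execute a hardware FMA instruction; it assumes IEEE 754 double precision (as used by Python floats) and emulates the single rounding with exact rational arithmetic. The function names are chosen here for illustration only.

```python
from fractions import Fraction


def multiply_then_add(a: float, b: float, c: float) -> float:
    # Two roundings: the product is rounded to a double, then the sum is rounded.
    return a * b + c


def fused_multiply_add(a: float, b: float, c: float) -> float:
    # One rounding: a*b + c is computed exactly with rationals and rounded once.
    # This only emulates FMA arithmetic; it is not the hardware instruction.
    return float(Fraction(a) * Fraction(b) + Fraction(c))


a = 1.0 + 2.0**-27
b = 1.0 - 2.0**-27
c = -1.0

print(multiply_then_add(a, b, c))
print(fused_multiply_add(a, b, c))
```

The exact product here is 1 − 2^−54, which is not representable as a double and rounds to 1.0, so the two-step version returns 0.0. The fused version keeps the exact intermediate value and returns −2^−54.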

Instruction set extensions

With the Bulldozer microarchitecture, AMD supports various instruction set extensions such as Intel's AVX ("Advanced Vector Extensions"), SSE4.1, SSE4.2, AES, and CLMUL, as well as instructions developed by AMD itself (XOP, FMA4). 3DNow!, also developed by AMD, is dropped for the first time with this generation.
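One way to check for these extensions in software is to look at the feature flags reported by the operating system. The following is a minimal sketch that assumes the flag spelling used in Linux's /proc/cpuinfo; the helper names and the shortened sample text are made up for illustration and are not output from a real machine.

```python
# Flag names as spelled in Linux's /proc/cpuinfo (platform assumption).
BULLDOZER_FLAGS = {
    "AVX": "avx",
    "SSE4.1": "sse4_1",
    "SSE4.2": "sse4_2",
    "AES": "aes",
    "CLMUL": "pclmulqdq",
    "XOP": "xop",
    "FMA4": "fma4",
}


def parse_flags(cpuinfo_text: str) -> set:
    """Return the feature flags from the first 'flags' line of cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def supported_extensions(cpuinfo_text: str) -> dict:
    """Map each extension named in the text to whether its flag is present."""
    flags = parse_flags(cpuinfo_text)
    return {name: flag in flags for name, flag in BULLDOZER_FLAGS.items()}


# Made-up, shortened sample; on Linux one could pass open("/proc/cpuinfo").read().
sample = "processor : 0\nflags : fpu sse4_1 sse4_2 aes pclmulqdq avx xop fma4\n"
print(supported_extensions(sample))
print("3dnow" in parse_flags(sample))
```

In the sample, every listed extension is reported as present, while the `3dnow` flag is absent, mirroring the removal of 3DNow! described above.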

First implementations

Bulldozer-based microprocessors from AMD were initially introduced to the market in 2011 only in the "enthusiast" series (called AMD FX) and in the server segment (as AMD Opteron). For servers, AMD sells both CPUs with two dies under one integrated heat spreader (IHS), code-named Interlagos (up to 16 threads), for socket G34, and CPUs with one die under the IHS, code-named Valencia (4 to 8 threads), for socket C32. Unlike the consumer versions, these are designed as LGA CPUs. All Bulldozer-based CPUs to date, including the current Piledriver revision, are manufactured by Globalfoundries in a 32-nanometer SOI HKMG process. A module of the Orochi die, which forms the basis for the Zambezi (FX series) and Valencia (Opteron series) CPU types, comprises approximately 213 million transistors on an area of 30.9 mm².
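The thread counts and the module figures quoted above follow from simple arithmetic. The sketch below only restates the numbers from the text; the constant and function names are hypothetical.

```python
THREADS_PER_MODULE = 2      # a module appears as two logical cores
MAX_MODULES_PER_DIE = 4     # a die houses up to four modules


def max_threads(dies: int) -> int:
    """Upper bound on threads for a package with the given number of dies."""
    return dies * MAX_MODULES_PER_DIE * THREADS_PER_MODULE


# Figures for one module of the Orochi die, as given in the text.
MODULE_TRANSISTORS = 213e6
MODULE_AREA_MM2 = 30.9

density_per_mm2 = MODULE_TRANSISTORS / MODULE_AREA_MM2

print(max_threads(2))   # two dies under one heat spreader (Interlagos)
print(max_threads(1))   # one die (Valencia, Zambezi)
print(round(density_per_mm2 / 1e6, 2), "million transistors per mm^2")
```

Two dies give the 16-thread maximum quoted for Interlagos, and one die gives the 8-thread maximum quoted for Valencia; the module figures work out to roughly 6.9 million transistors per mm².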

Piledriver

Piledriver is the first revision of the Bulldozer microarchitecture. It was introduced in 2012 and was intended to find its way into all application areas: in the server segment, again as Opteron; in the APU segment, code-named Trinity; and as a replacement for the first generation of FX CPUs.

In addition to improvements in branch prediction and pipeline utilization, the following new features were introduced:

  • Support for FMA3, which Intel introduced only with the Haswell architecture
  • Instructions for bit (mask) manipulation: BMI1 (Intel-compatible) and TBM (AMD-specific)
  • Support for half-precision floating point: F16C
  • Revised and faster L2 cache
  • New clock mesh (only in the Trinity version)
  • Level 1 data TLB twice as large (64 instead of 32 entries)
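To give an idea of what the bit-manipulation instructions compute, the following sketch emulates a few BMI1 and TBM operations on 32-bit values in plain Python. It models only the result value, not flags or encodings, and the Python function names simply mirror the instruction mnemonics for readability.

```python
MASK = 0xFFFFFFFF  # model 32-bit registers


def andn(x: int, y: int) -> int:
    """BMI1 ANDN: bitwise AND of the inverted first operand with the second."""
    return ~x & y & MASK


def blsi(x: int) -> int:
    """BMI1 BLSI: isolate the lowest set bit."""
    return x & -x & MASK


def blsr(x: int) -> int:
    """BMI1 BLSR: reset (clear) the lowest set bit."""
    return x & (x - 1) & MASK


def blsmsk(x: int) -> int:
    """BMI1 BLSMSK: mask up to and including the lowest set bit."""
    return (x ^ (x - 1)) & MASK


def blsfill(x: int) -> int:
    """TBM BLSFILL: set all bits below the lowest set bit."""
    return (x | (x - 1)) & MASK


def tzmsk(x: int) -> int:
    """TBM TZMSK: mask covering the trailing zero bits."""
    return ~x & (x - 1) & MASK


x = 0b10110000
for op in (blsi, blsr, blsmsk, blsfill, tzmsk):
    print(f"{op.__name__:8s} {op(x):08b}")
```

Each of these replaces a short sequence of ordinary logic and arithmetic instructions with a single instruction, which is why they are useful in bit-mask-heavy code.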

In addition to Trinity, the Richland APU stepping presented in 2013 is also based on Piledriver.
