S.M.A.R.T.

Self-Monitoring, Analysis and Reporting Technology (S.M.A.R.T. or SMART) is an industry standard built into computer hard drives. It enables continuous monitoring of important parameters and thereby the early detection of impending problems.

Overview

The monitored data is analyzed either when the PC starts, by a BIOS or other firmware configured to do so, or by special software that must be installed in addition to the operating system. For Windows 95b (OSR2), for example, Microsoft provides a driver that such software then addresses.

The program relies on limits that the hard drive manufacturers set for each parameter, for example for temperature. After a longer observation period, the software can also predict expected failures.

"Switching off" S.M.A.R.T., for instance in the BIOS settings, does not stop data collection; it only suppresses the alerts issued when thresholds are exceeded. The collected data is stored in a reserved area of the disk that programs cannot modify.

The monitoring does not slow the hard disk down, because it only logs events and does not intervene to correct them. Correction is already handled by the drive's internal mechanisms, for example in the event of shocks, and these existed before S.M.A.R.T. Everything else, such as operating time and temperature, is recorded by purpose-built sensors and S.M.A.R.T. functions. The parameters are divided into "online" parameters, which are recorded continuously, and those that are updated only during idle periods, when the drive is effectively "offline".

Informative value

S.M.A.R.T. is limited to the hard drive and says nothing about the overall reliability of the computer. There is no combined evaluation of the data from multiple hard drives. Moreover, the system is not standardized; it is left to the hard drive manufacturers to decide which parameters they monitor and within what limits. The accuracy of the monitoring is debated among users. Some temperature sensors, for example, are considered badly placed or too optimistic, because they report values clearly below room temperature when the system starts.

An independent Google study from 2006, which ran for nine months and covered all manufacturers and a total of 100,000 hard drives, produced the following results: when all relevant parameters are taken into account, 64 % of all failures can be predicted with S.M.A.R.T. This count ignores all other warning signs, such as those noticeable acoustically or as data errors. In the remaining third of all failures, the hard drive itself wrongly reports that it is free of problems.

According to the study, the load on a hard drive had a far smaller impact on its service life than previously assumed. Once a drive has survived its first year, its utilization level no longer matters until its regular replacement after four years. Only in the first year and after the fourth year did constant reading and writing double the failure rate.

History

In 1992, IBM recognized that the growing spread of personal computers also increased the reliance placed on them in business. Failures were increasingly becoming a financial problem, which IBM wanted to address with PFA (Predictive Failure Analysis). IBM hard disks with this system reported every parameter change to the computer, so that users could react in time by replacing the drive. A little later, Compaq introduced IntelliSafe. It filters out irrelevant changes and reports only the threatening changes and thresholds to the downstream software. Seagate, Quantum, and Conner were involved in the development and adapted it to their products; Compaq itself made no disks. Recognizing the potential and with an industry standard in mind, Compaq and Seagate in particular pushed for the system to be disclosed. Together with Conner, Quantum, Western Digital, and IBM, a fusion of the two approaches was then created under the name S.M.A.R.T.

Since 1996 and the introduction of the ATA-3 standard, or of SCSI-3 four years earlier, it has been part of the standard equipment of almost every hard drive.

The specification of the S.M.A.R.T. parameters, however, was removed before the ATA-3 standard was adopted (see links). Accordingly, neither the meaning of the stored values nor their scaling is defined (for the latter, see also Common parameters). Only their storage location is officially standardized. Strictly speaking, there is therefore no standardized way, even under ATA-7, to read the temperature of a disk, for example. Virtually all available disks, however, keep to the data format of the ATA-3 draft. For clarity, a reading program adds a name such as "Seek Error Rate" to each parameter ID. Over the years this has emerged as a reliable de facto standard.

Solid-state drives, by design, no longer need many of the previous checkpoints but require new ones. Coordination between the SSD controller manufacturers is still lacking here. As a result, some new parameter IDs have been added, but in some cases existing IDs were simply given a new meaning. This leads to all kinds of misinterpretations in S.M.A.R.T. programs that do not yet know the meaning of the values on the new drives. Most BIOS versions also include a brief analysis of important S.M.A.R.T. parameters, so warning messages may appear at power-on that wrongly report an SSD as defective. In this case it is recommended to switch off the S.M.A.R.T. self-test function in the BIOS and to check the drive manually with a current program in the operating system (see S.M.A.R.T. programs compared).

Differences by connection type

The implementation of the S.M.A.R.T. standard differs depending on how the hard drive is connected to the PC. There are two standards: ATA and SCSI. Both provide a HEALTH STATUS, through which the drive firmware indicates whether it classifies the drive as "okay" or "problematic". Both standards also support reading the temperature and several variants of self-tests and logs.

On ATA hard drives, numerous values and their limits can additionally be queried by suitable software. This lets the software or the user judge more precisely whether and why a failure is likely. However, these parameters are not precisely standardized and differ in scope and interpretation, even between models from the same manufacturer.

The commands and data formats for all these functions, however, are implemented completely differently for ATA and SCSI.

External hard drives differ from internal ones in their separate enclosure and their connection, for which there are again several common standards.

The USB connection essentially carries SCSI commands. Almost without exception, however, USB-connected hard drives are not SCSI disks but (S)ATA disks, so direct access to the S.M.A.R.T. functions is not possible. There are, however, USB-ATA bridges (adapters) that allow ATA commands to be tunneled through the USB connection. The standard drivers for external hard drives do not support this. Chip manufacturers such as Cypress, JMicron, and SunPlusIT use vendor-specific commands, which some programs have mastered (see the section S.M.A.R.T. programs compared). More recently there are also USB-SATA bridges that support the vendor-independent SAT standard.
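One program that handles these bridge types is smartmontools, where the bridge is selected through the `-d` option of `smartctl`. The following Python sketch only assembles such a command line; the dictionary keys and the `smartctl_command` helper are hypothetical names chosen for illustration, and which type a given enclosure needs has to be determined by trial or from its documentation.

```python
# Bridge families mentioned above, mapped to smartctl device types.
# The dictionary keys are hypothetical labels used only in this sketch.
DEVICE_TYPES = {
    "sat": "sat",             # vendor-independent SCSI/ATA Translation
    "cypress": "usbcypress",  # Cypress vendor-specific pass-through
    "jmicron": "usbjmicron",  # JMicron vendor-specific pass-through
    "sunplus": "usbsunplus",  # SunPlus vendor-specific pass-through
}


def smartctl_command(device: str, bridge: str) -> list[str]:
    """Build a smartctl call that prints all S.M.A.R.T. information."""
    try:
        dev_type = DEVICE_TYPES[bridge]
    except KeyError:
        raise ValueError(f"unknown bridge type: {bridge!r}") from None
    return ["smartctl", "-a", "-d", dev_type, device]


print(smartctl_command("/dev/sdb", "jmicron"))
```

The returned list can be handed to `subprocess.run` on a system where smartmontools is installed; the device path `/dev/sdb` is only an example.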

The FireWire connection, common especially on Apple computers, supports the transmission natively, but Mac OS X does not use it.

Drives connected via eSATA can be read just as easily as their internal SATA counterparts.

Serial ATA drives connected via Serial Attached SCSI (SAS) can be checked if the corresponding SAT commands are available.

For tape drives there are functions analogous to S.M.A.R.T., called TapeAlert. They are used to warn of worn tapes.

Evaluation

Common parameters

Each value is first stored as raw data. For easier interpretation it is then mapped onto a scale from 0 to 100, 200, or 255. The different scales allow a finer gradation where the manufacturer considers it useful. Starting at the scale maximum, the value approaches zero as failures accumulate or the drive ages. The critical limit (threshold), however, is often set well above zero.
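The relationship between the scaled value and its threshold can be sketched in a few lines of Python. This is a minimal illustration, not a vendor algorithm: the `Attribute` class, the `status` function, and the warning margin of 10 points are assumptions made for the example, as is the convention that a threshold of 0 never triggers.

```python
from dataclasses import dataclass


@dataclass
class Attribute:
    """One S.M.A.R.T. parameter as reported by a drive (illustrative names)."""
    name: str
    value: int      # scaled value, starts near the scale maximum
    threshold: int  # critical limit set by the manufacturer
    raw: int        # unscaled raw data


def status(attr: Attribute, margin: int = 10) -> str:
    """Classify a parameter by comparing its scaled value with the threshold.

    The scaled value falls as the drive ages or accumulates errors, so lower
    is worse. A threshold of 0 is assumed to mean "never triggers".
    `margin` is an arbitrary warning band chosen for this sketch.
    """
    if attr.threshold > 0 and attr.value <= attr.threshold:
        return "failed"
    if attr.threshold > 0 and attr.value <= attr.threshold + margin:
        return "warning"
    return "ok"


# Hypothetical readings: far from the limit, close to it, and at it.
for a in (Attribute("example A", 100, 5, 55),
          Attribute("example B", 12, 5, 0),
          Attribute("example C", 5, 5, 0)):
    print(a.name, status(a))
```

Note that the raw value plays no part in this comparison; only the scaled value is measured against the threshold.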

  • Unrecoverable errors while reading from the disk, which lead to a reread.
  • Indicates a positioning problem of the read-write unit.
  • For reasons the manufacturer has not explained, some brand-new Seagate drives show scale values far below 100 here.
  • Unrecoverable errors while reading from the disk, which lead to a reread.
  • Indicates a problem with the disk surface.
  • Some drives show very high raw values here that are not comparable even between models from the same manufacturer. On newer Seagate drives the value is erroneously identical to that of "Hardware ECC Recovered". Only the scale values are relevant for predicting failure.
  • Bit errors corrected while reading.
  • May point to a problem with the disk surface.
  • Because of the high data density of today's hard disks, error correction inevitably kicks in during reading. [citation needed] Even very high values here are no cause for alarm.
  • Samsung drives of the P80 series often erroneously show very low scale values here. Very high raw values are generally common and, because of the change from one technology to a newer one ("technology change"), are not comparable even between models from the same manufacturer. They rise with read operations, since error correction takes place only then. Only the scale values are relevant for predicting failure. Occasionally the value is also called "ECC On-the-fly".
  • Unrecoverable errors found during the routine check of the disk surface.
  • Indicates a problem with the disk surface.
  • Nonzero raw values increase the failure probability tenfold. Failure usually follows the first "Scan Error" within half a year.
  • General data throughput or efficiency of the disk.
  • Strongly suggests problems inside the drive that are slowing it down.
  • Average spin-up time in (milli)seconds.
  • Indicates problems with the motor or the platter bearings.
  • Brand-new Maxtor and Quantum drives often produced false alarms here during the first month.
  • Number of start/stop operations of the drive (including standby).
  • Points to wear, since this operation stresses hard drives the most.
  • Number of spare sectors in use.
  • Points to surface problems, because only then does a spare sector automatically replace a previously used one.
  • If this counter is nonzero, the failure probability is increased fivefold. Failure usually follows the first "Reallocation Event" within half a year.
  • Operating time in hours or seconds (including standby).
  • Indicates wear, but says nothing about the conditions of use during that time.
  • On some Maxtor models, for example the Maxtor DiamondMax 10 6L250S0, the unit is minutes.
  • Parking operations of the read-write unit on a plastic ramp beside the platters.
  • Mostly reported only for notebook drives. Indicates wear; around 300,000 operations are planned for, and the raw value gives the number so far.
  • The read-write unit is parked when the drive is switched off or after about 10 s of idle time. This produces a noise that some find irritating. If the notebook is bumped, however, nothing can strike the magnetic platters, and shock resistance is roughly tripled to around 1000 g. Switching on and off is also gentler, because the unit is not lowered, grinding, onto a special area of the platters (the "landing zone").
  • Temperature of the drive in °C.
  • Since some drives also store the maximum and minimum values, past overcooling or overheating during operation can be detected. In that case the raw value contains all three numbers one after another.
  • High temperatures (above 40 °C) only have an effect after three years. In that year they double the probability of failure; after that they lose their significance again. Averaged across all ages, temperatures below 25 °C are far more dangerous than those above 40 °C: 20 °C doubles and 15 °C triples the failure rate. Measurements went up to 52 °C. Some manufacturers use inaccurate or badly placed sensors.
  • Number of CRC errors that have occurred.
  • Possible causes are faulty cables, dirty contacts, overclocking, or a bad hard disk driver. The transfer is repeated at progressively lower speeds. If this also fails, access to the hard disk is blocked.
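How a temperature raw value can hold three numbers one after another is easiest to see in code. The Python sketch below assumes a 48-bit raw value split into three 16-bit words (current, minimum, maximum, lowest word first). This layout is an assumption for illustration only, since the packing is vendor-specific, and `unpack_temperature` is a hypothetical helper.

```python
def unpack_temperature(raw: int) -> tuple[int, int, int]:
    """Split a 48-bit raw value into (current, minimum, maximum) in °C.

    Assumed layout (vendor-specific in practice):
      bits  0-15: current temperature
      bits 16-31: lowest temperature recorded
      bits 32-47: highest temperature recorded
    """
    current = raw & 0xFFFF
    minimum = (raw >> 16) & 0xFFFF
    maximum = (raw >> 32) & 0xFFFF
    return current, minimum, maximum


# Build an example raw value: current 38, minimum 19, maximum 45.
raw = 38 | (19 << 16) | (45 << 32)
print(unpack_temperature(raw))  # prints (38, 19, 45)
```

A drive that stores only the current temperature would, under this assumption, simply report zero in the two upper words.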

There are many other parameters, some of them exclusive to individual manufacturers. Complete lists can be found in the literature section of the web links.

Example

The evaluation of important S.M.A.R.T. parameters is shown using the example of a 250 GB Hitachi hard drive connected via Serial ATA and read out with smartmontools.

Analysis: By its own assessment, this drive is perfectly fine. No threshold has even been approached. Only the 55 reallocated sectors are cause for concern according to the Google study, so this value should be kept in mind. If the "UDMA CRC Error Count" stops rising after the cable has been replaced, and the cooling is improved so that about 45 °C (Temperature) is not exceeded, the drive can in fact continue to be used without problems.
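A manual analysis of this kind can be approximated in a few lines of Python. The sketch assumes attribute data in the shape produced by `smartctl -A --json` from smartmontools 7 or later (an `ata_smart_attributes.table` list whose entries carry `name`, `value`, `thresh`, and `raw.value`). The sample values are invented for illustration and are not the output of the drive discussed here; `review` is a hypothetical helper, and the temperature raw value is assumed to be the plain reading in °C.

```python
import json

# Invented sample in the assumed smartctl JSON shape (not real drive output).
SAMPLE = """
{"ata_smart_attributes": {"table": [
  {"id": 5,   "name": "Reallocated_Sector_Ct", "value": 100, "thresh": 5,
   "raw": {"value": 55}},
  {"id": 194, "name": "Temperature_Celsius",   "value": 122, "thresh": 0,
   "raw": {"value": 45}},
  {"id": 199, "name": "UDMA_CRC_Error_Count",  "value": 200, "thresh": 0,
   "raw": {"value": 12}}
]}}
"""


def review(report: dict, max_temp_c: int = 45) -> list[str]:
    """Return notes for parameters that deserve attention."""
    notes = []
    for attr in report["ata_smart_attributes"]["table"]:
        name, raw = attr["name"], attr["raw"]["value"]
        # The drive's own verdict: scaled value at or below the threshold.
        if attr["thresh"] > 0 and attr["value"] <= attr["thresh"]:
            notes.append(f"{name}: scaled value at or below threshold")
        # Any reallocated sector is worth watching, whatever the threshold.
        if name == "Reallocated_Sector_Ct" and raw > 0:
            notes.append(f"{name}: {raw} reallocated sectors, keep watching")
        # Assumes the raw value is the plain temperature in °C.
        if name == "Temperature_Celsius" and raw > max_temp_c:
            notes.append(f"{name}: above {max_temp_c} °C, improve cooling")
    return notes


for note in review(json.loads(SAMPLE)):
    print(note)
```

Whether a CRC error count is still rising only shows when two reports taken at different times are compared, which this sketch leaves out.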

Self-test and error log

In addition to the continuous logging of the parameters above, there are further tests. Some manufacturers start them periodically when the drive is idle; others leave this to the user, who can run them with some of the programs on offer. What is actually tested is likewise determined by the vendor. The usual case is a short test that checks all parameters, followed by a spot check of the readability of the individual platters. The long version replaces the spot check with a complete check.

ATA-6 adds two more variants. One is recommended after the drive has been transported (called Conveyance, similar to the short test); the other allows freely selectable areas of the drive to be tested (Selective, similar to the long test).
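In smartmontools, these four variants correspond to arguments of the `-t` option of `smartctl`. The following Python sketch only builds the command line and does not start a test; `selftest_command` is a hypothetical helper, and the sector range in the usage line is an arbitrary example.

```python
from typing import Optional, Tuple


def selftest_command(device: str, kind: str,
                     span: Optional[Tuple[int, int]] = None) -> list[str]:
    """Build a smartctl call for a short, long, conveyance or selective test."""
    if kind in ("short", "long", "conveyance"):
        test = kind
    elif kind == "select":
        # A selective test needs the first and last sector (LBA) to check.
        if span is None:
            raise ValueError("selective test needs a (first_lba, last_lba) span")
        first, last = span
        if first < 0 or last < first:
            raise ValueError("invalid LBA span")
        test = f"select,{first}-{last}"
    else:
        raise ValueError(f"unknown self-test kind: {kind!r}")
    return ["smartctl", "-t", test, device]


print(selftest_command("/dev/sda", "select", (0, 99999)))
```

The drive runs the test in the background; with smartmontools, the results can later be listed with `smartctl -l selftest`.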

Since 1999 and the ATA-5 standard, errors that occur are not only reflected in the parameter values (yielding something like "error rate: high") but are also logged in detail. The log lists the errors, the time since the last power-on, and the five commands executed before each error. There is even a separate table for the results of the self-tests described above. In general, only recent accumulations of errors are considered a cause for concern.

If the hard disk supports firmware updates, rewriting the firmware clears the error log, regardless of which version is written. The parameter values are usually retained.

S.M.A.R.T. programs compared

The following table lists the known programs for reading S.M.A.R.T. data.

Reading hard drives on RAID controllers

  • Only the controller manufacturer has the information needed to read out the S.M.A.R.T. status of the drives in a RAID array, so it has to expose this through an API function in its driver. Not all manufacturers do so, and those that do often use vendor-specific functions and support only selected models. The table notes which manufacturers' functions each program knows.
  • Addressing the controller directly, without using the driver functions, succeeds more often but is potentially unstable and therefore acceptable only under DOS.
  • Even where a controller's specifications mention S.M.A.R.T. support, this is often only internal to the controller. The driver then passes no information on to programs, or in some cases only that of a single drive.
  • Hard drives in a so-called software RAID (that is, arrays managed by the operating system) and drives attached to RAID controllers as single drives rather than as part of an array can always be read. These cases are therefore not counted.
