Dragon NaturallySpeaking

Dragon NaturallySpeaking is speech recognition software that converts spoken words into text on the screen or into control commands for the computer. It is sold in various editions for home and business users, and independent vendors offer additional vocabularies. Independent vendors also offer supplementary programs with extended command sets for controlling the computer.

The program and its original manufacturer trace back to a prototype speech recognition system developed in the late 1970s and early 1980s by James and Janet Baker, who worked first at Carnegie Mellon University and later at an IBM research center. The Bakers founded Dragon Systems in May 1982. The precursor of Dragon NaturallySpeaking was DragonDictate, which was written for DOS and did not yet support continuous speech recognition. Dragon NaturallySpeaking 1.0 appeared in 1997. In 2000 the company was acquired by Lernout & Hauspie. After Lernout & Hauspie's insolvency, the American company ScanSoft acquired the rights; since 2005 ScanSoft has been known as Nuance Communications.

Operation

Dragon NaturallySpeaking is speech recognition software for the PC. It converts utterances spoken into a microphone connected to the computer into text or control commands. It is a speaker-dependent front-end system (i.e. it must be adapted to its user): the speech is converted into text on the user's own computer, and the text becomes visible immediately after the utterance ("what you say is what you see"). Compared with the speech recognition features of smartphones, where the acoustic data are sent over the Internet to central servers for conversion and the text is then sent back, this gives significant advantages in speed and accuracy and allows the software to adapt to the user's vocabulary and needs. Depending on the edition, Dragon NaturallySpeaking can also transcribe dictations recorded beforehand (with a voice recorder or a recording program).

To convert the acoustic signals, they are (put simply) digitally sampled and classified according to their characteristics, which allows an approximate assignment to sounds within the "acoustic model". This assignment is made stochastically, using various types of hidden Markov models. The acoustic model is adapted to the voice of the particular speaker in an initial training session and continually during use, in particular through the correction of recognition errors. From the "detected" sounds, the software then forms statistical hypotheses about the most likely words. When sounds or words sound similar or identical, the software decides on the basis of the surrounding multi-word phrase in the speaker's utterance which one will appear as text on the screen. This decision is based on a language model (linguistic model) that describes these probabilities. The general article on speech recognition covers the details. On current hardware the recognition process usually runs in the background so fast that the spoken text appears on the screen almost immediately after the utterance ends.
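
The step from classified acoustic features to the most likely sequence of sounds is classically done with the Viterbi algorithm over a hidden Markov model. The sketch below is a minimal, generic illustration, not Dragon's actual model: the two states "S" and "A", the feature classes "hiss" and "tone", and all probabilities are hypothetical toy values.

```python
import math

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state sequence and its log probability.

    All probabilities must be non-zero (math.log(0) would raise an error).
    """
    # best[t][s] = (log prob of the best path ending in state s at time t, previous state)
    best = [{s: (math.log(start_p[s]) + math.log(emit_p[s][observations[0]]), None)
             for s in states}]
    for t in range(1, len(observations)):
        column = {}
        for s in states:
            # Pick the predecessor state that maximises the path probability into s
            prev, score = max(
                ((p, best[t - 1][p][0] + math.log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            column[s] = (score + math.log(emit_p[s][observations[t]]), prev)
        best.append(column)
    # Backtrack from the best final state to recover the whole path
    last = max(states, key=lambda s: best[-1][s][0])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(best[t][path[-1]][1])
    path.reverse()
    return path, best[-1][last][0]

# Hypothetical toy model: two sound classes and two acoustic feature classes
states = ("S", "A")
start_p = {"S": 0.6, "A": 0.4}
trans_p = {"S": {"S": 0.7, "A": 0.3}, "A": {"S": 0.2, "A": 0.8}}
emit_p = {"S": {"hiss": 0.9, "tone": 0.1}, "A": {"hiss": 0.2, "tone": 0.8}}

path, logp = viterbi(["hiss", "hiss", "tone", "tone"], states, start_p, trans_p, emit_p)
print(path)  # ['S', 'S', 'A', 'A']
```

Working in log probabilities avoids numeric underflow when utterances get long, which is why the scores are summed rather than multiplied.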

As delivered, the software contains a standard language model for each input language, which the manufacturer built by analyzing the probabilities of word sequences in a very large text corpus. When the software is set up on the user's PC (i.e. when a user profile is created), this standard language model can be adapted to the user's writing style by analyzing existing texts by that user. This adaptation also continues during use (so-called model optimization). In particular, systematically correcting misrecognized words and word combinations with the appropriate program functions is important for the continuous improvement of the language model (and of the acoustic model as well). In version 11, the language model "BestMatch IV" uses correlations of up to four words, so-called quadgrams. In version 12, on sufficiently powerful PCs (multi-core processors and more than 2 GB of RAM), Dragon creates a user profile with the language model "BestMatch V", which can analyze five-word sequences.
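
An n-gram language model of the kind described above can be sketched by counting which word follows each (n-1)-word history in a corpus. The sketch below, assuming a toy three-sentence corpus and plain maximum-likelihood estimates, uses n=4 to mirror the quadgrams mentioned for version 11. Production models add smoothing and back-off for unseen histories, which this illustration omits; the function names are hypothetical.

```python
from collections import Counter, defaultdict

def train_ngram_model(sentences, n=4):
    """Count how often each word follows each (n-1)-word history."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        # Pad the start so the first words also have a full-length history
        tokens = ["<s>"] * (n - 1) + sentence.split()
        for i in range(n - 1, len(tokens)):
            history = tuple(tokens[i - n + 1:i])
            counts[history][tokens[i]] += 1
    return counts

def probability(counts, history, word):
    """Maximum-likelihood estimate of P(word | history); 0.0 for unseen histories."""
    followers = counts.get(tuple(history))
    if not followers:
        return 0.0
    return followers[word] / sum(followers.values())

def choose(counts, history, candidates):
    """Pick the candidate word the language model considers most likely."""
    return max(candidates, key=lambda w: probability(counts, history, w))

corpus = [
    "please write a letter to the bank",
    "please write a report",
    "turn right at the bank",
]
model = train_ngram_model(corpus, n=4)

# Acoustically identical candidates are resolved by the preceding words
print(choose(model, ["<s>", "<s>", "please"], ["right", "write"]))  # write
print(probability(model, ["please", "write", "a"], "letter"))        # 0.5
```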

The language model works exclusively with statistical methods, not with grammatical rules. Because of this, recognition accuracy is best when utterances are spoken as connected units, ideally whole, longer sentences. Accordingly, the software is geared toward recognizing well-structured language, as typically used when dictating letters, reports and other factual texts. It is not designed for transcribing recorded everyday speech with many broken-off sentences, omissions and filler words, and certainly not for directly transcribing conversations among multiple speakers.

The language model of Dragon NaturallySpeaking is based on a supplied vocabulary (word lexicon) which, as delivered, contains about 150,000 word forms (the active foreground vocabulary). Since the software does not use grammatical rules, the vocabulary stores not just word stems but every word form. The user can extend this vocabulary by roughly another 150,000 user-specific word forms, both by having the software analyze the user's own texts for unknown words and word forms and through the correction of recognition errors. To keep the response time acceptable, the vocabulary is divided into separate tiers: a foreground vocabulary and a background vocabulary (whose size is estimated at around 250,000 to 300,000 entries). Only the foreground vocabulary is kept in memory for active access; words from the background vocabulary are added to it once they have been used (and, in the process, misrecognized and corrected).
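
The foreground/background split can be illustrated with a small two-tier lexicon in which only the foreground set is searched and a correction promotes a word into it. This is a minimal sketch assuming the promotion rule stated above; the class and method names are hypothetical and do not reflect Dragon's internal data structures.

```python
class Vocabulary:
    """Hypothetical two-tier lexicon: a small active set and a larger passive set."""

    def __init__(self, foreground, background):
        self.foreground = set(foreground)  # kept in memory and searched during recognition
        self.background = set(background)  # stored, but not searched

    def candidates(self):
        """Only foreground words are considered by the recognizer."""
        return self.foreground

    def correct(self, word):
        """The user corrects a misrecognition to `word`: promote it to the foreground.

        Returns True if the word was already known (in either tier),
        False if it had to be added as a completely new entry.
        """
        known = word in self.foreground or word in self.background
        self.background.discard(word)
        self.foreground.add(word)
        return known

vocab = Vocabulary(foreground=["letter", "report"], background=["lexicon"])
print("lexicon" in vocab.candidates())  # False: background words are not searched
print(vocab.correct("lexicon"))         # True: it was known in the background tier
print("lexicon" in vocab.candidates())  # True: now in the active vocabulary
```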

The language model is tied to one specific language; it is not possible to dictate texts in different input languages using the same user profile. To dictate in another language, a separate user profile must be created and loaded. The German version of Dragon NaturallySpeaking allows user profiles to be created in German and English. The software is also available for Spanish, French, Italian, Dutch and Japanese, though not as individual language modules but as separate versions. Common foreign words are included in the supplied vocabulary. Other foreign words whose pronunciation does not follow the usual German phonetics can be added by the user and then reliably recognized by storing them in the lexicon together with a phonetic "spoken form" (example entries: written form "Breakage", spoken as "brehkitsch", or written form "CIA", spoken as "Iit ei ai").
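
Conceptually, a spoken-form entry lets the recognizer match a pronunciation and then output a different written form. The sketch below assumes the recognized pronunciation is available as a plain string, which is a strong simplification of real phoneme-level matching; the `UserLexicon` class is hypothetical.

```python
class UserLexicon:
    """Hypothetical lexicon in which each written form has a spoken form."""

    def __init__(self):
        self._by_spoken = {}  # spoken form (lower-cased) -> written form

    def add(self, written, spoken=None):
        # Without an explicit spoken form, assume the word is pronounced as written.
        key = (spoken or written).lower()
        self._by_spoken[key] = written

    def lookup(self, heard):
        """Return the written form for a recognized pronunciation, or None."""
        return self._by_spoken.get(heard.lower())

lexicon = UserLexicon()
lexicon.add("Breakage", "brehkitsch")  # foreign word with a German-style spoken form
lexicon.add("Bank")                    # ordinary word, spoken as written

print(lexicon.lookup("brehkitsch"))  # Breakage
print(lexicon.lookup("breakage"))    # None: the written form is not the pronunciation
```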

The name "NaturallySpeaking" derives from the continuous speech recognition feature. Unlike the speech recognition systems in use until the mid-1990s, and unlike its predecessor DragonDictate, the software does not require the speaker to pause unnaturally between individual words (discrete speech); the user can speak continuously. Using the methods described above, the software determines the (probable) word boundaries from the sound sequences by itself. Nevertheless, structured, clear (but not overly articulated) and fluent speech gives the best results; the manufacturer recommends taking newsreaders as a model.
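
Finding word boundaries in continuous speech can be framed as choosing the most probable segmentation of an unbroken stream into known words. The sketch below uses dynamic programming with hypothetical unigram probabilities and, as a simplifying assumption, lets letters stand in for sounds; a real recognizer works on acoustic hypotheses and a richer language model.

```python
import math
from functools import lru_cache

def segment(stream, word_probs):
    """Split an unbroken stream into the most probable sequence of known words.

    `stream` stands in for a continuous sound sequence; `word_probs` maps each
    vocabulary word to a (hypothetical) unigram probability.
    Returns (words, log_probability), or (None, -inf) if no split exists.
    """
    @lru_cache(maxsize=None)
    def best(start):
        # Best segmentation of stream[start:] as (tuple_of_words, log_prob)
        if start == len(stream):
            return (), 0.0
        result = (None, -math.inf)
        for end in range(start + 1, len(stream) + 1):
            word = stream[start:end]
            if word not in word_probs:
                continue
            rest, rest_score = best(end)
            if rest is None:
                continue  # the remainder cannot be split into known words
            score = math.log(word_probs[word]) + rest_score
            if score > result[1]:
                result = ((word,) + rest, score)
        return result

    words, score = best(0)
    return (list(words) if words is not None else None), score

probs = {"no": 0.2, "now": 0.1, "here": 0.15, "where": 0.05}
words, score = segment("nowhere", probs)
print(words)  # ['now', 'here']  (0.1 * 0.15 beats "no" + "where" at 0.2 * 0.05)
```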

System Requirements and Features

Dragon NaturallySpeaking runs under Windows XP (32-bit), Windows Vista, Windows 7, Windows Server 2008, Windows Server 2012 and Windows 8; 64-bit versions of Windows are supported since version 10.1. For Mac OS, Nuance sells software built on the same speech recognition core (version 12 of Dragon NaturallySpeaking) under the name Dragon Dictate (currently version 3.0). It should not be confused with the precursor of Dragon NaturallySpeaking mentioned above, and its correction and computer-control functions still lag behind those of Dragon NaturallySpeaking.

Since version 11, NaturallySpeaking exploits multi-core processors with a multi-pass technique: the same utterance is analyzed in parallel on two processor cores, each using different hidden Markov models, and the most likely result is selected, which increases reliability. To leave enough computing capacity for other tasks, in particular the target applications into which the user dictates, a modern quad-core processor is recommended. Processor speed, the amount of RAM and a sufficiently large level-2 or level-3 cache also considerably affect the speed of conversion. On a powerful current PC, the text usually appears immediately after an utterance has been spoken.
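
The multi-pass idea (decode the same utterance with several models at once and keep the best-scoring result) can be sketched with Python's standard `concurrent.futures`. The decoders below are hypothetical stand-ins returning fixed `(transcript, score)` pairs; how Dragon actually schedules and combines its passes is not described in the source.

```python
from concurrent.futures import ThreadPoolExecutor

def decode_multipass(utterance, decoders):
    """Run several decoders on the same utterance concurrently and keep the best.

    Each decoder is a callable returning (transcript, score), where a higher
    score means more likely. The decoders are hypothetical stand-ins for
    recognizers built on different hidden Markov models.
    """
    with ThreadPoolExecutor(max_workers=len(decoders)) as pool:
        results = list(pool.map(lambda decode: decode(utterance), decoders))
    return max(results, key=lambda result: result[1])

# Hypothetical decoders with fixed log-likelihood scores
def model_a(utterance):
    return ("recognize speech", -12.0)

def model_b(utterance):
    return ("wreck a nice beach", -15.5)

best = decode_multipass(b"...audio samples...", [model_a, model_b])
print(best)  # ('recognize speech', -12.0)
```

In CPython, threads do not run CPU-bound Python code truly in parallel because of the global interpreter lock; swapping in `ProcessPoolExecutor` would use separate cores, provided the decoders are picklable top-level functions.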

Although the program makes relatively high demands on memory and processor capacity, its user interface is an unobtrusive "Dragon Bar", which can also be hidden completely. For the Windows 8 Start screen, version 12.5 and later provide a "Dragon Audio Bar", which lets the user switch the microphone on and off outside the desktop environment. Since version 11, a sidebar can also be displayed that lists the control commands available in the current context. The concept is that the user dictates directly into target applications such as word processors, in which the spoken text appears without keyboard input. Compatible applications can likewise be controlled by voice commands (for example to save or print documents or apply formatting); these functions are valued not least by users with limited mobility. To communicate with applications, Dragon NaturallySpeaking uses the MSAA (Microsoft Active Accessibility) interface and the Microsoft Speech Application Programming Interface SAPI 4 (not its successor, version 5). The full command set for controlling applications is therefore available only in compliant applications such as Microsoft Word (version 2013 is supported only from NaturallySpeaking 12.5) or Internet Explorer, which the software calls "standard windows" or "windows with full text control" (in earlier versions also "Select-and-Say"). Other software, such as OpenOffice Writer, Mozilla Firefox or Mozilla Thunderbird, is partially supported. Browser-based cloud applications such as Outlook.com are only partially supported, and some, such as the Microsoft Office Web Apps, are not supported at all. Dragon NaturallySpeaking also includes its own simple word processor, "DragonPad", which is functionally similar to Microsoft WordPad, as well as a dictation box that can be used to transfer dictated text into non-compatible target applications.
Beyond compliant applications, Dragon NaturallySpeaking can also control the Windows user interface by voice commands (with restrictions on the Windows 8 Start screen).

Recognition accuracy

The software requires an initial speaker training of about five minutes, which can be skipped since version 9, and ideally an analysis of the speaker's own texts. With a well-trained profile, and depending on the quality of the hardware and the clarity of speech, the recognition rate currently exceeds 98 percent. Using a better microphone than the one supplied by the manufacturer can also improve recognition accuracy.
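
The source does not say how the manufacturer measures its recognition rate. A common metric is word error rate (WER), the word-level edit distance between a reference text and the recognized text divided by the reference length; word accuracy is then 1 - WER. The sketch below assumes that definition, so a rate above 98 percent would correspond to fewer than 2 errors per 100 reference words.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # dist[i][j] = edits needed to turn the first i reference words into the first j hypothesis words
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i
    for j in range(len(hyp) + 1):
        dist[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,                  # deletion
                             dist[i][j - 1] + 1,                  # insertion
                             dist[i - 1][j - 1] + substitution)   # match or substitution
    return dist[len(ref)][len(hyp)] / len(ref)

# One wrong word in a 50-word dictation gives 98 percent word accuracy
reference_words = [f"w{i}" for i in range(50)]
hypothesis_words = list(reference_words)
hypothesis_words[7] = "x"
wer = word_error_rate(" ".join(reference_words), " ".join(hypothesis_words))
print(round(1 - wer, 2))  # 0.98
```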

Traditionally, the more limited the vocabulary in use (as with doctors or lawyers), the better the recognition rate. Because of the increasing performance of the software and the hardware, the current versions practically no longer need special vocabularies for specific fields. However, it still holds that words which are not in the vocabulary cannot be recognized correctly.

One exception (in the German version) is the automatic generation of compound words. Typical components of compound words are additionally marked in the vocabulary with features that allow them to be joined with other words into compound words (where necessary with a linking -s) when the other word is dictated immediately before or after them. This function is also statistically controlled and therefore sometimes produces incorrect compound words (e.g. "Composed words").
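
A statistically controlled compounding step might look roughly like the sketch below: adjacent word pairs are merged, with a linking element where the head word needs one, only when a pair's probability exceeds a threshold. The probability table, threshold and linking-element table are all hypothetical; the source only states that the real function is statistical, which is also why it can produce wrong compounds.

```python
# Hypothetical probability that two adjacent words form a compound
COMPOUND_PROBS = {
    ("Arbeit", "zimmer"): 0.9,
    ("Haus", "tür"): 0.8,
    ("Haus", "platz"): 0.1,
}
# Hypothetical linking element after a head word ("s" = linking -s, "" = none)
LINKING = {"Arbeit": "s", "Haus": ""}

def join_compounds(words, threshold=0.5):
    """Merge adjacent words into a compound when the pair is likely enough."""
    out, i = [], 0
    while i < len(words):
        if i + 1 < len(words):
            head, tail = words[i], words[i + 1].lower()
            if COMPOUND_PROBS.get((head, tail), 0.0) > threshold:
                out.append(head + LINKING.get(head, "") + tail)
                i += 2
                continue
        out.append(words[i])
        i += 1
    return out

print(join_compounds(["das", "Arbeit", "Zimmer"]))  # ['das', 'Arbeitszimmer']
print(join_compounds(["Haus", "Platz"]))            # ['Haus', 'Platz']
```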

Such cases are among the few in which recognition errors are caught by the spell checker of the target application, unlike misrecognized but correctly spelled words, as in the (fictional) example "The trainees went into the void". Proofreading text dictated with speech recognition is therefore recommended, and the manufacturer's license agreement specifically advises it.

Versions
