Interactive voice response

A speech dialog system (also called a voice portal or IVR, for Interactive Voice Response) lets callers conduct partially or fully automated natural-language dialogues over the telephone or other acoustic media.

Example:

Caller: "What are the day's high and the current price of Siemens stock in Frankfurt?"

Speech dialog system: "The day's high for Siemens in Frankfurt is xxx,yy €, and Siemens currently stands at xxx,yy euros."

In practice, IVR is used as a term for any type of telephone navigation, including touch-tone (DTMF) menus ("For sales, please press 1; for service, please press 2; ...").
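A touch-tone menu of this kind amounts to a lookup from the pressed key to a destination. The following is a minimal sketch, not taken from any particular IVR product; the names `MENU`, `route_keypress`, and `build_prompt` and the fallback value `repeat_menu` are assumptions made for illustration.

```python
# Hypothetical touch-tone (DTMF) menu: maps a pressed key to a destination.
MENU = {"1": "sales", "2": "service"}


def route_keypress(digit, menu=MENU):
    """Return the destination for a pressed key, or 'repeat_menu' if the key is unknown."""
    return menu.get(digit, "repeat_menu")


def build_prompt(menu=MENU):
    """Build the announcement that lists every key and its destination."""
    return " ".join(f"For {dest}, please press {key}." for key, dest in menu.items())


if __name__ == "__main__":
    print(build_prompt())
    print(route_keypress("2"))
```

Keeping the menu in one table means the spoken announcement and the routing cannot drift apart when an option is added.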

Basic structure

IVR systems consist of the following components:

Speech recognition (automatic speech recognition, ASR) with grammars / semantics for interpreting voice input (natural language understanding, NLU),
Speech synthesis (text-to-speech, TTS) for converting text into computer-generated voice output,
Dialogue flow interpreter (e.g. a VoiceXML browser) as the front end,
Business logic and databases for integration into business processes as the back end,
Interfaces to the IP network, the telephone network, DECT systems, or audio connections.

Figure 1: Architecture of IVR systems
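The interplay of the components listed above can be sketched as a single dialogue turn. This is only an illustration under strong simplifications: the "audio" is plain text, the NLU is a keyword check, the back end queries no real database, and all function and class names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Intent:
    """Result of the NLU step: an intent name plus extracted slots."""
    name: str
    slots: dict = field(default_factory=dict)


def recognize(audio):
    """ASR stand-in: the 'audio' is already text in this sketch."""
    return audio.lower().strip()


def understand(text):
    """NLU stand-in: a toy keyword grammar."""
    if "price" in text and "siemens" in text:
        return Intent("stock_quote", {"stock": "Siemens"})
    return Intent("unknown")


def backend(intent):
    """Business-logic stand-in: no real database is queried."""
    if intent.name == "stock_quote":
        return f"Quote for {intent.slots['stock']} requested."
    return "Sorry, I did not understand that."


def synthesize(text):
    """TTS stand-in: returns the text as bytes instead of audio."""
    return text.encode("utf-8")


def handle_turn(audio):
    """Dialogue interpreter: chains the components for a single turn."""
    return synthesize(backend(understand(recognize(audio))))


if __name__ == "__main__":
    print(handle_turn("What is the price of Siemens?"))
```

In a real system each stand-in would be replaced by the corresponding component, while the chaining in `handle_turn` stays the job of the dialogue interpreter.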

Biometric methods for speaker authentication ("voice password") are available and have been certified as secure by the German Federal Office for Information Security (BSI).

Thanks to advances in speech recognition in recent years, dialogues in whole sentences are now possible. Natural language understanding (NLU) demands intelligence from the dialogue partner: to use NLU effectively, the artificial intelligence of the dialogue system must keep pace with the capabilities of the speech recognition stage. Now that the core technology is regarded as largely mature, new disciplines such as dialog design are moving into the focus of developers of spoken dialog systems.

Fields of application

IVR systems make it possible to use speech as a further input/output medium alongside keyboard, mouse, and monitor.

Technically, the types of applications can be divided into:

Pure voice services: interaction takes place through speech only.
Multimodal applications: voice interaction is combined with other input/output media (e.g. a graphical interface).

In the following, the types of applications are further divided by user group: commercial voice services (business-to-consumer, business-to-business), in-house voice services, and device-integrated voice services (hardware and software control, computer games).

Commercial voice services

As of 2009, purely voice-based commercial services were still rejected by most German consumers. Because end users cannot be trained personally, do not know how the systems work, and may feel bothered by advertising played through the voice service, they often take a negative attitude towards voice services. The following fields of application are examples from the commercial sector:

Services to retail customers (business-to-consumer):
Information and advice by phone, such as timetables and flight schedules
Automatic ordering / reservation by phone, such as ticket hotlines, catalog orders, telephone banking
Auto attendant / call routing
Pre-qualification / authorization of callers, such as querying the customer ID or PIN
Intelligent waiting queues in call centers
Outage announcement management
Televoting and contests by phone

Services between businesses (business-to-business): (No implemented solution known.)

Internal voice services (for staff)

Within companies, speech processing is currently little used, although the potential is great: internal users can be trained in the operation and work with the voice service regularly. This leads to more efficient use and high user acceptance. Turnaround times of internal processes can be shortened while error rates in data entry drop sharply, because there are fewer media breaks.

Goods receipt
Quality testing, load testing, final product acceptance
Stocktaking
Inspection of facilities
Process-related event reporting
Remote and on-site diagnostics

Device-integrated voice services

As of 2009, dialog systems built into devices were only slightly better accepted. High-quality speech recognition requires considerable computing power and correspondingly high energy consumption, so satisfactorily functioning solutions were initially found only in the on-board systems of individual luxury cars, in computer games, or in specialized application software. Examples of device-integrated speech recognition are:

Hands-free devices in vehicles
Navigation systems in motor vehicles
Dialing on mobile phones by the person's name
Computer games: as of 2009, the first computer games existed that incorporated voice input and output into their user interface and game concept. Since computer games are already a major technology driver in graphics, they could perhaps play a similar role in speech technology in the future.

Closer cooperation between humans and machines, e.g. for the use of industrial robots in craft workshops, is a current research topic.

Advantages and limitations of interactive voice response systems

Compared with conventional graphical user interfaces, speech allows more direct and natural communication:

Advantages of voice interaction:
The hands and eyes remain free (improved ergonomics and process time).
Speech is immediately accessible to people (no extensive training or long learning curve as with operating a graphical interface).
The requirements for the terminal device are low (all that is needed is a phone or a headset with a good microphone).
The widespread availability of (mobile) phones opens up new degrees of freedom when interacting with software applications.
Modern speaker-independent recognition understands utterances from different people without training (multilingual applications are possible; even dialects are tolerated to a certain degree).
All information elements are directly accessible (no tedious navigation through hierarchical menus and long lists).
Within a specific context, complex sentences are understood and processed automatically (for example, when reserving a company car by phone: "Hello, I would like a car for the route Stuttgart - Darmstadt on Thursday from 6:00 to 22:00").
Visual tasks demand attention, whereas spoken dialogues can be carried out virtually "on the fly".

This enormous flexibility of speech technology creates new potential for innovation, for example for integrated business processes and their coordination.
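How a complex sentence such as the company-car reservation can be processed within a narrow context may be sketched with a single pattern that fills the slots of the request. This is an illustrative assumption, not how any specific NLU engine works; the pattern, the slot names, and the function `parse_reservation` are hypothetical, and a production system would use a proper grammar rather than one regular expression.

```python
import re

# Hypothetical pattern for one narrow context: reserving a company car.
PATTERN = re.compile(
    r"route\s+(?P<origin>\w+)\s*-\s*(?P<destination>\w+)\s+"
    r"on\s+(?P<day>\w+)\s+from\s+(?P<start>\d{1,2})\s+to\s+(?P<end>\d{1,2})",
    re.IGNORECASE,
)


def parse_reservation(utterance):
    """Return the extracted slots as a dict, or None if the utterance does not fit."""
    match = PATTERN.search(utterance)
    if match is None:
        return None
    slots = match.groupdict()
    # Hours are spoken as plain numbers in this sketch; convert them to integers.
    slots["start"] = int(slots["start"])
    slots["end"] = int(slots["end"])
    return slots


if __name__ == "__main__":
    print(parse_reservation(
        "Hello, I would like a car for the route Stuttgart - Darmstadt "
        "on Thursday from 6 to 22"
    ))
```

The sketch only works because the context is fixed in advance; an utterance outside the expected form yields `None` and would have to trigger a follow-up question.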

NLU is the most natural form of computer interaction, but the possibilities for presenting information are limited compared with visual media:

Limits of voice interaction:
No 100 percent recognition: very extensive vocabularies are problematic (more similarities in the pronunciation of different terms).
Perfect recognition is not expected in the foreseeable future either (variability of the human voice).

Recurring ambient noise can now be filtered out well by signal processing and software.
Filtering out human voices in the background, however, remains problematic.

The user first has to become familiar with the navigation and functionality of a voice application. Solution: tiered application modes for beginners and advanced users, so that both can use the system efficiently.
With regular use, convincing processing times are possible.
Human perception can take in long lists well visually; acoustically, however, a long enumeration of information in one piece is difficult to follow.
For example, most Internet users first enter simple search terms, check the results, and then refine the search. It usually takes two to three quick iterations to obtain the desired result set. With spoken results, this approach would be time-consuming and thus impractical.

You have to know "the rules". The computer does not "understand"; it merely performs speech "recognition".
Today's speech recognition technologies match the spoken words against a list of expected utterances, which is limited in size to a few thousand entries. When developing a dialog system, assumptions must be made about what might be asked. Based on these assumptions, question/answer dialogues must be designed that lead the caller to a specific piece of information. Such a dialogue could look like this: "Are you looking for information about a company, a movie, traffic information, ...?" "Company." "What kind of company?" "Restaurant." "What kind of restaurant?" "Chinese." "In which street or neighborhood, or near which location?" Although this approach may work and may be useful for the caller, it is still far from the possibilities offered by free-text input to a search engine on the Internet.
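A question/answer dialogue built on a limited list of expected utterances can be sketched as a small state table. The states, prompts, and function names below are hypothetical and heavily shortened; the point is only that anything outside the expected list leaves the caller in the same state.

```python
# Hypothetical dialogue table: state -> (prompt, {expected utterance: next state}).
DIALOG = {
    "start": (
        "Are you looking for a company, a movie, or traffic information?",
        {"company": "company_type", "movie": "done", "traffic information": "done"},
    ),
    "company_type": ("What kind of company?", {"restaurant": "restaurant_type"}),
    "restaurant_type": ("What kind of restaurant?", {"chinese": "done"}),
}


def step(state, utterance):
    """Return the next state; stay in the same state if the utterance is not expected."""
    _prompt, expected = DIALOG[state]
    return expected.get(utterance.strip().lower(), state)


def run(utterances, state="start"):
    """Feed a sequence of caller utterances through the dialogue and return the final state."""
    for utterance in utterances:
        if state == "done":
            break
        state = step(state, utterance)
    return state


if __name__ == "__main__":
    print(run(["company", "restaurant", "Chinese"]))
```

Every additional topic needs its own state and its own list of expected answers, which is why such dialogues fall short of free-text search.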

A new cultural technique:
Spoken interaction with computers is a new cultural technique. Only over time will users and developers settle on common, well-known dialogue concepts (building blocks).
One should therefore not be put off by poorly designed applications, but set up and use economically viable solutions.
"Speech is the bicycle of user-interface design: it is great fun to use [...], but it can carry only a light load. Sober advocates know that it will be tough to replace the automobile: graphic user interfaces." (Ben Shneiderman, 1998)

Natural dialog systems:
Natural user interfaces are designed to let the user reach the desired information in the simplest possible way (i.e., in particular without special training or experience). Current IVR interfaces, however, usually require the user to be familiar with operating such a system. Furthermore, the power of natural language is often left unused, because interpreting it is still extremely complex.
The naturalness (adaptation to the human user) of a dialogue system can be described by the following properties:
Adaptivity
Implicit confirmation
Clarification questions and ambiguity resolution
Corrections
Over-answering
Interpretation of negations
Discourse and back-references
Interpretation of colloquial language
Type of formulation / speech generation
Social behavior
Quality of speech recognition and synthesis
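One of these properties, implicit confirmation, can be illustrated with a short sketch: instead of asking "Did you say Stuttgart?", the system repeats the understood value inside its next question, so the caller can object if it is wrong. The function `next_prompt`, the slot names, and the wording of the prompts are assumptions made for illustration.

```python
def next_prompt(slots, required=("origin", "destination", "day")):
    """Ask for the next missing slot while implicitly confirming the last filled one."""
    filled = [name for name in required if name in slots]
    missing = [name for name in required if name not in slots]
    if not missing:
        return "Thank you, your request is complete."
    # Repeat the most recently filled value instead of asking a yes/no question.
    confirm = f"OK, {filled[-1]} {slots[filled[-1]]}. " if filled else ""
    return f"{confirm}What is the {missing[0]}?"


if __name__ == "__main__":
    print(next_prompt({"origin": "Stuttgart"}))
```

The design trade-off is speed against safety: implicit confirmation saves a dialogue turn, but it relies on the caller noticing and correcting a misrecognized value.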

In addition to the end user, the developer must also be considered. As long as there are no easy-to-use tools for creating dialogue systems, the results will not be user-friendly either: "A comparison of the systems, however, shows strikingly that many of the properties of natural dialogue systems have not yet been implemented. This is mainly due to the lack of an all-encompassing dialogue modeling and implementation tool."

Criteria for the use of speech dialog systems

The following criteria favor the use of speech technologies in business applications:

The employee ...
has little computer experience
has a reading or writing difficulty
speaks only foreign languages

The activity requires ...
free hands and an unobstructed view
input that is easy to put into words
mobility
frequently repeated tasks

The work environment results in ...
difficult visual perception
lack of space, no screen / keyboard
switching between the activity and the computer workstation that is unergonomic or time-consuming

