Interactive voice response

A speech dialog system, also called a voice portal or IVR (Interactive Voice Response), lets callers conduct partially or fully automated natural-language dialogues over the telephone or other acoustic media.

Example:

Caller: "What are the day's high and the current price of Siemens stock in Frankfurt?"

Response of the speech dialog system: "The day's high for Siemens in Frankfurt was xxx,yy euros, and Siemens currently stands at xxx,yy euros."

In practice, IVR is used as a term for any kind of telephone navigation, including touch-tone (multi-frequency, DTMF) menus ("For sales, please press 1; for service, please press 2; ...").
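
The touch-tone menu quoted above can be sketched as a simple lookup from the pressed key to a destination. This is a minimal illustration, not a description of any particular product: the names `MENU`, `build_prompt`, and `route` are hypothetical, and the two destinations are taken from the example announcement.

```python
# Hypothetical touch-tone (DTMF) menu; keys and targets mirror the example announcement.
MENU = {"1": "sales", "2": "service"}


def build_prompt(menu):
    """Build the spoken menu announcement from the key-to-target mapping."""
    return " ".join(f"For {target}, please press {key}." for key, target in menu.items())


def route(digit, menu=MENU):
    """Return the target for a pressed key, or None if the key is not assigned."""
    return menu.get(digit)
```

A caller pressing an unassigned key yields `None`, which a real system would typically answer by repeating the announcement.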

Basic structure

IVR systems consist of the following components:

  • Speech recognition (automatic speech recognition, ASR) with grammars and semantics for interpreting voice input (natural language understanding, NLU),
  • Speech synthesis (text-to-speech, TTS) for converting text into computer-generated speech output,
  • Dialogue flow interpreter (e.g. a VoiceXML browser) as the front end,
  • Business logic and databases as the back end, for integration into business processes,
  • Interfaces to the IP network, telephone network, DECT systems, or audio connections.

Figure 1: Architecture of IVR systems
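
How the listed components might work together in a single dialogue turn can be sketched as follows. This is an assumption-laden illustration: the class `IvrPipeline` and all four stub functions are hypothetical, and the stubs merely pass text through as bytes instead of performing real recognition or synthesis.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class IvrPipeline:
    """Hypothetical wiring of the components: ASR -> NLU -> back end -> TTS."""
    recognize: Callable[[bytes], str]            # ASR: audio -> text
    understand: Callable[[str], Dict[str, str]]  # NLU: text -> semantic frame
    backend: Callable[[Dict[str, str]], str]     # business logic: frame -> answer text
    synthesize: Callable[[str], bytes]           # TTS: answer text -> audio

    def handle_turn(self, audio_in: bytes) -> bytes:
        """Process one caller utterance and return the audio of the system's reply."""
        text = self.recognize(audio_in)
        frame = self.understand(text)
        answer = self.backend(frame)
        return self.synthesize(answer)


# Stub components (assumptions): "audio" is simply UTF-8 encoded text.
def fake_asr(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_nlu(text: str) -> Dict[str, str]:
    intent = "quote" if "price" in text.lower() else "unknown"
    return {"intent": intent}


def fake_backend(frame: Dict[str, str]) -> str:
    if frame["intent"] == "quote":
        return "Quote lookup requested."
    return "Sorry, I did not understand."


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = IvrPipeline(fake_asr, fake_nlu, fake_backend, fake_tts)
```

Keeping each stage behind a plain callable means that a stub can later be swapped for a real recognizer or synthesizer without touching the turn logic.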

Biometric methods for speaker authentication ("the voice as password") are available and have been certified as secure by the German Federal Office for Information Security (BSI).

Owing to advances in speech recognition in recent years, dialogues in whole sentences are now possible. Natural language (natural language understanding, NLU) demands intelligence on the part of the dialogue partner. To use NLU effectively, the artificial intelligence of the dialogue system must keep pace with the capabilities of the speech recognition stage. Now that the core technology is regarded as largely mature, new disciplines such as dialog design are moving into the focus of developers of spoken dialog systems.

Fields of application

IVR systems make it possible to use speech as a further input/output medium alongside keyboard, mouse, and monitor.

Technically, applications can be divided into

  • Pure voice services, which offer interaction by speech only, and
  • Multimodal applications, which combine voice interaction with other input/output media (e.g. a graphical interface).

In the following, the types of application are further divided by user group: commercial voice services (business-to-consumer, business-to-business), in-house voice services, and device-integrated voice services (hardware and software control, computer games).

Commercial voice services

As of 2009, purely voice-based commercial services still meet mostly with rejection among German consumers. Because users cannot be trained individually, do not know how the systems work, and may feel bothered by advertising played through the voice service, end users often take a negative attitude towards voice services. The following fields of application are examples from the commercial area:

  • Services for retail customers (business-to-consumer): information and advice by phone, such as timetables and flight schedules
  • Automatic ordering and reservation by phone, such as ticket hotlines, catalog orders, telephone banking
  • Auto attendant / call routing
  • Pre-qualification and authorization of callers, such as querying the customer ID or PIN
  • Intelligent call-center queues
  • Outage announcement management
  • Televoting and contests by phone
  • Services between business processes (business-to-business): no implemented solution is known.

Internal voice services (for staff)

Internally, speech processing is currently little used, although the potential is great: internal users can be trained in operating the voice service and work with it regularly. This leads to more efficient use and high user acceptance. Processing times for internal processes can be shortened considerably while, at the same time, error rates in data entry fall because there are fewer media breaks.

  • Receipt of goods
  • Quality inspection, load testing, final product acceptance
  • Stocktaking
  • Inspection of facilities
  • Process-oriented event reporting
  • Remote and on-site diagnostics

Device-integrated voice services

As of 2009, dialog systems built into devices are accepted only slightly better. However, high-quality speech recognition requires high computing power with a correspondingly high energy demand, so that satisfactorily working solutions are initially found only in the on-board systems of individual luxury cars, in computer games, or in special application software. Examples of device-integrated speech recognition are:

  • Hands-free devices in vehicles
  • Navigation systems in motor vehicles
  • Voice dialing on mobile phones by a person's name
  • Computer games: as of 2009, the first computer games exist that incorporate voice input and output into their user interface and game concept. Since computer games are already a major technology driver in graphics, they could perhaps play a similar role in speech technology in the future.
  • Closer cooperation between humans and machines, e.g. for the use of industrial robots in craft workshops, is a current research topic.

Advantages and limitations of interactive voice response systems

With speech, communication can be more direct and natural than with conventional graphical user interfaces:

Advantages of voice interaction:

  • Hands and eyes remain free (improved ergonomics and process time).
  • Speech is immediately available to people (no extensive training or long learning curve for operating a graphical interface).
  • The requirements for the terminal device are low (all it takes is a telephone or a headset with a good microphone).
  • The widespread availability of (mobile) phones opens up new degrees of freedom when interacting with software applications.
  • Modern speaker-independent recognition understands utterances by different people without training (multilingual applications are possible; even dialects are tolerated to a certain degree).
  • All information elements are directly accessible (no tedious navigation through hierarchical menus and long lists).
  • Within a specific context, complex sentences are understood and processed automatically (for example, reserving a company car by phone: "Hello, I would like a car for the route Stuttgart - Darmstadt on Thursday from 6 to 22 o'clock").
  • Visual tasks demand attention, whereas spoken dialogues can be conducted virtually "on the fly" alongside them.

This enormous flexibility of speech technology creates new potential for innovation, for example for integrated business processes and their coordination.
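
To illustrate what "understood within a specific context" can mean for the company-car request quoted in the list above, the following sketch extracts the relevant fields from one such sentence. It is only a stand-in: a regular expression replaces the grammars or statistical NLU a real system would use, and the names `RESERVATION_PATTERN` and `extract_reservation` as well as the slot names are hypothetical.

```python
import re

# Hypothetical pattern for one narrow context: "route A - B on DAY from H to H".
RESERVATION_PATTERN = re.compile(
    r"route\s+(?P<origin>\w+)\s*-\s*(?P<destination>\w+)\s+"
    r"on\s+(?P<day>\w+)\s+from\s+(?P<start>\d{1,2})\s+to\s+(?P<end>\d{1,2})",
    re.IGNORECASE,
)


def extract_reservation(utterance):
    """Return the reservation slots found in the utterance, or None if it does not match."""
    match = RESERVATION_PATTERN.search(utterance)
    if match is None:
        return None
    slots = match.groupdict()
    slots["start"] = int(slots["start"])  # hours as integers for later processing
    slots["end"] = int(slots["end"])
    return slots
```

An utterance outside this narrow context returns `None`, which reflects the point that such understanding only works within the context the system was built for.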

NLU is the most natural form of computer interaction, but its possibilities for presenting information are limited compared with visual media:

Limits of speech interaction:

  • No 100-percent recognition: very extensive vocabularies are problematic (more terms with similar pronunciation).
  • Perfect recognition is not to be expected in the foreseeable future either (variability of the human voice).
  • Recurring ambient noise can now be filtered out well by signal processing and software.
  • Filtering out human voices in the background, however, remains problematic.
  • Users first have to familiarize themselves with the navigation and functionality of a voice application. Solution: tiered application modes for beginners and advanced users, so that both can use the system efficiently.
  • With regular use, convincing processing times are achievable.
  • Humans can take in long lists well visually; acoustically, however, a long enumeration of many items in one piece is hard to follow.
  • For example, most Internet users start with simple search terms, check the results, and then refine the search. It usually takes two to three quick iterations to obtain the desired result set. With spoken results, this approach would be time-consuming and therefore impractical.
  • You have to know "the rules". The computer does not "understand"; it merely performs speech "recognition".
  • Today's speech recognition technologies match the spoken words against a list of expected utterances, which is limited in size to a few thousand entries. When developing a dialog system, assumptions must be made about what might be asked. Based on these, question-and-answer dialogues must be developed that lead the caller to a specific piece of information. Such a dialogue could look like this: "Are you looking for information about a company, a movie, traffic information, ...?" "Company." "What kind of company?" "Restaurant." "What kind of restaurant?" "Chinese." "In which street or neighborhood, or near which place?" Although this approach may work and may be useful to the caller, it is still far from the possibilities offered by free-text input to an Internet search engine.
  • New cultural technique: spoken interaction with computers is a new cultural technique. Users and developers will only over time converge on common, well-known dialogue concepts (building blocks).
  • One should therefore not be put off by poorly designed applications, but build and use economically viable solutions.
  • "Speech is the bicycle of user-interface design: it is great fun to use [...], but it can carry only a light load. Sober advocates know that it will be tough to replace the automobile: graphic user interfaces." (Ben Shneiderman, 1998)
  • Natural dialog systems: natural user interfaces are meant to let users (in particular, users without special training or experience) reach the desired information in the simplest possible way. Current IVR interfaces, however, usually require the user to be familiar with operating such a system. Moreover, the power of natural language is often left unused, because interpreting it is still extremely complex.
  • The naturalness (adaptation to the human user) of a dialogue system can be described by the following properties:
      • Adaptivity
      • Implicit confirmation
      • Clarifying questions and ambiguity resolution
      • Corrections
      • Over-answering
      • Interpretation of negations
      • Discourse and back-references
      • Interpretation of colloquial language
      • Style of formulation / language generation
      • Social behavior
      • Quality of speech recognition and synthesis
  • In addition to the end user, the developer must also be considered. As long as there are no easy-to-use tools for creating dialogue systems, the results will not be user-friendly either: "A comparison of the systems shows, however, that many of the properties of natural dialogue systems have not yet been implemented. This is mainly due to the lack of an all-encompassing dialogue modeling and implementation tool."
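
The question-and-answer dialogue described in the list above (company, restaurant, Chinese), in which each step accepts only a short list of expected utterances, can be sketched as a small state machine. This is a minimal illustration under stated assumptions: the state names, the `DIALOG` table, and the functions `step` and `run` are hypothetical, and exact string matching stands in for real speech recognition.

```python
# Hypothetical directed dialogue: each state has a prompt and a closed list of
# expected utterances, each leading to a follow-up state.
DIALOG = {
    "start": {
        "prompt": "Are you looking for information about a company, a movie, or traffic?",
        "expected": {"company": "company_type", "movie": "end", "traffic": "end"},
    },
    "company_type": {
        "prompt": "What kind of company?",
        "expected": {"restaurant": "restaurant_type"},
    },
    "restaurant_type": {
        "prompt": "What kind of restaurant?",
        "expected": {"chinese": "location"},
    },
    "location": {
        "prompt": "In which street or neighborhood, or near which place?",
        "expected": {},  # open question; a real system would need a place grammar here
    },
    "end": {"prompt": "Thank you.", "expected": {}},
}


def step(state, utterance):
    """Return the next state; stay in the current state if the utterance is not expected."""
    key = utterance.strip().lower().rstrip(".!?")
    return DIALOG[state]["expected"].get(key, state)


def run(utterances, state="start"):
    """Feed a sequence of caller utterances through the dialogue and list the visited states."""
    visited = [state]
    for utterance in utterances:
        state = step(state, utterance)
        visited.append(state)
    return visited
```

Anything outside the expected list leaves the caller in the same state, which is exactly the restriction compared with free-text search that the list above points out.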

Criteria for the use of speech dialog systems

The following criteria favor the use of speech technologies in business applications:

  • The employee ...
      • has little computer experience
      • has a reading or writing difficulty
      • speaks only foreign languages
  • The activity ...
      • calls for free hands and a free view
      • involves input that is easy to put into words
      • requires mobility
      • consists of frequently repeated tasks
  • The work environment ...
      • makes visual perception difficult
      • lacks space, leaving no room for a screen or keyboard
      • makes switching between the activity and the computer workstation unergonomic or time-consuming