VoiceXML Intro

VoiceXML is a language for creating voice-user interfaces, particularly for the telephone. It uses speech recognition and touchtone (DTMF keypad) for input, and pre-recorded audio and text-to-speech synthesis (TTS) for output. It is based on the World Wide Web Consortium's (W3C's) Extensible Markup Language (XML), and leverages the Web paradigm for application development and deployment. By having a common language, application developers, platform vendors, and tool providers can all benefit from code portability and reuse.

With VoiceXML, speech recognition application development is greatly simplified by using familiar Web infrastructure, including tools and Web servers. Instead of using a PC with a Web browser, any telephone can access VoiceXML applications via a VoiceXML "interpreter" (also known as a "browser") running on a telephony server. Whereas HTML is commonly used for creating graphical Web applications, VoiceXML can be used for voice-enabled Web applications.

There are two schools of thought regarding the use of VoiceXML:
1. As a way to voice-enable a Web site, or
2. As an open-architecture solution for building next-generation interactive voice response telephone services.

One popular type of application is the voice portal, a telephone service where callers dial a phone number to retrieve information such as stock quotes, sports scores, and weather reports. Voice portals have received considerable attention lately, and demonstrate the power of speech recognition-based telephone services. These, however, are certainly not the only applications for VoiceXML. Other application areas, including voice-enabled intranets and contact centers, notification services, and innovative telephony services, can all be built with VoiceXML.

By separating application logic (running on a standard Web server) from the voice dialogs (running on a telephony server), VoiceXML and the voice-enabled Web allow for a new business model for telephony applications known as the Voice Service Provider. This permits developers to build phone services without having to buy or run equipment.

Although VoiceXML was originally designed for building telephone services, other applications, such as speech-controlled home appliances, are starting to be developed.

VoiceXML Features

The rapid growth of the Web was due largely to its open architecture and high-level common interfaces to differing computing resources. HTML and HTTP hide much of the complexity of building interactive applications. Just as an HTML developer doesn't need to know how bits paint the screen of a Web user's PC, VoiceXML shields developers from many of the complexities of telephony platforms.

VoiceXML has features to control audio output; audio input; presentation logic and control flow; event handling; and basic telephony connections. These and other features are described as follows:

Dialogs: <menu>, <form>
Audio Output: <prompt>
  Speech synthesis controls (text-to-speech, or TTS): <emp>, <pros>, etc.
  Pre-recorded audio (files or streams): <audio>
Audio Input
  Speech recognition (ASR)
  Audio recording: <record>
  Touchtone (Dual-tone Multi-Frequency, or DTMF)
Presentation logic
  Control flow: <if>, <else>, etc.
  ECMAScript client-side scripting: <script>
  Server-side/dynamic content generation: <submit>
Event handling
  Bad input: <noinput>, <nomatch>
  Shorthand: <help>
  <catch>, <throw>
Basic Connection Control
  Call transfer and bridging: <transfer>
  Disconnect: <disconnect>

Beyond the scope of the language are application logic, state management, dialog generation and sequencing, database operations, and interfaces to legacy systems (e.g., "screen scraping"). These are handled by traditional Web application programming techniques.
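
To make the split between application logic on a Web server and voice dialogs on a telephony server more concrete, here is a minimal sketch of a server-side routine that generates a small voice-portal menu as a VoiceXML document. It is illustrative only: the function name build_portal_menu, the example.com URLs, the prompt wording, and the version="1.0" attribute are assumptions rather than anything the text above prescribes. The generated markup uses <menu>, <prompt>, <noinput>, and <nomatch> from the feature list, plus <choice>, <enumerate>, and <reprompt>, which VoiceXML also defines.

```python
import xml.etree.ElementTree as ET


def build_portal_menu(services):
    """Return a VoiceXML menu document as a string.

    `services` maps a spoken phrase to the URL of the next VoiceXML
    document (both are hypothetical values supplied by the caller).
    """
    # Root element; the version value is an assumption for this sketch.
    vxml = ET.Element("vxml", {"version": "1.0"})
    menu = ET.SubElement(vxml, "menu")

    # Audio output: a prompt that lists the available choices.
    prompt = ET.SubElement(menu, "prompt")
    prompt.text = "Welcome. Please say one of: "
    ET.SubElement(prompt, "enumerate")

    # One <choice> per service; the interpreter fetches `next` on a match.
    for phrase, url in services.items():
        choice = ET.SubElement(menu, "choice", {"next": url})
        choice.text = phrase

    # Event handling for silence and for unrecognized input.
    noinput = ET.SubElement(menu, "noinput")
    noinput.text = "I did not hear anything. "
    ET.SubElement(noinput, "reprompt")

    nomatch = ET.SubElement(menu, "nomatch")
    nomatch.text = "I did not understand. "
    ET.SubElement(nomatch, "reprompt")

    return ET.tostring(vxml, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical services for a small voice portal.
    print(build_portal_menu({
        "stock quotes": "http://www.example.com/stocks.vxml",
        "sports scores": "http://www.example.com/sports.vxml",
        "weather": "http://www.example.com/weather.vxml",
    }))
```

A Web application would return this string in response to the interpreter's HTTP request, in the same way it would return an HTML page to a graphical browser. The list of services could equally come from a database or a legacy system, which is the kind of work the language leaves to the Web server.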