VXML & MRCP Standards


Supporting emerging standards

As a leading company in the emerging voice biometrics market, PerSay believes in promoting and supporting standards that define how applications and voice platforms interact with voice biometrics systems. PerSay is an active member of the VoiceXML Forum and has drawn on its experience and professional knowledge to contribute to the upcoming VoiceXML 3.0 standard. The company is also active in the ISO SC 37 committee, which is defining an interchange format for voice biometrics. These emerging standards, along with other standards already supported by PerSay (e.g. MRCP), will ensure that any investment in voice biometrics technology, application development, and customer enrollment (including the voice samples) is future-proof.




VoiceXML:

VoiceXML (VXML) is the W3C's standard XML format for specifying interactive voice dialogues between a human and a computer. It allows voice applications to be developed and deployed in an analogous way to HTML for visual applications. Just as HTML documents are interpreted by a visual web browser, VoiceXML documents are interpreted by a voice browser or IVR.

VoiceXML has tags that instruct the voice browser to provide speech synthesis, automatic speech recognition, dialog management, and audio playback (Wikipedia).

The current version of VoiceXML, 2.1, has no standard tags for speaker verification; support is planned for the next version, VoiceXML 3.0. PerSay is an active member of the VoiceXML SIV (Speaker Identification and Verification) forum, which is defining the VoiceXML 3.0 SIV extension. The proposed standard extension for speaker verification was recently approved by the W3C, the World Wide Web standardization body.


Over the past two years, PerSay has gathered invaluable experience working with leading VoiceXML platform and solution providers. Customers using VoiceXML benefit from PerSay's experience and out-of-the-box VXML script samples to significantly reduce application development time.

PerSay has successfully deployed its VocalPassword system and integrated it with a large set of leading VoiceXML platforms. Because VocalPassword exposes a set of Web Service APIs, integration can be performed at the voice application level using one of two architectural approaches:

Application Server integration:

During the dialog, when speaker enrollment or verification is required, the audio is recorded by the voice browser and sent to the application server. The application server then calls VocalPassword's web service API and returns a new VoiceXML script to the voice browser, which continues the dialog according to the enrollment or verification results.

The advantage of this approach is that the integration can be done in any programming language as it is implemented on the server side. It also keeps the business logic on the server side.
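The server-side step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not PerSay's implementation: the `verify` callable stands in for the call to VocalPassword's web service API (whose actual operations and parameters are not described here), and the page names `main_menu.vxml` and `fallback.vxml` are hypothetical.

```python
from typing import Callable
from xml.sax.saxutils import escape

# Skeleton of the VoiceXML 2.1 page returned to the voice browser.
VXML_TEMPLATE = """<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <form>
    <block>
      <prompt>{prompt}</prompt>
      <goto next="{next_url}"/>
    </block>
  </form>
</vxml>
"""


def next_dialog(audio: bytes, speaker_id: str,
                verify: Callable[[str, bytes], bool]) -> str:
    """Return the VoiceXML document the voice browser should run next.

    `verify` is a hypothetical wrapper around the verification web
    service; it returns True when the recorded audio matches the speaker.
    """
    if verify(speaker_id, audio):
        prompt = "Thank you. Your identity has been verified."
        next_url = "main_menu.vxml"   # hypothetical page name
    else:
        prompt = "Sorry, we could not verify your identity."
        next_url = "fallback.vxml"    # hypothetical page name
    return VXML_TEMPLATE.format(prompt=escape(prompt), next_url=next_url)
```

Because the decision is taken on the server, the voice browser only ever sees ordinary VoiceXML pages, and the business logic stays out of the dialog scripts.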

VocalPassword VXML Integration option 1: Application Server Integration



Voice Browser integration:

During the dialog, when speaker enrollment or verification is required, the audio is recorded by the voice browser and sent directly to VocalPassword for processing using the VXML <data> tag.
The <data> tag performs the equivalent of an HTTP GET (or POST) request and fetches a block of XML data. When the request completes, the VoiceXML application can parse the results using JavaScript and the DOM.
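The parsing step can be illustrated outside a voice browser. The sketch below uses Python's `xml.dom.minidom`, which offers DOM methods such as `getElementsByTagName` similar to those a script inside a VoiceXML page would call. The response layout (`result`, `decision`, `score`) is purely hypothetical; the actual XML returned by VocalPassword is not described here.

```python
from typing import Tuple
from xml.dom.minidom import parseString

# Hypothetical XML block of the kind a <data> request might fetch.
SAMPLE_RESPONSE = (
    "<result><decision>Match</decision><score>0.87</score></result>"
)


def parse_verification(xml_text: str) -> Tuple[str, float]:
    """Extract the decision and score from the (assumed) response XML."""
    dom = parseString(xml_text)
    # Take the text node inside the first matching element.
    decision = dom.getElementsByTagName("decision")[0].firstChild.data
    score = float(dom.getElementsByTagName("score")[0].firstChild.data)
    return decision, score
```

In a real VoiceXML page the same lookups would be written in the page's script code against the variable named by the <data> tag.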
The advantages of this approach are:


VocalPassword VXML Integration option 2: Voice Browser Integration




MRCP:

MRCP is the Media Resource Control Protocol proposed to the IETF. It is a communication protocol that allows speech servers to provide various speech services (such as speaker verification and speech recognition) to their clients (Wikipedia).
MRCP, as its name implies, is a control protocol that uses requests and responses, much like HTTP. The protocol itself does not carry audio streams or other media data; these must be handled by another protocol such as RTP. MRCPv2 relies on SIP to establish and manage its sessions.
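To make the request/response structure concrete, the sketch below parses the first line of an MRCPv2 message. It assumes the start-line layouts defined in RFC 6787 (version, message length, then either a method name and request ID, or a request ID, status code and request state). Event messages are deliberately left out, and the class and function names are illustrative only.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Request:
    length: int
    method: str
    request_id: int


@dataclass
class Response:
    length: int
    request_id: int
    status_code: int
    request_state: str


def parse_start_line(line: str) -> Union[Request, Response]:
    """Classify an MRCPv2 start line as a request or a response."""
    parts = line.strip().split(" ")
    if len(parts) < 4 or parts[0] != "MRCP/2.0":
        raise ValueError("not an MRCPv2 start line")
    length = int(parts[1])
    if len(parts) == 4:
        # version, length, method-name, request-id
        return Request(length, parts[2], int(parts[3]))
    if len(parts) == 5 and parts[2].isdigit():
        # version, length, request-id, status-code, request-state
        return Response(length, int(parts[2]), int(parts[3]), parts[4])
    raise ValueError("events and other start lines are not handled here")
```

For example, `parse_start_line("MRCP/2.0 123 VERIFY 314162")` yields a `Request` for the `VERIFY` method; the length value in that string is arbitrary and not checked by this sketch.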

PerSay VocalPassword communicates with voice platforms supporting MRCP through the implementation of an MRCPv2 listener, which handles MRCP's speaker verification requests, forwards them to VocalPassword for processing and builds MRCP responses accordingly. Using this module, a voice platform can easily add speaker verification capabilities to its services and use VocalPassword the same way it uses other speech processing resources such as speech recognition or TTS engines.
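The listener's role (receive a request, forward it, build a response) can be sketched as follows. This is a heavily simplified illustration, not PerSay's module: `backend` is a hypothetical callable standing in for VocalPassword, every supported request is answered synchronously with `200 COMPLETE`, and SIP session setup, RTP audio and MRCP events are omitted. The method names and the `401` (method not allowed) status follow a reading of RFC 6787 and should be checked against the specification.

```python
from typing import Callable, Dict

# A subset of the speaker verification methods named in RFC 6787.
SUPPORTED_METHODS = {"START-SESSION", "VERIFY", "END-SESSION"}


def build_response(request_id: int, status: int, state: str,
                   channel: str) -> bytes:
    """Build a response whose length field counts the whole message."""
    prefix = b"MRCP/2.0 "
    rest = (f" {request_id} {status} {state}\r\n"
            f"Channel-Identifier: {channel}\r\n\r\n").encode("utf-8")
    base = len(prefix) + len(rest)
    length = base
    # The length field includes its own digits, so iterate until stable.
    while base + len(str(length)) != length:
        length = base + len(str(length))
    return prefix + str(length).encode("ascii") + rest


def handle_request(raw: bytes,
                   backend: Callable[[str, Dict[str, str]], None]) -> bytes:
    """Parse one request, forward it to the backend, return the response."""
    head = raw.split(b"\r\n\r\n", 1)[0].decode("utf-8")
    start_line, *header_lines = head.split("\r\n")
    _version, _length, method, request_id = start_line.split(" ")
    headers: Dict[str, str] = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    if method in SUPPORTED_METHODS:
        backend(method, headers)   # hypothetical hand-off to the engine
        status = 200
    else:
        status = 401               # method not allowed
    channel = headers.get("Channel-Identifier", "")
    return build_response(int(request_id), status, "COMPLETE", channel)
```

Keeping the backend behind a single callable mirrors the idea in the text: the voice platform talks plain MRCP, and the verification engine can be swapped or mocked without touching the protocol handling.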


VocalPassword MRCP Support





