PerSay's superior core engine is the result of extensive research and development begun in the late 1990s at Comverse Technology. The algorithms originally developed at Comverse were designed to let law enforcement agencies intercept calls of known suspects by analyzing voices recorded over multiple telephone channels. Since then, PerSay's research team has invested significant time and effort in improving the baseline verification accuracy, adding new services and features, and adapting state-of-the-art classification algorithms to the challenges of real-life operational environments. The original text-independent algorithms were extended to handle calls with more than one speaker; text-dependent algorithms were added to allow speaker recognition with short pass phrases; and text-prompted algorithms now allow liveness detection with random challenge questions.
The capabilities provided by PerSay's core engine can be divided into basic functionality (required of any speaker recognition system) and advanced features (developed specifically to solve operational challenges).
Basic Functionality:
- Calibration: creating a background model, or a set of background models, using local audio
- Enrollment: creating voice templates using samples of the speaker's voice
- Verification: comparing a single test segment with a voice template and providing an accept/reject decision
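As a rough illustration of how these three steps fit together, the sketch below fits a background model (calibration), fits a speaker template (enrollment), and scores a test segment by comparing the two (verification). It is a generic textbook-style example, not PerSay's algorithm: the single diagonal Gaussian per model, the toy two-dimensional feature vectors, the log-likelihood-ratio score, the threshold of 0.0, and all function names are assumptions made for illustration.

```python
import math
from statistics import fmean, pvariance


def fit_gaussian(samples):
    """Fit one diagonal Gaussian (per-dimension mean and variance) to feature vectors."""
    dims = list(zip(*samples))
    means = [fmean(d) for d in dims]
    # Floor the variance so a constant dimension cannot cause division by zero.
    variances = [max(pvariance(d), 1e-6) for d in dims]
    return means, variances


def log_likelihood(model, vector):
    """Log-density of one feature vector under a diagonal Gaussian model."""
    means, variances = model
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
        for x, m, v in zip(vector, means, variances)
    )


def verify(template, background, segment, threshold=0.0):
    """Score a segment as the mean log-likelihood ratio; accept if above the threshold."""
    llr = fmean(
        log_likelihood(template, frame) - log_likelihood(background, frame)
        for frame in segment
    )
    return llr, llr > threshold


# Calibration: background model from "local audio" (toy feature vectors, hypothetical).
background = fit_gaussian(
    [[0.0, 0.0], [1.0, 1.0], [-1.0, -1.0], [0.5, -0.5], [-0.5, 0.5]]
)

# Enrollment: voice template from samples of the speaker's voice.
template = fit_gaussian([[2.0, 2.0], [2.2, 1.8], [1.8, 2.2], [2.1, 2.1]])

# Verification: one segment close to the enrolled speaker, one far from it.
score_same, accept_same = verify(template, background, [[2.0, 2.0], [2.1, 1.9]])
score_other, accept_other = verify(template, background, [[-1.0, -1.0], [0.0, 0.2]])

print(accept_same, accept_other)
```

In this toy setup the segment near the enrolled samples is accepted and the segment near the background data is rejected. A production engine would use far richer features and models; the sketch only shows the division of labor between the three steps.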
PerSay's core engine provides highly accurate text-dependent and text-independent speaker recognition, using statistical pattern recognition techniques and advanced classification methods. It is completely language- and accent-independent, with no limitations on lexicon or grammar.
The verification accuracy of both the text-dependent and text-independent algorithms has been confirmed on numerous databases, including those collected by PerSay, those recorded by customers, and databases in the public domain. Independent third-party tests and NIST Speaker Recognition Evaluations showed the superiority of PerSay algorithms over competing solutions, especially in scenarios that matched the operational conditions in which PerSay's systems are used.
Also, in 2005 and 2006, the National Centre for Biometric Studies at the University of Canberra performed an extensive scientific evaluation of selected text-dependent voice biometrics engines. When compared to ScanSoft and Nuance technology, PerSay delivered the highest levels of accuracy in all conditions, including mobile network environments and others with background noise.
For additional information on the evaluations, go to PerSay's Products Evaluation page.
Advanced Features:
- Multiple background models, with automatic selection of the best-matching model, allowing the system to work with multiple distinct call types in complex environments
- Automatic enrollment consistency tests, ensuring minimal failure-to-enroll rate
- Voice print adaptation, allowing accurate tracking of the natural variability in the speaker's voice
- Multiple verification configurations, allowing the system to combine scores from successive verification attempts, significantly reducing the false reject rate
- Group identification, allowing a one-to-many comparison between a test segment and a set of voice templates
- Detection and removal of telephony tones (DTMF, dial tone, fax handshake, etc.)
- Analysis of summed (2-wire) calls, allowing voice mining within a large set of recorded conversations, in which more than one speaker is talking
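Two of the features above, combining scores from successive verification attempts and one-to-many group identification, can be sketched in a few lines. This is a hypothetical illustration only: the source does not say how PerSay combines scores or ranks templates, so the simple averaging rule, the shared threshold, the score values, and the function and speaker names below are all assumptions.

```python
from statistics import fmean


def combine_attempts(scores, threshold):
    """Accept if the mean score over successive attempts clears the threshold.

    Averaging is an assumed fusion rule, chosen for simplicity.
    """
    combined = fmean(scores)
    return combined, combined >= threshold


def identify_in_group(scores_by_template, threshold):
    """One-to-many comparison: return the best-scoring template ID,
    or None if even the best score is below the threshold."""
    best_id = max(scores_by_template, key=scores_by_template.get)
    return best_id if scores_by_template[best_id] >= threshold else None


# A weak first attempt (0.4) would be rejected on its own at threshold 1.0;
# combined with a stronger second attempt, the speaker is accepted.
combined, accepted = combine_attempts([0.4, 1.9], threshold=1.0)

# Group identification over a small set of hypothetical voice templates.
match = identify_in_group({"alice": 0.3, "bob": 2.1, "carol": 1.2}, threshold=1.0)
no_match = identify_in_group({"alice": 0.3, "bob": 0.5}, threshold=1.0)

print(accepted, match, no_match)
```

The first call shows why combining attempts can lower the false reject rate: a genuine speaker with one poor recording is not rejected outright. The trade-off between false rejects and false accepts still depends on the threshold chosen.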