Smartphone: Three Generations of IVR Systems

Thursday, June 28, 2018

Three Generations of IVR Systems

Recently I came up with a nice new concept for marketing people. Basically, there are three generations of IVR systems right now:

Generation 1.0 - Static systems based on VoiceXML. It was surprising to me that they are still in wide use and that a lot of products are dedicated to their optimization and development. There are IDEs, a lot of testing tools, and recommendations on how to build proper VoiceXML. Come on, it's impossible to do that. It's something like the static HTML websites that were popular in 1995. I don't believe any changes, like JavaScript inside VXML 3.0, will stop its slow death.
Generation 2.0 - Dynamic systems like Tropo from Voxeo. Much easier, much better. More control over content, more integration with the business logic. I really believe it's the next generation because it gives the developer much more control over the dialog. At least with the power of a real scripting language like Python you'll be able to implement something non-trivial in just several lines of code. That's the AJAX or RoR of the speech world.
Generation 3.0 - Semantic-based IVR. This consists of three components: a large vocabulary recognizer, a semantic recognizer on top of it, and event-based actions on top of that. Probably also emotion recognition and more intelligent dialog tracking. As I see it, the developer has to define the structure of the dialog and provide handlers. Such a system was described and developed at CMU a long time ago, and it's also described in all ASR textbooks. But I'm not aware of any widely known platform that allows this kind of IVR. Once again it shows how big the gap is between academia and software developers.
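
To make the generation 3.0 idea a bit more concrete, here is a rough Python sketch of "define the dialog structure and provide handlers". Everything in it is made up for illustration: the `RULES` table, `parse_semantics`, and the `Dialog` class are hypothetical names, not part of CMUSphinx or any other platform. The regular expressions only stand in for a real semantic recognizer, and the input text stands in for the output of a large vocabulary recognizer.

```python
import re

# Hypothetical semantic rules: intent name -> pattern over the recognizer's text.
# A real system would put a trained semantic recognizer here.
RULES = {
    "check_balance": re.compile(r"\b(balance|how much)\b"),
    "transfer": re.compile(r"\btransfer (?P<amount>\d+)\b"),
}


def parse_semantics(text):
    """Turn a recognized utterance into (intent, slots)."""
    lowered = text.lower()
    for intent, pattern in RULES.items():
        match = pattern.search(lowered)
        if match:
            return intent, match.groupdict()
    return "unknown", {}


class Dialog:
    """Event-based dispatch: one handler per semantic event."""

    def __init__(self):
        self.handlers = {}

    def on(self, intent):
        # Decorator that registers a handler for the given intent.
        def register(func):
            self.handlers[intent] = func
            return func
        return register

    def handle(self, text):
        intent, slots = parse_semantics(text)
        handler = self.handlers.get(intent) or self.handlers.get("unknown")
        if handler is None:
            return "Sorry, I did not understand."
        return handler(slots)


dialog = Dialog()


@dialog.on("check_balance")
def check_balance(slots):
    # Placeholder reply; a real handler would call the business logic.
    return "Your balance is 100 dollars."


@dialog.on("transfer")
def transfer(slots):
    return f"Transferring {slots['amount']} dollars."


@dialog.on("unknown")
def fallback(slots):
    return "Sorry, could you rephrase that?"
```

Calling `dialog.handle("Please transfer 50 to savings")` goes through the semantic step first and then fires the `transfer` handler with the `amount` slot filled in. The point is that the developer writes only the handlers and the dialog structure, while the recognition layers underneath stay generic.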

If you are planning to create an IVR application with CMUSphinx, please consider IVR generation 3.0 as your base technology ;) And don't forget to share the code.

Update:

Very much on the same topic, from the wonderful Nu Echo blog:

http://blog.nuecho.com/2010/01/25/voice-apis-back-to-basics/
