IM – Voice Bridge

Generalized XMPP-VoIP Transducer (Speech-to-Text Pivot) Bridge = Instant Message-to-Voice (Text-to-Speech / Speech Synthesis) + Voice-to-Instant Message (Speech-to-Text / Speech Recognition)


A lot of the tawdriness associated with chatterbots results from the current text interface impasse.  A truly interactive voice interface would do wonders for the “face” of AI, and help it to break through the glass ceiling where it’s been stuck for some decades.  Clearly, Internet infrastructure is moving in the direction of the mobile platform, which is even more suitable for voice interaction than the desktop of yore.

An IM-Voice Bridge goes from text to speech as well as from speech to text.  It takes standard instant messages and lets you hear them.  It also lets you speak into an instant messenger, in other words voice to text.  A good application would include an animated avatar, or talking head, with lipsync.  A really good application would let you customize the avatar’s hair, skin, clothes, etc.  Most instant message applications are based on the XMPP protocol, formerly known as “Jabber”.  Recently, even the big holdouts like Skype and Facebook have come into line with the XMPP standard for instant messaging.  In fact, XMPP instant messaging is the primary transport mechanism for conversational AI engines; in other words, textual conversation travels across the Internet in the form of instant messages.
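
To make the two directions concrete, here is a minimal sketch in Python of what the core of such a bridge might look like.  It is only an illustration under stated assumptions: the `VoiceBridge` class, its method names, and the stub engines are hypothetical, the speech engines are passed in as plain functions rather than tied to any real TTS or STT product, and the chat stanzas are shown without the XML namespace that a live XMPP stream would carry.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class VoiceBridge:
    """Hypothetical two-way bridge; the speech engines are injected callables."""
    synthesize: Callable[[str], bytes]  # text-to-speech engine (assumed interface)
    recognize: Callable[[bytes], str]   # speech-to-text engine (assumed interface)

    def stanza_to_audio(self, stanza_xml: str) -> Optional[bytes]:
        """IM-to-voice: pull the <body> text out of a chat stanza and speak it."""
        body = ET.fromstring(stanza_xml).findtext("body")
        return self.synthesize(body) if body else None

    def audio_to_stanza(self, audio: bytes, to_jid: str) -> str:
        """Voice-to-IM: transcribe the audio and wrap the text in a chat stanza."""
        message = ET.Element("message", {"to": to_jid, "type": "chat"})
        ET.SubElement(message, "body").text = self.recognize(audio)
        return ET.tostring(message, encoding="unicode")


# Stub engines so the sketch runs without any speech software installed.
def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


bridge = VoiceBridge(synthesize=fake_tts, recognize=fake_stt)

incoming = '<message from="bot@example.org" type="chat"><body>Hello there</body></message>'
audio = bridge.stanza_to_audio(incoming)                              # what you would hear
outgoing = bridge.audio_to_stanza(b"How are you?", "bot@example.org")  # what you said, as IM
```

Keeping the engines as injected functions is a deliberate choice here: the same bridge logic could then sit in front of a desktop, browser, or mobile speech engine, and an avatar with lipsync would be just another consumer of the synthesized audio.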

Varying elements of this concept have been implemented in assistive technologies for the benefit of disabled people.  There have also been a number of similar products in the past, which for one reason or another became defunct.  In the past, speech-to-text reporters (STTR), or “captioners”, were people who manually transcribed speech to text for use by the disabled.  Assistive technologies include both Screen Readers and Voice Browsers.  More recently, there have even been moves toward “voice operating systems”.  For example, screen readers and voice browsers have been used to speak (hear) instant messages.
My primary interest is to be able to talk with AIs via any IM-Voice Bridge; however, such a system could equally be used by anyone for “hands free” communicating, while driving for example.  Instant messaging and SMS are indeed very close in compatibility.  Increasingly, there are mobile apps for hands-free SMS while driving.

If there were an existing application such as an IM-Voice Bridge, then I could easily interface with any AI engine via XMPP (formerly “Jabber”).  Ideally, an IM-Voice Bridge would include its own lipsynced animated avatar.  However, there still is no decent consumer IM-Voice Bridge application available, neither on the web nor on mobile (with or without an avatar).  I want a talking avatar app for instant messaging that can plug and play any XMPP IM account on the backend.  After investigating assistive technology such as screen readers and voice browsers, I’ve concluded they are not adequate.

Currently, there is no web-based voice-in / voice-out solution available for instant messaging.  I am mystified as to why there seem to be no plug and play animated web avatar front-ends for IM.  There ought to be a product that can speak instant messages on the fly (IM-to-Speech).  Speech-to-text (STT) (aka speech recognition) may currently be provided (1) on the desktop by Windows 7 speech tools (see Microsoft Speech API), (2) in the Chrome browser with Google Speech API, or (3) on the mobile platform with iPhone or Android speech tools.  I’ve searched hard for any Windows 7-compatible desktop avatar (talking head) frontend that can easily accept *ANY* XMPP IM account.  (Note, XMPP was formerly known as Jabber.)  Generally an avatar system will include text-to-speech (TTS) (aka speech synthesis) with lip-sync.  I have succeeded in uncovering two enterprise-level voice-IM gateways, by 4DK and Gold Systems.  And I’ve asked many in the industry for advice, including:



Cloud-based Telephony APIs

@kookoo_ozonetel | @Twilio | @Tropo | @Plivo


CPEeK-Up” .. telephony text <=> speech bridge to help speech-disabled people

Defunct .. IM Speak! .. “adds speech and personality to incoming messages from your favorite Instant Message program” .. 2004 text to speech (TTS) plugin for instant messaging (IM) .. by @rebootworks

Haptek KATE IM Avatar System .. “avatar which speaks the messages coming and going” .. requires

Desktop .. 2009 text to speech (TTS) virtual agent for instant messaging (IM) .. USD 15 .. Microsoft Agent

Enterprise Solutions

4DK’s ‘SCout’ enables a PoC mobile voice communication to be delivered as an instant message to GTalk (and back again)
Gold Systems Vonetix 7 Personal Attendant Demo” .. IM text to speech integration by @GoldSystemsInc (formerly

Google API (+ App Engine?)

Wondering how to bridge undocumented @googlechrome speech recognition with undocumented @googlevoice API .. in order to *speak* to chatbots

Mobile (iPhone & Android: Hands Free SMS) .. by @asiayeah .. “Speak instant messages (text-to-speech support)” [Palm OS platform]

Listen To Incoming SMS On Your Android Phone

Listen To Incoming SMS And Respond via Speech-To-Text [Android]

Open Source

Listen to Text and Instant Messages with Gespeaker ‘Text To Speech’ Utility .. “provides text-to-speech and speech-to-text services for Instant Messaging clients for the blind” (2004)
Ruby Voice Chat Program .. “to make our IM programs read the received messages aloud” .. by @unnitallman
How to Make Your Instant Messenger Talk in Ubuntu Linux

Screen Readers

Wikipedia has a helpful “Comparison of screen readers”, which includes a listing of “Contemporary screen readers”.  Screen readers are available for Windows, Mac, open source, and mobile platforms.  Apple iOS has a built-in screen reader accessibility feature, called “VoiceOver”.  (Apparently, Apple iChat does do IM to speech, but I don’t know about speech to IM.)  Generally speaking, assistive screen readers do not provide an ideal solution for people without special needs.  Perhaps so-called voice browsers are a step further in that direction.  Fire Vox is an open source extension for the Mozilla Firefox web browser that transforms it into a self-voicing application.  SkypeTalking is a Python program that reads incoming and outgoing chat messages using the Skype API and your screen reader.  WebAnywhere is a web-based screen reader for the web, and so requires no special software to be installed.
[Can a “Voice Browser”, such as, be used to hear and speak generic instant message (IM) sessions?]


SIMBA (SoftBridge for Instant Messaging Bridging Application) .. text-based IM client to communicate with an IP phone

SIMBA enables a text-based IM client to communicate with an IP phone, a telephone or a cellular phone. Similar to the Deaf Telephony SoftBridge, SIMBA provides a bridging service, which enables a Deaf user with a text-based IM client to communicate with a hearing user with a telephone or a cellular phone. Using an IM client, a Deaf user sends a text message to a telephone user through SIMBA. SIMBA establishes a call to the telephone user and converts text messages to speech via a Media Adapter Server (MAS). When the called user picks up the phone, he/she hears the synthetic voice and speaks to the Deaf user. After receiving audio from the hearing user, SIMBA then controls the MAS to convert the incoming audio stream to text and sends the text message to the Deaf user.
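
The call flow described above can be sketched as a small session object.  This is not SIMBA’s actual code or API; the `SimbaSession` class and its method names are hypothetical, the two injected functions merely stand in for the Media Adapter Server, and holding synthesized speech until the phone is picked up is my own assumption about how the “hears the synthetic voice on pickup” step might work.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class SimbaSession:
    """Hypothetical model of one IM-to-phone session, following the flow above."""
    text_to_speech: Callable[[str], bytes]  # stands in for the MAS, text-to-speech side
    speech_to_text: Callable[[bytes], str]  # stands in for the MAS, speech-to-text side
    call_placed: bool = False
    answered: bool = False
    pending_audio: List[bytes] = field(default_factory=list)  # speech held until pickup (assumption)
    played_audio: List[bytes] = field(default_factory=list)   # what the phone user has heard
    im_outbox: List[str] = field(default_factory=list)        # text delivered to the IM user

    def on_im_text(self, text: str) -> None:
        """The IM user sends text: place the call if needed, then synthesize speech."""
        if not self.call_placed:
            self.call_placed = True
        audio = self.text_to_speech(text)
        if self.answered:
            self.played_audio.append(audio)
        else:
            self.pending_audio.append(audio)

    def on_pickup(self) -> None:
        """The called user answers: play everything that was waiting."""
        self.answered = True
        self.played_audio.extend(self.pending_audio)
        self.pending_audio.clear()

    def on_phone_audio(self, audio: bytes) -> None:
        """The phone user speaks: convert the audio to text for the IM user."""
        self.im_outbox.append(self.speech_to_text(audio))


# Stub converters so the sketch runs without a real media server.
session = SimbaSession(
    text_to_speech=lambda text: text.encode("utf-8"),
    speech_to_text=lambda audio: audio.decode("utf-8"),
)
session.on_im_text("Hi, this is a text caller.")
session.on_pickup()
session.on_phone_audio(b"Hello, I can hear you.")
```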


VoiceXML to Instant Messaging Demo


See also:

IVR Agents | Jabber, XMPP & Chatbots