Empowering People at the Bottom of the Pyramid with Voice-Enabled AI
About 3 billion people live at the bottom of the economic pyramid, and most of them are illiterate or semi-literate, which hinders their access to technology. Voice-enabled AI offers a way to bridge the literacy and language gap by letting semi-literate individuals interact with computers and smartphones through speech. This technology has the potential to empower marginalized populations by opening up education, entertainment, communication, banking, and more.
Presentation Transcript
Empowering the People at the Bottom of the Pyramid Using AI
Bridging the Literacy Divide and Language Divide with Voice-Enabled AI
Raj Reddy
Carnegie Mellon University, Pittsburgh, PA 15213
Oct 6, 2020
Talk at RAISE Conf, Delhi, India
Bottom of the Pyramid
The Bottom of the Pyramid, About 3 Billion People on the Planet, Is the Largest and Poorest Socio-economic Group
People with Incomes of Less than $2.5 a Day
About One Billion are Illiterate
Most of the Bottom of the Pyramid are Semi-Literate, i.e., They Cannot Read and Follow Directions, or Read a Newspaper Article and Give the Gist to Others
They Cannot Use Keyboard or Touch Based Computing Apps
If You Are a Semi-literate Person, the Only Acceptable Mode of Interaction with Humans and Computers is Speech
People at the Bottom of the Pyramid Have Not Benefited from the Information Revolution
Every Human on the Planet Can Hear and Speak
For Populations That Cannot Read or Write English, Computers and Smart Phones Capable of Speech-Based Interaction are Essential
The Recent Introduction of Low Cost Smart Phones and the Emergence of Siri and Alexa Type Voice Based Intelligent Assistants Promise to Make the People at the Bottom of the Pyramid First Class Citizens
India's Language Problem
People at the Bottom of the Pyramid: About 90% with Total Wealth of Less than 10 Lakhs
People with Incomes of Less than $5 (Rs 400) a Day
Intelligent Assistants Using Voice Computing a la Amazon Echo are Essential to Empower People at the Bottom of the Pyramid
Apps Must Enable the Semi-literate, With Speech-Only Interaction on Smart Phones (No Keyboard or Touch), to:
  Read Wikipedia
  Watch Foreign Language Movies
  Listen to Khan Academy Lectures
  Order Food Supplies Online
  Vote Online
Speech to Speech Indian Language Translation for the Semi-literate
An Intelligent Agent That Anticipates What You Want To Do And Helps You To Do It Using Local Language and Clarification Dialog
  Entertainment and Education: Streaming Video Translation
  Reading Wikipedia: Text to Speech Translation and/or Synthesis
  Buying and Selling: Voice Dialog Management
  Communication: Voice and/or Video Email, Chat
  Banking: Monitor Bank Account, Pay Bills
  Online Voting
All Such Apps Will Require Speech Recognition and Synthesis, Spoken Dialog, and Speech to Speech Translation (No Keyboard or Touch)
Microsoft Demonstrated Speech to Speech Translation in 2012
Semi-literate Populations Will be the Biggest Source of Customers for Speech Based Apps in the Future
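The chain of components this slide names (speech recognition, translation, speech synthesis) can be pictured as three stages composed in sequence. The sketch below is only an illustration under loud assumptions: `SpeechToSpeechPipeline` and the three `toy_*` functions are hypothetical names invented here, and the stages are trivial placeholders, not real recognizers, translators, or synthesizers, and not anything shown in the talk.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SpeechToSpeechPipeline:
    """Hypothetical wrapper that chains three stages, each a plain callable."""
    recognize: Callable[[bytes], str]    # speech recognition: audio -> source text
    translate: Callable[[str], str]      # translation: source text -> target text
    synthesize: Callable[[str], bytes]   # speech synthesis: target text -> audio

    def run(self, audio: bytes) -> bytes:
        source_text = self.recognize(audio)
        target_text = self.translate(source_text)
        return self.synthesize(target_text)


def toy_recognize(audio: bytes) -> str:
    # Placeholder: pretend the audio bytes are already UTF-8 text.
    return audio.decode("utf-8")


def toy_translate(text: str) -> str:
    # Placeholder: tag each word instead of really translating it.
    return " ".join(f"[te]{word}" for word in text.split())


def toy_synthesize(text: str) -> bytes:
    # Placeholder: pretend UTF-8 bytes are the synthesized audio.
    return text.encode("utf-8")


pipeline = SpeechToSpeechPipeline(toy_recognize, toy_translate, toy_synthesize)
output = pipeline.run(b"order milk and bread")
print(output)
```

Keeping each stage behind a plain callable means a real engine for a given language could be swapped in without touching the rest of the chain, which is one way to read the slide's point that every app shares the same underlying components.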
Existing Technology Can Create Compelling Apps to Empower the Semi-literate Population of the World
Speech to Speech Exists (Microsoft, Facebook and Others)
BTW, Current Implementations are Based on Incorrect Business Assumptions
  Available Only for Commercial Languages
  English to Chinese Speech to Speech Translation Demonstrated in 2012
  Text Based Translate App of 2016 Has to Become Speech Based
  Languages Supported Based on Commercial Viability
  Move to Need Based
  Unlikely to Result in Killer Apps (10 Million Users?)
Apps Tailored to Semi-literate Populations Will Become Killer Apps
  1 Minute Learning Time; Two Clicks; and Spoken Dialog
  No Keyboard or Touch
All Such Apps Will Require Speech Recognition, Speech Synthesis, Spoken Dialog, and Speech to Speech Translation
  Speech to Speech Translation
    Entertainment (Movies) and Education (Khan Academy)
    Translate Live Dialog
  QA Dialog (Siri and Cortana)
  eCommerce and eBanking
  English Language Learning - Detect Pronunciation Errors
A Typical App for the Semi-literate
An Intelligent Agent That Anticipates What You Want To Do And Helps You To Do It Using Local Language and Clarification Dialog
Entertainment and Education: Streaming Video Translation
  Play Hamlet (BBC Shakespeare)
Reading Newspapers: Text to Speech Translation and/or Synthesis
  Read Eenadu
Buying and Selling: Voice Dialog Management
  Order milk and bread
Banking: Monitor Bank Account, Pay Bills
  Charge my mobile device with 1000 rupees
Communication: Voice and/or Video Email, Chat
  Call my Grandson in Seattle
Online Voting
  Voice Dialog to Enable the Voting Process
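The example requests on this slide each belong to one task category, so a first step for such an app is deciding which category a recognized utterance falls into. The sketch below uses a deliberately naive keyword match; the `INTENT_KEYWORDS` table, the `route` function, and the category labels are hypothetical names chosen for illustration, and a real assistant would need far more robust language understanding than this.

```python
# Hypothetical keyword table: category label -> trigger words.
INTENT_KEYWORDS = {
    "entertainment_education": ("play", "watch"),
    "reading": ("read",),
    "buying_selling": ("order", "buy", "sell"),
    "banking": ("charge", "pay", "balance"),
    "communication": ("call", "email", "chat"),
    "voting": ("vote",),
}


def route(utterance: str) -> str:
    """Return the first category whose keyword appears in the utterance.

    Falls back to "clarify" so the app can open a clarification dialog
    instead of guessing.
    """
    words = [word.strip(".,?!").lower() for word in utterance.split()]
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(word in keywords for word in words):
            return intent
    return "clarify"


for request in (
    "Play Hamlet",
    "Read Eenadu",
    "Order milk and bread",
    "Charge my mobile device with 1000 rupees",
    "Call my Grandson in Seattle",
):
    print(request, "->", route(request))
```

Returning "clarify" for anything unmatched mirrors the slide's emphasis on clarification dialog: when the agent cannot anticipate the task, it asks rather than acts.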
KaaS That We Don't Yet Know How to Deliver
Flash Forward Video
Architecture of Alexa-like KaaS Voice Computing App
Alexa is a Mobile App Customized for Each Person
Designed to be Non-intrusive, Autonomic, and Device Independent
  Always On, Always Present and Always Working
  Always Learning
  Enduring (Life-Long)
Alexa Monitors, Analyzes and Learns From Experience
  Learn From Own Experience and Experience of Others
  And Share Knowledge with a Community of Alexas
Automated Discovery of Data and Information Sources
Sharing Data Among Alexa Apps: Data, suitably anonymized, can be used to learn appropriate responses for every situation
  Learning preferences by observing user choices,
  Learning by task similarity and user similarity,
  Learning by error correction, and
  Simply learning through clarification dialog (Does that mean yes? Would you care to define it?)
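Learning through clarification dialog, the last item on this slide, can be illustrated with a very small sketch: when the assistant meets a reply it does not understand, it asks once ("Does that mean yes?") and remembers the answer. The class `ClarifyingAssistant`, its method names, and the scripted user below are all hypothetical, introduced only for this illustration; the talk does not describe an implementation.

```python
from typing import Callable, Dict, List


class ClarifyingAssistant:
    """Hypothetical assistant that learns what unfamiliar replies mean."""

    def __init__(self, ask_user: Callable[[str], bool]) -> None:
        self.ask_user = ask_user
        # Replies the assistant already understands.
        self.meanings: Dict[str, bool] = {"yes": True, "no": False}

    def is_affirmative(self, reply: str) -> bool:
        key = reply.strip().lower()
        if key not in self.meanings:
            # Unknown reply: ask a clarification question and remember the answer.
            self.meanings[key] = self.ask_user(f'Does "{reply}" mean yes?')
        return self.meanings[key]


questions: List[str] = []


def scripted_user(question: str) -> bool:
    # Stand-in for a real spoken answer; always confirms.
    questions.append(question)
    return True


assistant = ClarifyingAssistant(scripted_user)
first = assistant.is_affirmative("Sure thing")
second = assistant.is_affirmative("sure thing")
print(first, second, len(questions))
```

The second call does not trigger another question, because the meaning learned from the first clarification is reused.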
How To Provide A Guardian Angel Intelligent Assistant To Every Man, Woman And Child?
By 2020, Everyone on the Planet Has Access to a Phone with Global Connectivity
A Phone is Expected to Cost $50
Every Man, Woman And Child Will Have Access to 16GB+ of Space on the Cloud from Facebook, Google And Microsoft
Everyone on the Planet Has Access to Unlimited Computation, Memory and Bandwidth
The language divide and the literacy divide limit access to internet-enabled solutions for many people in the world.
Providing the right information in the right language and the right medium enables scalability to all the people on the planet.
Sustainability and affordability are natural consequences of the exponential reduction in size and cost of information technology.
Necessary Conditions for Success in Development of Intelligent Assistants
Infrastructure
Instrument Data Sources: People, Places and Things
Computing Power: Processor, Memory and Bandwidth
  Multi Farm Cloud Computing
  Super Computers for Processing
  Zettabyte (10^21 Bytes) Storage Farms
  Million Gigabit Bandwidth
Machine Learning and Analytics
Potential Economic Impact of AI 2.0
Emergence of Knowledge as a Service (KaaS) Industry
Every person on the planet will be able to perform many daily habits more effectively using Guardian Angels
Daily habits (routines, activities) include a wide spectrum from routine tasks (such as banking and travel planning) to tasks too difficult for the user
Over 80% of all human activity will be done by Guardian Angels by 2020
7 Billion People Market vs 2 Billion Today
Ultimately Humans Could be 10 Times More Efficient and Effective
Global GDP is $100 Trillion
Even 10% improvement will lead to $10T additional wealth creation
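The closing estimate on this slide is simple proportional arithmetic: a 10% gain on a $100 trillion global GDP. The short sketch below just restates that calculation; the function name `additional_wealth_trillions` is a hypothetical helper, and the inputs are the slide's own figures, not independent data.

```python
def additional_wealth_trillions(gdp_trillions: int, improvement_percent: int) -> float:
    """Extra wealth, in trillions of dollars, from a percentage improvement."""
    return gdp_trillions * improvement_percent / 100


# Slide figures: $100 trillion global GDP, 10% improvement.
gain = additional_wealth_trillions(100, 10)
print(gain)
```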
PPP Business Model
A Public-Private Partnership Model of App Development
No Single Company Can Afford To Make Significant Investments in Orphan Languages
Partner with Local Governments
  Industry Provides Technology, Develops and Maintains Apps
  Local Government Pays for Data Collection and Deployment
Initially Worth Considering for Populations of 20 Million or More
In the First Phase
  India: Bengali, Telugu, Marathi, Tamil, Gujarati, Kannada, etc.
  Africa: Arabic, Swahili, Berber, Hausa
For Illiterate Populations It Will Become A Lifeline And Be Used Every Day
Once it is working, even literate members of the region will use it because of its convenience
In Conclusion
The 3B Semi-literate Populations of the World Are A Major Untapped Market For IT Companies
Effective Use Of Speech Technology and Voice Computing Is The Only Option To Support Their Needs
We Have All The Needed Technology And Tools
Partnering With Local Governments May Be One Way To Reduce The Cost of App Development
Sharing Sparse Language Data Among Leading Tech Companies May Be Desirable