ISMOT Innovation Conference: Augmented Reality for Literacy

UI/UX Innovation:

Augmented Reality for Literacy

By Robin Rowe and Gabrielle Pantera
ISMOT Innovation Conference 2019
Presented at Tsinghua University, Beijing, China

Abstract: Advances in Augmented Reality (AR) and User Interface (UI) technology will trigger sweeping changes in how information workers use technology and how engineers create innovation. The keyboard is about to be replaced with a new user interface paradigm that will transform work and society. AR UI technology will enable people without training to be productive information workers in a post-keyboard world.

According to UNESCO, there are 750 million people who can’t read basic signage, medication labels, or a job application. AR will read those words to them, and in doing so, the illiterate will learn to read for themselves. Literacy will no longer be a prerequisite to employment. Reading and writing become skills that can be learned on the job.

Keywords: Augmented Reality (AR), China, Computer Vision, Gaze, Gesture, Keyboard, Innovation, Interactive Voice Response (IVR), Language Translation, Literacy, Mouse, Optical Character Recognition (OCR), Refugees, Speech Recognition (SR), Sweden, Technology, User Interface (UI)

1 Introduction

Long before the invention of computer keyboards, there was the telegraph key and Morse code.

Telegraph Key

Fig. 1 Telegraph Key

Training Morse operators was rigorous and expensive. To eliminate the requirement of learning to key in Morse code, a piano-like telegraph keyboard was developed. No knowledge of dots and dashes was needed. Pressing each key transmitted one letter of the alphabet. That device evolved into the typewriter-like keyboard used in modern computing.

IBM Keyboard

Fig. 2 IBM PC keyboard

The transition from Morse code to QWERTY keyboards triggered a boom in innovation that we are still experiencing today. The next revolution is coming. The typewriter keyboard is about to be replaced by a new interface: Augmented Reality (AR) smart glasses.

Lenovo ThinkReality

Fig. 3 Lenovo ThinkReality enterprise AR Glasses

2 AR Applications

2.1 Gaming AR

Lenovo Jedi

Fig. 4 Lenovo Star Wars Jedi gamer AR Glasses

Gaming applications of AR involve inserting virtual players in the real world. In Jedi Challenges, players wearing AR headsets play against Star Wars villains or against each other with lightsabers that display virtual blades.

2.2 Enterprise AR

Enterprise applications of AR include aerospace engineering, healthcare, and technology maintenance and repair. Enterprise AR walks users through checklists and may call upon human Remote Experts (RE) to help via a video call. The headset user sees the RE as a Star Wars Princess Leia-like hologram, while the RE sees the point of view of the user captured by the AR glasses front-facing camera.

Fig. 5 Lenovo Research: Co-working Engineering Session with holographic Remote Expert

3 AR Technology

3.1 AR Is Not VR

AR is related to, but fundamentally different from, Virtual Reality (VR), a technology with which more people are familiar.

VR replaces reality. A well-known VR implementation is Google Cardboard. Literally a piece of cardboard, it has lenses and mounts a mobile phone screen in front of the user’s eyes. The phone screen replaces our view of the world. In VR we have no situational awareness of the real world around us. We can only see the virtual world displayed before our eyes. Through headphones, we only hear the virtual world. VR cuts us off from reality.

Google Cardboard

Fig. 6 Google Cardboard VR

3.2 AR Glasses, a New Way of Looking at the World

AR augments reality instead of replacing it. The user looks through a pair of glasses seeing the real world. Tiny projectors in the glasses shine information on the back of the lenses, making text or 3D objects appear to float in midair.

What’s perhaps most significant is AR glasses have no keyboard. Users don’t need to learn how to type first in order to use AR glasses.

So if not by typing, how does AR get user input?

4 AR User Input

4.1 Gesture UI with Hand Tracking

Many AR glasses feature a front-facing camera that sees what the user sees. If users raise their hands, that camera can see them and, with computer vision, track them. Like a traffic cop or orchestra conductor, the user signals what to do through hand signals, by pointing.

The user may reach for and grasp virtual objects in the AR view. The user may push virtual buttons. Gesture can be very useful for controlling UI elements and manipulating 3D virtual objects in a scene.
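A grasp of a virtual object can be detected from tracked hand positions. The sketch below is a minimal, hypothetical example: the landmark names and coordinate format are assumptions for illustration, not any particular headset vendor's hand-tracking API. It treats a thumb tip and index tip coming close together as a "pinch" that grasps a virtual object.

```python
import math

# Hypothetical landmark format: name -> (x, y, z) in normalized coordinates.
# Not any specific vendor's hand-tracking API.
def is_pinch(landmarks, threshold=0.04):
    """Return True when thumb tip and index tip are close enough
    to count as grasping a virtual object."""
    distance = math.dist(landmarks["thumb_tip"], landmarks["index_tip"])
    return distance < threshold

# Fingers apart: no pinch
open_hand = {"thumb_tip": (0.10, 0.20, 0.0), "index_tip": (0.30, 0.40, 0.0)}
# Fingers together: pinch detected
pinch = {"thumb_tip": (0.10, 0.20, 0.0), "index_tip": (0.11, 0.21, 0.0)}

print(is_pinch(open_hand))  # False
print(is_pinch(pinch))      # True
```

A real system would smooth the landmark stream over several frames before triggering, so momentary tracking noise does not register as a grasp.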

4.2 Gaze UI with Eye Tracking

Some AR glasses, such as Microsoft HoloLens 2, have tiny cameras pointing inward at the user’s eyes. A user may gaze at an item displayed in a virtual menu, which then pops up a tiny clock countdown timer. If the user keeps gazing for two seconds, until the timer reaches zero, the menu item is activated.
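The dwell-to-activate behavior can be sketched as a small state machine fed one gaze sample per frame. This is an illustrative sketch, not HoloLens code; the two-second dwell matches the countdown described above, and the timestamps and item names are made up.

```python
DWELL_SECONDS = 2.0  # countdown length, as described above

class DwellSelector:
    def __init__(self, dwell=DWELL_SECONDS):
        self.dwell = dwell
        self.item = None   # menu item currently under the gaze
        self.start = None  # timestamp when the gaze landed on it

    def update(self, gazed_item, now):
        """Feed one gaze sample; return the activated item, or None."""
        if gazed_item != self.item:  # gaze moved: restart the timer
            self.item = gazed_item
            self.start = now
            return None
        if gazed_item is not None and now - self.start >= self.dwell:
            self.item, self.start = None, None  # consume the activation
            return gazed_item
        return None

selector = DwellSelector()
print(selector.update("Open", 0.0))  # None, timer just started
print(selector.update("Open", 1.0))  # None, still counting down
print(selector.update("Open", 2.0))  # Open, dwell reached
```

Looking away at any point resets the timer, which is why glancing across a menu does not accidentally activate items.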

Gaze and gesture replace the mouse, but are too slow for typing. What replaces the keyboard? Speech Recognition (SR), which comes in two flavors: Interactive Voice Response (IVR) and continuous dictation.

4.3 Speech Recognition UI with IVR

We may be familiar with IVR from using banking or other automated customer service telephone support. IVR is context sensitive. The user navigates through a hierarchy of actions. Only a few options are available at each step. These systems typically prompt the user what to say from a short list of choices.

IVR can be more reliable in recognizing spoken phrases because it only has to choose from among a few choices. However, many of us have experienced the frustration of saying, “Customer Service,” over and over again, with that failing to be recognized. IVR often gets it wrong.
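The reliability advantage comes from the small choice set: matching an utterance against three phrases tolerates far more recognition error than matching against a whole dictionary. Below is a toy sketch of a hierarchical IVR menu; the menu contents are invented for illustration, and fuzzy string matching stands in for a real speech recognizer's scoring.

```python
import difflib

# Toy IVR menu tree: each step offers only a few allowed phrases.
# Contents are illustrative, not any real bank's system.
MENU = {
    "prompt": "Say: account balance, card services, or customer service",
    "account balance": "Your balance is ...",
    "card services": {
        "prompt": "Say: report lost card or activate card",
        "report lost card": "Card reported lost.",
        "activate card": "Card activated.",
    },
    "customer service": "Transferring you to an agent.",
}

def recognize(utterance, choices):
    """Pick the closest allowed phrase; a small choice set keeps this reliable."""
    matches = difflib.get_close_matches(utterance.lower(), choices, n=1, cutoff=0.6)
    return matches[0] if matches else None

def step(menu, utterance):
    """Advance one level of the menu; re-prompt if nothing matched."""
    choices = [key for key in menu if key != "prompt"]
    choice = recognize(utterance, choices)
    if choice is None:
        return menu["prompt"]  # failed match: prompt again
    return menu[choice]

print(step(MENU, "customer service"))
print(step(MENU["card services"], "report lost card"))
```

Even a misrecognized utterance like "customer servise" still lands on the right branch, because only three candidates compete; a dictation system would have no such shortlist to fall back on.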

Newer IVR technology, such as the automated phone reservation system at Priceline, maintains its own context and carries on something like a human conversation. However, this Artificial Intelligence (AI) chatbot uses machine learning (ML) pattern matching to recognize phrases learned from a billion recorded Priceline customer conversations. If you ask it a question unrelated to booking a flight, it is lost.

4.4 Speech Recognition UI with Continuous Dictation

Continuous dictation is much harder to get right than IVR. The dictation recognizer must choose from every word in the dictionary, not merely from a few phrases. AR may offer both IVR and dictation, using the former to control menu choices and the latter for free-form text entry.

4.5 Speech Recognition, the Wake-word

Not everything a person says is intended as a menu command or as dictation to be converted into text. Sometimes we’re talking to other people, not our AR device. To avoid confusion, it’s common to have a wake-word, such as, “Siri,” or “Alexa,” that tells the computer when to start doing speech recognition.

An AR device may switch among listening for IVR commands, waiting for a wake-word, and taking dictation.
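The mode switching can be sketched as a two-state router: ignore everything until the wake-word, then treat the next utterance as a command or dictation. The wake-word "hey glasses" is made up for illustration; real devices use "Siri," "Alexa," and the like, as noted above.

```python
WAKE_WORD = "hey glasses"  # illustrative wake-word, not a real product's

class SpeechRouter:
    """Idle until the wake-word is heard, then handle one utterance."""
    def __init__(self):
        self.awake = False

    def hear(self, utterance):
        """Route one utterance; return what the device does with it."""
        text = utterance.lower().strip()
        if not self.awake:
            if text == WAKE_WORD:
                self.awake = True
                return "listening"
            return "ignored"  # normal conversation, not meant for the device
        self.awake = False    # handle one utterance, then go back to sleep
        return f"recognized: {utterance}"

router = SpeechRouter()
print(router.hear("nice weather today"))  # ignored
print(router.hear("hey glasses"))         # listening
print(router.hear("read this sign"))      # recognized: read this sign
```

In practice the wake-word detector runs continuously on low-power hardware, and full speech recognition is only powered up after it fires.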

4.6 AR Text to Speech

AR isn’t only vision. AR glasses typically have speakers and can output speech and music to the user. Using Optical Character Recognition (OCR), AR glasses can read signs and documents aloud to the user, including translating between languages. A sign written in English may be spoken to the AR user in Chinese.

Bose Frames

Fig. 7 Bose Frames Rondo Audio Sunglasses

 

AR headsets may be audio only. Bose Frames Rondo have no projectors, but do have speakers in the sunglasses’ frames. Bose Frames let users hear and interact with the world while listening to music.

4.7 Visual Programming

Anticipating a future with AR, in which we have no keyboards, a logical question is, how will programmers write code? Probably not by using dictation. Although chess masters can dictate moves in chess notation without touching the board, that is perhaps too much to expect of a typical computer programmer writing code in C++.

UE4 Blueprint

Fig. 8 UE4 Blueprints Visual Programming UI

Unreal Engine 4 (UE4) is a popular gaming platform. Future support for it on Microsoft HoloLens 2 has been announced (expected to ship in fall 2019). UE4 Blueprints is a visual programming paradigm in which a set of logic components are strung together, the outputs connected to inputs with “noodles.” Blueprints are used to create software without coding. In conjunction with AR, this type of visual programming is feasible using gesture and gaze.
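The Blueprint idea can be sketched as a tiny dataflow graph: logic nodes whose outputs feed other nodes' inputs, evaluated by pulling values through the wires. The Node class below is a hypothetical Python sketch of that paradigm, not UE4's actual Blueprint API.

```python
class Node:
    """One logic component; its inputs are wired to upstream nodes."""
    def __init__(self, func, *inputs):
        self.func = func      # the node's logic
        self.inputs = inputs  # upstream nodes whose outputs feed this node

    def evaluate(self):
        # Pull values through the "noodles" from upstream nodes.
        return self.func(*(node.evaluate() for node in self.inputs))

def constant(value):
    """A source node with no inputs that always outputs the same value."""
    return Node(lambda: value)

# Wire up a small graph computing (3 + 4) * 2, the way a Blueprint
# connects output pins to input pins.
add = Node(lambda a, b: a + b, constant(3), constant(4))
times_two = Node(lambda x: x * 2, add)

print(times_two.evaluate())  # 14
```

In an AR headset, a user could create and wire such nodes by gazing at a pin and pinching to drag a connection, with no typing involved.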

5 Knowledge Workers without Literacy

5.1 Case in Point, Swedish Unemployment

Sweden collects some of the most comprehensive demographic data of any country. In Sweden, it is not necessary to statistically sample data to answer a demographics question. Each individual is given a number and tracked. Sweden is currently experiencing an unemployment crisis.

In Sweden, with a population of ten million, there are 600,000 unemployed. Of those, 60% lack a high school education. Half of the Swedish unemployed were not born in Europe; many may be refugees who escaped from countries that lack basic educational institutions [1]. These are otherwise intelligent people who come from areas where illiteracy is widespread.

To be a farmer, it is not necessary to read and write. The question is, how do we make these workers productive in a high-tech environment without a decade of education, without literacy first?

The technological revolution of Augmented Reality (AR) is coming. Because AR has no keyboard, it does not require the rigorous training of learning how to type, or even the ability to read and write. AR can read and write for the user, and in doing so can provide literacy.

5.2 Case in Point, Chinese Language Dialects

About 1.2 billion people, 16% of the world’s population, speak Chinese as their first language. Chinese is not one language. Mandarin speakers number 960 million, Wu 80 million, Min 70 million, Yue (Cantonese) 60 million, and so forth [2]. Most of these Chinese dialects are as different from each other as are English, Spanish, Italian, and French, and just as mutually unintelligible.

Chinese is particularly challenging to type on Latin keyboards. Written Chinese uses thousands of characters; basic literacy requires knowing roughly 3,000 of them. AR can do more than translate. AR speech recognition can be a better way to read and write Chinese.

6 Conclusion

AR is bringing sweeping changes in the lives of knowledge workers, both in how they work and in the skills they need. Using AR, it is not necessary to learn to read or write before becoming a high-tech worker. In the U.S., there are 2.5 million chronically unfilled jobs requiring technological training [3]. These can be filled.

Is literacy becoming obsolete? Will reading and writing go the way of Morse code? No. Instead of being a prerequisite, an obstacle to overcome before becoming a knowledge worker, literacy will be a skill picked up on the job. AR will teach its users literacy.

With new AR technology, people will become productive immediately in the information society.

References

[1] Swedish government statistics
[2] Wikipedia entry for Chinese Language
[3] Forbes, 2018/9/19, Most Laborers Are Knowledge Workers Now

About the Authors

  • Robin Rowe has implemented innovative technology for Lenovo, Disney, Mattel, AT&T, GoPro, DreamWorks Animation, and NBC-TV. He has taught computer science at the Naval Postgraduate School and at the University of Washington. He chairs the ANSI/ISO 56007 innovation idea management standards committee.
  • Gabrielle Pantera has advanced product development at Disney, Fox, MGM, and The CW.

Note: All trademarks are property of their respective owners.