As you may have found out, I have been working for a major AR glasses company for 3 years now. I use AR goggles on a daily basis. After years of development experience with AR, I want to think outside the box and share some deeper thoughts on my understanding of the technology and its future.
- AR, AI, and Self-driving car
- AR, VR, and Smartphone AR
- What is good Mixed Reality Interaction
- AR Realistic Rendering (a.k.a. Inverse Rendering)
- Other Wearable Sensor for AR
- Rendering Engine and AR OS
- Cloud Streaming
- AR UI
- AR Killer App
- AR Gaming
- How Can AR Stand Out Now
Although I am used to seeing people wear the unit and walk around the office every day, I was still shocked to see this photo, which was taken during a spine surgery in Japan (Photo Credit). It makes me feel like I just woke up in an alien laboratory. The concept of AR has spread widely, and it is natural and straightforward for the public to understand what the technology can provide. However, it still hasn’t become mainstream in the consumer market. So far, there isn’t a strong reason for ordinary people to buy a pair of AR glasses when everyone has a smartphone in their pocket. Someone said that “the eBook hasn’t replaced the printing industry yet either”. That is true. Some aspects of our lives may never be fully digitized. As engineers in this industry, we need to face the truth that we are entering the “Trough of Disillusionment” phase of the famous Hype Cycle. In my opinion, if our industry can rethink the AR product from some different perspectives, there might be a chance for AR products to compete with phones or tablets in the long run.
AR, AI, and Self-driving car
As an ordinary person in society, I don’t feel that the technology side of my life has changed significantly in this decade. All the hardware and software still looks similar, just more powerful. The introduction of the smartphone in the last decade was a significant pivot point that changed everybody’s life, but we haven’t seen, and probably won’t see, such a life-changing event before this decade ends. Among all technology sectors, three directions are the most anticipated: AR, AI, and self-driving cars. Of the three, AR is the only one that has already delivered a complete solution. AI and self-driving technology are still not very reliable, and may not be for a long time. The more decisions machines make, the greater the risk and the harder they are to trust. I think decision-based systems such as AI and self-driving cars have a much longer road ahead than AR does. This is why I still strongly believe AR will be the one that triggers the next revolutionary pivot point.
AR, VR, and Smartphone AR
VR and smartphone AR also have look-through camera systems that simulate the AR experience. But the hardware dictates that both platforms are mainly suited to entertainment use cases. VR traps the user in a fixed space, and smartphone AR is limited by the camera architecture of the phone. With the front camera, it is face-driven AR content. With the back camera, it is a portal-type experience. In comparison, AR goggles have much more flexible indoor and outdoor use cases. Moreover, if we can figure out how to render true black through optics, then AR glasses can potentially double as VR goggles. So the ultimate form of the next-generation AR platform will surely be AR glasses.
What is good Mixed Reality Interaction
Nowadays, AR glasses companies advertise their brands under the concept of Mixed Reality. However, apart from plane finding, world occlusion, and spatial audio, the products are not well aligned with this new term. The virtual content still stays in the virtual space and real-world content stays in the real world. The interaction between these 2 realms is minimal. The boundary is never truly broken. I think this is the main reason customers forget about AR after trying it. It simply does not “wow” them enough. Across the boundary, data can flow in 2 directions: virtual controls reality, and reality fuses into virtual.
Virtual Control Reality: There isn’t too much we can do when the input signal comes from the virtual side and the receiver is a real-world object. Still, the IoT framework is a good place to start. We can always use smart glasses to control real-world electronic devices. Examples include AR-assisted robotic control, or using the AR glasses as a sensor that collects data about the user’s body or environment and then uses that data to drive IoT devices. For instance, if we capture the user’s gesture and transform its position into a traditional monitor’s local space, then we have a touchscreen monitor!
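To make the touchscreen idea a bit more concrete, here is a minimal sketch of the coordinate transform. It assumes the glasses already give us a fingertip position in world space and that the monitor's pose is known (its top-left corner plus unit "right" and "down" vectors). The function name `fingertip_to_pixel`, the parameters, and the 1 cm touch threshold are all made up for illustration; they are not taken from any real AR SDK.

```python
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]


def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def fingertip_to_pixel(fingertip: Vec3,
                       origin: Vec3,            # top-left corner of the screen (world space)
                       right: Vec3,             # unit vector along the screen's width
                       down: Vec3,              # unit vector along the screen's height
                       size_m: Tuple[float, float],
                       resolution: Tuple[int, int],
                       touch_threshold_m: float = 0.01
                       ) -> Optional[Tuple[int, int]]:
    """Map a world-space fingertip to a pixel, or None if it is not touching."""
    rel = sub(fingertip, origin)
    u = dot(rel, right)                           # metres along the width
    v = dot(rel, down)                            # metres along the height
    distance = abs(dot(rel, cross(right, down)))  # metres off the screen plane
    width_m, height_m = size_m
    if distance > touch_threshold_m:
        return None
    if not (0.0 <= u <= width_m and 0.0 <= v <= height_m):
        return None
    res_x, res_y = resolution
    # Scale metres to pixels and clamp to the last valid pixel index.
    px = min(res_x - 1, round(u / width_m * res_x))
    py = min(res_y - 1, round(v / height_m * res_y))
    return (px, py)


screen = dict(origin=(0.0, 0.0, 0.0), right=(1.0, 0.0, 0.0), down=(0.0, -1.0, 0.0),
              size_m=(0.6, 0.3), resolution=(1920, 1080))
touch = fingertip_to_pixel((0.3, -0.15, 0.005), **screen)
hover = fingertip_to_pixel((0.3, -0.15, 0.2), **screen)
```

In a real system the hard parts are tracking the monitor's pose and getting a stable fingertip estimate; the transform itself is the easy bit.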
Reality Fuse Into Virtual: There is much more we can do by sampling data from reality and reflecting it in the virtual space. We can do object recognition, 3D object reconstruction, AR copy and paste, etc. However, this kind of task usually needs computer vision and deep learning to help. With this concept, we can create more content in the virtual space. Any user can become a static content creator, and users can share unique environment experiences with friends. We can even have a location-based service to share content publicly.
By the way, speaking of privacy in content sharing, I encourage you to watch this talk by Marc Pollefeys on how to use epipolar lines to hide and preserve 3D geometry in the cloud storage scenario: 3DGV Talk: Marc Pollefeys - 3D geometric vision - YouTube
AR Realistic Rendering (a.k.a. Inverse Rendering)
Another big factor that affects user experience in AR is the rendering quality. To get a truly immersive experience, the device should render the virtual content realistically enough that the user cannot tell which is which. In the current generation of AR, the rendered virtual content is heavily aliased and plain. It is impossible to achieve the immersive experience shown in those fancy demo videos. Some effort has been put in this direction, such as physically-based rendering (PBR) optimized for mobile devices, deep-learning-based neural rendering, and ray tracing with DLSS. I am eagerly anticipating ray tracing on AR devices. However, to get such high-quality global illumination results, we need enough information about the real-world environment. Unlike PBR in gaming, where all content including the environment is already virtual, AR rendering has to fuse the virtual into the real world. So inevitably, we need to do things like light source estimation, obtaining the environment map for image-based lighting (IBL), obtaining a textured world mesh, using high-quality virtual object materials, using high-precision physics simulations, etc. For ray tracing, we need the light rays to bounce correctly between the world mesh and the virtual object mesh. This is a new research area and I have a huge interest in it. I will develop myself and put more focus on this direction in the coming decade.
On the other hand, VR and smartphone-based camera look-through systems can do some PBR too, with tricks like applying image processing over the whole camera frame. The rendered scene is then blurred together and filtered, so the boundary between the rendered virtual content and the background camera frame is less obvious. GAN-based real-time video synthesis can also perform better in this scenario. Some researchers have already worked on this topic under the name Neural Rendering.
Other Wearable Sensor for AR
AR glasses as the only source of input for machine perception may not be enough. In the future, people will wear more electronics on their bodies, so we can use all of them to collect data. For example, in smartphone-based AR, if we want to render a face AR filter more realistically, we need to estimate the light sources in the user’s environment. Besides using deep learning to process the camera frame, we can use earbuds as light sensors. Basically, we want to understand the overall light and color over the hemisphere centered on the phone’s position. The front and back cameras can collect light information from the front and back. To collect data from the left and right, we can install light sensors on the earbuds. Then we can estimate the light radiance arriving at this hemisphere from all directions. Hence we get a better estimate of the user’s environment lighting, which helps us render a realistic face AR filter. Also, a smartwatch can be used to detect user gestures. This can complement the most commonly used image-based gesture detection when the hands are out of sight. Besides wearable sensors, we can even have an external sensor such as a portable spinning lidar in the room, and use it to capture a better world mesh and dynamic changes.
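As a rough illustration of the earbud idea, here is a toy sketch that combines four directional readings (front camera, back camera, left earbud, right earbud) into a dominant light direction and an ambient color. Everything here is an assumption made for illustration: the sensor layout, the `estimate_light` name, and the approach itself, which is just a luminance-weighted average of the sensor directions and a plain mean for the ambient term. A real light estimator would be far more involved.

```python
import math
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]
Reading = Tuple[Vec3, Vec3]  # (unit direction the sensor faces, linear RGB it measured)


def luminance(rgb: Vec3) -> float:
    # Rec. 709 luma weights applied to linear RGB.
    r, g, b = rgb
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def estimate_light(readings: List[Reading]) -> Tuple[Optional[Vec3], Vec3]:
    """Return (dominant light direction or None, ambient RGB)."""
    if not readings:
        raise ValueError("need at least one sensor reading")
    sx = sy = sz = 0.0
    ar = ag = ab = 0.0
    for (dx, dy, dz), rgb in readings:
        w = luminance(rgb)
        # Brighter sensors pull the estimate toward the direction they face.
        sx += w * dx
        sy += w * dy
        sz += w * dz
        ar += rgb[0]
        ag += rgb[1]
        ab += rgb[2]
    n = len(readings)
    ambient = (ar / n, ag / n, ab / n)
    norm = math.sqrt(sx * sx + sy * sy + sz * sz)
    if norm < 1e-9:
        return None, ambient  # evenly lit: no dominant direction
    return (sx / norm, sy / norm, sz / norm), ambient


readings = [
    ((0.0, 0.0, 1.0), (0.2, 0.2, 0.2)),    # front camera
    ((0.0, 0.0, -1.0), (0.2, 0.2, 0.2)),   # back camera
    ((-1.0, 0.0, 0.0), (0.8, 0.8, 0.8)),   # left earbud: bright lamp on this side
    ((1.0, 0.0, 0.0), (0.2, 0.2, 0.2)),    # right earbud
]
direction, ambient = estimate_light(readings)
```

With only four samples this is very coarse, but it shows why the extra left and right sensors matter: without them, a lamp off to the side would be invisible to the estimate.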
My two cents: with more sensors on humans, instead of electronics becoming part of the human, it feels more like humans are becoming part of the electronics. All we are doing is walking around and helping machines convert real-world signals into digital ones so that machines can better understand the world and eventually gain some intelligence. And humans need to baby-feed those electronics by charging their batteries. More and more services will target the electronics on the human instead of the humans themselves. The raw output of the human body does not matter anymore at the application level, because everything about our body is captured and converted to a digital signal in real time. It feels like humans are serving the machines and making them stronger, while weakening ourselves. I hope this will not happen.
Rendering Engine and AR OS
On the software side, besides the SLAM computer vision algorithms, another big area is the rendering engine and the operating system. Some companies are developing their own engine, like mine :), while the majority use existing solutions like Unity or Unreal. I can see that Unity is putting more strategic focus on the AR market. And since it is an engine that has already gathered a huge community in the gaming industry, it is very convenient for traditional game developers to switch to working on AR. An AR glasses product requires both a costly hardware manufacturing line and complicated software support. I think that as more and more hardware companies start working on AR glasses, using an existing rendering engine like Unity becomes very attractive, because each hardware company doesn’t need to put too much effort into the software side. Their products can also benefit from the engine’s existing ecosystem. This is a win-win.
Unity could even integrate SLAM features and create its own AR operating system layer. I can see the operating system migrating from Linux and Windows -> Android and iOS -> customized AR OS. Those game engine companies might eventually dominate our next computing platform.
Cloud Streaming
Nobody likes being tethered, period.
If Google Stadia can stream 4K/60FPS in the browser, then wireless low-latency cloud streaming is definitely feasible on AR glasses. The computing unit on the glasses just needs to take care of head pose prediction. This can reduce the heat and weight of the device mounted on the user’s head. The cost of the unit will also drop a lot, which will attract more consumers. This can even make ray tracing possible on AR devices. I cannot see any disadvantages to this cloud streaming trend. And with a subscription-based pricing model, this technology can claim a very big stake in the consumer market.
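For a feel of what head pose prediction means here, below is a minimal sketch. It assumes the glasses sample head orientation as yaw and pitch in degrees and simply extrapolates at constant angular velocity to cover the streaming latency. The `HeadPose` type, the `predict_pose` name, and the constant-velocity model are my own simplifications for illustration; production systems typically use IMU data and more careful filtering.

```python
from dataclasses import dataclass


@dataclass
class HeadPose:
    t: float      # timestamp in seconds
    yaw: float    # degrees, wrapped to [-180, 180)
    pitch: float  # degrees


def wrap_deg(angle: float) -> float:
    """Wrap an angle in degrees to the range [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def predict_pose(prev: HeadPose, curr: HeadPose, latency_s: float) -> HeadPose:
    """Extrapolate the pose latency_s seconds ahead at constant angular velocity."""
    dt = curr.t - prev.t
    if dt <= 0.0:
        # Cannot estimate a velocity; fall back to the latest pose.
        return HeadPose(curr.t + latency_s, curr.yaw, curr.pitch)
    # Wrap the yaw difference so crossing +/-180 degrees is not read as a full turn.
    yaw_rate = wrap_deg(curr.yaw - prev.yaw) / dt
    pitch_rate = (curr.pitch - prev.pitch) / dt
    return HeadPose(
        t=curr.t + latency_s,
        yaw=wrap_deg(curr.yaw + yaw_rate * latency_s),
        pitch=curr.pitch + pitch_rate * latency_s,
    )


# Head turning at roughly 100 deg/s; the cloud round trip is assumed to be 50 ms.
predicted = predict_pose(HeadPose(0.00, 10.0, 0.0), HeadPose(0.01, 11.0, -0.5), 0.05)
```

The cloud would render the frame for the predicted pose, and the glasses could then apply a small local correction once the true pose at display time is known.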
Cloud rendering can also enable a new form of YouTube. When the AR era arrives, I expect YouTube will not be limited to streaming 2D videos but will stream 3D experiences. All the content will be 3D meshes or point clouds instead of video frames. When users view a 3D video, the 3D experience will be overlaid on their real-world environment. Content creators can also decorate their own channel as a 3D virtual space that is rendered in the user’s environment. This will also boost other industries, like cheap and portable consumer spinning lidar, or multi-drone environment scanning and motion capture. We will also see 3D mesh video editing tools, similar to existing tools like MeshLab. I can’t wait to enter the future.
AR UI
Finally, let’s touch on the content. A major issue I see in most AR content, especially the UI, is that it is very 2D-ish. The UX still feels like we are interacting with computer monitors. I think AR interaction designers need to bring bolder ideas to AR. Flat 2D UI should be a thing of the past; interaction design can feel much more 3D. We can use more physics simulation to make virtual content and UI behave like real-world objects.
AR Killer App
Does an AR killer app even exist?
To understand what the killer apps are, we first need to understand how users use AR glasses. All current AR glasses are uncomfortable to wear, don’t work outdoors, and produce enough heat to make users sweat very quickly. This means that users won’t wear those glasses for long. Given this assumption about user behavior, we can reasonably infer that most users will only use AR glasses at home at night, for an average of 2 hours. It is almost certain that AR glasses are an entertainment-focused device.
This also means that functional applications like banking and email apps are left to smartphones. AR glasses are not a very strong functional device. They have a learning curve, and sometimes interacting with them feels unnatural to users. A flat screen has the advantages of conveying information more efficiently and allowing faster user interaction. I don’t see a need to compete with the smartphone in the domain of such information-driven apps. Those functional apps are natively flat.
Back to AR entertainment. I’d like to classify it into 2 categories: no user input (passive) and requiring user input (active). Passive entertainment can be music videos, movies, concerts, and sports. Active entertainment can be gaming and interactive storytelling. These are all very good entry points. Music videos in particular are short and can easily resonate with fans. And record companies are willing to pay extra money for high-fidelity 3D-reconstructed MVs. A lot of modern MVs and movies already rely heavily on rendering technologies, so it is not too hard to support immersive capture. When such 3D reconstruction can be achieved with commodity devices, like a trinity of iPhones, then I can see AR YouTube being a very attractive app.
AR Gaming
Gaming is also a promising market. But what is a good AR game? This is still a very open question. I will just throw out some questions that I don’t have answers to.
How do we get stunning real-time graphics? What is a suitable viewport for AR games: a flat screen, or a virtual camera? Just as Doom popularized the first-person camera, will we have an AR-friendly camera mode for games? How can we make it easy for the user to perceive 3D content? Should we make the user move physically? If so, how do we sync the user’s movement with the game? Should we make the game content depend on the user’s environment? Should the game fit into the user’s environment? Should we use hands or controllers as input? If we use hands, how do we give immediate feedback? Finally, what type of story is more attractive to portray in AR?
There are some game genres I think are suitable for AR glasses: simulation, RTS, 45-degree camera games, fighting, and side-scrolling games. Also, I dream of playing Clash Royale on my table through AR glasses. These are generally not AAA action games. Action games might need to find a way to fit AR glasses.
Everyone is talking about it, building for it, dreaming of it. But what exactly is the Metaverse?
It is an operating system for the real world. It can run on smartphones, computers, and AR glasses. People will interact with the OS layer, which will result in execution in the real world. It will be location-based. Each entity will implement a “driver” to provide services. The Metaverse will also be deeply connected with IoT devices. Different companies can build different Metaverses, and users can switch to whichever OS they like. I expect we will see Metaverses that are not interoperable. I won’t categorize the Metaverse as the Internet, because it is more than just a connection; it is a lifestyle.
It also has a dark side. It is a digital fog that keeps telling us how plain our real world is. The more fantastic the Metaverse, the more disappointing reality becomes. People might stop actually decorating their environment, and stop making anything other than food and basic appliances. It is an oasis to escape to. I just realized that everything currently on my table can be replaced by AR glasses, except for my cat.
How Can AR Stand Out Now
From a market penetration perspective, AR glasses are now very weak in comparison with smartphones. Some claim that once Apple releases its AR glasses, the consumer market will improve a lot. This might be true, but it isn’t really helpful or fair for the rest of the hardware manufacturers. And I think AR glasses will never replace the phone. I just cannot imagine waking up in the morning and, while half awake, putting on smart glasses to check my email. The smartphone is a better fit for some everyday use cases. The phone will definitely remain the major device in the consumer market over the next decade. However, the unique experience is the advantage of AR. Instead of replacing the phone, AR glasses can try to replace the TV or the monitor, or even compete with game consoles. Also, in my mind, AR should never be a stand-alone platform. It should be an assistive tool that works with all other IoT devices to improve our quality of life. I am all in for that day to come.
AR advertisement, AR collaboration, Location-based AR experience.