Cloud Puts Foundations Under Dreams of VR
Feb 21, 2017
Virtual Reality is opening up new dimensions and horizons in gaming and in entertainment – literally and figuratively. There is abundant material surrounding the excitement of investment deals and surging revenues from client devices. However, much of the long-term potential and speed of growth of this market will be driven by the technology building blocks that are bringing this magic to life.
Before we dive into those aspects, a little bit of taxonomy is in order. We use the term 360 video to refer to a ring or sphere of video that a user can twist and turn within. Depending on resolution and quality, the degrees of freedom of a user within the video bubble may be restricted. Video may be either live or pre-filmed content, but either way this is a passive viewing experience of a fixed storyline. Fully immersive interactive VR is different in that the user is free to interact with the content and explore it with more degrees of freedom – this is more relevant to gaming, or gamified content. In either case, VR content could be 2D or stereoscopic; the content could be real, synthetic/animated or a blend. Both of those virtual reality experiences are in contrast to the data-driven overlay which characterizes augmented reality applications, but AR (and its evolution to mixed reality) is another story for another day.
It’s also important to clarify two different resolutions in the context of VR – the source frame size, and the rendered view size. The source frame captures the entire VR world at a given point in time. The rendered view is the specific region that is rendered for a given user at a given point in time, and is a subset of the source frame. The field of view (FOV), which is the region of the source frame that is “visible” to the user at a given point in time, is often used interchangeably with the rendered view. The term foveated view refers to a view where the area the user is looking at is rendered at maximum precision, while the periphery is at much lower resolution and quality. Typically the frame size needs to be at least 4X the rendered view size.
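That 4X rule of thumb can be checked with simple arithmetic. The sketch below uses illustrative resolutions (a 1080p rendered view and a 4K source frame) that are assumptions for the example, not figures from this article:

```python
# Rule of thumb from the article: the source frame should hold at
# least ~4X the pixels of the rendered view.

def min_source_pixels(view_width: int, view_height: int, factor: int = 4) -> int:
    """Minimum source-frame pixel count for a given rendered view."""
    return view_width * view_height * factor

# A 1080p rendered view (1920x1080, ~2.1MP) implies a source frame of
# at least ~8.3MP -- which is exactly the pixel count of a 4K UHD
# (3840x2160) frame.
print(min_source_pixels(1920, 1080))  # 8294400
print(3840 * 2160)                    # 8294400
```

This is why streaming the full source frame for an HD rendered view effectively means streaming 4K, as the bandwidth discussion below makes concrete.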
There is huge excitement in the market around the high-end experiences that head mounted devices (HMDs), together with specialized consoles or amped-up PCs, can deliver. It is our view, however, that massive success stories like Pokemon Go will not be created by services that target only this specialized group of users. There are an estimated 1.6 billion mobile and PC gamers worldwide today; less than a tenth of these own current-generation gaming consoles. If the full source frame is delivered to the end device for local rendering, the minimum bandwidth required for a high-definition VR experience is 25Mbps – essentially, a VR service is streaming in 4K to enable HD rendering. In the US today, several states achieve peak broadband speeds of over 50Mbps according to Akamai’s 3Q2016 State of the Internet connectivity report, but not a single state crossed the 25Mbps mark for average connection speed, and nearly 20 states clocked in at less than 15Mbps average speed. Worldwide, the average connection speed hovers around 5Mbps. This should be tempered by the consideration that highly populous developing economies including India and Brazil achieve much of their high-speed connectivity via mobile networking, which invariably involves data caps or high-cost unlimited data plans.
“Good” streaming quality, spanning from SD to 1080p HD, can be achieved in the range of 1-5Mbps today; the quality that can be packed into each bit continues to grow over time. This number dovetails well with the global connectivity landscape described above. It’s plain to see therefore that the reach and quality of VR experiences can take a dramatic leap forward if only the rendered frame, or a foveated view of the source frame, were to be delivered to the end user. Offloading local processing to the cloud is gaining strength as a best-of-both-worlds strategy to maintain quality of experience while maximizing reach.
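Putting the article’s own figures side by side shows the scale of the opportunity. The comparison below is a back-of-the-envelope sketch using the 25Mbps full-frame figure and the upper end of the 1–5Mbps “good” streaming range:

```python
# Bandwidth comparison: delivering the full 4K source frame for local
# rendering vs. delivering only the rendered (or foveated) view.
# Figures are taken from the article's estimates.

FULL_FRAME_MBPS = 25.0      # full source frame, rendered on-device
RENDERED_VIEW_MBPS = 5.0    # upper end of the 1-5Mbps "good" range

savings = 1 - RENDERED_VIEW_MBPS / FULL_FRAME_MBPS
print(f"Bandwidth reduction: {savings:.0%}")  # Bandwidth reduction: 80%
```

An 80% (or greater) reduction is what moves VR delivery from above most states’ average connection speeds to comfortably within the global ~5Mbps average.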
Using cloud processing to handle the heavy lifting of point-of-view rendering opens up high-quality 360-video and interactive gaming experiences to a significantly larger pool of end users, at considerably lower cost to both consumers and service providers. This accessibility, and its corresponding growth in revenue potential, makes all the difference between a showcase and a mainstream service. Producing live VR content can easily be 5X more expensive than producing UltraHD content, which is itself costly enough that services and programmers have found it difficult to monetize and justify new investments.
There are considerations beyond connectivity and profitability that drive the need to minimize client-side processing load. Chief among these is battery life. We are several years away from a widespread deployed footprint of hardware-accelerated VR rendering in smartphones and tablets, and processing VR frames in software imposes unfeasible levels of battery drain. Limited local processing resources, including buffering capacity and memory for computation, can also drag user quality of experience below acceptable levels. This disparity will persist over time: mobile devices will always trail plugged-in computing stations in capacity and resolution, so the state of the art across the two will remain differentiated.
Cloud processing of VR is of course easier said than done. Visual ergonomics is a make-or-break aspect of VR. In fully immersive VR experiences such as gaming, the slightest delay between head movement or event trigger and frame response can trigger disorientation or nausea and decimate user engagement. 20ms is the target response time for such applications. Immersive 360-video applications are somewhat more tolerant, with a target response time of up to 100ms. Ambient experiences, where users pan across 360-video on tablets or on their large screens, can tolerate even longer delays. Proprietary, high-speed protocols are being developed to help tackle this challenge. In the case of immersive gaming, it is very likely for now that close partnerships with edge-computing data centers and select ISPs will be required to effectively offload processing to the cloud. Over time, 5G is expected to add another set of tools and capabilities which could be exploited to tighten signaling round-trip times.
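The response-time targets above act as a hard budget that every component of a cloud-rendering pipeline must fit inside. The sketch below is a hypothetical budget check; the component delays (network round trip, cloud render, client decode/display) are illustrative assumptions, while the targets are the ones quoted above:

```python
# Motion-to-photon budget check against the article's response-time
# targets. Component latencies below are illustrative assumptions.

TARGETS_MS = {
    "immersive_gaming": 20,     # slightest delay risks nausea
    "immersive_360_video": 100, # somewhat more tolerant
}

def within_budget(component_delays_ms: list, use_case: str) -> bool:
    """True if the summed component delays fit the use case's target."""
    return sum(component_delays_ms) <= TARGETS_MS[use_case]

# 5ms RTT to an edge node + 8ms cloud render + 4ms decode/display
# fits the 20ms gaming budget; a 60ms RTT to a distant data center
# blows it, but still fits the 100ms 360-video budget.
print(within_budget([5, 8, 4], "immersive_gaming"))      # True
print(within_budget([60, 8, 4], "immersive_gaming"))     # False
print(within_budget([60, 8, 4], "immersive_360_video"))  # True
```

This arithmetic is why the article expects immersive gaming, specifically, to require edge-computing partnerships: only a nearby point of presence keeps the network leg small enough to leave room for rendering and display.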
The other aspect of a cloud-based solution is cloud-based transcoding. Live, on-demand transcoding is not a new concept; just-in-time transcoding has long been explored as an alternative to storage-intensive Cloud DVR architectures. We’ve spoken in the past about its value for 360-video systems as well. As transcoding vendors take maximal advantage of the state-of-the-art processors, FPGAs and GPUs available in public and private data centers, the capacities and latencies which can be supported by JITT systems continue to improve. We forecast revenue potential of USD 7.5 billion in entertainment and over USD 20 billion in gaming by 2021. Achieving this market potential will depend on how effectively VR services can exploit the cloud to maximize reach and quality while minimizing cost of delivery.
Acknowledgements: Material from briefings with companies including b<>com, Ericsson, Fraunhofer, Haivision, Harmonic, Jaunt VR, Nanjing Yunyan Technology, NGCodec, Vantrix and others served as the source for developing this insight.