Adaptive 360 VR Video Streaming based on MPEG-DASH SRD

Mohammad Hosseini*†, Viswanathan Swaminathan†
*University of Illinois at Urbana-Champaign (UIUC)
†Adobe Research, San Jose, USA
Email: [email protected], [email protected]

Abstract—We demonstrate an adaptive, bandwidth-efficient 360 VR video streaming system based on MPEG-DASH SRD. We extend MPEG-DASH SRD to the 3D space of 360 VR videos and showcase a dynamic view-aware adaptation technique to tackle the high bandwidth demands of streaming 360 VR videos to wireless VR headsets. We spatially partition the underlying 3D mesh into multiple 3D sub-meshes and construct an efficient 3D geometry mesh, called a hexaface sphere, to optimally represent tiled 360 VR videos in 3D space. We then spatially divide the 360 videos into multiple tiles while encoding and packaging, use MPEG-DASH SRD to describe the spatial relationship of the tiles in 3D space, and prioritize the tiles in the Field of View (FoV) for view-aware adaptation. Our initial evaluation results show that we can save up to 72% of the required bandwidth for 360 VR video streaming, with only minor negative quality impacts, compared to the baseline scenario in which no adaptation is applied.

I. INTRODUCTION

360 VR videos are immersive spherical videos mapped onto a 3D geometry, where the user can look around during playback using a VR head-mounted display (HMD). Unfortunately, 360 VR videos are extremely bandwidth-intensive, especially with 4K video resolution widely viewed as a functional minimum for current HMDs and 8K or higher desired. A major challenge is therefore how to efficiently transmit these bulky 360 VR videos to bandwidth-constrained wireless VR HMDs at acceptable quality levels, given their high bitrate requirements.

In this work, we are motivated by 360 VR video applications with 8K and 12K resolutions and the data rate issues that such rich multimedia systems face. We extend MPEG-DASH SRD to the 3D VR environment and showcase how to utilize a semantic link between the user's viewport, spatial partitioning, and stream prioritization using a divide-and-conquer approach. Once 360 videos are captured, we spatially divide them into multiple video tiles while encoding and packaging; these tiles are efficiently textured onto an underlying 3D geometry mesh that we construct, called a hexaface sphere. We then use MPEG-DASH SRD to describe the spatial relationship of the tiles in 3D space, and develop a prioritized view-aware approach to conquer the intense bandwidth requirements. We showcase our demo using a real-platform wireless HMD and multiple 360 VR video sequences, and show that our adaptations significantly reduce the total bandwidth required to deliver a high-quality immersive experience. Our approach can further increase the overall 360 VR video quality at a given bandwidth, virtually allowing 8K and higher VR video resolutions.

II. METHODOLOGY

While watching a 360-degree video on a VR HMD, a user views only a small portion of the full 360 degrees. That small portion corresponds to a specific confined region on the underlying 3D spherical mesh, which is spatially related to the corresponding portion of the raw content. For example, the Samsung Gear VR HMD offers a 96-degree FoV, meaning it can only cover about a quarter of a whole 360-degree-wide content horizontally.

To decrease the bandwidth requirements of 360 VR videos, we use a prioritized view-aware technique and stream the tiles inside the viewport at the highest resolution, at or near the native resolution of the HMD. To achieve this, our approach consists of two parts. First, we spatially partition the 360 video into multiple tiles. We extend the features of MPEG-DASH SRD to the 3D space and define a reference space for each video tile, corresponding to the rectangular region encompassing the entire raw 360-degree video. Second, we partition the underlying 3D geometry into multiple segments, each representing a sub-mesh of the original 3D mesh with a unique identifier, in a two-step process. In the first step, using the concepts of slices and stacks from spherical 3D reconstruction, we split the sphere programmatically into three major parts: the top cap, the middle body, and the bottom cap. The middle body covers 2β degrees, given the vertical FoV settings of the VR HMD. In the second step, we further split the middle body into four sub-meshes, each covering α degrees (90 degrees in this case) of the entire 360-degree-wide screen, given the horizontal FoV settings of the HMD. With this process, our projection results in a combination of six 3D sub-spherical meshes that we call a hexaface sphere 3D mesh. Finally, a mapping mechanism is defined for the spatial positioning of the tiles in 3D space, so that each tile is textured onto its corresponding 3D mesh segment. Figure 1 illustrates our hexaface sphere 3D geometry. Figure 2 shows how our tiling process is applied to an example 360 video frame, according to our hexaface sphere geometry.

Fig. 1. Visual overview of a generated hexaface sphere.

Fig. 2. Various tiles of an example 360 video (Karate) according to the six 3D meshes of our hexaface sphere 3D geometry.
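To make the two-step split concrete, the following minimal Python sketch computes the angular confines of the six sub-meshes from the FoV settings. It is our own illustration: the names (hexaface_segments, beta_deg, alpha_deg) are hypothetical rather than taken from the authors' implementation, and the default β = 45 degrees assumes the middle body is sized to the prototype's 90-degree vertical FoV (2β = 90).

    # Sketch: angular confines of the six hexaface sphere sub-meshes.
    # Latitude runs from -90 (bottom pole) to +90 (top pole) degrees;
    # longitude runs from 0 to 360 degrees around the sphere.
    def hexaface_segments(beta_deg=45.0, alpha_deg=90.0):
        """Step 1 splits the sphere into a top cap, a middle body
        spanning 2*beta_deg degrees vertically, and a bottom cap.
        Step 2 splits the middle body into four sub-meshes, each
        alpha_deg degrees wide (4 * alpha_deg must cover 360)."""
        assert abs(4 * alpha_deg - 360.0) < 1e-9
        segments = [
            {"id": "top_cap", "lat": (beta_deg, 90.0), "lon": (0.0, 360.0)},
            {"id": "bottom_cap", "lat": (-90.0, -beta_deg), "lon": (0.0, 360.0)},
        ]
        for i in range(4):  # four middle-body sub-meshes
            segments.append({"id": "body_%d" % i,
                             "lat": (-beta_deg, beta_deg),
                             "lon": (i * alpha_deg, (i + 1) * alpha_deg)})
        return segments  # six 3D sub-mesh confines in total

    # Example: list the confines each video tile is textured onto.
    for seg in hexaface_segments():
        print(seg["id"], "lat:", seg["lat"], "lon:", seg["lon"])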
To enable view awareness, we follow three steps to create valid confines of unit quaternions, set specifically for each of the hexaface sphere's 3D mesh segments. We first convert Euler angles to a unit quaternion representation for VR device orientation tracking, and compute an array corresponding to a normalized direction vector from that quaternion. We then combine these values to compute the confines of the segment-specific quaternion representations inside the hexaface sphere. With the confines of each 3D mesh segment defined, we identify which 3D segments, and hence which tiles, intersect the user's viewport, and implement our viewport tracking at every frame. With viewport tracking in place, we then apply our prioritized view-aware adaptation: we dynamically deliver higher-bitrate content to the tiles within the user's FoV and assign lower-quality content to the area outside the user's immediate FoV.
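As a rough illustration of this per-frame logic, the sketch below converts the HMD's Euler angles to a unit quaternion, derives the gaze direction, tests it against the segment confines from the hexaface_segments() sketch above, and assigns representations accordingly. The quaternion composition is standard math; the lat/lon box-overlap viewport test and the REP1/REP4 assignment policy are simplifications we assume for illustration, not the authors' exact three-step procedure.

    import math

    def qmul(a, b):
        """Hamilton product of two quaternions (w, x, y, z)."""
        aw, ax, ay, az = a
        bw, bx, by, bz = b
        return (aw * bw - ax * bx - ay * by - az * bz,
                aw * bx + ax * bw + ay * bz - az * by,
                aw * by - ax * bz + ay * bw + az * bx,
                aw * bz + ax * by - ay * bx + az * bw)

    def euler_to_quaternion(yaw, pitch, roll):
        """Compose yaw (about +y, up), pitch (about +x), roll (about +z),
        all in radians, into a unit quaternion."""
        qy = (math.cos(yaw / 2), 0.0, math.sin(yaw / 2), 0.0)
        qp = (math.cos(pitch / 2), math.sin(pitch / 2), 0.0, 0.0)
        qr = (math.cos(roll / 2), 0.0, 0.0, math.sin(roll / 2))
        return qmul(qmul(qy, qp), qr)

    def gaze_latlon(q):
        """Rotate the forward axis (0, 0, -1) by q and map the resulting
        normalized direction vector to (latitude, longitude) degrees."""
        w, x, y, z = q
        dx = -(2 * x * z + 2 * w * y)
        dy = -(2 * y * z - 2 * w * x)
        dz = -(1 - 2 * x * x - 2 * y * y)
        lat = math.degrees(math.asin(max(-1.0, min(1.0, dy))))
        lon = math.degrees(math.atan2(dx, -dz)) % 360.0
        return lat, lon

    def visible_segments(segments, lat, lon, h_fov=96.0, v_fov=90.0):
        """Approximate the viewport as a lat/lon box around the gaze
        point; return ids of segments whose confines overlap it."""
        ids = []
        for s in segments:
            lat_lo, lat_hi = s["lat"]
            lon_lo, lon_hi = s["lon"]
            lat_ok = (lat - v_fov / 2) <= lat_hi and \
                     (lat + v_fov / 2) >= lat_lo
            center = (lon_lo + lon_hi) / 2
            dlon = abs(lon - center)
            dlon = min(dlon, 360.0 - dlon)  # wrap around at 360 degrees
            lon_ok = (lon_hi - lon_lo) >= 360.0 or \
                     dlon <= h_fov / 2 + (lon_hi - lon_lo) / 2
            if lat_ok and lon_ok:
                ids.append(s["id"])
        return ids

    def assign_representations(segments, visible_ids):
        """Prioritized view-aware adaptation: highest representation for
        tiles inside the viewport, a low one for peripheral tiles."""
        return {s["id"]: ("REP1" if s["id"] in visible_ids else "REP4")
                for s in segments}

    # Per-frame usage: track orientation, then pick per-tile bitrates.
    lat, lon = gaze_latlon(euler_to_quaternion(math.radians(30), 0.0, 0.0))
    segs = hexaface_segments()
    print(assign_representations(segs, visible_segments(segs, lat, lon)))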
III. EVALUATION

To evaluate our work, we used the Samsung Gear VR HMD mounted with the Samsung Galaxy S7 smartphone as our target VR platform. We used the Oculus Mobile SDK 1.0.3 together with the Android SDK API 24 to develop a 360 VR video streaming application prototype based on MPEG-DASH SRD, and used this prototype to apply our adaptations and run experiments. Our VR platform provides a total resolution of 2560x1440 (1280x1440 per eye), a maximum frame rate of 60 FPS, and a horizontal FoV of 96 degrees. We set the vertical FoV of our 360 VR video prototype to 90 degrees. We prepared five different 360-degree videos with various resolutions, publicly available on YouTube, as test sequences for applying our adaptations. We encoded all video tiles into four different representations (REP1 to REP4, from highest to lowest resolution) and used MPEG-DASH SRD to describe our tiling.

We compared the relative bandwidth usage of our adaptations against the baseline case where no adaptation is applied (the 360 VR video is tiled, no view awareness is present, and all tiles are streamed at the highest representation, REP1). Figure 3 presents results for only a small subset of our experiments, covering all of our benchmarks, with ratios normalized. Our results show that our adaptations can reduce bandwidth usage by up to 72% compared to the baseline case in which our adaptation approach is not employed.

Fig. 3. A comparison of the bandwidth savings of streaming different 360 VR videos using our adaptations versus tiled streaming with no adaptation. (The chart plots the relative bandwidth ratios of Proposed (REP4), Proposed (REP2), and Tiled (REP1) for the Waldo, Plane, Star Wars, Racing Car, and Karate sequences.)

Figure 4 shows two sample screenshots from the experiments on Waldo. While the highest possible representation (REP1, resolution of 960x1920) is delivered to the main front tile, in Figure 4 (Top) the lowest representation is delivered to the peripheral tile on the right (REP4, resolution of 240x480), whereas in Figure 4 (Bottom) the peripheral tile on the right is assigned the second-highest representation (REP2, resolution of 720x1440). The red confines mark the approximate area of the lower-quality peripheral tile.

Fig. 4. Visual comparison of a specific frame within a sample 360 VR video with the peripheral tiles at the lowest resolution. (Top) REP4 with a resolution of 240x480. (Bottom) REP2 with a resolution of 720x1440.

In our demo, we show that even the lowest representation on the peripheral tiles outside the immediate viewport results in only minor visual changes from the user's perspective, sometimes not even perceptible, while the original quality of the main viewport is maintained to ensure a satisfactory user experience. Overall, considering the significant bandwidth savings achieved by our adaptations, it is reasonable to believe that many 360 VR video users would accept such minor visual changes given their limited bandwidth. More technical details are presented in our other publication [1].

REFERENCES

[1] M. Hosseini and V. Swaminathan, "Adaptive 360 VR video streaming: Divide and conquer!" in Proceedings of the 2016 IEEE International Symposium on Multimedia (ISM'16), San Jose, USA, 2016.