Mid Sweden University
The Department of Information Technology and Media (ITM)

Author: Håkan Andersson
E-mail address: [email protected]
Study programme: B. Sc. in electronics engineering, 180 ECTS
Examiner: Mårten Sjöström, Mid Sweden University, [email protected]
Tutors: Roger Olsson, Mid Sweden University, [email protected]
Scope: 19752 words inclusive of appendices
Date: 2010-11-25

B.Sc. Thesis within Electrical Engineering C, 15 ECTS

3D Video Playback
A modular cross-platform GPU-based approach for flexible multi-view 3D video rendering

Håkan Andersson

Abstract

The evolution of depth-perception visualization technologies, emerging format standardization work and research within the field of multi-view 3D video and imagery all call for flexible 3D video visualization. The wide variety of available 3D display types and visualization techniques for multi-view video, together with the high throughput requirements of high-definition video, creates a need for a real-time 3D video playback solution that takes advantage of hardware-accelerated graphics while providing a high degree of flexibility through format configuration and cross-platform interoperability. A modular, component-based software solution is proposed that uses FFmpeg for video demultiplexing and video decoding, OpenGL and GLUT for hardware-accelerated graphics, and POSIX threads for increased CPU utilization. The solution has been verified to have sufficient throughput to display 1080p video at the native video frame rate on the experimental system, which is regarded as a standard high-end desktop PC using only commercial hardware.
In order to evaluate the performance of the proposed solution, a number of throughput evaluation metrics have been introduced that measure the average frame rate as a function of video bit rate, video resolution and number of views. The results indicate that the GPU constitutes the primary bottleneck in a multi-view lenticular rendering system and that multi-view rendering performance degrades as the number of views increases. This is attributed to current GPU square-matrix texture cache architectures: when the number of views is high, texture lookup access times correspond to those of random memory access patterns. The proposed solution has been found to have low CPU efficiency, i.e. low CPU hardware utilization, and it is recommended to investigate the performance gains of scalable multithreading techniques. It is also recommended to investigate the gains of introducing video frame buffering in video memory, or of moving more calculations to the CPU in order to increase GPU performance.

Keywords: 3D Video Player, Multi-view Video, Lenticular Rendering, Auto-stereoscopy, 3D Visualization, FFmpeg, GPU, OpenGL, C.

Acknowledgements

I would like to thank my supervisor Roger Olsson, Ph.D. in telecommunications, Mid Sweden University, Sundsvall, Sweden, for all his support throughout this work. His reviews of my report and our discussions of the technical problems and design issues that arose during this project were very valuable to me and helped me achieve the results presented in this thesis.
Table of Contents

Abstract
Acknowledgements
Terminology
1 Introduction
  1.1 Background and problem motivation
  1.2 Overall aim
  1.3 Scope
  1.4 Concrete and verifiable goals
  1.5 Outline
  1.6 Contributions
2 Three dimensional visualization
  2.1 Human depth perception
  2.2 Stereoscopy
    2.2.1 Positive parallax
    2.2.2 Negative parallax
    2.2.3 Rendering stereo pairs
  2.3 Auto-stereoscopy
    2.3.1 Barrier strip displays
    2.3.2 Lenticular displays
3 Three dimensional video
  3.1 3D Video Formats
    3.1.1 Pre-processed raw video
    3.1.2 Multi-view video
    3.1.3 Video-plus-depth
  3.2 3D Video Players
    3.2.1 Stereoscopic player
    3.2.2 Visumotion 3D Movie Center
    3.2.3 Spatial View SVI Power Player
  3.3 Video application development frameworks
    3.3.1 Apple Quicktime
    3.3.2 Microsoft DirectShow
    3.3.3 FFMPEG
    3.3.4 Components
    3.3.5 FFDShow
  3.4 Hardware Requirements
4 Hardware-accelerated graphics
  4.1 Video card overview
    4.1.1 Vertex processor
    4.1.2 Fragment processor
  4.2 OpenGL
5 Methodology
  5.1 Experimental methodology to evaluate performance
  5.2 Evaluation of cross-platform interoperability
  5.3 Verifying sub-pixel spatial multiplexing
  5.4 Hardware and software resources
6 Design
  6.1 Alternative design solutions
  6.2 Design considerations and system design overview
    6.2.1 Design considerations for high performance
    6.2.2 Design for flexibility and platform independence
  6.3 3D video player component design
    6.3.1 Architectural overview
    6.3.2 Demultiplexing
    6.3.3 Video frame decoding
    6.3.4 Color conversion
    6.3.5 Synchronization, filtering and rendering
  6.4 3D video filter pipeline
    6.4.1 Defining input and output
    6.4.2 Filter input parameters
    6.4.3 Video filter processing pipeline
    6.4.4 Generic multi-view texture mapping coordinates
    6.4.5 Texture transfers
    6.4.6 Spatial multiplexing
  6.5 Optimization details
7 Result
  7.1 Throughput as a function of video format bitrate
  7.2 Throughput as a function of video resolution
  7.3 Throughput as a function of the number of views
  7.4 Average frame rate in relation to native frame rate
  7.5 Load balance
  7.6 Cross-platform interoperability
  7.7 Validation of spatially multiplexed video
8 Conclusion
  8.1 Evaluation of system throughput
  8.2 Multi-view and GPU performance
  8.3 Evaluation of load balance and hardware utilization
  8.4 Evaluation of cross-platform interoperability
  8.5 Evaluation of spatial multiplexing correctness
  8.6 Future work and recommendations
References
Appendix A: System specification for experiments
  Hardware specification
  Specification of software used throughout this project
Appendix B: Pixel buffer object (PBO) performance
Terminology

Mathematical notation

Symbol          Description
$f_{AVG}$       The measured average frame rate, in processed frames per second, with video synchronization turned off.
$f_{NATIVE}$    The native (intended) video frame rate.
$\Delta f$      The difference between the measured average frame rate and the native video frame rate.
$N_C$           Index of the disparate view from which the RGB sub-component $C \in \{R, G, B\}$ should be fetched.
$S_N$           A spatially multiplexed video frame (texture) that contains pixel data from $N$ disparate views.
$T_N$           The set of $N$ texture mapping coordinate quadruples, or point pairs $(u_1, v_1), (u_2, v_2)$, that correspond to the view alignment in a tiled multi-view texture with $N$ views.
$V_n$           A set of multiple views (multiple textures) that corresponds to the n:th video frame in a sequence of multi-view video frames.

1 Introduction

The optical principles of displaying images with inherent depth and naturally changing perspective have been known for over a century, but only recently have displays become available that can present 3D images and video without requiring user-worn glasses, so-called auto-stereoscopic displays. Different compression formats for stereo and multi-view based 3D video are in the process of being standardized by the Moving Picture Experts Group (MPEG) as well as ISO/IEC [1]. Software capable of decoding the different compression and encoding formats, and of presenting 3D video on different display types using standardized players, is a vital part of the commercialization and evolution of 3D video. The multi-perspective nature of 3D video entails multiple data sources. Hence there is a high demand for fast data processing and, as a direct consequence, hardware-accelerated solutions are of particular interest.
1.1 Background and problem motivation

The wide variety of 3D display types and visualization techniques for multi-view video requires that the pixels from each view be mapped and aligned differently depending on the video format and the specific features of the visualization device. This implies that video generated exclusively for a specific device type will not be displayed correctly on other kinds of visualization devices. This is a problem, as the same video content has to be replicated in several different versions in order to be presented correctly on different visualization devices and screen resolutions.

Several generic 3D video formats, such as multi-view video and video-plus-depth, have been proposed to address this problem as well as video compression issues. Using any of these generic formats, video can be interpreted and processed in real-time to generate the pixel mappings required to display the 3D video content correctly. A generic format for representing 3D video thus eliminates the need to generate several different pre-processed versions of the same video content [2].

The number of publicly available software video players capable of decoding and displaying 3D video is very limited, and the few players available involve licensing fees, such as 3D Movie Center by Visumotion [3] and Stereoscopic player by Peter Wimmer [4]. In addition to the small range of available 3D video players, only a handful of pre-defined video formats are usually supported, most commonly conventional stereo. Another disadvantage of these players is platform dependency.
The lack of flexibility and limited functionality in current 3D video players means that there is a need for a flexible cross-platform playback solution that can easily be configured and extended to support a wide range of both current and emerging displays and 3D video format standards. It is also desirable to investigate any associated hardware bottlenecks.

1.2 Overall aim

The overall aim of this project is to design and implement a video playback solution capable of displaying the basic 3D video formats. The possibilities of creating a playback solution built on top of cross-platform, open-source libraries for video decoding and hardware-accelerated graphics will also be investigated in this work.

Interpreting video data and converting it in real-time to a format that is compliant with the display places high demands on system hardware as well as algorithm efficiency, and it is therefore also of great interest to identify bottlenecks in the processing of 3D video content. By measuring hardware utilization and video throughput in terms of frame rate, this thesis aims to highlight throughput-related hardware problems associated with real-time 3D video processing. This, in turn, might be valuable for future research, especially in fields such as 3D video compression and video encoding as well as playback system design.

Moreover, the project aims to identify and propose a software architecture that is sufficiently efficient to process and present high-definition 3D video in real-time, yet sufficiently flexible to support both current 3D video formats and emerging standards. It is also highly desirable to explore the possibilities of implementing 3D video support in currently available video player software by extending the functionality of the existing software.
This would eliminate the need to implement synchronization, audio decoding, audio playback etc., which would be required if a video player were designed and implemented from scratch.

1.3 Scope

This study is primarily focused on designing and implementing a 3D video playback solution for multi-view auto-stereoscopic content, primarily for lenticular displays. Hence, software architectural design and generalization of 3D video formats and display technologies are of greater interest than the implementation of extensive support for specific formats, display types or display brands.

The comparison of suitable frameworks and libraries to use within this project is restricted to cross-platform and open-source solutions. In addition, only frameworks that are non-commercial and free of charge are of interest. The choice of frameworks, libraries and platforms used throughout this project will be based purely on the results of the theoretical studies of related work and of publicly available existing technologies. No experiments or benchmarking regarding this matter will be performed within this study.

The theoretical part of this work is moreover limited to offering the reader a brief introduction to the research field of auto-stereoscopy and 3D visualization, which is required in order to understand this work. Frameworks and libraries considered for this project will only be described briefly, except for key parts and technologies of particular interest for this work.

The practical part of this work aims at implementing a video playback solution as a simple prototype for research purposes, according to the technical requirements of this thesis. No extensive testing of this prototype, other than simple developer tests during the implementation phase, will be conducted within the scope of this project. Performance measurements will be performed on a single system, OS and hardware configuration only.