Spatial Audio for Virtual and Augmented Reality Devices
Virtual and augmented reality devices are a hot topic for research and product development. In most cases these devices take the form of head-mounted displays. Several audio subsystems are an integral part of these devices: spatial audio rendering, capture of the user’s voice, and capture of the environmental audio.
Unlike vision, where humans have an approximately 90° field of view, human hearing covers all directions in all three dimensions. This means that the spatial audio system of these devices is expected to provide realistic rendering of sound objects in full 3D to complement the stereoscopic rendering of the visual objects.
In this tutorial, we will discuss the problems and potential solutions around the spatial audio subsystem in devices for virtual and augmented reality. These include the personalization of Head-Related Transfer Functions (HRTFs) and the generation of proper reverberation and distance cues. Most of the problems and solutions have a broader scope and affect everyday scenarios such as listening to stereo music with headphones and watching movies on mobile devices.
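As a rough illustration of what HRTF-based rendering with a distance cue involves, the sketch below convolves a mono signal with a left/right pair of head-related impulse responses (the time-domain form of HRTFs) and applies a simple 1/r gain. It is a minimal sketch and not the method presented in the tutorial: the function name `render_binaural`, the toy impulse responses, and the 1/r distance model are all assumptions made for illustration, and reverberation is left out entirely.

```python
import numpy as np

def render_binaural(mono, hrir_left, hrir_right, distance_m=1.0):
    """Convolve a mono signal with an HRIR pair and apply a 1/r distance gain.

    Hypothetical helper: real systems use measured or personalized HRIRs
    selected by source direction, and add reverberation.
    """
    gain = 1.0 / max(distance_m, 0.1)  # clamp to avoid huge gains near the head
    left = np.convolve(mono, hrir_left) * gain
    right = np.convolve(mono, hrir_right) * gain
    return np.stack([left, right], axis=0)  # shape: (2, num_samples)

# Toy HRIRs (assumed, not measured): the source is on the listener's left,
# so the right ear receives the sound 3 samples later and attenuated.
hrir_left = np.array([1.0, 0.0, 0.0, 0.0])
hrir_right = np.array([0.0, 0.0, 0.0, 0.6])
mono = np.array([1.0, 0.5, 0.25])

stereo = render_binaural(mono, hrir_left, hrir_right, distance_m=2.0)
```

The interaural time and level differences encoded in the two impulse responses are what give the listener a sense of direction; personalization amounts to choosing responses that match the individual listener.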
Dr. Ivan Tashev received his Master’s degree in Electronic Engineering in 1984 and his PhD in Computer Science in 1990 from the Technical University of Sofia, Bulgaria. He was an Assistant Professor in the Department of Electronic Engineering of the same university in 1998, when he moved to Microsoft in Redmond, USA. Currently Dr. Tashev is a Partner Software Architect and leads the Audio and Acoustics Research Group in Microsoft Research Labs in Redmond, USA. Ivan Tashev has been a senior member of the IEEE since 2006 and a member of the Audio Engineering Society since 2006. He serves as a member and associate member of the IEEE SPS Audio and Acoustics Signal Processing Technical Committee and the IEEE SPS Industrial DSP Standing Committee. Since 2012 he has been an adjunct professor in the Department of Electrical Engineering of the University of Washington in Seattle, USA.
Dr. Tashev has published two scientific books as the sole author, written chapters in two other books, and authored or coauthored more than 70 publications in scientific journals and conferences. Ivan Tashev is listed as an inventor on 50 USA patent applications, 31 of them already granted. The audio processing technologies created by Dr. Tashev have been incorporated in Microsoft Windows, Microsoft Auto Platform, and the Microsoft Round Table device. Dr. Tashev served as the leading audio architect for Kinect for Xbox and Microsoft HoloLens. More details about him can be found on his web page https://www.microsoft.com/en-us/research/people/ivantash/.
Adaptive Streaming of Traditional and Omnidirectional Media
This tutorial consists of three main parts. In the first part, we provide a detailed overview of the HTML5 standard and show how it can be used for adaptive streaming deployments. In particular, we focus on the HTML5 video, media extensions, and multi-bitrate encoding, encapsulation and encryption workflows, and survey well-established streaming solutions. Furthermore, we present experiences from the existing deployments and the relevant de jure and de facto standards (DASH, HLS, CMAF) in this space. In the second part, we focus on omnidirectional (360°) media from creation to consumption. We survey means for the acquisition, projection, coding and packaging of omnidirectional media as well as delivery, decoding and rendering methods. Emerging standards and industry practices are covered as well. The last part presents some of the current research trends, open issues that need further exploration and investigation, and various efforts that are underway in the streaming industry.
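To make the idea of adaptive streaming concrete, the sketch below shows one simple way a client could choose among the multi-bitrate encodings: estimate throughput from recent segment downloads and pick the highest rung of the bitrate ladder that fits under a safety margin. This is only an assumed, throughput-based heuristic for illustration; it is not taken from the tutorial or from the DASH or HLS specifications, which leave the adaptation logic to the client. The function name `select_bitrate`, the ladder values, and the 0.8 safety factor are hypothetical.

```python
def select_bitrate(ladder_kbps, throughput_samples_kbps, safety=0.8):
    """Pick the highest bitrate that fits under a conservative throughput estimate.

    Hypothetical rate-adaptation heuristic: uses the harmonic mean of recent
    throughput samples, which is less sensitive to short spikes than the
    arithmetic mean.
    """
    if not throughput_samples_kbps:
        return min(ladder_kbps)  # no measurements yet: start at the lowest rung
    n = len(throughput_samples_kbps)
    estimate = n / sum(1.0 / s for s in throughput_samples_kbps)
    budget = safety * estimate
    candidates = [b for b in ladder_kbps if b <= budget]
    return max(candidates) if candidates else min(ladder_kbps)

# Assumed bitrate ladder and recent per-segment throughput measurements (kbit/s).
ladder = [400, 1200, 2500, 5000]
choice = select_bitrate(ladder, [3000, 4000, 6000])
```

With these sample values the harmonic-mean estimate is 4000 kbit/s, the budget after the safety margin is 3200 kbit/s, and the client would request the 2500 kbit/s representation for the next segment.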
Target Audience and Prerequisite Knowledge
This tutorial includes both introductory and advanced level information. The audience is expected to have an understanding of basic video coding and IP networking principles. Researchers, developers, content and service providers are all welcome.
Table of Contents
- Part I: The HTML5 Standard and Adaptive Streaming
  - HTML5 video and media extensions
  - Survey of well-established streaming solutions
  - Multi-bitrate encoding, and encapsulation and encryption workflows
  - The MPEG-DASH standard, Apple HLS and the developing CMAF standard
- Part II: Omnidirectional (360°) Media
  - Acquisition, projection, coding and packaging of 360° video
  - Delivery, decoding and rendering methods
  - The developing MPEG-OMAF and MPEG-I standards
- Part III: Open Issues and Future Directions
  - Common issues in scaling and improving quality, multi-screen/hybrid delivery
  - Ongoing industry efforts
Ali C. Begen recently joined the computer science department at Ozyegin University. Previously, he was a research and development engineer at Cisco, where he architected, designed and developed algorithms, protocols, products and solutions in the service provider and enterprise video domains. Currently, in addition to teaching and research, he provides consulting services to industrial, legal, and academic institutions through Networked Media, a company he co-founded. Begen holds a Ph.D. degree in electrical and computer engineering from Georgia Tech. He has received a number of scholarly and industry awards, and he holds editorial positions in prestigious magazines and journals in the field. He is a senior member of the IEEE and a senior member of the ACM. In January 2016, he was elected as a distinguished lecturer by the IEEE Communications Society. Further information on his projects, publications, talks, teaching, standards and professional activities can be found at http://ali.begen.net.
Christian Timmerer received his M.Sc. (Dipl.-Ing.) in January 2003 and his Ph.D. (Dr.techn.) in June 2006 (for research on the adaptation of scalable multimedia content in streaming and constrained environments) both from the Alpen-Adria-Universität (AAU) Klagenfurt. He joined the AAU in 1999 (as a system administrator) and is currently an Associate Professor at the Institute of Information Technology (ITEC) within the Multimedia Communication Group. His research interests include immersive multimedia communications, streaming, adaptation, Quality of Experience, and Sensory Experience. He was the general chair of WIAMIS 2008, QoMEX 2013, and MMSys 2016 and has participated in several EC-funded projects, notably DANAE, ENTHRONE, P2P-Next, ALICANTE, SocialSensor, COST IC1003 QUALINET, and ICoSOLE. He also participated in ISO/MPEG work for several years, notably in the area of MPEG-21, MPEG-M, MPEG-V, and MPEG-DASH where he also served as standard editor. In 2012 he cofounded Bitmovin (http://www.bitmovin.com/) to provide professional services around MPEG-DASH where he holds the position of the Chief Innovation Officer (CIO).
Future Video Coding: Beyond H.265/HEVC
Zhenzhong Chen (Wuhan University)
Jizheng Xu (Microsoft)
Visual Content Perception and Understanding
Over the past decades, there has been a growing interest in visual content perception and understanding with broad multimedia applications. As an important component of visual perception, visual quality assessment has attracted extensive attention from both academia and industry. It can be used not only in monitoring image quality distortions, but also in optimizing various image processing algorithms/systems. Visual recognition aims to analyze visual content from images and videos, and plays an important role in many visual content understanding systems such as face recognition, person re-identification, and visual search. In this tutorial, we review the past achievements, current status, and open problems of visual quality assessment and visual recognition.
Table of Contents
This tutorial will mainly introduce the development of visual quality assessment and visual recognition. In the first section, we briefly introduce the achievements of visual quality assessment during the past decade, and present the key advantages and disadvantages of existing visual quality assessment metrics. Then we will introduce some new visual quality assessment methods proposed by our team, which fall into two types: general-purpose and application-specific approaches. Some open problems in visual quality assessment are also discussed as directions for future research. In the second section, we introduce the basic concept of distance metric learning, and show the key advantages and disadvantages of existing distance metric learning methods in different visual understanding tasks. Then, we introduce some of our newly proposed distance metric learning methods such as deep metric learning, cost-sensitive metric learning, and Hamming distance metric learning. Lastly, we discuss some open problems in distance metric learning to show how more advanced metric learning algorithms for visual understanding could be developed in the future.
Yuming Fang, Ph.D
School of Information Technology
Jiangxi University of Finance and Economics, Nanchang, China
Biography: Yuming Fang is a Professor in the School of Information Technology, Jiangxi University of Finance and Economics, Nanchang, China. Previously, he was a (visiting) researcher in IRCCyN lab, PolyTech’ Nantes, Univ. Nantes, France, National Tsinghua University, Taiwan and University of Waterloo, Canada. He received his Ph.D. degree from Nanyang Technological University in Singapore, M.S. degree from Beijing University of Technology in China, and B.E. degree from Sichuan University in China. His research interests include visual quality assessment, visual attention, computer vision, etc. He has published over 90 scientific papers in these areas, including over 30 papers in the IEEE Transactions/Journals. He is an IEEE MMTC member. He serves as an associate editor of IEEE Access and Signal Processing: Image Communications. He is/was a TPC Co-Chair for ISITC 2016, a Program Chair for the Third Workshop on Emerging Multimedia Systems and Applications, an Area Chair for VCIP 2016, ICME 2016, ICME 2017, etc. He is a member of IEEE.
Jiwen Lu, Ph.D
Department of Automation
Tsinghua University, Beijing, China
Biography: Jiwen Lu is an Associate Professor with the Department of Automation, Tsinghua University, China. From March 2011 to November 2015, he was a Research Scientist at the Advanced Digital Sciences Center (ADSC), Singapore. His research interests include computer vision, pattern recognition, and machine learning. He has authored/co-authored over 150 scientific papers in these areas, of which 40 papers are published in the IEEE Transactions (4 PAMI and 10 TIP) and 20 papers are published in top-tier computer vision conferences (ICCV/CVPR/ECCV). He is an elected member of the Information Forensics and Security Technical Committee of the IEEE Signal Processing Society. He serves as an Associate Editor of Pattern Recognition Letters, Neurocomputing, and IEEE Access, and a Guest Editor of 5 journals such as Pattern Recognition, Computer Vision and Image Understanding, and Image and Vision Computing. He is/was a Program Co-Chair for ICGIP’17, an Area Chair for ICIP’17, BTAS’16, ICB’16, WACV’16, VCIP’16, ICME’15, and ICB’15, a Workshop Co-Chair for WACV’17 and ACCV’16, and a Special Session Co-Chair for VCIP’15. He is a senior member of the IEEE.
Image-based three-dimensional data acquisition
With advances in image processing technology, many image-based three-dimensional data acquisition techniques have been developed and used in various applications including medical diagnosis, machine part inspection, movie and video game production, and many others. The objective of this tutorial is to give an overview of the basic principles of these techniques and discuss their latest developments. The first part of this tutorial will introduce the basic principles of fringe projection profilometry (FPP), which is one of the popular structured light illumination techniques for measuring the 3D shape of objects in a non-contact manner. Some of the latest developments in robust FPP using sparse representation techniques will also be discussed. The second part of this tutorial will cover the topic of motion capture (Mocap) data processing. Techniques for Mocap data compression will be introduced, and the applications of depth map cameras will also be discussed.
Table of Contents
The first part of this tutorial is on the fringe projection profilometry (FPP) techniques for 3D measurement of objects’ shape. It starts with an introduction to the background and applications of the FPP techniques. Some current applications of FPP in 3D microscopy, 3D endoscopes and other 3D medical devices will be explained. Then the principles of two major FPP techniques will be elaborated: Fourier transform profilometry (FTP) and phase shifted profilometry (PSP). Their limitations in practical working environments will also be illustrated. These limitations in general lower the robustness of FPP in adverse working environments. Then two state-of-the-art sparse representation techniques for enhancing the robustness of FPP will be introduced. Simulation and experimental results will be shown to illustrate the effectiveness of these approaches. The second part of this tutorial is on motion capture data processing. It starts with an introduction to Mocap signal processing techniques. Some recent topics such as Mocap data recovery, low rank representation and trajectory-based representation of Mocap data will be discussed. In addition, the tutorial will also cover topics in Mocap sequence compression using low rank matrix approximation and low latency Mocap data compression. At the end, two applications of Mocap data processing will be briefly discussed. Video demonstrations will be shown to illustrate the performance of the applications.
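As an illustration of the phase shifted profilometry principle mentioned above, the sketch below recovers the wrapped phase from N fringe images using the standard N-step formula, assuming the textbook fringe model I_n = A + B cos(φ + 2πn/N) with equally spaced shifts. It is a minimal sketch under that assumption, not the robust sparse-representation method covered in the tutorial; the function name `wrapped_phase` and the synthetic test pattern are hypothetical. Phase unwrapping and the phase-to-height conversion are omitted.

```python
import numpy as np

def wrapped_phase(images):
    """Estimate the wrapped phase from N equally phase-shifted fringe images.

    Assumes I_n = A + B*cos(phi + 2*pi*n/N) for n = 0..N-1 with N >= 3.
    Returns phi wrapped to (-pi, pi].
    """
    images = np.asarray(images, dtype=float)
    n = images.shape[0]
    if n < 3:
        raise ValueError("at least three phase-shifted images are required")
    shifts = 2.0 * np.pi * np.arange(n) / n
    # Weighted sums over the image index; the constant term A cancels out.
    s = np.tensordot(np.sin(shifts), images, axes=(0, 0))
    c = np.tensordot(np.cos(shifts), images, axes=(0, 0))
    return np.arctan2(-s, c)

# Synthetic 4-step example (assumed values): a known phase map inside (-pi, pi).
true_phase = np.linspace(-3.0, 3.0, 8).reshape(2, 4)
shifts = 2.0 * np.pi * np.arange(4) / 4
fringes = [0.5 + 0.4 * np.cos(true_phase + d) for d in shifts]

recovered = wrapped_phase(fringes)
```

On noise-free synthetic data the recovered phase matches the true phase; in practice, noise, ambient light changes, and object motion between shots degrade the estimate, which is the kind of limitation that motivates more robust approaches.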
Daniel P.K. Lun received his B.Sc.(Hons.) degree from the University of Essex, U.K., and PhD degree from the Hong Kong Polytechnic University in 1988 and 1991, respectively. In 1991, he joined the Hong Kong Polytechnic University and is now an Associate Professor and Interim Head of the Department of Electronic and Information Engineering. His research interests include wavelet theory, signal and image enhancement, computational imaging and 3D model reconstruction. He was the Chairman of the IEEE Hong Kong Chapter of Signal Processing in 1999-00. He was the General Chair of the 2004 International Symposium on Intelligent Multimedia, Video and Speech Processing (ISIMP 2004), and General Co-Chair of the 19th International Conference on Digital Signal Processing (DSP 2014). He was also the Technical Co-Chair of the 2015 APSIPA Annual Summit and Conference (APSIPA ASC 2015), and Technical Co-Chair of the 2015 IEEE 20th International Conference on Digital Signal Processing (DSP 2015). He was an Area Chair of ICME 2016 and 2017. He was an executive committee member of a number of other international conferences including ICASSP 2003, ICIP 2010 and ICME 2017. He received the Certificate of Merit from the IEEE Signal Processing Society for dedication and leadership in organizing ICIP 2010. He and his research students have also received three best paper awards in international conferences. He was the Editor of HKIE Transactions published by the Hong Kong Institution of Engineers (HKIE) in the area of Electrical Engineering. He was also the leading guest editor of a special issue of EURASIP journal of Advances in Signal Processing. He is currently an Associate Editor of IEEE Signal Processing Letters. He is a Chartered Engineer, a fellow of IET, a corporate member of HKIE and a senior member of IEEE. Further information on his publications, academic and professional activities can be found at www.eie.polyu.edu.hk/~enpklun.
Lap-Pui Chau received the Bachelor degree from Oxford Brookes University, and the Ph.D. degree from The Hong Kong Polytechnic University, in 1992 and 1997, respectively. His research interests include fast visual signal processing algorithms, light-field imaging, VLSI for signal processing, and human motion analysis. He was a General Chair for the IEEE International Conference on Digital Signal Processing (DSP 2015) and the International Conference on Information, Communications and Signal Processing (ICICS 2015). He was a Program Chair for the International Conference on Multimedia and Expo (ICME 2016), Visual Communications and Image Processing (VCIP 2013) and the International Symposium on Intelligent Signal Processing and Communications Systems (ISPACS 2010). He was the chair of the Technical Committee on Circuits & Systems for Communications (TC-CASC) of the IEEE Circuits and Systems Society from 2010 to 2012. He served as an associate editor for IEEE Transactions on Multimedia, IEEE Signal Processing Letters, IEEE Transactions on Circuits and Systems for Video Technology, and IEEE Circuits and Systems Society Newsletter, and is currently serving as an associate editor for IEEE Transactions on Circuits and Systems II, IEEE Transactions on Broadcasting, and The Visual Computer (Springer Journal). In addition, he was an IEEE Distinguished Lecturer for 2009-2016, and a steering committee member of IEEE Transactions for Mobile Computing from 2011 to 2013. He is an IEEE Fellow. Further information on his publications and research achievements can be found at http://www.ntu.edu.sg/home/elpchau.