Xianglong Feng's Project list

I have a wide range of research interests, spanning from hardware security to multimedia systems. My research follows three main tracks:

Optimization of multimedia streaming systems.
Security of multimedia systems (from low-level hardware security to upper-level application security).
AI for next-generation wireless communication networks.

My research projects cover the following research topics:

Hardware security
Security of machine learning
Heterogeneous computing platforms / IoT / mobile platforms

Machine learning / AI / computer vision
VR/AR video streaming systems
5G/6G and next-generation wireless communication

User viewport prediction for live VR video streaming system

Given the demands of high resolution and high frame rate in VR streaming to ensure the user’s quality of experience (QoE), VR video content is typically huge in size and thus poses significant challenges for network bandwidth. Even if a single VR viewing session or a small number of them can be supported by state-of-the-art high-bandwidth networks, video streaming services can involve millions of concurrent viewing sessions, which would create significant capacity challenges in both the backbone and edge networks. Such challenges would eventually translate into degraded QoE at the user end, significantly hindering the wider deployment of premium VR experiences.

A potential solution to the bandwidth challenge of VR streaming leverages the fact that the user can only watch a viewport of around 90 degrees at any point in time, so the rest of the 360-degree frame (more than 80% of the video content) does not need to be delivered to the mobile head-mounted display (HMD). One example solution is selective streaming [38], which streams the portion of the video that the user is more likely to watch in high resolution, while the rest of the video is delivered in low resolution.
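To make the bandwidth argument concrete, the sketch below estimates the bitrate of tile-based selective streaming. It is a minimal sketch under assumed parameters: a 12 x 6 grid of 30-degree tiles on an equirectangular frame and hypothetical per-tile bitrates of 500 kbps (high resolution) and 50 kbps (low resolution). None of these values come from [38], and `viewport_tiles` and `selective_bitrate` are illustrative names.

```python
import math

# Hypothetical parameters for illustration only (not taken from [38]).
TILE_DEG = 30                                  # each tile spans 30 x 30 degrees
COLS, ROWS = 360 // TILE_DEG, 180 // TILE_DEG  # 12 x 6 tiles per equirectangular frame
HIGH_KBPS, LOW_KBPS = 500, 50                  # assumed per-tile bitrates


def tile_range(lo_deg, hi_deg):
    """Indices of tiles whose [start, start + TILE_DEG) span overlaps [lo_deg, hi_deg)."""
    return range(math.floor(lo_deg / TILE_DEG), math.ceil(hi_deg / TILE_DEG))


def viewport_tiles(yaw, pitch, fov=90):
    """Tiles (col, row) overlapping a fov x fov viewport centred at (yaw, pitch).

    yaw is in [-180, 180) and pitch in [-90, 90]. The viewport is treated as a
    rectangle in equirectangular coordinates, ignoring projection distortion.
    """
    half = fov / 2
    # Shift longitude to [0, 360) and latitude to [0, 180] so tile indices start at 0.
    cols = {c % COLS for c in tile_range(yaw - half + 180, yaw + half + 180)}  # wraps at +/-180
    lat_lo, lat_hi = max(pitch - half, -90) + 90, min(pitch + half, 90) + 90   # clamp at the poles
    rows = {r for r in tile_range(lat_lo, lat_hi) if 0 <= r < ROWS}
    return {(c, r) for c in cols for r in rows}


def selective_bitrate(yaw, pitch):
    """Total bitrate when only the predicted viewport tiles are sent in high resolution."""
    n_high = len(viewport_tiles(yaw, pitch))
    return n_high * HIGH_KBPS + (COLS * ROWS - n_high) * LOW_KBPS


full = COLS * ROWS * HIGH_KBPS
sel = selective_bitrate(0, 0)
print(f"full: {full} kbps, selective: {sel} kbps, saving: {1 - sel / full:.0%}")
# full: 36000 kbps, selective: 10800 kbps, saving: 70%
```

Because whole tiles are fetched, 16 of the 72 tiles (about 22% of the frame) go out in high resolution here, a little more than the viewport itself covers (1/8 of the equirectangular frame for a 90 x 90 degree window). Finer tiles shrink that overshoot at the cost of more, smaller segments to encode and request.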

Solution A: Motion detection and feature tracking (LiveMotion)

We develop a new viewport prediction scheme that works with live 360-degree video streaming and complicated user head movement patterns. Unlike existing approaches, which rely on historical or current user behavior, our approach employs a user model that combines the video content and the user’s interests for viewport prediction. Our key idea is that, even though user behavior is hard to predict, there is often a close correlation between the user’s viewport of interest and the moving objects in the video. This is because most users focus on and follow the most active objects in the 360-degree frames, which are often the objects in motion (e.g., the ball in a soccer game). Therefore, we employ computer vision-based techniques to detect and track the objects in motion, which serve as predictors of the user’s future viewport of interest. In particular, we develop a set of algorithms that accommodate various user movement patterns following a Tracking-Recovery-Update-Evaluation (T-R-U-E) workflow.
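The core intuition, that the viewport tends to follow what moves, can be illustrated with plain frame differencing. This is a hedged sketch, not LiveMotion itself: it swaps in simple frame differencing for the detection and tracking techniques, does not implement the T-R-U-E workflow, and assumes grayscale equirectangular frames given as 2-D lists. The threshold of 30 and the function names are hypothetical.

```python
def motion_centroid(prev, curr, threshold=30):
    """Centroid (x, y) of pixels whose intensity changed by more than threshold
    between two equal-sized grayscale frames (2-D lists), or None if nothing moved."""
    xs, ys = [], []
    for y, (row_prev, row_curr) in enumerate(zip(prev, curr)):
        for x, (a, b) in enumerate(zip(row_prev, row_curr)):
            if abs(a - b) > threshold:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    return sum(xs) / len(xs), sum(ys) / len(ys)


def predict_viewport(prev, curr, last_view=(0.0, 0.0), threshold=30):
    """Predict the next viewport centre (yaw, pitch) in degrees from two
    equirectangular frames by pointing at the centroid of the moving pixels."""
    centroid = motion_centroid(prev, curr, threshold)
    if centroid is None:
        return last_view                          # no motion: keep the previous viewport
    height, width = len(curr), len(curr[0])
    x, y = centroid
    yaw = (x + 0.5) / width * 360.0 - 180.0       # column -> longitude, left edge = -180
    pitch = 90.0 - (y + 0.5) / height * 180.0     # row -> latitude, top row = +90
    return yaw, pitch


prev = [[0] * 8 for _ in range(4)]
curr = [row[:] for row in prev]
curr[1][6] = 255                                  # one "moving" pixel
print(predict_viewport(prev, curr))               # (112.5, 22.5)
```

A single centroid lands between objects when several things move at once, which is one reason to detect and track individual objects instead, as the scheme above does.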

Solution B: Online CNN-based viewport prediction (LiveDeep)

We explore the feasibility of using a single convolutional neural network (CNN) model for live viewport prediction and identify the limitations of this simple CNN structure. We further improve prediction performance by employing an alternating, hybrid deep learning approach that involves both CNN and long short-term memory (LSTM) models.
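One simple way to combine a content model and a motion model is weighted late fusion of their per-tile scores. The sketch below assumes that a CNN and an LSTM each already output one score per tile; the weight `alpha`, the top-`k` selection, and `fuse_and_select` are hypothetical and are not necessarily how LiveDeep combines its models.

```python
def fuse_and_select(cnn_scores, lstm_scores, alpha=0.5, k=4):
    """Weighted late fusion of two per-tile score maps.

    cnn_scores and lstm_scores are equal-length lists of per-tile scores in [0, 1],
    standing in for a content model (CNN) and a head-motion model (LSTM).
    Returns the sorted indices of the k highest fused tiles and the fused map.
    """
    if len(cnn_scores) != len(lstm_scores):
        raise ValueError("score maps must cover the same tiles")
    fused = [alpha * content + (1 - alpha) * motion
             for content, motion in zip(cnn_scores, lstm_scores)]
    ranked = sorted(range(len(fused)), key=lambda i: fused[i], reverse=True)
    return sorted(ranked[:k]), fused


tiles, fused = fuse_and_select([0.9, 0.1, 0.2, 0.6], [0.3, 0.9, 0.1, 0.8], k=2)
print(tiles)   # [0, 3]
```

Setting `alpha` to 1.0 or 0.0 falls back to one model alone, which makes the fused predictor easy to compare against each single-model baseline.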

Solution C: Object detection-based viewport prediction (LiveObj)

A computer vision-based user study

To analyze user viewing behavior, we first deploy the YOLOv3 object detection algorithm to detect the objects in each video frame. Then, we implement the tracking algorithm from Collins et al., combined with location-based verification, to match the objects between frames. After that, we parse the user head movement data and determine which objects the user has been watching during the live streaming session.
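The matching and attribution steps can be sketched as follows, assuming bounding boxes in pixel coordinates of the equirectangular frame. The Collins et al. tracker is not reproduced; only a location-based check is shown, with intersection over union (IoU) as an assumed overlap measure and a hypothetical threshold of 0.3. The viewport centre is assumed to be already projected to pixel coordinates from the head movement data, and all function names are illustrative.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_objects(prev, curr, min_iou=0.3):
    """Greedily carry integer object IDs from the previous frame to new detections.

    prev: {object_id: box}; curr: list of boxes. Highest-overlap pairs are matched
    first; detections without a sufficiently overlapping predecessor get new IDs.
    """
    pairs = sorted(((iou(pb, cb), pid, ci) for pid, pb in prev.items()
                    for ci, cb in enumerate(curr)), reverse=True)
    assigned, used_ids = {}, set()
    next_id = max(prev, default=-1) + 1
    for score, pid, ci in pairs:
        if score < min_iou:
            break                                   # remaining pairs overlap too little
        if pid in used_ids or ci in assigned:
            continue
        assigned[ci] = pid
        used_ids.add(pid)
    result = {}
    for ci, box in enumerate(curr):
        if ci not in assigned:
            assigned[ci] = next_id                  # a newly appearing object
            next_id += 1
        result[assigned[ci]] = box
    return result


def watched_object(objects, viewport_xy):
    """ID of the object whose box contains the viewport centre (nearest centre wins), or None."""
    vx, vy = viewport_xy
    hits = [(((b[0] + b[2]) / 2 - vx) ** 2 + ((b[1] + b[3]) / 2 - vy) ** 2, oid)
            for oid, b in objects.items() if b[0] <= vx <= b[2] and b[1] <= vy <= b[3]]
    return min(hits)[1] if hits else None
```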

The analysis of the above two videos reveals an important observation: the user’s viewport of interest (indicated by the temporal metric) is not correlated with the size of the object (indicated by the spatial metric). Instead, the user’s viewport is heavily dependent upon the semantics of the objects (i.e., their degree of importance or attractiveness) in the video, which validates our hypothesis that users tend to watch the meaningful objects that they are interested in.
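The text does not spell out how the temporal and spatial metrics are computed, so the sketch below uses assumed definitions: the temporal metric is the share of frames in which an object is under the viewport centre, and the spatial metric is its mean box area as a share of the frame. A Pearson correlation across objects then quantifies how strongly attention tracks object size. The inputs in the usage lines are toy values, not data from the study, and the function names are hypothetical.

```python
import math


def viewing_metrics(frames, frame_area):
    """Per-object temporal and spatial metrics.

    frames: list of (watched_id, objects) pairs, where objects maps object_id to a
    box (x1, y1, x2, y2) and watched_id is the object under the viewport centre (or None).
    temporal[id]: share of all frames in which the object was being watched.
    spatial[id]:  mean box area, as a share of the frame, over frames containing it.
    """
    watched, area_sum, seen = {}, {}, {}
    for watched_id, objects in frames:
        for oid, (x1, y1, x2, y2) in objects.items():
            area_sum[oid] = area_sum.get(oid, 0.0) + (x2 - x1) * (y2 - y1) / frame_area
            seen[oid] = seen.get(oid, 0) + 1
        if watched_id is not None:
            watched[watched_id] = watched.get(watched_id, 0) + 1
    temporal = {oid: watched.get(oid, 0) / len(frames) for oid in seen}
    spatial = {oid: area_sum[oid] / seen[oid] for oid in seen}
    return temporal, spatial


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


# Toy usage: three objects of different sizes seen in three frames.
boxes = {0: (0, 0, 10, 10), 1: (0, 0, 50, 20), 2: (60, 60, 90, 90)}
temporal, spatial = viewing_metrics([(0, boxes), (0, boxes), (2, boxes)], frame_area=100 * 100)
ids = sorted(temporal)
print(pearson([spatial[i] for i in ids], [temporal[i] for i in ids]))
```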

System architecture.

Error recovery strategies in selective streaming. (a) is the original frame (red: predicted viewport; yellow: actual viewport). (b) shows the user view with no recovery. (c) and (d) are the user views with a down-resolution rate of 20% and 60%, respectively.

Security

VVSec: Securing Volumetric Video Streaming via Benign Use of Adversarial Perturbation

Inspired by the nature of adversarial attacks, we propose a novel defense mechanism, VVSec, to protect the confidentiality of volumetric video. In a nutshell, VVSec adds adversarial perturbation at the sender (i.e., Alice) side of the volumetric video stream, so that even if Malice could extract the RGB-D facial information in plaintext, it would fail to pass face authentication due to the effect of the "adversarial" perturbation on the deep neural network. On the other hand, the original functionality of volumetric streaming, especially the perceivable quality of experience for human users, remains unchanged, as ensured by the design principle of adversarial perturbations.
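The benign use of adversarial perturbation can be illustrated with the fast gradient sign method (FGSM), swapped in here as a well-known example; the text does not say how VVSec crafts its perturbation. The sketch assumes a toy linear scorer standing in for the face-authentication network, so the gradient is just the weight vector. `match_score`, `fgsm_perturb`, and the epsilon value are hypothetical.

```python
def sign(v):
    """Return 1, -1 or 0 according to the sign of v."""
    return (v > 0) - (v < 0)


def match_score(weights, bias, x):
    """Toy stand-in for a face-authentication DNN: a linear score over flattened RGB-D values."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def fgsm_perturb(weights, x, epsilon=0.02):
    """One FGSM-style step that lowers match_score.

    For a linear score the gradient with respect to x is simply `weights`, so moving
    each value by -epsilon * sign(gradient) lowers the score by up to epsilon * ||w||_1
    while no value changes by more than epsilon. Values are clipped to [0, 1].
    """
    return [min(1.0, max(0.0, xi - epsilon * sign(w))) for w, xi in zip(weights, x)]


w, b = [0.5, -1.0, 2.0], 0.0
x = [0.5, 0.5, 0.5]
x_adv = fgsm_perturb(w, x, epsilon=0.1)
print(match_score(w, b, x), match_score(w, b, x_adv))
```

Each value moves by at most epsilon, which is the knob that keeps the change small for human viewers; against a real deep network the gradient would come from backpropagation rather than directly from the weights.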

Runtime Fault Injection Detection for FPGA-based DNN Execution Using Siamese Path Verification

Cloud-client system and how our SPV works.
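The text gives only the name of the approach, so the following is a generic redundant-execution check, assuming that the Siamese path computes the same DNN layer a second time and the two results are compared. It is not the actual SPV design; `dense_relu`, `paths_agree`, the tolerance, and the simulated weight corruption are hypothetical.

```python
def dense_relu(weights, bias, x):
    """One fully connected layer followed by ReLU; weights is a list of rows."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, bias)]


def paths_agree(primary, siamese, tol=1e-6):
    """True when two executions of the same layer produce matching outputs."""
    return len(primary) == len(siamese) and all(
        abs(p - s) <= tol for p, s in zip(primary, siamese))


weights = [[0.2, -0.4], [0.7, 0.1]]
bias = [0.05, -0.1]
x = [1.0, 0.5]

clean = dense_relu(weights, bias, x)
# Simulated fault injection: one weight in the primary path is corrupted.
faulty_weights = [row[:] for row in weights]
faulty_weights[1][0] += 0.5
print(paths_agree(clean, dense_relu(weights, bias, x)))           # True
print(paths_agree(dense_relu(faulty_weights, bias, x), clean))    # False
```

A practical design also has to make sure that a single injected fault cannot corrupt both paths in the same way; otherwise the comparison would still agree.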

Towards the Security of Motion Detection-based Video Surveillance on IoT Devices

We implement a prototype of a security-sensitive surveillance camera with an on-device motion detection module that detects objects in motion in the captured video in real time. The figure above shows the hardware setup, which uses the Xilinx Zynq-7000 ZC702 SoC. The SoC has a CPU component with two ARM cores and an FPGA component (programmable logic). We use the CPU to provide a basic interface for receiving video frames from the HDMI card, and we deploy a motion detection IP core on the FPGA to perform real-time motion detection on the received frames. We then develop a proof-of-concept prototype demonstrating video replay attacks, in which the compromised surveillance device, under the attacker’s control, hides the chosen suspicious motion by overwriting the corresponding frames with pre-recorded normal frames.
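The replay attack can be modelled in a few lines: a frame-differencing motion detector (an assumed stand-in for the FPGA motion detection IP core) flags motion in the genuine stream, but not after the attacker overwrites the suspicious frames with pre-recorded ones. Frames are tiny grayscale 2-D lists, and all names and thresholds are hypothetical.

```python
def has_motion(prev, curr, threshold=30, min_pixels=1):
    """Frame-differencing motion detector: True if at least min_pixels pixels
    changed by more than threshold between two grayscale frames (2-D lists)."""
    changed = sum(abs(a - b) > threshold
                  for row_p, row_c in zip(prev, curr)
                  for a, b in zip(row_p, row_c))
    return changed >= min_pixels


def detect(frames, **kwargs):
    """Motion flag for every consecutive pair of frames."""
    return [has_motion(p, c, **kwargs) for p, c in zip(frames, frames[1:])]


def replay_attack(frames, start, end, recorded):
    """Attacker model: overwrite frames[start:end] with pre-recorded 'normal'
    frames (cycled if the window is longer than the recording)."""
    return [recorded[(i - start) % len(recorded)] if start <= i < end else frame
            for i, frame in enumerate(frames)]


background = [[0, 0], [0, 0]]
intruder = [[0, 255], [0, 0]]
stream = [background, intruder, background]
print(detect(stream))                                          # [True, True]
print(detect(replay_attack(stream, 1, 2, [background])))       # [False, False]
```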

Runtime verification based on proactive checking and approximate computing

To better illustrate the two CPU-FPGA threat models, we implement a prototype of a security-sensitive surveillance camera with an on-device motion detection module that detects objects in motion in the captured video in real time. The figure above shows the hardware setup, which uses the Xilinx Zynq-7000 ZC702 SoC. The SoC has a CPU component with two ARM cores and an FPGA component (programmable logic). We use the CPU to provide a basic interface for receiving video frames from the HDMI card, and we deploy a motion detection IP core on the FPGA to perform real-time motion detection on the received frames. As shown in (a) and (b), a prominent white area indicates the moving objects on the road. We further implement a threat model on the prototype system. Its outcome is shown in (c), where the moving object is hidden in the background and no motion is detected. The attack scenario is a “replay attack”.

We develop and deploy the approximate computing-based verification framework on a CPU-FPGA prototype and conduct a comprehensive case study using a video motion detection application. The approximate computing algorithm employs two types of application-level approximation, namely spatial approximation and temporal approximation, to achieve runtime repeated execution and verification. Our empirical hardware evaluation on the Xilinx Zynq CPU-FPGA system-on-chip (SoC) demonstrates the strong security and performance of the proposed approach.
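The idea of checking an untrusted accelerator with cheaper, approximate re-execution can be sketched as below. The exact approximations used in the framework are not described here, so the sketch assumes that spatial approximation means re-computing motion on every `step`-th pixel and temporal approximation means checking only every `every`-th frame pair, with a tolerance on the motion ratio. The parameters and names are hypothetical.

```python
def motion_mask(prev, curr, threshold=30, step=1):
    """Binary motion mask between two grayscale frames (2-D lists).
    step > 1 evaluates only every step-th row and column (spatial approximation)."""
    return [[abs(prev[y][x] - curr[y][x]) > threshold
             for x in range(0, len(curr[0]), step)]
            for y in range(0, len(curr), step)]


def motion_ratio(mask):
    """Fraction of mask cells flagged as moving."""
    cells = [v for row in mask for v in row]
    return sum(cells) / len(cells)


def verify_stream(frames, accel_masks, step=2, every=2, tol=0.1):
    """Indices of checked frame pairs where the accelerator's mask disagrees with
    a cheap CPU re-computation.

    accel_masks[i] is the untrusted full-resolution mask for (frames[i], frames[i + 1]).
    Temporal approximation: only every `every`-th pair is re-checked.
    Spatial approximation: the CPU re-computes motion on a subsampled grid and
    compares motion ratios instead of exact masks.
    """
    alarms = []
    for i in range(0, len(accel_masks), every):
        approx = motion_ratio(motion_mask(frames[i], frames[i + 1], step=step))
        if abs(approx - motion_ratio(accel_masks[i])) > tol:
            alarms.append(i)
    return alarms


still = [[0] * 4 for _ in range(4)]
moving = [row[:] for row in still]
for y in range(2):
    for x in range(2):
        moving[y][x] = 255                     # a 2 x 2 moving object
frames = [still, moving, moving]
honest = [motion_mask(a, b) for a, b in zip(frames, frames[1:])]
tampered = [[[False] * 4 for _ in range(4)], honest[1]]   # motion hidden in pair 0
print(verify_stream(frames, honest))     # []
print(verify_stream(frames, tampered))   # [0]
```

Subsampling can miss small objects, so `tol`, `step`, and `every` trade detection coverage against CPU cost.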

© Xianglong Feng 2007-2021 All rights reserved. Last Update: Nov 19, 2020