Reinforcement Learning for Bandwidth Estimation in Real-Time Communications
This study presents a novel approach utilizing reinforcement learning for real-time bandwidth estimation and congestion control in communication networks. The focus is on improving two-way video streaming applications by addressing challenges such as limited upload speed, dynamic network conditions, and the absence of pre-encoded quality levels. The proposed R3Net model offers a solution by leveraging recurrent neural networks to adapt quickly to varying network parameters. Through a detailed exploration of the R3Net architecture and training process, this research aims to enhance the efficiency of real-time data transmission in communication systems.
Presentation Transcript
Reinforcement learning for bandwidth estimation and congestion control in real-time communications
June 2, 2021
Joyce Fang, Martin Ellis, Bin Li, Siyao Liu, Yasaman Hosseinkashi, Michael Revow, Albert Sadovnikov, Ziyuan Liu, Peng Cheng, Sachin Ashok, David Zhao, Ross Cutler, Yan Lu, Johannes Gehrke
33rd Conference on Neural Information Processing Systems (NeurIPS 2019)
Outline
- Introduction
- Initial Approach
- Evaluation
- Discussion and Conclusion
Introduction
This work concerns two fundamental problems in networking research:
- how much data can be sent across a network path at a given time
- how and when endpoints should send packets (to avoid causing network congestion and the associated packet delay and loss)
The focus is on applying reinforcement learning (RL) to:
- improve real-time, two-way communication in video streaming
- for example, a two-way video call such as FaceTime or Skype
Introduction
RL has previously been applied to video streaming, but not to two-way real-time communications. The earlier applications are:
- video on demand (think of a regular YouTube video)
- real-time video streaming (think of a live broadcast over Twitch, Facebook Live, etc.)
What makes real-time communications so difficult?
- Two-way interactivity rules out the pre-fetched buffer that is typically used in those streaming scenarios
- Upload speed comes into play for all parties, and it is typically more limited than download speed
- The model needs to run in real time, so decisions have a tighter window
- There are no pre-encoded quality levels for the video streams, so the action space may not be discrete
To address these challenges, the authors propose R3Net (an RL-based Recurrent Network for RTC), which allows rapid adjustment to complex and dynamic network conditions.
R3Net: An Initial Scheme
Goal: to estimate bandwidth in real time
- the model runs on the receiver side
- the resulting estimate is then sent to the sender, which uses it to control the sending rate of the stream
Figure: bandwidth, data flow, and quality
H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson. RTP: A Transport Protocol for Real-Time Applications. Internet RFCs, RFC 3550, 2003.
The state inputs are as follows:
- receiver rate (kb/s)
- average packet interval (ms)
- packet loss rate (%)
- average RTT (round-trip time) (ms)
Output: a bandwidth estimate of 0 to 8 Mb/s
The reward is designed as $r = 0.6 \ln(4R + 1) - D - 10L$, where
- R: receiver rate (Mb/s)
- D: average RTT
- L: packet loss rate
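The reward on this slide can be sketched in a few lines of Python, reading it as r = 0.6 ln(4R + 1) - D - 10L. The function name `r3net_reward` and its parameter names are hypothetical. The slide gives no unit for D and no scale for L, so this sketch assumes D is in seconds and L is a fraction between 0 and 1; both are assumptions, not statements from the slides.

```python
import math


def r3net_reward(receive_rate_mbps: float, avg_rtt_s: float, loss_rate: float) -> float:
    """Reward r = 0.6 * ln(4R + 1) - D - 10L.

    receive_rate_mbps: R, receiver rate in Mb/s.
    avg_rtt_s:         D, average RTT (ASSUMED to be in seconds).
    loss_rate:         L, packet loss rate (ASSUMED to be a fraction in [0, 1]).
    """
    if receive_rate_mbps < 0 or avg_rtt_s < 0:
        raise ValueError("rate and RTT must be non-negative")
    if not 0.0 <= loss_rate <= 1.0:
        raise ValueError("loss_rate must be a fraction in [0, 1]")
    # The log term rewards throughput with diminishing returns;
    # delay and loss are subtracted as linear penalties.
    throughput_term = 0.6 * math.log(4.0 * receive_rate_mbps + 1.0)
    return throughput_term - avg_rtt_s - 10.0 * loss_rate


if __name__ == "__main__":
    # Same rate and RTT, with and without 5% packet loss.
    print(r3net_reward(1.0, 0.1, 0.0))
    print(r3net_reward(1.0, 0.1, 0.05))
```

Under these assumptions the logarithm makes each additional Mb/s worth less than the previous one, while the loss term is weighted heavily enough that a few percent of loss can cancel a large gain in rate.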
Model and Training
R3Net uses Gated Recurrent Units (GRUs) in an actor-critic framework to estimate bandwidth.
K. Cho et al., Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation. In EMNLP 2014.
- The model is updated using Proximal Policy Optimization (PPO) and the Adam optimizer with a learning rate of $3 \times 10^{-5}$.
J. Schulman et al., Proximal Policy Optimization Algorithms. arXiv preprint arXiv:1707.06347 [cs.LG], August 2017.
- Around 10,000 network traces were used for simulation in training, and the model was tested on 1150 different network traces.
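As a rough illustration of the PPO update mentioned on this slide, the sketch below computes the clipped surrogate objective from the cited PPO paper for a batch of samples. It is not the authors' training code. The function name `ppo_clipped_objective` is hypothetical, and the clipping range `clip_eps=0.2` is an assumed default, since the slides do not state the value used for R3Net.

```python
from typing import Sequence


def ppo_clipped_objective(
    ratios: Sequence[float],
    advantages: Sequence[float],
    clip_eps: float = 0.2,  # ASSUMED value; not given in the slides
) -> float:
    """Mean PPO clipped surrogate: mean(min(r * A, clip(r, 1-eps, 1+eps) * A)).

    ratios:     probability ratios pi_new(a|s) / pi_old(a|s), one per sample.
    advantages: advantage estimates for the same samples.
    """
    if len(ratios) != len(advantages):
        raise ValueError("ratios and advantages must have the same length")
    if not ratios:
        raise ValueError("at least one sample is required")
    total = 0.0
    for ratio, adv in zip(ratios, advantages):
        # Clip the ratio so a single update cannot move the policy too far.
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Take the more pessimistic of the clipped and unclipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(ratios)


if __name__ == "__main__":
    print(ppo_clipped_objective([1.5, 0.5, 1.0], [1.0, -1.0, 2.0]))
```

In training, this quantity is maximized with respect to the policy parameters (equivalently, its negative is minimized by the optimizer), alongside a value-function loss for the critic.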
Evaluation
- R3Net is compared with UKF on a set of 1150 test traces
- the comparison uses purely network-based metrics: observed RTT, packet loss rate, and bandwidth utilization (the percentage of bandwidth used)
- R3Net achieves 5% higher bandwidth utilization than UKF, with similar RTTs and less packet loss
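The bandwidth utilization metric is described only as "the percentage of bandwidth used", so the sketch below shows one plausible way to compute it over a trace. The function name `bandwidth_utilization`, the per-interval sampling, and the choice to cap each interval at the available capacity are all assumptions made for illustration, not the evaluation code behind these results.

```python
from typing import Sequence


def bandwidth_utilization(
    received_kbps: Sequence[float],
    capacity_kbps: Sequence[float],
) -> float:
    """Percentage of available bandwidth used over a trace.

    received_kbps: receiver rate in each interval (kb/s).
    capacity_kbps: available link capacity in the same intervals (kb/s).
    Both sequences are ASSUMED to be sampled on the same time grid.
    """
    if len(received_kbps) != len(capacity_kbps):
        raise ValueError("both sequences must have the same length")
    total_capacity = sum(capacity_kbps)
    if total_capacity <= 0:
        raise ValueError("total capacity must be positive")
    # Traffic above capacity cannot be delivered, so cap each interval.
    used = sum(min(r, c) for r, c in zip(received_kbps, capacity_kbps))
    return 100.0 * used / total_capacity


if __name__ == "__main__":
    # Half the link is used in the first interval, all of it in the second.
    print(bandwidth_utilization([500.0, 1000.0], [1000.0, 1000.0]))
```

A metric like this only captures throughput, which is why the slide reports it together with RTT and packet loss: an estimator can reach high utilization by overshooting and still degrade the call.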
Evaluation Using Real Networks
R3Net was also evaluated in RTC calls on real networks:
- export R3Net to the ONNX format
- use ONNX Runtime for inference in the RTC system
UKF and R3Net are compared in two scenarios, WiFi and 3G:
- packet loss rates are higher on 3G networks when using R3Net (although the RTTs are fairly similar)
- there are corresponding degradations in both VMAF (https://github.com/Netflix/vmaf, 2019), indicating poorer image quality, and video frame drop rate, indicating choppy video
R3Net takes relatively noisy actions, which lead to high packet loss and choppy video in real networks.
Limitations
Two key areas for improvement:
- closing the gap between training and the real world through realistic network simulation
- designing a reward function that leads to high quality of experience (QoE)
Developing a sufficiently realistic environment to represent the real world is challenging. Possible directions:
- using a generative model to produce realistic network traces
- building a distributed testbed to enable large-scale training on real networks
Summary
A new formulation of RL for bandwidth estimation and congestion control in real-time audio/video communication has been proposed. R3Net provides reasonable adjustment to dynamic network conditions in simulation and real networks using WiFi connections.