Cocktail: Neural Network Controller from Multiple Experts via Adaptive Mixing
Cocktail is a framework for cyber-physical systems that learns a single neural network controller from multiple existing control experts. It first uses reinforcement learning to learn a system-level adaptive mixing (a weighted combination) of the experts, then distills that mixture into a student network through a teacher-student scheme with probabilistic adversarial training and regularization. The aim is to improve control robustness, control energy efficiency, and verifiability.
Presentation Transcript

Cocktail: Learn a Better Neural Network Controller from Multiple Experts via Adaptive Mixing and Robust Distillation
Yixuan Wang, Chao Huang, Zhilu Wang, Shichao Xu, Zhaoran Wang, and Qi Zhu
Northwestern University, USA

Motivation and Goal
In cyber-physical systems, multiple controllers (experts) may be designed for the same function, and each has its own advantages under different circumstances.
Goal: design and learn a single neural network controller from multiple experts that has better performance and verifiability.
Approaches in the literature, such as the simplex architecture and reinforcement-learning-based switching control, may be resource-consuming and intractable to verify, which matters in safety-critical systems.
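
Problem Formulation: feedback control system with multiple control experts
Dynamics: $s(t+1) = f(s(t), u(t), \omega(t), \delta(t))$, $t \ge 0$, where $s$ is the state, $u$ is the control input, and $\omega$, $\delta$ are time-varying disturbance and perturbation terms.
Initial state: $s(0) \in X_0 \subseteq X$, where $X$ is the safe region.
Multiple experts: $u(t) = \kappa_i(s(t))$, $i = 1, \dots, n$.
Trajectory: $\tau_{\kappa, s(0)} = s(0), s(1), \dots$, which follows the dynamics and the control expert $\kappa$.
Safety: with $\kappa$, a system starting from $s(0)$ is safe if and only if $s(t) \in X$ for all $t \ge 0$.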
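
The formulation above can be made concrete with a small rollout. The following is a minimal sketch, not the authors' code: it assumes a hypothetical scalar plant `toy_dynamics`, omits the disturbance terms, uses two made-up experts, and checks safety only over a finite horizon (the definition itself requires all $t \ge 0$).

```python
from typing import Callable, List


def rollout(f: Callable[[float, float], float],
            kappa: Callable[[float], float],
            s0: float, steps: int) -> List[float]:
    """Return the finite trajectory s(0), s(1), ..., s(steps) under controller kappa."""
    trajectory = [s0]
    s = s0
    for _ in range(steps):
        u = kappa(s)   # control input chosen by the expert
        s = f(s, u)    # one step of the plant dynamics
        trajectory.append(s)
    return trajectory


def is_safe(trajectory: List[float],
            in_safe_region: Callable[[float], bool]) -> bool:
    """Finite-horizon safety check: every visited state lies in the safe region."""
    return all(in_safe_region(s) for s in trajectory)


# Hypothetical toy plant and safe region (assumptions for illustration only).
def toy_dynamics(s: float, u: float) -> float:
    return s + 0.1 * u


def in_safe_region(s: float) -> bool:
    return abs(s) <= 1.0


# Two made-up experts: one damps the state, one amplifies it.
def damping_expert(s: float) -> float:
    return -2.0 * s


def aggressive_expert(s: float) -> float:
    return 2.0 * s


safe_traj = rollout(toy_dynamics, damping_expert, 0.5, 20)
unsafe_traj = rollout(toy_dynamics, aggressive_expert, 0.5, 20)
print(is_safe(safe_traj, in_safe_region))    # True: the state shrinks by 0.8 each step
print(is_safe(unsafe_traj, in_safe_region))  # False: the state grows by 1.2 each step
```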

Problem Formulation: control properties
Control robustness of a controller $\kappa$ is defined as its safe control rate under an optimized or random perturbation $\delta(t)$.
Control energy efficiency of a controller $\kappa$ is defined as the average control energy cost over various trajectories starting from the initial safe region (LQR).
Verifiability is measured by the computation time of the verification processes for various properties on a given platform.
Problem: given a system and multiple control experts $\kappa_i$ ($i = 1, \dots, n$), design a new neural network controller that optimizes control robustness, control energy efficiency, and verifiability.
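
As a rough illustration of the first two metrics, the sketch below estimates a safe control rate and an average energy cost by simulation. Several choices are assumptions rather than the slides' definitions: the energy of a trajectory is taken as the sum of $|u(t)|$, it is averaged over safe trajectories only, the perturbation is uniform random noise added to the controller's input, and the plant is a hypothetical scalar system.

```python
import random
from typing import Callable, List, Tuple


def evaluate(kappa: Callable[[float], float],
             f: Callable[[float, float], float],
             in_safe_region: Callable[[float], bool],
             initial_states: List[float],
             steps: int,
             noise: float = 0.0,
             seed: int = 0) -> Tuple[float, float]:
    """Return (safe control rate, average energy over the safe trajectories)."""
    rng = random.Random(seed)
    safe_count = 0
    energies = []
    for s0 in initial_states:
        s, energy, safe = s0, 0.0, in_safe_region(s0)
        for _ in range(steps):
            if not safe:
                break
            delta = rng.uniform(-noise, noise)  # random perturbation of the observed state
            u = kappa(s + delta)
            energy += abs(u)                    # assumed energy measure: sum of |u(t)|
            s = f(s, u)
            safe = in_safe_region(s)
        if safe:
            safe_count += 1
            energies.append(energy)
    rate = safe_count / len(initial_states)
    avg_energy = sum(energies) / len(energies) if energies else float("nan")
    return rate, avg_energy


# Hypothetical toy plant, safe region, and controllers.
def toy_dynamics(s: float, u: float) -> float:
    return s + 0.1 * u


def in_safe_region(s: float) -> bool:
    return abs(s) <= 1.0


starts = [-0.5, 0.0, 0.5]
good_rate, good_energy = evaluate(lambda s: -2.0 * s, toy_dynamics,
                                  in_safe_region, starts, steps=50)
bad_rate, _ = evaluate(lambda s: 2.0 * s, toy_dynamics,
                       in_safe_region, starts, steps=50)
print(good_rate, good_energy, bad_rate)
```

Passing `noise=0.05`, for example, would evaluate the same controllers under random measurement noise; an optimized perturbation would need an attack routine in place of `rng.uniform`.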

Cocktail Overview
Teacher (adaptive mixing): learn an adaptive mixing of the multiple control experts, an optimal linear combination obtained by reinforcement learning, to improve robustness and efficiency. This is a system-level weighted combination.
Student (robust distillation): conduct robust distillation from the adaptive mixing, using probabilistic adversarial training and regularization, to enhance robustness and verifiability.
[Figure: block diagram of the framework showing the control experts, the adaptive mixing (teacher), adversarial examples, the student controller, and the plant.]

Cocktail: RL-based Adaptive Mixing
Goal: learn an adaptive mixing strategy to improve robustness and efficiency.
Expand the adaptation space into a polyhedron and linearly combine the control inputs of the multiple experts with a clipping function:
$u(t) = \mathrm{clip}\left(\sum_{i=1}^{n} a_i \kappa_i(s(t)),\ u_{\min},\ u_{\max}\right)$, with $a_i \in [-A_B, A_B]$, $A_B \ge 1$.
RL reward function design: the reward $r(s, u)$ combines a penalty on the control magnitude $|u|$ for energy consumption (efficiency) with a penalty for falling into the unsafe region (robustness).
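
The clipped weighted combination can be sketched in a few lines. This is an illustrative sketch only: the experts and the fixed weights are made up, and in the framework the weights would come from the learned RL policy at each step rather than being constants.

```python
from typing import Callable, Sequence


def clip(x: float, lo: float, hi: float) -> float:
    """Saturate x to the interval [lo, hi]."""
    return max(lo, min(hi, x))


def mix(experts: Sequence[Callable[[float], float]],
        weights: Sequence[float],
        s: float,
        u_min: float,
        u_max: float,
        weight_bound: float) -> float:
    """Clipped linear combination of the experts' control inputs at state s."""
    if len(experts) != len(weights):
        raise ValueError("need exactly one weight per expert")
    if any(abs(a) > weight_bound for a in weights):
        raise ValueError("weights must lie in [-weight_bound, weight_bound]")
    u = sum(a * kappa(s) for a, kappa in zip(weights, experts))
    return clip(u, u_min, u_max)


# Two hypothetical experts for a scalar state.
experts = [lambda s: -2.0 * s, lambda s: -0.5 * s]

# Weights inside the simplex give an ordinary average of the experts.
u_avg = mix(experts, [0.5, 0.5], s=1.0, u_min=-3.0, u_max=3.0, weight_bound=2.0)

# Weights outside [0, 1] are allowed by the enlarged weight box; the result saturates.
u_sat = mix(experts, [2.0, 2.0], s=1.0, u_min=-3.0, u_max=3.0, weight_bound=2.0)
print(u_avg, u_sat)  # -1.25 -3.0
```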

Cocktail: RL-based Adaptive Mixing
The optimization problem for adaptive mixing is formulated as
$\max_{\theta} J(\theta) = \sum_{t=0}^{T-1} \mathbb{E}\left[\gamma^{t}\, r(s(t), u(t))\right]$
subject to
$u(t) = \mathrm{clip}\left(\sum_{i=1}^{n} a_i \kappa_i(s(t)),\ u_{\min},\ u_{\max}\right)$, $a_i \in [-A_B, A_B]$, $A_B \ge 1$,
$s(t+1) = f(s(t), u(t), \omega(t), \delta(t))$.
Solution: Proximal Policy Optimization (PPO), which has been proved to reach a global optimum at a sublinear rate, thus obtaining optimal linear weights for the experts and improving control robustness and efficiency.
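
To make the objective concrete, the sketch below computes the discounted sum of rewards for one rollout. The reward shape is an assumption consistent with the two penalties named on the slides; the constant `unsafe_penalty` is hypothetical, and the expectation in the objective would be approximated by averaging this quantity over many rollouts.

```python
from typing import Callable, Sequence


def reward(s: float, u: float,
           in_safe_region: Callable[[float], bool],
           unsafe_penalty: float) -> float:
    """Assumed reward: energy penalty -|u| while safe, a fixed penalty once unsafe."""
    if not in_safe_region(s):
        return -unsafe_penalty
    return -abs(u)


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Sum of gamma**t * r_t over one rollout."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


def in_safe_region(s: float) -> bool:
    return abs(s) <= 1.0


r_safe = reward(0.5, -0.3, in_safe_region, unsafe_penalty=100.0)
r_unsafe = reward(2.0, 0.3, in_safe_region, unsafe_penalty=100.0)
ret = discounted_return([-1.0, -1.0, -1.0], gamma=0.5)
print(r_safe, r_unsafe, ret)  # -0.3 -100.0 -1.75
```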

Cocktail: Robust Distillation
Goal: learn a single controller with better robustness and verifiability from the hierarchical adaptive mixing architecture.
Observation: a smaller Lipschitz constant of the neural network controller leads to stronger control robustness and verifiability.
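
One simple way to see how the weights control the Lipschitz constant is the standard product-of-norms bound: for a feedforward network with 1-Lipschitz activations such as ReLU, the product of the layers' induced infinity-norms (maximum absolute row sums) upper-bounds the Lipschitz constant with respect to the infinity norm. The sketch below uses this bound on a made-up two-layer network without biases; it is a generic illustration, not the estimate used by the authors.

```python
from typing import List

Matrix = List[List[float]]


def inf_norm(W: Matrix) -> float:
    """Induced infinity-norm of a matrix: the maximum absolute row sum."""
    return max(sum(abs(w) for w in row) for row in W)


def lipschitz_upper_bound(layers: List[Matrix]) -> float:
    """Product of layer norms; valid for 1-Lipschitz activations such as ReLU."""
    bound = 1.0
    for W in layers:
        bound *= inf_norm(W)
    return bound


def forward(layers: List[Matrix], x: List[float]) -> List[float]:
    """ReLU network without biases (biases do not change the Lipschitz constant)."""
    for i, W in enumerate(layers):
        x = [sum(w * xi for w, xi in zip(row, x)) for row in W]
        if i < len(layers) - 1:
            x = [max(0.0, v) for v in x]
    return x


# Hypothetical weights: a 2-2-1 network.
layers = [[[1.0, -2.0], [0.5, 0.5]], [[1.0, 1.0]]]
bound = lipschitz_upper_bound(layers)

# Shrinking every weight (as weight regularization tends to do) shrinks the bound.
small_layers = [[[0.5 * w for w in row] for row in W] for W in layers]
small_bound = lipschitz_upper_bound(small_layers)
print(bound, small_bound)  # 6.0 1.5
```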
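
Cocktail: Robust Distillation
Solution: distillation with adversarial training and regularization:
$\min_{\theta} \left( \max_{\delta}\ \ell\big(\kappa(s + \delta; \theta),\ u\big) + \lambda \lVert \theta \rVert^{2} \right)$
where $\kappa(\cdot\,; \theta)$ is the student controller with parameters $\theta$, $u$ is the control input of the adaptive mixing (teacher), $\delta$ is an input perturbation, and $\lambda$ weights the regularization.
Probabilistic version:
Case 1: with probability $p$, $\delta$ is optimized by the Fast Gradient Sign Method (FGSM) for robust training, to reduce the Lipschitz constant.
Case 2: with probability $1 - p$, $\delta = 0$, to approximate the function of the adaptive mixing.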
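
The probabilistic training loop can be sketched with a deliberately tiny student. Everything below is an assumption made for illustration: the student is a scalar linear map `theta * s` rather than a neural network, so both gradients can be written by hand; the loss is squared error; the teacher is a made-up linear function standing in for the adaptive mixing; and the hyperparameters are arbitrary.

```python
import random
from typing import Callable


def sign(x: float) -> int:
    return (x > 0) - (x < 0)


def fgsm_delta(theta: float, s: float, target: float, eps: float) -> float:
    """One FGSM step: eps times the sign of the loss gradient w.r.t. the input s."""
    grad_s = 2.0 * (theta * s - target) * theta  # d/ds of (theta*s - target)^2
    return eps * sign(grad_s)


def distill(teacher: Callable[[float], float],
            p: float = 0.5, eps: float = 0.05, lam: float = 1e-3,
            lr: float = 0.05, steps: int = 3000, seed: int = 0) -> float:
    """Fit a linear student to the teacher with probabilistic adversarial training."""
    rng = random.Random(seed)
    theta = 0.0
    for _ in range(steps):
        s = rng.uniform(-1.0, 1.0)
        target = teacher(s)  # the teacher is always queried at the clean state
        if rng.random() < p:
            delta = fgsm_delta(theta, s, target, eps)  # Case 1: adversarial input
        else:
            delta = 0.0                                # Case 2: clean input
        x = s + delta
        residual = theta * x - target
        # Gradient of (theta*x - target)^2 + lam*theta^2 with respect to theta.
        grad_theta = 2.0 * residual * x + 2.0 * lam * theta
        theta -= lr * grad_theta
    return theta


def toy_teacher(s: float) -> float:
    return -1.5 * s  # hypothetical stand-in for the adaptive mixing


theta = distill(toy_teacher)
print(theta)  # lands close to the teacher's gain of -1.5
```

With a neural network student, the two hand-written gradients would be replaced by automatic differentiation, but the structure of the loop (sample a state, perturb it with probability p, regress on the clean teacher output, regularize the weights) stays the same.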
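
Cocktail: Verification for Neural Network Controlled Systems
Control invariant set: a set of states such that every closed-loop trajectory starting in the set remains in it for all $t \ge 0$. It is solved by function approximation with state-space partitioning and an SDP solver.
Reachable set: the set of states that trajectories starting from the initial set $X_0$ can reach within a bounded number of steps. It is solved by ReachNN.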
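
To illustrate what a bounded-step reachable-set check computes, the sketch below propagates an interval of states through a toy closed loop. This is plain interval propagation for a hypothetical scalar linear system `s(t+1) = gain * s(t) + w` with a bounded disturbance, not ReachNN and not a neural network controller; it only shows the shape of the computation (one set per step, each compared against the safe region).

```python
from typing import List, Tuple

Interval = Tuple[float, float]


def step_interval(lo: float, hi: float, gain: float, w_max: float) -> Interval:
    """Image of [lo, hi] under s -> gain * s + w for any |w| <= w_max."""
    a, b = gain * lo, gain * hi
    return min(a, b) - w_max, max(a, b) + w_max


def reachable_intervals(lo: float, hi: float, gain: float,
                        w_max: float, steps: int) -> List[Interval]:
    """Reachable intervals for steps 0, 1, ..., steps."""
    sets = [(lo, hi)]
    for _ in range(steps):
        lo, hi = step_interval(lo, hi, gain, w_max)
        sets.append((lo, hi))
    return sets


def verified_safe(sets: List[Interval], safe_lo: float, safe_hi: float) -> bool:
    """True if every reachable interval lies inside the safe interval."""
    return all(safe_lo <= lo and hi <= safe_hi for lo, hi in sets)


# Hypothetical closed loops: a contracting one and an expanding one.
stable_sets = reachable_intervals(-0.5, 0.5, gain=0.8, w_max=0.05, steps=15)
unstable_sets = reachable_intervals(-0.5, 0.5, gain=1.2, w_max=0.05, steps=15)
print(verified_safe(stable_sets, -1.0, 1.0))    # True
print(verified_safe(unstable_sets, -1.0, 1.0))  # False
```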

Experiments
Nonlinear system models:
2D Van der Pol's oscillator with 2 NN controllers;
3D system with 1 NN controller and 1 polynomial controller;
4D cartpole system with 2 NN controllers.
Methods:
a single expert (only $\kappa_1$, or only $\kappa_2$);
switching control;
adaptive mixing;
direct distillation from the adaptive mixing;
Cocktail.

Results: Overall Comparison
Overall comparison of safe control rate, control energy efficiency, and Lipschitz constant. The controller from Cocktail performs best among the compared methods.

Results: Robustness
The controller from Cocktail has better robustness due to the novel robust distillation approach.

Results: Verifiability
Reachable set comparison: the 15-step reachable set is shown for the Cocktail controller, while the 12-step reachable set verification of the comparison controller failed by running out of memory.
Invariant set comparison: it takes 11 hours to verify the comparison controller, but only 32 minutes to verify the Cocktail controller.

Conclusion
We propose the Cocktail framework to learn a better neural network controller from multiple existing control experts, improving control robustness, control energy efficiency, and verifiability.
Our framework first learns an adaptive mixing strategy for the multiple experts and then conducts robust distillation to synthesize a single optimal controller.

Thanks!