
Neural Network Approach to Visual Tracking with FCN Model
"Explore the use of Fully Convolutional Networks (FCN) in visual tracking, focusing on the FCN-tracker model and its benefits in object tracking with neural networks. Learn about the motivation, procedures, results, and conclusions of this innovative approach from the Chinese University of Hong Kong. Discover how FCN leverages CNN for efficient feature learning in tracking algorithms."
Presentation Transcript
A neural network approach to visual tracking
Zhe Zhang, Kin Hong Wong*, Zhiliang Zeng, Lei Zhu
Department of Computer Science and Engineering, The Chinese University of Hong Kong
Contact: *khwong@cse.cuhk.edu.hk
ANN approach to visual tracking, MVA17 v.7g
Contents
Introduction
Motivation
What is FCN?
Procedures and Implementation
Result
Conclusion
Introduction
FCN-tracker is a modified FCN (Fully Convolutional Network) model for object tracking. The tracking problem is to locate a moving target.
[Figure: the target at Time=1, Time=2 and Time=3]
Motivation
Many tracking algorithms rely solely on simple hand-crafted features. A CNN (convolutional neural network) is good at learning efficient features from a large quantity of data. FCN is based on CNN and provides end-to-end training, which helps to construct a simple pipeline.
What is FCN (Fully Convolutional Networks)?
Fully Convolutional Networks for Semantic Segmentation (FCN) [1]
[Figure: an input image containing a dog and a cat, and the FCN segmentation output of the same size]
[1] J. Long, E. Shelhamer, and T. Darrell, "Fully convolutional networks for semantic segmentation," in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2015.
In FCN [1]
Output: 3 labels (background, cat, dog)
Why do we use FCN for a tracker?
Semantic segmentation and visual tracking have two aspects in common. The goal of segmentation is to distinguish between different objects and the background, while visual tracking aims to discriminate the target object from other objects and the background. In addition, both tasks produce pixel-level output.
Procedures
Our input window is s x s = 128 x 128.
For the first frame:
Use the first ground-truth data provided by the dataset: the width w, height h and center $(c_x, c_y)$ of the target window.
Find the crop edge length E from w, h and the scaling factor $\lambda = 4$: $E = \lambda \max(w, h)$.
The label $f_{i,j}$ for the FCN at pixel (i, j) is generated by a Gaussian function with a user-selected $\sigma$: $f_{i,j} = \exp\left(-\frac{(i - c_x)^2 + (j - c_y)^2}{2\sigma^2}\right)$.
After FCN training (100 iterations), the network can predict the response (heat) map.
From the second frame onward, during tracking:
Use the generated label and the response map to re-train the model (once for each subsequent frame).
Procedure - Network Structure
Our network structure is nearly the same as the original FCN-8s, except that one more convolutional layer is appended after the last feature map layer. The newly added convolutional layer is trained to transform the segmentation response map into a response map that highlights the target center.
[Figure: network structure, with the added convolution layer marked]
FCN Tracker - Crop Result
Hand-pick a window of size w x h. The system uses an E x E window for training.
Here, s = 128, $\lambda$ = 4, w = 81, h = 81.
[Figure: crop result showing the w x h object window inside the E x E square]
Procedure - FCN Tracker - Crop function
We use the following equation to crop the image sequence: $E = \lambda \max(w, h)$. Here E is the edge length of the square region, s is the expected input size (128 here), and w and h denote the width and the height of the object in the first frame. $\lambda = 4$ in our program.
We do this because we are only concerned with the object: instead of using the whole image, we use only the square region around the object center. $\lambda$ is a scaling factor, included because we want to cover some context information around the object.
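A minimal Python sketch of the crop step follows. It assumes the edge length is the scaling factor times max(w, h) and that the square is centered on the object. The function name `crop_square`, the (left, top, right, bottom) box layout and the example center are hypothetical, and the slides do not say how a box that extends past the image border is handled, so no clamping is done here.

```python
def crop_square(cx, cy, w, h, scale=4.0):
    """Return the edge length E and a square box centered on (cx, cy)."""
    edge = scale * max(w, h)   # assumed rule: E = scale * max(w, h)
    half = edge / 2.0
    # Box as (left, top, right, bottom); it may extend outside the image.
    box = (cx - half, cy - half, cx + half, cy + half)
    return edge, box


# Object size from the slides (w = h = 81); the center is a made-up example.
edge, box = crop_square(cx=200.0, cy=150.0, w=81, h=81)
print(edge, box)
```

With w = h = 81 and a scaling factor of 4 this gives an edge of 324 pixels, so the object covers a quarter of the crop width and the rest is context.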
FCN Tracker - Predict output
After training the model, the network is able to predict the target center, which is the position of the maximum of the confidence map. Since we assume that the object moves smoothly, we impose higher weights near the center and lower weights in the surrounding region. To do this we use a Hann window: $C = f_c \cdot P$ (element-wise product). Here C denotes the confidence map, $f_c$ is the response map output by the network, and P is the prior distribution, which is a Hann window in our program.
FCN Tracker - Predict output
[Figure: the cropped input data is passed to the FCN, which predicts the response map $f_c$; the prior distribution P (a Hann window) is applied to the response map by multiplication to give the confidence map, $C = f_c \cdot P$]
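The prediction step can be sketched in plain Python as below. This is an illustration under assumptions, not the authors' code: the 2-D prior is built as the outer product of two 1-D Hann windows, and the names `hann`, `hann_prior` and `predict_center` as well as the toy 5 x 5 response map are made up for the example.

```python
import math


def hann(n):
    """Symmetric 1-D Hann window of length n (n >= 2)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * k / (n - 1)) for k in range(n)]


def hann_prior(size):
    """2-D prior P, assumed here to be the outer product of two Hann windows."""
    w = hann(size)
    return [[w[i] * w[j] for j in range(size)] for i in range(size)]


def predict_center(response, prior):
    """Return (row, col) of the maximum of C = response * prior (element-wise)."""
    best, best_pos = -math.inf, (0, 0)
    for i, (r_row, p_row) in enumerate(zip(response, prior)):
        for j, (r, p) in enumerate(zip(r_row, p_row)):
            c = r * p
            if c > best:
                best, best_pos = c, (i, j)
    return best_pos


# Toy response map: a strong peak in a corner and a weaker one near the center.
response = [[0.0] * 5 for _ in range(5)]
response[0][0] = 1.0   # far from the center, suppressed by the prior
response[2][3] = 0.8   # close to the center, kept by the prior
center = predict_center(response, hann_prior(5))
print(center)
```

The corner peak is larger in the raw response, but the prior is zero at the border, so the peak next to the center wins. This is the smooth-motion assumption at work.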
FCN Tracker - Generate Label
After tracking, we still need to refine our model to improve the accuracy, which means we need a label. We don't provide a prior label; instead we assume the label is a Gaussian-shaped distribution: $f_{i,j} = \exp\left(-\frac{(i - c_x)^2 + (j - c_y)^2}{2\sigma^2}\right)$. Here $f_{i,j}$ denotes the value in the label indexed by i and j, and $c_x$ and $c_y$ denote the coordinates of the target center. $\sigma$ is a predefined parameter, set to 4 in our program.
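A small Python sketch of the label generation, assuming the standard Gaussian form centered on the target with sigma = 4. The function name `gaussian_label` and the example center (64, 64) are hypothetical. The slides pair index i with cx and j with cy, and this sketch follows that pairing without assuming which one is the row.

```python
import math


def gaussian_label(size, cx, cy, sigma=4.0):
    """Build f[i][j] = exp(-((i - cx)^2 + (j - cy)^2) / (2 * sigma^2))."""
    return [
        [math.exp(-((i - cx) ** 2 + (j - cy) ** 2) / (2.0 * sigma ** 2))
         for j in range(size)]
        for i in range(size)
    ]


# 128 x 128 label with the target center assumed to be at (64, 64).
label = gaussian_label(size=128, cx=64, cy=64)
print(label[64][64], round(label[64][68], 4))
```

The label is 1 at the target center and falls to exp(-0.5), about 0.61, at a distance of one sigma (4 pixels).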
The tracker in action
Frame 1: apply FCN for 100 iterations to generate the response (heat) map.
Frame 2: use the response map to predict the next target position, then apply FCN for 1 iteration to generate the updated response (heat) map.
Frame 3: use the updated response map to predict the next target position, then apply FCN for 1 iteration to generate the updated response (heat) map.
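The update schedule above can be sketched as a short Python loop. `CountingModel` is a hypothetical stand-in for the FCN: it has made-up `train` and `predict` methods and only counts training iterations, so the sketch shows the schedule (100 iterations on the first frame, 1 per later frame) and nothing about the network itself.

```python
class CountingModel:
    """Hypothetical stand-in for the FCN; it only counts training iterations."""

    def __init__(self):
        self.steps = 0

    def train(self, patch, label, iterations):
        self.steps += iterations

    def predict(self, patch):
        return patch  # placeholder for the response (heat) map


def run_tracker(frames, model, first_iters=100, update_iters=1):
    """Train on the first frame, then predict and re-train once per frame."""
    log = []
    model.train(frames[0], label="ground-truth label", iterations=first_iters)
    log.append(("init", first_iters))
    for frame in frames[1:]:
        response = model.predict(frame)
        # A real tracker would locate the target from `response` here and
        # build a new Gaussian label around the predicted center.
        model.train(frame, label="generated label", iterations=update_iters)
        log.append(("update", update_iters))
    return log


model = CountingModel()
log = run_tracker(["frame1", "frame2", "frame3"], model)
print(model.steps, log)
```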
FCN Tracker - Result
Demo video: https://youtu.be/WcQLmTi07Oc
FCN Tracker - Result
Results compared with other trackers on the OTB database.
[Figure: two plots, one showing the percentage of frames that fall within the error threshold and one showing the percentage of frames that fall within the overlap threshold]
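The two plotted quantities can be computed as in the Python sketch below. It assumes boxes are given as (x, y, w, h), that error means the distance between box centers, and that overlap means intersection over union. The thresholds of 20 pixels and 0.5, the function names and the toy boxes are assumptions for illustration, not values taken from the slides.

```python
import math


def center_error(a, b):
    """Distance between the centers of two (x, y, w, h) boxes."""
    ax, ay = a[0] + a[2] / 2.0, a[1] + a[3] / 2.0
    bx, by = b[0] + b[2] / 2.0, b[1] + b[3] / 2.0
    return math.hypot(ax - bx, ay - by)


def overlap(a, b):
    """Intersection over union of two (x, y, w, h) boxes."""
    iw = min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0])
    ih = min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1])
    inter = max(iw, 0.0) * max(ih, 0.0)
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def precision(preds, gts, max_error=20.0):
    """Fraction of frames whose center error is within max_error pixels."""
    hits = sum(center_error(p, g) <= max_error for p, g in zip(preds, gts))
    return hits / len(gts)


def success(preds, gts, min_overlap=0.5):
    """Fraction of frames whose overlap exceeds min_overlap."""
    hits = sum(overlap(p, g) > min_overlap for p, g in zip(preds, gts))
    return hits / len(gts)


# Toy data: four frames with the same ground-truth box.
gts = [(0, 0, 10, 10)] * 4
preds = [(0, 0, 10, 10), (5, 0, 10, 10), (30, 0, 10, 10), (0, 2, 10, 10)]
print(precision(preds, gts), success(preds, gts))
```

Sweeping `max_error` or `min_overlap` over a range of thresholds gives one curve per tracker, which is how plots of this kind are usually drawn.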
Conclusion
Our FCN can track the target and update the model online efficiently. The whole pipeline is straightforward and simple. Experimental results on the OTB benchmark show that our tracker is competitive with state-of-the-art results.
Thank you
Q&A