
Predicting Unroll Factors Using Supervised Classification
Abhishek Kumar, Christian Smith, Thomas Oliver
Loop Unrolling
A loop transformation that replicates the loop body and reduces the number of branches to increase speed (sketched below).
Pros:
- Decreased branch penalty / reduced loop overhead
- Fewer executed instructions, e.g., by combining identical memory accesses
- Exposed ILP: more flexible scheduling
Cons:
- Code expansion: decreased I-cache (I$) performance
- Longer live ranges: increased register pressure
- Impacts the scheduler, register allocation, the memory system, and other optimizations
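A hedged illustration of the transformation itself (hand-unrolled Python rather than compiler output; the real benefits appear at the machine-code level):

    def sum_rolled(a):
        # Rolled: one loop test (branch) per element.
        total = 0
        i = 0
        while i < len(a):
            total += a[i]
            i += 1
        return total

    def sum_unrolled_by_4(a):
        # Unrolled by 4: body replicated, one loop test per four elements.
        total = 0
        i = 0
        n = len(a)
        while i + 4 <= n:
            total += a[i]
            total += a[i + 1]
            total += a[i + 2]
            total += a[i + 3]
            i += 4
        while i < n:  # cleanup loop for the remaining 0-3 iterations
            total += a[i]
            i += 1
        return total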
Heuristics
- Static, conservative approach: tuned by humans, so it does not account for all parameters and their interdependencies
- Time-consuming to develop and machine-dependent
- Results in under-utilization
g++:
- Unrolls loops whose trip count is constant or computable at compile time; unrolls the remaining loops only if asked
- Unrolls only inner loops
- Unrolls by a set factor that is a power of 2, and only loops with a small body
LLVM:
- Compares the cost of the rolled vs. unrolled loop
- Fully or partially unrolls loops with a static trip count; the rest are unrolled based on a threshold
(A toy sketch of such a heuristic follows.)
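To make the flavor of these rules concrete, here is a toy heuristic in Python; the structure paraphrases the rules above, but all thresholds are invented and this is not g++'s or LLVM's actual logic:

    def heuristic_unroll_factor(trip_count, body_size, is_inner_loop,
                                max_body_size=32):
        # Toy static heuristic; thresholds are invented for illustration.
        if not is_inner_loop or body_size > max_body_size:
            return 1                    # do not unroll
        if trip_count is not None:      # trip count known at compile time
            if trip_count <= 8:
                return trip_count       # fully unroll small constant-trip loops
            for factor in (8, 4, 2):    # prefer the largest power-of-2 factor
                if trip_count % factor == 0:
                    return factor
        return 1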
Supervised Learning
- Maps an input to an output based on example input-output pairs
- Trained on examples; each training example <x_i, y_i> consists of a feature vector x_i and a corresponding label y_i
- The feature vector contains measurable characteristics of the object, e.g., loop characteristics
- The training label indicates which optimization is best for that example, e.g., the unroll factor (a concrete example follows)
- The classifier learns how to best map loop characteristics to the observed labels using the training examples
- Once trained, it should be able to accurately classify examples not in the training set
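Concretely, a training example might look like the following; the feature names and values here are invented for illustration:

    # Feature vector x_i: measurable loop characteristics (invented values).
    x_i = {
        "num_instructions": 18,
        "trip_count": 100,
        "num_memory_ops": 6,
        "num_branches": 1,
    }
    # Label y_i: the unroll factor observed to be fastest for this loop.
    y_i = 4
    training_example = (x_i, y_i)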
Training Data
- 72 benchmarks from SPEC, MediaBench applications, the Perfect suite, etc.
- 3 languages: C, Fortran, Fortran 90
- Compiled with the Open Research Compiler
- Measure the runtime of each loop for each unroll factor: at loop entry and exit, inserted assembly instructions capture the processor's cycle counter and accumulate it in the loop's associated counter
- Record the fastest unroll factor as the label for that loop (see the sketch after this list)
- The instrumentation affects execution time, so only loops that run for at least 50,000 cycles are used
- Each benchmark is run 30 times for every unroll factor up to eight
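The labeling step can be sketched as follows; the paper measures hardware cycle counters via inserted assembly, while this Python stand-in accepts a callable per compiled loop variant and uses a generic timer:

    import time

    def best_unroll_factor(loop_variants, runs=30):
        # loop_variants: dict mapping unroll factor -> callable that runs the
        # compiled loop (a stand-in for the paper's instrumented binaries).
        timings = {}
        for factor, run_loop in loop_variants.items():
            best = float("inf")
            for _ in range(runs):
                start = time.perf_counter_ns()
                run_loop()
                best = min(best, time.perf_counter_ns() - start)
            timings[factor] = best
        # The label is the unroll factor with the lowest observed runtime.
        return min(timings, key=timings.get)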
Machine Learning Algorithms
The paper uses two supervised learning algorithms to map loop features to unroll factors:
- Near neighbor classification
- Support vector machines (SVMs)
Near Neighbor Classification
- Basic and intuitive technique
- Easy and fast to train
Algorithm:
- Given a set of training examples <x_i, y_i> and an input feature vector x
- Let S = the set of all training examples <x_i, y_i> with Euclidean distance ||x_i - x|| < d
- The value of d is configurable; the paper uses 0.3
- The answer is the most common y among all examples in S (a minimal sketch follows)
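A minimal sketch of this classifier, assuming the features have already been normalized so that the paper's d = 0.3 threshold is meaningful:

    import math
    from collections import Counter

    def near_neighbor_classify(training, x, d=0.3):
        # training: list of (feature_vector, label) pairs; x: feature vector.
        # S = all training examples within Euclidean distance d of x.
        S = [y for xi, y in training if math.dist(xi, x) < d]
        if not S:
            return None  # no neighbors within d; the paper's fallback isn't shown
        # Answer: the most common label among the examples in S.
        return Counter(S).most_common(1)[0][0]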
Support Vector Machines
- Detailed description not given here
- Maps the d-dimensional feature space to a higher-dimensional space in which the data is easier to separate
- Finds boundaries that maximally separate the classes
Support Vector Machines
- More complex and slower to train than near neighbor classification
- SVMs are binary classifiers, so additional work is needed to apply them to multi-class problems (see the output-codes slides below)
- Less chance of overfitting compared to near neighbor classification, which results in better accuracy on novel examples after training
(A sketch using an off-the-shelf SVM follows.)
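As a hedged sketch of the overall approach, a comparable model can be trained with scikit-learn's SVC; note that SVC handles multi-class internally (one-vs-one) rather than with the output codes the paper uses (described below), and the feature values here are invented:

    from sklearn.svm import SVC

    # X: one row of loop features per training loop; y: its best unroll factor.
    X = [[18, 100, 6, 1], [40, 8, 12, 2], [10, 1000, 2, 1]]  # invented values
    y = [4, 1, 8]

    clf = SVC(kernel="rbf")  # kernel maps features to a higher-dimensional space
    clf.fit(X, y)
    print(clf.predict([[20, 96, 5, 1]]))  # predicted factor for an unseen loop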
Feature Selection
- 38 extracted features from which to select; need to determine which features are used as inputs to the model
Too many features:
- Longer training times
- Possible overfitting to the training set
- Harder to understand
Too few features:
- Not as accurate
Goal: eliminate all redundant and irrelevant features
Mutual Information Score (MIS)
- Measures how much knowing the best unroll factor reduces the uncertainty (entropy) of a feature; a higher score means the feature is more informative
- Does not provide information about how one feature interacts with another
- Measures only information content, not helpfulness to a specific classifier
(A sketch of the computation follows.)
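A sketch of computing this score for one discretized feature, using the standard mutual information formula I(F; Y) = sum over (f, y) of p(f, y) log2(p(f, y) / (p(f) p(y))):

    import math
    from collections import Counter

    def mutual_information(feature_values, labels):
        # I(F; Y) between discretized feature values and best-unroll labels.
        n = len(labels)
        p_f = Counter(feature_values)
        p_y = Counter(labels)
        p_fy = Counter(zip(feature_values, labels))
        return sum(
            (c / n) * math.log2((c / n) / ((p_f[f] / n) * (p_y[y] / n)))
            for (f, y), c in p_fy.items()
        )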
Greedy Feature Selection
- Picks the single best remaining feature for a specific classifier and adds it to the features the classifier uses; repeats a user-specified number of times (see the sketch below)
- The selected features differ for each type of classifier
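A minimal sketch of this greedy forward selection; the score function, which the caller supplies, is assumed to return the target classifier's accuracy (e.g., cross-validated) when trained on a given feature subset:

    def greedy_select(all_features, score, k):
        # score(feature_subset) -> accuracy of the specific classifier when
        # trained on that subset; this makes the selection classifier-specific.
        selected = []
        remaining = set(all_features)
        for _ in range(k):  # repeat a user-specified number of times
            # Pick the single best remaining feature for this classifier...
            best = max(remaining, key=lambda f: score(selected + [f]))
            selected.append(best)  # ...and add it to the features it uses.
            remaining.remove(best)
        return selected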
Results of Feature Selection
- The union of the features in the tables was used for their classifier
- The number of instructions in a loop is the de facto standard feature for heuristics
Oracle
- Makes the best decision based on the timing of each individual loop
- Represents the theoretically best performance for a classifier
- Not always the best overall, because it accounts only for the unrolling of a single loop in isolation
Results
- Without SWP (software pipelining), the SVM finds the optimal unroll factor 65% of the time and a nearly-optimal one another 14% of the time; 79% of the time the classification is within 7% of optimal
- Without SWP, the SVM is faster on 19 of 24 benchmarks: an average speedup of 5% on SPEC in general and 9% on SPECfp
- Without SWP, nearest neighbor finds the optimal factor 62% of the time, is faster on 16 of 24 benchmarks, and gives an average speedup of 4%
- With SWP, the SVM is 1% faster and the oracle is 4.4% faster
Thanks! Any questions?
From Binary to Multi-Class
- SVMs are binary classifiers: they only support giving a yes or no answer
- Mapping loops to unroll factors is not a binary classification problem but a multi-class classification problem
- We can break a multi-class classification problem into multiple binary classification problems, allowing the use of SVMs; this technique is called output codes (detailed on the next slide)
From Binary to Multi-Class: Output Codes
- Each binary classifier gives a yes or no answer for a single class, e.g., "should we apply an unroll factor of 4 to this loop?"
- The output code of an input x is the concatenation of the results of all the classifiers
- Each class also has a unique binary code: 1 for that class's classifier and 0 for all other classifiers
- We assign x to the class with the most similar code, according to Hamming distance (see the sketch below)
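A sketch of the decoding step, assuming one binary classifier per unroll factor whose predict(x) method returns 0 or 1 (the classifier interface here is invented):

    def classify_output_codes(classifiers, x):
        # classifiers: dict mapping class (unroll factor) -> binary classifier.
        classes = sorted(classifiers)
        # Output code of x: the concatenated answers of all classifiers.
        code = [classifiers[c].predict(x) for c in classes]

        def hamming(a, b):
            return sum(ai != bi for ai, bi in zip(a, b))

        # Each class's unique code is 1 for its own classifier, 0 elsewhere;
        # assign x to the class whose code is nearest in Hamming distance.
        return min(classes,
                   key=lambda c: hamming(code, [int(k == c) for k in classes]))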