
Dynamic Dataflow Analyses for Software Security
Explore the importance of dynamic dataflow analyses in detecting software errors and security vulnerabilities, along with examples highlighting buffer overflows and concurrency bugs. Learn how high-quality dynamic software analysis can uncover elusive bugs missed by other approaches, providing a layered solution for effective testing and validation in large populations.
Download Presentation

Please find below an Image/Link to download the presentation.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.
You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.
E N D
Presentation Transcript
Sampling Dynamic Dataflow Analyses Joseph L. Greathouse Advanced Computer Architecture Laboratory University of Michigan University of British Columbia June 10, 2011
Software Errors Abound NIST: SW errors cost U.S. ~$60 billion/year as of 2002 FBI CCS: Security Issues $67 billion/year as of 2005 > from viruses, network intrusion, etc. CERT-Cataloged Vulnerabilities 9,000 6,000 3,000 0 2000 2001 2002 2003 2004 2005 2006 2007 2008 (proj.) 2
Simple Buffer Overflow Example void foo() { int local_variables; int buffer[256]; buffer = read_input(); return; } Return address New Return address Local variables Bad Local variables Buffer Fill buffer buffer If read_input() reads 200 ints If read_input() reads >256 ints 3
Concurrency Bugs Also Matter Thread 1 mylen=small Thread 2 mylen=large Nov. 2010 OpenSSL Security Flaw if(ptr == NULL) { len=thread_local->mylen; ptr=malloc(len); memcpy(ptr, data, len); } 4
Concurrency Bugs Also Matter Thread 1 mylen=small Thread 2 mylen=large if(ptr==NULL) if(ptr==NULL) len2=thread_local->mylen; ptr=malloc(len2); TIME len1=thread_local->mylen; ptr=malloc(len1); memcpy(ptr, data1, len1) memcpy(ptr, data2, len2) ptr LEAKED 5
One Layer of a Solution High quality dynamic software analysis Find difficult bugs that other analyses miss Distribute Tests to Large Populations Low overhead or users get angry Accomplished by sampling the analyses Each user only tests part of the program 6
Dynamic Dataflow Analysis Associate meta-data with program values Propagate/Clear meta-data while executing Check meta-data for safety & correctness Forms dataflows of meta/shadow information 7
Example Dynamic Dataflow Analysis Data Input Meta-data Associate x = read_input() Clear validate(x) x = read_input() Propagate y = x * 1024 Check w Check w y = x * 1024 w = x + 42 a += y z = y * 75 Check z Check z a += y z = y * 75 Check a Check a 8
Distributed Dynamic Dataflow Analysis Split analysis across large populations Observe more runtime states Report problems developer never thought to test Instrumented Program Potential problems 9
Problem: DDAs are Slow Symbolic Execution 10-200x Taint Analysis (e.g.TaintCheck) 2-200x Data Race Detection (e.g. Helgrind) 2-300x Dynamic Bounds Checking 10-80x Memory Checking (e.g. Dr. Memory) 5-50x FP Accuracy Verification 100- 500x 10
One Solution: Sampling Lower overheads by skipping some analyses 100 Detection Accuracy (%) 75 Ideal 50 25 0 No Analysis Overhead Complete Analysis 11
Sampling Allows Distribution 100 Detection Accuracy (%) End Users Beta Testers 75 Many users testing at little overhead see more errors than one user at high overhead. Ideal 50 25 Developer 0 Overhead 12
Cannot Navely Sample Code Input validate(x) x = read_input() Validate(x) x = read_input() Skip Instr. False Positive y = x * 1024 w = x + 42 Check w Check w y = x * 1024 w = x + 42 a += y False Negative a += y z = y * 75 Check z Check z Skip Instr. Check a Check a 13
Sample Data, not Code Sampling must be aware of meta-data Remove meta-data from skipped dataflows Prevents false positives 14
Dataflow Sampling Example Input x = read_input() validate(x) Skip Dataflow x = read_input() y = x * 1024 Check w Check w y = x * 1024 w = x + 42 Skip Dataflow a += y False Negative a += y z = y * 75 Check z Check z Check a Check a 15
Mechanisms for Dataflow Sampling (1) Start with demand analysis Demand Analysis Tool Instrumented Application (e.g. Valgrind) Native Application No meta-data Shadow Value Detector Meta-data 16
Mechanisms for Dataflow Sampling (2) Remove dataflows if execution is too slow Sampling Analysis Tool Native Application Instrumented Application Application Instrumented OH Threshold Clear meta-data Shadow Value Detector Meta-data 17
Prototype Setup Taint analysis sampling system Network packets untrusted Xen-based demand analysis Whole-system analysis with modified QEMU Overhead Manager (OHM) is user-controlled Xen Hypervisor Admin VM OS and Applications Shadow Page Table Taint Analysis QEMU App App App Net Stack OHM Linux 18
Benchmarks Performance Network Throughput Example: ssh_receive Accuracy of Sampling Analysis Real-worldSecurity Exploits Name Apache Eggdrop Lynx ProFTPD Heap smashing attack on ProFTPD Server Squid Heap smashing attack on Squid proxy server Error Description Stack overflow in Apache Tomcat JK Connector Stack overflow in Eggdrop IRC bot Stack overflow in Lynx web browser 19
Performance of Dataflow Sampling (1) ssh_receive 25 Throughput with no analysis 20 Throughput (MB/s) 15 10 5 0 0 20 40 60 80 100 Maximum % Time in Analysis 20
Performance of Dataflow Sampling (2) netcat_receive 60 Throughput with no analysis 50 Throughput (MB/s) 40 30 20 10 0 0 20 40 60 80 100 Maximum % Time in Analysis 21
Performance of Dataflow Sampling (3) ssh_transmit Throughput with no analysis 25 20 Throughput (MB/s) 15 10 5 0 0 20 40 60 80 100 Maximum % Time in Analysis 22
Accuracy at Very Low Overhead Max time in analysis: 1% every 10 seconds Always stop analysis after threshold Lowest probability of detecting exploits Name Apache Eggdrop Lynx ProFTPD Squid Chance of Detecting Exploit 100% 100% 100% 100% 100% 23
Accuracy with Background Tasks netcat_receive running with benchmark 100 % Chance of Detecting Exploit Apache Eggdrop Lynx ProFTPD Squid 80 60 40 20 0.4 0.38 0 10% 25% Maximum Allowed Overhead % 50% 75% 90% 24
Accuracy with Background Tasks ssh_receive running in background 100 % Chance of Detecting Exploit Apache Eggdrop Lynx ProFTPD Squid 80 60 40 20 2 0.7 0.1 0 10% 25% Maximum % Time in Analysis 50% 75% 90% 25
Conclusion & Future Work Dynamic dataflow sampling gives users a knob to control accuracy vs. performance Better methods of sample choices Combine static information New types of sampling analysis 26
Outline Software Errors and Security Dynamic Dataflow Analysis Sampling and Distributed Analysis Prototype System Performance and Accuracy 28
Detecting Security Errors Static Analysis Analyze source, formal reasoning + Find all reachable, defined errors Intractable, requires expert input, no system state Dynamic Analysis Observe and test runtime state + Find deep errors as they happen Only along traversed path, very slow 29
Width Test 30