
Genetic Improvement for Software Systems at Stirling University
Explore the concept of Genetic Improvement for software systems through examples like Gen-o-fix and GI in Hadoop. Learn about the motivations, biological analogies, and system diagrams involved in this process. Understand how Gen-O-Fix operates on abstract syntax trees and the distribution of hashcodes. Discover some GP settings and the aim of improving existing systems with small changes.
Download Presentation

Please find below an Image/Link to download the presentation.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.
You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.
E N D
Presentation Transcript
Genetic Improvement CHORDS GROUP Stirling University John Woodward Saemundur Haraldsson 04/04/2025 John Woodward (Stirling) 1
Overview 1. Motivations 2. Examples 1. Gen-o-fix (time series) 2. HashCodes for Hadoop 3. Evolutionary Computation 4. Automatic Design vs. GI 3. Conclusions 04/04/2025 John Woodward (Stirling) 2
Motivation for GI 1. Aim is not to build programs from scratch. 2. Instead the aim is to take an already existing system and improve it (typically small changes). 3. This is done by operating on 1. source code (text file) 2. bytecode (executable file) 3. model of the code (e.g. GP function set) 4. abstract syntax trees 4. Biological Analogies (genetic modification, transplant) 5. Improve Functional/non-functional properties. 04/04/2025 John Woodward (Stirling) 3
System Diagram for Gen-O-Fix 04/04/2025 John Woodward (Stirling) 4
Gen-O-Fix: Abstract Syntax Trees Main features of framework are 1. Embedded adaptively. 2. Minimal end-user requirements. 1. Initial source code: location of Scala source code file containing a function 2. Fitness function: providing a means of evaluating the quality of system 3. Source to source transformations 4. Operates on ASTs (i.e. arbitrarily fine). 04/04/2025 John Woodward (Stirling) 5
Gen-O-Fix output 04/04/2025 John Woodward (Stirling) 6
GI Hadoop Hashcode 1. Hadoop provides a mapReduce implementation in Java. 2. Equals method has to obey contract (Reflective, Symmetric, Transitive, ) 3. x.equals(y) implies hashCode(x)== hashCode(y). 4. hashCode method is an integer function of a subset of an object's fields 04/04/2025 John Woodward (Stirling) 7
Some GP Settings 1. Terminal set is 1. Field values 2. Random integers [0, 100] 2. Function set is 1. {+, *, XOR, AND} 3. Fitness function: close to uniform distribution (uniform distribution is the ideal), over 10,000 instances. 04/04/2025 John Woodward (Stirling) 8
Distribution of Hashcodes 04/04/2025 John Woodward (Stirling) 9
Case Study 1 public int hashCode() { return layoutVersion ^ namespaceID ^ (int)(cTime ^ mostRecentCheckpointTxId ^ curSegmentTxId) ^ clusterID.hashCode() ^ blockpoolID.hashCode();} public int hashCode() { int a = (int)curSegmentTxId; int b = (int)mostRecentCheckpointTxId; return (a ^ (26 * (26 * (b & a)))); } 04/04/2025 John Woodward (Stirling) 10
Case Study 2 public int hashCode() { return this.sequenceNumber;} public int hashCode() { int a = renewer.hashCode(); int b = realUser.hashCode(); int c = owner.hashCode(); return (a ^ b) ^ (30 * (c * (a + b)));} 04/04/2025 John Woodward (Stirling) 11
Case Study 3 public int hashCode() { return new HashCodeBuilder().append(file).append(size ).append(type) .toHashCode();} public int HashCode() { int a = file.hashCode(); int b = (int)size; int c = type.hashCode(); return (71 * (a + (43 * (b + c))));} 04/04/2025 John Woodward (Stirling) 12
Improving Evolutionary Programming 1. Evolutionary Programming has a mutation operator at its core providing variation. 2. Many variations of this have been proposed manually by different researchers. 3. Mutation is therefore a suitable candidate for GI. 4. On a standard set of benchmark problem classes novel mutation operators are automatically designed which outperform human designed ones. 04/04/2025 John Woodward (Stirling) 13
Gaussian, Cauchy and GI 1. Typically manually designed distributions used are symmetric and with mean of zero (Levy family). (above right) 2. Automatically designed are free of these constraints. (below right, sample 3000) 04/04/2025 John Woodward (Stirling) 14
Comparing GI with Automatic Design of Algorithms 1. Three techniques operate directly on programs (source code, executable code, abstract syntax trees). In situ (software is its own substrate). 2. Automatic Design operates on a Genetic Programming Function Set. In vitro (a model of the software) 04/04/2025 John Woodward (Stirling) 15
Genetic Improvement (GI) and Automatic Design of Algorithms (ADA) 1. ADA and GI both produce human competitive programs 2. ADA and GI both typically apply evolutionary computation methodsto generate changes. 3. GI can produce syntactically incorrect and nonterminating programs while ADA generates syntactically correct terminating code. 4. ADA is always to improve functional properties while GI's focus has largely been non-functional properties. 5. GI is more general (applied to any software), ADA has it typically used for heuristics and machine learning and data mining algorithms. 04/04/2025 John Woodward (Stirling) 16
Activity & Profile 1. GECCO 2014 workshop on Automatic Design of Algorithms (including GI). 2. PPSN 2014 tutorial (mentioning GI) 3. One PhD student working on GI 4. Possible project with KLM. 5. Agilent Grant (Jerry Swan) 1. 'Spark' data analysis + ML framework, including: framework configuration, query optimization, data placement and memory usage. 04/04/2025 John Woodward (Stirling) 17
Conclusions 1. Modifications to programs are typically much smaller than the original program. 2. Notion of problem classes. 3. A population of edits/changes is maintained not populations of programs. 4. Pre-analysis allows one to target parts of the program for modification. 5. GI is a superset of numerical parameter tuning 6. GI and ADA are highly related. 04/04/2025 John Woodward (Stirling) 18