
Advanced Code Classification Using Heterogeneous Directed Hypergraph Neural Network
Explore the code classification techniques presented at the 35th International Conference on Software Engineering & Knowledge Engineering. This research introduces a novel approach to code classification: a Heterogeneous Directed Hypergraph Neural Network (HDHGN) built over abstract syntax trees, which shows promising results compared with traditional methods. The HDHGN leverages hypergraph structures to capture high-order data correlations and improve code classification accuracy.
Presentation Transcript
The 35th International Conference on Software Engineering & Knowledge Engineering
Heterogeneous Directed Hypergraph Neural Network over Abstract Syntax Tree (AST) for Code Classification
Guang Yang, Tiancheng Jin, Liang Dou
School of Computer Science and Technology, East China Normal University
Contents: Introduction, Methodology, Evaluation, Conclusion

Introduction
Code classification assigns code to categories according to its functionality. Existing deep-neural-network methods for code classification fall into two main categories: AST-based and GNN-based.
Issues: existing methods only take pairwise relationships into account and ignore the possible high-order correlations between AST nodes.
A hypergraph can encode high-order data correlations because a hyperedge can link any number of vertices. However, a general hypergraph has only one type of node and one type of edge, and its hyperedges are undirected, so it loses edge type information and direction.
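As an illustration of the first point, a tiny undirected hypergraph can be stored as an incidence matrix; the vertices, hyperedges, and NumPy representation below are a toy example of ours, not taken from the paper:

```python
import numpy as np

# Vertices 0..4 and three hyperedges; e0 links three vertices at once,
# a correlation an ordinary pairwise graph edge cannot express.
hyperedges = [{0, 1, 2}, {2, 3}, {1, 3, 4}]
H = np.zeros((5, len(hyperedges)), dtype=int)  # incidence matrix
for e, verts in enumerate(hyperedges):
    for v in verts:
        H[v, e] = 1  # H[v, e] = 1 iff vertex v belongs to hyperedge e
```

Note that this general form carries no edge types and no direction, which is exactly the limitation the HDHG addresses.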
Main Contributions: We propose a heterogeneous directed hypergraph (HDHG) to represent the AST. We propose a Heterogeneous Directed Hypergraph Neural Network (HDHGN) to generate vector representations for code classification. We evaluate our model on public datasets and compare it with previous SOTA AST-based and graph-based methods.
Overview (model architecture figure)
HDHG (construction of the heterogeneous directed hypergraph, shown in figures)
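As a sketch of how an AST might be turned into an HDHG, the function below uses Python's official `ast` library (the same parser the authors use) and makes each non-empty field of a parent node one directed, field-typed hyperedge from the parent to its children. The exact node/edge typing scheme here is an assumption for illustration, not the paper's specification:

```python
import ast

def ast_to_hdhg(source: str):
    """Sketch: build a heterogeneous directed hypergraph from a Python AST.

    Nodes are AST nodes, typed by their class name. Each non-empty field
    of a parent becomes one directed hyperedge, typed by the field name,
    pointing from the parent (tail) to all children under that field (heads).
    """
    nodes, hyperedges, index = [], [], {}

    def visit(node):
        index[id(node)] = len(nodes)
        nodes.append(type(node).__name__)          # heterogeneous node type
        for field, value in ast.iter_fields(node):
            children = [v for v in (value if isinstance(value, list) else [value])
                        if isinstance(v, ast.AST)]
            for child in children:
                visit(child)
            if children:
                hyperedges.append({
                    "type": field,                 # heterogeneous edge type
                    "tail": index[id(node)],       # direction: the parent side
                    "heads": [index[id(c)] for c in children],
                })

    visit(ast.parse(source))
    return nodes, hyperedges

nodes, edges = ast_to_hdhg("x = a + b")
# One hyperedge per AST field, e.g. BinOp's "left"/"op"/"right" fields each
# link the BinOp node to its operand or operator node.
```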
Feature Initialization (figure slide)
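One plausible way to initialize node features, assumed here for illustration (the slide itself gives no details), is an embedding lookup on the AST node type; the `type_vocab` entries and the random embeddings are hypothetical:

```python
import numpy as np

DIM = 128  # hidden/embedding dimension from the experiment settings
rng = np.random.default_rng(0)

# Hypothetical type vocabulary; the paper's vocabulary construction
# is not shown on the slide.
type_vocab = {"Module": 0, "Assign": 1, "Name": 2, "BinOp": 3}
type_emb = rng.normal(size=(len(type_vocab), DIM))

def init_feature(node_type: str) -> np.ndarray:
    """Initial hidden vector of a node: an embedding lookup on its AST type."""
    return type_emb[type_vocab[node_type]]
```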
HDHGConv: aggregating messages from nodes to hyperedges, then aggregating messages from hyperedges to nodes.
HDHGConv, nodes to hyperedges: add direction information, use attention to aggregate the nodes of each hyperedge, and add edge type information.
HDHGConv, hyperedges to nodes: add direction information, use attention to aggregate the hyperedges of each node, and update the hidden vector of each node.
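The two aggregation phases above can be sketched with single-head dot-product attention in NumPy. The shared projection `W`, the residual update, and the way direction and type embeddings are added are assumptions of ours; the paper's exact parameterization (including its 8-head attention) is not reproduced here:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def hdhg_conv(h, edges, type_emb, dir_emb, rng):
    """One HDHGConv layer, sketched with single-head dot-product attention.

    Phase 1 (nodes -> hyperedges): each hyperedge attends over its member
    nodes; direction embeddings are added to the messages and the edge
    type embedding is added to the result.
    Phase 2 (hyperedges -> nodes): each node attends over its incident
    hyperedges and updates its hidden vector with the aggregated message.
    """
    n, d = h.shape
    W = rng.normal(scale=d ** -0.5, size=(d, d))  # shared projection (assumed)

    # Phase 1: aggregate messages from nodes to hyperedges.
    edge_vecs = []
    for e in edges:
        members = [(e["tail"], "tail")] + [(v, "head") for v in e["heads"]]
        msgs = np.stack([h[v] @ W + dir_emb[role] for v, role in members])
        q = type_emb[e["type"]]                   # edge type acts as query
        att = softmax(msgs @ q / np.sqrt(d))
        edge_vecs.append(att @ msgs + q)          # add edge type information

    # Phase 2: aggregate messages from hyperedges back to nodes.
    h_new = h.copy()
    for v in range(n):
        incident = [(edge_vecs[i], "tail" if e["tail"] == v else "head")
                    for i, e in enumerate(edges)
                    if e["tail"] == v or v in e["heads"]]
        if not incident:
            continue
        msgs = np.stack([m + dir_emb[role] for m, role in incident])
        att = softmax(msgs @ (h[v] @ W) / np.sqrt(d))
        h_new[v] = h[v] + att @ msgs              # residual update (assumed)
    return h_new

# Toy usage: 4 nodes, hidden size 8, two typed directed hyperedges.
rng = np.random.default_rng(0)
h = rng.normal(size=(4, 8))
type_emb = rng.normal(size=(2, 8))                # 2 hypothetical edge types
dir_emb = {"tail": rng.normal(size=8), "head": rng.normal(size=8)}
edges = [{"type": 0, "tail": 0, "heads": [1, 2]},
         {"type": 1, "tail": 3, "heads": [0]}]
h2 = hdhg_conv(h, edges, type_emb, dir_emb, rng)
```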
Classification: use attention to aggregate all nodes, then use an MLP to predict the result.
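A minimal sketch of this readout, assuming a single attention vector and a two-layer ReLU MLP (the weight names and shapes are ours, with 250 output classes to match Java250):

```python
import numpy as np

def classify(h, w_att, w1, b1, w2, b2):
    """Readout sketch: attention-pool all node vectors into one graph
    vector, then a two-layer ReLU MLP produces the class scores."""
    scores = h @ w_att                  # one attention score per node
    att = np.exp(scores - scores.max())
    att = att / att.sum()               # softmax over nodes
    g = att @ h                         # weighted sum of node vectors
    hidden = np.maximum(0.0, g @ w1 + b1)
    return hidden @ w2 + b2             # logits over the code classes

# Toy usage: 5 nodes, hidden size 16, 250 classes (as in Java250).
rng = np.random.default_rng(1)
h = rng.normal(size=(5, 16))
w_att = rng.normal(size=16)
w1, b1 = rng.normal(size=(16, 32)), np.zeros(32)
w2, b2 = rng.normal(size=(32, 250)), np.zeros(250)
logits = classify(h, w_att, w1, b1, w2, b2)
```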
Datasets: We use Python800 and Java250, which are from Project CodeNet. We randomly split each dataset into training, validation, and test sets in a 6:2:2 ratio.
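The 6:2:2 split can be sketched as follows; the helper name and the fixed seed are ours:

```python
import random

def split_622(items, seed=0):
    """Randomly split a dataset into train/validation/test at 6:2:2."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train, n_val = int(len(items) * 0.6), int(len(items) * 0.2)
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])

train, val, test = split_622(range(100))
```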
Experiment settings
Parsers: the official Python 3.8 ast library and the javalang library
Number of layers: 4
Hidden vector and embedding dimension: 128
Multi-head attention heads: 8
Loss function: cross-entropy
Optimizer: Adam, learning rate 5×10⁻⁵
Dropout rate: 0.2
Epochs: 100
Batch size: 32
Each experiment is repeated 5 times
Results: On Python800, our HDHGN scores 2.88% higher than the best baseline; on Java250, our model scores 2.47% higher.
Ablation study: removing hyperedges decreases the result by 3.08%; removing heterogeneous information decreases it by 2.64%; removing direction decreases it by 2.38%.
Conclusion: We propose HDHG and HDHGN for code classification. Our method outperforms the AST-based and GNN-based baselines on the Python and Java datasets. A further ablation study confirms the effectiveness of introducing high-order data correlations. Our code is available at https://github.com/qiankunmu/HDHGN
THANKS
Reporter: Guang Yang