Advanced Code Classification Using Heterogeneous Directed Hypergraph Neural Network

The 35th International Conference on Software Engineering & Knowledge Engineering

Explore the code classification techniques presented at the 35th International Conference on Software Engineering & Knowledge Engineering. This research introduces a novel approach to code classification that applies a Heterogeneous Directed Hypergraph Neural Network (HDHGN) to abstract syntax trees, showing promising results compared to traditional methods. Dive into the details of how the HDHGN model leverages hypergraph structures to capture high-order data correlations and improve code classification accuracy.

  • Code Classification
  • Neural Network
  • Heterogeneous Graph
  • Hypergraph
  • Software Engineering


Presentation Transcript


  1. The 35th International Conference on Software Engineering & Knowledge Engineering. Heterogeneous Directed Hypergraph Neural Network over Abstract Syntax Tree (AST) for Code Classification. Guang Yang, Tiancheng Jin, Liang Dou. School of Computer Science and Technology, East China Normal University.

  2. Contents: Introduction, Methodology, Evaluation, Conclusion. Code classification assigns code to categories according to its functionality. There are two main categories of code classification methods, both built on deep neural networks: AST-based and GNN-based.

  3. Issues: existing methods only take pairwise relationships into account and ignore the possible high-order correlations between AST nodes.

  4. A hypergraph can encode high-order data correlations because a hyperedge can link any number of vertices. However, a general hypergraph has only one type of node and one type of edge, and its hyperedges are undirected, so it lacks edge type information and direction.
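To make the idea concrete, here is a minimal sketch (not the authors' construction, which may differ) of turning a Python AST into a directed hypergraph: each AST field becomes one typed hyperedge whose tail is the parent node and whose heads are all children under that field, so a single hyperedge captures a whole parent-children relation instead of a set of pairwise edges.

```python
# Minimal sketch (not the authors' code): representing a Python AST as a
# directed hypergraph, where each "field" hyperedge links a parent node
# (tail) to all of its children (heads) at once instead of pairwise edges.
import ast

def ast_to_hypergraph(source: str):
    tree = ast.parse(source)
    nodes = []          # node types, e.g. "FunctionDef", "Name"
    hyperedges = []     # (edge_type, tail_node_idx, [head_node_idxs])
    index = {}

    for node in ast.walk(tree):
        index[id(node)] = len(nodes)
        nodes.append(type(node).__name__)

    for node in ast.walk(tree):
        for field, value in ast.iter_fields(node):
            children = value if isinstance(value, list) else [value]
            child_ids = [index[id(c)] for c in children if isinstance(c, ast.AST)]
            if child_ids:
                # one hyperedge covers the parent and *all* children of this field
                hyperedges.append((field, index[id(node)], child_ids))
    return nodes, hyperedges

nodes, edges = ast_to_hypergraph("def add(a, b):\n    return a + b")
print(len(nodes), "nodes,", len(edges), "hyperedges")
```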

  5. Main Contributions: We propose a heterogeneous directed hypergraph (HDHG) to represent the AST. We propose a Heterogeneous Directed Hypergraph Neural Network (HDHGN) to generate vector representations for code classification. We assess our model on public datasets and compare it with previous SOTA AST-based and graph-based methods.

  6. Overview

  7. HDHG

  8. HDHG

  9. HDHG

  10. Feature Initialization

  11. HDHGConv has two phases: aggregating messages from nodes to hyperedges, then aggregating messages from hyperedges to nodes.

  12. HDHGConv, nodes to hyperedges: add direction information, use attention to aggregate the nodes of each hyperedge, and add edge type information.

  13. HDHGConv, hyperedges to nodes: add direction information, use attention to aggregate the hyperedges of each node, and update the hidden vector of each node. Both phases are sketched in code below.
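The two-phase message passing of slides 11 to 13 can be pictured with a short PyTorch-style sketch. This is not the authors' implementation (their code is linked on the final slide): the per-edge and per-node loops, the multi-head attention modules, the use of the edge-type embedding as the attention query, and the GRU-style node update are assumptions made for illustration, and edge types are assumed to already be integer ids (e.g. indices assigned to the AST field names from the earlier sketch).

```python
# Sketch of a simplified HDHGConv-like layer: nodes -> hyperedges, then
# hyperedges -> nodes, with direction and edge-type information and attention.
# Written as explicit loops for clarity, not as an efficient scatter version.
import torch
import torch.nn as nn

class SimpleHDHGConv(nn.Module):
    def __init__(self, dim, num_edge_types, heads=8):
        super().__init__()
        self.dir_emb = nn.Embedding(2, dim)               # 0 = tail (parent), 1 = head (child)
        self.type_emb = nn.Embedding(num_edge_types, dim) # edge type information
        self.node_to_edge = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.edge_to_node = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.update = nn.GRUCell(dim, dim)                # stand-in node update

    def forward(self, x, hyperedges):
        # x: (num_nodes, dim); hyperedges: list of (edge_type_id, tail_idx, head_idxs)
        edge_states = []
        for etype, tail, head_ids in hyperedges:
            members = torch.tensor([tail] + head_ids)
            direction = torch.tensor([0] + [1] * len(head_ids))
            # phase 1: nodes -> hyperedge, adding direction info to each message
            msgs = (x[members] + self.dir_emb(direction)).unsqueeze(0)
            query = self.type_emb(torch.tensor([etype])).unsqueeze(0)
            e, _ = self.node_to_edge(query, msgs, msgs)
            edge_states.append(e.squeeze(0).squeeze(0))
        edge_states = torch.stack(edge_states)            # (num_edges, dim)

        # phase 2: hyperedges -> nodes, then update each node's hidden vector
        out = x.clone()
        for v in range(x.size(0)):
            incident, direction = [], []
            for i, (_, tail, head_ids) in enumerate(hyperedges):
                if v == tail or v in head_ids:
                    incident.append(i)
                    direction.append(0 if v == tail else 1)
            if not incident:
                continue
            msgs = (edge_states[incident] + self.dir_emb(torch.tensor(direction))).unsqueeze(0)
            query = x[v].view(1, 1, -1)
            agg, _ = self.edge_to_node(query, msgs, msgs)
            out[v] = self.update(agg.view(1, -1), x[v].view(1, -1)).squeeze(0)
        return out
```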

  14. Classification: use attention to aggregate all node vectors into a single graph representation, then use an MLP to predict the result.
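A possible form of this readout is sketched below; the single-score attention pooling and the two-layer MLP are assumptions rather than the paper's exact architecture.

```python
# Sketch of an attention readout followed by an MLP classifier (assumed form).
import torch
import torch.nn as nn

class AttentionReadout(nn.Module):
    def __init__(self, dim, num_classes):
        super().__init__()
        self.score = nn.Linear(dim, 1)                    # attention score per node
        self.mlp = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(),
                                 nn.Linear(dim, num_classes))

    def forward(self, node_vecs):                         # node_vecs: (num_nodes, dim)
        weights = torch.softmax(self.score(node_vecs), dim=0)
        graph_vec = (weights * node_vecs).sum(dim=0)      # weighted sum of node vectors
        return self.mlp(graph_vec)                        # class logits
```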

  15. Datasets: We use Python800 and Java250, which are from Project CodeNet. We randomly split each dataset into training, validation, and test sets in a 6:2:2 ratio.
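For illustration, a minimal sketch of such a 6:2:2 random split; the fixed seed and the shuffling strategy are assumptions, not the paper's exact procedure.

```python
# Sketch of a 6:2:2 random train/validation/test split.
import random

def split_622(samples, seed=0):
    samples = list(samples)
    random.Random(seed).shuffle(samples)      # shuffle a copy, reproducibly
    n = len(samples)
    n_train, n_val = int(0.6 * n), int(0.2 * n)
    return (samples[:n_train],
            samples[n_train:n_train + n_val],
            samples[n_train + n_val:])
```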

  16. Experiment settings: parsers from the official Python 3.8 ast library and the javalang library; 4 layers; hidden vector and embedding vector dimension: 128; multi-head attention with 8 heads; cross-entropy loss function; Adam optimizer; learning rate: 5 × 10⁻⁵; dropout rate: 0.2; 100 epochs; batch size: 32; each experiment repeated 5 times.
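A minimal training-loop sketch using these settings; the model and data-loader objects are placeholders, and only the hyperparameters named on the slide come from the paper.

```python
# Sketch of the training setup: cross-entropy loss, Adam, lr 5e-5, 100 epochs.
import torch
import torch.nn as nn

def train(model, train_loader, num_epochs=100, lr=5e-5):
    criterion = nn.CrossEntropyLoss()                      # cross-entropy loss
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for epoch in range(num_epochs):
        model.train()
        for graphs, labels in train_loader:                # batches of 32 in the paper
            optimizer.zero_grad()
            logits = model(graphs)
            loss = criterion(logits, labels)
            loss.backward()
            optimizer.step()
        # (validation pass omitted for brevity)
```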

  17. Results: In Python800, our HDHGN is 2.88% higher than the best baseline. In Java250, our model is 2.47% higher than the best baseline.

  18. Ablation study: Removing hyperedges decreases the result by 3.08%. Removing heterogeneous information decreases it by 2.64%. Removing direction decreases it by 2.38%.

  19. We propose HDHG and HDHGN for code classification. Our method outperforms the AST-based and GNN-based baselines on the Python and Java datasets. A further ablation study demonstrates the effectiveness of introducing high-order data correlations. Our code is available at https://github.com/qiankunmu/HDHGN

  20. THANKS. Reporter: Guang Yang
