Detecting Related Questions in Stack Overflow Using Semantic Matching

A detection model for related questions in Stack Overflow based on semantic matching. The approach aims to suggest related questions to users efficiently by leveraging high-level semantic features and interaction features. Previous methods for question relatedness prediction, such as CNN and Soft-Cos SVM, are reviewed. The slides cover the background, previous methods, the proposed approach, and the experimental evaluation.

  • Stack Overflow
  • Semantic Matching
  • Question Detection
  • Machine Learning
  • Relatedness Prediction

Presentation Transcript


  1. Related Questions Detection Model in Stack Overflow based on Semantic Matching
     Shizhao Huang, Yimin Wu, Jinwei Lu, Chao Deng
     South China University of Technology, Guangzhou, China
     Guangdong Ocean University at Yangjiang, Yangjiang, China
     se hsz@mail.scut.edu.cn, csymwu@scut.edu.cn, cskilljl@mail.scut.edu.cn, dengchao@gdou.edu.cn

  2. Outline: Background, Previous Methods, Our Approach, Experimental Evaluation, Conclusion

  4. Background. Stack Overflow provides related questions as suggestions, and the answers to related questions can help users solve their problems more efficiently. Motivation of the proposed work: identifying related questions manually is inefficient and time-consuming, and existing deep learning methods do not sufficiently consider high-level semantic features and interaction features.

  6. Previous Methods: question relatedness prediction
     • CNN [1] adopts a convolutional neural network and calculates the semantic similarity of two contextual vectors with the cosine function.
     • TuningSVM [8] uses word2vec to obtain word embeddings and adopts differential evolution (DE) as its tuning algorithm. In that study, the tuned SVM runs much faster than the CNN model.
     • Soft-Cos SVM [9] calculates the soft cosine similarity based on SimBow to measure the distance between knowledge unit pairs and adopts an SVM as the final classifier.
     [1] B. Xu, D. Ye, Z. Xing, X. Xia, G. Chen, and S. Li, "Predicting semantically linkable knowledge in developer online forums via convolutional neural network," in 2016 31st IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE, 2016, pp. 51-62.
     [8] W. Fu and T. Menzies, "Easy over hard: A case study on deep learning," in Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering (ESEC/FSE), 2017, pp. 49-60.
     [9] B. Xu, A. Shirani, D. Lo, and M. A. Alipour, "Prediction of relatedness in Stack Overflow: deep learning vs. SVM: a reproducibility study," in Proceedings of the 12th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM), 2018, pp. 1-10.
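The difference between plain cosine similarity and soft cosine similarity can be sketched as follows. This is a minimal illustration, not the implementation used in [1] or [9]: the three-word vocabulary, the bag-of-words vectors, and the word-similarity matrix `related` are made-up values, and the function names are hypothetical.

```python
import math


def cosine(a, b):
    """Plain cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def soft_cosine(a, b, sim):
    """Soft cosine: like cosine, but sim[i][j] lets different words
    i and j contribute when they are considered similar."""
    def bilinear(u, v):
        return sum(sim[i][j] * u[i] * v[j]
                   for i in range(len(u)) for j in range(len(v)))
    return bilinear(a, b) / math.sqrt(bilinear(a, a) * bilinear(b, b))


# Assumed toy vocabulary: ["error", "exception", "loop"]
q1 = [1.0, 0.0, 0.0]  # question mentioning only "error"
q2 = [0.0, 1.0, 0.0]  # question mentioning only "exception"

# Assumed word-similarity matrix: "error" and "exception" are close.
related = [
    [1.0, 0.8, 0.0],
    [0.8, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

plain_score = cosine(q1, q2)
soft_score = soft_cosine(q1, q2, related)
print(plain_score, soft_score)
```

With no shared words the plain cosine is zero, while the soft cosine gives the pair credit for the assumed similarity between "error" and "exception". In practice the similarity matrix would come from word embeddings rather than being written by hand.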

  7. Previous Methods: question relatedness prediction
     • SOFTSVM [6] uses word2vec for word embedding of knowledge units, then calculates the cosine similarity, and finally predicts the semantic relatedness between knowledge units with an SVM.
     • DOTBILSTM [6] uses a BiLSTM to extract contextual information from the knowledge units, then calculates the inner product of the contextual vectors.
     • ASIM [3] adopts a BiLSTM to encode local semantic information and obtains the interaction information between two knowledge units through a soft attention mechanism.
     [3] J. Pei, Y. Wu, Z. Qin, Y. Cong, and J. Guan, "Attention-based model for predicting question relatedness on Stack Overflow," in 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR), 2021, pp. 97-107.
     [6] A. Shirani, B. Xu, D. Lo, T. Solorio, and A. Alipour, "Question relatedness on Stack Overflow: the task, dataset, and corpus-inspired models," in Proceedings of the AAAI Reasoning for Complex Question Answering Workshop, 2019.

  9. Our Approach - DMSO Framework. The Detection Model in Stack Overflow (DMSO) takes a pair of questions as input (the preprocessed text sequences are used as the final input) and outputs the relatedness type between the questions: Duplicate, Direct, Indirect, or Isolated.

  10. Our Approach - DMSO Framework: Word Embedding Layer. Each word is transformed into a vector representation through pre-trained word embeddings. We use the titles, bodies, and answers in Stack Overflow as the corpus and construct 300-dimensional word vectors with word2vec.

  11. Our Approach - DMSO Framework: Word Encoding Layer. We adopt a BiLSTM to fuse local features into each word's original representation. The semantic understanding of a natural language word depends on its context: the meaning of a word depends not only on what has been read before it but also on what will be read after it.
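To illustrate why reading in both directions matters, here is a minimal sketch that replaces the BiLSTM with a much simpler per-dimension tanh recurrence. It is not the DMSO encoder: the function names, the weights of 0.5, and the two-dimensional toy word vectors are all assumptions chosen to keep the example short.

```python
import math


def run_rnn(vectors, w_in=0.5, w_rec=0.5):
    """Simple tanh recurrence applied per dimension (a stand-in for an
    LSTM cell). Returns one hidden state per input position."""
    state = [0.0] * len(vectors[0])
    states = []
    for vec in vectors:
        state = [math.tanh(w_in * x + w_rec * h) for x, h in zip(vec, state)]
        states.append(state)
    return states


def encode_bidirectional(vectors):
    """Concatenate a left-to-right pass with a right-to-left pass so each
    position sees both its preceding and its following context."""
    forward = run_rnn(vectors)
    backward = run_rnn(vectors[::-1])[::-1]
    return [f + b for f, b in zip(forward, backward)]


# Two toy "sentences" that differ only in their last word.
sentence_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
sentence_b = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]

enc_a = encode_bidirectional(sentence_a)
enc_b = encode_bidirectional(sentence_b)

# Encoding of the FIRST word in each sentence.
print(enc_a[0])
print(enc_b[0])
```

The forward half of the first word's encoding is identical in both sentences, because nothing before it changed. The backward half differs, because it carries information from the changed last word. A forward-only encoder would not capture that.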

  12. Our Approach - DMSO Framework: Local Interaction Layer. Previous approaches do not consider the interaction information between the two questions. We introduce a novel interaction method that improves the extraction of interaction features by constructing interaction attention and difference attention.
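The slide does not give the formulas for interaction attention and difference attention, so the following is only a generic sketch of the underlying idea: soft-align each word of one question against the other question, then keep the element-wise difference between the word and its aligned summary. The function names, the dot-product scoring, and the toy vectors are assumptions and should not be read as the DMSO formulation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def align(source, target):
    """For each source word, build an attention-weighted summary of the
    target words (dot-product scores, softmax weights)."""
    aligned = []
    for word in source:
        weights = softmax([dot(word, other) for other in target])
        summary = [sum(w * other[d] for w, other in zip(weights, target))
                   for d in range(len(word))]
        aligned.append(summary)
    return aligned


def interaction_features(source, target):
    """Per word: [word ; aligned summary ; word - aligned summary]."""
    aligned = align(source, target)
    return [word + summary + [x - y for x, y in zip(word, summary)]
            for word, summary in zip(source, aligned)]


# Toy encoded questions with different lengths (2 and 3 words).
question_a = [[1.0, 0.0], [0.0, 1.0]]
question_b = [[1.0, 0.0], [1.0, 0.0], [0.0, 2.0]]

features = interaction_features(question_a, question_b)
for row in features:
    print([round(value, 3) for value in row])
```

The difference part is close to zero where a word is well matched by the other question and large where it is not, which is the kind of signal a relatedness classifier can use.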

  13. Our Approach - DMSO Framework: Global Interaction Fusion Layer. Previous methods only consider local interaction features and ignore global features, which makes it difficult to capture semantic information and dependencies at the sentence level. This module extracts the semantic features and interaction features of whole sentences to represent the global interaction features of a knowledge unit pair.

  14. Our Approach - DMSO Framework: Prediction Layer. Max-pooling converts the sequence of vectors into a fixed-size vector, and a softmax function then gives the probability of each class.
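The two steps of the prediction layer can be sketched as follows. This is a minimal illustration under stated assumptions: the weight matrix, bias, and input vectors are made-up numbers rather than trained DMSO parameters, and the slide does not specify the linear layer that is assumed here between pooling and softmax.

```python
import math

LABELS = ["Duplicate", "Direct", "Indirect", "Isolated"]


def max_pool(vectors):
    """Take the maximum of each dimension across all positions, so any
    number of input vectors yields one vector of the same width."""
    return [max(column) for column in zip(*vectors)]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(vectors, weights, bias):
    """Pool, apply an assumed linear layer, and return class probabilities."""
    pooled = max_pool(vectors)
    logits = [sum(w * x for w, x in zip(row, pooled)) + b
              for row, b in zip(weights, bias)]
    return dict(zip(LABELS, softmax(logits)))


# Hypothetical parameters: 4 classes x 3 features.
weights = [
    [1.0, 0.5, -0.2],
    [0.3, 0.8, 0.1],
    [-0.4, 0.2, 0.6],
    [-1.0, -0.5, 0.3],
]
bias = [0.0, 0.1, 0.0, -0.1]

short_input = [[0.2, 0.9, -0.1]]
long_input = [[0.2, -0.5, -0.1], [0.1, 0.9, -0.3], [-0.4, 0.0, -0.2]]

probabilities = predict(long_input, weights, bias)
print(probabilities)
```

Both `short_input` and `long_input` pool to a three-dimensional vector, which is why the classifier can have a fixed input size even though questions vary in length.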

  16. Experimental Evaluation: Dataset. Built by Shirani et al. [6], the dataset contains 347,372 pairs of knowledge units. Knowledge units have four relatedness classes: Duplicate, Direct, Indirect, and Isolated.

     Class        Train     Dev      Test      Total
     Duplicate    52,106    8,684    26,053    86,843
     Direct       52,106    8,684    26,053    86,843
     Indirect     52,106    8,684    26,053    86,843
     Isolated     52,106    8,684    26,053    86,843
     Total        208,424   34,736   104,212   347,372

     [6] A. Shirani, B. Xu, D. Lo, T. Solorio, and A. Alipour, "Question relatedness on Stack Overflow: the task, dataset, and corpus-inspired models," in Proceedings of the AAAI Reasoning for Complex Question Answering Workshop, 2019.
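The figures on this slide can be cross-checked with a few lines of arithmetic. The sketch below only uses the counts given on the slide. The variable names are illustrative, and the rounded split shares are derived here rather than stated in the source.

```python
# Per-class counts for each split, as given on the dataset slide.
per_class = {"Train": 52106, "Dev": 8684, "Test": 26053}
classes = ["Duplicate", "Direct", "Indirect", "Isolated"]

# Every class has the same counts, so the dataset is perfectly balanced.
per_class_total = sum(per_class.values())
split_totals = {name: count * len(classes) for name, count in per_class.items()}
grand_total = sum(split_totals.values())

# Share of each split, rounded to two decimals.
shares = {name: round(count / per_class_total, 2)
          for name, count in per_class.items()}

print(per_class_total, split_totals, grand_total, shares)
```

The counts are internally consistent, and the split works out to roughly 60% training, 10% development, and 30% test data per class.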

  17. Experimental Evaluation. DMSO outperforms the previous methods in all four classes.

  18. Experimental Evaluation. The Interaction Feature Extractor and the Global Interaction Fusion Layer play an essential role in the performance improvement of DMSO.

  19. Experimental Evaluation: the generalization ability of DMSO. Dataset: the AskUbuntu dataset prepared by Rodrigues et al. [15]. In the AskUbuntu dataset, 24K question pairs are used for training, 6K for testing, and 1K for validation. The two classes in the AskUbuntu dataset are balanced.

     Model        Accuracy (%)
     Jcrd         72.91
     SVM-bas      70.25
     SVM-adv      75.87
     CNN          74.50
     DNN          78.65
     DCNN         79.00
     DOTBILSTM    87.00
     SOFTSVM      90.00
     ASIM         96.25
     DMSO         97.69

     [15] J. Rodrigues, C. Saedi, V. Maraev, J. Silva, and A. Branco, "Ways of asking and replying in duplicate question detection," in Proceedings of the 6th Joint Conference on Lexical and Computational Semantics (*SEM 2017), 2017, pp. 262-270.

  21. Conclusion. We propose DMSO for the question relatedness detection task; it captures sentence-level features to enhance the model's ability to extract interaction information. The experimental results show that DMSO is effective in predicting question relatedness and outperforms the baseline models used in previous works. Moreover, in the duplicate question detection task on AskUbuntu, DMSO also achieves state-of-the-art performance, which demonstrates its generalization ability.

  22. Thanks!
