Effective Combinatorial Testing for Data Mining Algorithms

Explore the effectiveness of applying combinatorial testing to data mining algorithms through experimental design, research questions, subject programs, datasets, test generation, metrics, and results analysis. Discover the impact of different datasets on test coverage and the correlation between branch coverage and fault detection.

  • Data Mining
  • Combinatorial Testing
  • Experimental Design
  • Test Coverage
  • Algorithms

Presentation Transcript


  1. Applying Combinatorial Testing to Data Mining Algorithms. Jaganmohan Chandrasekaran (UTA), Huadong Feng (UTA), Yu Lei (UTA), D. Richard Kuhn (NIST), Raghu Kacker (NIST). March 13, 2017

  2. Outline
     • Introduction
     • Experimental Design: Research Questions, Subject Programs, Datasets, Input Parameter Modeling, Test Generation, Metrics
     • Experimental Results: Impact of Datasets, Branch Coverage Results of T-Way Testing, Mutation Coverage Results of T-Way Testing
     • Conclusion & Future Work

  3. Introduction
     • Data mining algorithms: widely developed and used; take large amounts of data as input; involve intensive and complex computing.
     • Combinatorial testing (CT): a proven method for more effective software testing at lower cost.
     • Question: How effective is combinatorial testing when applied to data mining algorithms?

  5. Experimental Design: Research Questions
     • How effective is CT when applied to data mining algorithms?
     • How do different datasets impact test coverage?
     • Is branch coverage a good indicator of fault detection effectiveness for data mining algorithms?

  6. Experimental Design: Subject Programs
     • Top 5 most influential data mining algorithms*: C4.5, K-Means, SVM, Apriori, EM
     • Implementations taken from WEKA
     • CT tests are applied to the configuration options of the subject algorithms

  7. Experimental Design: Datasets
     • 51 benchmark datasets
     • Datasets provided by WEKA and the UC Irvine Machine Learning Repository
     • Not all datasets are applicable to all algorithms

  8. Experimental Design

  9. Experimental Design: Input Parameter Modeling (IPM)
     • Applied to the configuration options
     • Equivalence partitioning based on domain knowledge
     • Identify representative values for each equivalence partition
     • Identify constraints
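The modeling steps on this slide can be illustrated with a minimal sketch. The parameter names, representative values, and constraints below are hypothetical choices for a decision-tree learner's configuration options; they are assumptions made for illustration and are not the model used in the study.

```python
from itertools import product

# Hypothetical input parameter model. Each list holds one representative
# value per equivalence partition (e.g. low / default / high).
PARAMETERS = {
    "pruning": ["off", "on"],
    "confidence_factor": [0.05, 0.25, 0.5],
    "min_instances_per_leaf": [1, 2, 10],
    "reduced_error_pruning": [False, True],
}


def satisfies_constraints(config):
    """Return True if the configuration is a valid combination."""
    # Assumed constraint: reduced-error pruning requires pruning to be on.
    if config["reduced_error_pruning"] and config["pruning"] == "off":
        return False
    # Assumed constraint: under reduced-error pruning only the default
    # confidence factor representative is allowed.
    if config["reduced_error_pruning"] and config["confidence_factor"] != 0.25:
        return False
    return True


def valid_configurations(parameters):
    """Enumerate every combination of values that passes the constraints."""
    names = list(parameters)
    for values in product(*(parameters[name] for name in names)):
        config = dict(zip(names, values))
        if satisfies_constraints(config):
            yield config


all_valid = list(valid_configurations(PARAMETERS))
print(len(all_valid), "valid configurations out of", 2 * 3 * 3 * 2)
```

Exhaustive enumeration is only practical for a model this small; its purpose here is to show how constraints shrink the space from which combinatorial tests are later drawn.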

  10. Experimental Design: Test Generation
     • 1-way to 6-way positive tests, generated using ACTS in extend mode
     • Negative 1-way tests
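To make the idea of a t-way test set concrete, here is a minimal greedy sketch. It is not the algorithm ACTS uses, it ignores constraints, and it enumerates every full combination as a candidate, so it only suits very small models. The parameter names and values are hypothetical.

```python
from itertools import combinations, product


def t_way_targets(parameters, t):
    """All t-way value combinations that a t-way test set must cover."""
    names = sorted(parameters)
    targets = set()
    for group in combinations(names, t):
        for values in product(*(parameters[name] for name in group)):
            targets.add(tuple(zip(group, values)))
    return targets


def covered_by(test, t):
    """The t-way value combinations exercised by one test (a dict)."""
    names = sorted(test)
    return {
        tuple((name, test[name]) for name in group)
        for group in combinations(names, t)
    }


def greedy_t_way(parameters, t):
    """Repeatedly pick the candidate test that covers the most uncovered targets."""
    names = sorted(parameters)
    candidates = [
        dict(zip(names, values))
        for values in product(*(parameters[name] for name in names))
    ]
    uncovered = t_way_targets(parameters, t)
    tests = []
    while uncovered:
        best = max(candidates, key=lambda c: len(covered_by(c, t) & uncovered))
        tests.append(best)
        uncovered -= covered_by(best, t)
    return tests


# Hypothetical configuration options for a clustering algorithm.
params = {
    "distance": ["euclidean", "manhattan"],
    "init": ["random", "kmeans++", "farthest"],
    "clusters": [2, 5, 10],
    "preserve_order": [False, True],
}
pairwise = greedy_t_way(params, 2)
print(len(pairwise), "tests cover all 2-way combinations; exhaustive testing needs 36")
```

Raising `t` to the number of parameters degenerates into exhaustive testing, which is why lower strengths are attractive when they already reach most of the achievable coverage.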

  11. Experimental Design: Metrics
     • Branch coverage, measured with JaCoCo, a free code coverage library for Java
     • Mutation coverage, measured with PIT, a mutation testing tool developed by Henry Coles

  13. Impact of Datasets

  14. Impact of Datasets

  15. Impact of Datasets
     • Finding: Larger datasets do not necessarily achieve higher branch coverage. In some cases, smaller datasets achieve higher branch coverage than larger ones.
     • Implication: Dataset size is not a dominating factor in how effective a dataset is for testing. Other characteristics must be considered, e.g., the dataset's structure and the relationships between data instances. It is possible to create small datasets that are effective for testing data mining algorithms.

  16. Branch Coverage of T-way Testing

  17. Branch Coverage of T-way Testing

  18. Branch Coverage of T-way Testing
     • Finding: Branch coverage increases progressively more slowly as test strength increases, and it stops increasing at a relatively low test strength.
     • Implication: Under CT, data mining algorithms behave similarly to general software applications, so CT has the potential to be effective for testing data mining algorithms.

  19. Mutation Coverage of T-way Testing

  20. Mutation Coverage of T-way Testing

  21. Branch Coverage of T-way Testing

  22. Mutation Coverage of T-way Testing

  23. Mutation Coverage of T-way Testing
     • Finding: Higher branch coverage seems to imply higher mutation coverage, and vice versa.
     • Implication: Branch coverage could be used as a good indicator of fault detection effectiveness for data mining algorithms, which is useful because mutation coverage is expensive to measure.
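One way to quantify a relationship like this is a Pearson correlation coefficient over paired coverage measurements. The slides do not say which statistic, if any, the authors computed, so the sketch below is an assumption. The numbers in it are made-up placeholders that only show the call; they are not results from the study.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# PLACEHOLDER values, one pair per test strength; not measured data.
branch_coverage = [0.40, 0.55, 0.60, 0.62, 0.62]
mutation_coverage = [0.20, 0.31, 0.35, 0.36, 0.36]
r = pearson(branch_coverage, mutation_coverage)
print(f"Pearson r = {r:.3f}")
```

A coefficient close to 1 would support using the cheaper metric as a stand-in, although correlation over a handful of test strengths is weak evidence on its own.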

  25. Conclusion
     • Larger datasets do not necessarily achieve higher test coverage than smaller datasets.
     • The test coverage of a CT test set increases progressively more slowly as test strength increases.
     • Branch coverage correlates well with mutation coverage.

  26. Future Work
     • Detailed code analysis: why are some branches not covered by our test cases?
     • Apply CT to create or reduce datasets for data mining algorithms.
     • Further investigation of, and experiments on, negative testing of data mining algorithms.