AI Technical Test Specification Insights

Discover best practices in AI testing, important test types, principles, levels, and more for creating effective AI technical test specifications. Dive into the background of software testing standards and explore strategies for achieving safety and quality in AI development.

  • AI
  • Testing
  • Specification
  • Best Practices
  • Software


Presentation Transcript


  1. FGAI4H-P-053-A01, Helsinki, 20-22 September 2022. Source: Editor DEL7.2. Title: Att.1 Presentation DEL7.2 update: AI technical test specification. Purpose: Discussion. Contact: Auss Abbood, Robert Koch Institute, Berlin, Germany. E-mail: abbooda@rki.de. Abstract: This presentation contains the current structure of the deliverable "AI technical test specification".

  2. Deliverable: AI Technical Test Specification. Auss Abbood, Robert Koch Institute, Berlin, Germany. Helsinki, 22 September 2022.

  3. Motivation What are best practices in AI testing that TGs can adapt? Which tests are specifically important for an assessment platform? AI Technical Test Specification - FG-AI4P 3

  4. Background
  • Contains the state of the art in testing as described by textbooks, the International Software Testing Qualifications Board (ISTQB), the National Institute of Standards and Technology (NIST), and ISO/IEC/IEEE standards.
  • A summary of commonly used terms and principles in software testing.
  • Already slightly filtered for our purpose.

  5. Testing principles
  • Testing shows the presence of errors, not their absence.
  • Exhaustive testing is usually not possible; what are other ways to achieve safety?
  • Test early, even during model development.
  • Errors cluster together; in AI, discuss errors in teams of technical and subject-matter experts.
  • Tests and test data need to be updated regularly (the pesticide paradox).
  • Testing depends on the purpose and environment of the software.
  • Error-free does not equate to user satisfaction.
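The "test early" principle can apply even before a model exists, in the form of data sanity checks. Below is a minimal sketch; the feature layout, the binary label set, and the `validate_samples` helper are illustrative assumptions, not part of the deliverable:

```python
# "Test early": simple data sanity checks that can run before any
# training happens. Samples are (features, label) pairs; the expected
# label set {0, 1} is an assumption for this example.
import math

def validate_samples(samples):
    """Return a list of error messages; an empty list means the batch passes."""
    errors = []
    for i, (features, label) in enumerate(samples):
        if any(math.isnan(x) for x in features):
            errors.append(f"sample {i}: NaN feature")
        if label not in (0, 1):
            errors.append(f"sample {i}: label {label!r} outside {{0, 1}}")
    return errors

clean = [([0.1, 0.2], 0), ([0.3, 0.4], 1)]
dirty = [([float("nan"), 0.2], 0), ([0.3, 0.4], 2)]
print(validate_samples(clean))       # []
print(len(validate_samples(dirty)))  # 2
```

Such checks are cheap, run in continuous integration, and catch errors long before they cluster inside a trained model.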

  6. Test levels
  • Unit/component testing
  • Integration testing
  • System testing
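The three levels can be sketched on a toy two-step pipeline. Both functions below are illustrative stand-ins, not part of any real system described in the deliverable:

```python
# Test levels on a toy pipeline: unit tests exercise each component
# in isolation, an integration test exercises them together, and a
# system test (not shown) would cover the full application end to end.
def normalize(scores):
    """Component: scale scores so they sum to 1."""
    total = sum(scores)
    return [s / total for s in scores]

def pick_class(probs):
    """Component: index of the highest probability."""
    return max(range(len(probs)), key=probs.__getitem__)

# Unit/component tests: each function alone.
assert sum(normalize([1.0, 3.0])) == 1.0
assert pick_class([0.2, 0.7, 0.1]) == 1

# Integration test: the two components working together.
assert pick_class(normalize([1.0, 3.0])) == 1

print("unit and integration checks passed")
```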

  7. Test types
  • Functional testing: Tests what the system should do by specifying a precondition, running code, and comparing the result of the execution with a postcondition. It is applied at each level of testing, although in acceptance testing most implemented functions should already work. A measure of the thoroughness of functional testing is coverage.
  • Non-functional testing: Tests how well a system performs. This includes testing the usability, performance efficiency, or security of a system and other characteristics found in ISO/IEC 25010. It can be performed at all test levels. Coverage for non-functional testing means how many of these characteristics were tested for.
  • White-box testing: Tests the internal structure of a system or its implementation. It is mostly applied in component and system testing. Coverage here measures the proportion of code components that have been tested.
  • Black-box testing: As opposed to white-box testing, the software is treated as a black box with no knowledge of how it achieves its intended functionality; only its output is compared with the expected output or behaviour. The advantage of black-box testing is that no programming knowledge is required, so it is well equipped to detect biases that arise if only programmers write and test software. It can be applied at all levels of testing.
  • Maintenance testing: Tests changes to already delivered software for functional and non-functional quality characteristics.
  • Static testing: A form of testing that does not execute code but examines the system, e.g., through reviews, linters, or formal proofs of the program.
  • Change-related testing: Tests whether changes corrected errors (confirmation testing) or caused new errors (regression testing). It can be applied at all levels of testing.
  • Destructive testing: Aims to make the software fail by providing unintended inputs, which tests the robustness of the software. It can be applied at all levels of testing.
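As a concrete illustration of destructive testing, the sketch below feeds unintended inputs to a small helper and checks that it fails loudly rather than returning nonsense. `safe_mean` and its error policy are assumptions made for this example:

```python
# Destructive-testing sketch: deliberately pass unintended inputs and
# verify the function raises a clear error instead of misbehaving.
def safe_mean(values):
    if values is None or len(values) == 0:
        raise ValueError("empty or missing input")
    if any(not isinstance(v, (int, float)) for v in values):
        raise TypeError("non-numeric input")
    return sum(values) / len(values)

for bad in (None, [], ["a", "b"], [1.0, "x"]):
    try:
        safe_mean(bad)
        print("no error raised for", bad)  # would indicate a robustness gap
    except (ValueError, TypeError) as exc:
        print(type(exc).__name__, "raised as expected")
```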

  8. AI testing
  • No big difference: cryptographic or scientific software is usually also hard to test.
  • Base recipe: metrics; data/benchmark; discriminatory testing (subject-matter experts); code (mostly dealt with through libraries).
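The "metrics" ingredient of the base recipe can be illustrated with a minimal confusion-matrix computation; the labels and predictions below are made up for the example:

```python
# Minimal metrics sketch: sensitivity and specificity from a
# confusion matrix, the kind of numbers subject-matter experts
# would review alongside the benchmark data.
def confusion_counts(y_true, y_pred):
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp, tn, fp, fn

y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]
tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print("sensitivity:", tp / (tp + fn))  # 2/3
print("specificity:", tn / (tn + fp))  # 2/3
```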

  9. AI testing
  Metamorphic testing seems promising:
  • Test coverage does not equate to neuron activity; use another model to maximize neuron activity.
  • Create two sets of inputs (raw and modified) with an expected change (a pseudo-oracle).
  • Verify and validate.
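A minimal sketch of the metamorphic idea, assuming a toy normalizing model as a stand-in for a real network: scaling every input feature by the same constant is the modification, and an unchanged output is the expected relation (the pseudo-oracle):

```python
# Metamorphic-testing sketch: with no exact oracle for the model's
# output, we instead check a relation between a raw and a modified
# input. `toy_model` is an illustrative stand-in, not a real model.
import math

def toy_model(features):
    total = sum(features)
    return [f / total for f in features]  # scale-invariant by design

raw = [2.0, 6.0, 2.0]
modified = [f * 10.0 for f in raw]  # metamorphic transformation

out_raw, out_mod = toy_model(raw), toy_model(modified)
relation_holds = all(math.isclose(a, b) for a, b in zip(out_raw, out_mod))
print("metamorphic relation holds:", relation_holds)
```

The same pattern applies to real models, e.g., small brightness changes to a medical image should not flip the predicted class.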

  10. ML ops as an addition
  • Testing should appreciate the connection between data, software, hardware, and AI over time.
  • MLflow, Argo, Docker, Sacred, DVC, etc. can help (or AWS and Google, with enough funding), e.g., with device-specific properties of produced data.
  • BUT not all forms of input can be tested (General Principles of Software Validation; Final Guidance for Industry and FDA Staff). When is our testing done?
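One way such tool support looks in practice is a declarative pipeline definition that ties data, code, and model artefacts together over time. Below is a minimal, hypothetical `dvc.yaml` for DVC, one of the tools listed above; the stage names, scripts, and paths are invented for illustration:

```yaml
# Hypothetical DVC pipeline: each stage declares its command, its
# dependencies (code and data), and its outputs, so DVC can re-run
# only the stages whose inputs changed and version every artefact.
stages:
  prepare:
    cmd: python prepare.py data/raw data/prepared
    deps:
      - prepare.py
      - data/raw
    outs:
      - data/prepared
  train:
    cmd: python train.py data/prepared model.pkl
    deps:
      - train.py
      - data/prepared
    outs:
      - model.pkl
```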

  11. ML ops as an addition

  12. Leaderboard probing
  • Many questions overlap with other deliverables (e.g., 5 or 7.3).
  • Inadequate input does not necessarily break the AI; how do we test this?
  • Testing should include tests for biases, data leakage, etc.
  Leaderboard probing techniques:
  • Data aggregation or missing data
  • Vulnerable metrics
  • Non-weighted performance (hard vs. easy cases)
  • Weighting by the best variables
  • Adversarial validation
  • Random search on the hyperplane
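Adversarial validation, listed above, can be sketched without any ML library: label training rows 0 and test rows 1, then see whether even a trivial classifier can tell them apart. The synthetic one-feature data and midpoint-threshold rule below are illustrative assumptions:

```python
# Adversarial-validation sketch: if train and test rows are easy to
# distinguish, the two sets come from different distributions and a
# leaderboard score on the test set may not reflect real performance.
import random

random.seed(0)
train = [random.gauss(0.0, 1.0) for _ in range(500)]  # feature ~ N(0, 1)
test = [random.gauss(1.5, 1.0) for _ in range(500)]   # shifted: N(1.5, 1)

# Trivial "classifier": threshold at the midpoint of the two means.
threshold = (sum(train) / len(train) + sum(test) / len(test)) / 2
correct = sum(x <= threshold for x in train) + sum(x > threshold for x in test)
accuracy = correct / (len(train) + len(test))
print(f"adversarial accuracy: {accuracy:.2f}")  # near 0.5 would mean no shift
```

In practice one would use a stronger classifier over all features and read its AUC the same way.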

  13. Outlook
  • Receive more feedback (where to put more attention)
  • Polish the document
  • Add more insights on ML ops
  • Document remedies for leaderboard probing
