Multi-Dimensional Analysis for Improved Ransomware Detection in Storage Devices

Explore a comprehensive study proposing a ransomware detection pipeline based on IO block operations for enhanced generalizability. The research covers evasion techniques, previous studies, and a detailed analysis of ransomware detection architecture across various setups.

  • Ransomware Detection
  • Storage Devices
  • Machine Learning
  • Cybersecurity
  • Security Analysis

Presentation Transcript


  1. Machine Learning-based Ransomware Detection in Storage Devices: A Multi-Dimensional Analysis for Improved Generalizability
     • Nicolas Reategui, EPFL
     • ACSAC 2024 Case Study
     • Work done at IBM Research Zurich

  2. Introduction
     • Multiple evasion techniques for OS-level defenses: obfuscation of static file analysis, evasion of behavioral fingerprinting
     • Need for multiple independent detection methods
     • Semantic gap at the storage level: only trim, read and write operations are visible, with no information about files, process or application context
     • We propose a ransomware detection pipeline based on IO block operations
     • Diagram labels: benign or malicious process, filesystem, IO, SSD

  3. Our case study
     • Previous studies on storage-level ransomware detection: [1] Hirano et al. 2019, [2] Wang et al. 2024
     • Covers a wide range of practical setups
     • Ensures generalizability while maintaining good performance
     • Provides insights on relevant features

  4. Ransomware detection architecture
     • Setups:
       a) Linux-based ransomware detection in user space
       b) Hypervisor-based detection of ransomware on the guest OS
       c) Computational storage device with integrated detection capabilities
     • Our study is based on setups (a) and (b)
     • Use case application on setup (c): IBM FlashSystem
     • RHEL-based server with an in-house device mapper kernel module: dm-entropy
     • dm-entropy: inline entropy calculation on 4K blocks
     • System-Tap: tracing and extracting information
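The slide only states that dm-entropy computes entropy inline on 4K blocks; the kernel module itself is not shown. As a rough user-space illustration, the sketch below computes the Shannon entropy of each 4K block in bits per byte. The function names `block_entropy` and `entropy_per_block` are hypothetical and are not taken from dm-entropy.

```python
import math
from collections import Counter
from typing import List

BLOCK_SIZE = 4096  # 4K blocks, as stated on the slide


def block_entropy(block: bytes) -> float:
    """Shannon entropy of a byte block, in bits per byte (0.0 to 8.0)."""
    if not block:
        return 0.0
    total = len(block)
    # Sum p * log2(1/p) over the byte values that occur in the block.
    return sum((c / total) * math.log2(total / c) for c in Counter(block).values())


def entropy_per_block(data: bytes, block_size: int = BLOCK_SIZE) -> List[float]:
    """Split a buffer into fixed-size blocks and compute the entropy of each."""
    return [
        block_entropy(data[offset:offset + block_size])
        for offset in range(0, len(data), block_size)
    ]


if __name__ == "__main__":
    zeros = bytes(BLOCK_SIZE)                           # a single repeated byte value
    uniform = bytes(range(256)) * (BLOCK_SIZE // 256)   # every byte value equally often
    print(entropy_per_block(zeros + uniform))
```

Encrypted or well-compressed data tends toward the upper end of the 0 to 8 range, which is why per-block entropy is a natural primary feature for this kind of detector.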

  5. Ransomware detection architecture
     • Ransomware detection runs on the host
     • Setup (a): vulnerable to privilege escalation
     • Setup (b): additional isolation layer provided by QEMU/KVM

  6. Ransomware detection architecture
     • Setup (c): strong isolation
     • Feature extraction on hardware
     • Inference engine runs within the IBM Storage virtualized stack
     • Ransomware detection can run on compromised systems

  7. Benign and ransomware workloads
     • Benign workloads:
       - Automated file conversions using Govdocs1 [3] (.txt, .pdf, .csv, .docx, .jpg, .png, ...)
       - File compression using LZMA, ZIP and BZ2
       - Online transactional processing (database systems)
     • Ransomware:
       - Ransomware emulator [4] (training)
       - Real ransomware samples (testing)
     • References on slide: [3] Garfinkel et al., [4] Diamantopoulos et al.
     • Diagram labels: service, writes, queries, database

  8. Feature engineering
     • Primary features: minimal information extracted by dm-entropy for each IO
       - 4K entropy
       - Logical Block Address (LBA)
       - IO type (read, write, trim)
       - Transfer size
     • Feature extraction: computation of 79 lightweight features from the primary features over a short window, including:
       - Average
       - Mean absolute deviation
       - Histograms
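The slide names three kinds of window statistics (average, mean absolute deviation, histograms) but does not list the 79 features. The sketch below shows how a handful of such statistics could be derived from one window of IO records. The `IORecord` type, the feature names and the 8-bin entropy histogram are assumptions made for illustration, not the authors' feature set.

```python
from dataclasses import dataclass
from statistics import fmean
from typing import Dict, List, Sequence


@dataclass
class IORecord:
    """One IO as described by the four primary features (hypothetical type)."""
    entropy: float      # bits per byte, 0.0 to 8.0
    lba: int            # logical block address
    io_type: str        # "read", "write" or "trim"
    transfer_size: int  # bytes


def mean_abs_deviation(values: Sequence[float]) -> float:
    """Average absolute distance of the values from their mean."""
    if not values:
        return 0.0
    mean = fmean(values)
    return fmean([abs(v - mean) for v in values])


def entropy_histogram(values: Sequence[float], bins: int = 8) -> List[float]:
    """Normalized histogram of entropy values over [0, 8] (bin count is an assumption)."""
    counts = [0] * bins
    for v in values:
        index = min(max(int(v / 8.0 * bins), 0), bins - 1)
        counts[index] += 1
    total = len(values)
    return [c / total for c in counts] if total else [0.0] * bins


def window_features(window: Sequence[IORecord]) -> Dict[str, float]:
    """Compute a few illustrative features over one window of IOs."""
    entropies = [r.entropy for r in window]
    lbas = [float(r.lba) for r in window]
    features = {
        "entropy_avg": fmean(entropies) if entropies else 0.0,
        "entropy_mad": mean_abs_deviation(entropies),
        "lba_mad": mean_abs_deviation(lbas),
        "transfer_size_avg": fmean([r.transfer_size for r in window]) if window else 0.0,
        "write_fraction": (
            sum(r.io_type == "write" for r in window) / len(window) if window else 0.0
        ),
    }
    for i, share in enumerate(entropy_histogram(entropies)):
        features[f"entropy_hist_{i}"] = share
    return features
```

A histogram keeps the shape of the distribution inside the window, which a single average or deviation value cannot express.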

  9. ML training
     • Generated balanced train and test sets
     • Assigned benign workloads to either set based on concurrency
     • Ransomware samples were divided based on specific parameters from the emulator
     • Trained XGBoost models

  10. Volume state generalizability
     • Motivation:
       - Different device states impact IO features
       - Aged devices have file fragmentation, resulting in scattered LBAs
     • Device aging:
       - Simple method: sequentially copied and removed files, which resulted in a large spread across available LBAs
       - More sophisticated: Impressions [Agrawal et al. 2009], Geriatrix [Kadekodi et al. 2018]
     • XGBoost models trained on different volume states: 52% utilization, 77% utilization, aged devices

  11. Volume state generalizability
     • Figure: F1 scores on XFS traces
     • Significant drop in performance on aged devices
     • Driven by LBA-related features
     • Need to explore multiple system configurations
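The results are reported as F1 scores, false positive rate (FPR) and false negative rate (FNR). For reference, these follow the standard definitions from confusion-matrix counts; the helper names and the example counts below are illustrative and are not results from the study.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def false_positive_rate(fp: int, tn: int) -> float:
    """FPR = FP / (FP + TN): share of benign windows flagged as ransomware."""
    return fp / (fp + tn) if (fp + tn) else 0.0


def false_negative_rate(fn: int, tp: int) -> float:
    """FNR = FN / (FN + TP): share of ransomware windows that were missed."""
    return fn / (fn + tp) if (fn + tp) else 0.0


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    tp, fp, tn, fn = 90, 5, 95, 10
    print(f1_score(tp, fp, fn), false_positive_rate(fp, tn), false_negative_rate(fn, tp))
```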

  12. Filesystem generalizability
     • Figure: F1 scores
     • Strong drop in performance between Linux and Windows filesystems
     • EXT4-trained models lead to a predominance of entropy features

  13. Filesystem generalizability
     • Figure: F1 scores
     • Entropy histograms are the most important features
     • Deployment: extract filesystem information by scanning the initial blocks of the partitions
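The deployment note suggests identifying the filesystem from the first blocks of a partition. One common way to do this is to check well-known on-disk signatures: XFS superblocks begin with the bytes "XFSB", NTFS boot sectors carry the OEM ID "NTFS    " at byte offset 3, and ext2/3/4 superblocks store the little-endian magic 0xEF53 at byte offset 1080. The sketch below assumes it is handed the first 2 KiB of a partition; the function name and return labels are hypothetical, and the slides do not say which method the authors use.

```python
from typing import Optional


def detect_filesystem(partition_start: bytes) -> Optional[str]:
    """Guess the filesystem from the first bytes of a partition.

    Expects at least the first 2048 bytes of the partition (not of the whole
    disk, which would start with a partition table instead).
    """
    if partition_start[0:4] == b"XFSB":
        return "xfs"
    if partition_start[3:11] == b"NTFS    ":
        return "ntfs"
    # ext superblock starts at offset 1024; its magic field is at offset 56.
    if partition_start[1080:1082] == b"\x53\xef":
        return "ext2/3/4"
    return None


def read_partition_start(path: str, length: int = 2048) -> bytes:
    """Read the first bytes of a partition device or image file."""
    with open(path, "rb") as device:
        return device.read(length)
```

The ext family shares one magic number, so telling ext4 apart from ext2 or ext3 would need the superblock feature flags as well.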

  14. Benign workloads generalizability
     • Figure: FPR on XFS traces
     • Trained on file conversions and ZIP/LZMA compression
     • Evaluated on OLTP traces using MySQL or PostgreSQL (with and without database compression)
     • Low PGSQL performance is driven by a read throughput distribution similar to that of ransomware

  15. Greedy feature removal
     • Figure label: PGSQL
     • Successively and greedily remove features
     • Use feature importance to remove the last-ranked (least important) feature
     • Up to 13 features can be safely removed without impacting baseline performance
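The greedy procedure can be sketched as a loop that repeatedly drops the least important remaining feature and stops once the score falls below the baseline. The slides do not give the exact stopping rule or retraining details, so the `train_and_score` callback, the `tolerance` parameter and the toy scorer below are assumptions. In practice the callback would retrain the model (XGBoost in the study) on the given feature subset.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical callback: trains on a feature subset and returns
# (validation score, importance of each feature in the subset).
TrainAndScore = Callable[[Sequence[str]], Tuple[float, Dict[str, float]]]


def greedy_feature_removal(
    features: Sequence[str],
    train_and_score: TrainAndScore,
    tolerance: float = 0.0,
) -> Tuple[List[str], List[str]]:
    """Drop the least important feature while the score stays at the baseline."""
    current = list(features)
    baseline, importances = train_and_score(current)
    removed: List[str] = []
    while len(current) > 1:
        weakest = min(current, key=lambda name: importances[name])
        candidate = [name for name in current if name != weakest]
        score, candidate_importances = train_and_score(candidate)
        if score < baseline - tolerance:
            break  # removing this feature would hurt baseline performance
        current, importances = candidate, candidate_importances
        removed.append(weakest)
    return current, removed


def toy_train_and_score(subset: Sequence[str]) -> Tuple[float, Dict[str, float]]:
    """Stand-in for model training: the score is the summed weight of kept features."""
    weights = {"entropy_hist": 60, "write_fraction": 30, "lba_mad": 10,
               "noise_a": 0, "noise_b": 0}
    score = sum(weights[name] for name in subset) / 100
    return score, {name: float(weights[name]) for name in subset}
```

With the toy scorer, the two zero-weight features are removed and the loop stops when dropping `lba_mad` would lower the score.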

  16. Copy-on-write effects
     • QEMU/KVM VMs use formats such as qcow2 that store image data in a single file and perform copy-on-write when data is updated
     • Stack diagram labels: guest, QEMU (raw / qcow2), RHEL Linux with KVM, hardware
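To see why copy-on-write changes what the storage device observes, consider a deliberately simplified toy model. It is not the qcow2 on-disk format: it only assumes that the first write to a cluster still shared with a read-only base allocates a fresh cluster at the end of the image file, while a raw image writes in place. The class and method names are hypothetical.

```python
from typing import Dict, List, Set


class ToyCowImage:
    """Toy copy-on-write image mapping guest clusters to host clusters."""

    def __init__(self, base_clusters: int) -> None:
        # Initially every guest cluster points at the same-numbered base cluster.
        self.mapping: Dict[int, int] = {c: c for c in range(base_clusters)}
        self.shared: Set[int] = set(range(base_clusters))  # still owned by the base
        self.next_free = base_clusters                      # next cluster at end of file

    def write(self, guest_cluster: int) -> int:
        """Return the host cluster that actually receives the write."""
        if guest_cluster in self.shared or guest_cluster not in self.mapping:
            # First write since the base was frozen: allocate a new cluster.
            self.mapping[guest_cluster] = self.next_free
            self.next_free += 1
            self.shared.discard(guest_cluster)
        return self.mapping[guest_cluster]


def host_write_pattern(guest_writes: List[int], base_clusters: int) -> List[int]:
    """Host clusters touched when the guest writes the given clusters in order."""
    image = ToyCowImage(base_clusters)
    return [image.write(cluster) for cluster in guest_writes]


if __name__ == "__main__":
    guest_writes = [5, 2, 7, 2]
    print("raw image:", guest_writes)  # in-place writes hit the same locations
    print("toy COW  :", host_write_pattern(guest_writes, base_clusters=8))
```

In this toy model, scattered guest overwrites turn into sequential appends on the host, which is one way LBA-based features can shift between raw and copy-on-write images.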

  17. Copy-on-write effects
     • Figure: F1 scores from EXT4 traces
     • Compared to modifying the data directly on the device, using qcow2 leads to different:
       - LBA access patterns
       - IO entropy

  18. Real ransomware validation
     • Samples run inside an isolated Fedora or Windows VM
     • Explored using raw and qcow2 formats
     • FS1: results using our features
     • FS2: results using the features of Hirano et al. 2019 [1]
     • Figure panels: Windows, Linux

  19. Real ransomware validation
     • Runtime performance
     • Deriving our set of features and running ML inference can be done within 200 ms
     • Increasing the throughput leads to longer computing times due to the larger number of IOs

  20. Conclusion
     • Comprehensive training across a spectrum of disk utilizations enhances the model's resilience
     • Incorporating filesystem-specific training enhances performance
     • Significant filesystem-level changes, e.g. copy-on-write VMs or encryption, can seriously decrease performance
     • Specialized workloads need to be designed and included in training to allow for good generalizability
     • Using expressive features, e.g. histograms, allows a better approximation of complex decision boundaries:
       - 12.7% higher median F1 scores
       - 10.9% lower FNR
       - 17.1% lower FPR

  21. Acknowledgements
     This work was done at IBM Research Zurich together with:
     • Dr. Roman Pletka, Senior Scientist at IBM Research Zurich
     • Dr. Dionysios Diamantopoulos, Research Staff, IBM Research Zurich
     • Dr. Haris Pozidis, Manager Infrastructure AIOPS, IBM Research Zurich
     • A. L. Narasimha Reddy, Texas A&M University

  22. References
     [1] Manabu Hirano and Ryotaro Kobayashi. Machine learning based ransomware detection using storage access patterns obtained from live-forensic hypervisor. In 2019 Sixth International Conference on Internet of Things: Systems, Management and Security (IOTSMS), 2019.
     [2] Zhongyu Wang, Yaheng Song, Erci Xu, Haonan Wu, Guangxun Tong, Shizhuo Sun, Haoran Li, Jincheng Liu, Lijun Ding, Rong Liu, Jiaji Zhu, and Jiesheng Wu. Ransom access memories: Achieving practical ransomware protection in cloud with DeftPunk. In 18th USENIX Symposium on Operating Systems Design and Implementation (OSDI 24), pages 687-702, Santa Clara, CA, July 2024. USENIX Association.
     [3] Simson Garfinkel, Paul Farrell, Vassil Roussev, and George Dinolt. Bringing science to digital forensics with standardized forensic corpora. Digit. Investig., 6:S2-S11, Sep 2009.
     [4] Dionysios Diamantopoulos, Roman Pletka, Slavisa Sarafijanovic, A. L. Narasimha Reddy, and Haris Pozidis. Wannalaugh: A configurable ransomware emulator - Learning to mimic malicious storage traces. Systor 2024.
     • Weidong Zhu, Grant Hernandez, Washington Garcia, Dave (Jing) Tian, Sara Rampazzi, and Kevin Butler. Minding the semantic gap for effective storage-based ransomware defense. In Proceedings of the 38th International Conference on Massive Storage Systems and Technology (MSST), Santa Clara, CA, June 2024.
     • Andrea Continella, Alessandro Guagnelli, Giovanni Zingaro, Giulio De Pasquale, Alessandro Barenghi, Stefano Zanero, and Federico Maggi. ShieldFS: a self-healing, ransomware-aware filesystem. In Proceedings of the 32nd Annual Conference on Computer Security Applications, ACSAC '16, pages 336-347, New York, NY, USA, 2016. Association for Computing Machinery.
