Improving the effectiveness and efficiency of dynamic malware analysis using machine learning

Date
2018
Publisher
University of Delaware
Abstract
The malware threat landscape is constantly evolving, with upwards of one million new variants released every day. Traditional approaches to detecting and classifying malware usually rely on brittle handcrafted heuristics that quickly become outdated and can be exploited by nefarious actors. As a result, software security must be managed differently: advanced analytics (i.e., machine learning) and significantly more automation are needed to develop adaptable malware analysis engines that correctly identify, categorize, and characterize malware.

In this dissertation, we introduce a next-generation sandbox that leverages machine learning to create an adaptive malware analysis platform. This intelligent environment considerably extends the capabilities of Cuckoo, an open-source malware analysis sandbox, and significantly optimizes the resources dedicated to the dynamic analysis of malware.

Dynamic analysis allows security analysts to collect information about the behavior of malicious samples in an isolated environment. However, running malware in a sandbox is time-consuming and computationally expensive. Static analysis, by contrast, extracts information from malware without executing it and is orders of magnitude faster than dynamic analysis. Nevertheless, for some malware it may still be necessary to use dynamic features to produce better classifications and characterizations.

With our system, we successfully identified the simplest characterizations required to accurately classify malware. This is important because it allows us to determine the subset of samples that is truly different and therefore requires expensive dynamic characterization. When dynamic analysis is imperative, our system also estimates the minimum amount of time required to accurately detect and classify malware. As a result, our intelligent analysis platform can reallocate the time saved to analyzing files that require longer execution times and produce actionable intelligence for our system. Finally, by leveraging the speed of static analysis, our system induces highly accurate machine learning models for malware capability detection, removing the need to perform dynamic analysis to identify high-level functionalities of malicious code.
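
The static-first idea in the abstract, where cheap static features settle most samples and only the remainder goes to the sandbox, can be illustrated with a minimal sketch. This is not the dissertation's implementation: the `StaticVerdict` type, the `triage` function, the 0.9 confidence threshold, and the toy model are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class StaticVerdict:
    """Hypothetical output of a classifier trained on static features."""
    label: str         # predicted malware class
    confidence: float  # model confidence in [0, 1]


def triage(
    sample_ids: List[str],
    static_model: Callable[[str], StaticVerdict],
    threshold: float = 0.9,  # assumed cutoff, not taken from the source
) -> Tuple[Dict[str, str], List[str]]:
    """Split samples into those resolved statically and those needing a sandbox run."""
    resolved: Dict[str, str] = {}
    needs_dynamic: List[str] = []
    for sample_id in sample_ids:
        verdict = static_model(sample_id)
        if verdict.confidence >= threshold:
            # Static features suffice; skip the expensive sandbox run.
            resolved[sample_id] = verdict.label
        else:
            # Low confidence: queue the sample for dynamic analysis.
            needs_dynamic.append(sample_id)
    return resolved, needs_dynamic


# Toy stand-in for a trained static model (made-up values for illustration).
_TOY_VERDICTS = {
    "a.exe": StaticVerdict("trojan", 0.97),
    "b.exe": StaticVerdict("worm", 0.55),
    "c.exe": StaticVerdict("benign", 0.92),
}


def toy_static_model(sample_id: str) -> StaticVerdict:
    return _TOY_VERDICTS[sample_id]


resolved, needs_dynamic = triage(list(_TOY_VERDICTS), toy_static_model)
```

In this sketch only `b.exe` falls below the assumed threshold, so it alone would be sent to the sandbox; the sandbox time saved on the other samples could then be spent on files that need longer execution.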
Keywords
Applied sciences, Dynamic analysis, Important capabilities, Machine learning, Malware classification, Malware detection, Static analysis