2023

Guerra-Manzanares, Alejandro; Bahsi, Hayretdin; Luckner, Marcin. Leveraging the first line of defense: a study on the evolution and usage of android security permissions for enhanced android malware detection. Journal Article. Journal of Computer Virology and Hacking Techniques, 19 (1), pp. 65-96, Springer Paris, 2023, ISSN: 2263-8733.

@article{Guerra-Manzanares2023,
  title = {Leveraging the first line of defense: a study on the evolution and usage of android security permissions for enhanced android malware detection},
  author = {Alejandro Guerra-Manzanares and Hayretdin Bahsi and Marcin Luckner},
  url = {https://doi.org/10.1007/s11416-022-00432-3},
  doi = {10.1007/s11416-022-00432-3},
  issn = {2263-8733},
  year = {2023},
  date = {2023-01-01},
  journal = {Journal of Computer Virology and Hacking Techniques},
  volume = {19},
  number = {1},
  pages = {65--96},
  publisher = {Springer Paris},
  abstract = {Android security permissions are built-in security features that constrain what an app can do and access on the system, that is, its privileges. Permissions have been widely used for Android malware detection, mostly in combination with other relevant app attributes. The available set of permissions is dynamic, refined in every new Android OS version release. The refinement process adds new permissions and deprecates others. These changes directly impact the type and prevalence of permissions requested by malware and legitimate applications over time. Furthermore, malware trends and benign apps' inherent evolution influence their requested permissions. Therefore, the usage of these features in machine learning-based malware detection systems is prone to concept drift issues. Despite that, no previous study related to permissions has taken into account concept drift. In this study, we demonstrate that when concept drift is addressed, permissions can generate long-lasting and effective malware detection systems. Furthermore, the discriminatory capabilities of distinct sets of features are tested. We found that the initial set of permissions, defined in Android 1.0 (API level 1), is sufficient to build an effective detection model, providing an average 0.93 F1 score in data that spans seven years. In addition, we explored and characterized permissions evolution using local and global interpretation methods. In this regard, the varying importance of individual permissions for malware and benign software recognition tasks over time is analyzed.},
  keywords = {Android, Concept drift, Machine learning, Malware detection, Mobile security, Permission},
  pubstate = {published},
  tppubtype = {article}
}
2022

Guerra-Manzanares, Alejandro; Luckner, Marcin; Bahsi, Hayretdin. Concept drift and cross-device behavior: Challenges and implications for effective android malware detection. Journal Article. Computers & Security, 120, pp. 102757, 2022, ISSN: 0167-4048.

@article{Guerra2022,
  title = {Concept drift and cross-device behavior: Challenges and implications for effective android malware detection},
  author = {Alejandro Guerra-Manzanares and Marcin Luckner and Hayretdin Bahsi},
  url = {https://www.sciencedirect.com/science/article/pii/S0167404822001523},
  doi = {10.1016/j.cose.2022.102757},
  issn = {0167-4048},
  year = {2022},
  date = {2022-01-01},
  journal = {Computers \& Security},
  volume = {120},
  pages = {102757},
  abstract = {The large body of Android malware research has demonstrated that machine learning methods can provide high performance for detecting Android malware. However, the vast majority of studies underestimate the evolving nature of the threat landscape, which requires the creation of a model life-cycle to ensure effective continuous detection in real-world settings over time. In this study, we modeled the concept drift issue of Android malware detection, encompassing the years between 2011 and 2018, using dynamic feature sets (i.e., system calls) derived from Android apps. The relevant studies in the literature have not focused on the timestamp selection approach and its critical impact on effective drift modeling. We evaluated and compared distinct timestamp alternatives. Our experimental results show that a widely used timestamp in the literature yields poor results over time and that enhanced concept drift handling is achieved when an app internal timestamp was used. Additionally, this study sheds light on the usage of distinct data sources and their impact on concept drift modeling. We identified that dynamic features obtained for individual apps from different data sources (i.e., emulator and real device) show significant differences that can distort the modeling results. Therefore, the data sources should be considered and their fusion preferably avoided while creating the training and testing data sets. Our analysis is supported using a global interpretation method to comprehend and characterize the evolution of Android apps throughout the years from a data source-related perspective.},
  keywords = {Android, Android emulator, Concept drift, Malware detection, Mobile security, Real device, Smartphone},
  pubstate = {published},
  tppubtype = {article}
}
Guerra-Manzanares, Alejandro; Luckner, Marcin; Bahsi, Hayretdin. Android malware concept drift using system calls: Detection, characterization and challenges. Journal Article. Expert Systems with Applications, 206, pp. 117200, 2022, ISSN: 0957-4174.

@article{Guerra2022a,
  title = {Android malware concept drift using system calls: Detection, characterization and challenges},
  author = {Alejandro Guerra-Manzanares and Marcin Luckner and Hayretdin Bahsi},
  url = {https://www.sciencedirect.com/science/article/pii/S0957417422005863},
  doi = {10.1016/j.eswa.2022.117200},
  issn = {0957-4174},
  year = {2022},
  date = {2022-01-01},
  journal = {Expert Systems with Applications},
  volume = {206},
  pages = {117200},
  abstract = {The majority of Android malware detection solutions have focused on the achievement of high performance in old and short snapshots of historical data, which makes them prone to lack the generalization and adaptation capabilities needed to discriminate effectively new malware trends in an extended time span. These approaches analyze the phenomenon from a stationary point of view, neglecting malware evolution and its degenerative impact on detection models as new data emerge, the so-called concept drift. This research proposes a novel method to detect and effectively address concept drift in Android malware detection and demonstrates the results in a seven-year-long data set. The proposed solution manages to keep high-performance metrics over a long period of time and minimizes model retraining efforts by using data sets belonging to short periods. Different timestamps are evaluated in the experimental setup and their impact on the detection performance is compared. Additionally, the characterization of concept drift in Android malware is performed by leveraging the inner workings of the proposed solution. In this regard, the discriminatory properties of the important features are analyzed at various time horizons.},
  keywords = {Android malware, Concept drift, Malware behavior, Malware characterization, Malware detection, Malware evolution, Mobile malware, System calls},
  pubstate = {published},
  tppubtype = {article}
}