2023
Guerra-Manzanares, Alejandro; Bahsi, Hayretdin; Luckner, Marcin Leveraging the first line of defense: a study on the evolution and usage of android security permissions for enhanced android malware detection Journal Article Journal of Computer Virology and Hacking Techniques, 19 (1), pp. 65–96, 2023, ISSN: 22638733. Tags: Android, Concept drift, Machine learning, Malware detection, Mobile security, Permission
@article{Guerra-Manzanares2023,
  title = {Leveraging the first line of defense: a study on the evolution and usage of android security permissions for enhanced android malware detection},
  author = {Alejandro Guerra-Manzanares and Hayretdin Bahsi and Marcin Luckner},
  url = {https://doi.org/10.1007/s11416-022-00432-3},
  doi = {10.1007/s11416-022-00432-3},
  issn = {22638733},
  year = {2023},
  date = {2023-01-01},
  journal = {Journal of Computer Virology and Hacking Techniques},
  volume = {19},
  number = {1},
  pages = {65--96},
  publisher = {Springer Paris},
  abstract = {Android security permissions are built-in security features that constrain what an app can do and access on the system, that is, its privileges. Permissions have been widely used for Android malware detection, mostly in combination with other relevant app attributes. The available set of permissions is dynamic, refined in every new Android OS version release. The refinement process adds new permissions and deprecates others. These changes directly impact the type and prevalence of permissions requested by malware and legitimate applications over time. Furthermore, malware trends and benign apps' inherent evolution influence their requested permissions. Therefore, the usage of these features in machine learning-based malware detection systems is prone to concept drift issues. Despite that, no previous study related to permissions has taken concept drift into account. In this study, we demonstrate that when concept drift is addressed, permissions can generate long-lasting and effective malware detection systems. Furthermore, the discriminatory capabilities of distinct sets of features are tested. We found that the initial set of permissions, defined in Android 1.0 (API level 1), is sufficient to build an effective detection model, providing an average 0.93 F1 score in data that spans seven years. In addition, we explored and characterized the evolution of permissions using local and global interpretation methods. In this regard, the varying importance of individual permissions for malware and benign software recognition tasks over time is analyzed.},
  keywords = {Android, Concept drift, Machine learning, Malware detection, Mobile security, Permission},
  pubstate = {published},
  tppubtype = {article}
}
Grzenda, Maciej; Kaźmierczak, Stanisław; Luckner, Marcin; Borowik, Grzegorz; Mańdziuk, Jacek Evaluation of machine learning methods for impostor detection in web applications Journal Article Expert Systems with Applications, 231 (August 2022), pp. 120736, 2023, ISSN: 09574174. Tags: Biometrics, Impostor detection, Keystroke dynamics, Machine learning, Multi-factor authentication, Supervised learning
@article{Grzenda2023a,
  title = {Evaluation of machine learning methods for impostor detection in web applications},
  author = {Maciej Grzenda and Stanisław Kaźmierczak and Marcin Luckner and Grzegorz Borowik and Jacek Mańdziuk},
  url = {https://doi.org/10.1016/j.eswa.2023.120736},
  doi = {10.1016/j.eswa.2023.120736},
  issn = {09574174},
  year = {2023},
  date = {2023-01-01},
  journal = {Expert Systems with Applications},
  volume = {231},
  number = {August 2022},
  pages = {120736},
  publisher = {Elsevier Ltd},
  abstract = {Applying machine learning (ML) methods to multi-factor authentication is becoming increasingly popular. However, there is no comprehensive methodology in the literature for evaluating biometric systems based on machine learning. This paper proposes a general methodology for evaluating ML-based systems for impostor recognition/detection using biometric traits. This includes the creation of learning and testing sets with an appropriate size balance (proportion) between these sets, selecting the number of instances coming from different users, evaluating the influence of the number of impostors on their detection rate, and the impact of the number of records representing a user's behavior. In addition, we propose how real data (possibly affected by account takeover attempts) could be used to extend the enrollment data to support impostor detection. The proposed approach was used for a systematic comparison of an extensive set of ML and statistical methods. For some of them, a false acceptance rate (FAR) close to zero and a false rejection rate (FRR) below 0.05 were achieved in a supervised experiment, proving the merit of certain ML-based approaches. Moreover, using the method proposed in the paper, a classifier trained on experimental data achieved a FAR below 0.05 on real-world data collected at an actual financial web page.},
  keywords = {Biometrics, Impostor detection, Keystroke dynamics, Machine learning, Multi-factor authentication, Supervised learning},
  pubstate = {published},
  tppubtype = {article}
}
2019
Luckner, Marcin; Gad, Michal; Sobkowiak, Pawel Antyscam-Practical web spam classifier Journal Article International Journal of Electronics and Telecommunications, 65 (4), pp. 713–722, 2019, ISSN: 23001933. Tags: Automatic classification, Imbalanced sets classification, Machine learning, Spam detection, Web spam detection
@article{Luckner2019b,
  title = {Antyscam-Practical web spam classifier},
  author = {Marcin Luckner and Michal Gad and Pawel Sobkowiak},
  doi = {10.24425/ijet.2019.130255},
  issn = {23001933},
  year = {2019},
  date = {2019-01-01},
  journal = {International Journal of Electronics and Telecommunications},
  volume = {65},
  number = {4},
  pages = {713--722},
  abstract = {To prevent web spam from manipulating search engine results, anti-spam systems use machine learning techniques to detect spam. However, if the learning set for the system is out of date, the quality of classification falls rapidly. We present a web spam recognition system that periodically refreshes the learning set to create an adequate classifier. A new classifier is trained exclusively on data collected during the last period. We have proved that such a strategy is better than incrementally extending the learning set. The system addresses the start-up problem of an insufficient learning set by minimising the number of learning examples and using external data sets. The system was tested on real data from spam traps and commonly known web services: Quora, Reddit, and Stack Overflow. A test performed over ten months shows the stability of the system and an improvement of the results of up to 60 percent at the end of the examined period.},
  keywords = {Automatic classification, Imbalanced sets classification, Machine learning, Spam detection, Web spam detection},
  pubstate = {published},
  tppubtype = {article}
}