ARCHITECTURE OF A CNN-TRANSFORMER HYBRID WITH MASKED TIME-SERIES AUTOENCODING FOR BEHAVIORAL BIOMETRICS ON MOBILE DEVICES

Authors

  • Mariia Havrylovych, National Technical University of Ukraine “Igor Sikorsky Kyiv Polytechnic Institute”, Department of Artificial Intelligence, Kyiv, Ukraine. https://orcid.org/0000-0002-9797-2863

DOI:

https://doi.org/10.20535/kpisn.2025.4.344357

Keywords:

continuous authentication, masked autoencoder, smartphone sensors, CNN-Transformer, domain adaptation, behavioral biometrics

Abstract

Background. Continuous behavioral authentication (keystroke dynamics, touch/swipe gestures, motion sensors) verifies a user's identity without requiring extra actions. However, such models degrade under device, session, and activity shifts, are sensitive to noise, and often require substantial labeled data. As passwordless logins spread, demand is rising for post-login risk control and for models that are robust, compute-efficient, and stable in the wild.

Objective. To develop and empirically study a compact CNN-Transformer hybrid with lightweight self-supervised masked time-series autoencoding (MAE-style) for mobile behavioral biometrics on the HMOG and WISDM datasets.

Methods. A 1D-CNN front end extracts local cues from smartphone motion signals, while a Transformer encoder captures longer-range dependencies. We use masked reconstruction on unlabeled HMOG sessions for self-supervised pretraining under a limited computational budget and then fine-tune the same hybrid architecture for user identification. We evaluate three hybrid variants on HMOG (trained from scratch, with masked pretraining, and with masked pretraining plus CORAL domain adaptation) and three models on WISDM (a Transformer baseline, a hybrid trained from scratch and a hybrid initialized from the HMOG-pretrained weights). Performance is measured using user-level mean and median Equal Error Rate (EER) and AUC.
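For illustration, the sketch below shows one way such a hybrid could be assembled in PyTorch: a 1D-CNN front end over windowed motion signals, a Transformer encoder over the resulting tokens, and an MAE-style head that masks a fraction of tokens and reconstructs the raw window. All layer sizes, kernel widths, the masking ratio, and the 4x downsampling factor are illustrative assumptions, not the configuration reported in the paper.

```python
# Minimal sketch of a compact CNN-Transformer hybrid with an optional
# MAE-style masked-reconstruction head. Hyperparameters are assumptions.
import torch
import torch.nn as nn


class HybridEncoder(nn.Module):
    """1D-CNN front end for local cues + Transformer encoder for longer-range context."""

    def __init__(self, in_channels=6, d_model=64, n_heads=4, n_layers=2):
        super().__init__()
        # Two stride-2 convolutions: local feature extraction and 4x temporal downsampling.
        self.cnn = nn.Sequential(
            nn.Conv1d(in_channels, d_model, kernel_size=5, stride=2, padding=2),
            nn.GELU(),
            nn.Conv1d(d_model, d_model, kernel_size=5, stride=2, padding=2),
            nn.GELU(),
        )
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, dim_feedforward=2 * d_model,
            batch_first=True, norm_first=True,
        )
        self.transformer = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, x):                        # x: (batch, channels, time)
        tokens = self.cnn(x).transpose(1, 2)     # (batch, time/4, d_model)
        return self.transformer(tokens)


class MaskedPretrainer(nn.Module):
    """MAE-style objective: mask a fraction of tokens and reconstruct the raw window."""

    def __init__(self, encoder, in_channels=6, d_model=64, mask_ratio=0.5):
        super().__init__()
        self.encoder = encoder
        self.mask_ratio = mask_ratio
        self.mask_token = nn.Parameter(torch.zeros(d_model))
        # Each token reconstructs the 4 time steps it covers after downsampling.
        self.decoder = nn.Linear(d_model, in_channels * 4)

    def forward(self, x):                        # x: (batch, channels, time), time % 4 == 0
        b, c, t = x.shape
        tokens = self.encoder.cnn(x).transpose(1, 2)
        mask = torch.rand(b, tokens.size(1), device=x.device) < self.mask_ratio
        tokens = torch.where(mask.unsqueeze(-1), self.mask_token, tokens)
        latent = self.encoder.transformer(tokens)
        recon = self.decoder(latent).reshape(b, t, c).transpose(1, 2)
        return nn.functional.mse_loss(recon, x)  # often restricted to masked positions only
```

After pretraining, the same HybridEncoder weights can be reused and a small classification or verification-scoring head attached for the supervised user-identification stage.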

Results. On HMOG, the hybrid model trained from scratch achieves the best user-level metrics (EER 21.51% mean, 18.63% median; AUC 0.854 mean, 0.905 median), while the lightweight MAE and CORAL variants do not yet surpass this baseline. On WISDM, the hybrid model substantially outperforms a pure Transformer baseline (EER 9.41% vs 51.25% mean; AUC 0.902 vs 0.488 mean), and cross-dataset initialization from the HMOG MAE-pretrained weights provides an additional improvement (EER 8.42% mean, 2.07% median; AUC 0.907 mean, 0.959 median).
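For reference, the sketch below shows one common way to compute the per-user EER and AUC reported above from verification scores using scikit-learn; the `genuine`, `impostor`, and `per_user_scores` names are hypothetical placeholders, not the paper's data.

```python
# Minimal sketch of per-user EER and AUC from genuine/impostor similarity scores.
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score


def eer_and_auc(genuine, impostor):
    """genuine/impostor: 1D arrays of similarity scores for one user's windows."""
    scores = np.concatenate([genuine, impostor])
    labels = np.concatenate([np.ones(len(genuine)), np.zeros(len(impostor))])
    fpr, tpr, _ = roc_curve(labels, scores)
    fnr = 1.0 - tpr
    # EER: operating point where false accept and false reject rates are (nearly) equal.
    idx = np.nanargmin(np.abs(fpr - fnr))
    return (fpr[idx] + fnr[idx]) / 2.0, roc_auc_score(labels, scores)


# User-level mean/median metrics aggregate the per-user values, e.g.:
# eers, aucs = zip(*(eer_and_auc(g, i) for g, i in per_user_scores))
# print(np.mean(eers), np.median(eers), np.mean(aucs), np.median(aucs))
```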

Conclusions. The results indicate that a compact CNN-Transformer hybrid is effective for sensor-based mobile behavioral biometrics and that even lightweight masked pretraining can be helpful for cross-dataset transfer. At the same time, the benefits of MAE and CORAL on HMOG depend strongly on the pretraining budget and masking configuration, suggesting that further tuning is needed to fully exploit self-supervised pretraining in this setting.
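For context, the CORAL variant refers to correlation alignment in the sense of Deep CORAL (Sun and Saenko, cited below): an auxiliary loss that penalizes the distance between the feature covariance matrices of source and target batches. A minimal sketch, assuming plain (batch, feature) matrices from a shared encoder:

```python
# Minimal sketch of the Deep CORAL loss; batch shapes are assumptions.
import torch


def coral_loss(source, target):
    """source, target: (batch, d) feature matrices from the shared encoder."""
    d = source.size(1)

    def covariance(x):
        x = x - x.mean(dim=0, keepdim=True)
        return x.t() @ x / (x.size(0) - 1)

    diff = covariance(source) - covariance(target)
    # Frobenius-norm distance between covariances, scaled as in Sun & Saenko (2016).
    return (diff * diff).sum() / (4.0 * d * d)
```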

References

G. M. Weiss, K. Yoneda, and T. Hayajneh, “Smartphone and smartwatch-based biometrics using activities of daily living,” IEEE Access, vol. 7, pp. 133190–133202, 2019, doi: 10.1109/ACCESS.2019.2940729.

Z. Sitová, J. Šeděnka, Q. Yang, G. Peng, G. Zhou, P. Gasti, and K. S. Balagani, “HMOG: New behavioral biometric features for continuous authentication of smartphone users,” IEEE Transactions on Information Forensics and Security, vol. 11, no. 5, pp. 877–892, 2016, doi: 10.1109/TIFS.2015.2506542.

M. Abuhamad, T. Abuhmed, D. Mohaisen, and D. Nyang, “AUToSen: Deep-learning-based implicit continuous authentication using smartphone sensors,” IEEE Internet of Things Journal, vol. 7, no. 6, pp. 5008–5020, 2020, doi: 10.1109/JIOT.2020.2975779.

M. Havrylovych and V. Danylov, “Deep learning application in continuous authentication,” in Digital Ecosystems: Interconnecting Advanced Networks with AI Applications, A. Luntovskyy, Ed. Cham: Springer, 2024, pp. 644–667, Lecture Notes in Electrical Engineering, vol. 1198, doi: 10.1007/978-3-031-61221-3_31.

M. P. Havrylovych and V. Y. Danylov, “Research of autoencoder-based user biometric verification with motion patterns,” System Research and Information Technologies, no. 2, pp. 128–136, 2022, doi: 10.20535/SRIT.2308-8893.2022.2.10.

M. P. Havrylovych and V. Y. Danylov, “Research on hybrid transformer-based autoencoders for user biometric verification,” System Research and Information Technologies, no. 3, pp. 42–53, 2023, doi: 10.20535/SRIT.2308-8893.2023.3.03.

M. Havrylovych, V. Danylov, and A. Gozhyj, “Comparative analysis of using recurrent autoencoders for user biometric verification with wearable accelerometer,” in Proceedings of the 9th International Conference “Information Control Systems & Technologies” (ICST 2020), CEUR-WS, vol. 2711, 2020, pp. 358–370.

A. Alsultan and K. Warwick, “Keystroke dynamics authentication: A survey of free-text methods,” International Journal of Computer Science Issues, vol. 10, no. 4, pp. 1–10, 2013.

R. S. Ahmed, A. Wahab, M. Manno, M. Lukaszewski, D. Hou, and F. Hussain, “Keystroke dynamics: Concepts, techniques, and applications,” ACM Computing Surveys, vol. 57, no. 11, pp. 283:1–283:35, 2025, doi: 10.1145/3675583.

G. Stragapede, P. Delgado-Santos, R. Tolosana, R. Vera-Rodriguez, R. Guest, and A. Morales, “TypeFormer: Transformers for mobile keystroke biometrics,” Neural Computing and Applications, 2024, early access, doi: 10.1007/s00521-024-10140-2.

J. Kim, H.-D. Kim, and P. Kang, “Keystroke dynamics-based user authentication using freely typed text based on user-adaptive feature extraction and novelty detection,” Applied Soft Computing, vol. 62, pp. 1077–1087, 2018, doi: 10.1016/j.asoc.2017.09.045.

G. M. Weiss, “WISDM smartphone and smartwatch activity and biometrics dataset,” WISDM Lab, Fordham University, technical report, 2019. [Online]. Available: https://archive.ics.uci.edu/

Q. Liu, J. Ye, H. Liang, L. Sun, and B. Du, “TS-MAE: A masked autoencoder for time series representation learning,” Information Sciences, vol. 690, art. 121576, 2025, doi: 10.1016/j.ins.2024.121576.

Q. Wen, T. Zhou, C. Zhang, W. Lei, L. Yang, and H. Xiong, “Transformers in time series: A survey,” in Proceedings of the 32nd International Joint Conference on Artificial Intelligence (IJCAI 2023), 2023, pp. 6778–6786, doi: 10.24963/ijcai.2023/759.

B. Sun and K. Saenko, “Deep CORAL: Correlation alignment for deep domain adaptation,” in Computer Vision – ECCV 2016 Workshops, 2016, pp. 443–450, doi: 10.1007/978-3-319-49409-8_35.

Published

2025-12-29