Russian Federation
Purpose: the study compares two anomaly detection methods for large-scale datasets: the ensemble-based Isolation Forest and the neural-network-based Autoencoder. Methods: both algorithms were modelled and evaluated empirically on a real-world credit card transaction dataset. Standard performance metrics (precision, recall, F1-score, and ROC-AUC) were used, together with a confusion matrix to examine false positives and missed anomalies. Results: both models achieved high ROC-AUC scores, confirming that they robustly distinguish normal from anomalous transactions. Practical significance: the methods are suitable for automated monitoring of transaction flows, fraud prevention, and the analysis of large datasets. Combining Isolation Forest and Autoencoder in a hybrid system proved more effective, improving detection accuracy while reducing the number of false positives.
large datasets, machine learning, anomalies, neural network, Autoencoder, Isolation Forest, transaction data
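
The evaluation protocol named in the abstract (precision, recall, F1-score, ROC-AUC, and a confusion matrix) can be illustrated with a minimal sketch. It is not the authors' code. It assumes scikit-learn and NumPy are installed, replaces the credit card dataset with synthetic data, and uses hypothetical helper names (`make_synthetic_transactions`, `evaluate`) and illustrative parameter values. The threshold is chosen from the known anomaly rate only because the synthetic labels are available; in practice that rate would have to be estimated.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import (
    confusion_matrix,
    precision_recall_fscore_support,
    roc_auc_score,
)


def make_synthetic_transactions(n_normal=2000, n_anomalous=40, n_features=8, seed=0):
    """Synthetic stand-in for transaction data (assumption: NOT the real dataset)."""
    rng = np.random.default_rng(seed)
    # Normal transactions: a dense Gaussian cloud around the origin.
    normal = rng.normal(loc=0.0, scale=1.0, size=(n_normal, n_features))
    # Anomalies: points scattered uniformly over a much wider range.
    anomalous = rng.uniform(low=-6.0, high=6.0, size=(n_anomalous, n_features))
    X = np.vstack([normal, anomalous])
    y = np.concatenate([np.zeros(n_normal, dtype=int), np.ones(n_anomalous, dtype=int)])
    return X, y


def evaluate(y_true, scores, threshold):
    """Compute the metrics listed in the abstract; higher score = more anomalous."""
    y_pred = (scores >= threshold).astype(int)
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="binary", zero_division=0
    )
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return {
        "precision": float(precision),
        "recall": float(recall),
        "f1": float(f1),
        "roc_auc": float(roc_auc_score(y_true, scores)),
        "false_positives": int(fp),
        "missed_anomalies": int(fn),
    }


X, y = make_synthetic_transactions()

# Isolation Forest is unsupervised: the labels are used only for evaluation.
forest = IsolationForest(n_estimators=100, random_state=0)
forest.fit(X)

# score_samples is higher for normal points, so negate it to get an anomaly score.
scores = -forest.score_samples(X)

# Assumption: flag the top fraction of scores equal to the known anomaly rate.
threshold = np.quantile(scores, 1.0 - y.mean())

report = evaluate(y, scores, threshold)
print(report)
```

Because `evaluate` only needs a per-sample anomaly score, an autoencoder's reconstruction error could be passed through the same function, which would allow both models to be compared on identical metrics.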



