<!DOCTYPE article
PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.4 20190208//EN"
       "JATS-journalpublishing1.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" article-type="research-article" dtd-version="1.4" xml:lang="en">
 <front>
  <journal-meta>
   <journal-id journal-id-type="publisher-id">Intellectual Technologies on Transport</journal-id>
   <journal-title-group>
    <journal-title xml:lang="en">Intellectual Technologies on Transport</journal-title>
    <trans-title-group xml:lang="ru">
     <trans-title>Интеллектуальные технологии на транспорте</trans-title>
    </trans-title-group>
   </journal-title-group>
   <issn publication-format="online">2413-2527</issn>
  </journal-meta>
  <article-meta>
   <article-id pub-id-type="publisher-id">105900</article-id>
   <article-id pub-id-type="doi">10.20295/2413-2527-2025-444-17-25</article-id>
   <article-id pub-id-type="edn">zcjukj</article-id>
   <article-categories>
    <subj-group subj-group-type="toc-heading" xml:lang="ru">
     <subject>ИСКУССТВЕННЫЙ ИНТЕЛЛЕКТ И ТРАНСПОРТНЫЕ СИСТЕМЫ</subject>
    </subj-group>
    <subj-group subj-group-type="toc-heading" xml:lang="en">
     <subject>ARTIFICIAL INTELLIGENCE AND TRANSPORT SYSTEMS</subject>
    </subj-group>
   </article-categories>
   <title-group>
    <article-title xml:lang="en">Anomaly Detection in Large-Scale Data Using Isolation Forest and Autoencoder</article-title>
    <trans-title-group xml:lang="ru">
     <trans-title>Выявление аномалий в масштабных данных с применением Isolation Forest и Autoencoder</trans-title>
    </trans-title-group>
   </title-group>
   <contrib-group content-type="authors">
    <contrib contrib-type="author">
     <name-alternatives>
      <name xml:lang="ru">
       <surname>Герасимов</surname>
       <given-names>Максим</given-names>
      </name>
      <name xml:lang="en">
       <surname>Gerasimov</surname>
       <given-names>Maksim</given-names>
      </name>
     </name-alternatives>
     <email>maxger60@gmail.com</email>
     <xref ref-type="aff" rid="aff-1"/>
    </contrib>
    <contrib contrib-type="author">
     <name-alternatives>
      <name xml:lang="ru">
       <surname>Забродин</surname>
       <given-names>Андрей Владимирович</given-names>
      </name>
      <name xml:lang="en">
       <surname>Zabrodin</surname>
       <given-names>Andrey Vladimirovich</given-names>
      </name>
     </name-alternatives>
     <email>zabrodin@pgups.ru</email>
     <bio xml:lang="ru">
      <p>кандидат исторических наук</p>
     </bio>
     <bio xml:lang="en">
      <p>Candidate of Historical Sciences</p>
     </bio>
     <xref ref-type="aff" rid="aff-2"/>
    </contrib>
   </contrib-group>
   <aff-alternatives id="aff-1">
    <aff>
     <institution xml:lang="ru">Петербургский государственный университет путей сообщения Императора Александра I</institution>
     <country>Россия</country>
    </aff>
    <aff>
     <institution xml:lang="en">Emperor Alexander I St. Petersburg State Transport University</institution>
     <country>Russian Federation</country>
    </aff>
   </aff-alternatives>
   <aff-alternatives id="aff-2">
    <aff>
     <institution xml:lang="ru">Петербургский государственный университет путей сообщения Императора Александра I</institution>
     <city>Санкт-Петербург</city>
     <country>Россия</country>
    </aff>
    <aff>
     <institution xml:lang="en">Emperor Alexander I St. Petersburg State Transport University</institution>
     <city>St. Petersburg</city>
     <country>Russian Federation</country>
    </aff>
   </aff-alternatives>
   <pub-date publication-format="print" date-type="pub" iso-8601-date="2025-12-15T00:00:00+03:00">
    <day>15</day>
    <month>12</month>
    <year>2025</year>
   </pub-date>
   <pub-date publication-format="electronic" date-type="pub" iso-8601-date="2025-12-15T00:00:00+03:00">
    <day>15</day>
    <month>12</month>
    <year>2025</year>
   </pub-date>
   <issue>4</issue>
   <fpage>17</fpage>
   <lpage>25</lpage>
   <history>
    <date date-type="received" iso-8601-date="2025-10-29T00:00:00+03:00">
     <day>29</day>
     <month>10</month>
     <year>2025</year>
    </date>
    <date date-type="accepted" iso-8601-date="2025-11-22T00:00:00+03:00">
     <day>22</day>
     <month>11</month>
     <year>2025</year>
    </date>
   </history>
   <self-uri xlink:href="https://itt-pgups.ru/en/nauka/article/105900/view">https://itt-pgups.ru/en/nauka/article/105900/view</self-uri>
   <abstract xml:lang="ru">
    <p>Цель: сравнительный анализ двух методов обнаружения аномалий в больших массивах данных — ансамблевого алгоритма Isolation Forest и нейросетевого Autoencoder. Методы: проведено моделирование и экспериментальное сравнение алгоритмов на реальном датасете транзакций по кредитным картам. Использованы стандартные метрики эффективности (precision, recall, F1-score, ROC-AUC), а также матрица ошибок для анализа структуры ложных срабатываний и пропусков аномалий. Результаты: обе модели достигли высоких значений ROC-AUC, что подтверждает их способность надежно различать нормальные и аномальные транзакции. Практическая значимость: разработанные подходы применимы для автоматизированного мониторинга транзакционных потоков, предотвращения мошенничества и анализа больших данных. Наиболее эффективно комбинированное использование Isolation Forest и Autoencoder в гибридных системах, что позволяет повысить точность и снизить количество ложных тревог при обнаружении аномалий.</p>
   </abstract>
   <trans-abstract xml:lang="en">
    <p>Purpose: to compare two anomaly detection methods for large-scale datasets: the ensemble-based Isolation Forest and the neural-network-based Autoencoder. Methods: the algorithms were modelled and compared experimentally on a real credit card transaction dataset. Standard performance metrics (precision, recall, F1-score, ROC-AUC) were used, together with a confusion matrix to analyse the structure of false positives and missed anomalies. Results: both models achieved high ROC-AUC scores, confirming their ability to reliably distinguish normal from anomalous transactions. Practical significance: the developed approaches are applicable to automated monitoring of transaction flows, fraud prevention, and big data analysis. Combining Isolation Forest and Autoencoder in hybrid systems is the most effective option, improving detection accuracy and reducing the number of false alarms.</p>
   </trans-abstract>
   <kwd-group xml:lang="ru">
    <kwd>большие данные</kwd>
    <kwd>аномалии</kwd>
    <kwd>машинное обучение</kwd>
    <kwd>нейронные сети</kwd>
    <kwd>Autoencoder</kwd>
    <kwd>Isolation Forest</kwd>
    <kwd>транзакционные данные</kwd>
   </kwd-group>
   <kwd-group xml:lang="en">
    <kwd>large datasets</kwd>
    <kwd>anomalies</kwd>
    <kwd>machine learning</kwd>
    <kwd>neural network</kwd>
    <kwd>Autoencoder</kwd>
    <kwd>Isolation Forest</kwd>
    <kwd>transaction data</kwd>
   </kwd-group>
  </article-meta>
 </front>
 <body>
  <p></p>
 </body>
 <back>
  <ref-list>
   <ref id="B1">
    <label>1.</label>
    <citation-alternatives>
     <mixed-citation xml:lang="ru">Обзор методов обнаружения аномалий в потоках данных / В. П. Шкодырев, К. И. Ягафаров, В. А. Баштовенко, Е. Э. Ильина // Proceedings of the Second Conference on Software Engineering and Information Management (SEIM-2017), (Saint Petersburg, Russia, 21 April 2017). CEUR Workshop Proceedings. 2017. Vol. 1864. Pp. 50–56.</mixed-citation>
     <mixed-citation xml:lang="en">Shkodyrev V. P., Yagafarov K. I., Bashtovenko V. A., Ilyina E. E. Obzor metodov obnaruzheniya anomaliy v potokakh dannykh [The Overview of Anomaly Detection Methods in Data Streams], Proceedings of the Second Conference on Software Engineering and Information Management (SEIM-2017), Saint Petersburg, Russia, April 21, 2017. CEUR Workshop Proceedings, 2017, Vol. 1864, Pp. 50–56. (In Russian)</mixed-citation>
    </citation-alternatives>
   </ref>
   <ref id="B2">
    <label>2.</label>
    <citation-alternatives>
     <mixed-citation xml:lang="ru">Анализ данных и процессов: учебное пособие / А. А. Барсегян, М. С. Куприянов, И. И. Холод [и др.]. 3-е изд., перераб. и доп. СПб.: БХВ-Петербург, 2009. 512 с.</mixed-citation>
     <mixed-citation xml:lang="en">Barsegyan A. A., Kupriyanov M. S., Kholod I. I., et al. Analiz dannykh i protsessov: uchebnoe posobie [Data and Process Analysis: A Tutorial]. Saint Petersburg, BHV-Peterburg Publishing House, 2009, 512 p. (In Russian)</mixed-citation>
    </citation-alternatives>
   </ref>
   <ref id="B3">
    <label>3.</label>
    <citation-alternatives>
     <mixed-citation xml:lang="ru">Liu F. T., Ting K. M., Zhou Z.-H. Isolation Forest // Proceedings of the Eighth IEEE International Conference on Data Mining (Pisa, Italy, 15–19 December 2008). Institute of Electrical and Electronics Engineers, 2008. Pp. 413–422. DOI: 10.1109/ICDM.2008.17.</mixed-citation>
     <mixed-citation xml:lang="en">Liu F. T., Ting K. M., Zhou Z.-H. Isolation Forest, Proceedings of the Eighth IEEE International Conference on Data Mining, Pisa, Italy, December 15–19, 2008. Institute of Electrical and Electronics Engineers, 2008, Pp. 413–422. DOI: 10.1109/ICDM.2008.17.</mixed-citation>
    </citation-alternatives>
   </ref>
   <ref id="B4">
    <label>4.</label>
    <citation-alternatives>
     <mixed-citation xml:lang="ru">Scikit-learn: Machine Learning in Python. URL: http://scikit-learn.org (дата обращения: 18.10.2025).</mixed-citation>
     <mixed-citation xml:lang="en">Scikit-learn: Machine Learning in Python. Available at: http://scikit-learn.org (accessed: October 18, 2025).</mixed-citation>
    </citation-alternatives>
   </ref>
   <ref id="B5">
    <label>5.</label>
    <citation-alternatives>
     <mixed-citation xml:lang="ru">Pandas: Python Data Analysis Library. URL: http://pandas.pydata.org (дата обращения: 18.10.2025).</mixed-citation>
     <mixed-citation xml:lang="en">Pandas: Python Data Analysis Library. Available at: http://pandas.pydata.org (accessed: October 18, 2025).</mixed-citation>
    </citation-alternatives>
   </ref>
   <ref id="B6">
    <label>6.</label>
    <citation-alternatives>
     <mixed-citation xml:lang="ru">NumPy v2.3 Documentation. URL: http://numpy.org/doc/2.3 (дата обращения: 18.10.2025).</mixed-citation>
     <mixed-citation xml:lang="en">NumPy v2.3 Documentation. Available at: http://numpy.org/doc/2.3 (accessed: October 18, 2025).</mixed-citation>
    </citation-alternatives>
   </ref>
   <ref id="B7">
    <label>7.</label>
    <citation-alternatives>
     <mixed-citation xml:lang="ru">SciPy v1.16.2 Documentation. URL: http://docs.scipy.org/doc/scipy (дата обращения: 18.10.2025).</mixed-citation>
     <mixed-citation xml:lang="en">SciPy v1.16.2 Documentation. Available at: http://docs.scipy.org/doc/scipy (accessed: October 18, 2025).</mixed-citation>
    </citation-alternatives>
   </ref>
   <ref id="B8">
    <label>8.</label>
    <citation-alternatives>
     <mixed-citation xml:lang="ru">PyOD V2 Documentation. URL: http://pyod.readthedocs.io (дата обращения: 18.10.2025).</mixed-citation>
     <mixed-citation xml:lang="en">PyOD V2 Documentation. Available at: http://pyod.readthedocs.io (accessed: October 18, 2025).</mixed-citation>
    </citation-alternatives>
   </ref>
   <ref id="B9">
    <label>9.</label>
    <citation-alternatives>
     <mixed-citation xml:lang="ru">Matplotlib: Visualization with Python. URL: http://matplotlib.org (дата обращения: 18.10.2025).</mixed-citation>
     <mixed-citation xml:lang="en">Matplotlib: Visualization with Python. Available at: http://matplotlib.org (accessed: October 18, 2025).</mixed-citation>
    </citation-alternatives>
   </ref>
   <ref id="B10">
    <label>10.</label>
    <citation-alternatives>
     <mixed-citation xml:lang="ru">Seaborn: Statistical Data Visualization. URL: http://seaborn.pydata.org (дата обращения: 18.10.2025).</mixed-citation>
     <mixed-citation xml:lang="en">Seaborn: Statistical Data Visualization. Available at: http://seaborn.pydata.org (accessed: October 18, 2025).</mixed-citation>
    </citation-alternatives>
   </ref>
   <ref id="B11">
    <label>11.</label>
    <citation-alternatives>
     <mixed-citation xml:lang="ru">Plotly Open Source Graphing Library for Python. URL: http://plotly.com/python (дата обращения: 18.10.2025).</mixed-citation>
     <mixed-citation xml:lang="en">Plotly Open Source Graphing Library for Python. Available at: http://plotly.com/python (accessed: October 18, 2025).</mixed-citation>
    </citation-alternatives>
   </ref>
   <ref id="B12">
    <label>12.</label>
    <citation-alternatives>
     <mixed-citation xml:lang="ru">Goodfellow I., Bengio Y., Courville A. Autoencoders // Goodfellow I., Bengio Y., Courville A. Deep Learning. Cambridge (MA): MIT Press, 2016. Pp. 499–523.</mixed-citation>
     <mixed-citation xml:lang="en">Goodfellow I., Bengio Y., Courville A. Autoencoders. In: Goodfellow I., Bengio Y., Courville A. Deep Learning. Cambridge (MA), MIT Press, 2016, Pp. 499–523.</mixed-citation>
    </citation-alternatives>
   </ref>
   <ref id="B13">
    <label>13.</label>
    <citation-alternatives>
     <mixed-citation xml:lang="ru">Hinton G. E., Salakhutdinov R. R. Reducing the Dimensionality of Data with Neural Networks // Science. 2006. Vol. 313, Iss. 5786. Pp. 504–507. DOI: 10.1126/science.112764.</mixed-citation>
     <mixed-citation xml:lang="en">Hinton G. E., Salakhutdinov R. R. Reducing the Dimensionality of Data with Neural Networks, Science, 2006, Vol. 313, Iss. 5786, Pp. 504–507. DOI: 10.1126/science.112764.</mixed-citation>
    </citation-alternatives>
   </ref>
   <ref id="B14">
    <label>14.</label>
    <citation-alternatives>
     <mixed-citation xml:lang="ru">Credit Card Fraud Detection: Anonymized Credit Card Transactions Labeled as Fraudulent or Genuine // Kaggle. URL: http://www.kaggle.com/datasets/mlg-ulb/creditcardfraud (дата обращения: 18.10.2025).</mixed-citation>
     <mixed-citation xml:lang="en">Credit Card Fraud Detection: Anonymized Credit Card Transactions Labeled as Fraudulent or Genuine, Kaggle. Available at: http://www.kaggle.com/datasets/mlg-ulb/creditcardfraud (accessed: October 18, 2025).</mixed-citation>
    </citation-alternatives>
   </ref>
   <ref id="B15">
    <label>15.</label>
    <citation-alternatives>
     <mixed-citation xml:lang="ru">The Numenta Anomaly Benchmark // GitHub. URL: http://github.com/numenta/NAB (дата обращения: 18.10.2025).</mixed-citation>
     <mixed-citation xml:lang="en">The Numenta Anomaly Benchmark, GitHub. Available at: http://github.com/numenta/NAB (accessed: October 18, 2025).</mixed-citation>
    </citation-alternatives>
   </ref>
   <ref id="B16">
    <label>16.</label>
    <citation-alternatives>
     <mixed-citation xml:lang="ru">Imbalanced-learn v0.14.0 Documentation. URL: http://imbalanced-learn.org (дата обращения: 18.10.2025).</mixed-citation>
     <mixed-citation xml:lang="en">Imbalanced-learn v0.14.0 Documentation. Available at: http://imbalanced-learn.org (accessed: October 18, 2025).</mixed-citation>
    </citation-alternatives>
   </ref>
   <ref id="B17">
    <label>17.</label>
    <citation-alternatives>
     <mixed-citation xml:lang="ru">Макшанов А. В., Журавлев А. Е., Тындыкарь Л. Н. Большие данные. Big Data: учебник для вузов. 4-е изд., стер. Санкт-Петербург: Лань, 2024. 188 с.</mixed-citation>
     <mixed-citation xml:lang="en">Makshanov A. V., Zhuravlev A. E., Tyndykar L. N. Bolshie dannye. Big Data: uchebnik dlya vuzov [Big Data: a textbook for universities]. Saint Petersburg, LAN Publishing House, 2024, 188 p. (In Russian)</mixed-citation>
    </citation-alternatives>
   </ref>
   <ref id="B18">
    <label>18.</label>
    <citation-alternatives>
     <mixed-citation xml:lang="ru">Фельдман Е. В., Ручай А. Н., Чербаджи Д. Ю. Модель выявления аномальных банковских транзакций на основе машинного обучения // Вестник УрФО. Безопасность в информационной сфере. 2021. № 1 (39). С. 27–35. DOI: 10.14529/secur210104.</mixed-citation>
     <mixed-citation xml:lang="en">Feldman E. V., Ruchay A. N., Cherbadzhi D. Y. Model vyyavleniya anomalnykh bankovskikh tranzaktsiy na osnove mashinnogo obucheniya [Model for Detecting Abnormal Banking Transactions Based on Machine Learning], Vestnik UrFO. Bezopasnost v informatsionnoy sfere [Journal of the Ural Federal District. Information Security], 2021, No. 1 (39), Pp. 27–35. DOI: 10.14529/secur210104. (In Russian)</mixed-citation>
    </citation-alternatives>
   </ref>
   <ref id="B19">
    <label>19.</label>
    <citation-alternatives>
     <mixed-citation xml:lang="ru">Novelty and Outlier Detection — Scikit-learn 1.7.2 Documentation. URL: http://scikit-learn.org/stable/modules/outlier_detection.html (дата обращения: 18.10.2025).</mixed-citation>
     <mixed-citation xml:lang="en">Novelty and Outlier Detection — Scikit-learn 1.7.2 Documentation. Available at: http://scikit-learn.org/stable/modules/outlier_detection.html (accessed: October 18, 2025).</mixed-citation>
    </citation-alternatives>
   </ref>
  </ref-list>
 </back>
</article>
