METHODOLOGY FOR PARRYING FAILURES AND FAULTS IN A MULTI-MODULE COMPUTING SYSTEM BASED ON THE CREATION AND REPLICATION OF CHECKPOINTS
Abstract and keywords
Abstract (English):
Introduction: to process target information more efficiently, a computing system needs new approaches to the rapid detection of, and recovery from, failures and faults, so that their impact on the system as a whole is minimized. Purpose: to present a technique for countering failures and faults in a multi-module computing system in which the computing modules periodically save the state of their calculations (checkpoints) and exchange these checkpoints with all other modules. Results: the problem of planning such a computing process is formulated, including determining the optimal number of checkpoints and the time points at which to create them. These time points are determined from the distribution law of computing-module failure times. Practical significance: the results of simulation modelling carried out under the proposed approach demonstrate that the technique is feasible to implement.
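
To illustrate how the number and timing of checkpoints can follow from a failure-time distribution, here is a minimal sketch. It is not the authors' planning method, which the abstract does not detail. It assumes exponentially distributed failure times and uses Young's classical first-order approximation, in which the interval between checkpoints is sqrt(2 * C * M) for checkpoint cost C and mean time between failures M. The function names, the overhead model, and the numeric values are all hypothetical.

```python
import math


def young_interval(checkpoint_cost: float, mtbf: float) -> float:
    """Young's first-order checkpoint interval (assumes exponential failures)."""
    return math.sqrt(2.0 * checkpoint_cost * mtbf)


def overhead_fraction(interval: float, checkpoint_cost: float, mtbf: float) -> float:
    """Assumed first-order overhead model: time spent saving checkpoints
    plus expected rework after a failure, per unit of useful work."""
    return checkpoint_cost / interval + interval / (2.0 * mtbf)


def checkpoint_times(total_work: float, interval: float) -> list[float]:
    """Evenly spaced checkpoint time points; none is placed at the very end."""
    count = max(0, math.ceil(total_work / interval) - 1)
    return [interval * k for k in range(1, count + 1)]


# Hypothetical values: 2 s per checkpoint, 100 s MTBF, 100 s of useful work.
interval = young_interval(checkpoint_cost=2.0, mtbf=100.0)
times = checkpoint_times(total_work=100.0, interval=interval)
```

With a different failure-time distribution the checkpoints would generally not be evenly spaced, which is why the time points have to be derived from the distribution law itself.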

Keywords:
multi-module computing system, model of the computing process, checkpoint