High-performance computing developments.
TASC is a software solution developed at RCC MSU that helps any HPC user detect performance issues in their jobs, find the root causes and determine possible ways to eliminate them.
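TASC's internal rules are not described here. The sketch below only illustrates the general idea of mapping measured job characteristics to a suspected root cause and a suggested fix. The metric names (JobMetrics, load_per_core, net_mb_per_s), the thresholds and the advice strings are all hypothetical.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass
class JobMetrics:
    # Whole-run averages for one job (hypothetical metric names).
    cpu_user: float       # share of CPU time spent in user mode, 0..1
    cpu_iowait: float     # share of CPU time spent waiting for I/O, 0..1
    load_per_core: float  # load average divided by the number of cores
    net_mb_per_s: float   # interconnect traffic per node, MB/s


@dataclass
class Finding:
    issue: str
    root_cause: str
    advice: str


def diagnose(m: JobMetrics) -> list[Finding]:
    """Apply simple threshold rules; the thresholds are illustrative, not tuned."""
    findings: list[Finding] = []
    if m.load_per_core < 0.5:
        findings.append(Finding(
            "low node utilization",
            "fewer active processes than available cores",
            "run more processes or threads per node, or request fewer nodes"))
    if m.cpu_iowait > 0.2:
        findings.append(Finding(
            "I/O-bound phases",
            "processes spend a large share of time waiting for file system I/O",
            "aggregate small I/O operations or use parallel I/O"))
    if m.cpu_user < 0.3 and m.net_mb_per_s > 500:
        findings.append(Finding(
            "communication-bound",
            "run time is dominated by data exchange over the interconnect",
            "reduce the number of messages or overlap communication with computation"))
    return findings


report = diagnose(JobMetrics(cpu_user=0.2, cpu_iowait=0.3, load_per_core=0.25, net_mb_per_s=10))
for f in report:
    print(f"{f.issue}: {f.root_cause} -> {f.advice}")
```

Keeping each rule as an (issue, cause, advice) triple means the report tells the user not only what looks wrong but also why and what to try next.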
Currently, many supercomputing applications use the available resources inefficiently. To reduce the number of such applications, a tool is needed that analyzes the supercomputer job flow and detects inefficient application runs. This research considers machine learning methods for solving this problem.
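The text does not say which machine learning methods were chosen. As one illustration of the approach, here is a minimal k-nearest-neighbours classifier that labels a job as efficient or inefficient from whole-run averages. The two features and the labelled training jobs in TRAIN are invented for this sketch.

```python
import math
from collections import Counter

# Hypothetical labelled history: ((cpu_user, load_per_core), label) per job.
TRAIN = [
    ((0.90, 1.00), "efficient"),
    ((0.85, 0.95), "efficient"),
    ((0.80, 1.00), "efficient"),
    ((0.10, 0.10), "inefficient"),
    ((0.20, 0.25), "inefficient"),
    ((0.15, 0.05), "inefficient"),
]


def knn_predict(train, x, k=3):
    """Label job features x by majority vote among the k nearest labelled jobs."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


print(knn_predict(TRAIN, (0.88, 0.97)))  # a busy job
print(knn_predict(TRAIN, (0.12, 0.20)))  # a mostly idle job
```

Both features already lie roughly in [0, 1], so no scaling is applied here; a real feature set would need normalization and a model validated on labelled job history.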
JobDigest is a set of tools for analyzing individual supercomputer jobs and the supercomputer job flow as a whole. The system collects data about running and completed jobs, together with system monitoring data from the nodes, and stores them for further analysis. A user can access a detailed report on every job, integral characteristics of the whole set of completed jobs, partition usage statistics and user activity.
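To make "integral characteristics" concrete, the sketch below reduces per-node CPU load samples of one job to a few summary numbers. The input layout (node name mapped to a list of load samples in percent) and the chosen characteristics are assumptions, not JobDigest's actual data model.

```python
from __future__ import annotations
from statistics import mean


def job_digest(samples: dict[str, list[float]]) -> dict[str, float]:
    """Reduce per-node CPU load samples (percent) of one job to integral characteristics."""
    node_means = {node: mean(values) for node, values in samples.items()}
    all_values = [v for values in samples.values() for v in values]
    return {
        "mean_load": mean(all_values),              # average over all nodes and time
        "min_node_mean": min(node_means.values()),  # least loaded node
        "max_node_mean": max(node_means.values()),  # most loaded node
        "imbalance": max(node_means.values()) - min(node_means.values()),
    }


digest = job_digest({"node01": [80.0, 90.0, 100.0], "node02": [10.0, 20.0, 30.0]})
print(digest)
```

A large imbalance with a moderate mean load, as in this example, is the kind of pattern a per-job report can surface even when the job-wide average looks acceptable.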
The goal of the Octotron project is to design an approach that guarantees reliable autonomous operation of large supercomputing centers. The approach is based on a formal model of a supercomputer that describes the proper functioning of its components and their interconnections. The supercomputer continually compares its current state with the information in the model.
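The model-versus-state comparison can be sketched as follows, assuming a deliberately simplified model: each component lists allowed ranges for its monitored attributes and the components it must be linked to. The Component class, the attribute names and the alert strings are hypothetical and do not reflect Octotron's actual model format.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Component:
    name: str
    expected: dict[str, tuple[float, float]]        # attribute -> allowed (low, high)
    links: list[str] = field(default_factory=list)  # components it must be connected to


def check(model: dict[str, Component],
          state: dict[str, dict[str, float]],
          observed_links: set[tuple[str, str]]) -> list[str]:
    """Compare the observed state with the model and return a list of alerts."""
    alerts: list[str] = []
    for name, comp in model.items():
        observed = state.get(name)
        if observed is None:
            alerts.append(f"{name}: no monitoring data")
            continue
        for attr, (low, high) in comp.expected.items():
            value = observed.get(attr)
            if value is None:
                alerts.append(f"{name}.{attr}: missing")
            elif not low <= value <= high:
                alerts.append(f"{name}.{attr}={value} outside [{low}, {high}]")
        for peer in comp.links:
            # Links are undirected, so accept either orientation.
            if (name, peer) not in observed_links and (peer, name) not in observed_links:
                alerts.append(f"link {name}-{peer} is down")
    return alerts


MODEL = {
    "node1": Component("node1", {"temp_c": (10, 70)}, ["switch1"]),
    "switch1": Component("switch1", {"temp_c": (10, 60)}),
}
print(check(MODEL, {"node1": {"temp_c": 85.0}, "switch1": {"temp_c": 40.0}}, set()))
```

Running such a check on every monitoring cycle turns the model into a continuously evaluated specification: any deviation becomes an explicit alert rather than something an administrator has to notice.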
The development of the Octoshell system was inspired by MSU Supercomputing Center administrators, who strongly needed a powerful tool for managing several supercomputers simultaneously.
The key features of the Octoshell system are:
Performance monitoring is a method for debugging performance issues in different types of applications. It uses various performance metrics obtained from the servers the application runs on, and may also use metrics produced by the application itself.
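Application-produced metrics can be as simple as timers around code regions. The sketch below records wall-clock time, CPU time and their ratio for named regions; the region() helper and the metrics dictionary are hypothetical, and shipping these values to a monitoring system alongside server metrics is not shown.

```python
from __future__ import annotations
import time
from contextlib import contextmanager

# Application-level metrics, keyed by code-region name (hypothetical layout).
metrics: dict[str, dict[str, float]] = {}


@contextmanager
def region(name: str):
    """Record wall-clock time, CPU time and CPU utilization of a code region."""
    wall_start, cpu_start = time.perf_counter(), time.process_time()
    try:
        yield
    finally:
        wall = time.perf_counter() - wall_start
        cpu = time.process_time() - cpu_start
        metrics[name] = {
            "wall_s": wall,
            "cpu_s": cpu,
            # Close to 1.0 for a single busy thread, close to 0.0 while waiting.
            "cpu_util": cpu / wall if wall > 0 else 0.0,
        }


with region("compute"):
    total = sum(i * i for i in range(200_000))
with region("wait"):
    time.sleep(0.2)  # stands in for waiting on I/O or messages
print(metrics)
```

A low cpu_util in a region the developer expected to be compute-bound is exactly the kind of signal that server-side metrics alone cannot attribute to a specific part of the code.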
AlgoWiki is an open encyclopedia of algorithms’ properties and of the features of their implementations on different hardware and software platforms, from mobile devices to extreme-scale systems. It enables the worldwide computing community to collaborate on algorithm descriptions.
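One kind of algorithm property such descriptions can capture is the parallelism resource of the algorithm's dependency (information) graph. The sketch below groups operations into levels, where each operation's level is one more than the highest level it depends on, and reports the height (number of parallel steps) and width (largest level). Treating this as equivalent to AlgoWiki's exact definitions is an assumption; the summation graphs are made-up examples.

```python
from __future__ import annotations
from collections import defaultdict
from graphlib import TopologicalSorter  # standard library since Python 3.9


def parallel_form(deps: dict[str, set[str]]) -> dict[int, list[str]]:
    """Group operations into levels: level = 1 + max level of the operations it depends on."""
    level: dict[str, int] = {}
    for op in TopologicalSorter(deps).static_order():  # predecessors come first
        level[op] = 1 + max((level[d] for d in deps.get(op, ())), default=0)
    layers: dict[int, list[str]] = defaultdict(list)
    for op, lvl in level.items():
        layers[lvl].append(op)
    return dict(layers)


def height_and_width(deps: dict[str, set[str]]) -> tuple[int, int]:
    """Height = number of levels (parallel steps); width = size of the largest level."""
    layers = parallel_form(deps)
    return len(layers), max((len(ops) for ops in layers.values()), default=0)


# Summing four numbers: pairwise (a = x0+x1, b = x2+x3, c = a+b) vs. sequential.
print(height_and_width({"c": {"a", "b"}}))             # (2, 2)
print(height_and_width({"s2": {"s1"}, "s3": {"s2"}}))  # (3, 1)
```

The two graphs compute the same sum, yet the pairwise scheme finishes in fewer parallel steps; differences like this are what make a per-algorithm description of parallel properties useful when choosing an implementation for a given platform.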