
No 4 (2024)


Information processing and data analysis

Text Image Normalization Using Fast Hough Transform

Bezmaternykh P.V.

Abstract

Text image normalization tasks arise in several modules of an automatic document image recognition system at once. The paper presents a solution to two classical tasks of geometric normalization of a digital text image: compensating for the global document skew angle and eliminating slant in its textual fragments. Although the two tasks differ in the type of geometric distortion, both solutions are based on a single image analysis method, the fast Hough transform. The method is specified, and two algorithms for solving these tasks are proposed and tested: slant normalization on several known datasets and on KRUS, a specially collected and published dataset of Cyrillic fragments, and document skew normalization on the popular DISEC dataset. The proposed method is shown to combine high speed with the ability to process a wide range of angles, and it can be successfully applied in systems for automatic processing of document images.

Journal of information technologies and computing systems. 2024;(4):3-16

Digital Twins and the Task of Ensuring Long-Term Keeping of Documents

Solovyev A.V.

Abstract

The article discusses the formulation of the problem of long-term keeping of documents and the use of digital twin technology. The problems and risks of ensuring the safety of documents, as well as the negative effects of destructive factors on stored documents, are considered. The problem of the safety of electronic documents is highlighted. An assumption is made about the possibility of using digital twin technologies to ensure long-term safety as part of the digital transformation of the economy and society. A formal formulation of the problem of ensuring long-term keeping of documents using digital twin technology is given. Prospects for further research to solve the problem are given.

Journal of information technologies and computing systems. 2024;(4):17-25

Indices of States in Finite Dynamic Systems of Complete Graphs Orientations

Zharkova A.V.

Abstract

Graph models play an important role in information security tasks, including the construction of models and methods for managing continuous system operation and system recovery and for countering denial of service. Finite dynamic systems of complete graph orientations are considered. The states of such a system are all possible orientations of a given complete graph, and the evolutionary function transforms an orientation by reversing all arcs that enter sinks, leaving all other arcs unchanged. In this paper, an algorithm for calculating the indices of system states is proposed. Namely, the index of a state equals 0 if the state has no sink or if its indegree vector (the vector of the indegrees of all its vertices, arranged in descending order) contains all possible indegrees; otherwise, its index equals the cardinality of the largest set of consecutive indegrees, starting with the maximum possible indegree, that forms a subvector of its indegree vector. As a consequence, the states with a non-zero index belong to a basin with an attractor of length 1 whose generator state has a source and no sink. The maximal index of the states in the system is found. The corresponding tables are given for complete graphs with 1 to 8 vertices inclusive.

Journal of information technologies and computing systems. 2024;(4):26-31
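
The index rule stated in the abstract above can be checked by brute force on small graphs. The following is a minimal sketch, not the author's algorithm: it assumes that the index of a state is the number of steps needed to reach a state lying on a cycle (an attractor), and all function names are hypothetical. An orientation is stored as a set of arcs `(u, v)` meaning `u -> v`.

```python
from itertools import combinations, product


def all_orientations(n):
    """Yield every orientation of the complete graph K_n as a frozenset of arcs."""
    edges = list(combinations(range(n), 2))
    for bits in product((0, 1), repeat=len(edges)):
        yield frozenset((u, v) if b else (v, u) for (u, v), b in zip(edges, bits))


def indegrees(arcs, n):
    """Return the list of indegrees of vertices 0..n-1."""
    deg = [0] * n
    for _, v in arcs:
        deg[v] += 1
    return deg


def step(arcs, n):
    """Evolutionary function: reverse every arc entering a sink, keep the rest."""
    deg = indegrees(arcs, n)
    # In a complete graph a vertex is a sink exactly when its indegree is n - 1.
    sinks = {v for v in range(n) if deg[v] == n - 1}
    return frozenset((v, u) if v in sinks else (u, v) for u, v in arcs)


def index_by_simulation(arcs, n):
    """Assumed definition: number of steps until a state on a cycle is reached."""
    seen = {}
    state, steps = arcs, 0
    while state not in seen:
        seen[state] = steps
        state = step(state, n)
        steps += 1
    # The first repeated state is where the trajectory enters its cycle.
    return seen[state]


def index_by_rule(arcs, n):
    """Index computed from the sorted indegree vector, as worded in the abstract."""
    deg = sorted(indegrees(arcs, n), reverse=True)
    if deg[0] != n - 1 or deg == list(range(n - 1, -1, -1)):
        return 0  # no sink, or all possible indegrees are present
    m = 0
    while m < n and deg[m] == n - 1 - m:
        m += 1  # count consecutive indegrees n-1, n-2, ...
    return m
```

Comparing `index_by_simulation` with `index_by_rule` over `all_orientations(n)` for small `n` gives a quick sanity check of the rule under the assumed definition of the index.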

Application of Mathematical Programming for Selection the Optimal Structures of Multivariate Linear Regressions

Bazilevskiy M.P.

Abstract

This article formulates the problem of simultaneously selecting both responses and explanatory variables in multivariate linear regressions. This problem is called "key responses and relevant features selection". The ordinary least squares method is used to estimate the regressions. First, the problem of selecting a given number of key responses and relevant features by the criterion of the maximum sum of the regression coefficients of determination is reduced to a mixed 0-1 integer linear programming problem. Then, restrictions on the signs of the estimates are introduced into it, which makes it possible to select optimal structures of multivariate regressions. After that, restrictions on the absolute contributions of regressors to the overall determination are added, which allows the number of explanatory variables to be controlled. In computational experiments on real data with a fixed number of key responses, the proposed method constructed multivariate models approximately 67.3 times faster than the method of generating all subsets. Tightening the restrictions on the absolute contributions of regressors reduced the solution time further.

Journal of information technologies and computing systems. 2024;(4):32-45

Modeling Controlled Sparsity of Databases’ Entity-Object Subschemas

Rodionov A.N.

Abstract

When creating enterprise-level information systems, designers repeatedly face the same problem: designing a subschema whose tables will hold the lists of objects. Ideally, the database should contain a single object table that is used in all associations in which the objects participate. However, a wide range of requirements for the structural organization of object lists, such as compactness, modifiability, semantically unambiguous identifiability, and a number of others, force the object table to be partitioned into many local tables, each holding objects with similar properties and identical purposes. The paper calls into question the dogma of compactness, which leads to an avalanche-like increase in the number of object tables, and puts forward the concept of controlled sparsity as an alternative. Following this concept allows database creators, at the discretion of the user, to combine a set of local compact types into one object type. We propose a structural framework that includes two interrelated blocks: the block of object types and the metablock. The latter contains meta-types, meta-relationships, and meta-constraints that collectively ensure the correctness of the object data placed in the database.

Journal of information technologies and computing systems. 2024;(4):46-59

On the Project of an Effective Software Platform for Working with Genetic Data of Respiratory Viruses

Mordvinov A.V., Stuchinsky A.V., Devyaterikov A.P., Khayrulin S.S., Palyanova N.V., Palyanov A.Y.

Abstract

Progress in sequencing technologies, i.e. in reading the nucleotide sequences of living organisms, has led to rapid growth in the amount of genetic data. The largest global projects that accumulate this information and provide online access to it are GenBank and GISAID. They also provide basic capabilities for analyzing these data online, but those capabilities are quite limited. This significantly restricts our ability to solve a number of scientific problems effectively, so we decided to develop our own domestic (Russian) web platform with the capabilities we need. The main goal of the project is to enable a team of researchers to solve problems in bioinformatics, virology and epidemiology effectively, using modern, well-chosen, high-performance software solutions whose functionality can be extended by adding new programs for analysis and modeling. The web platform we are implementing will allow users to download, store, search and analyze genomic sequences of viruses such as influenza, SARS-CoV-2 and, in the future, other viral pathogens. In addition, the project will be developed further by the IT part of our team, taking into account the actual needs of bioinformaticians and virologists. We plan to make the platform available to researchers around the world and to periodically update both the software and the data (from open sources) to improve convenience and efficiency for scientists working in the relevant areas.

Journal of information technologies and computing systems. 2024;(4):60-73

Intelligent systems and technologies

Segmentation of Pulmonary Nodules on Computed Tomography Scans

Teplyakova A.R.

Abstract

The article describes a solution for automating the segmentation of pulmonary nodules on computed tomography scans, which extends the functionality of a previously developed module for determining the size and volume of pulmonary nodules. The main focus of the article is a comparison of the accuracy of models with the ResU-Net, Attention U-Net and Dense U-Net architectures when trained on computed tomography images from the LIDC-IDRI dataset in their original form and with two proposed three-channel preprocessing approaches. For the three architectures considered, DSC and IoU values in the ranges 0.8570–0.8735 and 0.7545–0.7881, respectively, were achieved. The best metric values were demonstrated by models trained on three-channel images with averaging. In such images, the first channel is the scan in its original form, the second is an averaged scan, and the third is a scan to which anisotropic diffusion filtering has been applied. The results suggest that preprocessing methods are promising for improving segmentation accuracy. The article also describes the training of a lung lobe segmentation model using data from the TotalSegmentator dataset. The input data of the modified software module are computed tomography scans, and its output data are processed images and a structured report (DICOM SR). In addition to data on the size and volume of pulmonary nodules, this report contains information on the lobes in which the detected nodules are located.

Journal of information technologies and computing systems. 2024;(4):74-83
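
For readers unfamiliar with the two metrics quoted in the abstract above, the sketch below shows the standard definitions of the Dice similarity coefficient (DSC) and intersection over union (IoU) for a pair of binary masks. It is an illustration only and is not taken from the article: the function name, the flat-list mask representation, and the convention of returning 1.0 when both masks are empty are assumptions.

```python
def dice_and_iou(pred, truth):
    """Return (DSC, IoU) for two equal-length flat sequences of 0/1 labels."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    # Pixels labelled as nodule in both masks.
    inter = sum(1 for p, t in zip(pred, truth) if p and t)
    pred_area = sum(1 for p in pred if p)
    truth_area = sum(1 for t in truth if t)
    union = pred_area + truth_area - inter
    if union == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0, 1.0
    dice = 2 * inter / (pred_area + truth_area)
    iou = inter / union
    return dice, iou


# Example: one overlapping pixel, two pixels marked in each mask.
dsc, iou = dice_and_iou([1, 1, 0, 0], [1, 0, 1, 0])
```

For a single pair of masks the two metrics are related by DSC = 2·IoU / (1 + IoU); values averaged over many scans, as in reported results, need not satisfy this relation exactly.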

Analysis of the Possibilities of Reading Instrument Readings Using Machine Vision Algorithms

Shlyakhov M.V., Petrenko E.O.

Abstract

This paper examines methods and devices designed for reading and remote transmission of pointer instrument readings. The range of tasks solved using machine vision tools is considered, and their applicability to the task at hand is assessed. The use of a machine vision algorithm integrated into a mobile application for reading pointer instrument readings is proposed.

Journal of information technologies and computing systems. 2024;(4):84-90

Algorithm for Estimating the Convergence of Stochastic Pareto Optimization

Beketov S.M., Gintciak A.M., Dergachev M.V.

Abstract

The research is devoted to the development of an algorithm for estimating the convergence of stochastic Pareto optimization. The relevance of the work is due to the need to reduce the computational costs that arise with large multi-criteria calculations, where it is necessary to take into account many conflicting criteria to find optimal solutions. One of the problems in this context is finding a compromise between the accuracy of the Pareto front and the resources needed to calculate it. In multicriteria optimization, it is important to evaluate convergence in order to avoid an excessive number of iterations, which may be ineffective in terms of improving the result. The problem lies in finding the optimal number of iterations, at which the Pareto front reaches sufficient accuracy, and further iterations do not lead to a significant improvement in the quality of solutions. The aim of the study is to develop an algorithm that allows us to evaluate the convergence of the Pareto front and determine when it is possible to complete the optimization process without losing the quality of solutions. The results can be useful for specialists involved in multi-criteria optimization tasks and the development of algorithms based on stochastic conditions.

Journal of information technologies and computing systems. 2024;(4):91-99
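
The abstract above does not specify the convergence algorithm itself. As background only, the sketch below shows the object such an algorithm monitors: the Pareto front, i.e. the set of non-dominated points among the solutions found so far. It assumes that all objectives are minimized; the function names are hypothetical and the code is not the authors' method.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Return the non-dominated points (minimization), without duplicates."""
    pts = list(dict.fromkeys(tuple(p) for p in points))  # dedupe, keep order
    return [p for p in pts if not any(dominates(q, p) for q in pts)]


# Example with two objectives: (3, 4) is dominated by (2, 3).
front = pareto_front([(1, 5), (2, 3), (3, 4), (4, 1), (2, 3)])
```

A convergence estimate could then compare the fronts obtained after successive batches of iterations and stop once they no longer change appreciably; how that comparison is made is exactly what the article addresses and is not reproduced here.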

Mathematical modeling

Methods for Digital Twins Synthesis Based on Digital Identification Models of Production Processes

Bakhtadze N.N., Konkov A.E., Elpashev D.V., Kushnarev V.N., Mukhtarov K.S., Purtov A.V., Pyatetsky V.E., Chereshko A.A.

Abstract

The paper presents an approach to developing a new type of digital twin. It proposes using closed-loop identifiers to generate point identification models based on associative knowledge. Procedures for calculating control actions under possible abrupt changes in process operation modes are described.

Journal of information technologies and computing systems. 2024;(4):100-111

Mathematical foundations of information technology

Heuristic Approaches to Constructing a Minimum Volume Ellipsoid around a Subset of Points

Shcherbakov P.S., Kvinto Y.I.

Abstract

The paper deals with the following essentially combinatorial problem: given N points in R^n, construct the minimum-volume ellipsoid containing exactly N − k points, where k is much less than N. Six algorithms for the approximate solution of this problem are proposed, all based on heuristic considerations. The efficiency of the algorithms is compared under various assumptions on the point-generation mechanism and the number of points, and the results of numerical experiments are presented.

Journal of information technologies and computing systems. 2024;(4):112-122
