Parallel processing of very large databases using distributed column indexes



Abstract

This paper discusses the development and investigation of efficient methods for parallel processing of very large databases using a columnar data representation designed for computer clusters. An approach that combines the advantages of relational and column-oriented DBMSs is proposed. A new type of distributed column index, fragmented according to the domain-interval principle, is introduced. Column indexes are auxiliary structures that are kept permanently in the distributed main memory of a computer cluster. Surrogate keys are used to match the elements of a column index to the tuples of the original relation. Resource-intensive relational operations are performed on the corresponding column indexes rather than on the original relations of the database. The result is a precomputation table, which the DBMS then uses to reconstruct the resulting relation. For the basic relational operations on column indexes, methods of parallel decomposition are proposed that do not require massive data exchanges between processor nodes. This approach improves the performance of OLAP-class queries by a factor of hundreds.
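The pipeline described in the abstract can be illustrated with a small single-process sketch. It is not the authors' implementation. The relation layout, the function names (`build_column_index`, `fragment_by_domain_intervals`, `join_fragments`), the interval boundaries, and the choice of an equi-join as the example operation are all assumptions made for illustration, and cluster nodes are simulated by list positions.

```python
from bisect import bisect_right
from collections import defaultdict

def build_column_index(relation, attr):
    """Column index: (surrogate_key, value) pairs for one attribute."""
    return [(sk, row[attr]) for sk, row in relation.items()]

def fragment_by_domain_intervals(column_index, boundaries):
    """Split a column index by value interval (hypothetical helper).

    boundaries [b1, ..., bk] define k+1 intervals:
    (-inf, b1), [b1, b2), ..., [bk, +inf). Fragment i would live on node i.
    """
    fragments = [[] for _ in range(len(boundaries) + 1)]
    for sk, value in column_index:
        fragments[bisect_right(boundaries, value)].append((sk, value))
    return fragments

def join_fragments(left_frag, right_frag):
    """Local equi-join of two co-located fragments; yields surrogate-key pairs."""
    by_value = defaultdict(list)
    for sk, value in left_frag:
        by_value[value].append(sk)
    return [(lsk, rsk) for rsk, value in right_frag for lsk in by_value.get(value, [])]

# Toy relations keyed by surrogate key (assumed data, not from the paper).
customers = {0: {"id": 5, "name": "Ann"},
             1: {"id": 12, "name": "Bob"},
             2: {"id": 27, "name": "Eve"}}
orders = {0: {"cust_id": 12, "total": 40},
          1: {"cust_id": 5, "total": 15},
          2: {"cust_id": 27, "total": 99},
          3: {"cust_id": 12, "total": 7}}

# Both indexes use the same boundaries, so equal values share a fragment number.
boundaries = [10, 20]
order_frags = fragment_by_domain_intervals(build_column_index(orders, "cust_id"), boundaries)
cust_frags = fragment_by_domain_intervals(build_column_index(customers, "id"), boundaries)

# Each "node" joins only its own fragment pair; no values cross fragments.
precomputation = []
for node in range(len(boundaries) + 1):
    precomputation.extend(join_fragments(order_frags[node], cust_frags[node]))

# The surrogate-key pairs are then used to rebuild the resulting relation.
result = [(orders[o]["total"], customers[c]["name"]) for o, c in precomputation]
```

Because both column indexes are fragmented with the same interval boundaries, matching values always land in fragments with the same number. Each node can therefore produce its share of the precomputation table from local data alone, which is the property the abstract attributes to the proposed decomposition.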

About the authors

E. V. Ivanova

South Ural State University

Author for correspondence.
Email: Elena.Ivanova@susu.ru
Russian Federation, Chelyabinsk, 454080

L. B. Sokolinsky

South Ural State University

Email: Elena.Ivanova@susu.ru
Russian Federation, Chelyabinsk, 454080


Copyright (c) 2017 Pleiades Publishing, Ltd.
