Development of a decision support system with the use of OLAP-technologies in the national terminological information environment
Authors: Rasim M. Alguliyev, Gulnara Ch. Nabibayova, Afruz M. Gurbanova
Journal: International Journal of Modern Education and Computer Science
Published in: Issue 6, Vol. 11, 2019.
Open access
To improve analytical work and support decision makers in the field of terminology, the article proposes a decision support system based on data warehouse technologies and online analytical processing (OLAP). The architectural and technological model of the system and its integration with the terminological information system are presented. The role of this system in the National Terminology Information System is shown. The article also presents the structure of the data warehouse and the OLAP cube developed on its basis.
Decision support system, data warehouse, OLAP, terminology information system, data marts, OLTP, hypercube, cube measurement, term
Short URL: https://sciup.org/15016858
IDR: 15016858 | DOI: 10.5815/ijmecs.2019.06.05
Published Online June 2019 in MECS DOI: 10.5815/ijmecs.2019.06.05
Terminology, a field of linguistics, currently receives considerable attention. Steady growth in the number of new terms is a characteristic feature of the development of science, engineering and technology. In this regard, scientists and specialists in the field take great interest in the life cycle of terms, which comprises the emergence of new terms, their use, their transformation into commonly used words (determinologization) or even their passage into the category of archaisms, as well as in the semantic relations of terms, the discovery and study of their equivalents in other languages, their synonyms, etc.
Scientific and technical progress, computerization, and the rapid development and widespread use of information technologies are currently driving the modernization of various fields of activity. Information technology has become an integral part of any sphere of professional activity, including terminology, in which terminology information systems play an important role.
The article [1] presents the concept and architectural-technological model of the National Terminology Information System (NTIS) with a detailed description of all its functional subsystems.
Note that one area of terminological activity is the term inventory, i.e., the collection and description of terms together with the date, place and language of their origin, the field they relate to, etc. This information is important for terminologists when analyzing the situation and predicting and identifying trends in order to make analytical decisions.
The article proposes a decision support system (DSS) based on online analytical processing (OLAP) and a data warehouse (DW), which is expected to provide significant support to decision makers (DM) and to improve analytical work in this area. The integration of the proposed DSS with the NTIS is shown as part of the functional subsystem Terminological Registry.
The article also presents a sample structure of the DSS DW, an OLAP cube built on its basis, and the use of OLAP-cube technology for data analysis.
The article [2] addresses the problem of conceptual data modeling for multidimensional analysis. The core concept of the model is the multidimensional aggregation cube (MAC), which gives a broad and flexible definition of a multidimensional cube. The authors' main contribution is the definition of the basic concepts of the proposed model; a set of requirements and an evaluation of relevant models against those requirements are an additional result.
The article [3] proposes a method of conceptual OLAP modeling of a subject field. It shows that a conceptual OLAP model increases the efficiency of online analytical processing of multidimensional data. Based on the proposed method, a conceptual analytical model of the scientific activity of an organization is built.
The article [4] explores and proposes a fairly universal, updated approach to OLAP cube modeling based on modern ontological and systemic representations in this field of scientific knowledge.
The authors of the article [5] introduce a conceptual multidimensional data model that facilitates even complex constructions based on multidimensional data elements, such as dimensions, measures (indicators) and cells. The data model is closely related to mathematics. It uses a set of new mathematical concepts, namely the H-set, to define its multidimensional components, such as dimensions, measures, and meta-cubes. Based on these concepts, the data model is able to represent and capture the natural hierarchical relationships between dimension members.
II. Integration of OLAP- and DW-Based DSS with the Terminology Information System
A DSS is a computer system that enables decision makers to substantiate their choices with the analytically confirmed recommendations provided to them [6,7]. The main purpose of a DSS is to extract information useful to decision makers from the available data. Collecting, consolidating, processing and analyzing the received information is aimed at improving the efficiency of the organization through modern information technologies and the creation of a single information space, its analysis and forecasting.
DSS should meet the following basic requirements:
• the DSS architecture should be open to new applications, to the expansion of the set of functions performed, and to the introduction of new data;
• information requests to the DSS should be answered in a very short time, making operational analysis and decision-making possible;
• the DSS should be usable by people without special knowledge in the field of ICT, since most DMs are not experts in this field.
A DSS can be developed based on various technologies, such as DW and OLAP.
DW is a set of decision support technologies designed to enable executives, managers and analysts to make better and faster decisions. B. Inmon defines DW as “a subject-oriented, integrated, non-volatile, time-variant set of data used mainly for decision-making in an organization” [8].
DW supports online analytical processing (OLAP), whose functional and performance requirements differ significantly from those of online transaction processing (OLTP), which is traditionally supported by operational databases. Unlike OLAP, OLTP applications typically automate routine data processing tasks. These tasks are structured, repetitive, and consist of short and isolated transactions. The historical, aggregated and consolidated data on which a DW is built are more important for decision support than the detailed individual records that form the basis of OLTP. Note that the size of an organization's DW may range from hundreds of gigabytes to terabytes.
OLAP (Online Analytical Processing) performs multidimensional data analysis and provides opportunities for complex calculations, trend analysis and complex data modeling. It is the basis for performance management, scheduling, budgeting, forecasting, financial and general reporting, analysis, etc. OLAP enables the end users to perform data analysis in several dimensions, thereby providing the information necessary to make effective decisions.
Data warehouse technologies have been successfully implemented in many industries, including logistics (for shipping and customer support), retail trade (for user identification and stock management), financial services (claims analysis, risk analysis, credit card analysis, and detection of various inconsistencies), transport (for fleet management), healthcare (for results analysis), etc. OLAP can be used wherever analytical decisions have to be made in various spheres of human activity, such as healthcare [17], public administration [18], etc. However, DW and OLAP technologies have not been applied in the field of terminology for the analysis and control of term data.
A study of the available terminology information systems has revealed several disadvantages. Despite offering important functions such as the establishment of semantic relations, the discovery of new terms in texts by special program-agents, the discussion of new terms in forums, etc., they do not fully monitor the terms. They also lack a comprehensive term inventory that could provide significant support for DMs when making analytical decisions.
Figure 1 shows the architectural and technological model of DSS for NTIS based on a three-level DW model [19].
The core of the architecture of any DW is standard. However, a DW developed for a particular organization or for solving a particular analytical task has its own characteristics.

Fig.1. Architectural and technological model of DSS for NTIS (diagram labels: decision-making process; sources of term data: term extraction from texts, forums, terms adopted by scientists and specialists)
The first level of the proposed DSS comprises the various sources of terms that are represented in the NTIS [1]. These include terms found by automatic text processing and extracted by special program-agents, terms obtained from ongoing discussions in forums, terms adopted by a group of scientists and specialists in a certain field, etc. To ensure high data quality, the data may need to be cleaned before they are loaded into the DW. Therefore, at the intermediate stage between the first and second levels, the data pass through the data cleaning area. In addition, the intermediate stage performs ETL (Extraction, Transformation, Loading).
ETL is a technology that converts heterogeneous data into consistent data suitable for use in a decision support process [20].
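The intermediate stage can be pictured with a minimal Python sketch. It is only an illustration under stated assumptions: the two source record layouts, the field names, and the cleaning rules (dropping empty terms, lower-casing, skipping duplicates) are hypothetical and are not specified in the article.

```python
from datetime import date

# Hypothetical raw records from two heterogeneous sources (layouts are assumed).
agent_records = [{"term": " Algorithm ", "lang": "ENG", "found": "2019-03-01"}]
forum_records = [("catalysis", "eng", (2019, 3, 2)), ("", "eng", (2019, 3, 3))]

def extract():
    """Yield (source, raw_record) pairs from every source."""
    for rec in agent_records:
        yield "text_agent", rec
    for rec in forum_records:
        yield "forum", rec

def transform(source, rec):
    """Map a raw record to one consistent schema; return None if cleaning rejects it."""
    if source == "text_agent":
        term, lang, day = rec["term"], rec["lang"], date.fromisoformat(rec["found"])
    else:
        term, lang, day = rec[0], rec[1], date(*rec[2])
    term = term.strip()
    if not term:  # data cleaning: drop records with an empty term
        return None
    return {"term": term.lower(), "language": lang.lower(), "date": day, "source": source}

def load(rows, warehouse):
    """Append cleaned rows to the warehouse, skipping duplicate (term, language) pairs."""
    seen = {(r["term"], r["language"]) for r in warehouse}
    for row in rows:
        key = (row["term"], row["language"])
        if key not in seen:
            warehouse.append(row)
            seen.add(key)

warehouse = []  # in-memory stand-in for the DW
cleaned = [t for t in (transform(s, r) for s, r in extract()) if t is not None]
load(cleaned, warehouse)
```

Here the empty forum record is rejected during cleaning, so only two consistent rows reach the warehouse list.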
The second level of the DSS is the DW. The DW stores both current and historical data. Historical term data are the determinologized terms and those that have passed into the category of archaisms. Historical data storage is the foremost advantage of using a DW in NTIS, since it expands the possibilities for generating analytical reports and, consequently, for performing analysis.
Data marts (DMs), which form the third level, are extracted from the DW. Each DM contains data oriented toward solving related tasks. In the current task, the DMs focus on compiling explanatory dictionaries (in certain fields of activity), translation dictionaries (aze-rus., aze-eng., etc.), dictionaries of synonyms, etc. The DW and, consequently, the DMs built on its basis should therefore contain all the necessary data.
The fourth level is OLAP, which is a key component of the DW in the DSS. OLAP produces reports for analysis from queries against the DW.
To identify the place of the DSS in the NTIS, we review one of its functional subsystems, the Terminology Registry. The Terminology Registry consists of three segments: the corpus of terminology dictionaries (CTD), the terminology database (TDB) and the backup copy system (BCS) [1]. Their functions are as follows:
• CTD supports and systematizes all terminology dictionaries and other materials (books).
• TDB stores electronic forms of terminology dictionaries and serves user requests.
• BCS stores backup copies of electronic resources.
NTIS, as noted above, is an open system, and accordingly the TDB is constantly updated from various sources. Changes in the composition of the TDB make it necessary to change the electronic dictionaries. DW and OLAP make this possible: dictionaries are formed automatically in real time from DMs extracted from the current contents of the DW. Figure 2 shows the structure of the terminology registry of NTIS, in which the TDB is integrated with the DSS as a DW, with OLAP included in it.

Fig.2. The structure of the terminology registry of NTIS
It should be noted that when DMs are extracted from the DW, each DM has its own OLAP cube, i.e., the OLAP model built in the DSS is poly-cubic (Fig. 3).

Fig.3. Poly-cubic OLAP model
A formal presentation of this model is provided below.
Assume that $D = \{d_1, d_2, \ldots, d_n\}$ is the set of dimensions of the hypercube; $M_{d_i} = \{m_1, m_2, \ldots, m_k\}$, $i = \overline{1,n}$, is the set of members of dimension $d_i$; $k$ is the number of members in the $i$-th dimension; $M = M_{d_1} \cup M_{d_2} \cup \ldots \cup M_{d_n}$ is the set of members of the hypercube; $H(D, M)$ is the data hypercube containing the set of cells corresponding to the sets $D$, $M$; and $V(H)$ is the set of dimensions of the hypercube $H(D, M)$.
To build a polycubic model based on the DMs extracted from the DW, $n$ cubes are extracted from the hypercube $H(D, M)$. $H_1(D_1, M_1), H_2(D_2, M_2), \ldots, H_n(D_n, M_n)$ are the subsets of the data hypercube corresponding to the sets of fixed values $(D_1, M_1), (D_2, M_2), \ldots, (D_n, M_n)$, respectively, and $V_1(H) \cup V_2(H) \cup \ldots \cup V_n(H)$ is the set of dimensions of the polycubic model. It should be noted that, in general, the same dimensions and, consequently, the same members are used in different sub-cubes, i.e., for $D_i$ and $M_i$, where $i = \overline{1,n}$, the following conditions hold:
$$D_k(H) \cap D_l(H) \neq \emptyset, \quad k = \overline{1,n}, \; l = \overline{1,n}, \tag{1}$$
$$M_k(H) \cap M_l(H) \neq \emptyset, \quad k = \overline{1,n}, \; l = \overline{1,n}. \tag{2}$$
In accordance with (1) and (2), for $V_i$, where $i = \overline{1,n}$, the following condition holds:
$$V_k(H) \cap V_l(H) \neq \emptyset, \quad k = \overline{1,n}, \; l = \overline{1,n}. \tag{3}$$
That is, the same dimensions may correspond to different cubes.
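The non-empty intersection condition on dimensions can be checked with ordinary set operations. The sketch below is illustrative only: the three sub-cubes are named after the dictionary-oriented data marts mentioned earlier, and the assignment of dimensions to each sub-cube is an assumption, not taken from the article.

```python
from itertools import combinations

# Dimensions of three sub-cubes, one per data mart (the assignment is assumed).
cube_dimensions = {
    "explanatory": {"Date", "Field_of_act", "Language"},
    "translation": {"Date", "Language", "Source"},
    "synonyms": {"Field_of_act", "Language"},
}

def shared_dimensions(cubes):
    """Return the dimensions common to each pair of sub-cubes."""
    return {
        (a, b): cubes[a] & cubes[b]
        for a, b in combinations(sorted(cubes), 2)
    }

shared = shared_dimensions(cube_dimensions)

# The condition holds when every pairwise intersection is non-empty
# (an empty set is falsy in Python).
condition_holds = all(shared.values())

# The dimension set of the poly-cubic model is the union over all sub-cubes.
model_dimensions = set().union(*cube_dimensions.values())
```

In this toy configuration every pair of sub-cubes shares at least the Language dimension, so the check succeeds.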

Fig.4. The structure of DW of DSS
As is known, the elements of the data warehouse structure are the fact table and the dimension tables [11]. Figure 4 shows an example of the structure of the DW of the DSS for NTIS. In this DW, the fact is the appearance of a new term (Term-app_Fact), and the dimensions are the date (Date_Dim), the source of the term (Sourse_Dim), the field of activity (Field_of_act_Dim), the language of the term (Language_Dim), etc.
In this DW, the dimension “Date” has a hierarchical structure with the levels “year - month - day”, and the dimension “Language” has a hierarchical structure with the levels “language family - language group”.
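A star schema of this kind can be sketched with Python's built-in `sqlite3` module. This is an assumption-laden illustration, not the authors' implementation: the column names, the sample rows, and the placeholder term `term_az_1` are invented; the fact table is written `Term_app_Fact` because a hyphen is not valid in an unquoted SQL identifier; and the source dimension is omitted for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Date_Dim (date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER, day INTEGER);
CREATE TABLE Language_Dim (language_id INTEGER PRIMARY KEY, family TEXT, lang_group TEXT, name TEXT);
CREATE TABLE Field_of_act_Dim (field_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Term_app_Fact (
    term TEXT,
    date_id INTEGER REFERENCES Date_Dim(date_id),
    language_id INTEGER REFERENCES Language_Dim(language_id),
    field_id INTEGER REFERENCES Field_of_act_Dim(field_id)
);
""")

# Illustrative rows only; none of these values come from the article's data.
conn.executemany("INSERT INTO Date_Dim VALUES (?, ?, ?, ?)",
                 [(1, 2019, 3, 1), (2, 2019, 4, 15), (3, 2018, 1, 10)])
conn.executemany("INSERT INTO Language_Dim VALUES (?, ?, ?, ?)",
                 [(1, "Indo-European", "Germanic", "English"),
                  (2, "Turkic", "Oghuz", "Azerbaijani")])
conn.executemany("INSERT INTO Field_of_act_Dim VALUES (?, ?)",
                 [(1, "chemistry"), (2, "mathematics")])
conn.executemany("INSERT INTO Term_app_Fact VALUES (?, ?, ?, ?)",
                 [("catalysis", 1, 1, 1), ("isomerism", 2, 1, 1),
                  ("abscissa", 3, 1, 2), ("term_az_1", 1, 2, 2)])

# Number of new terms per field of activity and year (fact joined to two dimensions).
counts = conn.execute("""
    SELECT f.name, d.year, COUNT(*)
    FROM Term_app_Fact t
    JOIN Date_Dim d ON t.date_id = d.date_id
    JOIN Field_of_act_Dim f ON t.field_id = f.field_id
    GROUP BY f.name, d.year
    ORDER BY f.name, d.year
""").fetchall()
```

The query joins the fact table to the dimension tables and aggregates at the "year" level of the Date hierarchy; grouping by `d.month` or `d.day` as well would give the finer levels.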
III. Analysis of OLAP-Cube Data in DSS for NTIS
The OLAP cube integrated into the NTIS makes it possible to adapt OLAP technologies for the analysis and visualization of multidimensional data and for the processing of terminological information. This integration results in the identification of trends and problems, which constitute the basis for acting on the management system.
Viewing OLAP cube data from various angles using OLAP operations allows analysts to perform a more comprehensive analysis. OLAP operations include slice, dice, roll-up, drill-down, etc. The slice operation generates slices. A slice is a subset of the multidimensional array that corresponds to a single value of one or several dimensions, which are then not included in the subset. The dice operation generates a dice, a sub-cube that has more than two dimensions of the cube.
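The two operations can be illustrated with a small pure-Python sketch. The cube is modeled as a dictionary keyed by (date, field of activity, language); the cell values and the helper names `slice_cube` and `dice_cube` are hypothetical and serve only to show the idea.

```python
# Each cell key is (date, field_of_act, language); the value is a term count.
# The numbers are made up for illustration.
cube = {
    (2018, "chemistry", "eng"): 3,
    (2018, "physics", "eng"): 1,
    (2019, "chemistry", "aze"): 2,
    (2019, "chemistry", "eng"): 4,
    (2019, "physics", "aze"): 5,
}
DIMS = ("date", "field_of_act", "language")

def slice_cube(cube, dim, value):
    """Fix one dimension to a single value and drop it from the cell keys."""
    i = DIMS.index(dim)
    return {k[:i] + k[i + 1:]: v for k, v in cube.items() if k[i] == value}

def dice_cube(cube, **allowed):
    """Keep cells whose coordinates lie in the allowed value sets; keys stay 3-D."""
    return {
        k: v for k, v in cube.items()
        if all(k[DIMS.index(d)] in vals for d, vals in allowed.items())
    }

# Slice: a two-dimensional table (field of activity x language) for one date.
by_2019 = slice_cube(cube, "date", 2019)

# Dice: a smaller cube restricted on all three dimensions at once.
sub_cube = dice_cube(cube, date={2018, 2019}, field_of_act={"chemistry"}, language={"eng"})
```

The slice loses the fixed dimension and becomes a two-dimensional table, while the dice keeps all three coordinates and only narrows their ranges.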
Roll-up is an analytical operation by which the user moves from detailed data to aggregated data; drill-down, on the contrary, moves from aggregated data to detailed data. Roll-up aggregation functions include SUM, AVERAGE, COUNT, etc. [21].
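As a rough sketch of these two operations along a "year - month - day" hierarchy, the following Python code aggregates daily counts with SUM and then returns to the detailed cells under one aggregated key. The data and the function names `roll_up` and `drill_down` are assumptions made for illustration.

```python
from collections import defaultdict

# Detailed counts keyed by (year, month, day); values are made up for illustration.
daily = {
    (2019, 1, 5): 2,
    (2019, 1, 20): 1,
    (2019, 2, 3): 4,
    (2018, 12, 31): 3,
}

def roll_up(cells, level):
    """Aggregate with SUM to a coarser level: 1 = year, 2 = year-month."""
    totals = defaultdict(int)
    for key, count in cells.items():
        totals[key[:level]] += count
    return dict(totals)

def drill_down(cells, prefix):
    """Return the detailed cells that sit under an aggregated key."""
    return {k: v for k, v in cells.items() if k[:len(prefix)] == prefix}

by_month = roll_up(daily, 2)             # year-month totals
by_year = roll_up(daily, 1)              # year totals
january = drill_down(daily, (2019, 1))   # daily cells behind one monthly total
```

Replacing the `+=` accumulation with a count or an average would give the COUNT and AVERAGE variants of roll-up.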
An OLAP cube may have any number of dimensions. Figure 5 shows the OLAP cube built on the basis of the proposed DW. In this case, three dimensions that are attributes of the DW are selected for the analysis: Date, Field of activity and Language.

In the presented example of an OLAP cube, each cell contains the number of terms entered on the relevant date, related to the respective field of activity, and originating in the respective language. Since two of these dimensions, namely Language and Date, are hierarchical, they hold the results of the roll-up function SUM(). The OLAP cube also enables drilling down into these values. With a software application, this numerical information can be expanded so that the terms themselves are also shown in the cells.
Figure 6 presents slices of the cube shown in Figure 5. The slices are generated in three planes.


Fig.6. Examples of cube slices


Figure 6 shows: a) a slice of this cube in the planes “Field_of_act” and “Language” for the selected time period “Date”; b) a slice of this cube in the planes “Date” and “Field_of_act” for the selected language of the term, “Language”; c) a slice of this cube in the planes “Date” and “Language” for the selected field of the term, “Field_of_act”. As a result of these operations, we obtain data in the form of two-dimensional tables containing the corresponding indicators.
Figure 7 shows a dice of the cube, for which the values of three dimensions are selected simultaneously.
Data processing and analysis in OLAP cube represents the process of finding valuable information and detecting hidden patterns using the methods such as statistics, data collection and analysis, and forecasting.

Fig.7. Sample cube cut (axis label: Language)
Note that the detailed and aggregated data in the DSS can be stored either in relational or in multidimensional databases. Three main storage models are used at the physical level: the multidimensional model (MOLAP), the relational model (ROLAP), and the hybrid model (HOLAP) [22].
For multifunctional systems, the HOLAP architecture is the most efficient and convenient. HOLAP is a hybrid architecture that combines ROLAP and MOLAP technologies. It assumes the presence of two databases simultaneously: a relational one and a multidimensional one. The relational database stores the detailed source data and executes SQL queries against them. The multidimensional database stores aggregated data in addition to detailed data. Since NTIS is a multifunctional system, it is reasonable to use HOLAP for this task.
Reporting is one of the most important components of OLAP. HOLAP provides almost the entire set of functions needed for reporting and for data analysis based on the reports. In this case, reports can be built both on data taken from a two-dimensional relational database and on OLAP cubes. Having both options gives a more comprehensive picture for data analysis.
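The division of labor in a hybrid store can be hinted at with a toy Python sketch. It is not a real HOLAP engine: an in-memory SQLite table stands in for the relational database with detailed rows, a dictionary of pre-computed counts stands in for the multidimensional store, and the table name, columns, sample rows and the `count_terms` helper are all assumptions.

```python
import sqlite3

# Relational side: detailed source rows (sample values are illustrative only).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE term_fact (term TEXT, year INTEGER, field TEXT)")
db.executemany(
    "INSERT INTO term_fact VALUES (?, ?, ?)",
    [("term_1", 1835, "chemistry"), ("term_2", 1830, "chemistry"),
     ("term_3", 1930, "computer science")],
)

# Multidimensional side: pre-aggregated term counts per field of activity.
aggregates = {
    field: n
    for field, n in db.execute("SELECT field, COUNT(*) FROM term_fact GROUP BY field")
}

def count_terms(field, year=None):
    """Answer from the aggregate store when possible, else query the detailed rows."""
    if year is None:
        return aggregates.get(field, 0)
    row = db.execute(
        "SELECT COUNT(*) FROM term_fact WHERE field = ? AND year = ?", (field, year)
    ).fetchone()
    return row[0]

total_chemistry = count_terms("chemistry")        # served from aggregated data
chemistry_1835 = count_terms("chemistry", 1835)   # served by an SQL query on detail
```

Aggregated questions are answered without touching the detailed table, while questions below the stored aggregation level fall back to SQL; a production system would also need to keep the two stores synchronized.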
IV. Case Study
To implement the approach proposed in the article for including OLAP technology in the NTIS, a model was developed in which the DW is built on data extracted from terminological dictionaries in various fields.
The system is implemented for personal computers running Windows 7, Windows 8, etc. The OLAP implementation environment includes the Excel PivotTable, which serves as the OLAP visualization tool. The number of terms in various dimensions is determined by the aggregate function COUNT().
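The kind of count that such a pivot table produces can be approximated in a few lines of Python. This is only a stand-in for the Excel PivotTable used in the case study: the rows, the field and language assigned to each term, and the `pivot_count` helper are assumptions made for illustration.

```python
from collections import Counter

# Illustrative rows (term, field_of_act, language); the assignments are assumed.
rows = [
    ("Algorithm", "computer science", "eng"),
    ("Authentication", "computer science", "eng"),
    ("Catalysis", "chemistry", "eng"),
    ("Ellipse", "mathematics", "eng"),
    ("Ellipse", "mathematics", "rus"),
]

def pivot_count(rows, language=None):
    """COUNT() of terms per field; language=None plays the role of the (All) filter."""
    return Counter(field for _, field, lang in rows if language in (None, lang))

all_languages = pivot_count(rows)
english_only = pivot_count(rows, language="eng")
```

Changing the `language` argument corresponds to changing the Language filter of the pivot table, as in the figures that follow.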
Fig. 8 shows the distribution of the number of terms for all languages in all fields of activity.
Fig.8. The distribution of the number of terms for all languages in all fields of activity. (Pivot table: filter Language = (All); columns are the fields of activity: biochemistry, chemistry, computer science, ethics, history, economy, mathematics, pedagogics, physics, psychology; rows are individual terms, each counted once: Abscissa, Algorithm, Anisotropic filtering, AntiCam, Aqua-Lung, Authentication, Aviation, Barometer, BASIC, Bunsen-Rosco Law, Capitalism, Catalysis, Catharsis, Computer, Deontology, Differential evolution, Diffusion, Digital electronic computer, Ellipse, Exponent, Golden ratio, Imperium, integral, Isomerism, ...)
Fig. 9 shows the distribution of the number of terms for the English language in all fields of activity.

Fig.9. The distribution of the number of terms for the English language in all fields of activity.
Fig. 10 shows the distribution of the number of terms for all languages in all fields of activity and by date.
Fig.10. The distribution of the number of terms for all languages in all fields of activity and by date. (Pivot table: columns are the fields of activity, with chemistry expanded by year: 1830, 1833, 1835, 1858, 1860, 1862, 1928, 1930, 1951; rows are individual terms, each counted once.)
Fig. 11 shows the distribution of the number of terms for all fields of activity in different languages.

Fig.11. The distribution of the number of terms for all fields of activity in different languages.
Fig. 12 shows the distribution of the number of terms for all languages in computer science.

Fig.12. The distribution of the number of terms for all languages in computer science.
V. Conclusion
The establishment of a national terminology information system is an important step towards the development of the field of terminology in Azerbaijan. The implementation of NTIS is expected to support terminological activities in Azerbaijan on the basis of international and national standards and to enable a growing number of people to participate in online forums. In addition, the consolidation of terminology dictionaries in a single information system, which NTIS provides for, will increase the effectiveness of terminological research.
The distinctive contribution of NTIS is the creation of the Azerbaijani term base. The DSS proposed in this article, developed on the basis of DW and OLAP technologies, will make NTIS more advanced and flexible. It will provide significant support to decision makers, improve analytical work in the field of terminology, and play an important role in research.
References
[1] Rasim M. Alguliyev, Afruz M. Gurbanova. The Conceptual Foundations of National Terminological Information System // I. J. of Education and Management Engineering, Vol. 8, No. 4, Jul. 2018, pp. 19-30.
[2] Aris Tsois, Nikos Karayannidis, Timos Sellis. MAC: Conceptual Data Modeling for OLAP. Proceedings of the International Workshop on Design and Management of Data Warehouses (DMDW'2001), Interlaken, Switzerland, June 4, 2001, V.5, pp. 1-11. http://ceur-ws.org/Vol-39/paper5.pdf
[3] Korobko A.V., Penkova T.G. Conceptual OLAP-modeling method based on formal conceptual analysis. Bulletin of the Siberian State Aerospace University named after Academician M.F. Reshetnev, Russia, 2010, No 4 (30), pp. 74-79.
[4] Kulagin V.P., Matchin V.T. Mathematical modeling of OLAP-cube in the context of aggregation of simple and hierarchical dimensions. News of the Tomsk Polytechnic University, 2010, Vol. 316, No 5, pp. 72-75.
[5] Thanh Binh Nguyen, A. Min Tjoa, Roland Wagner. Conceptual Multidimensional Data Model Based on MetaCube. International Conference on Advances in Information Systems ADVIS 2000, LNCS 1909, 2000, pp. 24-33.
[6] Biryukov A.S. Decision Making and Data Warehouse // DBMS, 1997, No 4, pp. 37-41.
[7] Berner E.S. Clinical Decision Support Systems: Theory and Practice. Springer, 2007, 274 p.
[8] Inmon W. Building the Data Warehouse (4th Edition), 2005, 543 p.
[9] Kudryavtsev Yu. OLAP-technologies: a review of tasks and studies // Business Informatics, 2008, No 1, pp. 66-70.
[10] Pedersen T., Jensen K. Technology of multidimensional databases // Open Systems, 2002, No 1, http://www.osp.ru/os/2002/01/180958/
[11] Fedorov A., Elmanova N. Introduction to Microsoft OLAP-technology. Moscow: Dialog-MEPI, 2002, 268 p.
[12] Shavelyev L.V. Operational analytical data processing: concepts and technologies, http://www.olap.ru/basic/olap_and_ida.asp
[13] Zabotnev M.S. Methods of presenting information in sparse data hypercubes, http://www.olap.ru/basic/theory.asp
[14] Kashirin I.Yu., Semchenkov S.Yu. Interactive analytical data processing in modern OLAP-systems // Business Informatics, 2009, No 2, pp. 12-19.
[15] Kupriyanov M.S., Stepanenko V.V., Kholod I.I. Data analysis technologies: Data Mining, Visual Mining, Text Mining, OLAP. SPb.: BHV-Petersburg, 2007, 384 p.
[16] Nekrasov V. Mobile OLAP // Open Systems, 2003, No 5.
[17] Nozhenkova L.F. OLAP-modeling tools and their use in health care problems // Mathematical methods of pattern recognition, M, 2007, vol. 13, No 1, pp. 609-612.
[18] Alguliyev R., Aliguliyev R., Nabibayova G. The Method of Measuring the Integration Degree of Countries on the Basis of International Relations // I.J. Intelligent Systems and Applications, 2015, no. 11, pp. 10-18.
[19] Spirley E. Corporate Data Warehouse. Planning, development and implementation. 2001, vol. 1, 400 p.
[20] Imhoff C. Understanding the Three E's of Integration EAI, EII and ETL. Management Magazine, April 2005, http://www.information-management.com/issues/20050401/1023893-1.htm
[21] E.M. Forster, G. Wallas, A. Gide. Data Visualization, https://apandre.wordpress.com/data/datacube/
[22] Mironov A.A., Mordvinov V.A., Skuratov A.K. Semantic-entropic management of OLAP and xOLAP integration models in SemanticNET (ONTONET) // Informatization of education and science, 2009, No 2, pp. 21-30.