Towards semantic Geo/BI: a novel approach for semantically enriching Geo/BI data with OWL ontological layers (OOLAP and ODW) to enable semantic exploration, analysis and discovery of geospatial business intelligence knowledge

Authors: Belko Abdoul Aziz Diallo, Thierry Badard, Frédéric Hubert, Sylvie Daniel

Journal: International Journal of Information Engineering and Electronic Business (IJIEEB)

Issue: No. 6, vol. 10, 2018.


To contribute to filling the semantic gap in data warehouses and OLAP data cubes, and to enable semantic exploration and reasoning on them, this paper highlights the need for semantically augmenting Geo/BI data with appropriate semantic relations. It provides OWL-based ontologies (ODW and OOLAP) that are capable of replicating data warehouses (respectively OLAP data cubes) in the form of semantic data that respect Geo/BI data structures, and that make it possible to augment these semantic BI data with semantic relations. Moreover, the paper demonstrates how the ODW and OOLAP ontologies can be combined with current Geo/BI data structures to deliver either pure semantic Geo/BI data or mixed, semantically interrelated Geo/BI data to business professionals.


Keywords: Business Intelligence, Geospatial Business Intelligence, semantic Geo/BI, OLAP, Data warehouse, metadata, semantic gap, semantic relations, data semantics, OWL Ontology, semantic layer, data analysis, knowledge discovery, Decision support

Short address: https://sciup.org/15016150

IDR: 15016150   |   DOI: 10.5815/ijieeb.2018.06.01


Published Online November 2018 in MECS DOI: 10.5815/ijieeb.2018.06.01

  • I.    Introduction

    Business Intelligence (BI) technologies are decision support systems (DSS) that are widely and increasingly adopted by companies [1]. According to a forecast from Gartner, Inc. [2], global revenue in the BI and analytics software market was expected to reach $18.3 billion in 2017 and $22.8 billion by the end of 2020. Thanks to their multidimensional and multilevel data structures, data warehouses and OLAP data cubes provide (i) an effective way of quickly crossing data, (ii) a straightforward means of data aggregation, and (iii) fast calculations on data, thereby enabling intuitive analysis and exploration of data.

However, despite all these capabilities, BI and derived Geospatial BI (Geo/BI) data structures do not address every concern of business analysis. A good illustration of this is their lack of semantic information.

The semantic gap within BI data (data warehouses, OLAP cubes), as stated by [3], is well known among BI practitioners and researchers, and several solutions have been proposed to overcome or minimize that gap. These solutions have generally been based on the principle of adding semantics to metadata so that BI concepts and properties are sufficiently described.

But semantics is also about the data itself and the relations that may exist between data occurrences. For instance, knowing that “company A”, which purchased $500,000 of products from our company last year, is competing with “company B”, which supplies us with these products, and is in partnership with “company C”, which ensures our deliveries to customers, provides more valuable insights to a decision maker. Such knowledge is absent from Geo/BI data and is generally unearthed by the analyst/decision maker only after additional information searches in other data sources: e.g. browsing the web, calling customer service, etc. Moreover, this knowledge, once established, is very poorly exploited since it is not saved in a formal and structured manner. It usually remains buried in the memory of the decision maker who deduced it, or is mentioned in unstructured documents (oral exchanges, notes, reports, etc.).

Taking such semantic relations between Geo/BI data into account can not only enrich the data but also provide decision makers with semantic-oriented analysis, exploration, and discovery of data and knowledge. As an illustration, a salesperson might want to know the share of sales realized with client companies that compete with each other, and the share realized with partnering client companies, restricted to his current location (e.g. Beauport district in Quebec City).
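To make the salesperson's question concrete, here is a minimal Python sketch, assuming the relations between client companies are kept as plain (subject, predicate, object) triples next to the sales facts. All company names, districts, amounts and the predicate names `competes_with` and `partner_of` are hypothetical, and both relations are treated as symmetric.

```python
# Hypothetical sales facts: (client company, district, sales amount in $).
SALES = [
    ("Company A", "Beauport", 500_000.0),
    ("Company D", "Beauport", 200_000.0),
    ("Company E", "Beauport", 300_000.0),
    ("Company F", "Sainte-Foy", 150_000.0),
]

# Hypothetical semantic relations between client companies (not stored in the cube).
RELATIONS = {
    ("Company A", "competes_with", "Company D"),
    ("Company E", "partner_of", "Company F"),
}


def clients_in(relation: str) -> set:
    """Return every client appearing on either side of `relation` (treated as symmetric)."""
    members = set()
    for subject, predicate, obj in RELATIONS:
        if predicate == relation:
            members.update((subject, obj))
    return members


def share_of_sales(relation: str, district: str) -> float:
    """Fraction of a district's sales realized with clients involved in `relation`."""
    related = clients_in(relation)
    in_district = [(client, amount) for client, d, amount in SALES if d == district]
    total = sum(amount for _, amount in in_district)
    part = sum(amount for client, amount in in_district if client in related)
    return part / total if total else 0.0


print(share_of_sales("competes_with", "Beauport"))  # 0.7
print(share_of_sales("partner_of", "Beauport"))     # 0.3
```

Company F is a partner of Company E but is located in Sainte-Foy, so only Company E's sales count toward the Beauport share of partnering clients.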

Regrettably, Geo/BI data structures do not currently provide such semantic support. To the best of our knowledge, there is not yet any work on the semantic enrichment of Geo/BI data with semantic relations between data occurrences.

The present paper addresses this problem as follows. After reviewing major proposals on the semantic enrichment of BI (Section 2), the paper uses a realistic case study to justify why there is still a need to semantically enrich BI data, this time with semantic relations (Section 3). The paper then presents its approach towards full semantic Geo/BI solutions: to overcome the lack of semantics in data warehouses (respectively OLAP data cubes) and enable semantic exploration and reasoning on them, the authors have designed OWL-based ontologies (O.DataW and OOLAP) that are capable of replicating data warehouses (respectively OLAP data cubes) in the form of semantic data that respect the data structure, and that make it possible to augment these semantic BI data with semantic relations (Section 4). Finally, Section 5 demonstrates how the O.DataW and OOLAP ontologies can be combined with current Geo/BI data structures to deliver either pure semantic Geo/BI data or mixed, semantically interrelated Geo/BI data.

  • II.    Related Work on Semantic Enrichment of BI Data

BI data is usually loaded into a logical multidimensional data model (e.g. Fig. 5) and physically stored in a large database called a data warehouse. The logical data model is supposed to hold and organize data in the same way as expressed in the conceptual data model it derives from.

Unfortunately, as reported by [3] and recalled by [4], there is still a semantic gap between advanced conceptual data models and relational or multidimensional implementations of data cubes [5]. Additionally, how to represent dimension constraints [6], or even the less expressive context dependencies [7], appears to be an open problem; both explain the existence of null values in dimensions in logical implementations and allow reasoning about summarizability with respect to sets of attributes.

To overcome or minimize the semantic gap within BI data, several authors have proposed different solutions, ranging from creating semantic bridges and enriching business/semantic metadata to annotating BI data cubes with ontological models of OLAP cubes. The major works along these lines are reviewed below.

Semantic bridges: to fill the gap between conceptual and logical models, [8] proposes building a semantic bridge between the two models by using a model-driven architecture (MDA) to translate semantics from the conceptual level into the OLAP logical system. An OLAP algebra is built by using OCL to express needs and semantics at the conceptual level. This algebra is then transformed into a logical schema (e.g. SQL) by using the QVT language. [9] also used the MDA method [10] and the OCL [11] and QVT [12] languages to build a semantic derivation from conceptual geospatial data warehouse specifications into suitable logical models.

Fig.1. Enriched business metadata connected to fact data from [13]

Enriched semantic/business metadata: to help OLAP users establish a link between OLAP metric values and the business goals they have to reach, [13] proposed to enrich business metadata with a UML-based meta-model that details enterprise goals (e.g. goal name, goal perspective, metric name, metric target value and unit, etc.). That goal model is then linked to the data warehouse containing the BI data by model weaving, a technique that establishes links describing the relationships between the goal model and the data warehouse model. This linkage is then used to display business metadata (e.g. business goals) related to OLAP fact data (e.g. metric values), as illustrated in Fig. 1 provided by the authors. The same technique is used by [14] to “integrate Goals with Process Warehouse for Business Process Analysis”.
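The idea of weaving goal metadata to fact data can be illustrated with a toy Python sketch. It is not the UML meta-model or the model-weaving tooling of [13]: the `Goal` fields mirror the examples listed above, while the `WEAVING` dictionary, the sample goal and the target check are hypothetical stand-ins for the weaving links.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Goal:
    """Goal metadata; the fields follow the examples listed in the text."""
    name: str
    perspective: str
    metric_name: str
    target_value: float
    unit: str


# Hypothetical weaving links: metric name in the goal model -> measure in the warehouse model.
WEAVING = {"Monthly revenue": "SalesAmount"}


def business_metadata(measure: str, value: float, goals: list) -> list:
    """Return the goals woven to `measure`, each with whether `value` meets its target."""
    notes = []
    for goal in goals:
        if WEAVING.get(goal.metric_name) == measure:
            status = "reached" if value >= goal.target_value else "not reached"
            notes.append(
                f"{goal.name} ({goal.perspective}): target {goal.target_value:g} {goal.unit}, {status}"
            )
    return notes


goals = [Goal("Grow sales", "Financial", "Monthly revenue", 100000.0, "$")]
print(business_metadata("SalesAmount", 120000.0, goals))
# ['Grow sales (Financial): target 100000 $, reached']
```

Keeping the links in a separate mapping means the goal model and the warehouse model can evolve independently; only the weaving entries need updating.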

Ontology-based semantic annotation: semantic annotation is another method proposed to fill the semantic gap within BI data. [15], for example, proposed to enrich OLAP data cubes by annotating them with ontological descriptions. These annotations are then exploited to display the semantics attached to a dimension or a measure, for instance how it is aggregated or calculated. Fig. 2 shows an example of semantic annotation giving the calculation formula of the measure ROI (Return On Investment).

Fig.2. Example of semantic annotation from [15]

[16] also adopted the semantic annotation approach to “facilitate the exchange of business calculation definitions” between users and organizations and to “allow their automatic linking to specific data warehouses through semantic reasoning”.

Ontology-based ETL and OLAP: [17] and [18] propose using ontologies to extract data from their sources and integrate them into data warehouses and OLAP cubes. For this purpose, the authors define an OLAP ontology (Fig. 3) that describes the formal OLAP cube structure (e.g. dimensions, measures, etc.). Data sources are then located and converted into RDF, a “format that makes the semantics of the data explicit”. For each data source, a mapping ontology is used to convert the data so that they match the OLAP cube ontology. Thereafter, the OLAP ontology and the RDF data are used to construct the OLAP cube. Fig. 3 shows a graphical version of the ontological OLAP cube model proposed by the authors in [19]. [20] also provided an OLAP ontology (Fig. 4) to help integrate distributed energy sensor data and compose new data cubes from existing ones by alleviating schema inconsistencies such as “attribute differences, missing data, or semantic and functional gaps”.
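The conversion step can be pictured with a stdlib-only Python sketch that maps relational source rows to RDF-style triples according to a per-source mapping. It uses plain tuples rather than an RDF library, and the `EX` namespace, property names and mapping are hypothetical; the actual ontologies of [17]-[20] define their own vocabularies. Only the `rdf:type` IRI is standard.

```python
# rdf:type is the standard RDF property; everything under EX is a hypothetical vocabulary.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/olap#"

# Hypothetical mapping: source column -> (cube-ontology property, keep value as a literal?).
MAPPING = {
    "product": (EX + "hasProduct", False),
    "month": (EX + "hasTime", False),
    "amount": (EX + "salesAmount", True),
}


def rows_to_triples(rows: list) -> list:
    """Turn source rows into (subject, predicate, object) triples shaped by MAPPING."""
    triples = []
    for i, row in enumerate(rows):
        fact = f"{EX}fact{i}"
        triples.append((fact, RDF_TYPE, EX + "Fact"))
        for column, value in row.items():
            prop, is_literal = MAPPING[column]
            # Dimension values become resource IRIs; measures stay literal values.
            obj = value if is_literal else EX + str(value).replace(" ", "_")
            triples.append((fact, prop, obj))
    return triples


for triple in rows_to_triples([{"product": "Chocolate", "month": "2011-12", "amount": 550.0}]):
    print(triple)
```

A second data source with different column names would only need its own `MAPPING`, which is the role the per-source mapping ontology plays in the approach described above.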

Fig.3. OLAP cube ontology proposed by [19]

Fig.4. OLAP ontology proposed by [20]

As can be noticed, existing solutions mainly focus on semantically enriching BI metadata (e.g. concepts/classes, attributes/properties) rather than the BI data itself (i.e. occurrences of concepts/classes or values of attributes/properties).

But semantics is also about the relations existing between data occurrences. To the best of our knowledge, there is not yet any work on the semantic enrichment of BI data that considers the semantic relations which may exist between the data.

  • III.    Why Semantically Augment Geo/BI data

To highlight how semantically augmented Geo/BI data could enhance business analysis, let us consider a realistic case study of a business professional named IdoBI Reason.

  • A.    Case study: BioWYNX sales activities

M. Reason is a sales analyst and strategist who has moved from Washington DC to Quebec City to reorganize and expand the local branch of BioWYNX.

BioWYNX is a multinational firm specialized in selling organic food products. To minimize delivery fees, BioWYNX has at least one storehouse per district, from which salespersons can supply customers with the desired products.

BioWYNX has its own salespersons but also deals with mobile salespersons working for partnering companies, in agreement with these companies. When selling their own company's products, these shared salespersons can also sell BioWYNX products, and they are rewarded by BioWYNX (as are their companies) according to the sales they realize. A salesperson can supervise other salespersons (e.g. a team) or a given company.

To monitor its business performance efficiently, BioWYNX has deployed a BI platform. Fig. 5 represents the snowflake schema of the data warehouse from which OLAP cubes and mini-cubes (Fig. 6) are built. The dimensions are Products, Seller, Time, Location, and Customer, which has two hierarchies. The measures are the number of sold products (NbProdUnits), the sales amount of a given product (SalesAmount), the average of offered discounts (Discount), and the unit price (avgUnitPrice).
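How the measures of such a schema behave under aggregation can be sketched in Python. This is a simplification under stated assumptions: the dimension tables are flattened into name attributes on each fact, the fact rows are hypothetical, and `avgUnitPrice` is assumed to be derived as sales amount per unit sold, which the schema description does not specify.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SalesFact:
    """One row of a simplified sales fact table (dimension keys flattened to names)."""
    product: str
    seller: str
    month: str
    district: str
    customer: str
    nb_prod_units: int
    sales_amount: float
    discount: float  # discount rate offered on this sale


def aggregate_by(facts, key):
    """Group facts by one dimension attribute and compute the four measures per member."""
    groups = defaultdict(list)
    for fact in facts:
        groups[key(fact)].append(fact)
    result = {}
    for member, rows in groups.items():
        units = sum(r.nb_prod_units for r in rows)
        amount = sum(r.sales_amount for r in rows)
        result[member] = {
            "NbProdUnits": units,                                   # additive: sum
            "SalesAmount": amount,                                  # additive: sum
            "Discount": sum(r.discount for r in rows) / len(rows),  # average of offered discounts
            "avgUnitPrice": amount / units if units else 0.0,       # assumed: amount per unit sold
        }
    return result


facts = [
    SalesFact("Chocolate", "Jack", "2011-12", "Beauport", "Company A", 10, 500.0, 0.10),
    SalesFact("Chocolate", "Jim", "2011-12", "Beauport", "Company D", 5, 50.0, 0.0),
    SalesFact("Milk", "Jack", "2011-12", "Beauport", "Company A", 20, 1050.0, 0.05),
]
print(aggregate_by(facts, key=lambda f: f.product)["Chocolate"]["SalesAmount"])  # 550.0
```

Swapping the `key` function (e.g. `lambda f: f.seller`) pivots the same facts along another dimension, which is what an OLAP slice or roll-up does at a much larger scale.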

Fig.5. Snowflake-schema model for warehousing sales data

That general context of BioWYNX business activities will be used throughout this paper to highlight the lack of semantics in current BI data, and the need and relevance for semantic relations to provide semantic-based analysis, exploration, and discovery of BI data and correlations within data.

  • B.    Lack of semantics between BI data

Suppose that IdoBI, after replacing the former chief salesperson, is meeting some key salesmen (e.g. Jack, Jim, John) in the field. While discussing with Jack, M. Reason is also exploring (e.g. drill-down, roll-up, etc.), from his smartphone, a SOLAP mini-cube of the sales performed by salesmen (including Jack) over the last five years. From time to time, to support what he is saying or proposing to Jack, M. Reason shows him the analysis data related to his sales.

After exploring and discussing Jack’s sales over the last five years with SOLAP mini-cubes, M. Reason would now like to explore the sales of the salesmen Jack supervises (e.g. Jim) and the sales performed by Jack’s supervisors (e.g. John), with a request such as:

In my current location (e.g. Beauport district in Quebec City), who are the supervisors and supervisees of Jack that cumulated more than $100,000 of sales of chocolate family products last month, and that have their offices near my hotel or my current position?

Such a semantic-oriented Geo/BI request (“is a supervisor of” defines a semantic relation between salespersons in seller dimension) brings an interesting new way of analyzing BI data and may ease and speed up the discovery of correlations between data. Indeed, through that request, M. Reason is trying to discover a correlation between salespersons performance and their supervisors/supervisees performance.
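Once the relation is stored somewhere, the request above reduces to a relation traversal plus filters. The Python sketch below is illustrative only: the supervision links, office coordinates, current position and sales figures are hypothetical, and "near" is assumed to mean within 5 km, measured as a haversine distance on a spherical Earth.

```python
import math

# Hypothetical "is supervisor of" links between Seller members: (supervisor, supervisee).
SUPERVISES = {("John", "Jack"), ("Jack", "Jim")}

# Hypothetical office positions (latitude, longitude) and last month's chocolate-family sales.
OFFICES = {"John": (46.858, -71.190), "Jack": (46.860, -71.200), "Jim": (46.800, -71.300)}
CHOCOLATE_SALES_LAST_MONTH = {"John": 120_000.0, "Jack": 150_000.0, "Jim": 90_000.0}


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points (spherical Earth, R = 6371 km)."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def supervisors_and_supervisees(name):
    """Follow the supervision relation in both directions from one salesperson."""
    return ({boss for boss, sub in SUPERVISES if sub == name}
            | {sub for boss, sub in SUPERVISES if boss == name})


def semantic_request(name, position, min_sales=100_000.0, near_km=5.0):
    """Related salespersons above the sales threshold whose office is 'near' `position`."""
    return sorted(
        p for p in supervisors_and_supervisees(name)
        if CHOCOLATE_SALES_LAST_MONTH.get(p, 0.0) > min_sales
        and haversine_km(OFFICES[p], position) <= near_km
    )


print(semantic_request("Jack", position=(46.86, -71.19)))  # ['John']
```

Jim is related to Jack but falls below the sales threshold, so only John is returned; without the stored relation, the same answer requires the manual steps described next.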

Fig.6. Example of OLAP hypercube and mini-cubes which can be generated from the previous data warehouse model (Fig. 5) thanks to server-side OLAP tools (panels: mini-cube for the analysis of customers' needs/food consumption over time; mini-cube for the analysis of salesmen performance by product over time; mini-cube for the analysis of product distribution/consumption by location over time; hypercube axes include Location and Customer)

Given the current capabilities of OLAP/SOLAP cubes, the only way for M. Reason to explore and analyze data regarding sales performed by the supervisors and supervisees of Jack is to proceed as follows:

  • (i)    First, identify the names of the salespersons Jack supervises or who supervise Jack. Such information is available neither in OLAP cubes nor in data warehouses. Since M. IdoBI does not yet know all his employees and their organizational hierarchy by heart, he might have to interrupt his discussion with Jack to look for that information by calling his secretary, remotely accessing the employees’ file, etc.

  • (ii)    Thereafter, either:
      A. manually browse all members of the “Salesman” level in the “Seller” dimension (cf. Fig. 5) of the SOLAP mini-cube until he finds the names of Jack’s supervisors (John, etc.) or supervisees (Jim, etc.); or
      B. write a BI data request in the dedicated MDX (Multidimensional Expressions) language to select the sales of the salespersons previously identified as supervisors or supervisees of Jack.

Such additional search and request tasks can be time-consuming and ill-suited to competitive decision making, and they still do not allow M. Reason to navigate directly from Jack’s sales figures to John’s (supervisor) or Jim’s (supervisee) sales figures and vice versa.

The current difficulty in letting decision makers benefit from such semantic-oriented BI requests is due to the lack of semantics (especially semantic relations) in OLAP cubes as well as in the data warehouses from which cubes are built. Indeed, no information attached to the S/OLAP cube indicates, for example, that Jack is supervised by John and supervises Jim (in the data warehouse model in Fig. 5, there is no relation indicating that a salesperson may be linked to another one).

  • C.    Need for semantic exploration, analysis, and discovery of BI data and correlations within data

Fig. 7 visualizes the situation described above, points out the lack of semantics in S/OLAP cubes, and highlights how semantic relations, if they were present, could be exploited to provide business professionals with semantic support for advanced and meaningful exploration, analysis and discovery of BI data. Different examples of semantic relations are illustrated (e.g. “is trainer of”, “is supervisor of”, “is friend of”) to give a wider view of the richness of semantic relations. Red crosses express the absence of these relations in today’s S/OLAP technologies.

The lack of semantic relations between BI data does not concern only data within the same dimension level, as depicted in Fig. 7. It also concerns members of different levels in the same dimension (e.g. to which organization does the customer “M. Ido Buy” belong, so that sales made with that organization can be accessed directly from sales made with “M. Ido Buy”?), as well as data in different dimensions (e.g. are there client organizations that are in competition with selling companies?).

Fig.7. Lack of semantic support for a meaningful exploration, analysis and discovery of BI data in today’s S/OLAP technologies (panels: “Example of missing semantic relations”, such as “is supervisor of” and “is trainer of” between members of the Seller dimension (Salesman level), showing Jack’s supervisors (John) and supervisees (Jim), alongside the cube’s Product (Category level) and Time (Quarter level) dimensions; “Example of today’s analytic navigation into BI data using a client-side S/OLAP tool (semantic support for a meaningful exploration of BI data is missing)”, showing SalesAmount for seller Jack in Dec. 2011 by product: Chocolate 550, Milk 1050, Rice 950, Fish 350, Bread 425)

Indeed, let us consider that M. IdoBI Reason is now performing a drill-down operation from the Team level (in the Seller dimension) to the Salesman level in order to explore in detail the sales performed by each salesman of a given team (e.g. BioTeamX1). The listed salespersons (e.g. John, Jack, Jim) are clearly known to “belong” to the selected team (i.e. BioTeamX1). Now, if he wants to get back to the team’s sales figures (i.e. BioTeamX1) from one of its salespersons (e.g. Jack) by applying the inverse operation (i.e. roll-up), M. IdoBI will get the list of all teams (e.g. BioTeamX2…BioTeamX10) rather than the desired team. He will then be unable to identify and focus only on Jack’s team, since he cannot be expected to memorize all employees’ names and their related teams. This means that exploring, analyzing and discovering data from a parent level to a child level is possible, while getting back from a child (in a child level) to its exact parent (in a parent level) is not offered by today’s BI technologies. Putting a semantic relation (e.g. “belongs to”) between children and parents can overcome that issue.
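The difference between a conventional roll-up and one guided by a “belongs to” relation can be shown with a small Python sketch. The team assignments and the sales of John, Jim and Paul are hypothetical; Jack’s value is the sum of the five Dec. 2011 product figures shown in Fig. 7.

```python
# Hypothetical "belongs to" links from Salesman-level members to Team-level members.
BELONGS_TO = {"John": "BioTeamX1", "Jack": "BioTeamX1", "Jim": "BioTeamX1", "Paul": "BioTeamX2"}

# Salesman-level SalesAmount: Jack = 550 + 1050 + 950 + 350 + 425 (Fig. 7); others hypothetical.
SALES_BY_SALESMAN = {"John": 1200.0, "Jack": 3325.0, "Jim": 800.0, "Paul": 500.0}


def roll_up_all():
    """Conventional roll-up: the aggregated value of every Team-level member."""
    totals = {}
    for salesman, amount in SALES_BY_SALESMAN.items():
        team = BELONGS_TO[salesman]
        totals[team] = totals.get(team, 0.0) + amount
    return totals


def roll_up_to_parent(salesman):
    """Roll-up guided by 'belongs to': only the exact parent of the selected child."""
    team = BELONGS_TO[salesman]
    return team, roll_up_all()[team]


print(roll_up_all())              # {'BioTeamX1': 5325.0, 'BioTeamX2': 500.0}
print(roll_up_to_parent("Jack"))  # ('BioTeamX1', 5325.0)
```

Both functions use the same hierarchy internally; the difference lies in what the user is given back, the whole parent level versus the single parent of the member he started from.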

Fig.8. Example of semantic relations that might exist between data within the same level, between levels of the same dimension, and between dimensions

Fig. 8 provides various examples of semantic relations that might exist between data within the same level, between levels of the same dimension, and between dimensions. For instance, a selling company may be in competition with another one or be a partner of a customer organization. It also underlines the relevance of the “Belongs to” relation for a direct roll-up to the expected member instead of retrieving all members of the upper level.
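The three kinds of relations in Fig. 8 can be told apart simply by where their two ends sit in the cube. The Python sketch below assumes each relation end is recorded as a (dimension, level, member) reference; the `Company` and `Organization` level names and the member names are hypothetical.

```python
from typing import NamedTuple


class Member(NamedTuple):
    """Reference to a dimension member: dimension, level, and member name."""
    dimension: str
    level: str
    name: str


class Relation(NamedTuple):
    subject: Member
    predicate: str
    obj: Member


def scope(rel: Relation) -> str:
    """Classify a relation by where its two ends sit in the cube."""
    if rel.subject.dimension != rel.obj.dimension:
        return "between dimensions"
    if rel.subject.level != rel.obj.level:
        return "between levels of the same dimension"
    return "within the same level"


# Hypothetical members and relations echoing the examples in the text.
john = Member("Seller", "Salesman", "John")
jack = Member("Seller", "Salesman", "Jack")
team = Member("Seller", "Team", "BioTeamX1")
seller_co = Member("Seller", "Company", "Company B")
client_org = Member("Customer", "Organization", "Company A")

RELATIONS = [
    Relation(john, "is supervisor of", jack),
    Relation(jack, "belongs to", team),
    Relation(client_org, "is in competition with", seller_co),
]

for rel in RELATIONS:
    print(f"{rel.subject.name} {rel.predicate} {rel.obj.name}: {scope(rel)}")
```

Recording the dimension and level on each end is what allows a navigation tool to decide whether following a relation means staying in the same level, moving up or down a hierarchy, or jumping to another dimension.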

Semantic spatial relations might also exist between members of the location dimension and members of other dimensions, or among members of the location dimension itself. Some examples of these spatial relations are emphasized in dark red in Fig. 8-A. For instance, a district may be near to, far from, or adjacent to another district; a given city may be situated in the east, west, north or south of a given country, etc. While Geo/BI systems usually provide spatial analysis capabilities that can easily compute topological relations (e.g. adjacency) between geospatial objects, non-geospatial BI systems do not, and neither BI nor Geo/BI systems take into account semantic relative positions such as:

  • (i)    Relative distances: near, far, in front of, behind, etc.

  • (ii)    Relative levels: above, below, etc.

  • (iii)    Relative orientations: left, right, east, west, north, south, north-east, south-west, etc.
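For relative orientations (iii), a point-to-point approximation is easy to sketch in Python: compute the initial great-circle bearing between two representative points (e.g. centroids) and bin it into eight compass sectors; relative levels (ii) reduce to comparing elevations. The centroid simplification, the 45-degree sectors and the 1 m tolerance are assumptions for illustration; relative distances such as “near” or “far” would similarly need an agreed threshold.

```python
import math

COMPASS = ["north", "north-east", "east", "south-east",
           "south", "south-west", "west", "north-west"]


def initial_bearing_deg(origin, target):
    """Initial great-circle bearing from origin to target, in degrees clockwise from north."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*origin, *target))
    dlon = lon2 - lon1
    x = math.sin(dlon) * math.cos(lat2)
    y = math.cos(lat1) * math.sin(lat2) - math.sin(lat1) * math.cos(lat2) * math.cos(dlon)
    return (math.degrees(math.atan2(x, y)) + 360.0) % 360.0


def relative_orientation(origin, target):
    """Map the bearing to one of eight 45-degree sectors centred on each compass direction."""
    return COMPASS[int(((initial_bearing_deg(origin, target) + 22.5) % 360.0) // 45.0)]


def relative_level(elevation_a_m, elevation_b_m, tolerance_m=1.0):
    """Say whether A is above, below, or level with B, given elevations in metres."""
    if elevation_a_m > elevation_b_m + tolerance_m:
        return "above"
    if elevation_a_m < elevation_b_m - tolerance_m:
        return "below"
    return "level with"


print(relative_orientation((0.0, 0.0), (0.0, 1.0)))  # east
print(relative_level(120.0, 80.0))                   # above
```

Storing the computed label (e.g. “is east of”) as a semantic relation, rather than recomputing it at query time, is what would let a non-geospatial BI client navigate along it.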

Given this semantic deficiency of BI data, highlighted throughout this case study, the next section presents an approach for semantically enriching data warehouses and data cubes.

  • IV.    Design Approach for Semantically Enriching Geo/BI data

Semantic relations already exist in OLTP databases, which are often used as data sources for warehousing BI data. For instance, in human resources management systems, one can know which employees supervise others thanks to the various joins connecting tables. By contrast, to speed up calculations and queries and to provide quick, efficient and simplified access to analysis-oriented data, Geo/BI data structures are built to reduce these joins as much as possible. This means that additional relations (joins), beyond those linking fact tables to dimension tables, are to be avoided or are even undesirable. Therefore, any solution aiming to add and establish semantic relations within BI data should somehow be external to Geo/BI data structures.
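One way to honour this constraint can be sketched with Python’s built-in `sqlite3` module: the warehouse keeps only its fact-to-dimension joins, while supervision links are stored in a separately attached database that refers to dimension members by key. This relational stand-in only illustrates the “external layer” principle; it is not the OWL-based ODW/OOLAP ontologies proposed in this paper, and the table names and data are hypothetical.

```python
import sqlite3


def build_demo() -> sqlite3.Connection:
    con = sqlite3.connect(":memory:")
    # External semantic layer: a separate database attached under the schema name "sem".
    con.execute("ATTACH DATABASE ':memory:' AS sem")
    con.executescript("""
        -- Warehouse side: only the usual fact-to-dimension link.
        CREATE TABLE dim_seller (seller_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE fact_sales (seller_id INTEGER REFERENCES dim_seller, sales_amount REAL);
        INSERT INTO dim_seller VALUES (1, 'John'), (2, 'Jack'), (3, 'Jim');
        INSERT INTO fact_sales VALUES (1, 1200.0), (2, 3325.0), (3, 800.0), (3, 200.0);
        -- Semantic side: relations between members, referenced by dimension key.
        CREATE TABLE sem.relation (subject_id INTEGER, predicate TEXT, object_id INTEGER);
        INSERT INTO sem.relation VALUES (1, 'is_supervisor_of', 2), (2, 'is_supervisor_of', 3);
    """)
    return con


def supervisee_sales(con: sqlite3.Connection, supervisor: str) -> float:
    """Total sales of the salespersons that `supervisor` supervises, via the external layer."""
    (total,) = con.execute(
        """
        SELECT COALESCE(SUM(f.sales_amount), 0.0)
        FROM dim_seller AS s
        JOIN sem.relation AS r ON r.subject_id = s.seller_id AND r.predicate = 'is_supervisor_of'
        JOIN fact_sales AS f ON f.seller_id = r.object_id
        WHERE s.name = ?
        """,
        (supervisor,),
    ).fetchone()
    return total


con = build_demo()
print(supervisee_sales(con, "John"))  # 3325.0
print(supervisee_sales(con, "Jack"))  # 1000.0
```

The warehouse schema (`dim_seller`, `fact_sales`) is unchanged; detaching `sem` would leave the original star-style structure and its query performance exactly as before.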


References

  • [1] Gartner.com, "Gartner Forecasts Global Business Intelligence Market to Grow 9.7 Percent in 2011," Gartner, 18 Feb. 2011. [Online]. Available: http://www.gartner.com/it/page.jsp?id=1553215. [Accessed 26 May 2012].
  • [2] Gartner.com, "Gartner Says Worldwide Business Intelligence and Analytics Market to Reach $18.3 Billion in 2017," 17 Feb. 2017. [Online]. Available: https://www.gartner.com/newsroom/id/3612617.
  • [3] S. Rizzi, A. Abelló, J. Lechtenbörger and J. Trujillo, "Research in data warehouse modeling and design: dead or alive?," DOLAP, pp. 3-10, 2006.
  • [4] A. Sarkar, S. Choudhury, N. Chaki and S. Bhattacharya, "Implementation of Graph Semantic Based Multidimensional Data Model: An Object Relational Approach," International Journal of Computer Information System and Industrial Management Applications (IJCISIM), vol. 3, pp. 127-136, 2011.
  • [5] E. Malinowski and E. Zimányi, "Hierarchies in a multidimensional model: from conceptual modeling to logical representation," Data Knowl. Eng., vol. 59, no. 2, pp. 348–377, 2006.
  • [6] C. A. Hurtado and A. O. Mendelzon, "OLAP dimension constraints," in ACM PODS, 2004.
  • [7] J. Lechtenbörger and G. Vossen, "Multidimensional normal forms for data warehouse design," Information Systems, vol. 28, no. 5, pp. 415–434, 2003.
  • [8] J. Pardillo, J.-N. Mazón and J. Trujillo, "Bridging the semantic gap in OLAP models: platform-independent queries," DOLAP, pp. 89–96, 2008.
  • [9] O. Glorio and J. Trujillo, "An MDA Approach for the Development of Spatial Data Warehouses," in DaWaK, Turin, Italy, 2008.
  • [10] Wikipedia.org, "Model-driven architecture," en.wikipedia.org, 26 Nov. 2012. [Online]. Available: http://en.wikipedia.org/wiki/Model-driven_architecture. [Accessed 26 Nov. 2012].
  • [11] Wikipedia, "Object Constraint Language," wikipedia.org, 26 Nov. 2012. [Online]. Available: http://en.wikipedia.org/wiki/Object_Constraint_Language. [Accessed 26 Nov. 2012].
  • [12] Wikipedia.org, "QVT," en.wikipedia.org, 26 Nov. 2012. [Online]. Available: http://en.wikipedia.org/wiki/QVT. [Accessed 26 Nov. 2012].
  • [13] V. Stefanov and B. List, "Business Metadata for the Data Warehouse - Weaving Enterprise Goals and Multidimensional Models," in 10th IEEE Int. Enterprise Distributed Object Computing Conference Workshops, 2006.
  • [14] K. Shahzad and J. Zdravkovic, "Towards Goal-driven access to Process Warehouse: Integrating Goals with Process Warehouse for Business Process Analysis," in Fifth IEEE International Conference on Research Challenges in Information Sciences (RCIS), Guadeloupe, France, 2011.
  • [15] C. Diamantini and D. Potena, "Semantic Enrichment of Strategic Datacubes," in 11th International Workshop on Data Warehousing and OLAP (DOLAP 2008), New York, 2008.
  • [16] M. Kehlenbeck and M. Breitner, "Ontology-Based Exchange and Immediate Application of Business Calculation Definitions for Online Analytical Processing," in DaWaK 2009, 2009.
  • [17] M. Niinimäki and T. Niemi, "An ETL Process for OLAP Using RDF/OWL Ontologies," Journal on Data Semantics XIII, LNCS, vol. 5530, Springer, Heidelberg, pp. 97–119, 2009.
  • [18] T. Niemi, S. Toivonen, M. Niinimäki and J. Nummenmaa, "Ontologies with Semantic Web/grid in data integration for OLAP," International Journal on Semantic Web and Information Systems, Special Issue on Semantic Web and Data Warehousing, vol. 3, no. 4, 2007.
  • [19] T. Niemi and M. Niinimäki, "Ontologies and summarizability in OLAP," in ACM Symposium on Applied Computing (SAC '10), 2010.
  • [20] N. Shah, C. Tsai, M. Marinov, J. Cooper, P. Vitliemov and K. Chao, "Ontological On-line Analytical Processing for Integrating Energy Sensor Data," IETE Technical Review, vol. 26, p. 375, 2009.
  • [21] T. Gu, X. Wang, H. Pung and D. Zhang, "An ontology-based context model in intelligent environments," in Communication Networks and Distributed Systems Modeling and Simulation Conference, San Diego, CA, USA, 2004.
  • [22] IHMC.US, "COE," 10 Mar. 2012. [Online]. Available: http://www.ihmc.us/sandbox/groups/coe/wiki/welcome/attachments/d2a1b/COEmanual06.pdf. [Accessed 10 Mar. 2012].
  • [23] É. Dubé, "Conception et développement d’un service web de constitution de mini cubes SOLAP pour clients mobiles," M.Sc. thesis, Dep. Sciences géomatiques, Centre de Recherche en géomatique, Univ. Laval, Quebec, QC, Canada, 2008.
  • [24] OMG, "Common Warehouse Metamodel (CWM) specification," Object Management Group, Inc., 2003.