Evaluation of Quality for Semi-Structured Database System
Authors: Rita Ganguly, Anirban Sarkar
Journal: International Journal of Computer Network and Information Security (IJCNIS)
Issue: Vol. 11, No. 12, 2019.
The quality evaluation of the transactional calculus for semi-structured database systems develops metrics for data quality. A conceptual data model of higher quality leads to a higher-quality information system. The quality of a data model may affect both the effectiveness (the quality of results) and the efficiency (time, effort, etc.) of information system development. Hence, improving data model quality also tends to improve the quality of the delivered system. An array of quality metrics has been proposed for the semi-structured data model, blended into a metrics framework suitable for the transactional calculus for semi-structured data models. This paper proposes a framework for quality evaluation of the transactional calculus for semi-structured database systems using TCSS X-Query. In the proposed quality evaluation, each viewpoint is described using a set of proposed quality measurements, and each quality measurement is linked with a set of related metrics. The framework comprises direct and indirect metrics for the purpose of quality evaluation and supports a two-fold viewpoint. Two quality dimensions are considered: the designer-level viewpoint and the user-level viewpoint. The proposed metrics and measurements have been validated empirically; the purpose of the empirical validation is to establish that the metrics are practically useful for assessing the quality measurements and the operability factor.
Keywords: TCSS, Semi-structured, Metrics, Empirical Validation, GQL-SS, Quality Evaluation
Short address: https://sciup.org/15017006
IDR: 15017006 | DOI: 10.5815/ijcnis.2019.12.04
Text of the scientific article: Evaluation of Quality for Semi-Structured Database System
Published Online December 2019 in MECS
-
6. Find the project name and project ID from CSG Project1 and Project2.
<table ID="project">
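Only the opening element constructor of the corresponding X-Query survives above. A hedged reconstruction of query 6 (the document names project1.xml and project2.xml and the element names project, pname and pid are assumptions for illustration, not taken from the paper) could look like:

(: hypothetical sketch of query 6; document and element names are assumed :)
<table ID="project"> {
  for $p in (doc("project1.xml"), doc("project2.xml"))//project
  return <row> { $p/pname, $p/pid } </row>
} </table>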
-
7. Find the details of the publications where MName="Bipin" from Project1, and the details of the publications where MName="Priya" from Project2.
and $p2//mname = "PRIYA" return
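Only the fragment above survives of the corresponding X-Query. A hedged reconstruction of query 7 (the document names project1.xml and project2.xml and the element names project, mname and publication are assumptions for illustration) could look like:

(: hypothetical sketch of query 7; document and element names are assumed :)
for $p1 in doc("project1.xml")//project,
    $p2 in doc("project2.xml")//project
where $p1//mname = "BIPIN"
  and $p2//mname = "PRIYA"
return <publications> { $p1//publication, $p2//publication } </publications>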
-
VI. Framework for Evaluation of Quality
The quality evaluation of a conceptual-level multidimensional data model is based on two viewpoints. A set of criteria is associated with each viewpoint, and these criteria are further defined using the proposed metrics. As stated earlier, the effectiveness (quality of results) and productivity (time, cost, effort) of information system development may be affected by the grade of the conceptual data model. The two viewpoints in the quality evaluation are (1) the designer-level viewpoint and (2) the user-level viewpoint. Criteria such as transactional/query throughput and query execution performance are identified from the designer-level viewpoint, while criteria such as effectiveness and analyzability are identified from the user-level viewpoint.
-
A. Transactional/Query Throughput (QthD)
Throughput measures the number of transactions executed per unit of time; the speed of a database system is generally measured by its throughput. A number of query users (S) are chosen, each executing the full set of five queries in a different order, as described in Section VII (Case I and Case II). The throughput metric is computed as the total amount of work (S × 5), converted from seconds to hours (3600 seconds per hour), divided by the total elapsed time (T_S), i.e., the time that passes between the start of the first query and the completion of the last one.
$$Q_{thD} = \frac{S \times 5 \times 3600}{T_S}$$
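As a purely illustrative calculation (the values of S and T_S below are assumed for the example, not taken from the experiments): with S = 4 query users and a total elapsed time of T_S = 7200 seconds,

$$Q_{thD} = \frac{4 \times 5 \times 3600}{7200} = 10,$$

i.e., ten units of query work completed per hour of elapsed time.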
-
B. Performance of the Query Execution (QEP)
Query execution performance is the amount of time taken to execute a specified query and return the appropriate result set, i.e., the time needed for one successful round trip.
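A minimal formalization of this measurement (the notation below is ours; QEP is presumably reported in milliseconds, consistent with Table 3):

$$Q_{EP} = t_{result} - t_{submit}$$

where $t_{submit}$ is the instant the query is issued and $t_{result}$ the instant its complete result set is returned.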
-
C. Effectiveness (E)
A schema is said to be effective when it represents the user requirements in a natural as well as a semantic way. The measurement of effectiveness uses those concepts of the conceptual data model that are sufficient to express the specified user requirements of the system.
$$E = \frac{1}{Q_{thD}}$$
-
D. Analyzability
Analyzability measures the flexibility offered to a user by the database model.
-
VII. Empirical Validations of Proposed Metrics
This portion of the article focuses on the empirical validation of the proposed metrics and measurements in order to establish their practical utility. The objective of the empirical study is to show that the proposed metrics provide a mechanism for judging the grade of data models from a practical point of view. An experiment has been set up to analyze the group of metrics and the proposed quality measurements, such as query throughput and query execution performance. The empirical validation process also aims to identify the relevant metrics and measurements from the proposed set.
-
A. Experimental Settings
The desired definition of the experiment using the TCSS model can be encapsulated as:
To analyze the set of metrics for TCSS, for the purpose of evaluating whether they are useful with respect to measuring the quality of a specified data model and its operability in the context of querying.
Query:
To examine the scalability of the proposed TCSS X-Query implementation, an experimental evaluation is performed using the ORDER XML and PART XML databases. The sizes of ORDER XML and PART XML are 5571 KB and 1000 KB, respectively.
-
Cases:
According to the queries and their types, the workload is organized into five basic types of queries: selection, retrieve, union, intersection and join, categorized as Q1 to Q5. The experimental evaluation is performed on the ORDER XML database of size 5571 KB (Case I) and on the PART XML database of size 1000 KB (Case II); in both cases the data are taken anonymously.
-
Case I:
The experimental evaluation is performed using the ORDER XML database (data are taken anonymously). The size of ORDER XML is 5571 KB.
Q1: Find the order status and order date from the CSG order.
Q2: Find the details of the order where ORDERKEY="2" and the order CUSTOMERKEY="781" (a hedged X-Query sketch of this query follows the list).
Q3: Find the details of the order priority and order comment where O_ORDERKEY="992" and "358".
Q4: Find the order STATUS of the orders that have CUSTOMERKEY="317" and ORDERKEY="998".
Q5: Find the CUSTOMERKEY and ORDERSTATUS of all orders where the ORDERID is the same.
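The following is a hedged sketch of how Q2 could be written in TCSS X-Query; the document name order.xml and the element names order, ORDERKEY and CUSTOMERKEY are assumptions for illustration, not taken from the paper.

(: hypothetical sketch of Case I, Q2; document and element names are assumed :)
for $o in doc("order.xml")//order
where $o/ORDERKEY = "2" and $o/CUSTOMERKEY = "781"
return $o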
-
Case II:
The experimental evaluation is performed using the PART XML database (data are taken anonymously). The size of PART XML is 1000 KB.
-
Q1: Find the parts name and parts brand from the CSG Part.
-
Q2: Find the details of the part where PARTKEY="3" and the part RETAILPRICE="903".
Q3: Find the details of the PART BRAND and PART COMMENT where P_PARTKEY="3" and "938".
-
Q4: Find the part CONTAINER of all parts that have the same PARTKEY="1960" and PARTBRAND="BRAND#33".
-
Q5: Find the PART SIZE and PART TYPE of all parts where P_BRAND="BRAND#32" (a hedged X-Query sketch of this query follows the list).
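Analogously, a hedged sketch of Q5 in TCSS X-Query; the document name part.xml and the element names part, P_BRAND, P_SIZE and P_TYPE are assumptions for illustration, not taken from the paper.

(: hypothetical sketch of Case II, Q5; document and element names are assumed :)
for $p in doc("part.xml")//part
where $p/P_BRAND = "BRAND#32"
return <part-info> { $p/P_SIZE, $p/P_TYPE } </part-info>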
Table 2. Metrics and Measurement Values for Each Case (Schema Level)

CASE I:
Query | QthD    | QEP    | E
Q1    | 10.1882 | 2073   | 0.098
Q2    | 9.5377  | 2177   | 0.1048
Q3    | 9.9475  | 2093   | 0.1005
Q4    | 10.050  | 2094.5 | 0.0995
Q5    | 9.824   | 2139   | 0.1018

CASE II:
Query | QthD    | QEP   | E
Q1    | 15.3191 | 800.5 | 0.0653
Q2    | 16.129  | 806.5 | 0.0620
Q3    | 15.8765 | 736.5 | 0.0630
Q4    | 15.852  | 781.5 | 0.0631
Q5    | 6.375   | 736   | 0.0611
Table 3. Collected Operation Times in ms (TCSS X-Query)

CASE    | Time       | Q1   | Q2   | Q3   | Q4   | Q5   | AVG-TIME
CASE I  | Comp. time | 1469 | 1582 | 1544 | 1473 | 1509 | 1515.4
CASE I  | Eval. time | 2045 | 2089 | 2040 | 2082 | 2086 | 2068.4
CASE II | Comp. time | 1442 | 1547 | 1591 | 1460 | 1487 | 1505.4
CASE II | Eval. time | 841  | 840  | 738  | 731  | 760  | 782
Hypotheses: The following hypotheses are used for the experiments [28]:
-
• Null hypothesis (H0): There is no significant relationship between the set of metrics and the quality measurements or the operability factor of the data model.
-
• Alternate hypothesis (H1): There is a significant relationship between the set of metrics and the quality measurements as well as the operability factor of the data model.
-
B. Experimental Steps
The experiment is organized into two phases. In the first phase, it is checked whether there is any relationship among the group of schema-level metrics. In the second phase, the relationship between the average compile time and the average evaluation time is examined in order to determine whether the group of metrics has a significant influence on the operability factor of the conceptual data model [28].
The independence test is performed using the non-parametric chi-square test. In both types of analysis the level of significance is set to α = 0.10; thus, if the p-value (2-tailed) < 0.10, the null hypothesis H0 is rejected.
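For reference, the test statistic computed by such a chi-square independence test is the standard textbook quantity (this formula is not stated explicitly in the paper):

$$\chi^2 = \sum_{i}\sum_{j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

where $O_{ij}$ are the observed and $E_{ij}$ the expected cell frequencies; the null hypothesis H0 is rejected when the associated 2-tailed p-value is below α = 0.10.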
Phase I: Whether there is any relationship among the set of schema-level metrics is tested using the non-parametric chi-square test. The results (obtained with SPSS 22) are shown below:
Fig.2. Phase I Chi-Square Test: Query Throughput * Query Name
Fig.3. Phase I Chi-Square Test: QEP * Query Name
Fig.4. Phase I Chi-Square Test: Effectiveness * Query Name
The following hypotheses are considered for this purpose. H01: There is no significant relationship among the attributes. H11: There is a significant relationship among the attributes.
Reject H01 if the p-value < 0.10.
In the chi-square tests, all the obtained p-values are greater than the α value of 0.10. Hence, there is no significant relationship among the schema-level metrics.
CASE I:
Hypothesis Test Summary

No. | Null Hypothesis | Test | Sig. | Decision
1 | The distribution of QueryThroughput is the same across categories of QueryName. | Independent-Samples Kruskal-Wallis Test | .406 | Retain the null hypothesis.
2 | The distribution of QEP is the same across categories of QueryName. | Independent-Samples Kruskal-Wallis Test | .406 | Retain the null hypothesis.
3 | The distribution of Effectiveness is the same across categories of QueryName. | Independent-Samples Kruskal-Wallis Test | .406 | Retain the null hypothesis.

Asymptotic significances are displayed. The significance level is .05.
Fig.5. CASE I Hypothesis Test Summary
CASE II:
Hypothesis Test Summary

No. | Null Hypothesis | Test | Sig. | Decision
1 | The distribution of QueryThroughput is the same across categories of QueryName. | Independent-Samples Kruskal-Wallis Test | .406 | Retain the null hypothesis.
2 | The distribution of QEP is the same across categories of QueryName. | Independent-Samples Kruskal-Wallis Test | .406 | Retain the null hypothesis.
3 | The distribution of Effectiveness is the same across categories of QueryName. | Independent-Samples Kruskal-Wallis Test | .406 | Retain the null hypothesis.

Asymptotic significances are displayed. The significance level is .05.
Fig.6. CASE II Hypothesis Test Summary
Phase II: In this phase, the relation between the proposed measures of compile time and evaluation time is examined. The sets of average compile time and average evaluation time (Table 3) have been evaluated to identify any significant influence of the group of metrics on the operability factor of the conceptual data model [26]. The results are shown below:
Fig.7. Phase II Chi-Square Test: Compile Time * Query Name
Fig.8. Phase II Chi-Square Test: Evaluation Time * Query Name
Analyzing the above results, all the p-values obtained in the chi-square tests are greater than the α value of 0.10 (approximately 0.22). It can be concluded that there exists a strong relation between compile time and evaluation time, since the p-value > 0.10 in each case. Hence, the proposed measures have a significant influence on the operability factor of the conceptual-level multidimensional data model.
-
VIII. Conclusion
In this article, a framework for quality evaluation of conceptual-level multidimensional data models has been discussed in general, and for the TCSS data model in particular. The framework comprises direct and indirect metrics. The objective of the empirical study is to show that the proposed metrics provide a mechanism for judging the grade of data models from a practical point of view. An experiment has been set up to analyze the group of metrics and the proposed quality measurements, such as query throughput and query execution performance, and to identify the relevant metrics and measurements from the proposed set. This article proposes a framework for quality evaluation of the transactional calculus for semi-structured database systems using TCSS X-Query, with two cases of data of different sizes.
This article has also focused on the empirical validation of the set of metrics and measurements to prove their practical utility. The empirical validation shows that the proposed metrics have a significant influence on the operability factor of the multidimensional conceptual data model.
References
- Conrad R., Scheffner D., Freytag J. C., "XML conceptual modeling using UML", 19th Intl. Conf. on Conceptual Modeling, pp. 558-574, 2000.
- Anirban Sarkar, "Design of Semi-structured Database System: Conceptual Model to Logical Representation", in Designing, Engineering, and Analyzing Reliable and Efficient Software, H. Singh and K. Kaur (Eds.), IGI Global Publications, USA, pp. 74-95, 2013.
- McHugh J., Abiteboul S., Goldman R., Quass D., Widom J., "Lore: a database management system for semistructured data", Vol. 26(3), pp. 54-66, 1997.
- Badia A., "Conceptual modeling for semistructured data", 3rd International Conference on Web Information Systems Engineering, pp. 170-177, 2002.
- Mani M., "EReX: A Conceptual Model for XML", 2nd International XML Database Symposium, pp. 128-142, 2004.
- Suresh Jagannathan, Jan Vitek, Adam Welc, Antony Hosking, "A Transactional Object Calculus", Dept. of Computer Science, Purdue University, West Lafayette, IN 47906, United States.
- Liu H., Lu Y., Yang Q., "XML conceptual modeling with XUML", 28th International Conference on Software Engineering, pp. 973-976, 2006.
- Combi C., Oliboni B., "Conceptual modeling of XML data", ACM Symposium on Applied Computing, pp. 467-473, 2006.
- Wu X., Ling T. W., Lee M. L., Dobbie G., "Designing semistructured databases using ORA-SS model", 2nd International Conference on Web Information Systems Engineering, Vol. 1, pp. 171-180, 2001.
- Seth Gilbert and Nancy Lynch, "Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services", SIGACT News, June 2002.
- Rita Ganguly, Rajib Kumar Chatterjee, Anirban Sarkar, "Graph Semantic based Approach for Querying Semi-structured Database System", 22nd International Conference on SEDE-2013, pp. 79-84.
- Seth Gilbert (National University of Singapore) and Nancy Lynch (Massachusetts Institute of Technology), "Perspectives on the CAP Theorem".
- Soichiro Hidaka, Zhenjiang Hu, Kazuhiro Inaba, Hiroyuki Kato, "Bidirectionalizing Structural Recursion on Graphs", Technical Report, National Institute of Informatics, The University of Tokyo/JSPS Research Fellow, The University of Electro-Communications, August 31, 2009.
- Arkady Maydanchik (2007), "Data Quality Assessment", Technics Publications, LLC; Data Validation, Data Integrity, Designing Distributed Applications with Visual Studio .NET.
- William S. Frantz (Periwinkle Computer Consulting, 16345 Englewood Ave., Los Gatos, CA, USA 95032, rantz@netcom.com) and Charles R. Landau (Tandem Computers Inc., 19333 Vallco Pkwy, Loc 3-22, Cupertino, CA, USA 95014, landau_charles@tandem.com), "Object Oriented Transaction Processing in the KeyKOS® Microkernel".
- Kazimierz Subieta, "Introduction to Object-Oriented Databases", subieta@pjwstk.edu.pl, http://www.ipipan.waw.pl/~subieta; Ni W., Ling T. W., "GLASS: A Graphical Query Language for Semi-structured Data", 8th International Conference on Database Systems for Advanced Applications, pp. 363-370, 2003.
- R. K. Lomotey and R. Deters, "Data mining from document-append NoSQL", Int. J. Services Comput., Vol. 2, No. 2, pp. 17-29, 2014.
- Braga D., Campi A. and Ceri S., "XQBE (XQuery By Example): A visual interface to the standard XML query language", ACM Transactions on Database Systems (TODS), Vol. 30(5), pp. 398-443, 2003.
- Anirban Sarkar, "Conceptual Level Design of Semi-structured Database System: Graph-semantic Based Approach", International Journal of Advanced Computer Science and Applications, The SAI Pubs., New York, USA, Vol. 2, Issue 10, pp. 112-121, November 2011. [ISSN: 2156-5570 (Online) & ISSN: 2158-107X (Print)].
- T. W. Ling, "A normal form for sets of not-necessarily normalized relations", in Proceedings of the 22nd Hawaii International Conference on System Sciences, pp. 578-586, IEEE Computer Society Press, United States, 1989.
- T. W. Ling and L. L. Yan, "NF-NR: A Practical Normal Form for Nested Relations", Journal of Systems Integration, Vol. 4, pp. 309-340, 1994.
- Rita Ganguly, Anirban Sarkar, "Evaluations of Conceptual Models for Semi-structured Database System", International Journal of Computer Applications, Vol. 50, Issue 18, pp. 5-12, July 2012. [ISBN: 973-93-80869-67-3].
- Rami Sellami, Sami Bhiri, and Bruno Defude, "Supporting Multi Data Stores Applications in Cloud Environments", IEEE Transactions on Services Computing, Vol. 9, No. 1, pp. 59-71, January/February 2016.
- O. Curé, R. Hecht, C. Le Duc, and M. Lamolle, "Data integration over NoSQL stores using access path based mappings", in Proc. 22nd Int. Conf. Database Expert Syst. Appl., Part I, pp. 481-495, 2011.
- Charles Roe, "ACID vs. BASE: The Shifting pH of Database Transaction Processing", www.dataversity.net.
- Basili V. R. and Weiss D. M., "A Methodology for Collecting Valid Software Engineering Data", IEEE Transactions on Software Engineering, Vol. SE-10, No. 6, pp. 728-738, November 1984.
- Rita Ganguly, Anirban Sarkar, "An Approach to Develop a Transactional Calculus for Semi-Structured Database System", International Journal of Computer Network and Information Security (IJCNIS), Vol. 11, No. 9, pp. 24-39, 2019. DOI: 10.5815/ijcnis.2019.09.04.
- N. G. Das, Statistical Methods, Vol. I and II, pp. 546-558, 2013.