Evaluation of Quality for Semi-Structured Database System
Authors: Rita Ganguly, Anirban Sarkar
Journal: International Journal of Computer Network and Information Security (IJCNIS)
Issue: Vol. 11, No. 12, 2019.
The quality evaluation of the transactional calculus for semi-structured database systems develops metrics for data quality. A conceptual data model of higher quality leads to a higher-quality information system. The quality of a data model may affect both the effectiveness (the quality of results) and the efficiency (time, effort, etc.) of information system development. Hence, improving data model quality also tends to improve the quality of the delivered system. An array of quality metrics has been proposed for the semi-structured data model, blended into a metrics framework suitable for the transactional calculus for semi-structured data models. This paper proposes a framework for quality evaluation of the transactional calculus for semi-structured database systems using TCSS X-Query. In the proposed quality evaluation, each viewpoint is described using a set of proposed quality measurements, and each quality measurement is linked with a set of related metrics. The framework comprises direct and indirect metrics for the purpose of quality evaluation and supports a two-fold viewpoint. Two quality dimensions are considered: the designer-level viewpoint and the user-level viewpoint. The proposed metrics and measurements have been validated empirically; the purpose of the empirical validation is to establish that the metrics are practically useful for assessing the quality measurements and the operability factor.
Keywords: TCSS, Semi-structured, Metrics, Empirical Validation, GQL-SS, Quality Evaluation
Short address: https://sciup.org/15017006
IDR: 15017006 | DOI: 10.5815/ijcnis.2019.12.04
Text of the scientific article: Evaluation of Quality for Semi-Structured Database System
Published Online December 2019 in MECS
-
6. Find the project name and project ID from CSG Project1 and Project2.
<table ID="project">
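Only the opening element constructor of the corresponding X-Query survives above. A hedged reconstruction of query 6 (the document names project1.xml and project2.xml and the element names project, pname and pid are assumptions for illustration, not taken from the paper) could look like:

(: hypothetical sketch of query 6; document and element names are assumed :)
<table ID="project"> {
  for $p in (doc("project1.xml"), doc("project2.xml"))//project
  return <row> { $p/pname, $p/pid } </row>
} </table>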
-
7. Find the details of the publications where MName="Bipin" from Project1, and the details of the publications where MName="Priya" from Project2.
and $p2//mname = "PRIYA" return
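Only the fragment above survives of the corresponding X-Query. A hedged reconstruction of query 7 (the document names project1.xml and project2.xml and the element names project, mname and publication are assumptions for illustration) could look like:

(: hypothetical sketch of query 7; document and element names are assumed :)
for $p1 in doc("project1.xml")//project,
    $p2 in doc("project2.xml")//project
where $p1//mname = "BIPIN"
  and $p2//mname = "PRIYA"
return <publications> { $p1//publication, $p2//publication } </publications>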
-
VI. Framework for Evaluation of Quality
The quality evaluation of a conceptual-level multidimensional data model is based on two viewpoints. A set of criteria is associated with each viewpoint, and these criteria are further defined using the proposed metrics. As stated earlier, the effectiveness (quality of results) and productivity (time, cost, effort) of information system development may be affected by the grade of the conceptual data model. The two viewpoints in the quality evaluation are (1) the designer-level viewpoint and (2) the user-level viewpoint. Criteria such as transactional/query throughput and query execution performance are identified from the designer-level viewpoint, while criteria such as effectiveness and analyzability are identified from the user-level viewpoint.
-
A. Transactional/Query Throughput (QthD)
Throughput measures the number of transactions executed per unit of time; the speed of a database system is generally measured by its throughput. A number of query users (S) are chosen, each executing the full set of five queries in a different order, as described in Section VII (Case I and Case II). The throughput metric is computed as the total amount of work (S × 5), converted from seconds to hours (3600 seconds per hour), divided by the total elapsed time (T_S), i.e., the time that passes between the start of the first query and the completion of the last one.
$$Q_{thD} = \frac{S \times 5 \times 3600}{T_S}$$
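As a purely illustrative calculation (the values of S and T_S below are assumed for the example, not taken from the experiments): with S = 4 query users and a total elapsed time of T_S = 7200 seconds,

$$Q_{thD} = \frac{4 \times 5 \times 3600}{7200} = 10,$$

i.e., ten units of query work completed per hour of elapsed time.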
-
B. Performance of the Query Execution (QEP)
Query execution performance is the amount of time taken to execute a specified query and return the appropriate result set, i.e., the time needed for one successful round trip.
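A minimal formalization of this measurement (the notation below is ours; QEP is presumably reported in milliseconds, consistent with Table 3):

$$Q_{EP} = t_{result} - t_{submit}$$

where $t_{submit}$ is the instant the query is issued and $t_{result}$ the instant its complete result set is returned.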
-
C. Effectiveness (E)
A schema is said to be effective when it represents the user requirements in a natural as well as a semantic way. The measurement of effectiveness uses those concepts of the conceptual data model that are sufficient to express the specified user requirements of the system.
$$E = \frac{1}{Q_{thD}}$$
-
D. Analyzability
Analyzability measures the flexibility offered to a user by the database model.
-
VII. Empirical Validations of Proposed Metrics
This portion of the article focuses on the empirical validation of the proposed metrics and measurements in order to establish their practical utility. The objective of the empirical study is to show that the proposed metrics provide a mechanism for judging the grade of data models from a practical point of view. An experiment has been set up to analyze the group of metrics and the proposed quality measurements, such as query throughput and query execution performance. The empirical validation process also aims to identify the relevant metrics and measurements from the proposed set.
-
A. Experimental Settings
The desired definition of the experiment using the TCSS model can be encapsulated as:
To analyze the set of metrics for TCSS, for the purpose of evaluating whether they are useful with respect to measuring the quality of a specified data model and its operability in the context of querying.
Query:
To examine the scalability of the proposed TCSS X-Query implementation, an experimental evaluation is performed using the ORDER XML and PART XML databases. The sizes of ORDER XML and PART XML are 5571 KB and 1000 KB, respectively.
-
Cases:
According to the queries and their types, the workload is organized into five basic types of queries: selection, retrieve, union, intersection and join, categorized as Q1 to Q5. The experimental evaluation is performed on the ORDER XML database of size 5571 KB (Case I) and on the PART XML database of size 1000 KB (Case II); in both cases the data are taken anonymously.
-
Case I:
The experimental evaluation is performed using the ORDER XML database (data are taken anonymously). The size of ORDER XML is 5571 KB.
Q1: Find the order status and order date from the CSG order.
Q2: Find the details of the order where ORDERKEY="2" and the order CUSTOMERKEY="781" (a hedged X-Query sketch of this query follows the list).
Q3: Find the details of the order priority and order comment where O_ORDERKEY="992" and "358".
Q4: Find the order STATUS of the orders that have CUSTOMERKEY="317" and ORDERKEY="998".
Q5: Find the CUSTOMERKEY and ORDERSTATUS of all orders where the ORDERID is the same.
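The following is a hedged sketch of how Q2 could be written in TCSS X-Query; the document name order.xml and the element names order, ORDERKEY and CUSTOMERKEY are assumptions for illustration, not taken from the paper.

(: hypothetical sketch of Case I, Q2; document and element names are assumed :)
for $o in doc("order.xml")//order
where $o/ORDERKEY = "2" and $o/CUSTOMERKEY = "781"
return $o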
-
Case II:
The experimental evaluation is performed using the PART XML database (data are taken anonymously). The size of PART XML is 1000 KB.
-
Q1: Find the parts name and parts brand from the CSG Part.
-
Q2: Find the details of the part where PARTKEY="3" and the part RETAILPRICE="903".
Q3: Find the details of the PART BRAND and PART COMMENT where P_PARTKEY="3" and "938".
-
Q4: Find the part CONTAINER of all parts that have the same PARTKEY="1960" and PARTBRAND="BRAND#33".
-
Q5: Find the PART SIZE and PART TYPE of all parts where P_BRAND="BRAND#32" (a hedged X-Query sketch of this query follows the list).
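Analogously, a hedged sketch of Q5 in TCSS X-Query; the document name part.xml and the element names part, P_BRAND, P_SIZE and P_TYPE are assumptions for illustration, not taken from the paper.

(: hypothetical sketch of Case II, Q5; document and element names are assumed :)
for $p in doc("part.xml")//part
where $p/P_BRAND = "BRAND#32"
return <part-info> { $p/P_SIZE, $p/P_TYPE } </part-info>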
Table 2. Metrics and Measurement Values for Each Case (Schema Level)

CASE I:
Query | QthD    | QEP    | E
Q1    | 10.1882 | 2073   | 0.098
Q2    | 9.5377  | 2177   | 0.1048
Q3    | 9.9475  | 2093   | 0.1005
Q4    | 10.050  | 2094.5 | 0.0995
Q5    | 9.824   | 2139   | 0.1018

CASE II:
Query | QthD    | QEP   | E
Q1    | 15.3191 | 800.5 | 0.0653
Q2    | 16.129  | 806.5 | 0.0620
Q3    | 15.8765 | 736.5 | 0.0630
Q4    | 15.852  | 781.5 | 0.0631
Q5    | 6.375   | 736   | 0.0611
Table 3. Collected Operation Times in ms (TCSS X-Query)

CASE    | Time       | Q1   | Q2   | Q3   | Q4   | Q5   | AVG-TIME
CASE I  | Comp. time | 1469 | 1582 | 1544 | 1473 | 1509 | 1515.4
CASE I  | Eval. time | 2045 | 2089 | 2040 | 2082 | 2086 | 2068.4
CASE II | Comp. time | 1442 | 1547 | 1591 | 1460 | 1487 | 1505.4
CASE II | Eval. time | 841  | 840  | 738  | 731  | 760  | 782
Hypotheses: The following hypotheses are used for the experiments [28]:
-
• Null hypothesis (H0): There is no significant relationship between the set of metrics and the quality measurements or the operability factor of the data model.
-
• Alternate hypothesis (H1): There is a significant relationship between the set of metrics and the quality measurements as well as the operability factor of the data model.
-
B. Experimental Steps
The experiment is organized into two phases. In the first phase, it is checked whether there is any relationship among the group of schema-level metrics. In the second phase, the relationship between the average compile time and the average evaluation time is examined in order to determine whether the group of metrics has a significant influence on the operability factor of the conceptual data model [28].
The independence test is performed using the non-parametric chi-square test. In both types of analysis the level of significance is set to α = 0.10; thus, if the p-value (2-tailed) < 0.10, the null hypothesis H0 is rejected.
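For reference, the test statistic computed by such a chi-square independence test is the standard textbook quantity (this formula is not stated explicitly in the paper):

$$\chi^2 = \sum_{i}\sum_{j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

where $O_{ij}$ are the observed and $E_{ij}$ the expected cell frequencies; the null hypothesis H0 is rejected when the associated 2-tailed p-value is below α = 0.10.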
Phase I: Whether there is any relationship among the set of schema-level metrics is tested using the non-parametric chi-square test. The results (obtained with SPSS 22) are shown below:
Fig.2. Phase I Chi-Square Test: Query Throughput * Query Name
Fig.3. Phase I Chi-Square Test: QEP * Query Name
Fig.4. Phase I Chi-Square Test: Effectiveness * Query Name
The following hypotheses are considered for this purpose. H01: There is no significant relationship among the attributes. H11: There is a significant relationship among the attributes.
Reject H01 if the p-value < 0.10.
In the chi-square tests, all the obtained p-values are greater than the α value of 0.10. Hence, there is no significant relationship among the schema-level metrics.
CASE I:
Hypothesis Test Summary

No. | Null Hypothesis | Test | Sig. | Decision
1 | The distribution of QueryThroughput is the same across categories of QueryName. | Independent-Samples Kruskal-Wallis Test | .406 | Retain the null hypothesis.
2 | The distribution of QEP is the same across categories of QueryName. | Independent-Samples Kruskal-Wallis Test | .406 | Retain the null hypothesis.
3 | The distribution of Effectiveness is the same across categories of QueryName. | Independent-Samples Kruskal-Wallis Test | .406 | Retain the null hypothesis.

Asymptotic significances are displayed. The significance level is .05.
Fig.5. CASE I Hypothesis Test Summary
CASE II:
Hypothesis Test Summary

No. | Null Hypothesis | Test | Sig. | Decision
1 | The distribution of QueryThroughput is the same across categories of QueryName. | Independent-Samples Kruskal-Wallis Test | .406 | Retain the null hypothesis.
2 | The distribution of QEP is the same across categories of QueryName. | Independent-Samples Kruskal-Wallis Test | .406 | Retain the null hypothesis.
3 | The distribution of Effectiveness is the same across categories of QueryName. | Independent-Samples Kruskal-Wallis Test | .406 | Retain the null hypothesis.

Asymptotic significances are displayed. The significance level is .05.
Fig.6. CASE II Hypothesis Test Summary
Phase II: In this phase, the relation between the proposed measures of compile time and evaluation time is examined. The sets of average compile time and average evaluation time (Table 3) have been evaluated to identify any significant influence of the group of metrics on the operability factor of the conceptual data model [26]. The results are shown below:
Fig.7. Phase II Chi-Square Test: Compile Time * Query Name
Fig.8. Phase II Chi-Square Test: Evaluation Time * Query Name
Analyzing the above results, all the p-values obtained in the chi-square tests are greater than the α value of 0.10 (approximately 0.22). It can be concluded that there exists a strong relation between compile time and evaluation time, since the p-value > 0.10 in each case. Hence, the proposed measures have a significant influence on the operability factor of the conceptual-level multidimensional data model.
-
VIII. Conclusion
In this article, a framework for quality evaluation of conceptual-level multidimensional data models has been discussed in general, and for the TCSS data model in particular. The framework comprises direct and indirect metrics. The objective of the empirical study is to show that the proposed metrics provide a mechanism for judging the grade of data models from a practical point of view. An experiment has been set up to analyze the group of metrics and the proposed quality measurements, such as query throughput and query execution performance, and to identify the relevant metrics and measurements from the proposed set. This article proposes a framework for quality evaluation of the transactional calculus for semi-structured database systems using TCSS X-Query, with two cases of data of different sizes.
This article has also focused on the empirical validation of the set of metrics and measurements to prove their practical utility. The empirical validation shows that the proposed metrics have a significant influence on the operability factor of the multidimensional conceptual data model.
References
- Conrad R., Scheffner D., Freytag J. C., "XML conceptual modeling using UML", 19th Intl. Conf. on Conceptual Modeling, pp. 558-574, 2000.
- Anirban Sarkar, "Design of Semi-structured Database System: Conceptual Model to Logical Representation", in Designing, Engineering, and Analyzing Reliable and Efficient Software, H. Singh and K. Kaur (Eds.), IGI Global Publications, USA, pp. 74-95, 2013.
- McHugh J., Abiteboul S., Goldman R., Quass D., Widom J., "Lore: a database management system for semistructured data", Vol. 26(3), pp. 54-66, 1997.
- Badia A., "Conceptual modeling for semistructured data", 3rd International Conference on Web Information Systems Engineering, pp. 170-177, 2002.
- Mani M., "EReX: A Conceptual Model for XML", 2nd International XML Database Symposium, pp. 128-142, 2004.
- Suresh Jagannathan, Jan Vitek, Adam Welc, Antony Hosking, "A Transactional Object Calculus", Dept. of Computer Science, Purdue University, West Lafayette, IN 47906, United States.
- Liu H., Lu Y., Yang Q., "XML conceptual modeling with XUML", 28th International Conference on Software Engineering, pp. 973-976, 2006.
- Combi C., Oliboni B., "Conceptual modeling of XML data", ACM Symposium on Applied Computing, pp. 467-473, 2006.
- Wu X., Ling T. W., Lee M. L., Dobbie G., "Designing semistructured databases using ORA-SS model", 2nd International Conference on Web Information Systems Engineering, Vol. 1, pp. 171-180, 2001.
- Seth Gilbert and Nancy Lynch, "Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services", SIGACT News, June 2002.
- Rita Ganguly, Rajib Kumar Chatterjee, Anirban Sarkar, "Graph Semantic based Approach for Querying Semi-structured Database System", 22nd International Conference on SEDE-2013, pp. 79-84.
- Seth Gilbert (National University of Singapore) and Nancy Lynch (Massachusetts Institute of Technology), "Perspectives on the CAP Theorem".
- Soichiro Hidaka, Zhenjiang Hu, Kazuhiro Inaba, Hiroyuki Kato, "Bidirectionalizing Structural Recursion on Graphs", Technical Report, National Institute of Informatics, The University of Tokyo/JSPS Research Fellow, The University of Electro-Communications, August 31, 2009.
- Arkady Maydanchik (2007), "Data Quality Assessment", Technics Publications, LLC; Data Validation, Data Integrity, Designing Distributed Applications with Visual Studio .NET.
- William S. Frantz (Periwinkle Computer Consulting, 16345 Englewood Ave., Los Gatos, CA, USA 95032, rantz@netcom.com) and Charles R. Landau (Tandem Computers Inc., 19333 Vallco Pkwy, Loc 3-22, Cupertino, CA, USA 95014, landau_charles@tandem.com), "Object Oriented Transaction Processing in the KeyKOS® Microkernel".
- Kazimierz Subieta, "Introduction to Object-Oriented Databases", subieta@pjwstk.edu.pl, http://www.ipipan.waw.pl/~subieta; Ni W., Ling T. W., "GLASS: A Graphical Query Language for Semi-structured Data", 8th International Conference on Database Systems for Advanced Applications, pp. 363-370, 2003.
- R. K. Lomotey and R. Deters, "Data mining from document-append NoSQL", Int. J. Services Comput., Vol. 2, No. 2, pp. 17-29, 2014.
- Braga D., Campi A. and Ceri S., "XQBE (XQuery By Example): A visual interface to the standard XML query language", ACM Transactions on Database Systems (TODS), Vol. 30(5), pp. 398-443, 2003.
- Anirban Sarkar, "Conceptual Level Design of Semi-structured Database System: Graph-semantic Based Approach", International Journal of Advanced Computer Science and Applications, The SAI Pubs., New York, USA, Vol. 2, Issue 10, pp. 112-121, November 2011. [ISSN: 2156-5570 (Online) & ISSN: 2158-107X (Print)].
- T. W. Ling, "A normal form for sets of not-necessarily normalized relations", in Proceedings of the 22nd Hawaii International Conference on System Sciences, pp. 578-586, IEEE Computer Society Press, United States, 1989.
- T. W. Ling and L. L. Yan, "NF-NR: A Practical Normal Form for Nested Relations", Journal of Systems Integration, Vol. 4, pp. 309-340, 1994.
- Rita Ganguly, Anirban Sarkar, "Evaluations of Conceptual Models for Semi-structured Database System", International Journal of Computer Applications, Vol. 50, Issue 18, pp. 5-12, July 2012. [ISBN: 973-93-80869-67-3].
- Rami Sellami, Sami Bhiri, and Bruno Defude, "Supporting Multi Data Stores Applications in Cloud Environments", IEEE Transactions on Services Computing, Vol. 9, No. 1, pp. 59-71, January/February 2016.
- O. Curé, R. Hecht, C. Le Duc, and M. Lamolle, "Data integration over NoSQL stores using access path based mappings", in Proc. 22nd Int. Conf. Database Expert Syst. Appl., Part I, pp. 481-495, 2011.
- Charles Roe, "ACID vs. BASE: The Shifting pH of Database Transaction Processing", www.dataversity.net.
- Basili V. R. and Weiss D. M., "A Methodology for Collecting Valid Software Engineering Data", IEEE Transactions on Software Engineering, Vol. SE-10, No. 6, pp. 728-738, November 1984.
- Rita Ganguly, Anirban Sarkar, "An Approach to Develop a Transactional Calculus for Semi-Structured Database System", International Journal of Computer Network and Information Security (IJCNIS), Vol. 11, No. 9, pp. 24-39, 2019. DOI: 10.5815/ijcnis.2019.09.04.
- N. G. Das, Statistical Methods, Vol. I and II, pp. 546-558, 2013.