Fuzzy formal concept analysis in the development of ontologies
Authors: Ofitserov V.P., Smirnov S.V.
Journal: Ontology of Designing @ontology-of-designing
Section: Ontology engineering
Published in issue: 4 (26), vol. 7, 2017.
Free access
Formal Concept Analysis (FCA) is a rigorous mathematical theory of data analysis that reflects the classical approach to a concept as a fundamental epistemological element defined by its volume (extent) and content (intent). FCA is suitable for deriving formal ontologies from experimental data representing domains of interest, and in this sense fuzzy FCA (FFCA) is an adaptation of the method to the real character of such information. What is new is the study of the genesis of the fuzziness of formal contexts, which makes it necessary to include special stages of primary data processing in the ontology derivation process. It is shown that some causes of this fuzziness are immanent in the technology of generating a formal context from experimental data. Other factors of this fuzziness are revealed by a morphological analysis of the basic empirical structure, the “Objects-Properties” table. It is shown that the additional information can be interpreted by means of elementary fuzzy inference techniques. The variants of applying FFCA to the construction of fuzzy ontologies are analyzed critically.
Formal concept analysis, formal context, formal ontology, fuzzy inference, fuzzy concept
Short URL: https://sciup.org/170178769
IDR: 170178769 | DOI: 10.18287/2223-9537-2017-7-4-487-495
Text of the scientific article: Fuzzy formal concept analysis in the development of ontologies
Formal Concept Analysis (FCA) [1] provides effective means of automatically forming the conceptual structures that describe a domain of interest (DI) relevant to researchers, in accordance with the classical principles of analytical philosophy and on mathematical foundations.
Reference [2] addresses the “objective” formation of the primary Formal Contexts (FC) of the DI that FCA requires and, in particular, solves the problem of describing structural relations between DI objects. The proposed method of extracting knowledge from empirical data makes it possible to construct non-uniform semantic networks that correspond well to the modern view of computer specifications of DI ontologies [3], which in turn justifies calling this technology Ontological Data Analysis (ODA).
ODA links classical data analysis [4] with FCA. It starts from the standard view that the experimental material representing the DI has the form of an “Objects-Properties” Table (OPT), but it additionally states that any measurement can yield the special result “None”. This result means either that the analyzed object and the measuring procedure are semantically incompatible, or that the measured value lies outside the sensitivity interval or the range of the measuring equipment. In FCA, similar effects are achieved by the cognitive procedure called “conceptual scaling” [1, 5], whose essence is a subjective partitioning of the ranges of the measuring instruments in order to form new distinguishing properties of objects. Either way, the “None” concept considerably changes the paradigm of experimental data analysis, and the OPT can be transformed into the FC of the DI.
A FC is a triple $(G^*, M, I)$ consisting of a finite set of objects $G^*$ (the empirical sample), a finite set of properties $M$ (the arsenal of measuring procedures available to the researcher), and a binary relation $I$ between the objects and the properties (i.e., $I \subseteq G^* \times M$). Each element $b_{ij} \in I$ is the truth value of a Basic Semantic Proposition (BSP) of the form “object $g_i$ has property $m_j$”, $g_i \in G^*$, $m_j \in M$.
By design, a FC contains three of the four main semantic abstractions: classification, aggregation, and association. According to FCA, the constructed FC generates the lattice of formal concepts, whose ordering relation also implements the fourth semantic abstraction, generalization (the “is a” relation).
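As an illustration of how a FC generates formal concepts, the following minimal Python sketch enumerates all concepts of a tiny crisp context by closing every subset of objects. The object and property names are hypothetical, and the brute-force enumeration is chosen for brevity rather than efficiency; it is not claimed to be the procedure used in [1] or [2].

```python
from itertools import combinations

# Hypothetical crisp formal context: object -> set of properties it has.
context = {
    "g1": {"m1", "m2"},
    "g2": {"m1"},
    "g3": {"m2", "m3"},
}
properties = {"m1", "m2", "m3"}

def intent(objects):
    """Properties shared by all given objects (all properties if none given)."""
    shared = set(properties)
    for g in objects:
        shared &= context[g]
    return shared

def extent(props):
    """Objects that have every given property."""
    return {g for g, ms in context.items() if props <= ms}

def formal_concepts():
    """Enumerate (extent, intent) pairs by closing every subset of objects."""
    found = set()
    names = sorted(context)
    for size in range(len(names) + 1):
        for subset in combinations(names, size):
            y = intent(subset)
            x = extent(y)
            found.add((frozenset(x), frozenset(y)))
    return found

concepts = formal_concepts()
# "is a" order: (X1, Y1) is a sub-concept of (X2, Y2) when X1 is a subset of X2.
for x, y in sorted(concepts, key=lambda c: (len(c[0]), sorted(c[0]))):
    print(sorted(x), sorted(y))
```

For this toy context the sketch finds six concepts, from the bottom concept with an empty set of objects to the top concept with an empty set of properties.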
In this way, ODA automates the construction of ontologies from DI measurements. At the same time, practice shows that the truth value of a BSP is quite often vague; for example, it may be assigned by an expert on the basis of experience and intuition. It is therefore more natural to express the validity of a BSP with the truth values of fuzzy or multivalued logics. One pressing task, then, is to revise the results of applying fuzzy FCA (FFCA) [6-8] to the construction of ontologies. For example, FFCA publications practically ignore the very important question of the genesis of the fuzziness of the input data. These circumstances prompted us to analyze the sources, the description, and the processing of fuzzy FCs during the construction of ontologies on the basis of FCA.
1 Genesis of the FC fuzziness
In the traditional methodology, the OPT lines correspond to the objects selected by the researcher during DI analysis (i.e., they form the empirical sample of objects), and the OPT columns reflect the researcher's a priori arsenal of measuring procedures.
The arsenal of measuring procedures is formed by the researcher subjectively, according to a priori hypotheses about the existence of “simple” measurable properties of empirical objects (Hypotheses about the Properties, PH-hypotheses) or about the participation of the empirical objects in structural relations (Hypotheses about the Structural relations, SH-hypotheses; $SH \cap PH = \emptyset$). In the general case, investigating each SH-hypothesis requires as many measuring procedures as the arity of the corresponding structural relation. It is clear, however, that the analysis can be limited to binary relations between objects without affecting the correctness of the description of conceptual structures. (Note that “simple” properties could be regarded as unary relations; in ODA, however, properties and relations are strictly distinguished. Moreover, the presence of relations between objects is treated as a manifestation of the objects' inner properties [2].)
Unlike traditional applied data analysis, which in fact assumes the a priori consistency of all starting hypotheses SH and PH, ODA investigates the general case in which the result None can be observed during the execution of any measuring procedure. Such a result means that the experiment was inconsistent with the corresponding hypothesis.
This understanding of the initial stage of DI data formation yields an algorithm for constructing a FC that describes classes of empirical DI objects in terms of their heterogeneity, both in the structure of measurable properties and in the ability to participate in structural relations:
1) Transform the OPT, the matrix $A = (a_{ij})_{i=1,\dots,r;\; j=1,\dots,s}$, into the “Objects-Properties” incidence matrix $I = (b_{ij})_{i=1,\dots,r;\; j=1,\dots,s}$:
$$b_{ij} = \begin{cases} 1, & \text{if } a_{ij} \neq \mathit{None}, \\ 0, & \text{otherwise.} \end{cases}$$
2) Exclude from consideration the PH- and SH-hypotheses that turned out to be completely inconsistent on the selected set of empirical objects, i.e., remove the zero columns from I (for SH-hypotheses, remove from I the pairs of zero columns corresponding to each hypothesis).
3) If I contains zero lines, state the existence of a class of unidentified objects in the DI and introduce an a posteriori PH-hypothesis that such a class of objects exists. This is done by adding to I a new column that describes the incidence between the introduced special hypothesis and the class of unidentified objects.
4) If I contains only one zero column of the pair of columns corresponding to an SH-hypothesis, state (owing to the “one-way” confirmation of the SH-hypothesis) the existence in the DI of a special class of objects that is not represented in the empirical sample. This is recorded by adding to I a new line that describes the incidence between the newly introduced class of objects and the SH-property that is not confirmed by the input empirical material.
Step 2 of the algorithm reduces I, and steps 3 and 4 expand it. The resulting binary matrix, which determines the sought FC, has dimension $p \times q$, where $1 \le p \le r + |SH|$ and $1 \le q \le s + 1$.
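The first three steps of the construction algorithm can be sketched in Python as follows. This is a minimal illustration under stated assumptions: Python's `None` stands for the measurement result None, all columns are treated as PH-hypotheses (the pairing of columns for SH-hypotheses and step 4 are omitted), and the function name, the sample data, and the column label `"unidentified"` are hypothetical.

```python
def build_incidence(opt, column_names):
    """Steps 1-3 for PH-hypotheses only; opt is a list of lines, one per object."""
    # Step 1: b_ij = 1 if a_ij is not None, else 0.
    inc = [[0 if a is None else 1 for a in line] for line in opt]
    # Step 2: drop zero columns (hypotheses inconsistent on the whole sample).
    keep = [j for j in range(len(column_names)) if any(line[j] for line in inc)]
    inc = [[line[j] for j in keep] for line in inc]
    names = [column_names[j] for j in keep]
    # Step 3: if zero lines exist, add a column marking the unidentified objects.
    zero_lines = [not any(line) for line in inc]
    if any(zero_lines):
        inc = [line + [1 if z else 0] for line, z in zip(inc, zero_lines)]
        names.append("unidentified")
    return inc, names

# Hypothetical OPT with r = 3 objects and s = 3 measuring procedures.
opt = [
    [3.2, None, "red"],
    [1.5, None, None],
    [None, None, None],
]
inc, names = build_incidence(opt, ["mass", "charge", "color"])
print(names)
for line in inc:
    print(line)
```

In this example the `charge` column is removed in step 2 and the third object, which has no confirmed property, receives the added column in step 3, so the result stays within the bound $q \le s + 1$.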
1.1 The immanent fuzziness of a FC
Analysis of the ODA algorithm for constructing a formal context reveals three sources of fuzziness.
First of all, step 4 of the algorithm describes only one of the $2^{s-1}$ possible ways of forming the line for the unidentified object. Strictly speaking, the incidence matrix I should be expanded not by one but by $2^{s-1}$ lines, which together would serve as a “model” of the incompleteness of the input empirical material that is established when the precondition of step 4 holds. Such a solution is, of course, impractical.
If statements of fuzzy logic are admitted in ODA, the incompleteness of the initial data considered in step 4 can be recorded as different degrees to which the hypothetically admissible properties belong to the objects of the newly introduced class:
• for the SH-property that is not confirmed by the empirical material, the degree of membership is set to 1;
• for all the other $s - 1$ properties, it is set to 0.5.
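The fuzzy encoding of the line added in step 4 can be sketched as follows. The function name and its arguments are hypothetical and serve only to contrast the single fuzzy line with the $2^{s-1}$ crisp lines it replaces.

```python
def hypothetical_class_line(num_properties, sh_index):
    """Fuzzy line for the class of objects introduced in step 4."""
    line = [0.5] * num_properties   # all other properties: degree 0.5
    line[sh_index] = 1.0            # the SH-property that triggered step 4
    return line

s = 4
print(hypothetical_class_line(s, sh_index=2))   # [0.5, 0.5, 1.0, 0.5]
# A crisp model would need one line per 0/1 assignment of the other properties.
print(2 ** (s - 1))                             # 8
```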
Any measuring procedure can also yield the special result “Failure”, which means that the measurement task was not completed (breakdown or failure of the measuring instrument, abstention from voting, etc.). This is the second source of FC fuzziness. It is reasonable to reflect a Failure value found in an OPT cell in the corresponding FC element as the greatest fuzziness of the “Objects-Properties” relation, i.e., 0.5.
Finally, an internal cause of FC fuzziness can be the application of fuzzy scales in conceptual scaling in order to eliminate the uniformity of the empirical sample $G^*$.
For example, if property $m_j$ is subjected to nominal scaling [5], then OPT column $j$ is “split”, i.e., replaced by $k$ ($k \ge 2$) columns that are associated with the “basic $m_j$-terms” of the conceptual scale used. The result of measuring $m_j$ determines the membership values for the $m_j$-terms introduced by the conceptual scale. These values (from the set {0, 1} for crisp scales, from the interval [0, 1] for fuzzy scales) are placed in the newly formed columns of the OPT.
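The splitting of a column by a fuzzy scale can be sketched as follows. Everything specific here is an assumption made for illustration: a numeric property, three hypothetical terms with triangular membership functions, and the choice to carry a None measurement over unchanged into every new column. It is not the nominal scale of [5].

```python
def triangular(x, left, peak, right):
    """Triangular membership function with support (left, right)."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)

# Hypothetical fuzzy scale for a property "temperature": term -> (left, peak, right).
scale = {
    "cold": (-10.0, 0.0, 15.0),
    "warm": (5.0, 20.0, 30.0),
    "hot": (25.0, 35.0, 50.0),
}

def scale_column(values, scale):
    """Replace one OPT column by len(scale) columns of membership values."""
    return [
        [None if v is None else triangular(v, *scale[t]) for t in scale]
        for v in values
    ]

column = [10.0, 28.0, None]   # measurements of one property for three objects
for line in scale_column(column, scale):
    print(line)
```

With a crisp scale the same function would return only 0 and 1; the fuzzy scale is what introduces intermediate truth values into the resulting FC.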
1.2 Extended view of the empirical OPT
Let us analyze the FC fuzziness caused by possible variations in the structure and content of the input information about the DI, taken as an extended view of the empirical OPT.
1.2.1 Presence of the data on repeated object measurements
It is usually assumed that each measuring procedure applied to an observed object delivers a single value $a_{ij}$ to the OPT. Generalizing this statement, one can admit that the OPT is a hypermatrix $A = (a_{ij})_{i=1,\dots,r;\; j=1,\dots,s}$, where $a_{ij} = (a_{(ij)l})_{l=1,\dots,l_{ij}}$ is a vector of values recording the repeated measurements of property $m_j$ of object $g_i$.
Then, in view of what was said in subsection 1.1, step 1 of the FC construction algorithm should be executed as follows:
• (1.a) Transform the OPT hypermatrix $A$ into the hypermatrix $I^{(h)} = (b_{ij})_{i=1,\dots,r;\; j=1,\dots,s}$, where $b_{ij} = (b_{(ij)l})_{l=1,\dots,l_{ij}}$ and
$$b_{(ij)l} = \begin{cases} a_{(ij)l}, & \text{if column } j \text{ is an } m_j\text{-term}, \\ 0, & \text{if } a_{(ij)l} = \mathit{None}, \\ 0.5, & \text{if } a_{(ij)l} = \mathit{Failure}, \\ 1, & \text{otherwise.} \end{cases}$$
• (1.b) Construct the fuzzy “Objects-Properties” relation $I$, which unites the results of repeated measurements of object properties. The hypermatrix $I^{(h)}$ contains these results as sets of independent estimates of the truth value of each BSP determined by this matrix. Fuzzy logic offers various ways of combining such estimates. We prefer the “amplification-averaging” method, a special case of combination based on composite addition according to the triangular s-norm $x \oplus y = \min(1, x + y)$:
$$I = (b_{ij})_{i=1,\dots,r;\; j=1,\dots,s}, \qquad b_{ij} = \frac{1}{l_{ij}} \sum_{l=1}^{l_{ij}} b_{(ij)l}.$$
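Steps 1.a and 1.b can be sketched for a single OPT cell as follows. The sketch rests on several assumptions: Python's `None` stands for the result None, the string `"Failure"` is a hypothetical marker for the result Failure, None and Failure are checked before the term-column case, and “amplification-averaging” is read as the bounded sum of the estimates, each scaled by $1/l_{ij}$, which coincides with their arithmetic mean.

```python
FAILURE = "Failure"  # hypothetical marker for an unsuccessful measurement

def encode(a, is_term_column=False):
    """Step 1.a: truth-value estimate for a single measurement result."""
    if a is None:
        return 0.0            # result None: hypothesis not confirmed
    if a == FAILURE:
        return 0.5            # result Failure: greatest fuzziness
    if is_term_column:
        return float(a)       # membership value produced by a conceptual scale
    return 1.0                # any ordinary measured value

def bounded_sum(x, y):
    """Triangular s-norm: min(1, x + y)."""
    return min(1.0, x + y)

def amplification_averaging(estimates):
    """Step 1.b: bounded sum of the estimates, each scaled by 1/l."""
    l = len(estimates)
    total = 0.0
    for b in estimates:
        total = bounded_sum(total, b / l)
    return total

measurements = [7.1, None, FAILURE, 6.9]       # repeated measurements of one cell
estimates = [encode(a) for a in measurements]  # [1.0, 0.0, 0.5, 1.0]
print(amplification_averaging(estimates))      # 0.625
```

Because every estimate lies in [0, 1], the scaled terms never sum to more than 1, so the bound in the s-norm is not actually reached and the result equals the plain average.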
1.2.2 Taking into account the level of trust in sources
Usually all measuring procedures are by default regarded as the set of authentic sources of data about the DI. It is easy to imagine a situation in which the researcher differentiates this trust and supplements the OPT with a vector $(t_j)_{j=1,\dots,s}$, where $t_j \in [0, 1]$ is the degree of membership of measuring procedure $j$ in the set of authentic sources.
The degree $t_j$ must be combined with the truth value of the BSP produced by source $j$. Among the possible ways of combining fuzzy measures, we prefer here composite multiplication according to the triangular t-norm $x \cdot y = xy$. Thus, step 1 of the FC construction algorithm should be continued with the following transformation of the “Objects-Properties” relation:
• (1.c) $I \to I$: $b_{ij} := t_j\, b_{ij}$.
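Step 1.c amounts to scaling each column of the fuzzy relation by the trust in its source. A minimal sketch, with hypothetical data and a hypothetical function name:

```python
def apply_trust(inc, trust):
    """Step 1.c: b_ij := t_j * b_ij (product t-norm)."""
    return [[t * b for b, t in zip(line, trust)] for line in inc]

inc = [[1.0, 0.5], [0.0, 1.0]]
trust = [1.0, 0.8]  # hypothetical degrees of trust in the two measuring procedures
print(apply_trust(inc, trust))
```

A fully trusted source (degree 1) leaves its column unchanged, while a source with degree 0 removes all support for the BSPs it produced.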
1.2.3 Plurality of substantially equivalent sources
A rather widespread practice in research is to use several independent authentic sources to evaluate the same factor. Obviously, this situation does not differ from the case of repeated object measurements analyzed above. As before, the “amplification-averaging” method is reasonable for the combined estimate of the truth value of each BSP. Therefore, step 1 of the FC construction algorithm should be supplemented with one more transformation of the “Objects-Properties” relation:
• (1.d) $I \to I$: $b_{i(J_m)} := \dfrac{1}{|J_m|} \sum_{j \in J_m} b_{ij}$,
where $J_1, \dots, J_l$ are the sets of indexes of congruent OPT columns, $J_n \cap J_m = \emptyset$ for $m \ne n$ ($m, n = 1, \dots, l$), and $|J_m| > 1$.
Thus the number of columns of $I$ decreases to $s + l - \sum_{m=1}^{l} |J_m|$.
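The merging of equivalent columns in step 1.d can be sketched as follows. The function name and data are hypothetical, the averaged columns are appended after the untouched ones purely for convenience, and averaging within a group is this sketch's reading of “amplification-averaging”.

```python
def merge_equivalent(inc, groups):
    """Step 1.d: replace each group of equivalent columns by its averaged column."""
    grouped = {j for group in groups for j in group}
    merged = []
    for line in inc:
        # Columns outside every group are kept as they are.
        new_line = [b for j, b in enumerate(line) if j not in grouped]
        # Each group contributes one column with the mean of its members.
        for group in groups:
            new_line.append(sum(line[j] for j in group) / len(group))
        merged.append(new_line)
    return merged

inc = [[1.0, 0.5, 1.0, 0.0],
       [0.0, 1.0, 0.5, 0.5]]
groups = [[1, 2]]          # columns 1 and 2 come from equivalent sources
merged = merge_equivalent(inc, groups)
print(merged)
```

Here four columns and one group of two give 4 + 1 - 2 = 3 columns after merging.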
2 Fuzzy formal context processing
It is usually assumed that a special variety of FCA, fuzzy FCA (FFCA), is used to process fuzzy FCs. This is only partially true, because FFCA brings together a quite heterogeneous group of methods:
• the alpha-section method for fuzzy FCs, which is used in ODA to derive crisp sets [8];
• the alpha-section method for fuzzy FCs in which the FC is interpreted as a collection of fuzzy property sets, each describing one of the objects of the fuzzy FC [8-10]. This one-sided preference for objects is used to construct fuzzy concept lattices, which can be regarded as the “skeletons” of fuzzy ontologies. In theory there is an alternative view in which preference is given to properties (hence the method's other name, the asymmetric threshold scheme);
• the approach that uses the closure operator for fuzzy sets [11]. It treats the fuzzy FC as a whole (i.e., without preference for objects or properties) and does not use a threshold. Today this theoretically and computationally complicated method is of academic interest only, because it generates a huge number of fuzzy concepts even for small, “sparse” fuzzy FCs.
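2.1 Derivation of crisp ontologies in ODA
Let us take a detailed look at the variations of the alpha-section method for fuzzy FCs.
The relation $I$ of a fuzzy FC (like any fuzzy relation) can be decomposed into its crisp relations of level $\alpha \in (0, 1]$:
$$I = \bigcup_{\alpha \in (0, 1]} \alpha I^{(\alpha)}, \qquad I^{(\alpha)} = \big(b^{(\alpha)}_{ij}\big), \qquad b^{(\alpha)}_{ij} = \begin{cases} 1, & \text{if } b_{ij} \ge \alpha, \\ 0, & \text{otherwise.} \end{cases}$$
Every crisp (binary) relation $I^{(\alpha)}$, or $\alpha$-approximation of the fuzzy relation $I$, unambiguously determines a crisp FC in the logical sense: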
• all BSPs of the initial fuzzy FC are preserved;
• all BSPs whose truth value does not reach α, the user-chosen confidence threshold for the initial DI data, are considered false, and the rest are considered true.
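The alpha-section of a fuzzy relation can be sketched as follows. The data and function names are hypothetical; the helper `distinct_levels` relies only on the fact that a finite matrix contains finitely many distinct truth values, so only those thresholds can change the cut.

```python
def alpha_cut(inc, alpha):
    """Crisp relation of level alpha: 1 where b_ij >= alpha, else 0."""
    return [[1 if b >= alpha else 0 for b in line] for line in inc]

def distinct_levels(inc):
    """Positive truth values in the matrix: thresholds where the cut changes."""
    return sorted({b for line in inc for b in line if b > 0})

fuzzy = [[1.0, 0.5, 0.0],
         [0.8, 0.0, 0.5]]
for alpha in distinct_levels(fuzzy):
    print(alpha, alpha_cut(fuzzy, alpha))
```

Each resulting binary matrix is an ordinary crisp FC and can be passed to classical FCA unchanged.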
2.2 Fuzzy concepts
ODA confines itself to this well-defined method, i.e., to deriving a crisp ontology by applying classical FCA to an alpha-section of the fuzzy FC.
It is easy to show that only a finite number of different crisp DI ontologies can be obtained by varying the threshold α for a given fuzzy FC [7]. Moreover, when the requirements on the BSP truth value are either tightened or considerably relaxed, an impoverishment of the ontology specification (a reduction in the number of concepts and a degradation of the order defined on the concept set) can be predicted theoretically and confirmed experimentally.
In the asymmetric threshold scheme, the construction of a crisp conceptual structure, which is the final result in ODA, is only the first stage. The second and final stage of this method is the fuzzification of the crisp formal concepts obtained. The partial order relation “is a” between the concepts, detected at the first stage, remains crisp.
In the context of the “α-crisp” FC $(G^*, M, I^{(\alpha)})$, a formal concept is defined by its volume (extent) $X \subseteq G^*$ and content (intent) $Y \subseteq M$, where $X' = Y$, $Y' = X$, and “$'$” is the Galois operator [1]. The asymmetric threshold scheme prescribes converting each crisp formal concept $(X, Y)$ found into a fuzzy one, keeping the crisp content but reconstructing a fuzzy volume from the initial fuzzy FC $(G^*, M, I)$:
$$(X, Y) \rightsquigarrow (X_f, Y),$$
where $X_f$ is a fuzzy set with universe $X$ such that, for every $x \in X$, the membership value in $X_f$ is the truth value of the conjunction of the BSPs for all the properties in $Y$ that make up the content of the fuzzy concept. It is usually proposed to evaluate this membership value with the min-conjunction:
$$\mu(x \in X_f) = \min_{y \in Y} I(x, y).$$
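The fuzzification step of the asymmetric threshold scheme can be sketched as follows. The fuzzy context, the threshold 0.6, and the function name are hypothetical; treating the empty conjunction as fully true (membership 1 when the content is empty) is an assumption of this sketch.

```python
# Hypothetical fuzzy formal context: (object, property) -> truth value of the BSP.
I = {
    ("g1", "m1"): 1.0, ("g1", "m2"): 0.7,
    ("g2", "m1"): 0.6, ("g2", "m2"): 0.9,
    ("g3", "m1"): 0.2, ("g3", "m2"): 1.0,
}

def fuzzify(volume, content):
    """Fuzzy volume X_f: membership of x is the minimum of I(x, y) over y in Y."""
    return {x: min((I[(x, y)] for y in content), default=1.0) for x in volume}

# ({g1, g2}, {m1, m2}) is a crisp concept of the 0.6-section of this context:
# g3 drops out because I(g3, m1) = 0.2 is below the threshold.
print(fuzzify({"g1", "g2"}, {"m1", "m2"}))
```

Note that the resulting membership function is defined only for the objects of the crisp volume, which are all taken from the empirical sample.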
It seems that proponents of the asymmetric threshold scheme in FFCA make two different but related errors.
First, a fundamental methodological error should be noted. The proposed method of constructing fuzzy concepts is positioned as a data mining technique, yet its result directly involves the training empirical sample of DI objects! It is like finding, in Newton's second law, the weight of the apple that fell on his head!
Indeed, the resulting membership function of an arbitrary DI object in the volume of the fuzzy concept $(X_f, Y)$ is ultimately defined only on part of the training sample, for $x \in X \subseteq G^*$. This means that, generally speaking, it is impossible to assign an arbitrary object of the investigated DI to any of the constructed fuzzy sets (i.e., to classify the object). Equally, it is impossible to use the constructed concept system to describe an arbitrary DI object (i.e., to generate an information model of the object).
In general, this actively promoted approach to constructing fuzzy ontologies needs a radical development of ways to eliminate the influence of the training sample on the result of empirical data analysis. In this sense, switching the preference in the asymmetric scheme to DI properties seems more promising.
The second error of the asymmetric threshold scheme is the thesis that the conjunction of the BSPs for all properties of a fuzzy concept is the only requirement in the definition of a fuzzy concept. Calculating an estimate of the truth value of this BSP conjunction looks like an unsuccessful attempt to generalize the empirical data, one that merely masks the basic methodological problem of this FFCA method. For such an estimate to be accepted as the description of a fuzzy concept, its calculation must at least be preceded by combining the BSP truth values for each property (here it would again be reasonable to use the “amplification-averaging” method).
3 On practical application
We have used elements of the approach presented in this paper in many cases that required a structured description of object domains in decision support applications, in particular:
■ designing the color scheme of UI elements of software tools in order to improve usability;
■ designing an ontology-driven, subject-oriented interface to large relational databases;
■ determining target population groups when forming state social support programs;
■ market research.
Unfortunately, the corresponding examples are too large for this paper, but we can refer to the published results of car market research based on users' preferences [15].
In any case, these heterogeneous examples share well-defined characteristics of the input data. Opinions about object attributes were provided by expert focus groups, users, or simply unrelated persons, and these data were both complementary and contradictory. One source could contain much more data than another, while customers' trust in the different sources was uneven. We found that, in the general case, the result of consolidating such data is a fuzzy formal context, which is processed differently from a conventional one (but reduces to the usual case).
Conclusion
■ The paper has demonstrated the need to use the fuzzy logic paradigm in the method of ontology construction based on Formal Concept Analysis.
■ A morphological analysis of the possible extensions of the “Objects-Properties” Table, the standard form of the initial information about the domain being researched, together with basic fuzzy inference algorithms, made it possible to construct additional models of the various situations that lead to fuzzy Formal Contexts describing the domain of interest.
■ Obtaining a fuzzy Formal Context as the intermediate result of Ontological Data Analysis does not require revising the ontology construction method itself, which is based on the principles of Formal Concept Analysis, but it does additionally require a decision about the value of the trust threshold for the input data.
■ The current approach to constructing fuzzy ontologies on the basis of Formal Concept Analysis is criticized for obvious methodological mistakes. The analysis of these mistakes gives hope for the constructive development of a method for creating fuzzy conceptual structures.
References: Fuzzy formal concept analysis in the development of ontologies
- Ganter B, Wille R. Formal Concept Analysis. Mathematical foundations. Springer Berlin-Heidelberg, 1999.
- Smirnov SV. Ontological analysis of modeling domain [In Russian]. Bulletin of the Samara Scientific Center of RAS, 2001. 3(1): 62-70.
- Guarino N. Formal ontology, conceptual analysis and knowledge representation. Int. J. of Human Computer Studies 1995; 43(5/6): 625-640.
- Zagoruyko NG. Applied methods of data and knowledge analysis [In Russian]. Novosibirsk: Sobolev Institute of Mathematics, SB RAS, 1999.
- Ganter B, Wille R. Conceptual scaling. In: F. Roberts (Ed.): Applications of Combinatorics and Graph Theory to the Biological and Social Sciences. Springer-Verlag New York; 1989: 139-167.