Prediction of Missing Values for Decision Attribute
Author: T. Medhat
Journal: International Journal of Information Technology and Computer Science (IJITCS)
Issue: No. 11, Vol. 4, 2012
The process of determining missing values in an information system is an important issue for decision making, especially when the missing values are in the decision attribute. The main goal of this paper is to introduce an algorithm for finding missing values of the decision attribute. Our approach depends on a distance function between existing values. These values can be calculated by a distance function between the condition attribute values of the complete information system and those of the incomplete information system. The method deals with a repeated small distance by eliminating the condition attribute which has the smallest effect on the complete information system. The algorithm is discussed in detail with an example as a case study.
Keywords: Rough Sets, Degree of Dependency, Distance Function, Missing Values
Published Online October 2012 in MECS
I. Introduction

Classical rough set theory, developed by Professor Z. Pawlak in 1982, has achieved great success in knowledge acquisition in recent years [1,12]. In rough set theory, knowledge is represented in information systems. An information system is a data set represented as a table, called a decision table [2]. Each row in the table represents an object, for example a case or an event. Each column in the table represents an attribute, for instance a variable, an observation or a property. To each object (row) some attribute values are assigned. One of the disadvantages of rough set theory is its dependence on complete information systems; i.e., for a decision table to be processed, it must be complete and all of its object values must be known [3]. But in real-life applications, due to measurement errors, miscomprehension, access limitations, registration errors, etc., information systems with missing values often occur in knowledge acquisition. Information systems with missing data, or, in other words, incompletely specified decision tables, are called incomplete information systems [4]. For simplicity, incompletely specified decision tables will be called incomplete decision tables.
Intelligent techniques such as neural networks, decision trees, and fuzzy theory [5] are often based on quite strong assumptions (e.g., knowledge about dependencies, probability distributions, or a large number of experiments). They cannot derive conclusions from incomplete knowledge or manage inconsistent information.
Rough set theory [6,12,13] can deal with uncertainty and incompleteness in data analysis. It treats knowledge as a kind of discernibility. The attribute reduction algorithm removes redundant information or features and selects a feature subset that has the same discernibility as the original set of features. From the medical point of view, this aims at identifying subsets of the most important attributes influencing the treatment of patients. Rough set rule induction algorithms generate decision rules [10], which may potentially reveal profound medical knowledge and provide new medical insight. These decision rules are also useful for medical experts to analyze and understand the problem at hand. Rough sets have been a useful tool for medical applications. Hassanien [2] applies rough set theory to breast cancer data analysis. Tsumoto [15] proposed a rough set algorithm to generate diagnostic rules based on the hierarchical structure of differential medical diagnosis; the induced rules can correctly represent experts' decision processes. Komorowski and Ohrn [6] use a rough set approach for identifying a patient group in need of a scintigraphic scan for subsequent modeling. Bazan [1] compares rough set-based methods, in particular dynamic reducts, with statistical methods, neural networks, decision trees and decision rules; analyzing medical data (lymphography, breast cancer and primary tumors), he finds that error rates for rough sets are fully comparable to, and often significantly lower than, those of other techniques. In Refs. [3,14], a rough set classification algorithm exhibits higher classification accuracy than decision tree algorithms, and the generated rules are more understandable than those produced by decision tree methods.
The core of the proposed approach is how to predict the value of the decision attribute by using the distance function and degree of dependency.
Our approach determines the values of the decision attribute for objects whose decision values are missing. By using a distance function between the complete decision table and the incomplete decision table, we can predict the decisions for the missing values.
In this paper, we apply rough sets to predict the decisions of missing values. A rough set feature selection algorithm is used to select feature subsets that are more efficient: by the rough set approach, redundant features are discarded, and the selected features describe the decisions as well as the original feature set does, leading to better prediction accuracy. The selected features are those that influence the decision concepts, so they are helpful for cause-effect analysis. The chosen subsets are then employed within a decision rule generation process, creating descriptive rules for the classification task. The rough set rule-based method can achieve higher classification accuracy than other intelligent analysis methods.
The present paper is organized as follows. In Section 2, the main concepts of rough sets are introduced. The proposed new method for giving a decision for missing values is demonstrated in Section 3. The algorithm and classification method are described in Section 4 with an example. Section 5 concludes the paper.
II. Basic Concepts
Let I = (U, A ∪ {d}) be an information system, where U is a non-empty finite set of objects (the universe), A is a non-empty finite set of condition attributes, and d is the decision attribute (such a table is also called a decision table). For every a ∈ A there is a corresponding function f_a : U → V_a, where V_a is the set of values of a. If P ⊆ A, there is an associated equivalence relation [7,10,11]:
IND(P) = {(x, y) ∈ U × U | ∀a ∈ P, f_a(x) = f_a(y)}   (1)
The partition of U generated by IND(P) is denoted U/P. If (x, y) ∈ IND(P), then x and y are indiscernible by attributes from P.
The equivalence classes of the P-indiscernibility relation are denoted [x]_P. Let X ⊆ U; the P-lower approximation P̲X and the P-upper approximation P̄X of the set X can be defined as:

P̲X = {x ∈ U | [x]_P ⊆ X}   (2)

P̄X = {x ∈ U | [x]_P ∩ X ≠ ∅}   (3)
Let P, Q ⊆ A; then the positive, negative and boundary regions can be defined as:
POS_P(Q) = ∪_{X ∈ U/Q} P̲X   (4)

NEG_P(Q) = U - ∪_{X ∈ U/Q} P̄X   (5)

BND_P(Q) = ∪_{X ∈ U/Q} P̄X - ∪_{X ∈ U/Q} P̲X   (6)

The positive region of the partition U/D with respect to C, POS_C(D), is the set of all objects of U that can be certainly classified into blocks of the partition U/D by means of C. D depends on C in a degree k (0 ≤ k ≤ 1), denoted C ⇒_k D, where

k = γ_C(D) = |POS_C(D)| / |U|   (7)

If k = 1, D depends totally on C; if 0 < k < 1, D depends partially on C; and if k = 0, D does not depend on C.
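For concreteness, the notions above can be computed directly. The following is a minimal Python sketch (our illustration, not the paper's code), assuming a decision table stored as a dictionary mapping each object to a dictionary of attribute values; the names partition, positive_region and dependency_degree are ours.

    from collections import defaultdict

    def partition(table, attrs):
        # U/IND(P): group objects that are indiscernible on the attributes in attrs
        blocks = defaultdict(set)
        for obj, row in table.items():
            blocks[tuple(row[a] for a in attrs)].add(obj)
        return list(blocks.values())

    def positive_region(table, cond_attrs, dec_attr):
        # POS_C(D): union of the C-blocks wholly contained in one decision class
        dec_classes = partition(table, [dec_attr])
        return {o for block in partition(table, cond_attrs)
                if any(block <= d for d in dec_classes)
                for o in block}

    def dependency_degree(table, cond_attrs, dec_attr):
        # k = gamma_C(D) = |POS_C(D)| / |U|, equation (7)
        return len(positive_region(table, cond_attrs, dec_attr)) / len(table)

For the complete system of Table 3 below, where every C-block is a singleton, dependency_degree returns 18/18 = 1.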
III. The New Method
We introduce a method, based on the distance function, to detect the decision for missing values. This is done by calculating the distance function between the complete decision table and the incomplete decision table. If the smallest distance is repeated for more than one object, then we eliminate the condition attribute which has the smallest effect on the information system, using the quality of classification, and calculate the distance function again. The decision for a missing value equals the decision of the object at the smallest distance.
3.1 Distance Function
The distance between the complete decision table and incomplete decision table can be calculated by the following function:
dis(O_incomp., O_comp.) = √( ∑_{i=1}^{N} [c_i(O_incomp.) - c_i(O_comp.)]² )

where O_incomp. is an incomplete decision object, O_comp. is a complete decision object, c_i ∈ C, N = |C| is the number of condition attributes, and c_i(O) is the value of condition attribute c_i with respect to the object O. This is the Euclidean distance over the condition attributes, matching the computation in step b) of Section 4.
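As a direct rendering, the distance can be computed as below; this sketch is ours (not the paper's code), assuming numerically coded rows stored as dictionaries keyed by condition attribute names.

    import math

    def dis(incomp_row, comp_row, cond_attrs):
        # Euclidean distance over the condition attributes, as defined above
        return math.sqrt(sum((incomp_row[c] - comp_row[c]) ** 2 for c in cond_attrs))

For example, with the coding of Table 2, dis({'a': 10, 'b': 20, 'c': 10, 'd': 20}, {'a': 10, 'b': 20, 'c': 10, 'd': 10}, ['a', 'b', 'c', 'd']) measures how far object p1 is from object p2.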
3.2 New Method
• Calculate the distance function.
• Put all of the distances in a new array A(m).
• Compute the smallest value of the array A(m) and its position.
• The decision for the missing value in the incomplete decision table equals the decision value of the complete object at the position of that smallest value.
We can state this as follows (a compact code sketch of these cases is given after the list):

• If the incomplete object O_incomp._i has the smallest distance to only one complete object O_comp._j, then the decision of the incomplete object O_incomp._i equals the decision of the complete object O_comp._j.

• If the incomplete object O_incomp._i has the smallest distance to more than one complete object O_comp._j, …, O_comp._k, and the decisions of these complete objects are equal, then the decision of the incomplete object O_incomp._i equals the same decision of these complete objects.

• If the incomplete object O_incomp._i has the smallest distance to more than one complete object O_comp._j, …, O_comp._k, but their decisions are different, then we cannot give the decision of the incomplete object O_incomp._i directly. To give its decision, we need to eliminate the attribute which has the smallest effect on the information table, by using the quality of classification γ_C(D). After deleting that attribute, we determine the distance between the incomplete object O_incomp._i and only the tied complete objects O_comp._j, …, O_comp._k, not all complete objects. According to the new smallest distance, we put the decision of the missing value as mentioned in the previous steps.
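The following Python sketch (ours, not the paper's implementation) combines the distance function with the three cases above. It returns the predicted decision, or None when the tied complete objects disagree and the attribute elimination of step f) in Section 4 is required.

    import math

    def predict_decision(incomp_row, complete_table, cond_attrs, dec_attr='D'):
        # distance from the incomplete object to every complete object
        dists = {o: math.sqrt(sum((incomp_row[c] - row[c]) ** 2 for c in cond_attrs))
                 for o, row in complete_table.items()}
        smallest = min(dists.values())
        tied = [o for o, d in dists.items() if d == smallest]
        decisions = {complete_table[o][dec_attr] for o in tied}
        if len(decisions) == 1:
            # one nearest object, or several nearest objects with equal decisions
            return decisions.pop()
        # nearest objects carry different decisions: defer to attribute elimination
        return None

When None is returned, the condition attribute with the smallest effect is eliminated and the distances are recomputed over the tied objects only, as described above.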
IV. The New Algorithm
We now present an algorithm for detecting the missing values in the incomplete decision table, according to the given complete decision table.
The algorithm consists of six main steps:

a) Reading the complete decision table and the incomplete decision table.
b) Calculating the distance between incomplete objects and complete objects.
c) Arranging the values of the distances in ascending order.
d) Detecting repeated small distances.
e) Putting the decision of missing values.
f) Determining the decision of an incomplete object which has decision value x (no decision).
These steps are described in more detail below:
a) Read the complete decision table and the incomplete decision table:

> Read the condition data of the complete objects (the condition attribute values of the complete objects) and put them in an array a(i, j).
> Read the decision data of the complete objects (the decision attribute values of the complete objects) and put them in an array oldd(i).
> Read the condition data of the incomplete objects (the condition attribute values of the incomplete objects) and put them in an array nd(i, j).
> Let m = the number of complete objects and n = the number of condition attributes.
> Let mm = the number of incomplete objects.
b) Calculate the distance between incomplete objects and complete objects:

> Put the value of the distance function in array dis(i, j).
> Put the incomplete object number, the complete object number, the distance and the decision of the complete object in array new_dis(i, j).

For i = 1 To mm
    For j = 1 To m
        d = 0
        For k = 1 To n
            d = d + (nd(i, k) - a(j, k)) ^ 2
        Next k
        dis(i, j) = d ^ 0.5
        new_dis(j + i * m - m, 1) = i
        new_dis(j + i * m - m, 2) = j
        new_dis(j + i * m - m, 3) = dis(i, j)
        new_dis(j + i * m - m, 4) = oldd(j)
    Next j
Next i
c) Arrange the values of array new_dis(i, j) in ascending order of distance:

For k = 0 To mm - 1
    For i = 1 To m - 1
        For j = 1 + i To m
            If (new_dis(i + k * m, 3) > new_dis(j + k * m, 3)) Then
                x = new_dis(i + k * m, 3)
                new_dis(i + k * m, 3) = new_dis(j + k * m, 3)
                new_dis(j + k * m, 3) = x
                x = new_dis(i + k * m, 1)
                new_dis(i + k * m, 1) = new_dis(j + k * m, 1)
                new_dis(j + k * m, 1) = x
                x = new_dis(i + k * m, 2)
                new_dis(i + k * m, 2) = new_dis(j + k * m, 2)
                new_dis(j + k * m, 2) = x
                x = new_dis(i + k * m, 4)
                new_dis(i + k * m, 4) = new_dis(j + k * m, 4)
                new_dis(j + k * m, 4) = x
            End If
        Next j
    Next i
Next k
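For comparison, in a higher-level language step c) reduces to one sort per incomplete object's block of rows; a one-line Python sketch (the variable name rows is ours):

    # rows: (incomplete object no., complete object no., distance, decision) tuples
    rows.sort(key=lambda row: row[2])   # ascending by distance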
d) Put the values of array new_dis(i, j) into array new_decision(i, j) while detecting a repeated small distance:

Let k = 0, c = 0
For i = 1 To mm * m
    k = k + 1
    If (new_dis(i, 3) ≠ new_dis(i + 1, 3)) Then
        new_decision(k, 1) = new_dis(i, 1)
        new_decision(k, 2) = new_dis(i, 2)
        new_decision(k, 3) = new_dis(i, 3)
        new_decision(k, 4) = new_dis(i, 4)
        i = i + m - c - 1
        c = 0
    Else
        new_decision(k, 1) = new_dis(i, 1)
        new_decision(k, 2) = new_dis(i, 2)
        new_decision(k, 3) = new_dis(i, 3)
        new_decision(k, 4) = new_dis(i, 4)
        c = c + 1
    End If
Next i

e) Put the decision of missing values:

For i = 1 To k
    If (new_decision(i, 1) = new_decision(i + 1, 1)) Then
        ' the same incomplete object has a repeated smallest distance
        If (new_decision(i, 4) ≠ new_decision(i + 1, 4)) Then
            ' the tied complete objects carry different decisions,
            ' so the decision of this incomplete object is put as x
            Print new_decision(i + 1, 2); new_decision(i + 1, 3); "x"
        End If
        i = i + 1
    End If
Next i
f) Determine the decision of an incomplete object which has decision value x (no decision):

Eliminate the attribute which has the smallest effect on the system, and run the algorithm again. This is done by calculating the quality of classification of the data for each condition attribute, as in equation (7).
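A sketch of this selection in Python (ours, reusing the dependency_degree helper sketched in Section 2): the attribute whose removal keeps the dependency degree highest has the smallest effect and is the one eliminated. On the data of Table 3 this selects attribute a, in agreement with the computation below.

    def attribute_to_eliminate(table, cond_attrs, dec_attr):
        # gamma that remains after removing each single condition attribute
        gammas = {a: dependency_degree(table, [c for c in cond_attrs if c != a], dec_attr)
                  for a in cond_attrs}
        # the attribute whose removal preserves the most dependency has the
        # smallest effect on the classification
        return max(gammas, key=gammas.get)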
Example 1
This data set concerns an optician's decisions as to whether or not a patient is suited to contact lens use. The set of all possible decisions is listed in Table 1.
Experimental Results
The data in Table 1 are converted as follows. Condition attributes:

a- Age: Young → 10, Pre-presbyopic → 20, Presbyopic → 30
b- Spectacle: Myope → 10, Hypermetrope → 20
c- Astigmatic: No → 10, Yes → 20
d- Tear production rate: Normal → 10, Reduced → 20

And the decision attribute:

D- Optician's decision: hard contact lenses → 10, soft contact lenses → 20, no contact lenses → 30
Table 1: The optician's decisions data set

| U/A | Age            | Spectacle    | Astigmatic | Tear production rate | Decision (Optician's decision) |
| P1  | Young          | Hypermetrope | No         | Reduced              | ?                   |
| P2  | Young          | Hypermetrope | No         | Normal               | soft contact lenses |
| P3  | Pre-presbyopic | Hypermetrope | No         | Reduced              | no contact lenses   |
| P4  | Pre-presbyopic | Hypermetrope | No         | Normal               | soft contact lenses |
| P5  | Presbyopic     | Hypermetrope | No         | Reduced              | no contact lenses   |
| P6  | Presbyopic     | Hypermetrope | No         | Normal               | soft contact lenses |
| P7  | Young          | Hypermetrope | Yes        | Reduced              | ?                   |
| P8  | Young          | Hypermetrope | Yes        | Normal               | hard contact lenses |
| P9  | Pre-presbyopic | Hypermetrope | Yes        | Reduced              | no contact lenses   |
| P10 | Pre-presbyopic | Hypermetrope | Yes        | Normal               | no contact lenses   |
| P11 | Presbyopic     | Hypermetrope | Yes        | Reduced              | no contact lenses   |
| P12 | Presbyopic     | Hypermetrope | Yes        | Normal               | no contact lenses   |
| P13 | Young          | Myope        | No         | Reduced              | ?                   |
| P14 | Young          | Myope        | No         | Normal               | ?                   |
| P15 | Pre-presbyopic | Myope        | No         | Reduced              | no contact lenses   |
| P16 | Pre-presbyopic | Myope        | No         | Normal               | soft contact lenses |
| P17 | Presbyopic     | Myope        | No         | Reduced              | no contact lenses   |
| P18 | Presbyopic     | Myope        | No         | Normal               | no contact lenses   |
| P19 | Young          | Myope        | Yes        | Reduced              | ?                   |
| P20 | Young          | Myope        | Yes        | Normal               | ?                   |
| P21 | Pre-presbyopic | Myope        | Yes        | Reduced              | no contact lenses   |
| P22 | Pre-presbyopic | Myope        | Yes        | Normal               | hard contact lenses |
| P23 | Presbyopic     | Myope        | Yes        | Reduced              | no contact lenses   |
| P24 | Presbyopic     | Myope        | Yes        | Normal               | hard contact lenses |
Then we get the following table (Table 2), which can be split into two tables: Table 3, the complete information table, and Table 4, the incomplete information table, as follows:
Table 2: The optician's decisions data set after converting attribute values into numbers (a, b, c, d are the condition attributes C; D is the decision attribute)

| U/A | a  | b  | c  | d  | D  |
| p1  | 10 | 20 | 10 | 20 | ?  |
| p2  | 10 | 20 | 10 | 10 | 20 |
| p3  | 20 | 20 | 10 | 20 | 30 |
| p4  | 20 | 20 | 10 | 10 | 20 |
| p5  | 30 | 20 | 10 | 20 | 30 |
| p6  | 30 | 20 | 10 | 10 | 20 |
| p7  | 10 | 20 | 20 | 20 | ?  |
| p8  | 10 | 20 | 20 | 10 | 10 |
| p9  | 20 | 20 | 20 | 20 | 30 |
| p10 | 20 | 20 | 20 | 10 | 30 |
| p11 | 30 | 20 | 20 | 20 | 30 |
| p12 | 30 | 20 | 20 | 10 | 30 |
| p13 | 10 | 10 | 10 | 20 | ?  |
| p14 | 10 | 10 | 10 | 10 | ?  |
| p15 | 20 | 10 | 10 | 20 | 30 |
| p16 | 20 | 10 | 10 | 10 | 20 |
| p17 | 30 | 10 | 10 | 20 | 30 |
| p18 | 30 | 10 | 10 | 10 | 30 |
| p19 | 10 | 10 | 20 | 20 | ?  |
| p20 | 10 | 10 | 20 | 10 | ?  |
| p21 | 20 | 10 | 20 | 20 | 30 |
| p22 | 20 | 10 | 20 | 10 | 10 |
| p23 | 30 | 10 | 20 | 20 | 30 |
| p24 | 30 | 10 | 20 | 10 | 10 |
Table 3: Complete decision system

| U/A | a  | b  | c  | d  | D  |
| p2  | 10 | 20 | 10 | 10 | 20 |
| p3  | 20 | 20 | 10 | 20 | 30 |
| p4  | 20 | 20 | 10 | 10 | 20 |
| p5  | 30 | 20 | 10 | 20 | 30 |
| p6  | 30 | 20 | 10 | 10 | 20 |
| p8  | 10 | 20 | 20 | 10 | 10 |
| p9  | 20 | 20 | 20 | 20 | 30 |
| p10 | 20 | 20 | 20 | 10 | 30 |
| p11 | 30 | 20 | 20 | 20 | 30 |
| p12 | 30 | 20 | 20 | 10 | 30 |
| p15 | 20 | 10 | 10 | 20 | 30 |
| p16 | 20 | 10 | 10 | 10 | 20 |
| p17 | 30 | 10 | 10 | 20 | 30 |
| p18 | 30 | 10 | 10 | 10 | 30 |
| p21 | 20 | 10 | 20 | 20 | 30 |
| p22 | 20 | 10 | 20 | 10 | 10 |
| p23 | 30 | 10 | 20 | 20 | 30 |
| p24 | 30 | 10 | 20 | 10 | 10 |
Table 4: Incomplete decision system

| U/A | a  | b  | c  | d  | D |
| p1  | 10 | 20 | 10 | 20 | ? |
| p7  | 10 | 20 | 20 | 20 | ? |
| p13 | 10 | 10 | 10 | 20 | ? |
| p14 | 10 | 10 | 10 | 10 | ? |
| p19 | 10 | 10 | 20 | 20 | ? |
| p20 | 10 | 10 | 20 | 10 | ? |
By calculating the distance function according to the new method and algorithm, we get the results in Table 5.

In Table 5, we see the following:

Object P13 has the smallest distance to only one object, P15, so the decision of object P13 equals the decision of object P15 (which is 30). Likewise, the decision of object P19 equals the decision of object P21 (which is 30).

Object P14 has the smallest distance to two objects, P2 and P16, whose decisions are equal (20), so the decision of object P14 is also 20. Similarly, the decision of P20 equals 10.

But object P1 has the smallest distance to objects P2 and P3, whose decisions are different (20 and 30); the same holds for P7 with P8 and P9 (decisions 10 and 30). So we cannot yet give the decision of object P1 or P7.
Table 5: Decision Table of Missing Values of Some Objects

| Objects with no decision | Objects with decision | Small distance | Old decision | New decision |
| P1  | P2  | 1 | 20 | ?  |
|     | P3  | 1 | 30 |    |
| P7  | P8  | 1 | 10 | ?  |
|     | P9  | 1 | 30 |    |
| P13 | P15 | 1 | 30 | 30 |
| P14 | P2  | 1 | 20 | 20 |
|     | P16 | 1 | 20 |    |
| P19 | P21 | 1 | 30 | 30 |
| P20 | P8  | 1 | 10 | 10 |
|     | P22 | 1 | 10 |    |
To give the decisions of objects P1 and P7, we need to eliminate the attribute which has the smallest effect on the information table, according to the degree of dependency, as shown below:
U/IND(D) = { {P8, P22, P24}, {P2, P4, P6, P16}, {P3, P5, P9, P10, P11, P12, P15, P17, P18, P21, P23} }

U/IND({a}) = { {P2, P8}, {P3, P4, P9, P10, P15, P16, P21, P22}, {P5, P6, P11, P12, P17, P18, P23, P24} }

U/IND({b}) = { {P15, P16, P17, P18, P21, P22, P23, P24}, {P2, P3, P4, P5, P6, P8, P9, P10, P11, P12} }

U/IND({c}) = { {P2, P3, P4, P5, P6, P15, P16, P17, P18}, {P8, P9, P10, P11, P12, P21, P22, P23, P24} }

U/IND({d}) = { {P3, P5, P9, P11, P15, P17, P21, P23}, {P2, P4, P6, P8, P10, P12, P16, P18, P22, P24} }

U/IND(C) = { {P2}, {P3}, {P4}, {P5}, {P6}, {P8}, {P9}, {P10}, {P11}, {P12}, {P15}, {P16}, {P17}, {P18}, {P21}, {P22}, {P23}, {P24} }
POS_C(D) = {P2, P3, P4, P5, P6, P8, P9, P10, P11, P12, P15, P16, P17, P18, P21, P22, P23, P24}

k = γ_C(D) = |POS_C(D)| / |U| = 18/18 = 1
U/IND(C-{a}) = { {P15, P17}, {P16, P18}, {P21, P23}, {P22, P24}, {P3, P5}, {P2, P4, P6}, {P9, P11}, {P8, P10, P12} }

U/IND(C-{b}) = { {P2}, {P8}, {P3, P15}, {P4, P16}, {P9, P21}, {P10, P22}, {P5, P17}, {P6, P18}, {P11, P23}, {P12, P24} }

U/IND(C-{c}) = { {P2, P8}, {P15, P21}, {P16, P22}, {P3, P9}, {P4, P10}, {P17, P23}, {P18, P24}, {P5, P11}, {P6, P12} }

U/IND(C-{d}) = { {P2}, {P8}, {P15, P16}, {P21, P22}, {P3, P4}, {P9, P10}, {P17, P18}, {P23, P24}, {P5, P6}, {P11, P12} }
k = γ_{C-{a}}(D) = |POS_{C-{a}}(D)| / |U| = 13/18 = 0.722

k = γ_{C-{b}}(D) = |POS_{C-{b}}(D)| / |U| = 12/18 = 0.666

k = γ_{C-{c}}(D) = |POS_{C-{c}}(D)| / |U| = 8/18 = 0.444

k = γ_{C-{d}}(D) = |POS_{C-{d}}(D)| / |U| = 8/18 = 0.444
We see that γ_{C-{a}}(D) is the largest of these values; attribute a therefore has the smallest effect on the classification and can be eliminated.
Note: after deleting the attribute which has the smallest effect, we determine the distance between the objects which have no decision and the objects which have decisions, but only for the objects which had the same (repeated) smallest distance. See Table 6:
Table 6: Decision of the Two Objects P1 and P7 after Elimination of the Attribute which has a Small Effect

| Objects with no decision | Objects with decision | Small distance | Old decision | New decision |
| P1 | P2 | 1 | 20 | 30 |
|    | P3 | 0 | 30 |    |
| P7 | P8 | 1 | 10 | 30 |
|    | P9 | 0 | 30 |    |
The following table gives the decision of all objects:
Table 7: Complete Decision Table for All Objects

| U/A | a  | b  | c  | d  | D  |
| p1  | 10 | 20 | 10 | 20 | 30 |
| p2  | 10 | 20 | 10 | 10 | 20 |
| p3  | 20 | 20 | 10 | 20 | 30 |
| p4  | 20 | 20 | 10 | 10 | 20 |
| p5  | 30 | 20 | 10 | 20 | 30 |
| p6  | 30 | 20 | 10 | 10 | 20 |
| p7  | 10 | 20 | 20 | 20 | 30 |
| p8  | 10 | 20 | 20 | 10 | 10 |
| p9  | 20 | 20 | 20 | 20 | 30 |
| p10 | 20 | 20 | 20 | 10 | 30 |
| p11 | 30 | 20 | 20 | 20 | 30 |
| p12 | 30 | 20 | 20 | 10 | 30 |
| p13 | 10 | 10 | 10 | 20 | 30 |
| p14 | 10 | 10 | 10 | 10 | 20 |
| p15 | 20 | 10 | 10 | 20 | 30 |
| p16 | 20 | 10 | 10 | 10 | 20 |
| p17 | 30 | 10 | 10 | 20 | 30 |
| p18 | 30 | 10 | 10 | 10 | 30 |
| p19 | 10 | 10 | 20 | 20 | 30 |
| p20 | 10 | 10 | 20 | 10 | 10 |
| p21 | 20 | 10 | 20 | 20 | 30 |
| p22 | 20 | 10 | 20 | 10 | 10 |
| p23 | 30 | 10 | 20 | 20 | 30 |
| p24 | 30 | 10 | 20 | 10 | 10 |
Table 8: The Optician's Decisions Data Set after Converting Attribute Values into Another Numerical Coding

| U/A | a   | b   | c   | d   | D   |
| p1  | 150 | 100 | 150 | 150 | ?   |
| p2  | 150 | 100 | 150 | 100 | 100 |
| p3  | 100 | 100 | 150 | 150 | 50  |
| p4  | 100 | 100 | 150 | 100 | 100 |
| p5  | 50  | 100 | 150 | 150 | 50  |
| p6  | 50  | 100 | 150 | 100 | 100 |
| p7  | 150 | 100 | 100 | 150 | ?   |
| p8  | 150 | 100 | 100 | 100 | 150 |
| p9  | 100 | 100 | 100 | 150 | 50  |
| p10 | 100 | 100 | 100 | 100 | 50  |
| p11 | 50  | 100 | 100 | 150 | 50  |
| p12 | 50  | 100 | 100 | 100 | 50  |
| p13 | 150 | 150 | 150 | 150 | ?   |
| p14 | 150 | 150 | 150 | 100 | ?   |
| p15 | 100 | 150 | 150 | 150 | 50  |
| p16 | 100 | 150 | 150 | 100 | 100 |
| p17 | 50  | 150 | 150 | 150 | 50  |
| p18 | 50  | 150 | 150 | 100 | 50  |
| p19 | 150 | 150 | 100 | 150 | ?   |
| p20 | 150 | 150 | 100 | 100 | ?   |
| p21 | 100 | 150 | 100 | 150 | 50  |
| p22 | 100 | 150 | 100 | 100 | 150 |
| p23 | 50  | 150 | 100 | 150 | 50  |
| p24 | 50  | 150 | 100 | 100 | 150 |
If the condition attribute values are symbolic, then we must convert them into integers according to the order of the symbols. For example, if there are three values, high, medium and low, then we can convert them into 3, 2 and 1, respectively.
Remark (1):
By converting the information table (Table 1) into another numerical coding, as in Table 8, we get the same prediction result. This means that our method is independent of the numerical coding; i.e., if you apply different numerical codings to the information system table, you will get the same prediction result.
V. Conclusion
By calculating the distance function between the complete decision table and the incomplete decision table, we can assign a decision to missing values according to the algorithm explained in Section 4. When the smallest distance is repeated for more than one object, we eliminate a condition attribute which has a small effect on the information system, then calculate the distance function again and reapply the algorithm.
Acknowledgement
The author would like to thank Prof. Dr. A. M. Kozae, for his encouragement and support, and sincerely thank the anonymous reviewers whose comments have greatly helped clarify and improve this paper.
References

[1] Bazan J., "A comparison of dynamic and non-dynamic rough set methods for extracting laws from decision tables", in Rough Sets in Knowledge Discovery, Physica-Verlag, 1998.
[2] Own H.S., Hassanien A.E., "Rough wavelet hybrid image classification scheme", JCIT, Vol. 3, No. 4, pp. 65~75, 2008.
[3] Hu K.Y., Lu Y.C., Shi C.Y., "Feature ranking in rough sets", AI Commun., Vol. 16, No. 1, pp. 41~50, 2003.
[4] Hu X., Cercone N., Han J., Ziarko W., "GRS: a generalized rough sets model", in Data Mining, Rough Sets and Granular Computing, T.Y. Lin, Y.Y. Yao and L. Zadeh (eds.), Physica-Verlag, pp. 447~460, 2002.
[5] Lin J.-C., Wu K.-C., "Using rough set and fuzzy method to discover the effects of acid rain on the plant growth", JCIT, Vol. 2, No. 1, 2007.
[6] Komorowski J., Ohrn A., "Modelling prognostic power of cardiac tests using rough sets", Artif. Intell. Med., Vol. 15, pp. 167~191, 1999.
[7] Lashin E.F., Kozae A.M., Abo Khadra A.A., Medhat T., "Rough set theory for topological spaces", International Journal of Approximate Reasoning, Vol. 40, No. 1-2, pp. 35~43, 2005.
[8] Li G.Z., Yang J., Ye C.Z., Geng D.Y., "Degree prediction of malignancy in brain glioma using support vector machines", Comput. Biol. Med., Vol. 36, No. 3, pp. 313~325, 2006.
[9] Lin T.Y., "Granular computing on binary relations I: data mining and neighborhood systems, II: rough set representations and belief functions", in Rough Sets in Knowledge Discovery, Lin T.Y., Polkowski L., Skowron A. (eds.), Physica-Verlag, Heidelberg, pp. 107~140, 1998.
[10] Lin T.Y., Yao Y.Y., Zadeh L.A. (eds.), Rough Sets, Granular Computing and Data Mining, Physica-Verlag, Heidelberg, 2002.
[11] Medhat T., "Missing values via covering rough sets", IJMIA: International Journal on Data Mining and Intelligent Information Technology Applications, Vol. 2, No. 1, pp. 10~17, 2012.
[12] Pawlak Z., "Rough set approach to multi-attribute decision analysis", European Journal of Operational Research, Vol. 72, No. 3, pp. 443~459, 1994.
[13] Pawlak Z., Rough Sets: Theoretical Aspects of Reasoning about Data, Kluwer Academic Publishers, Dordrecht, Boston, London, 1991.
[14] Ding S., Zhang Y., Xu L., Qian J., "A feature selection algorithm based on tolerant granule", JCIT, Vol. 6, No. 1, pp. 191~195, 2011.
[15] Tsumoto S., "Mining diagnostic rules from clinical databases using rough sets and medical diagnostic model", Inform. Sci., Vol. 162, pp. 65~80, 2004.