Recognition and Classification of Human Behavior in Intelligent Surveillance Systems using Hidden Markov Model
Authors: Adeleh Farzad, Rahebeh Niaraki Asli
Journal: International Journal of Image, Graphics and Signal Processing (IJIGSP)
Issue: Vol. 7, No. 12, 2015.
Free access
The analysis of human behavior by computer vision techniques has attracted considerable research interest. Automatic recognition of actions in video allows automation of many otherwise manually intensive tasks such as video surveillance. Video surveillance systems for elderly care and behavior analysis play an important role in caring for elderly, sick or bedridden people. In this paper, we propose a high-accuracy method for human action classification and recognition using a hidden Markov model classifier. In our approach, we first use the star skeleton feature extraction method to extract the extremities of the human body silhouette and produce feature vectors as inputs to the hidden Markov model classifier. Then the hidden Markov model, which is trained and used in our proposed surveillance system, classifies the investigated behaviors and detects abnormal actions with higher accuracy than the abnormal-behavior detection methods reported in previous works. The accuracy of about 94% obtained from the confusion matrix confirms the efficiency of the proposed method compared with its counterparts for abnormal action detection.
Video surveillance, human action recognition, star skeleton method, feature extraction, hidden Markov model
Short URL: https://sciup.org/15013933
IDR: 15013933
Published Online November 2015 in MECS
Human action recognition and classification methods have many practical applications. Video surveillance is one of the most attractive, and it is applied in intelligent supervision systems in banks, parking lots and smart buildings [1, 2]. Human-machine interaction for command and communication is another important application, realized by techniques such as speech recognition [3] and hand gesture classification [4]. Processing the video frames from security cameras in order to monitor and recognize abnormal behaviors yields an automatic care monitoring system that acts as a human action recognizer. Moreover, the number of elderly and sick people who live alone and need continuous monitoring is increasing, so intelligent systems are useful and necessary for permanent monitoring of the elderly. Several factors are vital to the efficiency of an action recognition system, such as detection time, the background of the location, the abnormal conditions and the number of people in the environment of interest. The significance of each factor for the object of study, together with the type of action or behavior, determines the type of recognition and classification. For instance, for partial-body behaviors, only the upper part of the body is used for hand gesture recognition [5].
Human behavior analysis from a captured video requires a pre-processing step that includes foreground and background detection and tracking of the individual in consecutive frames. The other major steps are feature extraction, selection of a suitable classifier or model, and finally classification, identification and authentication based on the extracted features. The first step in detecting the behavior of an object is to identify the movement of the object in the image and segment it. The best-known strategy for moving object detection is background subtraction [6]. A simple form of background subtraction compares each frame of the video with a static background. After the pre-processing step, an automatic recognition system includes two fundamental stages: the first stage extracts features from the input frame and the second stage classifies the actions [7]. Feature extraction and the creation of a suitable feature vector are among the most important steps in the behavior analysis process, because they generate the primitive data for the classifier. There is a wide selection of feature extraction methods in human action recognition, such as the blob method [2] and the edge-based method [8]. Furthermore, the distances from the extremities of the human contour to its centroid are conceptual features extracted by the star skeleton method [9]. Low computational complexity and low sensitivity to resizing are among the advantages of the star skeleton method. In a recognition system, a sequence of images represents the action and, independent of the feature extraction method, the system produces a feature vector and converts it to a symbol that can be processed by a classification method [10]. Human action classification, for which different strategies have been presented in previous studies, follows the feature extraction step. K-nearest neighbors (KNN) is a simple and useful classifier that is easy to interpret and does not require assumptions about the data [11].
A drawback of KNN classification is its high computational time in the learning procedure. Choosing a good value of k is another problem, and it has to be set through repeated simulations. In Ref. [12], the K-Means algorithm extracts features and KNN classifies different actions. The support vector machine (SVM) is another classification method [13] that has shown good performance in recent years in comparison with older methods. Although SVM is generally used in two-class problems, it can solve multi-class problems by using the one-versus-one or one-versus-all strategy [14]. The hidden Markov model (HMM) presented in [15] is a high-precision classification method with additional computational load. In this paper we use HMM as a high-accuracy classification method in our proposed surveillance system.
The remainder of the paper is organized as follows: Section 2 overviews the related work. Section 3 is a brief review of the hidden Markov model. Section 4 describes the principle of our proposed surveillance system. The simulation results and comparison are presented in Section 5 and, finally, the paper is concluded in Section 6.
-
II. Related Work
Many different approaches to action recognition have been proposed over the past two decades [16]. These studies have different applications according to the variety of behaviors. Sensors and cameras are widely used for surveillance applications. In some studies, the acceleration obtained from sensors is used for human action recognition, for example in the care of elderly people in smart homes [17, 18]. The main disadvantage of acceleration-based methods is that a person must wear a particular sensor or device, or stay in a particular place. The other method is video surveillance, in which one or more cameras are used in different locations for human behavior recognition. This type of supervision has been used to recognize different human behaviors, such as care-related behavior of elderly people and abnormal or criminal behaviors, indoors or outdoors. Ref. [19] presents a particle video-based abnormal behavior detection method in which a hidden Markov model is used to detect abnormal behavior of small groups. An automated video surveillance system for crime scene detection using statistical characteristics is presented in [20]. If the scene shows a peculiar situation such as purse snatching, kidnapping or fighting on the street, the surveillance system recognizes the situation and automatically reports it to an agency. Another application of video surveillance is the analysis of elderly people's behavior in emergencies. In Ref. [21], the recognition of abnormal human activities such as falling, chest pain, fainting, vomiting and headache is studied. The proposed system model presents a novel combination of the R transform and principal component analysis (PCA) for abnormal activity recognition. A hidden Markov model (HMM) is applied to the extracted features for training and activity recognition. Ref. [22] presents a method for human fall detection based on a combination of the eigenspace technique and integrated time motion images (ITMI).
The eigenspace technique is applied to ITMI to extract eigen-motion, and a multi-class SVM then classifies the motion and determines a fall event. Ref. [23] proposes a method to detect falls based on a combination of motion history and human shape variation. Ref. [24] presents an HMM classifier for behavior understanding from video streams in a nursing center. To extract an activity from a video stream, it is necessary to detect the foreground objects and extract image features. Based on the extracted foreground pixels, a posture is represented by a pair of histogram projections in the horizontal and vertical directions. The motion computed from the motion history map (MHS) is also used as a feature in determining the activity, and a duration-like HMM is adopted for activity feature extraction. Ref. [25] presents a novel method to detect various posture-based events in a typical elderly monitoring application in a home surveillance scenario. The combination of a best-fit approximated ellipse around the human body, the horizontal and vertical velocities of movement and the temporal changes of the centroid point provides a useful cue for detecting different behaviors. The extracted feature vectors are finally fed to a fuzzy multi-class support vector machine for precise classification of motions and determination of a fall event.
In this paper, our focus is the classification and recognition of abnormal human behaviors. For this purpose, the extremities are identified with sufficient accuracy in the feature extraction step according to the center of gravity and the position of the body, and the final feature vector is produced. Then the HMM classifies the behavior according to the extracted features, and finally abnormal behaviors are detected.
-
III. A Brief Review on Hidden Markov Model
The hidden Markov model is a powerful model for recognizing random events and dynamic processes [15]. The ability to be trained is one of the most important properties of an HMM. In the training process, we apply a set of sequential data to the HMM and estimate its parameters. In this paper, we use a discrete HMM for classification and recognition of human behavior.
A discrete HMM consists of a number of states, each of which is assigned a probability of transition to every other state. The state of the system at time $t$ is denoted by $q_t$ ($t = 1, 2, \ldots$). As time passes, the states change stochastically. As in Markov models, the state at any time depends only on the state at the preceding time. In a discrete HMM, one symbol is emitted from the current state according to the output probabilities assigned to that state. HMM states are not directly visible and can be observed only through a sequence of observed symbols [26]. To describe a discrete HMM, the following notation is defined [15]:
$N$ = number of states in the model.
$V = \{v_1, v_2, \ldots, v_M\}$: set of possible output symbols.
$M$ = number of observation symbols.
$Q = \{q_1, q_2, \ldots, q_N\}$: set of states.
The state transition probability matrix is denoted by $A = \{a_{ij}\}$ and is given by equation (1):
$$a_{ij} = P[q_{t+1} = j \mid q_t = i], \quad 1 \le i, j \le N \tag{1}$$
where $a_{ij}$ is the probability of transition from state $i$ to state $j$.
The symbol output probability matrix is denoted by $B = \{b_j(k)\}$ and is given by equation (2):
$$b_j(k) = P[o_t = v_k \mid q_t = j], \quad 1 \le j \le N,\ 1 \le k \le M \tag{2}$$
where $O = (o_1, o_2, \ldots, o_T)$ is the sequence of observations, $o_t$ is the output at time $t$ and $T$ is the total number of observations.
The initial state probability vector is denoted by $\pi = \{\pi_i\}$ and is given by equation (3):
$$\pi_i = P[q_1 = i], \quad 1 \le i \le N \tag{3}$$
The model is completely defined once values are given for all of the above parameters. An HMM $\lambda$ can therefore be represented by a set of three matrices, as in equation (4):
$$\lambda = (A, B, \pi) \tag{4}$$
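As a minimal sketch of how a discrete HMM $\lambda = (A, B, \pi)$ generates an observation sequence, the Python snippet below draws a hidden state path and the symbols emitted along it. The function name `sample_sequence`, the `seed` parameter and the toy matrices are illustrative assumptions and do not come from the paper.

```python
import random


def sample_sequence(A, B, pi, length, seed=0):
    """Draw (states, symbols) of the given length from a discrete HMM.

    A[i][j]: transition probability from state i to state j
    B[j][k]: probability of emitting symbol k while in state j
    pi[i]:   probability of starting in state i
    """
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable

    def draw(probs):
        # Pick an index according to a discrete probability distribution.
        return rng.choices(range(len(probs)), weights=probs)[0]

    states, symbols = [], []
    state = draw(pi)
    for _ in range(length):
        states.append(state)
        symbols.append(draw(B[state]))  # only this symbol is observable
        state = draw(A[state])          # move to the next hidden state
    return states, symbols


# Toy model with N = 2 states and M = 3 symbols (hypothetical numbers).
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
pi = [0.6, 0.4]
states, symbols = sample_sequence(A, B, pi, length=10)
```

Only `symbols` would be available to a recognizer; `states` is returned here purely to make the hidden part of the model visible.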
A. Recognition and Training Using HMM
To identify observed symbol sequences, we build one HMM for each category. For a classifier of $C$ categories, we choose the model that best matches the observations from the $C$ HMMs $\lambda_i = \{A_i, B_i, \pi_i\}$, $i = 1, \ldots, C$. Accordingly, for a sequence of unknown category, we calculate $\Pr(\lambda_i \mid O)$ for each HMM $\lambda_i$ and select $\lambda_{c^*}$, where
$$c^* = \arg\max_i \Pr(\lambda_i \mid O) \tag{5}$$
Given the observation sequence $O = (o_1, o_2, \ldots, o_T)$ and the HMM $\lambda_i$, Bayes' rule states that $\Pr(\lambda_i \mid O)$ is proportional to $\Pr(O \mid \lambda_i) \Pr(\lambda_i)$, so the problem is to evaluate $\Pr(O \mid \lambda_i)$, the probability that the sequence was produced by HMM $\lambda_i$. This probability is calculated by using the forward algorithm [27]. The forward algorithm is defined as follows:
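$$\alpha_t(i) = P(o_1, o_2, \ldots, o_t, q_t = i \mid \lambda) \tag{6}$$
$\alpha_t(i)$ is called the forward variable and is calculated recursively as follows:
$$\alpha_1(i) = \pi_i b_i(o_1), \quad 1 \le i \le N \tag{7}$$
$$\alpha_{t+1}(j) = \left[ \sum_{i=1}^{N} \alpha_t(i) a_{ij} \right] b_j(o_{t+1}), \quad 1 \le t \le T-1,\ 1 \le j \le N \tag{8}$$
$$P(O \mid \lambda) = \sum_{i=1}^{N} \alpha_T(i) \tag{9}$$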
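The forward recursion and the maximum-likelihood choice among class models can be sketched in a few lines of plain Python. This is an illustrative sketch rather than the authors' implementation: the function names `forward_likelihood` and `classify` and the two toy models are assumptions, and the class priors are taken to be equal so that comparing $P(O \mid \lambda_i)$ is enough.

```python
def forward_likelihood(A, B, pi, obs):
    """Return P(O | lambda) for a discrete HMM via the forward algorithm."""
    n_states = len(pi)
    # Initialization: alpha_1(i) = pi_i * b_i(o_1)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n_states)]
    # Induction: alpha_{t+1}(j) = (sum_i alpha_t(i) * a_ij) * b_j(o_{t+1})
    for o in obs[1:]:
        alpha = [
            sum(alpha[i] * A[i][j] for i in range(n_states)) * B[j][o]
            for j in range(n_states)
        ]
    # Termination: P(O | lambda) = sum_i alpha_T(i)
    return sum(alpha)


def classify(models, obs):
    """Return the index of the model with the highest likelihood for obs."""
    scores = [forward_likelihood(A, B, pi, obs) for (A, B, pi) in models]
    return max(range(len(models)), key=lambda c: scores[c])


# Two toy 2-state, 2-symbol class models (hypothetical numbers).
model_0 = ([[0.9, 0.1], [0.1, 0.9]], [[0.9, 0.1], [0.8, 0.2]], [0.5, 0.5])
model_1 = ([[0.9, 0.1], [0.1, 0.9]], [[0.1, 0.9], [0.2, 0.8]], [0.5, 0.5])
models = [model_0, model_1]
predicted = classify(models, [0, 0, 0])
```

For long observation sequences the raw products underflow, so a practical version would typically rescale `alpha` at each step or work with log-probabilities; that refinement is left out here to keep the recursion visible.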
We calculate the likelihood of each HMM using equation (9) and select the most likely HMM as the recognition result. In the learning stage, each HMM must be trained so that it is the model most likely to produce the symbol patterns of its category. Training an HMM means optimizing the parameters ($A$, $B$, $\pi$) of the model to maximize the probability of the observation sequence $\Pr(O \mid \lambda)$. The Baum-Welch algorithm is used for this estimation. Before defining the Baum-Welch algorithm, we need to define a number of variables:
$$\beta_t(i) = P(o_{t+1}, o_{t+2}, \ldots, o_T \mid q_t = i, \lambda) \tag{10}$$
$\beta_t(i)$ is called the backward variable and can also be solved inductively, in a manner similar to that used for the forward variable $\alpha_t(i)$:
$$\beta_T(i) = 1, \quad 1 \le i \le N \tag{11}$$
$$\beta_t(i) = \sum_{j=1}^{N} a_{ij} b_j(o_{t+1}) \beta_{t+1}(j), \quad t = T-1, T-2, \ldots, 1,\ 1 \le i \le N \tag{12}$$
$$P(O \mid \lambda) = \sum_{i=1}^{N} \pi_i b_i(o_1) \beta_1(i) \tag{13}$$
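The backward recursion can be sketched in the same way. The snippet below is a hedged illustration with assumed names (`backward_likelihood`) and toy parameters; it evaluates $P(O \mid \lambda)$ from the backward variable, which should agree with the value obtained from the forward variable.

```python
def backward_likelihood(A, B, pi, obs):
    """Return P(O | lambda) for a discrete HMM via the backward algorithm."""
    n_states = len(pi)
    # Initialization: beta_T(i) = 1
    beta = [1.0] * n_states
    # Induction for t = T-1, ..., 1:
    # beta_t(i) = sum_j a_ij * b_j(o_{t+1}) * beta_{t+1}(j)
    for o_next in reversed(obs[1:]):
        beta = [
            sum(A[i][j] * B[j][o_next] * beta[j] for j in range(n_states))
            for i in range(n_states)
        ]
    # Termination: P(O | lambda) = sum_i pi_i * b_i(o_1) * beta_1(i)
    return sum(pi[i] * B[i][obs[0]] * beta[i] for i in range(n_states))


# Toy 2-state, 2-symbol model (hypothetical numbers).
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.5], [0.1, 0.9]]
pi = [0.6, 0.4]
likelihood = backward_likelihood(A, B, pi, [0, 1, 1])
```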
To determine the optimal sequence of states it is required to define a variable named γ as follows:
$$\gamma_t(i) = P(q_t = i \mid O, \lambda) = \frac{P(O, q_t = i \mid \lambda)}{P(O \mid \lambda)} = \frac{P(O, q_t = i \mid \lambda)}{\sum_{i=1}^{N} P(O, q_t = i \mid \lambda)}$$
This equation can be summarized as follows:
$$\gamma_t(i) = \frac{\alpha_t(i) \beta_t(i)}{\sum_{i=1}^{N} \alpha_t(i) \beta_t(i)}$$
Finally, the Baum-Welch algorithm is defined in terms of the following variable:
$$\xi_t(i, j) = P(q_t = i, q_{t+1} = j \mid O, \lambda) = \frac{\alpha_t(i) a_{ij} b_j(o_{t+1}) \beta_{t+1}(j)}{P(O \mid \lambda)}$$
Using these equations, the HMM parameters $\lambda$ can be improved to $\bar{\lambda}$. The re-estimation equations from $\lambda = (A, B, \pi)$ to $\bar{\lambda} = (\bar{A}, \bar{B}, \bar{\pi})$ are:
$$\bar{\pi}_i = \gamma_1(i)$$
$$\bar{a}_{ij} = \frac{\sum_{t=1}^{T-1} \xi_t(i, j)}{\sum_{t=1}^{T-1} \gamma_t(i)}$$
$$\bar{b}_j(k) = \frac{\sum_{t=1,\ o_t = v_k}^{T} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}$$
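To make the re-estimation concrete, the following sketch performs one Baum-Welch update for a single observation sequence, using the standard forms of $\gamma$, $\xi$ and the re-estimation formulas from Rabiner's tutorial [15]. It is a simplified illustration under stated assumptions: the function names, the toy parameters and the restriction to one training sequence (the paper trains on several videos per class) are not taken from the paper.

```python
def forward(A, B, pi, obs):
    """Return the list of forward vectors alpha_1, ..., alpha_T."""
    n = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(n)]]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[i] * A[i][j] for i in range(n)) * B[j][o]
                      for j in range(n)])
    return alpha


def backward(A, B, obs):
    """Return the list of backward vectors beta_1, ..., beta_T."""
    n = len(A)
    beta = [[1.0] * n]
    for o_next in reversed(obs[1:]):
        nxt = beta[0]
        beta.insert(0, [sum(A[i][j] * B[j][o_next] * nxt[j] for j in range(n))
                        for i in range(n)])
    return beta


def baum_welch_step(A, B, pi, obs):
    """One re-estimation step; returns (new_A, new_B, new_pi, P(O | lambda))."""
    n, m, T = len(pi), len(B[0]), len(obs)
    alpha, beta = forward(A, B, pi, obs), backward(A, B, obs)
    likelihood = sum(alpha[-1])
    # gamma_t(i): probability of being in state i at time t given O
    gamma = [[alpha[t][i] * beta[t][i] / likelihood for i in range(n)]
             for t in range(T)]
    # xi_t(i, j): probability of the transition i -> j at time t given O
    xi = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / likelihood
            for j in range(n)] for i in range(n)] for t in range(T - 1)]
    new_pi = gamma[0][:]
    new_A = [[sum(xi[t][i][j] for t in range(T - 1))
              / sum(gamma[t][i] for t in range(T - 1))
              for j in range(n)] for i in range(n)]
    new_B = [[sum(gamma[t][j] for t in range(T) if obs[t] == k)
              / sum(gamma[t][j] for t in range(T))
              for k in range(m)] for j in range(n)]
    return new_A, new_B, new_pi, likelihood


# Toy starting model and observation sequence (hypothetical numbers).
A = [[0.6, 0.4], [0.3, 0.7]]
B = [[0.7, 0.3], [0.4, 0.6]]
pi = [0.5, 0.5]
obs = [0, 0, 1, 0, 1, 1, 0, 0]
new_A, new_B, new_pi, likelihood_before = baum_welch_step(A, B, pi, obs)
```

Calling `baum_welch_step` repeatedly, each time on the parameters returned by the previous call, gives the usual iterative training loop; the returned likelihood can be monitored to decide when to stop.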
-
IV. The Principle of Our Proposed Surveillance System
Overall, our proposed surveillance system includes several steps, some of which are pre-processing steps and others main steps. Fig. 1 shows the procedure of our proposed surveillance system for human action detection. As shown in the figure, we first apply a background subtraction algorithm to the input data to extract the silhouette of the body by foreground and background detection. Then we extract the extremities by the star skeleton method. For this purpose, we calculate the centroid of the contour and compute the distance from the centroid to each point on the contour, traversing the contour counter-clockwise. We then find the local maxima of the resulting distance sequence and analyze the distance diagram to extract the important points, or extremities, of the human contour. To produce the feature vector, we describe the extremities in a polar coordinate system. We place the origin of the polar coordinate system at the centroid of the contour and produce the feature vector from the positions of the points in each division, as shown in Fig. 2.
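The extremity search described above can be sketched as follows. This is a simplified illustration, not the authors' code: it assumes the contour is already available as an ordered list of (x, y) points, takes the centroid as the mean of the contour points, and smooths the distance sequence with a small circular moving average (the `window` parameter is an assumption) before picking local maxima.

```python
import math


def star_skeleton_extremities(contour, window=3):
    """Return (centroid, extremity points) of an ordered closed contour.

    contour: list of (x, y) points in traversal order
    window:  odd width of the circular moving average applied to the distances
    """
    n = len(contour)
    # Centroid of the contour points.
    cx = sum(x for x, _ in contour) / n
    cy = sum(y for _, y in contour) / n
    # Distance from the centroid to every contour point, in traversal order.
    dist = [math.hypot(x - cx, y - cy) for x, y in contour]
    # Circular moving average to suppress small noisy peaks.
    half = window // 2
    smooth = [
        sum(dist[(i + k) % n] for k in range(-half, half + 1)) / (2 * half + 1)
        for i in range(n)
    ]
    # Local maxima of the smoothed distance sequence (treated as cyclic).
    peaks = [
        i for i in range(n)
        if smooth[i] > smooth[i - 1] and smooth[i] >= smooth[(i + 1) % n]
    ]
    return (cx, cy), [contour[i] for i in peaks]


# Synthetic four-lobed contour sampled counter-clockwise (not real silhouette data).
contour = [
    ((10 + 5 * math.cos(4 * t)) * math.cos(t),
     (10 + 5 * math.cos(4 * t)) * math.sin(t))
    for t in (2 * math.pi * k / 64 for k in range(64))
]
centroid, extremities = star_skeleton_extremities(contour)
```

On a real silhouette the contour would come from the background subtraction step, and a wider smoothing window would usually be needed to reject spurious peaks.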
Although the Baum-Welch algorithm does not always find the global maximum, it finds a local maximum of $\Pr(O \mid \lambda)$.
Fig.1. Human Action Recognition Procedures (flowchart, in two parts: input video; human body silhouette extracting; feature extraction: find extremity points, produce feature vector by extremity points; classification by HMM)
Fig.2. The Extremity Points of Human Body Silhouette in Star Skeleton Method
We choose eight angle divisions and three length divisions, so the feature vector has a length of 24. The number of extremity points counted in each angle and length division forms the final feature vector of the body contour. Overall, we obtain one feature vector for each video frame, and a time sequence by converting these feature vectors to discrete symbols. HMM is an effective method for analyzing such sequences. We apply the leave-one-out method in the training step: in each round, one of the video samples is chosen as the test sample and the others are used for training, and the test sample changes from round to round. The counts of extremity points stored in the feature vectors are used as the input of the training stage, in which the HMM parameters are trained. We use the Baum-Welch algorithm to fit suitable HMM parameters for each behavior class from the feature vectors. After training the HMM parameters for each class in each round of the leave-one-out method, we test a new sample of behavior. For this purpose, we implement a function that takes a new video as input, compares the features extracted from its frames with the trained HMM of each class or action, and obtains a probability for each class. Finally, the system selects the class that best matches the observed action, and this class is taken as the classification result for the action.
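The 24-dimensional feature vector can be illustrated with a short sketch that counts extremity points in 8 angular by 3 radial bins around the centroid. The paper does not state how the radial divisions are scaled, so the normalizing radius `r_max`, the bin ordering and the function name are assumptions made only for this example.

```python
import math


def polar_feature_vector(centroid, points, r_max, n_angles=8, n_lengths=3):
    """Count points per (angle, length) bin in polar coordinates.

    centroid: (x, y) origin of the polar coordinate system
    points:   extremity points as (x, y) pairs
    r_max:    radius mapped to the outer edge of the last length division
    """
    cx, cy = centroid
    feature = [0] * (n_angles * n_lengths)
    for x, y in points:
        # Angle in [0, 2*pi), measured counter-clockwise from the x axis.
        angle = math.atan2(y - cy, x - cx) % (2 * math.pi)
        a_bin = min(int(angle / (2 * math.pi) * n_angles), n_angles - 1)
        # Radial bin; points beyond r_max fall into the outermost division.
        radius = math.hypot(x - cx, y - cy)
        r_bin = min(int(radius / r_max * n_lengths), n_lengths - 1)
        feature[a_bin * n_lengths + r_bin] += 1
    return feature


# Three hypothetical extremity points around a centroid at the origin.
feature = polar_feature_vector(
    (0.0, 0.0), [(1.0, 0.5), (-0.5, 2.5), (0.2, -0.5)], r_max=3.0
)
```

Each frame's vector would then be mapped to a discrete symbol before being passed to the HMM; the mapping itself (for example a codebook lookup) is outside this sketch.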
-
V. The Simulation Results of Classification for Behavioral Surveillance Detection
In this paper, we focus on behaviors that are relevant to elderly care. To examine our method, we collected the dataset shown in Fig. 3. We consider a set of five actions performed by several persons: falling from the bed, falling from the chair, collapsing, sitting and bending. For every action we examine seven different samples, so altogether we use 35 different video sequences. As stated above, our surveillance system selects the class that best matches the observed action and reports this class as the result. The simulations were carried out on a computer with Windows 7 (x64), a Core i5 processor at 2.13 GHz and 4 GB of RAM.
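The leave-one-out protocol applied to these 35 sequences can be sketched as a simple generator of train/test splits. The sample identifiers below are placeholders invented for the example; only the counts (five actions, seven samples each) come from the text.

```python
def leave_one_out_splits(samples):
    """Yield (training_samples, test_sample) pairs, one per sample."""
    for i in range(len(samples)):
        yield samples[:i] + samples[i + 1:], samples[i]


# 5 actions x 7 samples = 35 labeled sequences (placeholder identifiers).
dataset = [
    (f"action{a}_sample{s}", a) for a in range(1, 6) for s in range(1, 8)
]
splits = list(leave_one_out_splits(dataset))
```

In each split, one HMM per action would be trained on the 34 training sequences and the held-out sequence would be classified by the most likely model.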
Fig. 4 shows the recognition accuracy of our system for the different samples of each action. As shown in Fig. 4, the system recognizes actions 1, 4 and 5 correctly in all cases, but it makes mistakes in recognizing actions 2 and 3, because these actions are similar to each other.


Fig.3. Our Dataset for Caring Behaviors Including Five Actions
Fig.4. Our Proposed System Recognition Accuracy for Different Samples of Each Action (horizontal axis: data samples)
Table 1 shows the accuracy of our surveillance system as a confusion matrix. In addition, Fig. 5 shows a color-coded bar chart of the correct and incorrect detection rates for each action, derived from the confusion matrix. Overall, the results show 94% accuracy in correct action detection. The proposed method works as a surveillance system: when an input sequence is checked and its features are, with high probability, similar to those of an abnormal behavior such as falling from the bed, falling from the chair or collapsing, the action is labeled as abnormal behavior and an alarm is activated.
To show the efficiency of the proposed approach, we compare our surveillance method, with 94% accuracy, to its counterparts [24, 25], which are briefly introduced in the related work section. The method of Ref. [24] is based on a duration-like HMM classification approach, and its test stimuli are similar to ours. For abnormal behavior detection, that approach reports 90% accuracy. The multi-class SVM approach of Ref. [25] reports 88.8% accuracy for abnormal behavior detection.
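As a quick check of the 94% figure, the overall accuracy can be recomputed from the per-action correct detection rates reported in Table 1. The calculation below assumes equal weight per action, which matches the seven samples per action described earlier; the variable names are illustrative.

```python
# Correct detection rate per action in percent (diagonal of Table 1).
per_action_correct = {
    "Action1": 100.0,
    "Action2": 85.71,
    "Action3": 85.71,
    "Action4": 100.0,
    "Action5": 100.0,
}

# With the same number of samples per action, overall accuracy is the mean.
overall_accuracy = sum(per_action_correct.values()) / len(per_action_correct)

# Equivalent count of correctly classified videos (7 samples per action).
samples_per_action = 7
correct_videos = round(
    sum(rate / 100 * samples_per_action for rate in per_action_correct.values())
)
```

The mean comes to about 94.28%, which corresponds to 33 of the 35 videos being classified correctly.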
Table 1. The Confusion Matrix of the Dataset (per-action detection rates, %)
Action | Correct detection | Incorrect detection
Action1 | 100 | 0
Action2 | 85.71 | 14.29
Action3 | 85.71 | 14.29
Action4 | 100 | 0
Action5 | 100 | 0

Fig.5. Correct and Incorrect Detection Accuracy for Each Action (horizontal axis: actions; bars color-coded for Action1 to Action5)
-
VI. Conclusions
In this paper, we propose a high-accuracy behavioral surveillance system for elderly care. Abnormal behavior detection in our proposed system is based on HMM classification. After the pre-processing steps, the star skeleton method is used to extract body features and extremities. Then suitable feature vectors are generated in a polar coordinate system. Finally, this information is applied to the input of the HMM classifier, which is able to detect and label each input action. The simulation results show the efficiency of our method in correctly detecting five different actions as well as abnormal behavior. The accuracy of our method in elderly care surveillance is 94%, which is an improvement over the previous similar works presented in Refs. [24] and [25], with 90% and 88.8% accuracy, respectively.
References
- Teddy Ko, A Survey on Behavior Analysis in Video Surveillance for Homeland Security Applications, 37th IEEE Applied Imagery Pattern Recognition Workshop, PP. 1-8, April 2008.
- S. Mhatre, S. Varma, R. Nikhare, Visual Surveillance Using Absolute Difference Motion Detection, International Conference on Technologies for Sustainable Development (ICTSD), pp. 1-5, 2015.
- Hajer Rahali, Zied Hajaiej, Noureddine Ellouze, Robust Features for Speech Recognition using Temporal Filtering Technique in the Presence of Impulsive Noise, International Journal of Image, Graphics and Signal Processing (IJIGSP), Vol.6, No.11, pp. 17-24, October 2014.
- S. Rautaray, A. Agrawal, Real Time Multiple Hand Gesture Recognition System for Human Computer Interaction, International Journal of Intelligent Systems and Applications(IJISA), Vol. 4, No. 5, pp. 56-64, May 2012.
- J. Huang, S. Hsu, C. Huang, Human Upper Body Posture Recognition and Upper Limbs Motion Parameters Estimation, IEEE Signal and Information Processing Association Annual Summit and Conference, pp. 1-9, 2013.
- Shahrizat Shaik Mohamed, Nooritawati MdTahir, Ramli Adnan, Background Modeling and Background Subtraction Performance for Object Detection, 6th International Colloquium on Signal Processing and Its Applications (CSPA), pp.1-6, 2010.
- Al Mansur, Yasushi Makihara and Yasushi Yagi, Action Recognition using Dynamics Features, International Conference on Robotics and Automation, pp. 4020 - 4025, 2011.
- Chun-Hua Hu, Song-Lin Wo, An efficient method of human behavior recognition in smart environments, International Conference on Computer Application and System Modeling (ICCASM), Vol. 12, PP. 690-693, 2010.
- Xin Yuan, Xubo Yang, A Robust Human Action Recognition System using Single Camera, International Conference on Computational Intelligence and Software Engineering, pp.1-4, 2009.
- Chih-Chiang Chen, Jun-Wei Hsieh, Yung-Tai Hsu, Chuan-Yu Huang, Segmentation of Human Body Parts Using Deformable Triangulation, 18th International Conference on Pattern Recognition (ICPR'06), Vol.1, PP. 355 - 358, 2006.
- M.A. Wajeed, T. Adilakshami, Semi-supervised text classification using enhanced KNN algorithm, World Congress on Information and Communication Technologies (WICT), PP. 138-142, 2011.
- Sarvesh Vishwakarma, Anupam Agrawal, Framework for Human Action Recognition using Spatial Temporal based Cuboids, International Conference on Image Information Processing (ICIIP), pp. 1-6, 2011.
- Chen Junli, Jiao Licheng, Classification Mechanism of Support Vector Machines, 5th International Conference on Signal Processing Proceedings (WCCC-ICSP), Vol. 3, PP. 1556 - 1559, 2000.
- Megha D Bengalur , Human Activity Recognition using Body Pose Features and Support Vector Machine, International Conference on Advances in Computing, Communications and Informatics (ICACCI), pp. 1970 - 1975, 2013.
- Lawrence R. Rabiner, A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition, Proceedings of the IEEE, Vol. 77, pp. 257-286, 1989.
- Zia Moghaddam and Massimo Piccardi, Training Initialization of Hidden Markov Models in Human Action Recognition, IEEE Trans. on Automation Science and Engineering, Vol.11, pp. 394-508, 2014.
- N.K. Suryadevara , S.C. Mukhopadhyay , R. Wang , R.K. Rayudu, Forecasting the behavior of an elderly using wireless sensors data in a smart home, Engineering Applications of Artificial Intelligence (Elsevier), vol. 26, pp. 2641–2652, November 2013.
- N. Noury, T. Hadidi, Computer simulation of the activity of the elderly person living independently in a Health Smart Home, Computer Methods and Programs in Biomedicine, (Elsevier), vol. 108, pp. 1216–1228, December 2012.
- Dongping Zhang, Jiao Xu, Yafei Lu, Huailiang Peng, Dynamic Model Behavior Analysis of Small Groups, IEEE Conference Based on Article Video Wireless Communications & Signal Processing (WCSP), pp. 1-6, 2013.
- Koichiro Goya, Xiaoxue Zhang, Kouki Kitayama, A Method for Automatic Detection of Crimes for Public Security by Using Motion Analysis, IEEE, Fifth International Conference on Intelligent Information Hiding and Multimedia Signal Processing, pp. 736 - 741, 2009.
- Zafar Ali Khan, Won Sohn, (2010), Feature Extraction and Dimensions Reduction using R transform and Principal Component Analysis for Abnormal Human Activity Recognition, 6th International Conference on Advanced Information Management and Service (IMS), pp. 253 - 258.
- Homa Foroughi, Hadi Sadoghi Yazdi, Hamidreza Pourreza, Malihe Javidi, An Eigenspace-Based Approach for Human Fall Detection Using Integrated Time Motion Image and Multi-class Support Vector Machine, 4th International Conference on Intelligent Computer Communication and Processing (ICCP), pp.83-90, 2008.
- C. Rougier, J. Meunier, A. St-Arnaud, J. Rousseau, Fall Detection from Human Shape and Motion History using Video Surveillance, 21st International Conference on Advanced Information Networking and Applications Workshops, vol.2, pp. 875 - 880, 2007.
- Pau-Choo Chung, Chin-De Liu, A Daily Behavior Enabled Hidden Markov Model for Human Behavior Understanding, Pattern Recognition (Elsevier), vol. 41, pp. 1572-1580, May 2008.
- Homa Foroughi, Mohamad Alishahi, Hamidreza Pourreza, Maryam Shahinfar, Distinguishing Fall Activities using Human Shape Characteristics, Technological Developments in Education and Automation (Springer), PP. 23-528, 2010.
- Junji YAMATO, Jun OHYA, Kenichiro ISHII, Recognizing Human Action in Time Sequential Images using Hidden Markov Model, IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 1992.
- X.D. Huang, Y. Ariki, M.A. Jack, Hidden Markov Models for Speech Recognition, Edinburgh Univ. Press, 1990.