EFF-ViBE: an efficient and improved background subtraction approach based on ViBE

Authors: Elie Tagne Fute, Lionel L. Sop Deffo, Emmanuel Tonye

Journal: International Journal of Image, Graphics and Signal Processing (IJIGSP)

Issue: Vol. 11, No. 2, 2019.


Background subtraction plays an important role in intelligent video surveillance, since it is one of the most widely used tools in motion detection. While scientific progress has produced sophisticated equipment for this task, the algorithms involved must be improved as well. Over the past decade a background subtraction technique called ViBE has been gaining ground. However, the original algorithm has two main drawbacks. The first is the ghost phenomenon, which appears when the initial frame contains a moving object or when the background changes suddenly. The second is that it performs poorly on complex backgrounds. This paper presents an efficient background subtraction approach based on ViBE that solves these two problems. It relies on an adaptive radius to deal with complex backgrounds, and on a cumulative mean and a pixel counting mechanism to quickly eliminate the ghost phenomenon and to adapt to sudden changes in the background model.


Keywords: Background subtraction, ViBE, adaptive radius, cumulative mean, pixel counting, ghost

Short address: https://sciup.org/15016028

IDR: 15016028   |   DOI: 10.5815/ijigsp.2019.02.01


Published Online February 2019 in MECS DOI: 10.5815/ijigsp.2019.02.01

II. Background

Naively, the basic idea behind background subtraction consists simply of taking the difference between the current pixel value and the background model value, then comparing that difference to a threshold for the decision. However, in real-life situations there is a lot of complexity in the background due to many challenges, namely illumination changes, dynamic backgrounds, camera jitter and many others. This has led to the development of more sophisticated background subtraction techniques.

Probabilistic background subtraction techniques, also called parametric background subtraction techniques, generally refer to approaches that model the background using a Normal (Gaussian) distribution over the pixel intensity values of an image. Many solutions exist in the literature, but one of the first was proposed by Stauffer et al. [6]. It was followed by many improved methods, STGMM [9], SKMGM [10], TAPPMOG [11] and STAPPMOG [12] (where MOG stands for mixture of Gaussians), each of them trying to solve one or more shortcomings of the original algorithm. However, due to its sensitivity, the MOG approach cannot be accurately tuned, and its ability to successfully handle high- and low-frequency changes in the background is debatable.

To overcome the drawback of manually selecting parameters for each environment, non-parametric approaches were proposed (Ahmed Elgammal [13]). They generally refer to sample-based methods, but many other methods exist, such as the kernel density estimation (KDE) approach [14], the Bayesian approach [15] and the recursive density estimation (RDE) approach [16].

Another family of methods is the codebook approach, where for each pixel position the background is modeled by a codebook. It was introduced by Kim et al. [17]. Here each pixel is assigned a codeword which consists of intensity, color and temporal features. To perform the segmentation, the intensity and color of incoming pixels are compared with those of the codewords in the codebook: the algorithm computes the distances between the pixels and the codewords, compares them with a threshold value, and assigns a foreground label if no match is found or a background label otherwise. The matching codeword is then updated with the corresponding background pixel.

There are also subspace-based approaches [18], in which one model is devoted to accurately detecting motion while the other aims to achieve a representation of the empty scene. The differences in foreground detection between the complementary models are used to identify new static regions. In this approach, a collection of images and the corresponding mean and covariance matrix are computed. This is followed by a PCA of the covariance matrix, a projection onto a certain number of eigenvectors, and a comparison of incoming images with their projections onto those eigenvectors. Finally, a distance between the image and its projection is computed and compared with the corresponding threshold value to label each pixel as foreground or background.

Moreover, there are compressive sensing approaches [19, 20, 21, 22], which reduce the number of measurements required to represent the video using prior knowledge of the sparsity of the original signal. Compressive sensing theory shows that a signal can be reconstructed from a small set of random projections, provided that the signal is sparse in some basis, for example a wavelet basis. However, these methods impose certain conditions on the design matrix.

It is also important to mention approaches that use edge detection [23] or patch detection [24]. While the former is used most frequently for image segmentation based on abrupt changes in image intensities, the latter uses patch-based processing for operations such as image denoising.

A more recent approach, developed by Braham and Van Droogenbroeck, is the use of a convolutional neural network, generally referred to as a deep learning approach. It uses a fixed background model generated by a temporal median operation over N video frames [25]. Its architecture is very similar to the LeNet-5 network for handwritten digit classification [26], except that subsampling is performed with max-pooling instead of averaging and hidden sigmoid units are replaced with rectified linear units for faster training. However, this method tends to be scene-specific, which is why we do not focus on it in this paper.

This paper presents an efficient background subtraction approach based on ViBE named EFF-ViBE, where EFF stands for efficient. The proposed approach makes three main contributions. The first is the introduction of a new update mechanism that uses the Koenig formula to reduce the overall amount of computation. This is combined with the use of the n most recent frames to model the background, where the value of n has been chosen empirically. The second contribution lies in the introduction of a pixel counting mechanism that reclassifies foreground pixels that remain static for a long time as background pixels, which typically happens after a sudden change in the background. Note that these two contributions together serve to quickly eliminate the appearance of the ghost phenomenon. Finally, the third contribution is a modified adaptive radius policy with parameter adjustment, which enables the algorithm to deal with complex backgrounds when necessary and to behave differently (almost as the original ViBE) when the background is relatively simple.

III. The ViBE Algorithm

The following section describes the original version of the ViBE algorithm proposed by Barnich et al. in [5], as well as its improvement [27] proposed by the same authors in order to better handle video sequences. Its background model is a non-parametric pixel-based model where each background pixel x is modeled by a set of samples:

M(x) = \{ v_1, v_2, v_3, \dots, v_N \} \quad (1)

It follows the same structure as any background subtraction algorithm, with three sub-tasks: background model initialization, segmentation (or classification), and update of the model.

A. Background Model Initialization

For each pixel x, the set M(x) is filled randomly with samples taken from the spatial neighborhood of x. It is important to mention that those values are taken in the first frame, under the assumption that neighboring pixels share a similar temporal distribution. The background estimate is therefore valid starting from the second frame. If t = 0 indexes the first frame and N_G(x) is a spatial neighborhood of a pixel location x, then

M_0(x) = \{ v_0(y) \mid y \in N_G(x) \} \quad (2)

where the locations y are chosen randomly according to a uniform law.

B. Classification Process

In practice, ViBE does not estimate any probability density function, but uses a set of previously observed sample values as the pixel model. To classify a value v(x) of a pixel x, the algorithm compares it to the closest values among the set of samples by defining a sphere S_R(v(x)) of radius R centered on v(x). If the cardinality of the intersection of this sphere with the set of samples {v_1, v_2, ..., v_N} is at least a given threshold #min, the pixel is classified as a background pixel. Mathematically, #min is compared to:

\#\{ S_R(v(x)) \cap \{ v_1, v_2, v_3, \dots, v_N \} \} \quad (3)
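For illustration, here is a minimal sketch of this counting test for one grayscale pixel, assuming Euclidean distance on intensities; the constants reflect the defaults of the original ViBE paper (N = 20, R = 20, #min = 2), and all names are ours:

```cpp
#include <cstdlib>

// Minimal sketch of the ViBE classification test for one grayscale pixel.
// N, R and MIN_MATCHES play the roles of N, R and #min in the text.
const int N = 20;          // number of samples per pixel model
const int R = 20;          // matching radius (sphere S_R)
const int MIN_MATCHES = 2; // cardinality threshold #min

// Returns true if v(x) is classified as a background pixel.
bool isBackground(int value, const int samples[N]) {
    int matches = 0;
    for (int i = 0; i < N && matches < MIN_MATCHES; ++i) {
        if (std::abs(value - samples[i]) < R)  // sample lies inside S_R(v(x))
            ++matches;
    }
    return matches >= MIN_MATCHES;
}
```

The early exit once #min matches are found is the usual optimization: the exact cardinality of the intersection is never needed, only whether it reaches the threshold.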

C. Background Model Update

To update the model, the algorithm uses three techniques, namely a memory-less update policy, random time sub-sampling and spatial diffusion.

  •    Memory-less update policy

This consists in choosing randomly the sample to be updated. If the probability of a sample present in the model at time t being preserved after one update of the pixel model is (N - 1)/N, then for any further time t + dt it will have the value

P(t, t + dt) = \left( \frac{N-1}{N} \right)^{(t+dt)-t} \quad (4)

This can be rewritten as

P(t, t + dt) = e^{ -\ln\left( \frac{N}{N-1} \right) dt } \quad (5)

which shows that the expected remaining lifespan of a sample decays exponentially, independently of t.

  •    Random sub-sampling policy

This policy decides randomly which pixels are updated. By default, the adopted time sub-sampling factor ϕ is 16, meaning that each background pixel value has 1 chance in 16 of being selected to update its pixel model.

  •    Spatial diffusion policy

This policy exploits the assumption that neighboring pixels share a similar temporal distribution: when the model of a pixel is updated, the new sample value is also inserted, with the same random sub-sampling, into the model of a randomly chosen neighboring pixel.
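A compact sketch of how these three policies combine in ViBE-style code follows; the random-number handling, the neighbor selection and the raw triple-pointer model layout are illustrative assumptions, not the authors' implementation:

```cpp
#include <cstdlib>

const int N = 20;    // samples per pixel model
const int PHI = 16;  // time sub-sampling factor

// Sketch of the ViBE update step for one pixel (x, y) classified as background.
// model[y][x][i] holds the i-th sample of the pixel model at (x, y).
void updateModel(int value, int x, int y, int width, int height, int*** model) {
    if (std::rand() % PHI == 0)                     // time sub-sampling: 1 chance in 16
        model[y][x][std::rand() % N] = value;       // memory-less: replace a random sample
    if (std::rand() % PHI == 0) {                   // spatial diffusion, same sub-sampling
        int nx = x + (std::rand() % 3) - 1;         // random offset in the 8-neighborhood
        int ny = y + (std::rand() % 3) - 1;
        if (nx >= 0 && nx < width && ny >= 0 && ny < height)
            model[ny][nx][std::rand() % N] = value; // diffuse the sample into the neighbor
    }
}
```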

D. Other Approaches Related to ViBE

Another update mechanism is the first-in first-out model update policy, such as those employed in [28, 29]; with such a policy, 300 samples cover a time window of only 10 seconds (at 30 frames per second), so a pixel covered by a slowly moving object for more than 10 seconds would still be included in the background model. More recent techniques such as [30] argue that one frame alone cannot distinguish whether pixels belong to the foreground or the background; they therefore propose to use the first n frames of the video sequence to complete the initialization of the background model. Others, such as [31], believe that for a complex dynamic background the radius R should be increased appropriately, so that the background is not easily detected as foreground, while for a simple static background R should be decreased to detect small changes in the foreground.

IV. Description of the EFF-ViBE Technique

In this section we explain the EFF-ViBE approach in detail. As in the original ViBE, the overall processing can be divided into three main parts, namely background model initialization, background segmentation and background model update. The idea is, rather than using only the first frame to initialize the N sample values of each pixel, to use the n first frames instead. This takes into consideration the relationship between the n most recent frames and thus acts as a filter. In addition, the pixel counting mechanism reinforces the process: if a foreground pixel stays in the foreground during K consecutive frames, it has certainly become a background pixel, since it is now a static pixel in the scene.

To see the advantage of using the n most recent frames to model the background, consider Fig. 1. It shows the difference between the output obtained with one frame and the output obtained with the n most recent frames, compared against the expected ground-truth output. It is clearly better to use the n most recent frames, as the results are more accurate. The drawback of this approach is that it needs somewhat more computational resources (especially memory and time) than the single-frame version, but this is acceptable given the results obtained.

Fig.1. Comparison of the results on overpass dataset (a) the ground truth (b) the output using one frame (c) the output using n recent frames

On the other hand, when some variation occurs in the background, as with moving branches, air movement, small particles or small waves on a water surface, this complexity has to be handled. Those moving elements do not belong to the foreground and thus must be treated as background. Yet a fixed radius R can mistakenly classify such pixels as foreground. That is why we use an adaptive radius (which we denote R_ad) that tends to remain steady when the mean distance between the current pixel and its N background samples increases gradually, as in a complex dynamic background, and tends to decrease slightly when the mean distance is steady.


Fig.2. Comparison of the results on boats dataset (a) the ground truth (b) the output using a fixed radius (c) the output using an adaptive radius.

Fig. 2 shows the difference between the output obtained with a fixed radius and the output obtained with an adaptive radius, compared against the expected ground-truth output. The drawback is the same as before, and equally acceptable.

All of this has enabled us to formalize the whole process schematically, as presented in Fig. 3.

The proposed EFF-ViBE method has three main parts, namely background initialization, background segmentation and background update, as shown in Fig. 3. The first part is performed during the initialization process: while the first n frames are gathered to compute the cumulative mean for each pixel, the sample values are initialized as well, and an adaptive radius R_ad is also computed for each pixel. The second part, segmentation, classifies a pixel as background or foreground depending on whether the conditions in the segmentation block of Fig. 3 are verified. The third part is performed after the segmentation process and uses each new incoming frame to update the parameters. Since computing the mean and the standard deviation from scratch at each iteration for every pixel is costly, we exploit the fact that to obtain the new mean we only need to remove the oldest value of the pixel's history from the previous mean and add the new value; the standard deviation follows the same principle. Consequently, the new mean is a function of the previous mean, the oldest value and the newly observed value, and likewise for the new standard deviation. In addition, we avoid recomputing the adaptive radius at each iteration: a new radius is computed if and only if the history values of the corresponding pixel have changed. In a generic manner, if we denote by "change" the variable indicating whether the history values of a pixel have changed, the update mechanism uses the history of the pixel and the "change" flag to compute the new radius. The details of the method are given in the following subsections, where the mathematical expressions are derived.
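A minimal sketch of this bookkeeping for a single pixel follows, assuming grayscale values; the structure stores the sum and the sum of squares so that the mean and standard deviation follow in O(1) from the Koenig (König-Huygens) identity var = E[v²] − (E[v])². The names are ours:

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <deque>

// Sliding-window mean and standard deviation of one pixel over the n most
// recent frames, updated in O(1) per frame.
struct PixelStats {
    std::deque<double> history; // the stored values v_1 .. v_n
    double sum = 0.0;           // running sum of the values
    double sumSq = 0.0;         // running sum of the squared values
    std::size_t n;              // window length (n = 30 in the paper)

    explicit PixelStats(std::size_t windowLen) : n(windowLen) {}

    void addFrameValue(double v) {
        history.push_back(v);
        sum += v;
        sumSq += v * v;
        if (history.size() > n) {        // drop the oldest value v_1
            double oldest = history.front();
            history.pop_front();
            sum -= oldest;
            sumSq -= oldest * oldest;
        }
    }
    double mean() const { return sum / history.size(); }
    double stdDev() const {              // Koenig form: sqrt(E[v^2] - mean^2)
        double m = mean();
        return std::sqrt(std::max(0.0, sumSq / history.size() - m * m));
    }
};
```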

A. Modified Background Model Initialization

To initialize the background model we use three techniques. Firstly, we use the initialization process of the normal ViBE method to model each background pixel with N sample values taken from its direct neighborhood. Secondly, we use the n first frames as in [30], but with a modified value of n, for we have noticed that increasing the number of frames also increases the accuracy of the results. On the other hand, if n is too high the algorithm becomes slower. That is why we have chosen n equal to 30 instead of 20 as in the original paper. The chosen frames are used to compute the mean value of each pixel, and we add this value to the sample values to obtain N + 1 sample values. Each pixel is thus modeled by

M(x) = \{ \bar{v}_0(x), v_1, v_2, \dots, v_N \} \quad (6)

where

\bar{v}_0(x) = \frac{1}{n} \sum_{t=1}^{n} v_t(x) \quad (7)

Thirdly, we use the N sample values as in [31] to compute an adaptive radius for each pixel. Our values have been obtained empirically, starting from the values chosen in the original paper. After several tests we noticed that the original values could be improved, that is, ε_c increased a little and ε_d decreased. Consequently we have chosen ε_c equal to 0.08 rather than 0.06, and ε_d equal to 0.35 rather than 0.4. We denote the adaptive radius R_ad; it is driven by the mean distance d_mean over the samples, defined as

d_{mean}(x) = \frac{1}{N} \sum_{i=1}^{N} D_i(x) \quad (9)

where D_i(x) is the distance between pixel x and sample value v_i.
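The exact update rule for R_ad is the one of [31] with the constants above; as a rough illustration only, a PBAS-style rule might look as follows. The multiplicative form, the use of δ as a scale on d_mean and the floor of 20 are our assumptions, not the paper's equation:

```cpp
#include <algorithm>

// Hypothetical per-pixel adaptive-radius update in the spirit of [31].
// Only the constants come from the paper (eps_c = 0.08, eps_d = 0.35, delta = 5).
const double EPS_C = 0.08; // growth rate in complex (dynamic) regions
const double EPS_D = 0.35; // decay rate in simple (static) regions
const double DELTA = 5.0;  // scale factor applied to d_mean

double updateRadius(double rad, double dMean) {
    if (rad < dMean * DELTA)
        rad *= (1.0 + EPS_C);    // background locally complex: grow the radius
    else
        rad *= (1.0 - EPS_D);    // background locally calm: shrink the radius
    return std::max(rad, 20.0);  // assumed floor: the default ViBE radius
}
```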

B. Modified Background Model Segmentation

If we denote by B the segmented image, B(x) is the value of pixel x in matrix B, equal to 0 if x is a background pixel and 255 if x is a foreground pixel. Therefore, to classify a pixel as background or foreground we use the following equations:

B(x) = \begin{cases} 0, & \text{if } \#\{ S_{R_{ad}}(v(x)) \cap \{ v_1, \dots, v_N \} \} \ge U_{min} \text{ and } v_t(x) \in S_T(\bar{v}_0(x)) \\ 255, & \text{otherwise} \end{cases} \quad (10)

T = \beta \sigma \quad (11)

\sigma = \sqrt{ \frac{1}{n} \sum_{t=1}^{n} \left( v_t(x) - \bar{v}_0(x) \right)^2 } \quad (12)

The parameter U_min is set to 2 as in [5, 30, 31], and β is set to 3 as in [30]. In addition, we have noticed that in the specific situations of camera jitter and thermal conditions, choosing a factor of 1.5 rather than 4.5 as in the original paper to update the last history of the background model leads to better results.

C. Proposed Background Model Update

To update our model we use the following assumptions. Firstly, as stated in [5], we consider that neighboring background pixels share a similar temporal distribution and that a new background sample of a pixel should also update the models of neighboring pixels. For that reason, we use the same memory-less update policy as well as the time sub-sampling policy. This enables the sample values of each pixel to adapt to background changes.

Secondly, we use a pixel counting mechanism that reclassifies as background any pixel that stays in the foreground for K consecutive frames. To achieve this, we maintain a matrix with the size of the frame, in which each element counts the number of times the pixel has appeared in the foreground over consecutive frames. When a pixel is classified as foreground, the algorithm increments its corresponding counter in the matrix; if the counter exceeds a maximum value denoted counter_max, the pixel is reclassified as a background pixel. Experiments have shown that a value of 10 for counter_max leads to good results. A sketch of this mechanism is given below.
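In this sketch of the counter matrix, the array layout and the reset-on-background policy are our assumptions:

```cpp
#include <vector>

const int COUNTER_MAX = 10; // threshold shown experimentally to work well

// Per-pixel counters of consecutive foreground classifications. When a
// counter exceeds COUNTER_MAX, the pixel is forced back to background.
struct ForegroundCounter {
    std::vector<int> count;
    int width;
    ForegroundCounter(int w, int h) : count(w * h, 0), width(w) {}

    // Returns the possibly corrected label: true = foreground.
    bool filter(int x, int y, bool classifiedForeground) {
        int& c = count[y * width + x];
        if (!classifiedForeground) { c = 0; return false; } // reset on background
        if (++c > COUNTER_MAX)     { c = 0; return false; } // absorbed into background
        return true;
    }
};
```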

Thirdly, we update the other parameters, namely σ, v̄_0(x) and d_mean. To avoid recomputing these parameters from scratch after each new frame, we use the Koenig formula, which avoids redundant work: to update the parameters we just need to add the newest value and subtract the oldest one. So rather than storing the mean and standard deviation, we store the sum and the sum of squares. This gives rise to the following equations:

\sum_{t=2}^{n+1} v_t(x) = \sum_{t=1}^{n} v_t(x) + v_{n+1}(x) - v_1(x)

\sum_{t=2}^{n+1} \left( v_t(x) \right)^2 = \sum_{t=1}^{n} \left( v_t(x) \right)^2 + \left( v_{n+1}(x) \right)^2 - \left( v_1(x) \right)^2

So if we denote by \bar{v}_0^{t}(x), \sigma_t(x) and \bar{v}_0^{t+1}(x), \sigma_{t+1}(x) the values of the cumulative mean and standard deviation at times t and t + 1 respectively, we update them using the following formulas:

\bar{v}_0^{t+1}(x) = \frac{1}{n} \sum_{t=2}^{n+1} v_t(x)

\sigma_{t+1} = \sqrt{ \frac{1}{n} \sum_{t=2}^{n+1} \left( v_t(x) \right)^2 - \left( \bar{v}_0^{t+1}(x) \right)^2 } \quad (13)

Now, to update d_mean we use the fact that a new value for a given pixel needs to be computed only if its sample values have been updated. We therefore use a Boolean matrix whose elements are all initialized to false; each time the sample values of a given pixel change, we set the element at the corresponding position to true. Whenever we need the Euclidean distances to verify that the sample values of a pixel fall within the circle of radius R_ad, we first check whether its corresponding Boolean is true. If so, a new d_mean is computed using equation 9 in order to deduce the new R_ad; otherwise nothing is done. This technique considerably reduces unnecessary computation, as sketched below. The decision process then uses Equation 6 to Equation 13, replacing \bar{v}_0(x) by \bar{v}_0^{t+1}(x) and \sigma by \sigma_{t+1}.
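A sketch of this lazy recomputation follows; the cache layout and helper names are ours:

```cpp
#include <cmath>
#include <vector>

const int N_SAMPLES = 20; // N sample values per pixel model

// "changed" plays the role of the Boolean matrix in the text: it is set
// whenever a pixel's samples are updated, and d_mean is recomputed
// (equation 9) only in that case; otherwise the cached value is reused.
double lazyDMean(int idx, double pixelValue,
                 const std::vector<std::vector<double>>& samples,
                 std::vector<bool>& changed,
                 std::vector<double>& dMeanCache) {
    if (changed[idx]) {
        double sum = 0.0;
        for (int i = 0; i < N_SAMPLES; ++i)       // sum of the distances D_i(x)
            sum += std::fabs(pixelValue - samples[idx][i]);
        dMeanCache[idx] = sum / N_SAMPLES;
        changed[idx] = false;                     // cache is valid again
    }
    return dMeanCache[idx];
}
```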

V. Implementation and Results

Our implementation was done in C/C++ with the OpenCV 3.0 platform [32]; note that OpenCV was used here only to benefit from its convenient image-capture functions. The operating system is the Linux distribution Ubuntu 16.04, running on a Dell INSPIRON 1545 laptop with a 2.3 GHz dual-core processor and 3 GB of RAM. The dataset used to measure the performance of the algorithms is the one provided by changedetection.net [33]. The parameters of the proposed method are gathered in Table 1.

The implementation code is based on [34], released in July 2014 thanks to Marc Van Droogenbroeck. We have also used the BGSLibrary [35] for our simulations; it is a framework with up to 53 background subtraction algorithms already implemented, from the oldest to the most recent and challenging. We modified the framework to add three new algorithms, namely the improved ViBE that uses the n most recent frames to model the background, the adaptive ViBE that changes the radius accordingly, and the proposed EFF-ViBE. The following results were obtained.

A. Fast Elimination of the Ghost Phenomenon

When a series of pixels is detected as a moving target although these pixels do not correspond to a real moving object [27], we have the appearance of the ghost phenomenon. The algorithm needs to eliminate such ghosts quickly; however, the original ViBE does not remove them as quickly as desired, and this is one of its major drawbacks. To overcome this difficulty, we use the cumulative mean computed over the n most recent frames to reinforce the pixel classification.

Table 1. Parameters used in our implementation

parameter            value
number of frames n   30
U_min                2
β                    3
time sampling ϕ      16
counter_max          10
ε_c                  0.08
ε_d                  0.35
δ                    5

Fig.3. Functional architecture of the system. Here, all the parts of the system are shown.

The justification of the chosen value of n was given in section IV.A. We also introduced the pixel counting mechanism, which reclassifies as background any pixel that stays in the foreground for more than counter_max consecutive frames; this speeds up ghost elimination. To clearly illustrate how our approach eliminates ghosts more quickly than the original ViBE, we used the winter driveway dataset, in which two cars remain stationary for a long period (more than 1800 frames) before one of the two starts to move. The following observations were made:

  •    The original ViBE proposed in 2011 (see line c of Figure 4) needs a long period (more than 600 frames) to eliminate the ghost.

  •    For the version proposed in 2014 by the same authors (see line d of Figure 4), ghost elimination is much better; however, the accuracy and quality of the results have decreased.

  •    On the other hand, EFF-ViBE (see line e of Figure 4) needs fewer than 200 frames to perform the same task. In addition, the detected ghost is quickly eliminated (already gone at frame 2100), whereas it is still present at frames 2100 and 2200 in both the original and the improved ViBE. The approach therefore outperforms the existing ViBE algorithms and can be adopted in complex backgrounds.

B. Effect of the Adaptive Radius

As said earlier, for a complex dynamic background the radius should be increased appropriately, so that the background is not easily detected as foreground. In the basic ViBE algorithm, model matching always uses a fixed global radius; yet Le Chang et al. [31] have shown experimentally that this simple radius policy ignores the complexity of the local environment and the uncertainty of changes, making it difficult to detect the target effectively in complex environments. That is why our third contribution consists of computing for each pixel a modified adaptive radius (denoted R_ad) that depends on a variable parameter d_mean computed using equation 9. It also uses the fixed parameters ε_c = 0.08, ε_d = 0.35 and δ = 5.

Fig.4. Results on the winter driveway dataset: (a) input frames of the dataset; (b) ground truth of the dataset; (c) the detection results of the original ViBE; (d) the detection results of the improved version of ViBE; (e) the detection results of the proposed EFF-ViBE method.

To illustrate the effect of the adaptive radius we used the canoe dataset, which consists of a canoe moving on water disturbed by wind that creates small waves on the surface. The following observations were made:

  •    Normally the disturbance of the water surface belongs to the background, but the original ViBE (see line c of Figure 5) tends to detect it as foreground.

  •    The improvement (2014) tries to fix this failure (see line d of Figure 5), but the quality of the detected moving boat becomes poorer.

  •    EFF-ViBE, on the other hand, solves these problems by computing the radius adaptively; consequently the effect of the waving background is considerably reduced (see line e of Figure 5). Once more our approach outperforms the original ViBE and can thus be used in complicated backgrounds.

C. Measure of Performance

We recorded the precision, recall and F-measure of nine algorithms: the original ViBE algorithm, ViBE with the cumulative mean and pixel counting mechanism, ViBE with the adaptive radius, the proposed EFF-ViBE method, and some challenging algorithms, namely Codebook [17], KDE [14], MOG [6], PBAS [36] and SUBSENSE [37]. The results are shown in Table 2, Table 3 and Table 4.

Fig.5. Results on the canoe dataset: (a) input frames of the dataset; (b) ground truth of the dataset; (c) the detection results of the original ViBE; (d) the detection results of the improved version of ViBE; (e) the detection results of the proposed EFF-ViBE method.

The datasets used are those of CDNet 2014, and the values were computed with the BMC-Wizard software. The process is as follows: the first input to BMC-Wizard is the set of segmented images obtained by the chosen algorithm, and the second is the corresponding ground truth provided with the dataset.

Mathematically, the precision Pr, the recall R and the F-measure F are computed using the formulas

Precision\ (Pr) = \frac{TP}{TP + FP}

Recall\ (R) = \frac{TP}{TP + FN}

F\text{-}Measure\ (F) = \frac{2 \times Pr \times R}{Pr + R}

where TP denotes the number of foreground pixels correctly classified as foreground, FP the number of background pixels wrongly classified as foreground, and FN the number of foreground pixels wrongly classified as background.
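These three measures translate directly into code; a minimal helper (ours, not part of the BMC-Wizard tool) is:

```cpp
// Direct transcription of the three evaluation formulas.
struct Scores { double precision, recall, fmeasure; };

Scores evaluate(double tp, double fp, double fn) {
    Scores s;
    s.precision = tp / (tp + fp);                // Pr = TP / (TP + FP)
    s.recall    = tp / (tp + fn);                // R  = TP / (TP + FN)
    s.fmeasure  = 2.0 * s.precision * s.recall
                / (s.precision + s.recall);      // harmonic mean of Pr and R
    return s;
}
```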

  •    Quantitative analysis

The analysis of Table 2 shows how EFF-ViBE performs in terms of recall compared to the mentioned algorithms: it outperforms all the chosen algorithms in an average of 65% of the cases (i.e., among the 53 different challenging background situations proposed by the dataset), except for the PBAS algorithm, which it outperforms in only 30% of the cases. The same observations can be made for Table 3, where the results are even better: the average percentage of out-performance is about 90% against the other algorithms and 50% against PBAS. Finally, combining these results into the F-measure (also called F-score) gives rise to Table 4, where the average percentage of out-performance is 85% against the other algorithms and 40% against PBAS.

  •    Qualitative analysis

Having analyzed the results quantitatively, let us look at the qualitative aspect. Figure 6 presents the results obtained from different executions of the algorithms on frames taken randomly from four datasets, namely blizzard (1), park (2), sofa (3) and turnpike (4). Line (A) is the input frame, Line (B) the ground truth image, Line (C) the execution of the PBAS algorithm, Line (D) the execution of the SUBSENSE algorithm, Line (E) the execution of the original ViBE algorithm, Line (F) the execution of the improved ViBE algorithm with n-frame initialization, Line (G) the execution of the adaptive ViBE algorithm and Line (H) the execution of the EFF-ViBE algorithm.

It can therefore be seen how accurate the results of our proposed approach are compared to the other algorithms, relative to the ground truth image.

We can also notice that, even though PBAS outperformed our approach in the quantitative analysis, this is no longer the case when we analyze the pictures. This is because PBAS detects many background pixels as foreground, which alters the detection counts (TP, FP and FN) and consequently produced good quantitative figures.

Fig.6. Analysis of results

Table 2. Recapitulating table of Recall

Category                    Dataset       PBAS    SUBSENSE  ViBE    ViBE A.  ViBE I.  EFF-ViBE
Bad weather                 blizzard      0.508   0.508     0.503   0.503    0.503    0.506
                            skating       0.535   0.535     0.528   0.528    0.532    0.536
                            snowfall      0.498   0.508     0.481   0.506    0.508    0.506
                            wet snow      0.507   0.506     0.504   0.509    0.501    0.507
Baseline                    highway       0.664   0.599     0.570   0.577    0.570    0.598
                            office        0.613   0.582     0.548   0.591    0.599    0.553
                            pedestrians   0.707   0.518     0.611   0.632    0.658    0.521
                            pet2006       0.542   0.544     0.527   0.521    0.527    0.542
Camera jitter               badminton     0.607   0.529     0.529   0.521    0.514    0.549
                            boulevard     0.565   0.519     0.516   0.557    0.549    0.513
                            sidewalk      0.565   0.526     0.516   0.517    0.516    0.523
                            traffic       0.577   0.545     0.571   0.550    0.545    0.579
Dynamic background          boats         0.513   0.513     0.502   0.508    0.505    0.509
                            canoe         0.525   0.506     0.536   0.535    0.530    0.530
                            fall          0.643   0.544     0.541   0.529    0.526    0.559
                            fountain1     0.566   0.502     0.505   0.525    0.524    0.508
                            fountain2     0.669   0.502     0.526   0.523    0.524    0.503
                            overpass      0.554   0.512     0.511   0.511    0.511    0.512
Intermittent object motion  abandon B.    0.543   0.533     0.479   0.488    0.497    0.524
                            parking       0.534   0.500     0.517   0.519    0.513    0.500
                            sofa          0.710   0.576     0.609   0.604    0.590    0.770
                            streetlight   0.615   0.520     0.552   0.549    0.542    0.570
                            tram stop     0.553   0.526     0.512   0.507    0.515    0.574
                            Win.D.W.S.    0.557   0.501     0.538   0.535    0.531    0.602
Low frame rate              port017fps    0.534   0.499     0.508   0.507    0.507    0.500
                            tram C.       0.544   0.512     0.534   0.532    0.530    0.515
                            Tunnel E.     0.514   0.500     0.515   0.514    0.514    0.515
                            turnpike      0.589   0.533     0.547   0.544    0.541    0.538
Night video                 bridge E.     0.542   0.501     0.501   0.503    0.502    0.503
                            busy B.       0.534   0.501     0.501   0.501    0.501    0.501
                            fluid high.   0.581   0.505     0.503   0.548    0.543    0.543
                            StreetC.A.N.  0.510   0.503     0.502   0.502    0.502    0.502
                            tramstation   0.551   0.514     0.508   0.518    0.508    0.508
                            wint. Street  0.604   0.508     0.525   0.525    0.539    0.510
PTZ                         Cont. p.      0.540   0.507     0.518   0.505    0.524    0.522
                            Intermi. p.   0.527   0.511     0.515   0.512    0.519    0.521
                            Cam. PTZ      0.548   0.541     0.532   0.522    0.517    0.531
                            Zoom          0.586   0.505     0.438   0.439    0.450    0.458
Shadow                      backdoor      0.599   0.542     0.541   0.538    0.536    0.525
                            bungalows     0.651   0.589     0.583   0.581    0.577    0.561
                            bustation     0.728   0.567     0.566   0.564    0.566    0.526
                            copy M.       0.675   0.655     0.537   0.536    0.532    0.578
                            cubicle       0.683   0.558     0.621   0.619    0.612    0.532
                            people S.     0.740   0.606     0.588   0.583    0.581    0.576
Thermal                     corridor      0.738   0.622     0.604   0.601    0.590    0.610
                            dining r.     0.699   0.614     0.612   0.607    0.594    0.670
                            lakeside      0.634   0.538     0.548   0.545    0.531    0.589
                            library       0.612   0.591     0.567   0.564    0.555    0.624
                            park          0.670   0.509     0.628   0.503    0.627    0.677
Turbulence                  turbulence0   0.533   0.502     0.500   0.5003   0.500    0.501
                            turbulence1   0.513   0.501     0.503   0.503    0.503    0.518
                            turbulence2   0.500   0.501     0.501   0.501    0.501    0.500
                            turbulence3   0.527   0.506     0.504   0.504    0.503    0.529

Table 3. Recapitulating table of Precision

Category                    Dataset       PBAS    SUBSENSE  ViBE    ViBE A.  ViBE I.  EFF-ViBE
Bad weather                 blizzard      0.698   0.703     0.714   0.714    0.714    0.714
                            skating       0.539   0.634     0.632   0.634    0.692    0.692
                            snowfall      0.497   0.704     0.472   0.715    0.545    0.711
                            wet snow      0.651   0.655     0.631   0.548    0.507    0.645
Baseline                    highway       0.867   0.854     0.844   0.844    0.842    0.852
                            office        0.833   0.839     0.834   0.594    0.616    0.827
                            pedestrians   0.746   0.856     0.592   0.627    0.693    0.848
                            pet2006       0.833   0.858     0.857   0.590    0.621    0.843
Camera jitter               badminton     0.612   0.632     0.616   0.590    0.637    0.639
                            boulevard     0.677   0.694     0.623   0.690    0.686    0.567
                            sidewalk      0.554   0.545     0.511   0.531    0.531    0.538
                            traffic       0.687   0.688     0.678   0.687    0.684    0.679
Dynamic background          boats         0.702   0.699     0.544   0.575    0.553    0.628
                            canoe         0.659   0.654     0.629   0.629    0.628    0.639
                            fall          0.643   0.848     0.669   0.657    0.666    0.648
                            fountain1     0.817   0.763     0.612   0.688    0.687    0.633
                            fountain2     0.837   0.821     0.699   0.705    0.703    0.807
                            overpass      0.826   0.801     0.728   0.740    0.737    0.792
Intermittent object motion  abandon B.    0.514   0.513     0.494   0.496    0.499    0.509
                            parking       0.507   0.503     0.506   0.506    0.506    0.504
                            sofa          0.652   0.873     0.884   0.878    0.885    0.824
                            streetlight   0.534   0.608     0.521   0.521    0.524    0.522
                            tram stop     0.509   0.508     0.502   0.501    0.503    0.506
                            Win.D.W.S.    0.737   0.518     0.689   0.691    0.686    0.675
Low frame rate              port017fps    0.543   0.440     0.530   0.530    0.528    0.557
                            tram C.       0.530   0.529     0.533   0.532    0.532    0.529
                            Tunnel E.     0.545   0.504     0.573   0.573    0.573    0.573
                            turnpike      0.569   0.601     0.598   0.598    0.598    0.598
Night video                 bridge E.     0.521   0.504     0.508   0.517    0.513    0.521
                            busy B.       0.523   0.506     0.511   0.513    0.512    0.511
                            fluid high.   0.631   0.542     0.542   0.617    0.620    0.620
                            StreetC.A.N.  0.631   0.627     0.590   0.579    0.579    0.596
                            tramstation   0.585   0.575     0.569   0.553    0.579    0.569
                            wint. Street  0.558   0.519     0.515   0.516    0.531    0.536
PTZ                         Cont. p.      0.559   0.539     0.539   0.513    0.562    0.554
                            Intermi. p.   0.532   0.552     0.535   0.537    0.551    0.558
                            Cam. PTZ      0.529   0.583     0.550   0.565    0.581    0.574
                            Zoom          0.596   0.516     0.423   0.424    0.425    0.430
Shadow                      backdoor      0.845   0.829     0.775   0.779    0.776    0.822
                            bungalows     0.777   0.790     0.746   0.749    0.758    0.785
                            bustation     0.740   0.841     0.604   0.610    0.617    0.857
                            copy M.       0.712   0.905     0.708   0.721    0.713    0.884
                            cubicle       0.818   0.879     0.660   0.662    0.677    0.889
                            people S.     0.878   0.877     0.674   0.672    0.680    0.880
Thermal                     corridor      0.869   0.911     0.937   0.937    0.935    0.868
                            dining r.     0.794   0.849     0.802   0.818    0.860    0.842
                            lakeside      0.784   0.875     0.879   0.882    0.910    0.890
                            library       0.634   0.816     0.818   0.816    0.820    0.816
                            park          0.668   0.738     0.687   0.527    0.724    0.678
Turbulence                  turbulence0   0.609   0.679     0.529   0.527    0.530    0.631
                            turbulence1   0.541   0.659     0.561   0.561    0.558    0.675
                            turbulence2   0.501   0.721     0.509   0.509    0.512    0.720
                            turbulence3   0.645   0.651     0.598   0.599    0.598    0.642

Table 4. Recapitulating table of F-measure

Category                    Dataset       PBAS    SUBSENSE  ViBE    ViBE A.  ViBE I.  EFF-ViBE
Bad weather                 blizzard      0.588   0.590     0.590   0.590    0.590    0.592
                            skating       0.537   0.604     0.575   0.577    0.601    0.604
                            snowfall      0.498   0.590     0.476   0.591    0.526    0.591
                            wet snow      0.570   0.572     0.560   0.528    0.504    0.567
Baseline                    highway       0.752   0.704     0.686   0.685    0.680    0.703
                            office        0.706   0.687     0.661   0.593    0.608    0.662
                            pedestrians   0.743   0.645     0.601   0.629    0.675    0.645
                            pet2006       0.657   0.666     0.653   0.554    0.570    0.666
Camera jitter               badminton     0.609   0.576     0.569   0.554    0.569    0.590
                            boulevard     0.616   0.593     0.564   0.616    0.610    0.539
                            sidewalk      0.559   0.535     0.514   0.524    0.524    0.531
                            traffic       0.627   0.608     0.620   0.611    0.607    0.625
Dynamic background          boats         0.593   0.591     0.522   0.539    0.528    0.562
                            canoe         0.585   0.570     0.579   0.578    0.575    0.579
                            fall          0.643   0.663     0.598   0.586    0.587    0.600
                            fountain1     0.669   0.605     0.553   0.596    0.594    0.564
                            fountain2     0.744   0.623     0.601   0.600    0.600    0.620
                            overpass      0.663   0.625     0.600   0.605    0.603    0.622
Intermittent object motion  abandon B.    0.528   0.523     0.486   0.492    0.498    0.517
                            parking       0.520   0.502     0.511   0.513    0.509    0.502
                            sofa          0.680   0.694     0.721   0.715    0.708    0.795
                            streetlight   0.572   0.524     0.536   0.534    0.533    0.545
                            tram stop     0.530   0.517     0.507   0.504    0.509    0.538
                            Win.D.W.S.    0.635   0.509     0.605   0.603    0.599    0.637
Low frame rate              port017fps    0.538   0.468     0.519   0.518    0.517    0.527
                            tram C.       0.537   0.520     0.533   0.532    0.531    0.522
                            Tunnel E.     0.529   0.502     0.543   0.542    0.542    0.543
                            turnpike      0.579   0.565     0.571   0.570    0.568    0.571
Night video                 bridge E.     0.531   0.502     0.504   0.510    0.507    0.512
                            busy B.       0.529   0.503     0.522   0.507    0.507    0.506
                            fluid high.   0.605   0.523     0.522   0.581    0.581    0.579
                            StreetC.A.N.  0.564   0.558     0.542   0.538    0.538    0.546
                            tramstation   0.567   0.543     0.537   0.535    0.539    0.537
                            wint. Street  0.581   0.513     0.520   0.521    0.535    0.523
PTZ                         Cont. p.      0.550   0.522     0.529   0.509    0.542    0.538
                            Intermi. p.   0.530   0.531     0.525   0.524    0.534    0.539
                            Cam. PTZ      0.538   0.561     0.541   0.542    0.547    0.551
                            Zoom          0.591   0.511     0.430   0.431    0.432    0.443
Shadow                      backdoor      0.701   0.656     0.637   0.636    0.634    0.641
                            bungalows     0.708   0.674     0.655   0.655    0.655    0.655
                            bustation     0.737   0.678     0.584   0.586    0.591    0.652
                            copy M.       0.693   0.760     0.611   0.615    0.610    0.699
                            cubicle       0.744   0.683     0.640   0.619    0.643    0.666
                            people S.     0.803   0.717     0.628   0.625    0.622    0.696
Thermal                     corridor      0.798   0.739     0.735   0.733    0.724    0.717
                            dining r.     0.744   0.712     0.699   0.697    0.703    0.746
                            lakeside      0.701   0.666     0.675   0.674    0.671    0.709
                            library       0.622   0.685     0.670   0.667    0.662    0.708
                            park          0.669   0.602     0.656   0.513    0.672    0.677
Turbulence                  turbulence0   0.568   0.577     0.514   0.513    0.514    0.558
                            turbulence1   0.527   0.569     0.530   0.531    0.529    0.585
                            turbulence2   0.500   0.591     0.505   0.505    0.506    0.590
                            turbulence3   0.580   0.570     0.547   0.547    0.547    0.580

VI. Conclusion

This paper has presented an efficient background subtraction algorithm based on ViBE for complex backgrounds. We first reviewed related work on the original ViBE algorithm and some of its improvements. The proposed EFF-ViBE method was then explained in detail with all the needed parameters: it combines an improved ViBE that uses a cumulative mean and a pixel counting mechanism to quickly eliminate ghosts with an adaptive ViBE that computes an adaptive radius depending on background variation. Finally, the obtained results were presented, showing the efficiency of the method in comparison with existing ones. With an average frame rate of 30 fps, it can also be used in real-time applications. For future work, we intend to speed up the proposed method and see how it can be adapted for further applications.

References

[1] S. S. Hossain, S. Khalid, C. Nabendu, Moving Object Detection Using Background Subtraction, SpringerBriefs in Computer Science, Springer, Cham, 2014.
[2] G. Yao, T. Lei, J. Zhong, P. Jiang, W. Jia, Comparative evaluation of background subtraction algorithms in remote scene videos captured by MWIR sensors, Sensors (2017).
[3] F. Zeng, G. Zhang, J. Jiang, Text image with complex background filtering method based on Harris corner-point detection, Journal of Software 8 (8) (2013).
[4] D. Li, Y. Li, F. He, S. Wang, Object detection in image with complex background, Third International Conference on Multimedia Technology (2013).
[5] O. Barnich, M. Van Droogenbroeck, ViBe: A powerful random technique to estimate the background in video sequences, International Conference on Acoustics, Speech, and Signal Processing (ICASSP) (2009) 19–24.
[6] C. Stauffer, E. Grimson, Adaptive background mixture models for real-time tracking, Computer Vision and Pattern Recognition (1999) 246–252.
[7] B. Han, X. Lin, Update the GMMs via adaptive Kalman filtering, International Society for Optical Engineering (2005) 1506–1515.
[8] Y. Hong, Y. Tan, J. Tian, J. Liu, Accurate dynamic scene model for moving object detection, International Conference on Image Processing (ICIP) (2007) 157–160.
[9] W. Zhang, X. Fang, X. Yang, J. Wu, Spatio-temporal Gaussian mixture model to detect moving objects in dynamic scenes, Journal of Electronic Imaging (2007).
[10] P. Tang, L. Gao, Z. Liu, Salient moving object detection using stochastic approach filtering, Fourth International Conference on Image and Graphics (ICIG) (2007) 530–535.
[11] M. Harville, A framework for high-level feedback to adaptive, per-pixel, mixture-of-Gaussian background models, 7th European Conference on Computer Vision (ECCV) (2002) 543–560.
[12] M. Cristani, V. Murino, A spatial sampling mechanism for effective background subtraction, 2nd International Conference on Computer Vision Theory and Applications (VISAPP) (2007) 403–410.
[13] A. Elgammal, D. Harwood, L. Davis, Non-parametric model for background subtraction, European Conference on Computer Vision (2000).
[14] A. Elgammal, R. Duraiswami, D. Harwood, L. Davis, Background and foreground modeling using nonparametric kernel density estimation for visual surveillance, Proceedings of the IEEE 90 (2002) 1151–1163.
[15] Park, C. Lee, Bayesian rule-based complex background modeling and foreground detection, Optical Engineering (2010).
[16] P. Angelov, P. S. Tehran, R. Ramezani, An approach to automatic real-time novelty detection, object identification, and tracking in video streams based on recursive density estimation and evolving Takagi-Sugeno fuzzy systems, International Journal of Intelligent Systems (2011) 189–205.
[17] K. Kim, T. H. Chalidabhongse, D. Harwood, L. Davis, Real-time foreground-background segmentation using codebook model, Real-Time Imaging 11 (2005) 172–185.
[18] N. M. Oliver, B. Rosario, A. P. Pentland, A Bayesian computer vision system for modeling human interactions, IEEE Transactions on Pattern Analysis and Machine Intelligence 22 (2000) 831–843.
[19] D. Kuzin, O. Isupova, L. Mihaylova, Compressive sensing approaches for autonomous object detection in video sequences, Sensor Data Fusion: Trends, Solutions, Applications (SDF) (2015).
[20] V. Cevher, A. Sankaranarayanan, M. F. Duarte, D. Reddy, R. G. Baraniuk, R. Chellappa, Compressive sensing for background subtraction, European Conference on Computer Vision (2008) 155–168.
[21] V. Cevher, A. Sankaranarayanan, M. F., D. Dikpal, R. R. G., B. R. Chellappa, Background subtraction using spatio-temporal group sparsity recovery, European Conference on Computer Vision (2008) 155–168.
[22] A. Azeroual, K. Afdel, Background subtraction based on low-rank and structured sparse decomposition, IEEE Trans. Image Process. 24 (08) (2017) 2502–2514.
[23] A. Azeroual, K. Afdel, Fast image edge detection based on Faber-Schauder wavelet and Otsu threshold, Heliyon 3 (2017), doi:10.1016/j.heliyon.2017.e00485.
[24] M. Nishio, C. Nagashima, S. Hirabayashi, A. Ohnishi, K. Sasaki, T. Sagawa, M. Hamada, T. Yamashita, Convolutional auto-encoder for image denoising of ultra-low-dose CT, Heliyon 3 (2017), doi:10.1016/j.heliyon.2017.e00393.
[25] M. Braham, M. Van Droogenbroeck, Deep background subtraction with scene-specific convolutional neural networks, International Conference on Systems, Signals and Image Processing (2016).
[26] Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE 86 (1998) 2278–2324.
[27] O. Barnich, M. Van Droogenbroeck, ViBe: A universal background subtraction algorithm for video sequences, IEEE Transactions on Image Processing (2011) 1709–1724.
[28] A. Elgammal, D. Harwood, L. Davis, Non-parametric model for background subtraction, 6th European Conference on Computer Vision, Part II, Springer 1843 (2000) 751–767.
[29] H. Wang, D. Suter, A consensus-based method for tracking: Modeling background scenario and foreground appearance, Pattern Recognition 40 (2007) 1091–1105.
[30] H. Batao, Y. Shaohua, An improved background subtraction method based on ViBE, Chinese Conference on Pattern Recognition, Springer 662 (2016) 356–368.
[31] L. Chang, Zhenghua, Y. Ren, Improved adaptive ViBE and the application for segmentation of complex background, Hindawi Publishing Corporation, Mathematical Problems in Engineering (2016).
[32] Itseez, Open source computer vision library, https://github.com/itseez/opencv (2015).
[33] Y. Wang, P. M. Jodoin, F. Porikli, J. Konrad, Y. Benezeth, P. Ishwar, CDnet 2014: An expanded change detection benchmark dataset, 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops (2014) 16–21.
[34] B. Laugraud, ViBe sources: source code in C/C++, original implementation, example for OpenCV, University of Liege, Belgium, http://orbi.ulg.ac.be/bitstream/2268/145853/5/vibe-sources.zip (2014).
[35] A. Sobral, BGSLibrary: An OpenCV C++ background subtraction library, In Proceedings of the 2013 IX Workshop de Visão Computacional, Rio de Janeiro, Brazil (2013) 3–5.
[36] M. Hofmann, P. Tiefenbacher, G. Rigoll, Background segmentation with feedback: The pixel-based adaptive segmenter, In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, Providence, RI, USA (2012).
[37] P.-L. St-Charles, G.-A. Bilodeau, R. Bergevin, Flexible background subtraction with self-balanced local sensitivity, In Proceedings of the 27th IEEE Conference on Computer Vision and Pattern Recognition Workshops, Columbus, OH, USA (2014) 414–419.