Journal articles: International Journal of Image, Graphics and Signal Processing
All articles: 1092
Speaker Emotion Recognition based on Speech Features and Classification Techniques
Scientific article
Speech processing has developed into one of the vital application areas of digital signal processing. Speaker recognition is the process of automatically identifying who is speaking on the basis of individual characteristics carried in the speech waveform. This technique makes it possible to use a speaker's voice to verify their identity and to control access to services such as voice dialing, information services, voice mail, and security control for confidential information. This paper reviews speaker recognition and emotion recognition research from the past ten years. So far, work has been done on both text-independent and text-dependent speaker recognition. Many prosodic features of the speech signal reflect the emotional state of a speaker. A detailed study of these issues is presented in this paper.
Free
Speaker Identification using SVM during Oriya Speech Recognition
Scientific article
In this research paper, we develop a system that identifies users by their voices and helps them retrieve information using voice queries. The system combines speaker identification and speech recognition, two pattern recognition tasks in the speech domain. Combining the two tasks offers many advantages over treating them in isolation. Speaker identification is performed with an SVM, whereas speech recognition is based on HMMs. Two different corpora are used to train the system. Gammatone cepstral coefficients and mel-frequency cepstral coefficients are extracted for speaker identification and speech recognition, respectively. System accuracy is measured from two perspectives: speaker identification accuracy and speech recognition accuracy. Speaker identification accuracy is improved by performing speech recognition at the initial stage of speaker identification.
Free
Speaker Recognition in Mismatch Conditions: A Feature Level Approach
Scientific article
Mismatch in speech data is one of the major factors limiting the use of speaker recognition technology in real-world applications. Extracting speaker-specific features is a crucial issue in the presence of noise and distortion, because the performance of a speaker recognition system depends on the characteristics of the extracted features. The devices used to acquire speech, as well as the surrounding conditions in which it is collected, affect the extracted features and therefore degrade decision rates. In view of this, a feature-level approach is used to analyze the effect of sensor and environment mismatch on speaker recognition performance. The goal is to investigate the robustness of segmental features under speech data mismatch and degradation. A set of features derived from filter bank energies, namely Mel Frequency Cepstral Coefficients (MFCCs), Linear Frequency Cepstral Coefficients (LFCCs), Log Filter Bank Energies (LOGFBs) and Spectral Subband Centroids (SSCs), is used to evaluate robustness in mismatch conditions. A novel feature extraction technique, Normalized Dynamic Spectral Features (NDSF), is proposed to compensate for sensor and environment mismatch. The proposed method yields a significant improvement in recognition results.
Free
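Of the features listed above, spectral subband centroids are the simplest to state. Each one is the power-weighted mean frequency of a subband. The Python sketch below assumes rectangular, non-overlapping subbands given by a hypothetical `band_edges` list in Hz. Practical SSC front ends usually apply overlapping filter-bank weights instead. The sketch does not implement the proposed NDSF method.

```python
def spectral_subband_centroids(power, freqs, band_edges):
    """Power-weighted mean frequency (Hz) of each subband [lo, hi).

    power      -- power spectrum values of one frame
    freqs      -- centre frequency (Hz) of each spectral bin
    band_edges -- hypothetical rectangular subband boundaries in Hz
    """
    centroids = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        num = 0.0
        den = 0.0
        for f, p in zip(freqs, power):
            if lo <= f < hi:
                num += f * p
                den += p
        # A silent or empty band falls back to its midpoint.
        centroids.append(num / den if den > 0 else (lo + hi) / 2)
    return centroids
```

Each centroid is a ratio, so scaling the whole power spectrum by a constant gain leaves it unchanged.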
Speckle Reduction with Edge Preservation in B-Scan Breast Ultrasound Images
Scientific article
Speckle is a multiplicative noise that degrades the quality of ultrasound images and makes visual inspection difficult. It also limits the effectiveness of image processing techniques such as automatic lesion segmentation, so speckle reduction is an essential step before further processing of ultrasound images. Numerous techniques have been developed to reduce speckle noise while preserving edges, but these filters avoid smoothing near edges in order to keep fine details. The objective of this work is to propose a new technique that enhances B-scan breast ultrasound images by increasing the speckle reduction capability of an edge-sensitive filter. In the proposed technique, a local-statistics-based filter is applied to the non-homogeneous regions of the output of an edge-preserving filter, and an edge map is used to retain the original edges. Experiments are conducted on a synthetic test image and on real-time ultrasound images. The effectiveness of the proposed technique is evaluated qualitatively by experts and quantitatively with various quality metrics. The results indicate that the proposed method removes more noise while preserving important diagnostic edge information in breast ultrasound images.
Free
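The abstract does not say which local-statistics filter is used. As an illustration only, the sketch below implements the classic Lee-type weighting. The output is the local mean plus a weighted deviation, with weight W = 1 - Cu²/CI². Here Cu is the assumed speckle coefficient of variation (the hypothetical parameter `noise_cv`) and CI is the local coefficient of variation. In homogeneous regions W tends to 0 and the filter smooths. Near edges W tends to 1 and the pixel is kept.

```python
def lee_filter(img, noise_cv, radius=1):
    """Lee-type local-statistics speckle filter on a 2-D list of intensities.

    noise_cv -- assumed coefficient of variation of the multiplicative speckle
    radius   -- half-width of the square window (1 -> 3x3)
    """
    h, w = len(img), len(img[0])
    cu2 = noise_cv ** 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            # Window with clamped (replicated) borders.
            win = [img[min(max(i + di, 0), h - 1)][min(max(j + dj, 0), w - 1)]
                   for di in range(-radius, radius + 1)
                   for dj in range(-radius, radius + 1)]
            mean = sum(win) / len(win)
            var = sum((v - mean) ** 2 for v in win) / len(win)
            if mean == 0:
                out[i][j] = float(img[i][j])
                continue
            ci2 = var / mean ** 2                 # squared local coefficient of variation
            weight = 0.0 if ci2 == 0 else max(0.0, 1.0 - cu2 / ci2)
            out[i][j] = mean + weight * (img[i][j] - mean)
    return out
```

With `noise_cv=0` every pixel is returned unchanged. A large `noise_cv` reduces the filter to a plain local mean.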
Spectral Subtractive-Type Algorithms for Enhancement of Noisy Speech: An Integrative Review
Scientific article
The spectral subtraction method is a classical approach to enhancing speech degraded by additive background noise. Its basic principle is to estimate the short-time spectral magnitude of speech by subtracting an estimate of the noise spectrum from the noisy speech spectrum. Equivalently, the noisy speech spectrum can be multiplied by a gain function and the result combined with the phase of the noisy speech. Besides reducing the background noise, the method introduces an annoying, perceptible tonal artifact into the enhanced speech, known as residual musical noise, which degrades listening quality. Several variations and implementations of the method have been proposed over the past decades to address these limitations. They form a family of subtractive-type algorithms that operate in the frequency domain. The objective of this paper is to provide an extensive overview of spectral subtractive-type algorithms for the enhancement of noisy speech. The paper concludes by outlining a future direction for speech enhancement research from the spectral subtraction perspective.
Free
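The gain-function view described above can be sketched for a single frame. This is a minimal power-subtraction rule with an over-subtraction factor `alpha` and a spectral floor `beta`, both illustrative defaults. The noise power estimate is assumed to be available already. A real, non-negative gain rescales each complex bin, so the noisy phase is reused automatically.

```python
import math


def spectral_subtract(noisy_bins, noise_power, alpha=1.0, beta=0.01):
    """Power spectral subtraction on the complex spectrum of one frame.

    noisy_bins  -- complex spectrum Y(k) of the noisy frame
    noise_power -- estimated noise power |N(k)|^2 per bin (assumed given)
    alpha       -- over-subtraction factor (1.0 = plain subtraction)
    beta        -- floor on the squared gain, which limits musical noise
    """
    enhanced = []
    for y, n in zip(noisy_bins, noise_power):
        p = abs(y) ** 2
        if p == 0.0:
            enhanced.append(0j)
            continue
        gain = math.sqrt(max(1.0 - alpha * n / p, beta))
        # Multiplying by a real gain keeps the phase of the noisy bin.
        enhanced.append(gain * y)
    return enhanced
```

Bins where the noise estimate exceeds the observed power are clamped to the floor instead of going negative. The isolated residual peaks that survive this clamping are what listeners hear as musical noise.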
Spectral and Time Based Assessment of Meditative Heart Rate Signals
Scientific article
The objective of this article was to study the effects of Chi meditation on heart rate variability (HRV). For this purpose, statistical and spectral measures of HRV derived from RR intervals were analyzed. The study is also concerned with finding adequate Auto-Regressive Moving Average (ARMA) model orders for spectral analysis of the time series formed from the RR intervals. Akaike's Final Prediction Error (FPE) was used as the criterion for choosing the model order. The results showed that the model order most frequently chosen by FPE was p = 8 before meditation and p = 5 during meditation. The results suggest that the variety of HRV model orders across different psychological states could be due to differences in the intrinsic properties of the system.
Free
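FPE-based order selection can be sketched as follows. Fit autoregressive models of increasing order p, compute FPE(p) = σ²_p (N + p + 1) / (N - p - 1), where σ²_p is the order-p prediction error variance and N is the series length, and keep the minimizing p. The sketch assumes pure AR models fitted with the Levinson-Durbin recursion. It omits the MA part of an ARMA model. `simulate_ar1` is a hypothetical helper that generates synthetic test data, not RR intervals.

```python
import random


def autocorr(x, max_lag):
    """Biased autocorrelation estimate of the mean-removed series."""
    n = len(x)
    mean = sum(x) / n
    xc = [v - mean for v in x]
    return [sum(xc[t] * xc[t + k] for t in range(n - k)) / n for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the Yule-Walker equations recursively.

    Returns (a, errors): a = [1, a1, ..., ap] for A(z) = 1 + a1 z^-1 + ... + ap z^-p,
    and errors[p] is the prediction error variance of the order-p model.
    """
    a = [1.0]
    err = r[0]
    errors = [err]
    for m in range(1, order + 1):
        acc = r[m] + sum(a[j] * r[m - j] for j in range(1, m))
        k = -acc / err  # reflection coefficient
        a = [1.0] + [a[j] + k * a[m - j] for j in range(1, m)] + [k]
        err *= 1.0 - k * k
        errors.append(err)
    return a, errors


def fpe_order(x, max_order):
    """Return (best_order, fpe_values) using Akaike's Final Prediction Error."""
    n = len(x)
    _, errors = levinson_durbin(autocorr(x, max_order), max_order)
    fpe = [errors[p] * (n + p + 1) / (n - p - 1) for p in range(max_order + 1)]
    return min(range(max_order + 1), key=fpe.__getitem__), fpe


def simulate_ar1(phi, n, seed=0):
    """Hypothetical test data: x_t = phi * x_{t-1} + white Gaussian noise."""
    rng = random.Random(seed)
    x, prev = [], 0.0
    for _ in range(n):
        prev = phi * prev + rng.gauss(0.0, 1.0)
        x.append(prev)
    return x
```

With the A(z) sign convention used here, an AR(1) process with coefficient 0.9 gives a[1] ≈ -0.9.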
Speech Emotion Recognition based on SVM as Both Feature Selector and Classifier
Scientific article
The aim of this paper is to use the Support Vector Machine (SVM) for both feature selection and classification of audio signals in order to identify human emotional states. One of the major bottlenecks of common speech emotion recognition techniques is the use of a huge number of features per utterance. This can significantly slow down learning and may cause the problem known as "the curse of dimensionality". To ease this challenge, this paper aims to achieve a high-accuracy system with a minimal set of features. The proposed model uses two methods for dimensionality reduction: "SVM feature selection" and the common Correlation-based Feature Subset Selection (CFS). In addition, two different classifiers, an SVM and a neural network, are adopted separately to identify six emotional states: anger, disgust, fear, happiness, sadness and neutral. The method has been verified on Persian (Persian ESD) and German (EMO-DB) emotional speech databases and yields high recognition rates on both. The results show that SVM feature selection provides better speech emotion recognition performance than CFS and the baseline feature set. Moreover, the new system achieves recognition rates of 99.44% on the Persian ESD and 87.21% on the Berlin Emotion Database for speaker-dependent classification. A promising result of 76.12% is also obtained for speaker-independent classification. Given its small number of features, this is among the best-known accuracies reported on that database.
Free
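CFS scores a feature subset S of size k with Hall's merit heuristic: Merit(S) = k * r_cf / sqrt(k + k(k - 1) * r_ff). Here r_cf is the mean feature-to-class correlation and r_ff is the mean feature-to-feature correlation. The sketch assumes absolute Pearson correlation on numeric features and a simple greedy forward search. The SVM-based selector from the paper is not shown.

```python
import math


def pearson(a, b):
    """Pearson correlation; 0.0 if either input is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def cfs_merit(features, target, subset):
    """Hall's CFS merit of the feature indices in `subset`."""
    k = len(subset)
    r_cf = sum(abs(pearson(features[i], target)) for i in subset) / k
    if k == 1:
        r_ff = 0.0
    else:
        pairs = [(i, j) for idx, i in enumerate(subset) for j in subset[idx + 1:]]
        r_ff = sum(abs(pearson(features[i], features[j])) for i, j in pairs) / len(pairs)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def forward_cfs(features, target):
    """Greedily add the feature that most increases merit; stop when none does."""
    selected, best = [], 0.0
    remaining = list(range(len(features)))
    while remaining:
        merit, cand = max((cfs_merit(features, target, selected + [c]), c) for c in remaining)
        if merit <= best:
            break
        selected.append(cand)
        remaining.remove(cand)
        best = merit
    return selected, best
```

An irrelevant feature lowers the merit of a subset even when it is uncorrelated with the selected features, so the greedy search stops early and keeps the subset small.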
Speech Enhancement Using Joint Time and DCT Processing for Real Time Applications
Scientific article
Deep-learning-based speech enhancement approaches provide better perceptual quality and intelligibility. However, most speech enhancement methods in the literature estimate the enhanced speech from a processed amplitude, energy or MFCC spectrum combined with the noisy phase. Because estimating the clean speech phase from noisy speech is difficult, the noisy phase is still used to reconstruct the enhanced speech. Some methods have been developed to estimate the clean speech phase, but this estimation has proved complex. To avoid this difficulty and improve performance, convolutional neural networks based on the Discrete Cosine Transform (DCT) and Discrete Sine Transform (DST) have been proposed in place of the Discrete Fourier Transform (DFT) for better intelligibility. However, these algorithms work with either time-domain features or frequency-domain features. To exploit the advantages of both domains, a fusion of DCT-domain and time-domain processing is proposed here. In this work, the DCT Dense Convolutional Recurrent Network (DCTDCRN), DST Convolutional Gated Recurrent Neural Network (DSTCGRU), DST Convolutional Long Short-Term Memory (DSTCLSTM) and DST Dense Convolutional Recurrent Network (DSTDCRN) are proposed for speech enhancement. These methods provide superior performance and lower processing complexity than state-of-the-art methods. The proposed DCT-based methods are further used to develop a joint time- and magnitude-based speech enhancement method. Simulation results show that joint time- and frequency-based processing outperforms the baseline methods. The results are also analyzed with objective performance measures such as Signal-to-Noise Ratio (SNR), Perceptual Evaluation of Speech Quality (PESQ) and Short-Time Objective Intelligibility (STOI).
Free
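The motivation for DCT-domain processing is that the DCT of a real signal is itself real. A network can therefore modify the coefficients directly, with no separate phase to estimate, because the sign of each coefficient carries what the DFT phase would otherwise encode. A minimal orthonormal DCT-II and its inverse (DCT-III) illustrate this. They use direct O(N²) sums for clarity, not a fast transform.

```python
import math


def _scale(k, n):
    """Orthonormal scaling factor for DCT index k of an n-point transform."""
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)


def dct2(x):
    """Orthonormal DCT-II of a real sequence (direct O(N^2) sum)."""
    n = len(x)
    return [_scale(k, n) * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
            for k in range(n)]


def idct2(c):
    """Inverse of dct2 (orthonormal DCT-III)."""
    n = len(c)
    return [sum(_scale(k, n) * c[k] * math.cos(math.pi * (i + 0.5) * k / n) for k in range(n))
            for i in range(n)]
```

Multiplying the coefficients by a real-valued mask, for example one predicted by a network, and then applying `idct2` yields a real enhanced frame directly.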
Scientific article
This paper presents a method to reduce the musical noise encountered in most frequency-domain speech enhancement algorithms. Musical noise is caused by random spectral peaks in each speech frame, which arise from the large variance and inaccuracy of the spectral estimates of the noisy speech and noise signals. To obtain a low-variance spectral estimate, this paper applies wavelet thresholding to the multitaper spectrum. This is combined with a noise estimation algorithm that estimates the noise spectrum as a weighted average of past and present spectra, using a predetermined weighting factor, to reduce the musical noise. To evaluate the method, sine multitapers were used and the spectral coefficients were thresholded with wavelet thresholding to obtain a low-variance spectrum. Both scale-dependent and scale-independent thresholding, with soft and hard thresholding using Daubechies wavelets, were evaluated in terms of objective quality measures under eight types of real-world noise at three input SNR levels. To predict speech quality in the presence of noise, objective quality measures such as Segmental SNR, Weighted Spectral Slope Distance, Log Likelihood Ratio, Perceptual Evaluation of Speech Quality (PESQ) and composite measures are used to compare the method against wavelet de-noising techniques, Spectral Subtraction and Multiband Spectral Subtraction. The proposed method provides consistent performance for all eight noise types in most of the cases considered.
Free
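The building blocks named above can be sketched separately. They are:
- sine tapers w_k(n) = sqrt(2/(N+1)) sin(πk(n+1)/(N+1));
- the multitaper power spectrum as the average of the tapered periodograms;
- soft and hard thresholding rules;
- a recursive noise update.

The sketch does not perform the wavelet decomposition of the spectrum that the paper thresholds. It uses a direct O(N²) DFT, and the weighting factor 0.9 is a hypothetical default.

```python
import cmath
import math


def sine_tapers(n, k):
    """First k orthonormal sine tapers of length n."""
    return [[math.sqrt(2.0 / (n + 1)) * math.sin(math.pi * j * (i + 1) / (n + 1))
             for i in range(n)] for j in range(1, k + 1)]


def dft_power(x):
    """|X(f)|^2 of a direct (unnormalized) DFT."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))) ** 2
            for f in range(n)]


def multitaper_psd(x, k=4):
    """Average of k sine-tapered periodograms: a lower-variance spectrum estimate."""
    spectra = [dft_power([w * v for w, v in zip(taper, x)]) for taper in sine_tapers(len(x), k)]
    return [sum(s[f] for s in spectra) / k for f in range(len(x))]


def soft_threshold(c, t):
    """Shrink c towards zero by t (zero inside [-t, t])."""
    return math.copysign(max(abs(c) - t, 0.0), c)


def hard_threshold(c, t):
    """Keep c if |c| > t, else zero."""
    return c if abs(c) > t else 0.0


def update_noise(prev, current, weight=0.9):
    """Weighted average of the previous noise estimate and the current spectrum."""
    return [weight * p + (1.0 - weight) * c for p, c in zip(prev, current)]
```

Each taper has unit energy, so by Parseval's relation each tapered periodogram of a unit-amplitude constant signal of length N sums to N.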
Speech Enhancement through Implementation of Adaptive Noise Canceller Using FHEDS Adaptive Algorithm
Scientific article
Speech analysis is the modelling and estimation of speech characteristics according to the criteria set by real-time applications. Enhancement is one such area of speech analysis. This paper compares the performance of our proposed Fast Hybrid Euclidean Direction Search (FHEDS) algorithm with other adaptive algorithms such as NHP and FEDS. The algorithms have been tested for adaptive noise cancellation of speech signals corrupted by different noises: babble, factory, destroyer engine, car, fire engine and train noise. The design criteria, within the limits of the current database, and their analysis are incorporated into each design phase together with the noise model in order to improve performance. The comparison results for each noise type, for both noisy and clean speech data processed by the proposed filter, are tabulated. The proposed model effectively reduces noise, achieves better speech enhancement, and attains a higher Signal-to-Noise Ratio (SNR) than traditional models.
Free
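The abstract does not describe the FHEDS update rule, so it is not reproduced here. For orientation only, the sketch below shows the standard adaptive noise canceller structure with the familiar normalized LMS (NLMS) update, which is a different, baseline algorithm. A reference noise input is adaptively filtered to match the noise in the primary (speech + noise) input. The error signal then serves as the cleaned speech. Tap count and step size are illustrative, and `white_noise` is a hypothetical test helper.

```python
import random


def white_noise(n, seed=0):
    """Hypothetical test helper: uniform white noise in [-1, 1]."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]


def nlms_anc(primary, reference, taps=4, mu=0.5, eps=1e-8):
    """Adaptive noise cancellation with an NLMS-adapted FIR filter.

    primary   -- speech plus noise picked up by the main sensor
    reference -- noise-only reference correlated with the noise in `primary`
    Returns (cleaned, final_weights); `cleaned` is the error signal.
    """
    w = [0.0] * taps
    buf = [0.0] * taps  # most recent reference sample first
    cleaned = []
    for d, x in zip(primary, reference):
        buf = [x] + buf[:-1]
        y = sum(wi * xi for wi, xi in zip(w, buf))  # estimate of the noise in d
        e = d - y                                   # error = speech estimate
        norm = eps + sum(xi * xi for xi in buf)
        w = [wi + mu * e * xi / norm for wi, xi in zip(w, buf)]
        cleaned.append(e)
    return cleaned, w
```

With real speech in `primary`, the filter can only cancel the part that is correlated with `reference`, so the speech remains in the error output.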
Speech Feature Extraction for Gender Recognition
Scientific article
Speech recognition technology can be embedded in various real-time applications to improve human-computer interaction. From robotics to health care and aerospace, and from interactive voice response systems to mobile telephony and telematics, speech recognition technology has enhanced human-machine interaction. Gender recognition is an important component of applications that embed speech recognition because it reduces the computational complexity of further processing. This paper extracts one of the most dominant and most researched speech features, the Mel coefficients, together with their first- and second-order derivatives. We extracted 13 values for each of these from a dataset of 46 speech samples containing the Hindi vowels (आ, इ, ई, उ, ऊ, ऋ, ए, ऎ, ऒ, ऑ). A combined SVM and neural network model, joined by stacking, was then trained on them to determine the speaker's gender. The results showed an accuracy of 93.48% when the first Mel coefficient was taken into consideration. The purpose of this study was to extract the correct features and to compare performance based on the first Mel coefficient.
Free
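The first- and second-order derivatives mentioned above are usually computed with the regression formula d_t = Σ_{n=1}^{N} n (c_{t+n} - c_{t-n}) / (2 Σ_{n=1}^{N} n²), applied once for deltas and again for delta-deltas. The sketch assumes a window of N = 2 and replicated edge frames. The abstract does not state these choices.

```python
def deltas(frames, width=2):
    """Regression deltas of a list of feature vectors (one vector per frame).

    width -- N in the regression formula; frames beyond the ends are replicated.
    """
    num_frames, dim = len(frames), len(frames[0])
    denom = 2 * sum(n * n for n in range(1, width + 1))
    out = []
    for t in range(num_frames):
        row = []
        for d in range(dim):
            acc = 0.0
            for n in range(1, width + 1):
                nxt = frames[min(t + n, num_frames - 1)][d]
                prv = frames[max(t - n, 0)][d]
                acc += n * (nxt - prv)
            row.append(acc / denom)
        out.append(row)
    return out
```

Stacking each 13-value MFCC frame with its deltas and delta-deltas, e.g. `[c + d + dd for c, d, dd in zip(mfcc, deltas(mfcc), deltas(deltas(mfcc)))]`, gives the usual 39-dimensional vector.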
Spliced image classification and tampered region localization using local directional pattern
Scientific article
In this paper, the authors propose a spliced image detection algorithm based on the Local Directional Pattern (LDP). Many splicing detection techniques either distinguish spliced images from authentic ones or localize the spliced region, whereas the proposed algorithm can both classify images and localize the spliced region. First, the original image (RGB color space) is converted to the YCbCr color space. The histogram of the LDP of the chrominance component of the suspect image is used for classification. For localization, the chrominance component of the input image is divided into overlapping blocks and the LDP of each block is calculated. The standard deviation of each block is then used as a clue to visualize the spliced region. The experimental results, reported in terms of accuracy, specificity (true negative rate), sensitivity (true positive rate) and error rate, demonstrate the effectiveness of the proposed algorithm, whose accuracy is 98.55%. The algorithm is also robust against post-splicing image processing operations such as Gaussian blur, additive white Gaussian noise, JPEG compression and scaling, which previous techniques have not considered.
Free
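The LDP code of a pixel is commonly computed as follows. Convolve its 3x3 neighbourhood with the eight Kirsch compass masks, take the k strongest absolute responses (k = 3 is the usual choice), and set the corresponding bits of an 8-bit code. The sketch generates each Kirsch mask as three adjacent +5 ring cells and five -3 cells. Ties are broken by mask index, which is an assumption of this sketch. The YCbCr conversion and block-wise standard deviation steps are not shown.

```python
# Ring of the 8 neighbours (row, col), counter-clockwise starting from East.
RING = [(1, 2), (0, 2), (0, 1), (0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]


def kirsch_responses(patch):
    """Responses m0..m7 of a 3x3 patch to the eight Kirsch masks (East first)."""
    responses = []
    for i in range(8):
        strong = {(i - 1) % 8, i, (i + 1) % 8}  # three adjacent cells weighted +5
        m = 0
        for pos, (r, c) in enumerate(RING):
            m += (5 if pos in strong else -3) * patch[r][c]
        responses.append(m)
    return responses


def ldp_code(patch, k=3):
    """8-bit LDP code: bits set for the k largest |responses| (ties by index)."""
    resp = kirsch_responses(patch)
    top = sorted(range(8), key=lambda i: abs(resp[i]), reverse=True)[:k]
    return sum(1 << i for i in top)


def ldp_histogram(img, k=3):
    """Histogram {code: count} of LDP codes over all interior pixels."""
    hist = {}
    for y in range(1, len(img) - 1):
        for x in range(1, len(img[0]) - 1):
            patch = [row[x - 1:x + 2] for row in img[y - 1:y + 2]]
            code = ldp_code(patch, k)
            hist[code] = hist.get(code, 0) + 1
    return hist
```

Each Kirsch mask sums to zero, so a flat patch produces all-zero responses.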
Stabilogram mPCA Decomposition and Effects Analysis of Several Entries on The Postural Stability
Scientific article
This paper presents an analysis of stabilograms using the modified Principal Component Analysis (mPCA) decomposition to highlight the effects of different factors on human postural stability. The aim of this study is to analyze stabilogram center-of-pressure time series with the mPCA decomposition method. mPCA is a decomposition method applied to a complex signal. It decomposes the stabilogram, considered as an additive model, into three components: trend, rambling and trembling. Studying the trace of the analytic trembling (respectively, rambling) signal in the complex plane reveals a unique center of rotation. The phase can therefore be defined, and two parameters are extracted: the area of the circle containing 95% of the trace's data points, and the angular frequency. In this study, 25 healthy volunteers (mean age 31 ± 11 years) were asked to stand upright on an electromagnetic platform with eyes either closed or open and with feet either apart or together. Experimental results show the effectiveness of the area parameter in identifying the effect of visual, proprioceptive and directional inputs on postural stability. These results discriminate between the control and young groups and indicate less well-controlled posture in control subjects (34.5 ± 7.5 years) than in young subjects (22.5 ± 2.5 years). The results also show that female subjects are more stable than males, fat subjects more stable than thin ones, and tall subjects more stable than short ones.
Free
Statistical Image Classification for Image Steganographic Techniques
Scientific article
Steganography is a method of information hiding. The free choice of cover image is a particular advantage of steganography over other information hiding techniques, and the performance of a steganographic system can be improved by choosing a suitable cover image. This article presents a two-level unsupervised image classification algorithm based on statistical characteristics of the image. The algorithm helps the sender choose a suitable cover image that enhances the performance of the steganographic method for the sender's specific purpose. Experiments demonstrate the effect of classification in satisfying steganography requirements.
Free
Statistical Texture Features Based Automatic Detection and Classification of Diabetic Retinopathy
Scientific article
Diabetes is a globally prevalent disease that can cause microvascular complications such as Diabetic Retinopathy (DR) in the eye, which can be a significant cause of visual impairment. The present study aimed to develop an automatic detection and classification system for diagnosing diabetic retinopathy from digital fundus images, in order to reduce the workload of ophthalmologists. The work comprises three main stages. First, the proposed method extracts the blood vessels from the color fundus image. Second, it determines whether the input image is normal or shows diabetic retinopathy. Third, it classifies diabetic retinopathy automatically using statistical texture features. The Gray Level Co-occurrence Matrix (GLCM) and Gray Level Run Length Matrix (GLRLM) are used to extract second-order and higher-order statistical texture features. These features feed three well-known classifiers: K-Nearest Neighbor (KNN), Random Forest (RF) and Support Vector Machine (SVM). Evaluation on a dataset of 644 retinal images shows that the proposed method is effective with the random forest classifier. It achieves a weighted sensitivity, precision, F1-score and accuracy of 95.53%, 96.45%, 95.38% and 95.19%, respectively, for the detection and classification of diabetic retinopathy. These outcomes suggest that the method could decrease the cost of screening and diagnosis while exceeding suggested performance levels, and that the system could be used in clinical assessments that require better evaluation.
Free
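A GLCM counts how often grey level i occurs next to grey level j at a fixed offset. Haralick-style statistics are then computed from the normalized matrix. The sketch below assumes a single non-symmetric horizontal offset (dx = 1, dy = 0), images already quantized to `levels` grey levels, and three common features. The paper's exact offsets, quantization and GLRLM features are not specified here.

```python
def glcm(img, levels, dx=1, dy=0):
    """Normalized, non-symmetric grey-level co-occurrence matrix for one offset."""
    m = [[0.0] * levels for _ in range(levels)]
    h, w = len(img), len(img[0])
    count = 0
    for y in range(h):
        for x in range(w):
            y2, x2 = y + dy, x + dx
            if 0 <= y2 < h and 0 <= x2 < w:
                m[img[y][x]][img[y2][x2]] += 1
                count += 1
    return [[v / count for v in row] for row in m]


def glcm_features(p):
    """Contrast, energy (angular second moment) and homogeneity of a GLCM."""
    n = len(p)
    cells = [(i, j) for i in range(n) for j in range(n)]
    return {
        "contrast": sum(p[i][j] * (i - j) ** 2 for i, j in cells),
        "energy": sum(p[i][j] ** 2 for i, j in cells),
        "homogeneity": sum(p[i][j] / (1 + abs(i - j)) for i, j in cells),
    }
```

For the 4x4 test image with four grey levels used in the checks, the 12 horizontal pairs give a contrast of 7/12. A constant image gives contrast 0, energy 1 and homogeneity 1.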
Steganography Based on Integer Wavelet Transform and Bicubic Interpolation
Scientific article
Steganography is the art and science of hiding information in unremarkable cover media so as not to arouse suspicion. It belongs to the information security field. As such, steganography is characterized by a set of measures that rely on its strengths, and by counter-attacks that exploit its weaknesses and vulnerabilities. The aim of this paper is to propose a modified high-capacity image steganography technique based on the integer wavelet transform. The technique aims at acceptable levels of imperceptibility and distortion in the cover image used as the medium file, together with a high level of security. Bicubic interpolation causes overshoot, which increases acutance (apparent sharpness). The bicubic algorithm is frequently used to scale images and video for display, and it preserves fine image details better than the common bilinear algorithm.
Free
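An integer wavelet transform maps integers to integers, so coefficients can be modified, for example by embedding bits, and inverted back to integer pixel values exactly. The sketch shows one level of the simplest such transform, the integer Haar (S-) transform computed by lifting: d = odd - even, s = even + floor(d / 2). This choice of wavelet is an assumption, since the abstract does not name one. The paper's embedding scheme and the bicubic interpolation step are not shown. Modified coefficients may push reconstructed pixels outside 0 to 255, which a real scheme must handle.

```python
def iwt_haar_forward(x):
    """One level of the integer Haar (S-) transform; len(x) must be even."""
    even, odd = x[0::2], x[1::2]
    detail = [o - e for e, o in zip(even, odd)]
    approx = [e + (d >> 1) for e, d in zip(even, detail)]  # >> 1 is floor(d / 2)
    return approx, detail


def iwt_haar_inverse(approx, detail):
    """Exact integer inverse of iwt_haar_forward."""
    even = [s - (d >> 1) for s, d in zip(approx, detail)]
    odd = [d + e for d, e in zip(detail, even)]
    out = []
    for e, o in zip(even, odd):
        out += [e, o]
    return out
```

For 2-D images the same lifting steps are applied to rows and then to columns.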
Stochastic Characterization of a MEMs based Inertial Navigation Sensor using Interval Methods
Scientific article
The aim here is to demonstrate the effectiveness of interval methods in analyzing dynamic uncertainties of marine navigation sensors. The work was carried out with an integrated sensor suite consisting of a low-cost MEMS inertial sensor, a GPS receiver of moderate accuracy, a Doppler velocity profiler and a magnetic fluxgate compass. Error bounds for all the sensors were translated into guaranteed intervals. GPS-based position intervals are fed into a forward-backward propagation method to estimate interval-valued inertial data. Dynamic noise margins are then computed by comparing the estimated and measured inertial quantities. The intervals estimated by the proposed approach were found to be supersets of the 95% confidence levels of the dynamic acceleration errors. This indicates a significant drift in the dynamic acceleration error, which stationary error bounds may not capture clearly. On the other hand, the non-stationary error bounds for the rate gyroscope were found to be consistent with the intervals predicted from stationary noise coefficients. The guaranteed intervals estimated by the proposed forward-backward contractor are close to the 95% confidence levels of the stationary errors computed over the sampling period.
Free
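Forward-backward contraction narrows every interval in a constraint until it is consistent with all the others. The sketch uses a hypothetical one-dimensional kinematic constraint, p1 = p0 + v * dt. The forward pass tightens the new position. The backward pass tightens the velocity and the old position. The constraint and the `Interval` class are illustrative assumptions, not the paper's sensor model, which also involves accelerations and heading.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    lo: float
    hi: float

    def __add__(self, o):
        return Interval(self.lo + o.lo, self.hi + o.hi)

    def __sub__(self, o):
        return Interval(self.lo - o.hi, self.hi - o.lo)

    def scale(self, k):
        """Multiply by a non-negative scalar k (assumed, since dt > 0)."""
        return Interval(self.lo * k, self.hi * k)

    def meet(self, o):
        """Intersection; an empty result means the bounds are inconsistent."""
        lo, hi = max(self.lo, o.lo), min(self.hi, o.hi)
        if lo > hi:
            raise ValueError("empty intersection: inconsistent bounds")
        return Interval(lo, hi)


def contract_position_velocity(p0, p1, v, dt):
    """Forward-backward contractor for the hypothetical constraint p1 = p0 + v*dt."""
    p1 = p1.meet(p0 + v.scale(dt))              # forward propagation
    v = v.meet((p1 - p0).scale(1.0 / dt))       # backward: v in (p1 - p0) / dt
    p0 = p0.meet(p1 - v.scale(dt))              # backward: p0 in p1 - v*dt
    return p0, p1, v
```

Starting from a loose velocity bound of [-100, 100], positions known to within [0, 1] and [10, 12] one second apart contract the velocity to the guaranteed interval [9, 12].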
Scientific article
Texture describes the visual properties of an image, and texture analysis plays a dominant role in image segmentation. In texture segmentation, model-based methods are superior to model-free methods. This paper applies a multivariate generalized Gaussian mixture probability model, integrated with hierarchical clustering, to segment the texture of an image. The feature vector associated with the texture is derived from the DCT coefficients of the image blocks. The model parameters are estimated with the EM algorithm and are initialized using a hierarchical clustering algorithm and the method of moments. The texture segmentation algorithm is developed using component maximum likelihood in a Bayesian framework. The performance of the proposed algorithm is evaluated experimentally on five image textures selected randomly from the Brodatz texture database. Texture segmentation performance measures such as Global Consistency Error (GCE), Probabilistic Rand Index (PRI) and Variation of Information (VOI) show that this method outperforms existing texture segmentation methods based on the Gaussian mixture model. This is also supported by the confusion matrix, accuracy, specificity, sensitivity and F-measure.
Free
Study for License Plate Detection
Scientific article
A License Plate Detection (LPD) system is an application of computer vision and image processing technology. LPD is the first and main step of a License Plate Recognition (LPR) system and thus acts as its main driver, since plate detection is always performed before plate recognition. An LPD system takes vehicle images as input and follows general steps such as preprocessing, localization, region extraction and region detection; the detected image is the output of the system. There are many LPD algorithms, yet detecting a license plate under different conditions is still a complex task. Morphological operations and deep learning models are the approaches most commonly used for LPD. This paper presents a critical study of license plate detection systems and examines the implementation of new technologies for license plate detection.
Free
Study of Noise Detection and Noise Removal Techniques in Medical Images
Scientific article
In this work, we took different medical images (MRI, cancer, X-ray and brain images) and calculated the standard deviation and mean of each. We detected salt-and-pepper noise and then applied median filtering to remove it. After the noise was removed, the standard deviation and mean were evaluated again. This experimental analysis improves the accuracy of MRI, cancer, X-ray and brain images for easier diagnosis. The results obtained are useful and can help general medical practitioners analyze patients' symptoms.
Free
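The detect-then-filter procedure described above can be sketched on a 2-D list of 8-bit intensities. The sketch assumes that salt-and-pepper impulses are exactly the extreme values 0 and 255. Only those pixels are replaced by the median of their 3x3 neighbourhood, with replicated borders. Mean and population standard deviation are computed before and after filtering with the standard `statistics` module.

```python
import statistics


def mean_std(img):
    """Mean and population standard deviation of all pixel values."""
    vals = [v for row in img for v in row]
    return statistics.fmean(vals), statistics.pstdev(vals)


def median_filter(img, only_impulses=True, low=0, high=255):
    """3x3 median filter; by default only pixels equal to `low` or `high`
    (assumed salt-and-pepper impulses) are replaced."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for i in range(h):
        for j in range(w):
            if only_impulses and img[i][j] not in (low, high):
                continue
            win = [img[min(max(i + di, 0), h - 1)][min(max(j + dj, 0), w - 1)]
                   for di in (-1, 0, 1) for dj in (-1, 0, 1)]
            out[i][j] = statistics.median(win)
    return out
```

Restricting the filter to detected impulses leaves uncorrupted pixels, and therefore fine detail, untouched. Passing `only_impulses=False` gives a plain median filter.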