Journal articles - International Journal of Intelligent Systems and Applications
All articles: 1159
Covering Based Optimistic Multigranular Approximate Rough Equalities and their Properties
Research article
Since its inception, rough set theory has proved to be one of the most important models for capturing impreciseness in data. However, it was based upon the notion of equivalence relations, which are relatively rare as far as applicability is concerned. So, the basic rough set model has been extended in many directions. One of these extensions is the covering based rough set notion, where a cover is an extension of the concept of a partition, a notion equivalent to an equivalence relation. From the granular computing point of view, all these rough sets are unigranular in character; i.e. they consider only a single granular structure on the universe. So, the necessity arose to define multigranular rough sets, and as a consequence two types of multigranular rough sets, called the optimistic multigranular rough sets and the pessimistic multigranular rough sets, have been introduced. Four types of covering based optimistic multigranular rough sets have been introduced and their properties studied. The notion of equality of sets, which is too stringent for real life applications, was extended by Novotny and Pawlak to define rough equalities. This notion was further extended by Tripathy to define three more types of approximate equalities. The covering based optimistic versions of two of these four approximate equalities have been studied by Nagaraju et al. recently. In this article, we study the other two cases and provide a comparative analysis.
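As a point of reference for the multigranular operators discussed above, the following is a minimal sketch assuming the standard definitions of optimistic multigranular lower and upper approximations over two equivalence relations, represented here simply as partitions of the universe; the covering-based generalizations and the approximate equalities studied in the article build on these basic operators but are not reproduced here.

```python
# Minimal sketch: optimistic multigranular rough approximations over two
# equivalence relations, each given as a partition (list of blocks) of U.

def block_of(x, partition):
    """Return the block [x] of the partition containing element x."""
    for block in partition:
        if x in block:
            return block
    raise ValueError(f"{x!r} is not covered by the partition")

def optimistic_lower(X, U, P1, P2):
    """x belongs if its block under P1 OR its block under P2 lies inside X."""
    X = set(X)
    return {x for x in U
            if block_of(x, P1) <= X or block_of(x, P2) <= X}

def optimistic_upper(X, U, P1, P2):
    """Dual operator: complement of the lower approximation of the complement."""
    X = set(X)
    return set(U) - optimistic_lower(set(U) - X, U, P1, P2)

if __name__ == "__main__":
    U = {1, 2, 3, 4, 5, 6}
    P1 = [{1, 2}, {3, 4}, {5, 6}]          # partition induced by relation R1
    P2 = [{1, 2, 3}, {4, 5}, {6}]          # partition induced by relation R2
    X = {1, 2, 3, 5}
    print(optimistic_lower(X, U, P1, P2))  # {1, 2, 3}
    print(optimistic_upper(X, U, P1, P2))  # {1, 2, 3, 4, 5}
```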
Free
Research article
In this work, Language Models (LMs) and Acoustic Models (AMs) are developed using the speech recognition toolkit Kaldi for noisy and enhanced speech data to build an Automatic Speech Recognition (ASR) system for the Kannada language. The speech data used for the development of the ASR models is collected in an uncontrolled environment from farmers of different dialect regions of Karnataka state. The collected speech data is preprocessed by proposing a method for noise elimination in the degraded speech data. The proposed method is a combination of Spectral Subtraction with Voice Activity Detection (SS-VAD) and the Minimum Mean Square Error Spectrum Power Estimator based on Zero Crossing (MMSE-SPZC). Word-level transcription and validation of the speech data is done with an Indic language transliteration tool (IT3 to UTF-8). The Indian Language Speech Label (ILSL12) set is used for the development of the Kannada phoneme set and lexicon. 75% of the transcribed and validated speech data is used for system training and the remaining 25% for testing. The LMs are generated using Kannada language resources, and the AMs are developed using Gaussian Mixture Models (GMM) and Subspace Gaussian Mixture Models (SGMM). The proposed method is studied in detail and used for enhancing the degraded speech data. The Word Error Rates (WERs) of the ASR models for noisy and enhanced speech data are highlighted and discussed in this work. The developed ASR models can be used in a spoken query system to access real-time agricultural commodity price and weather information in the Kannada language.
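To make the enhancement step concrete, below is a hedged sketch of magnitude spectral subtraction driven by a simple energy-based voice activity detector, in the spirit of the SS-VAD component described above; the exact SS-VAD / MMSE-SPZC combination, its parameters, and the Kaldi recipes used to train the GMM and SGMM acoustic models are the paper's own and are not reproduced here.

```python
# Illustrative spectral subtraction with a crude energy-based VAD:
# low-energy frames are treated as noise-only and used to estimate
# the noise magnitude spectrum, which is then subtracted.
import numpy as np
from scipy.signal import stft, istft

def spectral_subtraction(noisy, fs, nperseg=512, vad_quantile=0.2, floor=0.02):
    f, t, Z = stft(noisy, fs=fs, nperseg=nperseg)
    mag, phase = np.abs(Z), np.angle(Z)

    # Crude VAD: frames whose mean magnitude falls in the lowest quantile
    # are assumed to contain no speech.
    frame_energy = mag.mean(axis=0)
    noise_frames = frame_energy <= np.quantile(frame_energy, vad_quantile)
    noise_mag = mag[:, noise_frames].mean(axis=1, keepdims=True)

    # Subtract the noise estimate and apply a spectral floor to limit musical noise.
    clean_mag = np.maximum(mag - noise_mag, floor * noise_mag)
    _, enhanced = istft(clean_mag * np.exp(1j * phase), fs=fs, nperseg=nperseg)
    return enhanced
```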
Free
Credibility Detection on Twitter News Using Machine Learning Approach
Research article
Social media presence is a crucial part of our lives, and social media is considered a more important source of information than traditional sources. Twitter has become one of the prevalent social sites for exchanging viewpoints and feelings. This work proposes a supervised machine learning system for discovering false news. One of the credibility detection problems is finding new features that are most predictive, so that classifiers perform better. Both features depending on the news content and features based on the user are used. The importance of the features is examined, along with their impact on performance. The reasons for choosing the final feature set using the k-best method are explained. Seven supervised machine learning classifiers are used: Naïve Bayes (NB), Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Logistic Regression (LR), Random Forest (RF), Maximum Entropy (ME), and Conditional Random Forest (CRF). Training and testing of the models were conducted using the Pheme dataset. The analysis of the features is introduced and compared with the content-based features as the decisive factors in determining validity. Random forest shows the highest performance both when using user-based features only and when using a mixture of both types of features (content-based and user-based), with an accuracy of 82.2% using user-based features only. We achieved the highest results by using both types of features with the random forest classifier, with an accuracy of 83.4%. In contrast, logistic regression was the best when using content-based features. Performance is measured with different metrics: accuracy, precision, recall, and F1-score. We compared our feature set with the features of other studies and examined the impact of our new features. We found that our approach yields a clear improvement in discovering and verifying false news compared with current results.
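As an illustration of the feature-selection-plus-classifier setup described above, here is a hedged scikit-learn sketch combining SelectKBest (the k-best method) with a random forest; the feature matrix is a placeholder, since the actual content- and user-based Pheme features and the reported accuracies come from the paper.

```python
# Sketch: k-best feature selection followed by a random forest classifier.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# X: one row per tweet with content- and user-based features; y: credible or not.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 30))       # placeholder feature matrix
y = rng.integers(0, 2, size=1000)     # placeholder labels

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

model = Pipeline([
    ("select", SelectKBest(score_func=f_classif, k=15)),  # keep the k most predictive features
    ("clf", RandomForestClassifier(n_estimators=200, random_state=0)),
])
model.fit(X_train, y_train)

pred = model.predict(X_test)
print("accuracy:", accuracy_score(y_test, pred))
print(precision_recall_fscore_support(y_test, pred, average="binary"))
```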
Free
Research article
Optical technologies are ubiquitous in telecommunications networks and systems, providing multiple wavelength channels of transport at 2.5 Gbit/s to 40 Gbit/s data rates over single fiber optic cables. Market pressures continue to drive up the number of wavelength channels per fiber and the data rate per channel. This trend will continue for many years to come as electronic commerce grows and enterprises demand higher and more reliable bandwidth over long distances. Electronic commerce, in turn, is driving the growth curves for single-processor and multiprocessor performance in database transaction and Web-based servers. Ironically, the insatiable appetite for enterprise network bandwidth, which has driven up the volume and pushed down the price of optical components for telecommunications, is simultaneously stressing computer system bandwidth, increasing the need for new interconnection schemes and providing, for the first time, commercial opportunities for optical components in computer systems. The evolution of integrated circuit technology is causing system designs to move towards communication-based architectures. We present the current trends in the capacity of optical interconnection data transmission links in high performance optical communication and computing systems over a wide range of the affecting parameters.
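To give a sense of the capacity scaling mentioned above (multiple wavelength channels at 2.5 to 40 Gbit/s each), here is a back-of-the-envelope calculation of aggregate DWDM link capacity; the channel counts used are assumed for illustration and are not taken from the article.

```python
# Aggregate fiber capacity = number of wavelength channels x data rate per channel.
def aggregate_capacity_gbps(channels: int, rate_gbps: float) -> float:
    return channels * rate_gbps

for channels in (16, 40, 80):                 # assumed channel counts
    for rate in (2.5, 10.0, 40.0):            # per-channel rates cited above (Gbit/s)
        total = aggregate_capacity_gbps(channels, rate)
        print(f"{channels:3d} ch x {rate:5.1f} Gbit/s = {total / 1000:.2f} Tbit/s")
```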
Free
DIMK-means “Distance-based Initialization Method for K-means Clustering Algorithm”
Research article
The partition-based clustering technique is one of several clustering techniques that attempt to directly decompose the dataset into a set of disjoint clusters. The K-means algorithm, which relies on the partition-based clustering technique, is popular, widely used, and applied to a variety of domains. K-means clustering results are extremely sensitive to the initial centroids; this is one of the major drawbacks of the K-means algorithm. Due to this sensitivity, several different initialization approaches have been proposed for the K-means algorithm over the last decades. This paper proposes a selection method for the initial cluster centroids in K-means clustering instead of the random selection method. The research provides a detailed performance assessment of the proposed initialization method over many datasets with different dimensions, numbers of observations, groups, and clustering complexities. The ability to identify the true clusters is the performance evaluation standard in this research. The experimental results show that the proposed initialization method is more effective and converges to more accurate clustering results than the random initialization method.
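For illustration, the sketch below shows one common distance-based centroid initialization (a farthest-point / maximin heuristic) wired into scikit-learn's KMeans in place of random seeding; the exact selection rule of the proposed DIMK-means method is defined in the paper and may differ.

```python
# Sketch of a distance-based K-means initialization: seed with the point
# nearest the data mean, then repeatedly pick the point farthest from all
# centroids chosen so far.
import numpy as np
from sklearn.cluster import KMeans

def distance_based_init(X, k):
    centroids = [X[np.argmin(np.linalg.norm(X - X.mean(axis=0), axis=1))]]
    while len(centroids) < k:
        dists = np.min(
            np.linalg.norm(X[:, None, :] - np.asarray(centroids)[None, :, :], axis=2),
            axis=1)
        centroids.append(X[np.argmax(dists)])
    return np.asarray(centroids)

# Usage: pass the deterministic seeds to KMeans instead of random initialization.
X = np.random.default_rng(0).normal(size=(500, 2))
init = distance_based_init(X, k=3)
km = KMeans(n_clusters=3, init=init, n_init=1).fit(X)
print(km.inertia_)
```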
Free