Journal articles - International Journal of Intelligent Systems and Applications

Total articles: 1187

Covering Based Optimistic Multigranular Approximate Rough Equalities and their Properties

B. K. Tripathy, S. C. Parida

Research article

Since its inception, rough set theory has proved to be one of the most important models for capturing impreciseness in data. However, it was based on the notion of equivalence relations, which are relatively rare as far as applicability is concerned. So, the basic rough set model has been extended in many directions. One of these extensions is the covering based rough set notion, where a cover is an extension of the concept of a partition, a notion equivalent to that of an equivalence relation. From the granular computing point of view, all these rough sets are unigranular in character; i.e., they consider only a single granular structure on the universe. This raised the necessity of defining multigranular rough sets, and as a consequence two types, called optimistic multigranular rough sets and pessimistic multigranular rough sets, have been introduced. Four types of covering based optimistic multigranular rough sets have been introduced and their properties studied. The notion of equality of sets, which is too stringent for real life applications, was extended by Novotny and Pawlak to define rough equalities. This notion was further extended by Tripathy to define three more types of approximate equalities. The covering based optimistic versions of two of these four approximate equalities have been studied by Nagaraju et al. recently. In this article, we study the other two cases and provide a comparative analysis.
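For readers unfamiliar with the underlying definitions, the classical Pawlak lower approximation and the optimistic multigranular lower approximation (the union of the single-granulation lower approximations) can be sketched as follows. The helper names are illustrative; covering based variants replace partitions with covers:

```python
def equivalence_classes(universe, labelling):
    """Partition the universe by the label each element receives."""
    classes = {}
    for x in universe:
        classes.setdefault(labelling[x], set()).add(x)
    return list(classes.values())

def lower_approx(X, classes):
    """Pawlak lower approximation: union of classes fully contained in X."""
    out = set()
    for c in classes:
        if c <= X:
            out |= c
    return out

def optimistic_multigranular_lower(X, granulations):
    """x belongs iff its class lies inside X under at least one granulation,
    i.e. the union of the single-granulation lower approximations."""
    out = set()
    for classes in granulations:
        out |= lower_approx(X, classes)
    return out
```

With two granulations partitioning {1..6} differently, the optimistic lower approximation of a set is the union of the two single-granulation lower approximations.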

Free

Creation and comparison of language and acoustic models using Kaldi for noisy and enhanced speech data

Thimmaraja Yadava G., H. S. Jayanna

Research article

In this work, Language Models (LMs) and Acoustic Models (AMs) are developed using the speech recognition toolkit Kaldi for noisy and enhanced speech data to build an Automatic Speech Recognition (ASR) system for the Kannada language. The speech data used for the development of the ASR models was collected in an uncontrolled environment from farmers of different dialect regions of Karnataka state. The collected speech data is preprocessed by a proposed method for noise elimination in the degraded speech data. The proposed method is a combination of Spectral Subtraction with Voice Activity Detection (SS-VAD) and a Minimum Mean Square Error Spectrum Power estimator based on Zero Crossing (MMSE-SPZC). The word-level transcription and validation of the speech data are done with an Indic language transliteration tool (IT3 to UTF-8). The Indian Language Speech Label (ILSL12) set is used for the development of the Kannada phoneme set and lexicon. 75% of the transcribed and validated speech data is used for system training and the remaining 25% for testing. The LMs are generated using Kannada language resources, and the AMs are developed using Gaussian Mixture Models (GMMs) and Subspace Gaussian Mixture Models (SGMMs). The proposed method is studied in detail and used for enhancing the degraded speech data. The Word Error Rates (WERs) of the ASR models for noisy and enhanced speech data are highlighted and discussed in this work. The developed ASR models can be used in a spoken query system to access real-time agricultural commodity price and weather information in the Kannada language.
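The noise-elimination step builds on spectral subtraction. A minimal magnitude-domain sketch is shown below; it is not the paper's SS-VAD + MMSE-SPZC combination, and the frame size and spectral floor are assumed values:

```python
import numpy as np

def spectral_subtraction(noisy, noise_est, frame=256, floor=0.01):
    """Subtract an estimated noise magnitude spectrum from each frame
    (non-overlapping frames; any tail shorter than a frame is left zero)."""
    out = np.zeros_like(noisy, dtype=float)
    noise_mag = np.abs(np.fft.rfft(noise_est[:frame]))
    for start in range(0, len(noisy) - frame + 1, frame):
        seg = noisy[start:start + frame]
        spec = np.fft.rfft(seg)
        mag = np.abs(spec) - noise_mag                # subtract noise magnitude
        mag = np.maximum(mag, floor * np.abs(spec))   # spectral floor avoids negative magnitudes
        # resynthesize with the noisy phase
        out[start:start + frame] = np.fft.irfft(mag * np.exp(1j * np.angle(spec)), n=frame)
    return out
```

Real systems add overlap-add windowing and a VAD-driven noise estimate; this sketch only shows the core subtraction.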

Free

Credibility Detection on Twitter News Using Machine Learning Approach

Marina Azer, Mohamed Taha, Hala H. Zayed, Mahmoud Gadallah

Research article

Social media presence is a crucial part of our lives and is now considered a more important source of information than traditional media. Twitter has become one of the prevalent social sites for exchanging viewpoints and feelings. This work proposes a supervised machine learning system for detecting false news. One of the credibility detection problems is finding new features that are most predictive, yielding better-performing classifiers. Both features based on news content and features based on the user are used. The importance of the features and their impact on performance are examined, and the reasons for choosing the final feature set using the k-best method are explained. Seven supervised machine learning classifiers are used: Naïve Bayes (NB), Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Logistic Regression (LR), Random Forest (RF), Maximum Entropy (ME), and Conditional Random Forest (CRF). Training and testing were conducted on the Pheme dataset. An analysis of the features is presented, comparing user-based features with content-based features as the decisive factors in determining validity. Random forest shows the highest performance when using user-based features only (accuracy 82.2%) and when using a mixture of both feature types; the best overall result is obtained with the random forest classifier on both feature types (accuracy 83.4%). In contrast, logistic regression performs best when using content-based features only. Performance is measured by accuracy, precision, recall, and F1-score. We compared our feature set with the features of other studies and assessed the impact of our new features, finding that our approach yields a substantial improvement in detecting and verifying false news compared to existing results.
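The k-best feature selection step can be illustrated with a simple correlation-based scorer, a stand-in for standard SelectKBest-style ranking; the feature matrix and scoring rule here are illustrative, not the study's actual features:

```python
import numpy as np

def k_best_features(X, y, k):
    """Rank features by absolute Pearson correlation with the label and
    return the indices of the top-k highest-scoring columns."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc)
    scores = np.abs(Xc.T @ yc) / np.where(denom == 0, 1, denom)
    return np.argsort(scores)[::-1][:k]
```

In practice one would score with chi-squared or mutual information and feed the selected columns to the classifiers; the ranking idea is the same.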

Free

Current Trends of High capacity Optical Interconnection Data Link in High Performance Optical Communication Systems

Ahmed Nabih Zaki Rashed

Research article

Optical technologies are ubiquitous in telecommunications networks and systems, providing multiple wavelength channels of transport at 2.5 Gbit/sec to 40 Gbit/sec data rates over single fiber optic cables. Market pressures continue to drive up the number of wavelength channels per fiber and the data rate per channel. This trend will continue for many years to come as electronic commerce grows and enterprises demand higher and more reliable bandwidth over long distances. Electronic commerce, in turn, is driving the growth curves for single processor and multiprocessor performance in database transaction and Web-based servers. Ironically, the insatiable appetite for enterprise network bandwidth, which has driven up the volume and pushed down the price of optical components for telecommunications, is simultaneously stressing computer system bandwidth, increasing the need for new interconnection schemes and providing, for the first time, commercial opportunities for optical components in computer systems. The evolution of integrated circuit technology is causing system designs to move towards communication based architectures. We present the current trends in high performance, high capacity optical interconnection data transmission links in high performance optical communication and computing systems over a wide range of the affecting parameters.

Free

DIMK-means “Distance-based Initialization Method for K-means Clustering Algorithm”

Raed T. Aldahdooh, Wesam Ashour

Research article

Partition-based clustering is one of several clustering techniques that attempt to directly decompose the dataset into a set of disjoint clusters. The K-means algorithm, which relies on the partition-based clustering technique, is popular, widely used, and applied to a variety of domains. K-means clustering results are extremely sensitive to the initial centroids; this is one of the major drawbacks of the K-means algorithm. Due to this sensitivity, several different initialization approaches have been proposed for the K-means algorithm over the last decades. This paper proposes a selection method for the initial cluster centroids in K-means clustering instead of the random selection method. The research provides a detailed performance assessment of the proposed initialization method over many datasets with different dimensions, numbers of observations, groups, and clustering complexities. The ability to identify the true clusters is the performance evaluation standard in this research. The experimental results show that the proposed initialization method is more effective and converges to more accurate clustering results than the random initialization method.
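One common distance-based initialization idea, choosing centroids that are far apart rather than at random, can be sketched as a farthest-first heuristic; this is for illustration only, and the exact DIMK-means selection rule defined in the paper may differ:

```python
import numpy as np

def distance_based_init(X, k):
    """Pick the point closest to the data mean as the first centroid,
    then greedily add the point farthest from all chosen centroids."""
    centroids = [X[np.argmin(np.linalg.norm(X - X.mean(axis=0), axis=1))]]
    while len(centroids) < k:
        # distance of each point to its nearest chosen centroid
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centroids], axis=0)
        centroids.append(X[np.argmax(d)])
    return np.array(centroids)
```

The resulting centroids then seed the usual K-means iterations, avoiding the degenerate starts that random selection can produce.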

Free

Data Analysis for the Aero Derivative Engines Bleed System Failure Identification and Prediction

Khalid Salmanov, Hadi Harb

Research article

Middle-size gas/diesel aero-derivative power generation engines are widely used in various industrial plants in the oil and gas industry. Bleed of Valve (BOV) system failure is one of the failure mechanisms of these engines. The BOV is part of the critical anti-surge system, and this kind of failure is almost impossible to identify while the engine is in operation. If the engine operates with an impaired BOV system, it leads to high maintenance costs during overhaul, an increased emission rate, higher fuel consumption, and a loss in efficiency. This paper proposes the use of readily available sensor data in a Supervisory Control and Data Acquisition (SCADA) system in combination with a machine learning algorithm for early identification of BOV system failure. Different machine learning algorithms and dimensionality reduction techniques are evaluated on real-world engine data. The experimental results show that BOV system failures can be effectively predicted from readily available sensor data.

Free

Data Clustering Using Wave Atom

Bilal A. Shehada, Mahmoud Z. Alkurdi, Wesam M. Ashour

Research article

Clustering of huge spatial databases is an important issue: it aims to track the dense regions in the feature space, for use in data mining, knowledge discovery, and efficient information retrieval. A clustering approach should be efficient and able to detect clusters of arbitrary shapes, because spatial objects cannot simply be abstracted as isolated points; they differ in boundary, size, volume, and location. In this paper we use the discrete wave atom transformation technique in clustering to achieve more accurate results. By using multi-resolution transformations like wavelets and wave atoms, we can effectively identify arbitrarily shaped clusters at different degrees of accuracy. Experimental results on very large data sets show the efficiency and effectiveness of the proposed wave atom based clustering approach compared to other recent clustering methods. The experimental results show that we obtain more accurate and better denoised output than other methods.

Free

Data Mining of Students’ Performance: Turkish Students as a Case Study

Oyebade K. Oyedotun, Sam Nii Tackie, Ebenezer O. Olaniyi, Adnan Khashman

Research article

Artificial neural networks have been used in different fields of artificial intelligence, and more specifically in machine learning. Although other machine learning options are feasible in most situations, the ease with which neural networks lend themselves to different problems, including pattern recognition, image compression, classification, computer vision, and regression, has earned them a remarkable place in the machine learning field. This research exploits neural networks as a data mining tool for predicting the number of times a student repeats a course, considering some attributes relating to the course itself, the teacher, and the particular student. Neural networks were used in this work to map the relationship between attributes related to students’ course assessment and the number of times a student will possibly repeat a course before passing. The hope is that the ability to predict students’ performance from such complex relationships can help facilitate the fine-tuning of academic systems and policies implemented in learning environments. To validate the power of neural networks in data mining, a Turkish students’ performance database was used; feedforward and radial basis function networks were trained for this task. The performances obtained from these networks were evaluated with respect to achieved recognition rates and training time.

Free

Data Quality for AI Tool: Exploratory Data Analysis on IBM API

Ankur Jariwala, Aayushi Chaudhari, Chintan Bhatt, Dac-Nhuong Le

Research article

A huge amount of data is produced in every domain these days. Thus, for applying automation to any dataset, appropriately prepared data plays an important role in achieving efficient and accurate results. According to data researchers, data scientists spend 80% of their time preparing and organizing data. To ease this tedious task, IBM Research has developed the Data Quality for AI tool, which provides a variety of metrics that can be applied to different datasets (in .csv format) to assess the quality of the data. In this paper, we show how the IBM API toolkit can be used with different variants of datasets and present the results for each metric in graphical form. Readers may find this paper useful for understanding the working flow of the IBM data quality tool; to that end, we present the entire flow of using IBM Data Quality for AI in the form of an architecture.

Free

Data Transformation and Predictive Analytics of Cardiovascular Disease Using Machine and Ensemble Learning Techniques

J. Cruz Antony, E. Murali, D. Deepa, R. Vignesh, S. Hemalatha, Umme Fahad

Research article

About one person dies every minute from cardiovascular disease; consequently, it has almost surpassed war as the largest cause of death in the twenty-first century. In cardiology, early and accurate diagnosis of heart illness is a cornerstone of effective healthcare. Predictive analytics involving machine-learning algorithms can be a great option for contributing towards the early detection of cardiovascular disease. This study evaluates the data preprocessing techniques involved in building machine learning models to predict cardiovascular disease and identifies the features contributing to a cardiac attack. A novel data transformation technique, named the superlative boundary binning method, is proposed to enhance machine learning and ensemble learning classification models for predicting cardiac illness based on independent physiological feature parameters. The results reveal that the ensemble learning classifier AdaBoost using the superlative boundary binning method performed well, with a classification accuracy of 93%, when compared with the other data transformation and machine learning classifier models.
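As a minimal illustration of binning as a data transformation step, the sketch below shows plain equal-width discretization; the proposed superlative boundary binning method chooses its boundaries differently and is not reproduced here:

```python
import numpy as np

def equal_width_bins(x, n_bins):
    """Discretize a continuous feature into equal-width bins,
    returning integer bin indices in [0, n_bins - 1]."""
    lo, hi = x.min(), x.max()
    edges = np.linspace(lo, hi, n_bins + 1)
    # interior edges only; values at or above the last interior edge
    # fall into the final bin
    return np.clip(np.digitize(x, edges[1:-1]), 0, n_bins - 1)
```

The discretized columns would then feed a classifier such as AdaBoost; the transformation's job is to smooth out noisy continuous measurements before training.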

Free

Data Visualization and its Proof by Compactness Criterion of Objects of Classes

Saidov Doniyor Yusupovich

Research article

In this paper, we consider the problem of reducing the dimensionality of the feature space by nonlinearly mapping object descriptions onto a numerical axis. To reduce the dimensionality of the space, rules of agglomerative hierarchical grouping of different-type (nominal and quantitative) features are used. The groups do not intersect with each other, and their number is unknown in advance. The elements of each group are mapped onto the numerical axis to form a latent feature. The set of latent features is sorted by informativeness in the process of hierarchical grouping. A visual representation of objects obtained with this set, or a subset of it, is used as a tool for extracting hidden regularities in databases. The criterion for evaluating the compactness of class objects is based on analyzing the structure of their connectivity. The analysis uses an algorithm that partitions group representatives into disjoint classes by defining subsets of boundary objects. The execution of the algorithm guarantees uniqueness of the number of groups and of their member objects. This uniqueness property is used to calculate the compactness measure of the training samples. The value of compactness is a dimensionless quantity in the interval [0, 1]. Dimensionless quantities are needed for estimating the structure of the feature space: such a need arises when comparing different metrics, normalization methods, and data transformations, and when selecting and removing noise objects.
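A simple connectivity-flavoured compactness proxy in [0, 1] is the fraction of objects whose nearest neighbour shares their class. This is illustrative only; the paper's measure is derived from its boundary-object partitioning algorithm:

```python
import numpy as np

def nn_compactness(X, labels):
    """Fraction of objects whose nearest neighbour has the same class
    label: 1.0 for well-separated classes, lower for interleaved ones."""
    n = len(X)
    hits = 0
    for i in range(n):
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf                      # exclude the point itself
        hits += labels[np.argmin(d)] == labels[i]
    return hits / n
```

Because the value is dimensionless, it can be compared across metrics, normalizations, and transformations of the same data, which is exactly the use case the abstract describes.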

Free

Data-driven Approximation of Cumulative Distribution Function Using Particle Swarm Optimization based Finite Mixtures of Logistic Distribution

Rajasekharreddy Poreddy, Gopi E.S.

Research article

This paper proposes a data-driven approximation of the Cumulative Distribution Function using Finite Mixtures of the Cumulative Distribution Function of the Logistic distribution. Since it is not possible to solve the logistic mixture model using the maximum likelihood method, the mixture model is fitted to approximate the empirical cumulative distribution function using computational intelligence algorithms. The Probability Density Function is obtained by differentiating the estimate of the Cumulative Distribution Function. The proposed technique estimates the Cumulative Distribution Function of different benchmark distributions, and its performance is compared with the state-of-the-art kernel density estimator and the Gaussian Mixture Model. Experimental results on the κ−μ distribution show that the proposed technique performs equally well in estimating the probability density function, while it outperforms the alternatives in estimating the cumulative distribution function. It is also evident from the experimental results that the proposed technique outperforms the state-of-the-art Gaussian Mixture Model and kernel density estimation techniques with less training data.
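The core fitting idea, approximating an empirical CDF with a mixture of logistic CDFs by minimising squared error with a particle swarm, can be sketched as follows. This is a bare-bones illustration with equal component weights and a generic PSO; all function names and parameter settings are assumptions, not the authors' implementation:

```python
import numpy as np

def ecdf(data):
    """Sorted sample points and their empirical CDF values."""
    xs = np.sort(data)
    return xs, np.arange(1, len(xs) + 1) / len(xs)

def mixture_cdf(x, params, m):
    """Equal-weight mixture of m logistic CDFs; params = [mu_1, s_1, ...]."""
    out = np.zeros_like(x, dtype=float)
    for i in range(m):
        mu, s = params[2 * i], abs(params[2 * i + 1]) + 1e-6
        out += 1.0 / (1.0 + np.exp(-(x - mu) / s))
    return out / m

def pso_fit(data, m=2, n_particles=30, iters=200, seed=0):
    """Minimise squared error between the empirical CDF and the mixture
    CDF with a bare-bones particle swarm optimiser."""
    rng = np.random.default_rng(seed)
    xs, F = ecdf(data)
    dim = 2 * m
    pos = rng.normal(0, 2, (n_particles, dim))
    vel = np.zeros((n_particles, dim))
    err = lambda p: np.mean((mixture_cdf(xs, p, m) - F) ** 2)
    pbest = pos.copy()
    pbest_err = np.array([err(p) for p in pos])
    gbest = pbest[np.argmin(pbest_err)].copy()
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
        pos = pos + vel
        e = np.array([err(p) for p in pos])
        improved = e < pbest_err
        pbest[improved], pbest_err[improved] = pos[improved], e[improved]
        gbest = pbest[np.argmin(pbest_err)].copy()
    return gbest, pbest_err.min()
```

Differentiating the fitted mixture CDF analytically then yields the density estimate, as the abstract describes.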

Free

Data-driven Insights for Informed Decision-Making: Applying LSTM Networks for Robust Electricity Forecasting in Libya

Asma Agaal, Mansour Essgaer, Hend M. Farkash, Zulaiha Ali Othman

Research article

Accurate electricity forecasting is vital for grid stability and effective energy management, particularly in regions like Benghazi, Libya, which face frequent load shedding, generation deficits, and aging infrastructure. This study introduces a data-driven framework to forecast electricity load, generation, and deficits for 2025 using historical data from two distinct years: 2019 (an instability year) and 2023 (a stability year). Various time series models were employed, including Autoregressive Integrated Moving Average (ARIMA), seasonal ARIMA, dynamic regression ARIMA, extreme gradient boosting, simple exponential smoothing, and Long Short-Term Memory (LSTM) neural networks. Data preprocessing steps, such as missing value imputation, outlier smoothing, and logarithmic transformation, were applied to enhance data quality. Model performance was evaluated using metrics such as mean squared error, root mean squared error, mean absolute error, and mean absolute percentage error. LSTM outperformed the other models, achieving the lowest values of these metrics for forecasting load, generation, and deficits, demonstrating its ability to handle non-stationarity, seasonality, and extreme events. The study’s key contribution is the development of an optimized LSTM framework tailored to North Benghazi’s electricity patterns, incorporating a rich dataset and exogenous factors like temperature and humidity. These findings offer actionable insights for energy policymakers and grid operators, enabling proactive resource allocation, demand-side management, and enhanced grid resilience. The research highlights the potential of advanced machine learning techniques to address energy forecasting challenges in resource-constrained regions, paving the way for a more reliable and sustainable electricity system.
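Among the baselines compared, simple exponential smoothing is easy to sketch; the smoothing factor below is an assumed illustrative value, not one from the study:

```python
def ses_forecast(series, alpha=0.3):
    """Simple exponential smoothing: the one-step-ahead forecast is an
    exponentially weighted average of past observations, with weight
    alpha on the most recent value."""
    level = series[0]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    return level
```

Such a baseline has no notion of seasonality or exogenous inputs, which is why models like LSTM, which can condition on temperature and humidity, outperform it on this kind of data.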

Free

Decision-Making Using Efficient Confidence-Intervals with Meta-Analysis of Spatial Panel Data for Socioeconomic Development Project-Managers

Ashok Sahai, Clement K. Sankat, Koffka Khan

Research article

It is quite common to have access to geospatial (temporal/spatial) panel data generated by a set of similar studies for analysis in a meta-data setup. Within this context, researchers often employ pooling methods to evaluate the efficacy of meta-data analysis. One of the simplest techniques used to combine individual-study results is the fixed-effects model, which assumes that a true effect is equal for all studies. An alternative, and intuitively more appealing, method is the random-effects model. A paper by the first author and his co-authors addressing the efficient estimation problem, using this method in the aforesaid meta-data setup of geospatial data, was presented at the Map World Forum meeting in 2007 in Hyderabad, India. The purpose of that paper was to address the estimation problem of the fixed-effects model and to present a simulation study of efficient confidence-interval estimation of a mean true effect using the panel data and a random-effects model, in order to establish confidence-interval estimation readily usable in a decision-making setup. The present paper continues in the same perspective and proposes a much more efficient estimation strategy, furthering the gainful use of geospatial panel data in the global, continental, regional, and national contexts of socioeconomic and other developmental issues. The statistical theme of efficient confidence-interval estimation has a wider ambit than its applicability to socioeconomic development alone: it is equally applicable to any area in which data mapping arises, for example the topically significant area of mitigating global environmental pollution to arrest the critical phenomenon of global warming. Such issues are more readily tackled now that the impactful advances in GIS and GPS technologies have led to the concept of managing the global village in terms of geospatial meta-data. This fact gave the authors special motivation to produce this improved paper, containing a much more efficient strategy of confidence-interval estimation for decision-making teams of managers in any area of application.
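The standard random-effects machinery referred to above can be sketched with the textbook DerSimonian-Laird estimator and a normal-approximation confidence interval; the paper's improved estimation strategy is not reproduced here:

```python
import numpy as np

def random_effects_ci(effects, variances, z=1.96):
    """DerSimonian-Laird random-effects estimate of the mean true effect,
    with a normal-approximation confidence interval."""
    effects = np.asarray(effects, dtype=float)
    variances = np.asarray(variances, dtype=float)
    w = 1.0 / variances
    fixed = np.sum(w * effects) / np.sum(w)          # fixed-effects mean
    Q = np.sum(w * (effects - fixed) ** 2)           # heterogeneity statistic
    k = len(effects)
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (Q - (k - 1)) / c)               # between-study variance
    w_star = 1.0 / (variances + tau2)                # random-effects weights
    mean = np.sum(w_star * effects) / np.sum(w_star)
    se = np.sqrt(1.0 / np.sum(w_star))
    return mean, (mean - z * se, mean + z * se)
```

When the studies agree exactly, the between-study variance estimate is zero and the interval collapses to the fixed-effects one; heterogeneous studies inflate it, which is the intuitive appeal of the random-effects model noted in the abstract.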

Free

Deep Hybrid System of Computational Intelligence with Architecture Adaptation for Medical Fuzzy Diagnostics

Iryna Perova, Iryna Pliss

Research article

In this paper, a deep hybrid system of computational intelligence with architecture adaptation for medical fuzzy diagnostics is proposed. The system increases the quality of medical information processing under conditions of overlapping classes thanks to its special adaptive architecture and training algorithms. The deep hybrid system under consideration can tune its architecture in situations where the number of features and diagnoses varies. Special algorithms for its training are developed and optimized for different system architectures, without retraining the synaptic weights tuned at previous steps. The proposed system was applied to three medical data sets (the dermatology dataset, the Pima Indians diabetes dataset, and the Parkinson disease dataset), both with a fixed number of features and diagnoses and with an increasing number. The conducted experiments have shown the high quality of the medical diagnostic process and confirmed the efficiency of the deep hybrid system of computational intelligence with architecture adaptation for medical fuzzy diagnostics.

Free

Deep Learning Based Traffic Management in Knowledge Defined Network

Tejas M. Modi, Kuna Venkateswararao, Pravati Swain

Research article

In recent artificial intelligence developments, large datasets as knowledge are a prime requirement for analysis and prediction. To manage the knowledge of the network, the Data Center Network (DCN) has been considered a global data storage facility on edge and cloud servers. In recent research trends, the knowledge-defined networking (KDN) architecture is considered, where the management plane works as the knowledge plane. The major network management task in the DCN is to control traffic congestion. To improve network management, i.e., optimize resource management and enhance Quality of Service (QoS), we propose a path prediction technique that combines a convolution layer with RNN deep learning models: a convolution long short-term memory network (Convolution-LSTM) and a convolution bi-directional long short-term memory network (Convolution-BiLSTM). The experimental results demonstrate that, in terms of network latency, packet loss ratio, network throughput, and overhead, the proposed methodologies perform better than the existing works, i.e., OSPF, FlowDCN, modified discrete PSO, and ANN-, CNN-, and LSTM-based routing approaches. The proposed approach improves network throughput by approximately 30% and 12% compared to the existing CNN- and LSTM-based routing approaches, respectively.

Free
