Journal articles - International Journal of Information Technology and Computer Science
All articles: 1195
A data analysis of the academic use of social media
Research article
The use of Facebook in higher education has become commonplace, presumably due to a general belief that the platform can promote information flows between students and with staff, as well as increasing a sense of community engagement. This study sets out to examine the academic use of Facebook groups using data analysis in order to determine whether there are educational benefits and whether Facebook-group-based learning strategies can be evaluated quickly and relatively easily. The data analysis involved utilising Social Network Analysis (SNA) to examine two Facebook groups: one undergraduate ‘course’ based group with 135 members and one undergraduate first-year ‘module’ based group with 123 members. The SNA metrics included degree centrality, betweenness centrality, clustering coefficient and eigenvector centrality. The study also involved conducting a survey and interviews with users of the Facebook groups to validate the utility of the SNA metrics. Results from the validation phase of the data analysis suggested that degree centrality is a useful guide to positive attitudes towards information flows, whilst betweenness centrality is useful for detecting a sense of academic community. The validation outcomes also suggest that high clustering coefficient scores were associated with a lower perception of academic community. The analysis of the data sets also found that the ‘course’ based group had higher scores for degree centrality and betweenness. This suggests that the ‘course’ based group provided a better experience of information access and a sense of academic community. Follow-up interviews with respondents suggested that the ‘course’ based Facebook group may have had higher scores because it included more real-world acquaintances than the ‘module’ based group.
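The four SNA metrics named above are standard graph measures; a minimal sketch (not the authors' code) of computing them with networkx on a toy interaction graph, which stands in for the unavailable Facebook group data, is shown below.

```python
# Minimal sketch of the SNA metrics named in the abstract, using networkx.
# The toy edge list is a placeholder for the Facebook group interaction data.
import networkx as nx

# Hypothetical interaction graph: an edge means two members replied to or
# commented on each other's posts.
edges = [("alice", "bob"), ("alice", "carol"), ("bob", "carol"),
         ("carol", "dave"), ("dave", "erin")]
G = nx.Graph(edges)

degree = nx.degree_centrality(G)             # proxy for information flow
betweenness = nx.betweenness_centrality(G)   # proxy for community bridging
clustering = nx.clustering(G)                # local clustering coefficient
eigenvector = nx.eigenvector_centrality(G)

for node in G.nodes:
    print(node, round(degree[node], 2), round(betweenness[node], 2),
          round(clustering[node], 2), round(eigenvector[node], 2))
```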
Free
A domain specific key phrase extraction framework for email corpuses
Research article
Despite the growth of communication over the Internet via short messages, messaging services and chat, email is still the most preferred communication method. Thousands of emails are communicated every day over different service providers. Being among the most effective communication methods, email also attracts a lot of spam and irrelevant information. Spam emails are annoying and consume a lot of time in filtering. Needless to mention, spam emails also consume the allocated inbox space and at the same time cause huge network traffic. Filtration methods are miles away from perfection, as most of these filters depend on standard rules, thus marking valid emails as spam. The first step of any email filtration should be extracting the key phrases from the emails, and based on the key phrases, or the most frequently used phrases, the filters should be activated. A number of parallel research efforts have demonstrated key phrase extraction policies. Nonetheless, these methods focus on domain-specific corpuses and have not addressed email corpuses. Thus, this work demonstrates the key phrase extraction process specifically for email corpuses. The extracted key phrases reflect the frequency of the words used in each email. This analysis can make further analysis easier in terms of sentiment analysis or spam detection. It can also cater to the need for text summarization. The proposed component-based framework demonstrates nearly 95% accuracy.
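As an illustration of the general idea only, the following sketch extracts frequent candidate phrases from an email body; the paper's component-based framework is not specified here, and the stop-word list and n-gram length are assumptions.

```python
# Frequency-based sketch of key phrase extraction from an email body:
# tokenise, drop stop words, count the remaining n-grams.
import re
from collections import Counter

STOP_WORDS = {"the", "a", "an", "and", "or", "to", "of", "in", "is", "for", "on"}

def key_phrases(email_body, top_k=5, n=2):
    words = [w for w in re.findall(r"[a-z']+", email_body.lower())
             if w not in STOP_WORDS]
    # Candidate phrases are n-grams built over the remaining (non-stop) words.
    grams = [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
    return Counter(grams).most_common(top_k)

email = ("Meeting about the project deadline moved to Friday. "
         "Please review the project deadline changes before the meeting.")
print(key_phrases(email))
```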
Free
A failure detector for crash recovery systems in cloud
Research article
Cloud computing has brought remarkable scalability and elasticity to the distributed computing paradigm. It provides implicit fault tolerance through virtual machine (VM) migration. However, VM migration requires heavy replication and incurs storage overhead as well as loss of computation. In early cloud infrastructure these factors were negligible due to light load conditions; nowadays, however, due to the exploding task population, they trigger considerable performance degradation in the cloud. Therefore, fault detection and recovery is gaining attention in the cloud research community. Failure Detectors (FDs) are modules employed at the nodes to perform fault detection. The paper proposes a failure detector to handle crash-recoverable nodes, with system recovery performed from a designated checkpoint in the event of failure. We use the Machine Repairman model to estimate the recovery latency. The simulation experiments have been carried out using CloudSim Plus.
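A minimal sketch of a heartbeat/timeout failure detector for crash-recoverable nodes follows; it is not the paper's detector and omits the Machine Repairman recovery-latency estimation, showing only the suspect/restore cycle.

```python
# Timeout-based failure detector sketch for crash-recoverable nodes:
# a node that stops sending heartbeats is suspected; a heartbeat from a
# recovered node removes the suspicion.
import time

class FailureDetector:
    def __init__(self, timeout=2.0):
        self.timeout = timeout
        self.last_heartbeat = {}   # node id -> time of last heartbeat
        self.suspected = set()

    def heartbeat(self, node):
        self.last_heartbeat[node] = time.monotonic()
        self.suspected.discard(node)   # a recovered node is trusted again

    def check(self):
        now = time.monotonic()
        for node, t in self.last_heartbeat.items():
            if now - t > self.timeout:
                self.suspected.add(node)
        return self.suspected
```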
Free
A hybrid dimensionality reduction model for classification of microarray dataset
Research article
In this paper, a combination of dimensionality reduction techniques is proposed to address the problems of highly correlated data and the selection of significant variables out of a set of features, by assessing important dimensionality reduction techniques that contribute to efficient classification of genes. One-Way ANOVA is employed for feature selection to obtain an optimal number of genes, and Principal Component Analysis (PCA) and Partial Least Squares (PLS) are employed separately as feature extraction methods to reduce the selected features from the microarray dataset. An experiment on a colon cancer dataset uses a Support Vector Machine (SVM) as the classification method. By combining feature selection and feature extraction into a generalized model, a robust and efficient dimensional space is obtained. In this approach, redundant and irrelevant features are removed at each step; the classification achieves an accuracy of about 98%, efficient over the state of the art.
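The ANOVA-selection, feature-extraction and SVM-classification chain can be sketched with scikit-learn as below; the synthetic matrix stands in for the colon cancer microarray data, the parameter values are illustrative, and PCA is shown in place of the PLS alternative explored in the paper.

```python
# Sketch of the ANOVA -> PCA -> SVM chain on a gene-expression-style matrix.
from sklearn.pipeline import Pipeline
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score
from sklearn.datasets import make_classification

# Placeholder for the microarray matrix: many features, few samples.
X, y = make_classification(n_samples=62, n_features=2000, n_informative=50,
                           random_state=0)

model = Pipeline([
    ("anova", SelectKBest(f_classif, k=200)),  # One-Way ANOVA feature selection
    ("pca", PCA(n_components=20)),             # feature extraction step
    ("svm", SVC(kernel="linear")),
])

print(cross_val_score(model, X, y, cv=5).mean())
```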
Free
A hybrid technique for cleaning missing and misspelling Arabic data in data warehouse
Research article
Real-world datasets accumulated over a number of years tend to be incomplete, inconsistent and noisy, which in turn causes inconsistency in data warehouses. Data owners hold hundreds of millions to billions of records written in different languages, hence the need for comprehensive, efficient techniques to maintain data consistency and increase its quality continuously grows. Data cleaning is known to be a very complex and difficult task, especially for data written in Arabic, a complex language in which various types of unclean data can occur in the contents: for example, missing values, dummy values, redundant and inconsistent values, misspellings, and noisy data. The ultimate goal of this paper is to improve data quality by cleaning the contents of Arabic datasets from various types of errors, to produce data for better analysis and highly accurate results. This, in turn, leads to discovering correct patterns of knowledge and accurate decision-making. The approach is established by merging different algorithms and ensures that reliable methods are used for data cleansing. It cleans Arabic datasets through multi-level cleaning using the Arabic Misspelling Detection and Correction Model (AMDCM) and Decision Tree Induction (DTI). The approach can solve the problems of Arabic misspellings, cryptic values, dummy values, and unification of naming styles. A sample of data before and after cleaning errors is presented.
Free
A new measuring method of flux linkage of SRM
Research article
This paper presents an indirect method of measuring flux characteristics based on a DSP. By measuring the current and voltage on a phase winding circuit, transferring them to a PC via a communication program, and applying Simpson’s rule, the magnetization characteristics are obtained. The experimental instruments needed in this method are common and the test platform is easy to build, so the expense of measuring the flux is lowered. The test indicates that the measuring process is simple to implement and the experimental results are accurate.
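A minimal numerical sketch of the flux-linkage computation follows: ψ = ∫(v − Ri) dt evaluated with Simpson's rule over sampled phase voltage and current. The waveforms and winding resistance below are synthetic placeholders for the DSP-captured data.

```python
# Flux linkage from sampled voltage and current via Simpson's rule:
# psi = integral of (v - R*i) dt over the record.
import numpy as np
from scipy.integrate import simpson

R = 0.8                                  # assumed phase winding resistance, ohms
t = np.linspace(0.0, 5e-3, 501)          # 5 ms record, 501 samples
v = 60.0 * np.ones_like(t)               # placeholder applied voltage step
i = 10.0 * (1.0 - np.exp(-t / 1e-3))     # placeholder rising phase current

psi = simpson(v - R * i, x=t)            # flux linkage in weber-turns
print(f"flux linkage = {psi:.4f} Wb")
```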
Free
Research article
Recommender Systems (RSs) are essential tools of an e-commerce portal, making intelligent decisions for an individual to obtain product recommendations. Neighborhood-based approaches are traditional techniques for collaborative recommendations and are very popular due to their simplicity and efficiency. Neighborhood-based recommender systems use numerous kinds of similarity measures for finding similar users or items. However, the existing similarity measures operate only on the common ratings between a pair of users (i.e., they ignore the uncommon ratings) and thus do not utilize all ratings made by a pair of users. Furthermore, the existing similarity measures may either provide inadequate results in many situations that frequently occur in sparse data or involve very complex calculations. Therefore, there is a compelling need to define a similarity measure that can deal with such issues. This research proposes a new similarity measure for defining the similarities between users or items by using the rating data available in the user-item matrix. Firstly, we describe a way of applying the simple matching coefficient (SMC) to the common ratings between users or items. Secondly, the structural information between the rating vectors is exploited using the Jaccard index. Finally, these two factors are leveraged to define the proposed similarity measure for better recommendation accuracy. For evaluating the effectiveness of the proposed method, several experiments have been performed using standardized benchmark datasets (MovieLens-1M, 10M, and 20M). The results obtained demonstrate that the proposed method provides better predictive accuracy (in terms of MAE and RMSE) along with improved classification accuracy (in terms of precision and recall).
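The two factors of the proposed measure can be sketched as follows; the exact matching rule and weighting used in the paper are not reproduced, so the tolerance-based SMC and the product combination below are assumptions for illustration only.

```python
# Sketch of combining a simple matching coefficient over co-rated items with
# the Jaccard index over rated-item sets (NaN marks an unrated item).
import numpy as np

def similarity(u, v, match_tol=1.0):
    rated_u, rated_v = ~np.isnan(u), ~np.isnan(v)
    common = rated_u & rated_v
    union = rated_u | rated_v
    jaccard = common.sum() / union.sum() if union.sum() else 0.0
    if common.sum() == 0:
        return 0.0
    # SMC over co-rated items: fraction whose ratings differ by at most match_tol.
    smc = np.mean(np.abs(u[common] - v[common]) <= match_tol)
    return smc * jaccard

u = np.array([5, 3, np.nan, 1, 4])
v = np.array([4, np.nan, 2, 1, 5])
print(similarity(u, v))
```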
Free
Research article
Data publishing plays a major role in establishing a path between current-world scenarios and next-generation requirements, and it is desirable to preserve individuals' privacy in the released content without reducing the utility rate. Existing KC and KCi models concentrate on multiple categorical sensitive attributes; both models have their own merits and demerits. This paper proposes a new method, named the novel KCi-slice model, to enhance the existing KCi approach with better utility levels and the required privacy levels. The proposed model uses two rounds to publish the data. An anatomization approach is used to separate the sensitive attributes from the quasi attributes. The first round uses a novel approach, called the enhanced semantic l-diversity technique, to bucketize the tuples and also determines the correlation of the sensitive attributes to build different sensitive tables. The second round generates multiple quasi tables by performing a slicing operation on concatenated correlated quasi attributes. It concatenates the attributes of the quasi tables with the IDs of the buckets from the different sensitive tables and performs random permutations on the buckets of the quasi tables. The proposed model publishes the data with more privacy and higher utility levels when compared to the existing models.
Free
A novel interactive communication system realization through smart low noise block downconverter
Research article
Interactive communication is the basic motivation behind a smart communication system, which requires simultaneous downlink and uplink capability. The Smart LNB is a popular topic of discussion, leading towards Know Your DTH (KY-DTH). A low noise block-downconverter (LNB) is the signal receiving device for satellite TV reception mounted on satellite dishes. For broadcasters, the Smart LNB opens the door to operating their own linear TV ecosystem and other services connected directly by satellite. Unlike a conventional LNB, this new generation Smart LNB comprises both a transmitter and a receiver to provide interactive TV experiences and M2M services. Having uplink and downlink capability, it enables full duplex communication, leading to various additional applications such as live interactions; live viewing; 24-hour TV servicing; solutions for remote monitoring; control in mission-critical applications in the energy and utility sectors; natural gas monitoring; smart grid; etc. The DVB-S2 source and sink are analyzed using the Agilent SystemVue platform. This paper describes the study and design of a smart low noise block downconverter (LNB) used for satellite communication, with transmission in the Ka band (29.5 to 30 GHz) and reception in the Ku band (10.7 to 12.75 GHz). The LNB design considers important characteristics such as spectrum comparison. The proposed design results in an enhanced working lifetime of the Smart LNB system with the capability to receive all signals within the range. The design and simulation were done using Agilent SystemVue. A summary of the simulation work and results for the Smart LNB in the Ka and Ku bands is presented.
Free
Research article
Muscles in the human body occur in pairs. Musculoskeletal imbalances, caused by repetitive use of one muscle of such a pair and by the incorrect postures a human body takes on a regular basis, lead to severe injuries such as neuro-musculoskeletal problems, hamstring strains, lower back tightness, repetitive stress injuries, altered movement patterns, postural dysfunctions and trapped nerves, and both neurological and physical performance are severely affected as time progresses. In the clinical domain, muscle imbalances are determined by gait and posture analysis, movement analysis, joint range-of-motion analysis and muscle length analysis, all of which require expert knowledge and experience. X-rays and CT scans in the medical domain also require domain experts to interpret the results of a check-up. Kinect is a motion capturing device which is able to track the human skeleton, its joints and body movements within its sensory range. The purpose of this research is to provide a mechanism to identify muscle imbalances based on gait analysis tracked via the Kinect motion capture device, by measuring the deviation from healthy persons’ gait patterns. Primarily, the outcome of this study will be a self-identification method for human skeletal imbalance.
Free
A parallel evolutionary search for shortest vector problem
Research article
The security of many lattice-based cryptographic primitives reduces to the assumed hardness of approximating the shortest vector problem (SVP) within a polynomial factor in polynomial time, so solving this problem breaks these primitives. In this paper, we investigate the suitability of combining the best techniques from general search/optimization, lattice theory and parallelization technologies for solving the SVP into a single algorithm. Our proposed algorithm repeats three steps in a loop: an evolutionary search (a parallelized Genetic Algorithm), brute-force tiny full enumerations (playing the role of many local searches with random starting points over the lattice vectors) and a single main enumeration. The test results showed that our proposed algorithm is better than LLL reduction and may be worse than the BKZ variants (except for some very small block sizes). The main drawback of these test results is the insufficient tuning of various parameters, which hides the potential strength of our contribution. Therefore, we list the main problems and weaknesses in our work towards clearer and better results in further studies. We also propose a pure Genetic Algorithm model with a more solid and stable design for the SVP, which future work can build on.
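A toy sketch of the evolutionary-search step follows: a genetic algorithm over integer coefficient vectors x, minimising ||xB||. The 2x2 basis, population size and mutation rule are illustrative only, and the parallelisation and enumeration stages of the paper are not shown.

```python
# Toy genetic algorithm for short lattice vectors: individuals are integer
# coefficient vectors x, fitness is the Euclidean norm of x @ B (zero vector
# excluded).
import random
import numpy as np

B = np.array([[201, 37], [1648, 297]])   # placeholder lattice basis (rows)

def norm(x):
    return float(np.linalg.norm(x @ B)) if np.any(x) else float("inf")

def mutate(x):
    y = x.copy()
    y[random.randrange(len(y))] += random.choice([-1, 1])
    return y

def crossover(a, b):
    cut = random.randrange(1, len(a))
    return np.concatenate([a[:cut], b[cut:]])

pop = [np.random.randint(-5, 6, size=2) for _ in range(30)]
for _ in range(200):
    pop.sort(key=norm)
    parents = pop[:10]                   # truncation selection
    pop = parents + [mutate(crossover(random.choice(parents),
                                      random.choice(parents)))
                     for _ in range(20)]

best = min(pop, key=norm)
print(best, norm(best))
```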
Free
A reliable solution to load balancing with trust based authentication enhanced by virtual machines
Research article
Vehicular ad hoc networks are among the fastest growing technologies, shaping fresh engineering opportunities such as controlling traffic smartly, optimal resource maintenance and improved service for customers. The Vehicular Ad hoc Network (VANET) is one of the most popular ad hoc networks. A vehicular ad hoc network generally faces problems such as trust modeling, congestion, and battery optimization. If there are comparatively few nodes, the network can handle the traffic well when it comes to transferring data at a rapid rate; but with high-density traffic, a vehicular network always faces a congestion problem. This paper tries to find a reliable solution to traffic management by adding virtual gears into the network and optimizes the congestion problem by using a trust queue, which is updated through the broadcast of hello packets in order to remove the unwanted nodes from the list. The network performance has been measured with QoS parameters such as delay, throughput, and others to prove the authentication of the research.
Free
A robust functional minimization technique to protect image details from disturbances
Research article
Image capturing using faulty systems or under environmental vulnerabilities always degrades image quality and causes the distortion of true details of the original imaging signals. Thus a robust way of image enhancement and edge preservation is an essential requirement for smooth imaging operations. Although many techniques have been deployed in this area over the decades, the key challenge remains a better tradeoff between image enhancement and detail protection. Therefore, this study inspects the existing limitations and proposes a robust technique, based on a functional minimization scheme in a variational framework, ensuring better performance in image enhancement and detail preservation simultaneously. A rigorous way to solve the minimization problem is also developed to ensure the efficiency of the proposed technique over some other traditional techniques.
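The abstract does not give the exact functional, so the sketch below uses total-variation (ROF) denoising from scikit-image as a standard variational example of the enhancement-versus-edge-preservation trade-off described; the weight value is illustrative.

```python
# Total-variation denoising as a stand-in variational example: smooths noise
# while preserving edges.
import numpy as np
from skimage import data, img_as_float
from skimage.restoration import denoise_tv_chambolle

image = img_as_float(data.camera())
noisy = image + 0.1 * np.random.standard_normal(image.shape)

# Larger weight means more smoothing at the expense of fidelity to the input.
denoised = denoise_tv_chambolle(noisy, weight=0.1)
print(noisy.std(), denoised.std())
```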
Free
A stochastic model for simple document processing
Research article
This work focuses on the stationary behavior of a simple document processing system. By a simple document, we mean any document whose processing, at each stage of its progression through its processing graph, is handled by a single person. Our simple document processing system derives from the general model described by MOUKELI and NEMBE: it is an adaptation of that general model to determine, in terms of metrics and performance, its behavior in the particular case of simple document processing. By way of illustration, data relating to a station of a central administration of a ministry, observed over six (6) years, are presented. The need to study this specific case comes from the fact that the processing of simple documents is based on a hierarchical organization and the use of priority queues. As in the general model proposed by MOUKELI and NEMBE, our model has a static component and a dynamic component. The static component is a tree that represents the hierarchical organization of the processing stations. The dynamic component consists of a Markov process and a network of priority queues which model all waiting lines at each processing unit. Key performance indicators were defined and studied point by point and on average, and issues specific to the hierarchical model associated with priority queues, mainly infinite loops, have been analyzed and solutions proposed.
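A minimal simulation sketch of the dynamic component, a single station serving documents from a priority queue with Poisson arrivals and exponential service, is given below; the rates and priority classes are illustrative, not the ministry data.

```python
# Single-station priority-queue simulation: the server always picks the
# highest-priority document among those that have already arrived.
import random

random.seed(1)

# Generate arrivals: (arrival time, priority), priority 0 is most urgent.
clock, jobs = 0.0, []
for _ in range(50):
    clock += random.expovariate(2.0)        # Poisson arrivals, rate 2
    jobs.append((clock, random.choice([0, 1, 2])))

t, waiting, done = 0.0, [], []
pending = sorted(jobs)                      # by arrival time
while pending or waiting:
    # Move every document that has already arrived into the waiting line.
    while pending and pending[0][0] <= t:
        waiting.append(pending.pop(0))
    if not waiting:                         # idle until the next arrival
        t = pending[0][0]
        continue
    waiting.sort(key=lambda j: j[1])        # highest-priority waiting document
    arrival, prio = waiting.pop(0)
    t += random.expovariate(3.0)            # exponential service, rate 3
    done.append(t - arrival)                # sojourn time per document

print("mean sojourn time:", sum(done) / len(done))
```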
Free
A study and performance comparison of MapReduce and Apache Spark on Twitter data on a Hadoop cluster
Research article
We explore Apache Spark, the newest tool for analyzing big data, which lets programmers perform in-memory computation on large data sets in a fault-tolerant manner. MapReduce is a high-performance distributed big data programming framework which is highly preferred by most big data analysts and has been available for a long time with very good documentation. The purpose of this project was to evaluate the scalability of open-source distributed data management systems like Apache Hadoop for small and medium data sets and to compare its performance against Apache Spark, a scalable distributed in-memory data processing engine. For this comparison, experiments were executed on data sets ranging from 5GB to 43GB in size, on both a single machine and a Hadoop cluster. The results show that the cluster outperforms the computation of a single machine by a huge margin. Apache Spark outperforms MapReduce by a dramatic margin, and as the data grows, Spark becomes more reliable and fault tolerant. We also obtained an interesting result: as the number of blocks on the Hadoop Distributed File System increases, the run-time of both the MapReduce and Spark programs also increases, and even in this case Spark performs far better than MapReduce. This demonstrates Spark as a possible replacement for MapReduce in the near future.
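A minimal PySpark sketch of the kind of in-memory job compared in the study, counting hashtags in a tweet file, follows; the HDFS path and cluster settings are placeholders, and the study's actual MapReduce and Spark jobs are not shown.

```python
# Hashtag count over a tweet text file with the Spark RDD API.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hashtag-count").getOrCreate()
lines = spark.sparkContext.textFile("hdfs:///data/tweets.txt")  # placeholder path

counts = (lines.flatMap(lambda line: line.split())
               .filter(lambda w: w.startswith("#"))
               .map(lambda w: (w.lower(), 1))
               .reduceByKey(lambda a, b: a + b)
               .cache())                    # keep the RDD in memory for reuse

for tag, n in counts.takeOrdered(10, key=lambda kv: -kv[1]):
    print(tag, n)

spark.stop()
```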
Free
A study on diagnosis of Parkinson’s disease from voice dysphonias
Research article
Parkinson’s disease, which occurs at older ages, is a neurological disorder and one of the most painful, dangerous and non-curable diseases. One symptom indicating that a person may have Parkinson’s disease is trouble in the person’s voice, so-called dysphonia. In this study, an application based on assessing the importance of features was carried out, using a dataset of multiple types of sound recordings, for the diagnosis of Parkinson’s disease from voice disorders. The sub-datasets obtained from these records, divided into 70% training and 30% testing data respectively, include the important features. According to the experimental results, the Random Forest and Logistic Regression algorithms were found successful in general. Besides, one or two of these algorithms were found to be more successful for each sound. For example, the Logistic Regression algorithm is more successful for the ‘a’ voice, and the Artificial Neural Networks algorithm is more successful for the ‘o’ voice.
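The 70/30 evaluation with the two algorithms found most successful can be sketched with scikit-learn as below; the synthetic matrix stands in for the voice-recording features.

```python
# 70/30 train/test evaluation of Random Forest and Logistic Regression on a
# placeholder feature matrix standing in for the dysphonia recordings.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=200, n_features=26, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

for model in (RandomForestClassifier(random_state=0),
              LogisticRegression(max_iter=1000)):
    model.fit(X_tr, y_tr)
    print(type(model).__name__, accuracy_score(y_te, model.predict(X_te)))
```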
Free
A study on the diagnosis of Parkinson’s disease using digitized Wacom graphics tablet dataset
Research article
Parkinson’s disease is a neurological disorder and one of the most painful, dangerous and non-curable diseases, occurring at older ages. Records from the Static Spiral Test, Dynamic Spiral Test and Stability Test on a Certain Point were used in the application developed for the diagnosis of this disease. These datasets were divided into 80% training and 20% testing data respectively within the framework of the 10-fold cross-validation technique. The training data were sent as input to the Random Forest, Logistic Regression and Artificial Neural Networks classifier algorithms. After this step, the performance of these classifier algorithms was evaluated on the testing data. An analysis on new data was also carried out. According to the results obtained, Artificial Neural Networks is more successful than the Random Forest and Logistic Regression algorithms in the analysis of new data.
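The 10-fold cross-validation protocol with the three classifiers can be sketched with scikit-learn as below; the synthetic matrix stands in for the tablet features and the network architecture is illustrative.

```python
# 10-fold cross-validation of the three classifiers named in the abstract on a
# placeholder feature matrix standing in for the graphics-tablet records.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=300, n_features=7, random_state=0)

models = {
    "RandomForest": RandomForestClassifier(random_state=0),
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "NeuralNetwork": MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000,
                                   random_state=0),
}
for name, model in models.items():
    print(name, cross_val_score(model, X, y, cv=10).mean())
```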
Free
A systematic study of data wrangling
Research article
The paper presents the theory, design and usage aspects of the data wrangling process used in data warehousing and business intelligence. Data wrangling is defined as the art of data transformation or data preparation. It is a method adopted for basic data management in which data is properly processed, shaped, and made available for the most convenient consumption by potential future users. Large historical data is either aggregated or stored as facts or dimensions in data warehouses to accommodate large ad hoc queries. Data wrangling enables fast processing of business queries with the right solutions for both analysts and end users. The wrangler provides an interactive language and recommends predictive transformation scripts, which helps the user reduce manual iterative processes; decision support systems are the best examples here. The methodologies associated with preparing data for mining insights are highly influenced by the impact of big data concepts, from the data source layer through to self-service analytics and visualization tools.
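A minimal pandas sketch of a typical wrangling script (deduplicate, normalize, convert types, aggregate) follows; the column names and values are illustrative.

```python
# Wrangling sketch: clean and reshape raw records before loading them into a
# warehouse or BI tool.
import pandas as pd

raw = pd.DataFrame({
    "order_id": [1, 2, 2, 3],
    "amount": ["100", "250", "250", None],   # stored as strings, with a gap
    "region": ["north", "North ", "North ", "south"],
})

clean = (raw.drop_duplicates()
            .assign(region=lambda d: d["region"].str.strip().str.lower(),
                    amount=lambda d: pd.to_numeric(d["amount"]))
            .dropna(subset=["amount"]))

# Aggregate into a fact-style table for ad hoc queries.
print(clean.groupby("region", as_index=False)["amount"].sum())
```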
Free
A task scheduling model for multi-CPU and multi-hard disk drive in soft real-time systems
Research article
In recent years, with increasing demands on CPUs and I/O devices, running multiple tasks simultaneously has become a crucial issue. This paper presents a new task scheduling algorithm for multi-CPU and multi-Hard Disk Drive (HDD) soft Real-Time (RT) systems, which reduces the number of missed tasks. The aim of this paper is to execute more parallel tasks by considering an efficient trade-off between energy consumption and total execution time. For study purposes, we analyzed the proposed scheduling algorithm, named HCS (Hard disk drive and CPU Scheduling), in terms of the task set utilization, the total execution time, the average waiting time and the number of tasks that miss their deadlines. The results show that the HCS algorithm improves the above-mentioned criteria compared to the HCS_UE (Hard disk drive and CPU Scheduling _ Unchanged Execution time) algorithm.
Free
ADPBC: Arabic Dependency Parsing Based Corpora for Information Extraction
Research article
There is a massive amount of different information and data on the World Wide Web, and the number of Arabic users and the amount of Arabic content are widely increasing. Information extraction is an essential issue for accessing and sorting the data on the web. In this regard, information extraction becomes a challenge, especially for languages with a complex morphology like Arabic. Consequently, the trend today is to build new corpora that make information extraction easier and more precise. This paper presents an Arabic linguistically analyzed corpus, including dependency relations. The collected data covers five fields: sport, religion, weather, news and biomedical. The output is in the CoNLL universal lattice file format (CoNLL-UL). The corpus contains an index for the sentences and their linguistic meta-data to enable quick mining and search across the corpus. The corpus has seventeen morphological annotations and eight features based on the identification of textual structures, which help to recognize and understand the grammatical characteristics of the text and perform the dependency relation. The parsing and dependency process was conducted with the universal dependency model and corrected manually. The designed Arabic corpus helps to quickly obtain linguistic annotations for a text and makes information extraction techniques easy and clear to learn. The results obtained illustrated the average enhancement in the dependency relation corpus.
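A minimal sketch of reading CoNLL-U-style dependency annotations, the kind of token/head/relation records the corpus provides, follows; the two-token example sentence is a placeholder, not taken from the corpus.

```python
# Read CoNLL-U-style token lines: column 7 is the head index, column 8 the
# dependency relation label. Comment lines start with '#'.
sample = """\
# text = example sentence
1\tكتب\t_\tVERB\t_\t_\t0\troot\t_\t_
2\tالولد\t_\tNOUN\t_\t_\t1\tnsubj\t_\t_
"""

for line in sample.splitlines():
    if not line or line.startswith("#"):
        continue
    cols = line.split("\t")
    idx, form, upos, head, deprel = cols[0], cols[1], cols[3], cols[6], cols[7]
    print(f"{idx}. {form}: {upos}, head={head}, relation={deprel}")
```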
Free