Feature Selection for Modeling Intrusion Detection

Authors: Virendra Barot, Sameer Singh Chauhan, Bhavesh Patel

Journal: International Journal of Computer Network and Information Security (IJCNIS)

Issue: Vol. 6, No. 7, 2014.

Free access

Feature selection is beneficial in fields such as intrusion detection, where a vast number of features extracted from network traffic must be analysed. Not all extracted features are informative, and some are redundant. We investigated the performance of three feature selection algorithms (Chi-square, Information Gain based, and Correlation based) with Naive Bayes (NB) and Decision Table Majority classifiers. Empirical results show that selecting significant features can help to design an IDS that is lightweight, efficient and effective for real-world detection systems.

Keywords: Feature selection, network intrusion detection system, decision table majority, naive Bayesian classification

Short URL: https://sciup.org/15011323

IDR: 15011323

Text of the research article: Feature Selection for Modeling Intrusion Detection

Published Online June 2014 in MECS

With the wide and rapid development of network technology in social networking, e-business, e-learning and online shopping, security is a major issue for all networks in today's enterprise environment. Hackers and intruders have made many successful attempts to bring down high-profile company networks and web services. Many methods have been developed to secure network infrastructure and communication over the Internet, among them firewalls, encryption, and virtual private networks. Intrusion detection is a relatively new addition to such techniques, and intrusion detection methods with machine intelligence started appearing in the last few years. Using intrusion detection methods, one can collect and use information from known types of attacks to find out whether someone is trying to attack a network or particular hosts.

An Intrusion Detection System (IDS) is a device (or application) that monitors network or system activities and analyzes the data for potential vulnerabilities and attacks in progress; it also raises alarms or produces reports [1]. Information from different sources, and events based on that information, are gathered to decide whether an intrusion has taken place. This information is gathered at various levels such as system, host, and application [2]. Based on analysis of this data, intrusions can be detected using two common practices: misuse detection and anomaly detection.

Misuse detection IDS models function in much the same way as high-end computer anti-virus applications: they analyze the system or network environment and compare the activity against signatures (or patterns) of known intrusive computer and network behavior [3].
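As a rough illustration of signature-based matching, the sketch below checks a connection record against a small table of known-bad patterns. The signature names, record fields and thresholds are hypothetical and chosen only for the example; they are not taken from the paper or from any real IDS rule set.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]

# Hypothetical signatures: each name maps to a predicate over one connection record.
# The field names (service, failed_logins, root_shell) are illustrative assumptions.
SIGNATURES: Dict[str, Callable[[Record], bool]] = {
    "repeated_failed_login": lambda r: r.get("failed_logins", 0) >= 5,
    "root_shell_over_telnet": lambda r: r.get("service") == "telnet"
    and r.get("root_shell") == 1,
}


def match_signatures(record: Record) -> List[str]:
    """Return the names of all known signatures that the record matches."""
    return [name for name, predicate in SIGNATURES.items() if predicate(record)]


benign = {"service": "http", "failed_logins": 0, "root_shell": 0}
suspicious = {"service": "telnet", "failed_logins": 6, "root_shell": 1}

print(match_signatures(benign))      # no signature fires
print(match_signatures(suspicious))  # both signatures fire
```

A record that matches no signature is treated as non-intrusive here, which is also why this style of detection cannot flag attacks for which no signature exists yet.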

Anomaly detection takes a model of normal observations and uses statistical variance [4] or expert systems to determine whether the system or network environment is behaving normally or abnormally.

The paper is organized as follows. Section 2 gives a brief overview of the work done in this field, i.e. intrusion detection using data mining and feature selection for it. Section 3 briefly describes the probability-based Naïve Bayesian model, the Decision Table, and the attribute selection schemes used in the experiments. Section 4 discusses the dataset used and the experimental results in detail. Finally, Section 5 concludes the work.
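One simple way to picture the statistical-variance idea is a z-score test against a baseline learned from normal traffic. The sketch below is only an assumption-laden illustration: the metric (connections per minute), the sample values and the threshold of 3 standard deviations are invented for the example and are not the method of [4].

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def fit_baseline(normal_values: Sequence[float]) -> Tuple[float, float]:
    """Summarise normal behaviour by its sample mean and standard deviation."""
    return mean(normal_values), stdev(normal_values)


def is_anomalous(value: float, baseline: Tuple[float, float], threshold: float = 3.0) -> bool:
    """Flag a value whose distance from the mean exceeds `threshold` standard deviations."""
    mu, sigma = baseline
    if sigma == 0:
        # Degenerate baseline: any deviation from the constant value is anomalous.
        return value != mu
    return abs(value - mu) / sigma > threshold


# Hypothetical connections-per-minute counts observed during normal operation.
normal_conn_per_min = [48, 52, 50, 47, 53, 49, 51, 50]
baseline = fit_baseline(normal_conn_per_min)

print(is_anomalous(51, baseline))   # within normal variation
print(is_anomalous(400, baseline))  # far outside normal variation
```

Unlike signature matching, this check needs no knowledge of specific attacks, but its usefulness depends on how well the baseline sample represents normal behaviour.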

II. Related Work

IDS have become important and widely used for ensuring network security. Because the amount of audit data that an IDS needs to examine is very large even for a small network, analysis is difficult even with computer assistance: extraneous features can make it harder to detect suspicious behavior patterns [5][9].

Data mining approaches can be used to extract features and compute a detection model from the vast amount of audit data. The features computed from the data can be more objective than those handpicked by experts. The inductively learned detection model can generalize better than hand-coded rules (that is, it can perform better against new variants of known normal behavior or intrusions). Data mining approaches can therefore play an important role in developing Intrusion Detection Systems. Complex relationships exist between the features, so an IDS must reduce the amount of data to be processed. This is very important if real-time detection is desired. Reduction can be achieved by data filtering, data clustering and feature selection. In complex classification domains, features may contain false correlations, which hinder the process of detecting intrusions. Extra features can increase computation time and can affect the accuracy of the IDS.

Feature selection improves classification by searching for the subset of features that best classifies the training data. A number of works in the literature have used several machine learning paradigms, fuzzy inference systems and expert systems to develop IDS [5][6].
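To make the idea of ranking features concrete, the following minimal sketch scores discrete features by information gain, one of the three criteria named in the abstract. It is an illustration under stated assumptions, not the experimental setup of the paper: the four toy records, the feature names (protocol, flag) and the helper function names are all hypothetical.

```python
from collections import Counter
from math import log2
from typing import Dict, Hashable, List, Sequence, Tuple


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(feature_values: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Reduction in label entropy obtained by splitting on one discrete feature."""
    total = len(labels)
    groups: Dict[Hashable, List[Hashable]] = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(rows: List[Dict[str, Hashable]], labels: Sequence[Hashable]) -> List[Tuple[str, float]]:
    """Return (feature name, information gain) pairs, highest gain first."""
    gains = {
        name: information_gain([row[name] for row in rows], labels)
        for name in rows[0]
    }
    return sorted(gains.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy records: `flag` separates the classes perfectly, `protocol` does not.
rows = [
    {"protocol": "tcp", "flag": "S0"},
    {"protocol": "tcp", "flag": "SF"},
    {"protocol": "udp", "flag": "S0"},
    {"protocol": "udp", "flag": "SF"},
]
labels = ["attack", "normal", "attack", "normal"]

ranked = rank_features(rows, labels)
print(ranked)
```

A filter-style selector would then keep only the top-ranked features before training a classifier; continuous features would first need to be discretized for this particular score to apply.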

References

  • Aleksandar Lazarevic, L. Ertoz, Aysel Ozgur, Jaideep Srivastava and Vipin Kumar, "A Comparative Study of Anomaly Detection Schemes in the Network Intrusion Detection", in Proceedings of the Society for Industrial and Applied Mathematics (SIAM) Conference on Data Mining, 2003.
  • Joseph Derrick, Richard W. Tibbs, Larry Lee Reynolds, "Investigating new approaches to data collection, management and analysis for network intrusion detection", Proceedings of the 45th Annual Southeast Regional Conference, DOI = http://dl.acm.org/citation.cfm?doid=1233341.1233392, 2007.
  • Wenke Lee, Salvatore J. Stolfo and Kui W. Mok, "A Data Mining Framework for Building Intrusion Detection Models", Proceedings of the 1999 IEEE Symposium on Security and Privacy, pages 120-132, 1999.
  • E. Eskin, A. Arnold, M. Preau, L. Portnoy, and S. Stolfo, "A Geometric Framework for Unsupervised Anomaly Detection: Detecting Intrusions in Unlabeled Data", Applications of Data Mining in Computer Security, Kluwer Academic Publishers, 2002.
  • Lee W., Stolfo S. and Mok K., "A Data Mining framework for Building Intrusion Detection Models", In Proceedings of the IEEE Symposium on Security and Privacy, 1999.
  • Luo J. and Bridges S. M., "Mining Fuzzy Association Rules and Fuzzy Frequency Episodes for Intrusion Detection", International Journal of Intelligent Systems (IJIS), John Wiley & Sons, Vol. 15, No. 8, pp. 687-704, 2000.
  • B. A. Nahla, B. Salem, and E. Zied, "Naive bayes vs decision trees in intrusion detection systems", In Proceeding of the ACM Symposium on Applied Computing, Nicosia, Cyprus, 2004.
  • A. H. Sung, S. Mukkamala, "Identifying Important Features for Intrusion Detection Using Support Vector Machines and Neural Networks", Symposium on Applications and the Internet, 2003.
  • Mukkamala S., Sung A.H. and Abraham A., "Intrusion Detection Using Ensemble of Soft Computing Paradigms", Third International Conference on Intelligent Systems Design and Applications, Springer Verlag Germany, pp. 239-248, 2003.
  • Hongjie Liu, Boqin Feng, Jianjie Weng, "An Effective Data Classification Algorithm Based on the Decision Table", Seventh IEEE Association for Computer and Information Science (ACIS) International Conference on Computer and Information Science, 2008.
  • Jashan Koshal, Monark Bag, "Cascading of C4.5 Decision Tree and Support Vector Machine for Rule Based Intrusion Detection System", in International Journal of Computer Network and Information Security (IJCNIS), Vol. 4, pp. 8-20, August 2012.
  • Ron Kohavi, "The Power of Decision Tables", in 8th European Conference on Machine Learning, pp. 174-189, 1995.
  • Y. Yang and J. Pedersen, "A comparative study on feature selection in text categorization", ICML, pp. 412–420, 1997.
  • H. Liu and R. Setiono, "Chi2: Feature selection and discretization of numeric attributes", Proc. IEEE 7th International Conference on Tools with Artificial Intelligence, pp. 338-391, 1995.
  • M. A. Hall, L. A. Smith, "Feature selection for machine learning: comparing a correlation-based filter approach to the wrapper", in Proceedings of Florida Artificial Intelligence Research Symposium, Orlando, FL, 1999, pp. 235–239.
  • R. O. Duda, P. E. Hart, and D. G. Stork. Pattern Classification. 2nd edition, 2004.
  • KDD (1999). Available at http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html.
  • http://weka.wikispaces.com/Eclipse/Eclipse+3.4.x+(weka-src.jar).
  • http://www.cs.waikato.ac.nz/ml/weka/docummentation.html.
Research article