Key term extraction using a sentence based weighted TF-IDF algorithm

T. Vetriselvi; N. P. Gopalan; G. Kumaresan

doi:10.5815/ijeme.2019.04.02

Author: T. Vetriselvi, N. P. Gopalan, G. Kumaresan

Journal: International Journal of Education and Management Engineering

Article in issue: Vol. 9, No. 4, 2019.

Free access

Keyword ranking with similarity identification is an approach for finding the significant keywords in a corpus using a Variant Term Frequency Inverse Document Frequency (VTF-IDF) algorithm. Some of these keywords may be semantically similar, and they are reduced to a single term when WordNet is used. The proposed approach, which does not require any test or training set, assigns sentence-based weights to the keywords (terms) and is found to be effective. Its suitability is analyzed on several data sets using precision and recall as metrics.

Keywords: Similarity Matrix, Term Count, WordNet
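
To make the idea in the abstract concrete, the following is a minimal sketch of a sentence-weighted TF-IDF score together with the precision and recall metrics used for evaluation. It is not the authors' VTF-IDF formula, which this page does not give. The sentence weight used here (the fraction of a document's sentences that contain the term), the smoothed IDF, the naive tokenizer and sentence splitter, and all function names are assumptions made for illustration. The WordNet-based merging of similar terms is left out.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of ASCII letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


def split_sentences(text):
    """Naive sentence splitter on '.', '!' and '?' (an assumption)."""
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]


def sentence_weighted_tfidf(docs):
    """Score each term of each document as tf * idf * sentence_weight.

    sentence_weight is an ASSUMED weighting, not the paper's: the
    fraction of the document's sentences that contain the term.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)

    # Document frequency: number of documents containing each term.
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))

    scores = []
    for doc, toks in zip(docs, tokenized):
        tf = Counter(toks)
        sentences = [set(tokenize(s)) for s in split_sentences(doc)]
        doc_scores = {}
        for term, count in tf.items():
            # Adding 1 keeps the IDF positive for terms found in every document.
            idf = math.log(n_docs / df[term]) + 1.0
            in_sentences = sum(term in s for s in sentences)
            weight = in_sentences / len(sentences)
            doc_scores[term] = (count / len(toks)) * idf * weight
        scores.append(doc_scores)
    return scores


def precision_recall(predicted, relevant):
    """Precision and recall of predicted key terms against a reference set."""
    predicted, relevant = set(predicted), set(relevant)
    hits = len(predicted & relevant)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    docs = [
        "Cats purr. Cats sleep. Dogs bark.",
        "Dogs bark. Birds sing.",
    ]
    scores = sentence_weighted_tfidf(docs)
    ranked = sorted(scores[0], key=scores[0].get, reverse=True)
    print(ranked[:3])
    print(precision_recall(ranked[:2], ["cats", "birds"]))
```

In this toy corpus, "cats" ranks first for the first document because it occurs twice, appears in two of the three sentences, and is absent from the other document. No training or test set is involved, in line with the abstract's claim, though a reference keyword list is still needed to compute precision and recall.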

Short address: https://sciup.org/15015809

IDR: 15015809 | DOI: 10.5815/ijeme.2019.04.02
