Dynamic Editing Distance-based Extracting Relevant Information Approach from Social Networks
Authors: Mohamed Nazih Omri, Fethi Fkih
Journal: International Journal of Computer Network and Information Security @ijcnis
Published in: Issue 6, Vol. 14, 2022.
Free access
Online social networks such as Facebook, Twitter, and LinkedIn have grown exponentially in recent years and now hold a large amount of information. They contain huge volumes of data in structured, textual, and unstructured forms, which has often led to cyber-crimes such as cyber terrorism and cyber bullying, and extracting information from these data has become a serious challenge for ensuring data safety. In this work, we propose a new supervised approach to Information Extraction (IE) from Web resources based on a dynamic edit distance, called EIDED. The approach belongs to the family of IE approaches based on mask extraction and is built around three algorithms: (i) a labeling algorithm, (ii) a learning and inference algorithm, and (iii) an extended edit distance algorithm. It continues to work in the presence of anomalies in the tuples, such as missing attributes, multivalued attributes, and permuted attributes, and in the structure of web pages. An experimental study on a standard database of web pages shows the performance of EIDED compared with approaches based on the classic edit distance, measured with the standard metrics of recall, precision, and F1-measure.
Keywords: Information Extraction, Mask Induction, Inductive Learning, Edit Distance, Alignment, Edit Operations
Short URL: https://sciup.org/15018551
IDR: 15018551 | DOI: 10.5815/ijcnis.2022.06.01
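The abstract compares EIDED against the classic edit distance and reports recall, precision, and F1-measure. As a point of reference only, the following is a minimal sketch of that baseline and of the three metrics. It is not the authors' extended edit distance, whose details are not given here. The function names, the token representation of a tuple, and the sample data are all assumptions made for illustration.

```python
from typing import Sequence, Set, Tuple


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Classic (Levenshtein) edit distance with unit costs.

    Works on any pair of sequences: characters of a string, or
    tokens (for example HTML tags and attribute placeholders).
    """
    prev = list(range(len(b) + 1))  # distance from empty prefix of a
    for i, tok_a in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to empty prefix of b
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(
                prev[j] + 1,         # delete tok_a
                curr[j - 1] + 1,     # insert tok_b
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


def precision_recall_f1(extracted: Set[str],
                        relevant: Set[str]) -> Tuple[float, float, float]:
    """Precision, recall, and F1 of an extracted set against a gold set."""
    true_positives = len(extracted & relevant)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical token sequences: a learned mask and a page row whose
# PRICE attribute is missing (one of the anomalies the abstract names).
mask = ["<b>", "NAME", "</b>", "<i>", "PRICE", "</i>"]
row_missing_price = ["<b>", "NAME", "</b>", "<i>", "</i>"]
distance = edit_distance(mask, row_missing_price)

# Hypothetical extraction result scored against a gold standard.
scores = precision_recall_f1({"a", "b", "c"}, {"b", "c", "d", "e"})
```

With unit costs, the missing attribute costs one deletion, so the row stays close to the mask. A permutation of attributes is penalized much more heavily by the classic distance, which is one reason an extended variant can be of interest.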