MediBERT: A Medical Chatbot Built Using KeyBERT, BioBERT and GPT-2
Authors: Sabbir Hossain, Rahman Sharar, Md. Ibrahim Bahadur, Abu Sufian, Rashidul Hasan Nabil
Journal: International Journal of Intelligent Systems and Applications (IJISA)
Issue: Vol. 15, No. 4, 2023
Free access
The emergence of chatbots over the last 50 years has been the primary consequence of the need for virtual assistance. Unlike their biological anthropomorphic counterparts, fellow Homo sapiens, chatbots can present themselves instantaneously at the user's need and convenience. Be it something as benign as the need for a friend to talk to, or a case as dire as medical assistance, chatbots are unequivocally ubiquitous in their utility. This paper aims to develop one such chatbot, capable not only of analyzing human text (and, in the near future, speech), but also of refining its ability to assist users medically by accumulating data from relevant datasets. Although Recurrent Neural Networks (RNNs) are often used to develop chatbots, the persistent vanishing-gradient problem brought about by backpropagation, coupled with the cumbersome process of parsing each word sequentially, has led to the increased adoption of Transformer Neural Networks (TNNs), which parse entire sentences at once while simultaneously giving them context via embeddings, allowing greater parallelization. Two variants of the TNN Bidirectional Encoder Representations from Transformers (BERT), namely KeyBERT and BioBERT, are used for tagging the keywords in each sentence and for contextual vectorization into Q/A pairs for matrix multiplication, respectively. A final GPT-2 (Generative Pre-trained Transformer) layer is applied to fine-tune the results from BioBERT into a human-readable form. The outcome of such an attempt could potentially lessen the need for trips to the nearest physician, along with the temporal delay and financial resources such trips require.
Keywords: Medical chatbot, RNN, LSTM, GRU, TNN, KeyBERT, BioBERT, GPT-2
Short address: https://sciup.org/15019006
IDR: 15019006 | DOI: 10.5815/ijisa.2023.04.05
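The abstract describes a three-stage pipeline: KeyBERT tags the keywords of the user's query, BioBERT vectorizes text for matching against stored Q/A pairs, and GPT-2 renders the retrieved answer in readable prose. Below is a minimal sketch of such a pipeline. The model checkpoints (`dmis-lab/biobert-base-cased-v1.1`, `gpt2`), the mean-pooled embeddings, and the cosine-similarity retrieval step are illustrative assumptions, not the authors' exact implementation.

```python
# Illustrative sketch of the KeyBERT -> BioBERT -> GPT-2 pipeline described
# in the abstract. Model names, mean pooling, and cosine-similarity retrieval
# are assumptions for demonstration, not the paper's exact configuration.
import torch
from keybert import KeyBERT
from transformers import AutoModel, AutoTokenizer, GPT2LMHeadModel, GPT2TokenizerFast

kw_model = KeyBERT()                                        # stage 1: keyword tagging
bio_tok = AutoTokenizer.from_pretrained("dmis-lab/biobert-base-cased-v1.1")
bio_bert = AutoModel.from_pretrained("dmis-lab/biobert-base-cased-v1.1")
gpt2_tok = GPT2TokenizerFast.from_pretrained("gpt2")
gpt2 = GPT2LMHeadModel.from_pretrained("gpt2")              # stage 3: readable output

def embed(text: str) -> torch.Tensor:
    """Stage 2: mean-pooled BioBERT embedding of a sentence."""
    inputs = bio_tok(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = bio_bert(**inputs).last_hidden_state       # (1, seq_len, 768)
    return hidden.mean(dim=1).squeeze(0)                    # (768,)

def answer(query: str, qa_pairs: list[tuple[str, str]]) -> str:
    # Tag keywords so retrieval focuses on the medically relevant terms.
    keywords = " ".join(kw for kw, _ in kw_model.extract_keywords(query))
    q_vec = embed(keywords or query)
    # Retrieval via matrix multiplication: stack stored question embeddings
    # and take the row with the highest cosine similarity to the query.
    q_matrix = torch.stack([embed(q) for q, _ in qa_pairs])  # (n, 768)
    scores = torch.nn.functional.cosine_similarity(q_matrix, q_vec.unsqueeze(0))
    best_answer = qa_pairs[int(scores.argmax())][1]
    # Let GPT-2 restate the retrieved answer as fluent prose.
    prompt = f"Question: {query}\nAnswer: {best_answer}"
    ids = gpt2_tok(prompt, return_tensors="pt").input_ids
    out = gpt2.generate(ids, max_new_tokens=40, do_sample=False,
                        pad_token_id=gpt2_tok.eos_token_id)
    return gpt2_tok.decode(out[0][ids.shape[1]:], skip_special_tokens=True)
```

Stacking the stored question embeddings into one matrix reduces retrieval to a single matrix product against the query vector, which is consistent with the abstract's mention of matrix multiplication over Q/A pairs.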
References
- Ayanouz, S., Abdelhakim, B., & Benhmed, M. (2020). A Smart Chatbot Architecture based NLP and Machine Learning for Health Care Assistance. Proceedings of the 3rd International Conference on Networking, Information Systems & Security. https://doi.org/10.1145/3386723.3387897
- Mullen, L. A., Benoit, K., Keyes, O., Selivanov, D., & Arnold, J. (2018). Fast, Consistent Tokenization of Natural Language Text. Journal of Open Source Software, 3(23), 655. https://doi.org/10.21105/joss.00655
- Kumar, L., & Bhatia, P. K. (2013). Text Mining: Concepts, Process and Applications. Journal of Global Research in Computer Sciences, 4(3), 36–39. https://www.rroij.com/open-access/text-mining-concepts-process-and-applications.php?aid=38178
- Ferilli, S., Esposito, F., & Grieco, D. (2014). Automatic Learning of Linguistic Resources for Stopword Removal and Stemming from Text. Procedia Computer Science, 38, 116–123. https://doi.org/10.1016/j.procs.2014.10.019
- du Buf, J., Kardan, M., & Spann, M. (1990). Texture feature performance for image segmentation. Pattern Recognition, 23(3–4), 291–309. https://doi.org/10.1016/0031-3203(90)90017-f
- Fattahi, J., & Mejri, M. (2021). SpaML: a Bimodal Ensemble Learning Spam Detector based on NLP Techniques. 2021 IEEE 5th International Conference on Cryptography, Security and Privacy (CSP). https://doi.org/10.1109/csp51677.2021.9357595
- Hinton, G. E., Osindero, S., & Teh, Y. W. (2006). A Fast Learning Algorithm for Deep Belief Nets. Neural Computation, 18(7), 1527–1554. https://doi.org/10.1162/neco.2006.18.7.1527
- Williams, R. J., & Zipser, D. (1995). Gradient-based learning algorithms for recurrent networks and their computational complexity. L. Erlbaum Associates Inc. EBooks, 433–486. https://dl.acm.org/citation.cfm?id=201801
- Kim, Y., Denton, C., Hoang, L., & Rush, A. M. (2017). Structured Attention Networks. ArXiv: Computation and Language. https://arxiv.org/pdf/1702.00887.pdf
- Peters, M., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., & Zettlemoyer, L. (2018). Deep Contextualized Word Representations. Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). https://doi.org/10.18653/v1/n18-1202
- Hu, Y., Huber, A. E. G., Anumula, J., & Liu, S. (2018). Overcoming the vanishing gradient problem in plain recurrent networks. Cornell University - ArXiv. https://doi.org/10.48550/arxiv.1801.06105
- Werbos, P. J. (1988). Generalization of backpropagation with application to a recurrent gas market model. Neural Networks, 1(4), 339–356. https://doi.org/10.1016/0893-6080(88)90007-x
- Robinson, A. J., & Fallside, F. (1987). The Utility Driven Dynamic Error Propagation Network (CUED/F-INFENG/TR.1). Engineering Department, Cambridge University.
- Hochreiter, S., & Schmidhuber, J. (1997). Long Short-Term Memory. Neural Computation, 9(8), 1735–1780. https://doi.org/10.1162/neco.1997.9.8.1735
- Yu, Y., Si, X., Hu, C., & Zhang, J. (2019). A Review of Recurrent Neural Networks: LSTM Cells and Network Architectures. Neural Computation, 31(7), 1235–1270. https://doi.org/10.1162/neco_a_01199
- Gers, F. A., Schmidhuber, J., & Cummins, F. (2000). Learning to Forget: Continual Prediction with LSTM. Neural Computation, 12(10), 2451–2471. https://doi.org/10.1162/089976600300015015
- Li, W., Qi, F., Tang, M., & Yu, Z. (2020). Bidirectional LSTM with self-attention mechanism and multi-channel features for sentiment classification. Neurocomputing, 387, 63–77. https://doi.org/10.1016/j.neucom.2020.01.006
- Shao, D., Zheng, N., Yang, Z., Chen, Z., Xiang, Y., Xian, Y., & Yu, Z. (2019). Domain-Specific Chinese Word Segmentation Based on Bi-Directional Long-Short Term Memory Model. IEEE Access, 7, 12993–13002. https://doi.org/10.1109/access.2019.2892836
- Attri, I., & Dutta, D. M. (2019). Bi-Lingual (English, Punjabi) Sarcastic Sentiment Analysis by using Classification Methods. International Journal of Innovative Technology and Exploring Engineering, 8(9), 1383–1388. https://doi.org/10.35940/ijitee.i8053.078919
- Ouerhani, N., Maalel, A., Ghézala, H. B., & Chouri, S. (2020). Smart Ubiquitous Chatbot for COVID-19 Assistance with Deep learning Sentiment Analysis Model during and after quarantine. Research Square. https://doi.org/10.21203/rs.3.rs-33343/v1
- Cho, K., van Merrienboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., & Bengio, Y. (2014). Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). https://doi.org/10.3115/v1/d14-1179
- Chung, J., Gulcehre, C., Cho, K., & Bengio, Y. (2014). Empirical evaluation of gated recurrent neural networks on sequence modeling. ArXiv: Neural and Evolutionary Computing. https://arxiv.org/pdf/1412.3555
- Zhao, R., Wang, D., Yan, R., Mao, K., Shen, F., & Wang, J. (2018). Machine Health Monitoring Using Local Feature-Based Gated Recurrent Unit Networks. IEEE Transactions on Industrial Electronics, 65(2), 1539–1548. https://doi.org/10.1109/tie.2017.2733438
- Yang, S., Yu, X., & Zhou, Y. (2020). LSTM and GRU Neural Network Performance Comparison Study: Taking Yelp Review Dataset as an Example. 2020 International Workshop on Electronic Communication and Artificial Intelligence (IWECAI). https://doi.org/10.1109/iwecai50956.2020.00027
- Arai, K., Bhatia, R., & Kapoor, S. (Eds.). (2019). Proceedings of the Future Technologies Conference (FTC) 2018. Advances in Intelligent Systems and Computing. https://doi.org/10.1007/978-3-030-02686-8
- Cui, L., Huang, S., Wei, F., Tan, C., Duan, C., & Zhou, M. (2017). SuperAgent: A Customer Service Chatbot for E-commerce Websites. Proceedings of ACL 2017, System Demonstrations. https://doi.org/10.18653/v1/p17-4017
- Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., & Polosukhin, I. (2017). Attention is All you Need. Neural Information Processing Systems, 30, 5998–6008. https://arxiv.org/pdf/1706.03762v5
- Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., & Zagoruyko, S. (2020). End-to-End Object Detection with Transformers. Computer Vision – ECCV 2020, 213–229. https://doi.org/10.1007/978-3-030-58452-8_13
- Zeyer, A., Bahar, P., Irie, K., Schluter, R., & Ney, H. (2019). A Comparison of Transformer and LSTM Encoder Decoder Models for ASR. 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU). https://doi.org/10.1109/asru46091.2019.9004025
- Bahdanau, D., Cho, K., & Bengio, Y. (2015). Neural Machine Translation by Jointly Learning to Align and Translate. International Conference on Learning Representations. https://arxiv.org/pdf/1409.0473
- Sutskever, I., Vinyals, O., & Le, Q. V. (2014). Sequence to Sequence Learning with Neural Networks. Cornell University - ArXiv. https://doi.org/10.48550/arXiv.1409.3215
- Kaiser, Ł., & Sutskever, I. (2016). Neural GPUs Learn Algorithms. International Conference on Learning Representations. https://arxiv.org/pdf/1511.08228
- Kalchbrenner, N., Espeholt, L., Simonyan, K., Van Den Oord, A., Graves, A., & Kavukcuoglu, K. (2016). Neural Machine Translation in Linear Time. ArXiv: Computation and Language. https://arxiv.org/pdf/1610.10099
- Devlin, J., Chang, M., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. ArXiv: Computation and Language. https://arxiv.org/pdf/1810.04805v2
- Zhu, Y., Kiros, R., Zemel, R., Salakhutdinov, R., Urtasun, R., Torralba, A., & Fidler, S. (2015). Aligning Books and Movies: Towards Story-Like Visual Explanations by Watching Movies and Reading Books. 2015 IEEE International Conference on Computer Vision (ICCV). https://doi.org/10.1109/iccv.2015.11
- Acheampong, F. A., Nunoo-Mensah, H., & Chen, W. (2021). Transformer models for text-based emotion detection: a review of BERT-based approaches. Artificial Intelligence Review, 54(8), 5789–5829. https://doi.org/10.1007/s10462-021-09958-2
- Qudar, M. M. A., & Mago, V. (2020). TweetBERT: A Pretrained Language Representation Model for Twitter Text Analysis. ArXiv: Computation and Language. https://arxiv.org/pdf/2010.11091.pdf
- Lee, J., Yoon, W., Kim, S., Kim, D., Kim, S., So, C. H., & Kang, J. (2019). BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics. https://doi.org/10.1093/bioinformatics/btz682
- Mathur, A., & Suchithra, M. (2022). Application of Abstractive Summarization in Multiple Choice Question Generation. 2022 International Conference on Computational Intelligence and Sustainable Engineering Solutions (CISES). https://doi.org/10.1109/CISES54857.2022.9844396
- Hegde, C. V., & Patil, S. (2020). Unsupervised Paraphrase Generation using Pre-trained Language Models. ArXiv: Computation and Language. https://arxiv.org/abs/2006.05477
- Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., & Sutskever, I. (2019). Language Models are Unsupervised Multitask Learners. https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf
- Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D. M., Wu, J., Winter, C., . . . Amodei, D. (2020). Language Models are Few-Shot Learners. ArXiv: Computation and Language. https://arxiv.org/pdf/2005.14165.pdf
- Mikolov, T., Chen, K., Corrado, G. S., & Dean, J. (2013). Efficient Estimation of Word Representations in Vector Space. ArXiv: Computation and Language. http://export.arxiv.org/pdf/1301.3781
- Papineni, K., Roukos, S., Ward, T., & Zhu, W. J. (2002). BLEU: A Method for Automatic Evaluation of Machine Translation. Proceedings of the 40th Annual Meeting on Association for Computational Linguistics (ACL '02). https://doi.org/10.3115/1073083.1073135
- Liu, H., Lin, T., Sun, H., Lin, W., Chang, C., Zhong, T., & Rudnicky, A. I. (2017). RubyStar: A Non-Task-Oriented Mixture Model Dialog System. ArXiv: Computation and Language. http://export.arxiv.org/pdf/1711.02781
- Serban, I. V., Sankar, C., Germain, M., Zhang, S., Lin, Z., Subramanian, S., Kim, T., Pieper, M., Chandar, S., Ke, N. R., Rajeshwar, S., De Brébisson, A., Sotelo, J., Suhubdy, D., Michalski, V., Nguyen, A., Pineau, J., & Bengio, Y. (2017). A Deep Reinforcement Learning Chatbot. ArXiv: Computation and Language. http://export.arxiv.org/pdf/1709.02349
- Adamopoulou, E., & Moussiades, L. (2020). An Overview of Chatbot Technology. IFIP Advances in Information and Communication Technology, 373–383. https://doi.org/10.1007/978-3-030-49186-4_31
- Tyen, G., Brenchley, M., Caines, A., & Buttery, P. (2022). Towards An Open-Domain Chatbot For Language Practice. Association for Computational Linguistics. https://doi.org/10.18653/v1/2022.bea-1.28
- Yin, Z., Chang, K., & Zhang, R. (2017). DeepProbe: Information Directed Sequence Understanding and Chatbot Design via Recurrent Neural Networks. Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. https://doi.org/10.48550/arXiv.1707.05470
- Qiu, M., Li, F. L., Wang, S., Gao, X., Chen, Y., Zhao, W., Chen, H., Huang, J., & Chu, W. (2017). AliMe Chat: A Sequence to Sequence and Rerank based Chatbot Engine. Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). https://doi.org/10.18653/v1/p17-2079
- Koehler, B. J. (2017, December 1). AhriBot: A Python Bot Written for Discord Tasks. https://keep.lib.asu.edu/items/134113
- Tiong, R. L., & Alum, J. (1997). Evaluation of proposals for BOT projects. International Journal of Project Management, 15(2), 67–72. https://doi.org/10.1016/s0263-7863(96)00003-8
- Rick, S. R., Goldberg, A. P., & Weibel, N. (2019). SleepBot. Proceedings of the 24th International Conference on Intelligent User Interfaces: Companion. https://doi.org/10.1145/3308557.3308712
- Epstein, J., & Klinkenberg, W. (2001). From Eliza to Internet: a brief history of computerized assessment. Computers in Human Behavior, 17(3), 295–314. https://doi.org/10.1016/s0747-5632(01)00004-8
- Weizenbaum, J. (1966). ELIZA—a computer program for the study of natural language communication between man and machine. Communications of the ACM, 9(1), 36–45. https://doi.org/10.1145/365153.365168
- Jurafsky, D., & Martin, J. (2000). Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. Prentice Hall EBooks. https://nats-www.informatik.uni-hamburg.de/pub/CDG/JurafskyMartin00Comments/JurafskyMartin00-Review.pdf
- Wallace, R. S. (2007). The Anatomy of A.L.I.C.E. Parsing the Turing Test, 181–210. https://doi.org/10.1007/978-1-4020-6710-5_13
- Bao, Q., Ni, L., & Liu, J. (2020). HHH: An Online Medical Chatbot System based on Knowledge Graph and Hierarchical Bi-Directional Attention. Proceedings of the Australasian Computer Science Week Multiconference. https://doi.org/10.1145/3373017.3373049
- Sabharwal, N., & Agrawal, A. (2020). Introduction to Google Dialogflow. Cognitive Virtual Assistants Using Google Dialogflow, 13–54. https://doi.org/10.1007/978-1-4842-5741-8_2
- Samuel, I., Ogunkeye, F. A., Olajube, A., & Awelewa, A. (2020). Development of a Voice Chatbot for Payment Using Amazon Lex Service with Eyowo as the Payment Platform. 2020 International Conference on Decision Aid Sciences and Application (DASA), 104–108. IEEE.