ZUL-Gene: Model for generating text descriptions of gene sets
Authors: Buzanov G.S., Voronov A.D., Makeev V.J.
Journal: Proceedings of Moscow Institute of Physics and Technology (Trudy MIPT)
Section: Informatics and Control
Issue: vol. 17, no. 3 (67), 2025.
Free access
Automatically generating informative textual descriptions of gene sets is a pressing problem in modern bioinformatics: the need arises during the analysis of omics data, and writing such descriptions manually is labor-intensive. At present, no specialized neural language model exists for this task. We adapted the BioGPT model to generate textual descriptions of gene sets. For fine-tuning, we constructed a corpus that includes textual information on signaling pathways (from BioCarta) and on the functions of individual genes (from UniProt), which improved the accuracy and informativeness of the generated texts. The training data on genes and gene sets were augmented with synthetic permutations and negative examples to improve the model’s ability to distinguish relevant from irrelevant descriptions. We compared the model with GPT-4 through an expert review by specialists in bioinformatics and molecular biology. The fine-tuned BioGPT outperformed GPT-4 in the accuracy, usefulness, clarity, and completeness of the generated descriptions.
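The abstract does not specify how the synthetic permutations and negative examples are constructed. As an illustration only, here is a minimal Python sketch, assuming that a permutation means shuffling the gene order within a set while keeping its description (a gene set is unordered, so the description should not depend on order), and that a negative example pairs a gene set with the description of a different set. The `Example` record, the `augment` function, and its parameters are hypothetical and are not taken from the paper.

```python
import random
from dataclasses import dataclass


@dataclass
class Example:
    genes: list[str]
    description: str
    label: int  # 1 = description matches the gene set, 0 = mismatched (negative)


def augment(gene_sets: dict[str, tuple[list[str], str]],
            n_permutations: int = 2,
            n_negatives: int = 1,
            seed: int = 0) -> list[Example]:
    """Hypothetical augmentation: shuffled gene orders as extra positives,
    descriptions borrowed from other sets as negatives."""
    rng = random.Random(seed)  # fixed seed for reproducible augmentation
    names = list(gene_sets)
    examples: list[Example] = []
    for name in names:
        genes, desc = gene_sets[name]
        # Original pair is a positive example.
        examples.append(Example(list(genes), desc, 1))
        # Permuted gene order keeps the same description (still positive).
        for _ in range(n_permutations):
            shuffled = list(genes)
            rng.shuffle(shuffled)
            examples.append(Example(shuffled, desc, 1))
        # Negatives: same genes paired with another set's description.
        others = [n for n in names if n != name]
        for other in rng.sample(others, min(n_negatives, len(others))):
            examples.append(Example(list(genes), gene_sets[other][1], 0))
    return examples


if __name__ == "__main__":
    # Toy placeholder data, not from the paper's corpus.
    toy = {"set1": (["A", "B", "C"], "description 1"),
           "set2": (["D", "E"], "description 2")}
    for ex in augment(toy):
        print(ex.label, ex.genes, ex.description)
```

With two toy sets, two permutations, and one negative per set, each set yields four examples (one original, two permuted, one negative). A real pipeline would also need to serialize these pairs into BioGPT's input text format, which the abstract does not describe.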
Keywords: BioGPT, generative model, text generation, gene sets
Short URL: https://sciup.org/142245835
IDR: 142245835 | UDC: 004.912