An Approach for Predicting Protein Abundance in Yeast Cells Based on Their Genomical Sequences
Authors: Venzel A.S., Klimenko A.I., Ivanisenko T.V., Demenkov P.S., Lashin S.A., Ivanisenko V.A.
Journal: Problemy Informatiki (Problems of Informatics)
Section: Applied Information Technologies. Bioinformatics
Issue: 4 (65), 2024.
Free access
This work presents a new method for predicting protein abundance in cells of baker's yeast (Saccharomyces cerevisiae), based on analysis of their biological sequences with pre-trained language models. ESM2 family models were applied to the amino acid sequences of proteins, and the GENA-LM model was applied to the nucleotide sequences of genes, which yielded informative embeddings of the input data. The study evaluates how the architecture and size of the pre-trained language model affect prediction accuracy. The proposed method has potential applications in biotechnology, in the optimization of biosynthesis processes, and in the computer-aided design of producer strains with enhanced expression of genes encoding target proteins. The results may contribute to a deeper understanding of the mechanisms that regulate gene expression and open up prospects for predicting protein abundance in other microorganisms.
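The embed-then-predict idea described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not state how per-residue embeddings are pooled or which prediction head is used, so mean pooling and a linear head trained by gradient descent are stand-ins. The function `embed_residues` is a hypothetical placeholder that returns one-hot vectors so the example runs without downloading a model; in practice it would be replaced by a pre-trained model such as ESM2. The toy sequences and abundance values are invented.

```python
# Minimal sketch: sequence -> per-residue vectors -> pooled embedding -> linear head.
# Assumption: one-hot vectors stand in for learned ESM2 embeddings.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def embed_residues(sequence):
    """Hypothetical placeholder: return one one-hot vector per residue."""
    return [[1.0 if aa == ref else 0.0 for ref in AMINO_ACIDS] for aa in sequence]


def mean_pool(residue_vectors):
    """Average per-residue vectors into one fixed-length sequence embedding."""
    n = len(residue_vectors)
    dim = len(residue_vectors[0])
    return [sum(vec[i] for vec in residue_vectors) / n for i in range(dim)]


def predict(weights, bias, embedding):
    """Linear prediction head: dot product plus bias."""
    return sum(w * x for w, x in zip(weights, embedding)) + bias


def fit_linear_head(embeddings, targets, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on mean squared error."""
    n, dim = len(embeddings), len(embeddings[0])
    weights, bias = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(embeddings, targets):
            err = predict(weights, bias, x) - y
            for i in range(dim):
                grad_w[i] += 2.0 * err * x[i] / n
            grad_b += 2.0 * err / n
        weights = [w - lr * g for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b
    return weights, bias


# Toy data invented for illustration only (not taken from the study).
sequences = ["MKKLLA", "MAAAGG", "MKLVVD", "MGGGAA"]
log_abundance = [2.0, 1.0, 2.5, 0.8]

X = [mean_pool(embed_residues(s)) for s in sequences]
w, b = fit_linear_head(X, log_abundance)
for seq, x, y in zip(sequences, X, log_abundance):
    print(seq, "target:", y, "predicted:", round(predict(w, b, x), 2))
```

Mean pooling is used here only because it turns sequences of different lengths into vectors of equal size; other pooling schemes or a nonlinear head would fit the same pipeline.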
Keywords: transformer, ESM2
Short URL: https://sciup.org/143184143
IDR: 143184143 | DOI: 10.24412/2073-0667-2024-4-17-26