MACHINE LEARNING ALGORITHM FOR THE CONSTRUCTION OF A NUCLEOTIDE SEQUENCE IN THE NANOFOR SPS SEQUENCER USING THE PRINCIPAL COMPONENT ANALYSIS

Authors: V. V. Manoilov, A. G. Borodinov, A. I. Petrov, I. V. Zarutsky, V. E. Kurochkin

Journal: Научное приборостроение (Nauchnoe Priborostroenie)

Section: Mathematical methods and modeling in instrument engineering

Published in issue: 2, 2023.

Free access

Information technologies and mathematical methods of data processing play an essential role in identifying features of the analyzed nucleic acids and trends in their modification. An important stage in massively parallel sequencing of nucleic acids is constructing the nucleotide sequence from the measured intensities of fluorescence signals. The paper considers an algorithm for generating a training sample that is used to construct a sequence of DNA nucleotide letter codes from fluorescence signal intensities obtained directly from image processing. These signals were not corrected for the physical and chemical characteristics of the sequencing process. The algorithm uses principal component analysis and a k-means classifier. The classifier separates the data, after transformation by principal component analysis, into four separate classes, one for each DNA nucleotide letter code. The training sample is then used to determine the class to which each vector of fluorescence signal data belongs, and hence its letter code. On a test sample, the algorithm produced highly reliable results.
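The pipeline described above (principal component analysis, then k-means into four classes, then assignment of new intensity vectors to a class) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The following are all assumptions made for illustration: the channel order `ACGT`, the number of retained components (3), the hypothetical `CROSSTALK` matrix used to generate uncorrected synthetic intensities, the farthest-point initialization, and the rule that names each cluster after the brightest channel of its mean intensity.

```python
import numpy as np

BASES = "ACGT"  # assumed channel order; the abstract does not specify it

# Hypothetical channel crosstalk, standing in for "uncorrected" signals.
CROSSTALK = np.array([[1.0, 0.3, 0.0, 0.0],
                      [0.2, 1.0, 0.0, 0.0],
                      [0.0, 0.0, 1.0, 0.3],
                      [0.0, 0.0, 0.2, 1.0]])


def synthetic(n, rng):
    """Generate n noisy 4-channel intensity vectors and their true letters."""
    truth = rng.integers(0, 4, size=n)
    X = CROSSTALK[truth] + rng.normal(0.0, 0.03, size=(n, 4))
    return X, "".join(BASES[t] for t in truth)


def pca_fit(X, n_components=3):
    """Return the data mean and the leading principal axes (as rows)."""
    mean = X.mean(axis=0)
    # Rows of vt from the SVD of the centered data are the principal axes.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]


def nearest(Z, centers):
    """Index of the closest center for every row of Z."""
    d = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)


def kmeans(Z, k=4, n_iter=100):
    """Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [Z[0]]
    for _ in range(k - 1):
        d = np.min([((Z - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Z[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = nearest(Z, centers)
        updated = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(updated, centers):
            break
        centers = updated
    return centers, nearest(Z, centers)


def train(X):
    """Build the training model: PCA axes, cluster centers, cluster letters."""
    mean, axes = pca_fit(X)
    centers, labels = kmeans((X - mean) @ axes.T)
    # Assumption: a cluster is named after its brightest raw channel.
    letters = [BASES[int(np.argmax(X[labels == j].mean(axis=0)))]
               for j in range(len(centers))]
    return mean, axes, centers, letters


def call_bases(X, model):
    """Assign each intensity vector to the nearest cluster and emit letters."""
    mean, axes, centers, letters = model
    labels = nearest((X - mean) @ axes.T, centers)
    return "".join(letters[j] for j in labels)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_train, _ = synthetic(400, rng)
    X_test, truth = synthetic(200, rng)
    called = call_bases(X_test, train(X_train))
    matches = sum(a == b for a, b in zip(called, truth))
    print(f"{matches} of {len(truth)} bases match the synthetic truth")
```

The sketch clusters without labels and only afterwards attaches a letter to each cluster, which mirrors the idea of generating a training sample from the data itself. Real sequencer signals would likely need a more careful cluster-to-letter mapping than the brightest-channel rule used here.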


Keywords: nucleic acid sequencing, mathematical processing and classification of multivariate data, principal component analysis, machine learning

Short URL: https://sciup.org/142236995

IDR: 142236995 | UDC: 543.07