AKPIK: Working Group on Physics, Modern Information Technology and Artificial Intelligence
AKPIK 1: AKPIK I: Data Science & Analytics
AKPIK 1.6: Talk
Tuesday, 16 March 2021, 17:15–17:30, AKPIKa
Classification of respiratory-related RNA virus sequences using Machine Learning — Louis Oberer, •Angel Diaz Carral, and Maria Fyta — Institute for Computational Physics, Universität Stuttgart, Allmandring 3, 70569 Stuttgart, Germany
We propose a simple and efficient machine-learning approach for analyzing and identifying respiratory-related virus sequences. The method is based on RNA sequence comparison and the open reading frame (ORF). Data from respiratory-related coronaviruses are collected, and features are extracted based on recurring nucleobase tuples in the RNA. These features are then used for classification. Well-separated clusters for the respiratory-related coronaviruses were found in the feature space. The relevant features are the natural nucleobase triplets used in protein biosynthesis. Accordingly, our methodology consists of counting nucleobase triplets, normalizing the counts by the sequence length, and applying PCA. The approach was also validated by including additional RNA sequences from the herpes virus family. We discuss the relevance of this scheme for identifying differences between similar viruses and its impact on bioanalysis.
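The feature extraction described in the abstract (count nucleobase triplets, normalize by sequence length) can be illustrated with a minimal sketch. This is not the authors' code. The function name `triplet_features`, the `step` parameter, the conversion of T to U, and the decision to ignore triplets containing ambiguous symbols are all assumptions made for illustration. The abstract does not state whether triplets are counted in overlapping windows or in a single reading frame, so both options are offered.

```python
from collections import Counter
from itertools import product

BASES = "ACGU"
# All 64 possible triplets in a fixed order, so that every sequence
# is mapped onto the same feature layout.
TRIPLETS = ["".join(t) for t in product(BASES, repeat=3)]


def triplet_features(sequence, step=1):
    """Return the 64 triplet counts divided by the sequence length.

    step=1 counts overlapping triplets (assumption);
    step=3 counts codons in one reading frame starting at position 0
    (also an assumption, the abstract does not specify the windowing).
    """
    # Assumption: accept DNA-style records (T) and treat them as RNA (U).
    seq = sequence.upper().replace("T", "U")
    if not seq:
        raise ValueError("empty sequence")
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, step))
    # Triplets containing other symbols (e.g. N) are not in TRIPLETS
    # and are therefore ignored.
    return [counts[t] / len(seq) for t in TRIPLETS]


if __name__ == "__main__":
    # Toy sequence for illustration only, not real viral data.
    features = triplet_features("AUGGCCAUGGCGUAA")
    print(len(features))
```

One such 64-component vector per sequence could then be stacked into a matrix and passed to any standard PCA routine, for example `sklearn.decomposition.PCA`, to look for clusters in the reduced feature space.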