Berlin 2008 – scientific programme
BP: Fachverband Biologische Physik
BP 25: Protein Structure and Folding
BP 25.9: Talk
Thursday, February 28, 2008, 16:30–16:45, PC 203
Accurate sequence alignment statistics for different protein models — •Stefan Wolfsheimer¹, Inke Herms², Sven Rahmann³, and Alexander K Hartmann¹ — ¹Institut für Physik, Universität Oldenburg, Germany — ²AG Genominformatik/COMET, Technische Fakultät, Universität Bielefeld, Germany — ³Fachbereich Informatik, TU Dortmund, Germany
Searching for homologous sequences and identifying proteins are well-studied problems in bioinformatics. For these purposes, a large sequence database is searched with a query sequence using sequence alignment algorithms, of which the Smith-Waterman algorithm is a well-known representative. The resulting alignment score is interpreted through a p-value, the probability of obtaining a score at least as high under a chosen null model.
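As a rough illustration of the kind of score being discussed, the following is a minimal sketch of Smith-Waterman local alignment scoring. The match, mismatch, and linear gap values are illustrative assumptions, not the scoring system used in this work, which would typically rely on a substitution matrix and possibly affine gap costs.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the optimal local alignment score of sequences a and b.

    The scoring parameters are hypothetical placeholders chosen for
    illustration only (simple match/mismatch, linear gap penalty).
    """
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                # start a new local alignment here
                prev[j - 1] + s,  # align a[i-1] with b[j-1]
                prev[j] + gap,    # gap in b
                curr[j - 1] + gap,  # gap in a
            )
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("XXACGTYY", "ZZACGTWW"))
```

Only the best score is tracked here, since the score, not the alignment itself, is the quantity whose distribution matters for the p-value.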
Exact results are known only for gapless alignment of infinitely long, uncorrelated model sequences, in which the amino acids are independent and identically distributed (i.i.d.). In this case the score is expected to follow a Gumbel distribution. Real proteins do not satisfy these restrictions: first, they are finite, and second, the i.i.d. assumption may not describe them well. We therefore study more complex models that incorporate information from secondary structure annotation to obtain a more plausible null model.
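For orientation, the Gumbel form for gapless local alignment is commonly written in the Karlin-Altschul parametrization, P(S ≥ s) = 1 − exp(−K·m·n·exp(−λ·s)), for sequence lengths m and n. The sketch below evaluates that expression. It is not taken from this abstract, and the values of λ and K in the usage line are hypothetical placeholders; in practice they depend on the scoring system and the letter frequencies.

```python
import math


def gumbel_p_value(score, m, n, lam, K):
    """P(S >= score) under a Gumbel null for sequence lengths m and n.

    lam and K are the Gumbel parameters; the caller must supply them
    (the values used below are illustrative assumptions only).
    """
    expected = K * m * n * math.exp(-lam * score)  # expected number of hits
    # 1 - exp(-x), computed with expm1 so tiny p-values do not round to 0
    return -math.expm1(-expected)


# Hypothetical parameters, for illustration only
print(gumbel_p_value(60, 300, 300, lam=0.3, K=0.1))
```

Using `math.expm1` matters in the high-scoring regime, where a direct `1 - math.exp(-x)` would lose all precision for very small p-values.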
Using generalized ensemble Monte Carlo simulations, we obtain the score distributions down to very small probabilities (p ∼ 10⁻¹⁰⁰). We find strong deviations from the expected form in the rare-event tail. Our results indicate that a Gumbel extrapolation overestimates p-values in the high-scoring regime.