Finding nearly optimal GDT scores.
Journal: 2011/August - Journal of Computational Biology
ISSN: 1557-8666
Abstract:
Global Distance Test (GDT) is one of the commonly accepted measures to assess the quality of predicted protein structures. Given a set of distance thresholds, GDT maximizes the percentage of superimposed (or matched) residue pairs under each threshold, and reports the average of these percentages as the final score. The computation of GDT score was conjectured to be NP-hard. All available methods are heuristic and do not guarantee the optimality of scores. These heuristic strategies usually result in underestimated GDT scores. Contrary to the conjecture, the problem can be solved exactly in polynomial time, albeit the method would be too slow for practical usage. In this paper we propose an efficient tool called OptGDT to obtain GDT scores with theoretically guaranteed accuracies. Denote ℓ as the number of matched residue pairs found by OptGDT for a given threshold d. Let ℓ' be the optimal number of matched residues pairs for threshold d/(1 + ε), where ε is a parameter in our computation. OptGDT guarantees that ℓ ≥ ℓ'. We applied our tool to CASP8 (The eighth Critical Assessment of Structure Prediction Techniques) data. For 87.3% of the predicted models, better GDT scores are obtained when OptGDT is used. In some cases, the number of matched residue pairs were improved by at least 10%. The tool runs in time O(n³) log n/ε⁵) for a given threshold d and parameter ε. In the case of globular proteins, the tool can be improved to a randomized algorithm of O(n log² n) runtime with probability at least 1 - O(1/n). Released under the GPL license and downloadable from http://bioinformatics.uwaterloo.ca/∼scli/OptGDT/ .
Relations:
Content
Citations
(3)
References
(5)
Chemicals
(1)
Processes
(3)
Affiliates
(1)
Similar articles
Articles by the same authors
Discussion board
J Comput Biol 18(5): 693-704

Finding Nearly Optimal GDT Scores

David R. Cheriton School of Computer Science, University of Waterloo Waterloo, Ontario, Canada.
Toyota Technological Institute at Chicago, Chicago, Illinois.
Corresponding author.
Address correspondence to: Dr. Ming Li, David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, ON, Canada N2L 3G1. E-mail:ac.oolretawu.sc@ilm
Address correspondence to: Dr. Ming Li, David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, ON, Canada N2L 3G1. E-mail:ac.oolretawu.sc@ilm

Abstract

Global Distance Test (GDT) is one of the commonly accepted measures to assess the quality of predicted protein structures. Given a set of distance thresholds, GDT maximizes the percentage of superimposed (or matched) residue pairs under each threshold, and reports the average of these percentages as the final score. The computation of GDT score was conjectured to be NP-hard. All available methods are heuristic and do not guarantee the optimality of scores. These heuristic strategies usually result in underestimated GDT scores. Contrary to the conjecture, the problem can be solved exactly in polynomial time, albeit the method would be too slow for practical usage. In this paper we propose an efficient tool called OptGDT to obtain GDT scores with theoretically guaranteed accuracies. Denote ℓ as the number of matched residue pairs found by OptGDT for a given threshold d. Let ℓ′ be the optimal number of matched residues pairs for threshold d/(1 + ε), where ε is a parameter in our computation. OptGDT guarantees that ℓ ≥ ℓ′. We applied our tool to CASP8 (The eighth Critical Assessment of Structure Prediction Techniques) data. For 87.3% of the predicted models, better GDT scores are obtained when OptGDT is used. In some cases, the number of matched residue pairs were improved by at least 10%. The tool runs in time O(n log n/ε) for a given threshold d and parameter ε. In the case of globular proteins, the tool can be improved to a randomized algorithm of O(n log n) runtime with probability at least 1 − O(1/n). Released under the GPL license and downloadable from http://bioinformatics.uwaterloo.ca/∼scli/OptGDT/.

Key words: algorithms, alignment, computational molecular biology, linear programming, protein folding
Abstract
Collaboration tool especially designed for Life Science professionals.Drag-and-drop any entity to your messages.