Long Term Accessible data for preprint: DOI: 10.22541/au.160317361.15075213/v1
The prediction of amino acid contacts from protein sequence is an important problem, as protein contacts are a vital step towards the prediction of folded protein structures. We propose that a powerful concept from deep learning, called ensembling, can increase the accuracy of protein contact predictions by combining the outputs of different neural network models. We show that ensembling the predictions made by different groups at the recent Critical Assessment of Protein Structure Prediction (CASP13) outperforms all individual groups. Further, we show that contacts derived from the distance predictions of three additional deep neural networks – AlphaFold, trRosetta, and ProSPr – can be substantially improved by ensembling all three networks. We also show that ensembling these recent deep neural networks with the best CASP13 group creates a superior contact prediction tool. Finally, we demonstrate that two ensembled networks can successfully differentiate between the folds of two highly homologous sequences. In order to build further on these findings, we propose the creation of a better protein contact benchmark set and additional open-source contact prediction methods.