Date:
2010-10-15 22:00
Priority:
2
State:
Open
Submitted by:
Mike Wong (mikewong899)
Assigned to:
Gurgen Tumanyan (tumanian)
Summary:
Featurize should read alternative PDB structural models

Detailed description
Protein Data Bank (PDB) files often contain more than one structural model. Each of these alternate configurations shows the atoms in slightly different locations. Occasionally, one of the alternate configurations has extra atoms (e.g., hydrogen atoms).

Currently, FEATURE reads only the first structural model in a PDB file and ignores the rest. However, some machine learning methods may want to include the other structural models as training data. After all, they are valid evidence gathered through accepted experimental methods.

Therefore, the following alterations should be made to support this feature request.

1. Point files should accept a model accession number in the first column:


1kft.21
2ys1.9
1lhe.1

The grammar should be <PDB ID>.<MODEL ACCESSION NUMBER>

Note that 1lhe has only one model, so its PDB file contains no MODEL or ENDMDL records. In that case, model accession number 1 refers to the single, unmarked model.

2. Protein.cc will have to be modified to skip to the requested model. If that model does not exist, it should raise a fatal error and tell the user which model accession numbers are available.

3. featurize.cc will have to be modified to take a new model accession parameter as a command-line option.
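
As a rough illustration of items 1 and 2, the sketch below parses a `<PDB ID>.<MODEL ACCESSION NUMBER>` token and then picks the matching model out of PDB text. It is not taken from the FEATURE source: `ModelRef`, `parseModelRef`, and `selectModel` are hypothetical names, the fatal error is modeled as a C++ exception, and a file without MODEL records is assumed to count as model 1.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical holder for a parsed "<PDB ID>.<MODEL ACCESSION NUMBER>" token.
struct ModelRef {
    std::string pdbId;
    int model;
};

// Split e.g. "1kft.21" into {"1kft", 21}. Throws on malformed input.
ModelRef parseModelRef(const std::string& token) {
    const std::size_t dot = token.rfind('.');
    if (dot == std::string::npos || dot == 0 || dot + 1 == token.size())
        throw std::invalid_argument("expected <PDB ID>.<MODEL>: " + token);
    const std::string digits = token.substr(dot + 1);
    if (digits.find_first_not_of("0123456789") != std::string::npos)
        throw std::invalid_argument("model number must be numeric: " + token);
    return ModelRef{token.substr(0, dot), std::stoi(digits)};
}

// Collect the ATOM/HETATM lines that belong to the requested model.
// Assumption: a file with no MODEL records holds exactly one model, numbered 1.
std::vector<std::string> selectModel(std::istream& pdb, int wanted) {
    std::vector<std::string> selected;
    std::vector<int> available;
    int current = 1;
    std::string line;
    while (std::getline(pdb, line)) {
        if (line.compare(0, 5, "MODEL") == 0) {
            // The model serial number follows the record name.
            current = std::stoi(line.substr(5));
            available.push_back(current);
        } else if (current == wanted &&
                   (line.compare(0, 4, "ATOM") == 0 ||
                    line.compare(0, 6, "HETATM") == 0)) {
            selected.push_back(line);
        }
    }
    if (available.empty()) available.push_back(1);  // single, unmarked model

    bool found = false;
    for (int m : available) found = found || (m == wanted);
    if (!found) {
        // Report which model accession numbers the file does contain.
        std::ostringstream msg;
        msg << "model " << wanted << " not found; available models:";
        for (int m : available) msg << ' ' << m;
        throw std::runtime_error(msg.str());
    }
    return selected;
}
```

With this sketch, `parseModelRef("1kft.21")` yields the ID `1kft` and model `21`, and `selectModel` called with model 1 on a file such as 1lhe (no MODEL records) returns all of its atom lines. Requesting a model that is absent raises an error whose message lists the models that were found.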
