protein from PDB

Timothy Lezon · Post by **Timothy Lezon** » Fri Mar 23, 2012 12:15 pm

Hi,

I am trying to get a handle on what happens when a PDB file is read using 'protein'. In ExampleLoadPDB2.cpp, immediately after
State state = system.getDefaultState();
I added:

protein.writePdb(state,std::cout);
return 0;

I thought that this should print the input structure, modulo rigid transformations and hydrogen atoms, but it produces a structure with alpha carbon RMSD of 0.6 from the input PDB file. What am I missing? Is there some way to read a protein from PDB without distorting the structure?

Thanks,
Tim

Samuel Flores · Post by **Samuel Flores** » Fri Mar 23, 2012 4:29 pm

This is actually not trivial. There are a variety of ways to turn cartesian coordinates read using PDBStructure into internal coordinates in a Biopolymer. The simplest way is to use several "matchDefault .." type methods. But, error accumulates over a sufficiently long chain, if you're not using SimTK's special double-precision REMARK-SIMTK-COORDS records in your PDB file. I'm nonetheless appending the code to do that here.

What is it you're trying to do? I've spent quite a bit of time solving these problems in MMB. If you explain your problem, I might be able to tell you how to do it using MMB without writing code. Or, you could look in MMB (BiopolymerClass::matchCoordinates method, in BiopolymerClass.cpp) to see how I did it the more reliable way.

Biopolymer tempMolecule = RNA ( "UACGUAAGGA");
tempMolecule.setPdbChainId(*("A"));
ifstream inputFile("/Users/Sam/svn/RNAToolbox/trunk/src/in.pdb");

PdbStructure pdbStructure(inputFile);
inputFile.close();

bool guessCoords = true ;
Compound::AtomTargetLocations biopolymerAtomTargets = tempMolecule.createAtomTargets(pdbStructure,guessCoords);
bool matchHydrogenAtomLocations = false;
if (! matchHydrogenAtomLocations)
{
map<Compound::AtomIndex, Vec3>::iterator it;
map<Compound::AtomIndex, Vec3>::iterator next;
next = biopolymerAtomTargets.begin();
while (next != biopolymerAtomTargets.end())
{
it = next;
Compound::AtomIndex m = (*it).first;
Element myAtomElement = tempMolecule.getAtomElement(m);
next++;
//cout<<__FILE__<<":"<<__LINE__<<(myAtomElement.getName())<<endl;
if ((myAtomElement.getName()).compare("hydrogen") == 0)
{
biopolymerAtomTargets.erase(it);
}
}
}

cout<<__FILE__<<":"<<__LINE__<<" "<<biopolymerAtomTargets.size()<<endl;

//tempMolecule.matchDefaultConfiguration(biopolymerAtomTargets,Compound::Match_Exact);

guessCoords = false ;
biopolymerAtomTargets = tempMolecule.createAtomTargets(pdbStructure,guessCoords);

tempMolecule.writeDefaultPdb("test.1.pdb",Vec3(0));
tempMolecule.matchDefaultAtomChirality(biopolymerAtomTargets, 0.01, false);

tempMolecule.writeDefaultPdb("test.2.pdb",Vec3(0));

tempMolecule.matchDefaultBondLengths(biopolymerAtomTargets);
tempMolecule.writeDefaultPdb("test.3.pdb",Vec3(0));
tempMolecule.matchDefaultBondAngles(biopolymerAtomTargets);
tempMolecule.writeDefaultPdb("test.3.pdb",Vec3(0));

// Set dihedral angles even when bonded atoms are planar
tempMolecule.matchDefaultDihedralAngles(biopolymerAtomTargets, Compound::DistortPlanarBonds);
tempMolecule.writeDefaultPdb("test.4.pdb",Vec3(0));

tempMolecule.matchDefaultTopLevelTransform(biopolymerAtomTargets);
tempMolecule.writeDefaultPdb("test.5.pdb",Vec3(0));

tempMolecule.fitDefaultConfiguration(biopolymerAtomTargets, 0.005);
tempMolecule.writeDefaultPdb("test.6.pdb",Vec3(0));

cout<<__FILE__<<":"<<__LINE__<<" "<<biopolymerAtomTargets.size()<<endl;

Michael Sherman · Post by **Michael Sherman** » Fri Mar 23, 2012 4:31 pm

Hi, Tim. I think Sam Flores can give some practical advice. For what it's worth, the big picture is that Molmodel represents the molecule with far fewer coordinates than the original model, which has (x,y,z) for every atom. Instead groups of atoms are placed at fixed locations on bodies, and then move together as a result of rotations around torsion bonds. You have two choices about the locations at which the atoms are fixed: either (1) you take a generic internal coordinate molecule model and try to "best fit" it to the atom locations you see in the pdb file by modifying only the torsion angles, or (2) you construct the bodies by gluing down the atoms at the locations you find them in the pdb file.

We call the first option "match default" and the second "match exact". Match exact allows you to make a model that will exactly match the pdb file, but it can only be exact in that one configuration (because atoms have more degrees of freedom than do torsions). Match default is a more reasonable approach if you expect to make large conformational changes, because the atoms are placed at a kind of "average" location rather than at the locations they were found in an arbitrary conformation.

Regards,
Sherm

Timothy Lezon · Post by **Timothy Lezon** » Fri Apr 13, 2012 11:29 am

Thanks for the quick replies, and apologies for my own delay.

What I'm looking to do is simulate a large protein using various levels of resolution in the structure and energy function. The protein is a homomultimer, each subunit of which contains two domains joined by a flexible linker. As a starting point, I am working with a single subunit, keeping the domains rigid and allowing motion only along the linker backbone. I expect that I will have more questions as I progress.

The matchDefaultConfiguration seems to take care of the problem that I was having, giving me <0.7 Angstrom RMSD for a 420 residue protein. Another problem is that the PDB structure is missing residues 23 and 24, and the gap was causing problems with fitting the structure. I've deleted the first 22 residues and used renumberPdbResidues to get a proper match for now...I'll probably model the missing residues and eventually insert them into the PDB file, unless there's straightforward way to handle missing residues.

I was also able to resolve the issue with MMB, which I had previously not used.

Thanks again,
Tim

Samuel Flores · Post by **Samuel Flores** » Fri Apr 13, 2012 1:17 pm

Hi Timothy,

MMB will generate residues which span the gap you speak of. Just call :

matchGapped

somewhere in your command file.

Then when you specify the sequence of your protein, include the residues which are missing from the input structure file. MMB will then match the full coordinates of any residues it finds in the input structure file, and guess internal coordinates for the missing residues. I believe you will find this documented in the Reference Guide.

Sam

Samuel Flores · Post by **Samuel Flores** » Fri Apr 13, 2012 4:51 pm

I would suggest you write a command file according to your best understanding and if it doesn't work immediately, send it to me along with the input PDB file and I will edit it for you.

I think I may have already done what you are trying to do WRT force fields, as well. I have a feature called "Physics where you want it" which lets you turn on the atomistic (e.g. Amber99) force field in specified regions, which can also have a different flexibility from the rest of the protein.

I will be giving an MMB workshop at MIT on April 30th, if you care to attend. I will also be pleased to have you or your student visit Uppsala for a few days to work on your problem.

Homomultimers can in principle be treated by having all the internal degrees of freedom of all subunits coupled. That's something I've talked to Michael Sherman about and could be the basis of a collaboration, should you be interested. It requires writing a little code. If you have a student interested in working on the guts of MMB I will be glad to have them come for a longer training visit.

Remember, the coordinate matching is done on the basis of residue numbers, chain IDs, and atom names. So residues in the MMB command file will be matched to corresponding residues in the input PDB. The main thing is to make sure these match up, except of course for the ones that are missing from the input structure file but not the command file.

Sam

Timothy Lezon · Post by **Timothy Lezon** » Mon Apr 16, 2012 2:30 pm

Thanks again. I agree that "Physics where you want it" will be helpful here.

I'm having a problem using matchGapped in MMB. The last lines of the output are:

/home/flores/svn/RNAToolbox/trunk/src/BiopolymerClass.cpp:372 About to optimize chain A using ObservedPointFitter
/usr/local/Installer.2_8.CentOS64/MMB.2_8.dynamic.CentOS64: symbol lookup error: /usr/local/Installer.2_8.CentOS64/MMB.2_8.dynamic.CentOS64: undefined symbol: _ZN5SimTK8Compound23fitDefaultConfigurationERKSt3mapINS0_9AtomIndexENS_3VecILi3EdLi1EEESt4lessIS2_ESaISt4pairIKS2_S4_EEEdbd

Here is my commands.dat:

#-------------------------------------------------------------------------------------------------
protein A 7 TRFTEEYQLFEELGKGAFSVVRRCVKVLAGQEYAAMIINTKKLSARDHQKLEREARICRLLKHPNIVRLHDSISEEGHHYLIFDLVTGGELFEDIVAREYYSEADASHCIQQILEAVLHCHQMGVVHRNLKPENLLLASKLKGAAVKLADFGLAIEVEGEQQAWFGFAGTPGYLSPEVLRKDPYGKPVDLWACGVILYILLVGYPPFWDEDQHRLYQQIKAGAYDFPSPEWDTVTPEAKDLINKMLTINPSKRITAAEALKHPWISHRSTVASCMHRQETVDCLKKFNARRKLKGAILTVMLATRNFSVRKQEIIKVTEQLIEAISNGDFESYTKMCDPGMTAFEPEALGNLVEGLDFHRFYFENLWSRNSKPVHTTILNPHIHLMGDESACIAYIRITQYLDAGGIPRTAQSEETRVWHRRDGKWQIVHFHRSGAPS

mobilizer Rigid A 7 444

#randomizeInitialVelocities true
matchGapped

# Run Parameters
numReportingIntervals 1
reportingInterval 1.0
firstStage 4
lastStage 4
temperature 100.0
#-------------------------------------------------------------------------------------------------

I am uploading last.3.pdb.

I'm using the 64bit CentOS build on 64bit Ubuntu. Everything else seems to work fine, so I don't suspect that this is a compatibility issue.

Tim

protein from PDB

protein from PDB

Re: protein from PDB

Re: protein from PDB

Re: protein from PDB

Re: protein from PDB

Re: protein from PDB

Re: protein from PDB