bonded forces with variable numbers of particles

Justin MacCallum · Post by **Justin MacCallum** » Tue Feb 26, 2013 6:52 pm

I am interested in adding r^-6 weighted distance restraints. In these calculations, you have two groups of atoms, I and J, and you compute an energy based on the r^-6 average of all of the pairwise distances between the I's and J's:

Code: Select all

E = 0.5 * k * (r_eff - r_0)^2

r_eff = ( sum_ij(r_ij^-6) / N ) ^ (-1/6)

In our calculations, we can sometimes have thousands of these types of restraints.

I'm not sure the best way to implement this type of interaction because the number of I's and J's will change. For any particular number of I's and J's, I can easily write a CustomCompoundBondForce to calculate the energy above. I can also easily write a python class to dynamically generate code for any combination of I's and J's. However, I am a bit concerned that this could have performance implications. This approach could easily generate hundreds of custom kernels (for different combinations of I and J) and I'm worried that having many kernels, some of which are only sparsely "populated" could have performance implications.

The other approach would be to write a custom force plugin. This is obviously more work and much lower level. I barely remember how to program in C++ anymore (python spoils me) and I'm not yet very knowledgeable with cuda.

Have you thought about implementing these types of forces? Do you have thoughts on any of the pros and cons listed above?

Peter Eastman · Post by **Peter Eastman** » Wed Feb 27, 2013 11:04 am

Hi Justin,

Could you explain more about what you're doing? What are you using this potential to model? When you say, "the number of I's and J's will change," do you mean during a single simulation, or between different simulations? Also, am I correct in understanding that the constraints are not based on the distances between individual atoms, but rather on the average of a bunch of different pairwise distances? What is the typical size of an "I" or "J" group, and how many of those groups would typically exist at once?

Peter

Justin MacCallum · Post by **Justin MacCallum** » Wed Feb 27, 2013 12:39 pm

Could you explain more about what you're doing? What are you using this potential to model? When you say, "the number of I's and J's will change," do you mean during a single simulation, or between different simulations? Also, am I correct in understanding that the constraints are not based on the distances between individual atoms, but rather on the average of a bunch of different pairwise distances? What is the typical size of an "I" or "J" group, and how many of those groups would typically exist at once?

Hi Peter,

Sorry for not being more clear. The potential is to model ambiguous NOE restraints from NMR. Typically you would specify two groups of atoms and the interaction would be based on an r^-6 average of all pairs of atoms between the two groups (the I's and J's). A particular restraint will always involve the same number of atoms during the course of the simulation.

We will sometimes have several thousand of these restraints in a single system (typically 1 per NOE peak from NMR). The values for #I and #J are varied. One NOE might only be explainable by a few pairs of atoms (#I and #J are both small), while others might be very ambiguous and the signal could be due to as many as several hundred atom pairs (#I and #J might be as large as 20).

For any particular value of #I and #J, I can easily write a CustomCompoundBond to compute the interaction. Just to be concrete, say (#I = 2, #J = 1) or (#I = 2, #J = 2). The code would look like:

Code: Select all

ambig_rest_2_1 = CustomCompoundBondForce(3,  '0.5*kbond*( (distance(p1, p3)^-6.0 + distance(p2, p3)^-6.0) / 2.0) ) ^ (-1.0/6.0)^2.0' )

ambig_rest_2_2 = CustomCompoundBondForce(4,  '0.5*kbond*( (distance(p1, p3)^-6.0 + distance(p1, p4)^-6.0 + distance(p2,p3)^-6.0 + distance(p2,p4)^-6.0) / 4.0) ) ^ (-1.0/6.0)^2.0' )

I could write similar code any arbitrary value of #I and #J. Obviously, this would get very tedious. But, I can easily write a python class that can generate a CustomCompoundBondForce for any particular #I and #J (meta-meta programming?). What I am worried about is that I will potentially end up with many different kernels, one for each combination of #I and #J. Some of these kernels will have many interactions to compute (which is good) because there are many NOEs from experiment that have a particular degree of ambiguity #I, #J. Other kernels might have a small number of interactions to compute (I assume this is bad) because that degree of ambiguity is rare in the dataset. I'm worried that the combination of many different kernels with varying degrees of occupancy might cause performance issues.

Here's another way to put it. I have a general interaction that I can write as a double sum over two sets of atoms (I and J). What I am proposing is to write an unrolled version of this loop for each particular combination of the sizes of the two sets (#I and #J). I can implement this unrolled loop using a CustomCompoundBondForce. To someone with little CUDA experience, but tons of python experience, this seems very easy and straightforward to do. But, I am worried about the performance of this set of unrolled kernels versus writing (from scratch) a custom kernel that actually loops over all interactions.

Peter Eastman · Post by **Peter Eastman** » Wed Feb 27, 2013 1:14 pm

Ok, I think I understand.

First, your idea of writing Python code to generate the expression sounds good and should work well.

Having lots of different custom forces is not as terrible as you might think because it doesn't actually map each one to a separate kernel, just to a separate loop within a single kernel. Still, it might be better to combine them into a single force (or a smaller number of forces), since GPUs are only efficient if they can execute identical code for lots of threads at once.

Suppose you wanted a single force for all of them. So you would set the number of atoms based on the maximum I and J you need for any group. If a particular interaction has fewer than that, just pad it with references to atom 0.

Now you need to prevent those padding atoms from actually affecting the potential. To do that, you can define two per-bond parameters called "numI" and "numJ" that store the actual numbers of I and J atoms in a particular bond. When you compute r_eff, it now becomes a weighted sum, where the weight is 1 for real atoms and 0 for padding atoms. For example, interactions involving the third "I" atom would be multiplied by step(numI-3): that will be 0 if numI<3, 1 if numI>=3.

If most pairs have small I and J but a few are larger, it might be best to split them into a few groups with a separate force for each one. That way you don't have to add up 400 distances for every single restraint when most of them have less than 10 real atom pairs.

Regarding the difference between using a custom force and a plugin... well, I'm a little scared to think what sort of expressions it's going to be generating when you've written a force involving 400 separate terms, and then it has to differentiate it with respect to the positions of each of 40 atoms! Handwritten code that just involves a couple of loops will be much much simpler. Still, it might end up working fine. So give it a try, and then we'll see whether a plugin is really required or not.

Peter

Justin MacCallum · Post by **Justin MacCallum** » Wed Feb 27, 2013 3:56 pm

Peter,

Thanks for the input. There are many helpful ideas there. I will probably try my approach, just to see what happens. Possibly using your padding approach to reduce the number of unique forces.

Ultimately, writing a plugin is probably the most sensible way forward. I just don't want to

. I haven't had to mess with low-level coding in a long time. Allocating memory and dealing with pointers seems so tedious at this point. But, I think it probably makes sense to bite the bullet.

For the future, one thing that might be helpful is to provide a simple skeleton plugin. I can obviously look at the existing code and the developer manual, but it would be easier to get a handle on a simple minimal-but-functional example force plugin.

Justin MacCallum · Post by **Justin MacCallum** » Thu Feb 28, 2013 10:14 am

OK, the CustomCompoundBondForce route is not going to work.

I wrote a simple test with #I = #J = 20 and then added a single restraint of this type to the system. Setting up the system takes ~10-15 minutes, which is presumably due to evaluating the complicated expression and taking derivatives of 40 different variables over hundreds of terms. After that, it doubles the simulation time, as you would expect because there is a single thread computing this very complicated interaction.

I will push forward with writing this as a plugin.

Peter Eastman · Post by **Peter Eastman** » Thu Feb 28, 2013 6:16 pm

Oh well. I was afraid that might happen.

The good news is that it should be a pretty simple plugin. OpenCLBondedUtilities/CudaBondedUtilities is designed for exactly this sort of case, so it can handle most of the work for you. Take a look at the developer guide, and ask as many questions as you want. And yes, it would be helpful if we had a minimal example plugin to start from. AMOEBA and RPMD can be used as examples, but they're more complicated. Perhaps we can create that soon.

Peter

Justin MacCallum · Post by **Justin MacCallum** » Thu Feb 28, 2013 8:03 pm

Code: Select all

Take a look at the developer guide, and ask as many questions as you want. And yes, it would be helpful if we had a minimal example plugin to start from. AMOEBA and RPMD can be used as examples, but they're more complicated. Perhaps we can create that soon.

Peter, thanks again for your help!

I've spent most of my day turning the amoeba plugin into a minimal plugin. I've stripped out everything except the bonded force and renamed everything, etc. It was a bit tedious, but I can get it compiled, installed, and the one remaining test for the bonded force passes just fine. It's almost a minimal example, but it still must be in the main plugins directory and depends on the main CMakeLists.txt, etc. It would be nice to have it completely standalone, but this works for now.

The next question is: How do I get the python wrappers generated for this? I don't understand exactly what's going on with swig/etc during the build process. Where do I need to tap into in order to get wrappers built for my plugin?

Peter Eastman · Post by **Peter Eastman** » Fri Mar 01, 2013 1:08 pm

Hi Justin,

The code for generating the python wrappers is contained in wrappers/python. The easiest thing to do is search for all appearances of "amoeba" in that directory (ignoring the simtk subdirectory, since that contains source code and data files), then make corresponding changes in the same places. This will involve one line each in CMakeLists.txt, setup.py, OpenMM.i, and Doxyfile.in. You may need more changes in swigInputConfig.py. The comments in that file describe what each section is for.

Peter

Robert McGibbon · Post by **Robert McGibbon** » Mon Apr 08, 2013 12:59 pm

Just to throw this out there, I agree w/ Justin that it would be really nice if it were possible for all of the plugin code, including stuff for the python wrappers and CMAKE stuff, to be containing within a single directory. I'm not sure how feasible this is for a CMake though.

-Robert

bonded forces with variable numbers of particles

bonded forces with variable numbers of particles

Re: bonded forces with variable numbers of particles

Re: bonded forces with variable numbers of particles

Re: bonded forces with variable numbers of particles

Re: bonded forces with variable numbers of particles

Re: bonded forces with variable numbers of particles

Re: bonded forces with variable numbers of particles

Re: bonded forces with variable numbers of particles

Re: bonded forces with variable numbers of particles

Re: bonded forces with variable numbers of particles