Page 1 of 2

Question about qdata.txt file and force matching

Posted: Fri Dec 14, 2012 12:34 pm
by nluque
Dear Lee Ping,

I got another error.

For the first try I place 2 snapshot in all.gro and qdata.txt
I might say that I am using nwchem in order to get the energies and gradients.

The forcebalance stop in the Newton-Raphson

^M^Mfdwrap: callM [45] = 1.0e-03 ^M^Mfdwrap: callM [46] = -1.0e-03
^M^Mfdwrap: callM [46] = 1.0e-03 ^M^MIncrementing quantities for snapshot 0^MDone with snapshots, bu
ilding objective function now^MTarget: first Errors: Energy = 25185.1341 kJ/mol (10292.3916%) Atomic Force (%) = nan Objective = nan

######################################################################
###ESC[94m Objective Function Breakdown ESC[0m###
###ESC[94m Target Name Residual x Weight = Contribution ESC[0m###
######################################################################
first nan 1.000 ESC[94m nanESC[0m
Regularization 0.00000 1.000 ESC[94m 0.00000e+00ESC[0m
Total ESC[94m nanESC[0m
----------------------------------------------------------------------
Step |k| |dk| |grad| -=X2=- Stdev(X2) StepQual
0 0.000e+00 0.000e+00 nanESC[1m nanESC[0m nan 1.000


Could it be that only 2 snapshot are not sufficient for the fitting ?
Thanks a lot for your help,
Best regards,
Noelia

Re: nan error

Posted: Fri Dec 14, 2012 2:46 pm
by leeping
Hi Noelia,

From looking at your output file, I'm seeing the following line:

Target: first Errors: Energy = 25185.1341 kJ/mol (10292.3916%) Atomic Force (%) = nan Objective = nan

This is the line that is printed out when it computes the MM energies and compares to QM.

The reason ForceBalance is crashing is because the error in the force is nan. Are you planning on fitting the forces in this project or just the energies? If you're planning on fitting the energies only, did you put "zero" forces into qdata.txt? If you paste your qdata.txt file into a reply, I would be able to learn more.

The second thing I'm noticing is that your MM energies are very different from the QM - it says the energy error is 25,000 kJ/mol. Usually an error of >100% is not good news because that means the force field is doing something qualitatively wrong. Perhaps you have two atoms that have a Lennard-Jones repulsive term that should be excluded? (This should not be related to the crash but might affect your result.)

Thanks,

- Lee-Ping

Question about qdata.txt file and force matching

Posted: Mon Dec 17, 2012 10:59 am
by nluque
Dear Lee-Ping,

thanks a lot for your help !!
I have solved part of the problems, but some of them are still open.
In particular, this particular error had to do with the fact that by using nwchem and qchem, the molecule was re-oriented to a standard orientation and the all.gro and qdata.txt were not matching. After fixing this, the ForceBalance calculation run smoothly but I have the impression that somehow the format of the qdata.txt is faulty. For some reasons, I have the impression that the forces are not read well or not read at all. Is there any particular format I should use to write them ?
I am using my own script to get these values from qchem and nwchem output.

For simplicity, I send you the .itp, the qdata.txt, all.gro and .in .
Probably in this way would be easier to optimize the force field !

Thanks again !

Question about qdata.txt file and force matching

Posted: Mon Dec 17, 2012 11:00 am
by nluque
I add the all.gro file here

Re: nan error

Posted: Tue Dec 18, 2012 12:40 am
by leeping
Hi Noelia,

You are correct that the QM code should not reorient the molecule because that scrambles the forces. I should have mentioned that in the documentation! I will make sure to write a chapter on generating QM data over the holiday.

Your script for writing qdata.txt appears to be correct. A few notes here for posterity:

- The coordinates are never used; they are just for reference.
- Only relative energies are being fitted - that is to say, the absolute QM energy doesn't matter, only the energy differences between snapshots.
- The "forces" in qdata.txt are actually gradients. Thus you should be printing out the gradients from the QM code rather than multiplying by -1 to get the forces. I think you got the sign correct though. The unit to use for the force is Hartree/bohr (this is the Q-Chem native unit).

There was a mistake in all.gro. The coordinates in all.gro should be in nanometers rather than in angstroms, and so your molecule was 10 times too big. I used the "molecule" module in forcebalance to resize it using the Python prompt like this:

In [1]: from forcebalance.molecule import *
In [2]: M = Molecule('all.gro')
In [3]: for i in M.xyzs: i /= 10
In [4]: M.write('all1.gro')

Then replace the original all.gro with the new all1.gro.

If you're familiar with Python, try playing with "molecule.py" - it's a file format conversion library within ForceBalance that I wrote. It's not as fully featured as OpenBabel or OpenEye but for my projects it's easier to use.

I noticed the following things in ti_5h2o.itp :

1) Your force constants for the angles are very big - some of them are too big by a factor of 100. Where did you get them from? If I change the starting values to numbers on the order of 100, things are more reasonable.

2) You specified parameters in the "bondtypes" and "angletypes" sections, but also in the "bonds" and "angles" sections. The parameters in the latter sections will override the former ones, so I deleted the parameter values from the "bonds" and "angles" section.

3) In the future it would be best to add more snapshots, so that you have more snapshots than parameters.

After I performed the fit here's what I found (see end of message for attached calculation and images - QM forces are in blue and MM forces are in gold):

It looks like your third snapshot is highly compressed and that is contributing a lot to the objective function. Since the forces in this snapshot are so much bigger than in the other two, ForceBalance will mainly try to fit this one (it is a least squares optimization). The forces in the second snapshot are not well reproduced. Also, some of the optimized parameter values are negative, so we are probably overfitting.

My suggestion at this point is:

1) Generate more data and make sure the energies and forces are roughly at the same order of magnitude. That is to say, avoid having one snapshot where the energy is much higher than the rest, or where the forces are many times bigger than the rest.

2) Examine your functional form carefully. Your system is a transition metal complex and it's nontrivial to design force fields for this type of system. What will the ion be doing in your system? Hopefully it won't dissociate because the force field won't accommodate any chemistry. :)

There is some literature precedent for transition metal force fields but not very much - I'd be interested to see what you find. I also have a friend who has had some success designing metal complex functional forms and I could put the two of you in touch.
noelia1.tar.gz
(236.39 KiB) Downloaded 78 times
Image
Image
Image

Note:

These images can be generated by running a command like this: vmd -e drawforces.vmd -args all.gro QMforce.xyz MMforce.xyz

where drawforces.vmd is in the ForceBalance source distribution (under bin/Extras/drawforces), and [QM]Mforce.xyz are found in the "temp" directory where the optimization is being oerformed.

Hope this helps.

- Lee-Ping

Re: Question about qdata.txt file and force matching

Posted: Wed Dec 19, 2012 4:50 pm
by nluque
Dear Lee-Ping,

sorry for the delay in answering, but I wanted first to carry on the calculations you have suggested.
First of all, let me thank you for your help. It is very appreciated and your prompt replies let me progress faster than I have thought. I am also glad that you could use my "errors" to update the documentation. I will try to make a post later to suggest something that could improve the user experience.
Please, also consider that I am totally new to molecular mechanics, so I am learning ForceBalance and Gromacs at the same time and the probability to make mistakes is higher than expected.

I will answer as best as I can to your opinions/suggestions:

1. Yes, I have used the gradient and not the forces, therefore I havent multiplied them by -1 !
2. I did not realize that the all.gro was in nanometers. I actually used the molecule.py to correct those and was very easy fix .
3. The angle force constants are actually big because I was using a small program I have written to make an estimate of the constants based on Badger's rules. The same rules that have been used in the UFF program. Unfortunately, in the unit conversion I implemented there was a factor 100 that should not be there and therefore I made the mistake.
4. The system I would like to simulate is Ti4+ in water. We would like to try several options, like the bare titanium in water, but also a "first shell" of water with Ti as a single residue and the rest of the solvent around it. We would like to see how a different partitioning affect the properties of the system.

In the last couple of days what I did was to make a short MD simulation (btw is there any requirement about the type of simulations that are more suitable to fit the data? periodic boundary conditons, NVT, NPT or NVE ensambles ?) with the initial force field and I have extracted 100 snapshots. I used the gradients and energies to generate a new force-field running ForceBalance. I have noticed that some of these snapshots have an energy much higher than the reference energy. Consequently, some of the forces are much higher than the reference and may lead to a wrong fit.
My first question is : is there a easy way to check whether a snapshot could be discarded ?
For example, I have thought that some of the snapshots computed with QM could be discarded based on an energy threshold, and select only those configurations that have energies that are somewhat closer to the reference. Or would it be better to make a threshold selection based on atomic gradient ? Is this a crazy idea ?

Another question that I raise is if there is a minimum number of snapshots from an MD simulaton (100-1000 ??) that should be run to obtain a good fit.

Thanks again for your help !

Noelia

Re: Question about qdata.txt file and force matching

Posted: Wed Dec 19, 2012 8:39 pm
by leeping
Hi Noelia,

I'm glad these responses have been helpful for you. I started in research much the same way, with zero experience in quantum chemistry or molecular dynamics. Now I'm new to the experience of making my software formally available to the community. Any feedback you have would be very helpful. :)

Responses to questions:
Is there any requirement about the type of simulations that are more suitable to fit the data? periodic boundary conditons, NVT, NPT or NVE ensambles ?
The choice of sampling ensemble has an impact on the final parameters. I suggest that you perform the sampling simulation using the same ensemble as your eventual "production" simulations. Constant temperature is probably best because it prevents things like energy drift or inadvertently having lots of energy from your initial conditions.

If you are extracting only the molecule you're parameterizing (i.e. the complex), then it wouldn't really matter whether you are running constant pressure or constant volume simulations. I would still go with constant pressure because it would ensure that your system is fluctuating around a reasonable average volume - and if you create a parameter set that has different interactions with the solvent, the box volume might change.
My first question is : is there a easy way to check whether a snapshot could be discarded ? For example, I have thought that some of the snapshots computed with QM could be discarded based on an energy threshold, and select only those configurations that have energies that are somewhat closer to the reference. Or would it be better to make a threshold selection based on atomic gradient ? Is this a crazy idea ?
There's no rigorous way I know about to decide whether to keep or discard a snapshot. I would advise against using the size of the error (force field - reference) as a criterion. Since the QM potential surface is what we would like to reproduce, I think it's better to look at the reference quantities themselves. Very high energies and forces from QM are the most difficult to fit, but they also tell the force field where it shouldn't go. For this reason it is useful to have a reasonably high energy or force cutoff.

If you use the QM energy as a criterion, you need to actually know the minimum QM energy. Then you could discard snapshots that are higher than the minimum by some number ; 20-30 kcal/mol is a fairly often used number for dihedral energy scans, but again there's no rigorous basis for this. If your snapshots are thermally sampled, you should probably normalize using the number of atoms - something like 100 kJ / mol / atom.

A threshold selection based on the gradient is a pretty good idea actually, but I've never tried it. From looking at typical covalent bonds, the force experienced by an atom at ambient conditions is about 1000 kJ/mol/nm (Gromacs unit system). Conversion to the Q-Chem or Gaussian unit system requires dividing by about 50000, so ~0.02 Hartree / bohr. Thus, you can quickly try to filter out snapshots where any component of the QM force is 0.1 Hartree/bohr or larger.

To be a bit more careful, you should know the approximate magnitude of the gradient that one is likely to encounter during MD simulations, and then you set the cutoff based on that amount. You can get this number by saving the force during your MD simulation by setting the "nstfout" variable in the mdp, and then using "g_traj" to print out the forces to a file. The norm of the force vector can be used to inform your cutoff; it will scale with the square root of number of components.

Try a few different force cutoffs and let me know what you find - I'm interested to hear the result. If you find it to be helpful, I could add automatic cutoffs into ForceBalance as an option. Keep me posted.
Another question that I raise is if there is a minimum number of snapshots from an MD simulaton (100-1000 ??) that should be run to obtain a good fit.
The number of snapshots should be much bigger than the number of parameters. For your system (50 parameters) I would suggest about 5000 snapshots. Also, in order to make sure that your snapshots aren't all hanging out in the same free energy basin, you can save a much larger number of snapshots (say 500,000), then use k-centers clustering or a similar algorithm to get 5000 snapshots that are more evenly distributed in the configuration space. Incidentally, k-centers clustering on Gromacs trajectories is implemented in MSMBuilder, a software package being developed in the Pande lab (https://simtk.org/home/msmbuilder) .

If you do clustering then you're no longer sampling from a Boltzmann ensemble, but in practice it seems to work better. You can combine this with the cutoff strategy so that you are still likely to sample rare events like barrier crossings while still trying to filter out the unreasonable (and "unfittable") snapshots.

Thanks,

- Lee-Ping

Re: Question about qdata.txt file and force matching

Posted: Tue Feb 19, 2013 9:23 am
by nluque
Hi Lee-Ping,

So far so good, I am getting some results, but still not what I expected, I tried to plot the forces as you suggested, but without succeed.
I used your script drawforces.vmd and copy the QMforce.xyz and MMforce.xyz from the temp directory.
I used both the all.gro and the coord.xyz files, but I only get the atoms in the gro file and none of the forces.
I looked up in internet, but it seems not that easy to draw the forces with vmd.
I do not get what I am doing wrong.
Should I do something else after the command you wrote?
It seems to read correctly the gro file and force file but I get this error:
can't set "vmd_frame(2)": invalid command name "min"
I am using vmd 1.9.1 should I use a different version ?

Thanks a lot for your help!
Best regards,
Noelia

Re: Question about qdata.txt file and force matching

Posted: Wed Feb 20, 2013 7:00 am
by leeping
Hi Noelia,

Drawing forces isn't the easiest task in VMD but the script "drawforces.vmd" should take care of it. The script should be located in the bin/Extras/drawforces under the ForceBalance source directory. Did you use it when trying to draw the forces?

If you don't have the script, then you might want to update your code to the latest SVN revision, or you can get it here: https://simtk.org/websvn/wsvn/forcebala ... forces.vmd

Then you need to run the command:

vmd -e drawforces.vmd -args all.gro QMforce.xyz MMforce.xyz

The script assumes that all three files have the same number of frames and the same number of atoms.

If this doesn't work, then there might be something wrong with drawforces.vmd or how it's being called. If so, please let me know and attach the files you're using.

- Lee-Ping

Drawing forces

Posted: Wed Feb 20, 2013 9:09 am
by nluque
Hi Lee-Ping,

thanks for the answer.
what you said, it is what I did. That is why I think it might be my vmd version, I don't know!
Now I came back to the old file you sent me, (you sent once a noelia1.tar.gz file) just in case, but even in this case I couldn't see the arrows.
I attached those three files plus the drawforces.vmd that i took from your scripts.
I run it :

vmd -e drawforces.vmd -args all.gro QMforce.xyz MMforce.xyz

Thanks a lot for your help!
Best regards,
noelia