CPU vs GPU running time test

Posted: Tue Sep 15, 2009 7:07 am
by jopd83
Hi, I have another question!

I installed OpenMM-Gromacs on Linux CentOS 5.x
and compared the CPU and GPU (NVIDIA) running times.

I expected the GPU running time to be shorter than the CPU's.
Instead, the CPU run finished faster than the GPU run.

The results are below:
-----------------------------
CPU(./mdrun ~~) 128sec
GPU(./mdrun-openmm ~~) 235sec

I assume the different (longer) GPU running time at least means that CUDA is being used.

I wonder what is happening on the GPU.
Why does the GPU run take longer than the CPU run?
Could somebody help me, please?

RE: CPU vs GPU running time test

Posted: Tue Sep 15, 2009 3:11 pm
by peastman
Probably this means you are not using the GPU. When you run mdrun-openmm, it will print out either

OpenMM Platform: Cuda

which means you are using the GPU, or

OpenMM Platform: Reference

which means it is using the (slow) reference code. Can you confirm which of those is happening?
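If the output was saved to a file, a check along these lines would pull out the platform line. This is only a minimal sketch: the helper names `openmm_platform` and `uses_gpu` are made up for illustration, and it assumes the banner looks exactly like the two lines quoted above.

```python
import re

# Matches the banner line printed at startup, e.g. "OpenMM Platform: Cuda".
# [ \t]* keeps the match on one line so an empty value is not filled
# with text from the following line.
PLATFORM_LINE = re.compile(r"^[ \t]*OpenMM Platform:[ \t]*(\S+)", re.MULTILINE)


def openmm_platform(output):
    """Return the platform name found in captured output, or None."""
    match = PLATFORM_LINE.search(output)
    return match.group(1) if match else None


def uses_gpu(output):
    """True only when the reported platform is Cuda."""
    return openmm_platform(output) == "Cuda"


if __name__ == "__main__":
    sample = "OpenMM Platform: Reference\nstarting mdrun 'Protein in water'\n"
    print("platform:", openmm_platform(sample))
    print("using GPU:", uses_gpu(sample))
```

A result of None would mean the banner was not found at all, which is worth reporting separately from "Reference".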

Peter

RE: CPU vs GPU running time test

Posted: Tue Sep 15, 2009 6:08 pm
by jopd83
Thanks for your reply!

When I run mdrun-openmm, I see this:

OpenMM Platform: Cuda
Cuda device: 0
CudaUseBlockingSync: true
starting mdrun 'LYSOZYME in water'
2500 steps, 5.0ps.
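
For comparison, the two timings from the first post can be put on the usual ns/day scale. The sketch below assumes the CPU run covered the same 2500 steps (5.0 ps) as the GPU run, which the thread does not state explicitly. `ns_per_day` is a hypothetical helper, not part of Gromacs or OpenMM.

```python
SECONDS_PER_DAY = 86_400


def ns_per_day(simulated_ps, wall_seconds):
    """Convert simulated time (ps) and wall-clock time (s) into ns/day."""
    simulated_ns = simulated_ps / 1000.0
    return simulated_ns * SECONDS_PER_DAY / wall_seconds


if __name__ == "__main__":
    # Wall-clock times reported in the thread; 5.0 ps is taken from the
    # "2500 steps, 5.0ps" line and ASSUMED to apply to both runs.
    timings = {"CPU (mdrun)": 128.0, "GPU (mdrun-openmm)": 235.0}
    for label, seconds in timings.items():
        print(f"{label}: {ns_per_day(5.0, seconds):.2f} ns/day")
```

Under that assumption the CPU run works out to about 3.4 ns/day and the GPU run to about 1.8 ns/day.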


RE: CPU vs GPU running time test

Posted: Wed Sep 16, 2009 10:24 am
by peastman
Ok, so it really is using the GPU. That's good to know. Here are a few other questions that might shed light on what is happening.

What GPU and CPU do you have? The difference between a low end and high end GPU is *much* larger than the difference between a low end and high end CPU.

How many atoms does your system include? The GPU needs a lot of threads to get optimal performance, so small systems won't get the full benefit of it.

What coulombtype are you using?

Are you including implicit solvent? Standard Gromacs doesn't support that, so it will ignore the implicit_solvent parameter.

Peter

RE: CPU vs GPU running time test

Posted: Wed Sep 16, 2009 4:05 pm
by jackygrahamez
I have two problems. One is that the GPU run takes forever. The other is a pop-up at the end.

I'm testing this on Windows. The CPU mdrun runs a 100ns simulation with a reported CPU time of 66.1351.
The CPU version runs fairly reliably on my BOINC project.

However, when I run the GPU version with the changes I made to let the BOINC manager choose the GPU device, it takes about 1 hour to simulate 100ns.

The reason I cannot run the GPU version successfully with BOINC is that a Windows dialog box briefly pops up at the end. This kills the BOINC work units.

c:\ProgramData\BOINC\slots\2>mdrun_openmm.exe
:-) G R O M A C S (-:

Groningen Machine for Chemical Simulation

:-) VERSION 4.0.99_development_20090421 (-:


Written by David van der Spoel, Erik Lindahl, Berk Hess, and others.
Copyright (c) 1991-2000, University of Groningen, The Netherlands.
Copyright (c) 2001-2008, The GROMACS development team,
check out http://www.gromacs.org for more information.

This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License
as published by the Free Software Foundation; either version 2
of the License, or (at your option) any later version.

:-) mdrun_openmm.exe (-:

Option Filename Type Description
------------------------------------------------------------
-s topol.tpr Input Run input file: tpr tpb tpa
-o traj.trr Output Full precision trajectory: trr trj cpt
-x traj.xtc Output, Opt. Compressed trajectory (portable xdr format)
-cpi state.cpt Input, Opt. Checkpoint file
-cpo state.cpt Output, Opt. Checkpoint file
-c confout.gro Output Structure file: gro g96 pdb
-e ener.edr Output Energy file: edr ene
-g md.log Output Log file
-dgdl dgdl.xvg Output, Opt. xvgr/xmgr file
-field field.xvg Output, Opt. xvgr/xmgr file
-table table.xvg Input, Opt. xvgr/xmgr file
-tablep tablep.xvg Input, Opt. xvgr/xmgr file
-tableb table.xvg Input, Opt. xvgr/xmgr file
-rerun rerun.xtc Input, Opt. Trajectory: xtc trr trj gro g96 pdb cpt
-tpi tpi.xvg Output, Opt. xvgr/xmgr file
-tpid tpidist.xvg Output, Opt. xvgr/xmgr file
-ei sam.edi Input, Opt. ED sampling input
-eo sam.edo Output, Opt. ED sampling output
-j wham.gct Input, Opt. General coupling stuff
-jo bam.gct Output, Opt. General coupling stuff
-ffout gct.xvg Output, Opt. xvgr/xmgr file
-devout deviatie.xvg Output, Opt. xvgr/xmgr file
-runav runaver.xvg Output, Opt. xvgr/xmgr file
-px pullx.xvg Output, Opt. xvgr/xmgr file
-pf pullf.xvg Output, Opt. xvgr/xmgr file
-mtx nm.mtx Output, Opt. Hessian matrix
-dn dipole.ndx Output, Opt. Index file

Option Type Value Description
------------------------------------------------------
-[no]h bool no Print help info and quit
-nice int 19 Set the nicelevel
-deffnm string Set the default filename for all file options
-[no]xvgr bool yes Add specific codes (legends etc.) in the output
xvg files for the xmgrace program
-[no]pd bool no Use particle decompostion
-dd vector 0 0 0 Domain decomposition grid, 0 is optimize
-npme int -1 Number of separate nodes to be used for PME, -1
is guess
-ddorder enum interleave DD node order: interleave, pp_pme or cartesian
-[no]ddcheck bool yes Check for all bonded interactions with DD
-rdd real 0 The maximum distance for bonded interactions with
DD (nm), 0 is determine from initial coordinates
-rcon real 0 Maximum distance for P-LINCS (nm), 0 is estimate
-dlb enum auto Dynamic load balancing (with DD): auto, no or yes
-dds real 0.8 Minimum allowed dlb scaling of the DD cell size
-[no]sum bool yes Sum the energies at every step
-[no]v bool no Be loud and noisy
-[no]compact bool yes Write a compact log file
-[no]seppot bool no Write separate V and dVdl terms for each
interaction type and node to the log file(s)
-pforce real -1 Print all forces larger than this (kJ/mol nm)
-[no]reprod bool no Try to avoid optimizations that affect binary
reproducibility
-cpt real 15 Checkpoint interval (minutes)
-[no]append bool no Append to previous output files when continuing
from checkpoint
-[no]addpart bool yes Add the simulation part number to all output
files when continuing from checkpoint
-maxh real -1 Terminate after 0.99 times this time (hours)
-multi int 0 Do multiple simulations in parallel
-replex int 0 Attempt replica exchange every # steps
-reseed int -1 Seed for replica exchange, -1 is generate a seed
-[no]glas bool no Do glass simulation with special long range
corrections
-[no]ionize bool no Do a simulation including the effect of an X-Ray
bombardment on your system
--device int 0 Select GPU


Back Off! I just backed up md.log to ./#md.log.1#

-------------------------------------------------------
Program mdrun_openmm.exe, VERSION 4.0.99_development_20090421
Source code file: .\gmxfio.c, line: 736

Can not open file:
topol.tpr
-------------------------------------------------------

Thanx for Using GROMACS - Have a Nice Day


c:\ProgramData\BOINC\slots\2>mdrun_openmm.exe

c:\ProgramData\BOINC\slots\2>more job.xml
<job_desc>
<task>
<application>mdrun_openmm.exe</application>
<stdout_filename>md.log</stdout_filename>
<stderr_filename>md.log</stderr_filename>
<command_line> -v -deffnm md</command_line>
</task>
</job_desc>


c:\ProgramData\BOINC\slots\2>mdrun_openmm.exe -v -deffnm md
:-) G R O M A C S (-:

Groningen Machine for Chemical Simulation

:-) VERSION 4.0.99_development_20090421 (-:


Written by David van der Spoel, Erik Lindahl, Berk Hess, and others.
Copyright (c) 1991-2000, University of Groningen, The Netherlands.
Copyright (c) 2001-2008, The GROMACS development team,
check out http://www.gromacs.org for more information.

This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License
as published by the Free Software Foundation; either version 2
of the License, or (at your option) any later version.

:-) mdrun_openmm.exe (-:

Option Filename Type Description
------------------------------------------------------------
-s md.tpr Input Run input file: tpr tpb tpa
-o md.trr Output Full precision trajectory: trr trj cpt
-x md.xtc Output, Opt. Compressed trajectory (portable xdr format)
-cpi md.cpt Input, Opt. Checkpoint file
-cpo md.cpt Output, Opt. Checkpoint file
-c md.gro Output Structure file: gro g96 pdb
-e md.edr Output Energy file: edr ene
-g md.log Output Log file
-dgdl md.xvg Output, Opt. xvgr/xmgr file
-field md.xvg Output, Opt. xvgr/xmgr file
-table md.xvg Input, Opt. xvgr/xmgr file
-tablep md.xvg Input, Opt. xvgr/xmgr file
-tableb md.xvg Input, Opt. xvgr/xmgr file
-rerun md.trr Input, Opt. Trajectory: xtc trr trj gro g96 pdb cpt
-tpi md.xvg Output, Opt. xvgr/xmgr file
-tpid md.xvg Output, Opt. xvgr/xmgr file
-ei md.edi Input, Opt. ED sampling input
-eo md.edo Output, Opt. ED sampling output
-j md.gct Input, Opt. General coupling stuff
-jo md.gct Output, Opt. General coupling stuff
-ffout md.xvg Output, Opt. xvgr/xmgr file
-devout md.xvg Output, Opt. xvgr/xmgr file
-runav md.xvg Output, Opt. xvgr/xmgr file
-px md.xvg Output, Opt. xvgr/xmgr file
-pf md.xvg Output, Opt. xvgr/xmgr file
-mtx md.mtx Output, Opt. Hessian matrix
-dn md.ndx Output, Opt. Index file

Option Type Value Description
------------------------------------------------------
-[no]h bool no Print help info and quit
-nice int 19 Set the nicelevel
-deffnm string md Set the default filename for all file options
-[no]xvgr bool yes Add specific codes (legends etc.) in the output
xvg files for the xmgrace program
-[no]pd bool no Use particle decompostion
-dd vector 0 0 0 Domain decomposition grid, 0 is optimize
-npme int -1 Number of separate nodes to be used for PME, -1
is guess
-ddorder enum interleave DD node order: interleave, pp_pme or cartesian
-[no]ddcheck bool yes Check for all bonded interactions with DD
-rdd real 0 The maximum distance for bonded interactions with
DD (nm), 0 is determine from initial coordinates
-rcon real 0 Maximum distance for P-LINCS (nm), 0 is estimate
-dlb enum auto Dynamic load balancing (with DD): auto, no or yes
-dds real 0.8 Minimum allowed dlb scaling of the DD cell size
-[no]sum bool yes Sum the energies at every step
-[no]v bool yes Be loud and noisy
-[no]compact bool yes Write a compact log file
-[no]seppot bool no Write separate V and dVdl terms for each
interaction type and node to the log file(s)
-pforce real -1 Print all forces larger than this (kJ/mol nm)
-[no]reprod bool no Try to avoid optimizations that affect binary
reproducibility
-cpt real 15 Checkpoint interval (minutes)
-[no]append bool no Append to previous output files when continuing
from checkpoint
-[no]addpart bool yes Add the simulation part number to all output
files when continuing from checkpoint
-maxh real -1 Terminate after 0.99 times this time (hours)
-multi int 0 Do multiple simulations in parallel
-replex int 0 Attempt replica exchange every # steps
-reseed int -1 Seed for replica exchange, -1 is generate a seed
-[no]glas bool no Do glass simulation with special long range
corrections
-[no]ionize bool no Do a simulation including the effect of an X-Ray
bombardment on your system
--device int 0 Select GPU


Back Off! I just backed up md.log to ./#md.log.2#
Getting Loaded...
Reading file md.tpr, VERSION 4.0.5 (single precision)
Note: tpx file_version 58, software version 59
Loaded with Money


Back Off! I just backed up md.trr to ./#md.trr.4#

Back Off! I just backed up md.edr to ./#md.edr.4#
GPU: 0
OpenMM Platform: Reference
starting mdrun 'Protein in water'
50 steps, 0.1 ps.
Step = 1 , Time = 0.002 ps
Step = 2 , Time = 0.004 ps
Step = 3 , Time = 0.006 ps
Step = 4 , Time = 0.008 ps
Step = 5 , Time = 0.010 ps
Step = 6 , Time = 0.012 ps
Step = 7 , Time = 0.014 ps
Step = 8 , Time = 0.016 ps
Step = 9 , Time = 0.018 ps
Step = 10 , Time = 0.020 ps
Step = 11 , Time = 0.022 ps
Step = 12 , Time = 0.024 ps
Step = 13 , Time = 0.026 ps
Step = 14 , Time = 0.028 ps
Step = 15 , Time = 0.030 ps
Step = 16 , Time = 0.032 ps
Step = 17 , Time = 0.034 ps
Step = 18 , Time = 0.036 ps
Step = 19 , Time = 0.038 ps
Step = 20 , Time = 0.040 ps
Step = 21 , Time = 0.042 ps
Step = 22 , Time = 0.044 ps
Step = 23 , Time = 0.046 ps
Step = 24 , Time = 0.048 ps
Step = 25 , Time = 0.050 ps
Step = 26 , Time = 0.052 ps
Step = 27 , Time = 0.054 ps
Step = 28 , Time = 0.056 ps
Step = 29 , Time = 0.058 ps
Step = 30 , Time = 0.060 ps
Step = 31 , Time = 0.062 ps
Step = 32 , Time = 0.064 ps
Step = 33 , Time = 0.066 ps
Step = 34 , Time = 0.068 ps
Step = 35 , Time = 0.070 ps
Step = 36 , Time = 0.072 ps
Step = 37 , Time = 0.074 ps
Step = 38 , Time = 0.076 ps
Step = 39 , Time = 0.078 ps
Step = 40 , Time = 0.080 ps
Step = 41 , Time = 0.082 ps
Step = 42 , Time = 0.084 ps
Step = 43 , Time = 0.086 ps
Step = 44 , Time = 0.088 ps
Step = 45 , Time = 0.090 ps
Step = 46 , Time = 0.092 ps
Step = 47 , Time = 0.094 ps
Step = 48 , Time = 0.096 ps
Step = 49 , Time = 0.098 ps
Step = 50 , Time = 0.100 ps

c:\ProgramData\BOINC\slots\2>

RE: CPU vs GPU running time test

Posted: Wed Sep 16, 2009 4:17 pm
by peastman
Hi Jack,

It's slow because you're not using the GPU. You can tell that from the line in the output:

OpenMM Platform: Reference

Make sure you follow all the directions in the readme file, particularly the ones about setting your PATH and OPENMM_PLUGIN_DIR environment variables correctly. Also, you mentioned a problem with a dialog box popping up, but you didn't say anything about what the dialog box said.
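
A quick sanity check of those variables might look like the sketch below. It only confirms that OPENMM_PLUGIN_DIR is set, points at an existing directory, and that the directory is not empty. It does not know which plugin files should be there, and `plugin_dir_problems` is a hypothetical helper rather than anything shipped with OpenMM.

```python
import os


def plugin_dir_problems(env=None):
    """Return a list of human-readable problems with the environment."""
    env = os.environ if env is None else env
    problems = []

    plugin_dir = env.get("OPENMM_PLUGIN_DIR")
    if not plugin_dir:
        problems.append("OPENMM_PLUGIN_DIR is not set")
    elif not os.path.isdir(plugin_dir):
        problems.append("OPENMM_PLUGIN_DIR is not a directory: " + plugin_dir)
    elif not os.listdir(plugin_dir):
        problems.append("OPENMM_PLUGIN_DIR is empty: " + plugin_dir)

    if not env.get("PATH"):
        problems.append("PATH is not set")

    return problems


if __name__ == "__main__":
    found = plugin_dir_problems()
    print("\n".join(found) if found else "environment looks plausible")
```

An empty list does not prove the GPU platform will load. It only rules out the most basic mistakes.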

Peter

RE: CPU vs GPU running time test

Posted: Wed Sep 16, 2009 6:34 pm
by jackygrahamez
My intent is to place all the dependencies in the current working directory. I just did this:
set OPENMM_PLUGIN_DIR=.
set PATH=%PATH%:.
I also copied the gmxlib files to the current working directory:
set GMXLIB=.

A pop-up message closes very quickly and the program fails to execute properly. The message says:
mdrun_openmm.exe has stopped working
Windows is checking for a solution to this problem
By the way, I am using Windows 7, if that makes a difference.
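
One detail in the `set PATH=%PATH%:.` line above may be worth checking: Windows separates PATH entries with `;`, while `:` is the Unix separator, so that command may not add the current directory as intended. The sketch below shows the difference using Python's `os.pathsep`. `append_to_path` is a hypothetical helper, and whether this is related to the crash is not established in the thread.

```python
import os


def append_to_path(path_value, new_entry, sep=os.pathsep):
    """Append new_entry to a PATH-style string using the given separator.

    os.pathsep is ';' on Windows and ':' on Unix-like systems.
    """
    entries = [entry for entry in path_value.split(sep) if entry]
    if new_entry not in entries:
        entries.append(new_entry)
    return sep.join(entries)


if __name__ == "__main__":
    windows_path = r"C:\Windows;C:\Windows\System32"
    # With the Windows separator the current directory becomes its own entry.
    print(append_to_path(windows_path, ".", sep=";"))
    # Gluing ':.' onto the end instead just alters the last entry.
    print(windows_path + ":.")
```

In a cmd.exe session the equivalent would be `set PATH=%PATH%;.`.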

RE: CPU vs GPU running time test

Posted: Wed Sep 16, 2009 9:02 pm
by jopd83
I have an NVIDIA GTX 260 graphics card and
an AMD Phenom(tm) 9850 Processor 2.51GHz CPU.

My test protein has 129 amino acids and 1321 atoms (heavy atoms + hydrogen atoms).

I use the coulombtype 'cut-off'.

I am using explicit water, not implicit solvent.
(./editconf -f minimized.gro -o minimized_box.gro
-d 0.75 -bt cubic

./genbox -cp minimized_box.gro -cs spc216.gro
-o minimized_water.gro -p aki.top)


I'm running the test sample from http://md.chem.rug.nl/education/mdcourse/MDpract.html.

RE: CPU vs GPU running time test

Posted: Thu Sep 17, 2009 3:43 am
by jackygrahamez
I include water as well, and the protein has about 3844. It's complexed with a ligand. I should have mentioned that I'm testing with a GTS 250.

RE: CPU vs GPU running time test

Posted: Thu Sep 17, 2009 5:13 am
by jackygrahamez
3844 atoms, sorry. I haven't had my coffee yet.