Date: 2009-08-08 00:22
Priority: 3
State: Closed
Submitted by: Siddharth Srinivasan (sshrinivasan)
Assigned to: Nobody (None)
Resolution: none
Summary: Large system size causes segfault

Detailed description
I tried running a 180k-atom system and got a segfault with the following error:
{{{
Error: invalid configuration argument launching kernel kFindBlocksWithInteractionsPeriodic
}}}
While I realize that 80k atoms was about the maximum tested and recommended, I was occasionally able to run 100k-atom systems without segfaulting. The cutoff is pretty arbitrary, and a descriptive FATAL error stating that the maximum supported system size was exceeded would be very useful.

Date: 2009-08-14 17:52
Sender: Siddharth Srinivasan

Just a comment: I patched P3 with the recommended changes
and still get the same segfault in a 180k-atom system,
though the number of GPU interaction blocks now gets set to 240.
I should add that this maximum limit is reached even for a
22k-atom system. If there is anything else that you think I
could do, let me know; otherwise I'll just wait for the P4
release and see if that solves the issue.

Date: 2009-08-10 21:36
Sender: Peter Eastman

The changes are quite simple, if you want to integrate them yourself. In gpu.cpp, find the function gpuBuildThreadBlockWorkList(). About halfway through, you'll find a line that sets the value of gpu->sim.interaction_blocks. Immediately after that line, add the following:

if (gpu->sim.interaction_blocks > 8*gpu->sim.blocks)
    gpu->sim.interaction_blocks = 8*gpu->sim.blocks;

Next, go to kFindInteractingBlocks.h. In the middle of that file (line 83 in the current code, but it might be slightly different in PR3) is an if block:

if (pos < cSim.workUnits)
{
    ...
}

Change it to a while loop:

while (pos < cSim.workUnits)
{
    ...
    pos += gridDim.x*blockDim.x;
}
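
To illustrate why the two changes go together, here is a minimal CPU-side sketch in plain C++, not the actual OpenMM code. Once the block count is capped, a launch may have fewer threads than work units, so each thread has to step through the work list in strides of the total thread count instead of handling a single position. The names LaunchConfig, clampBlocks and simulateStridedKernel are hypothetical and exist only for this illustration; the nested loops stand in for the threads a real kernel launch would run in parallel.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a kernel launch configuration (not an OpenMM type).
struct LaunchConfig {
    unsigned int gridDim;   // number of thread blocks
    unsigned int blockDim;  // threads per block
};

// Cap the requested block count at 8x the base block count,
// mirroring the clamp added to gpuBuildThreadBlockWorkList().
unsigned int clampBlocks(unsigned int requested, unsigned int baseBlocks) {
    return std::min(requested, 8u * baseBlocks);
}

// Simulate the while loop on the CPU: every simulated thread starts at its
// global index and advances by the total number of threads in the launch.
// Returns how many times each work unit was visited.
std::vector<int> simulateStridedKernel(LaunchConfig cfg, std::size_t workUnits) {
    assert(cfg.gridDim > 0 && cfg.blockDim > 0);
    std::vector<int> visits(workUnits, 0);
    const std::size_t stride =
        static_cast<std::size_t>(cfg.gridDim) * cfg.blockDim;
    for (unsigned int block = 0; block < cfg.gridDim; ++block) {
        for (unsigned int thread = 0; thread < cfg.blockDim; ++thread) {
            std::size_t pos =
                static_cast<std::size_t>(block) * cfg.blockDim + thread;
            while (pos < workUnits) {   // was: if (pos < workUnits)
                ++visits[pos];
                pos += stride;          // pos += gridDim.x*blockDim.x
            }
        }
    }
    return visits;
}
```

For example, with an assumed base block count of 30, clampBlocks(1000, 30) gives 240 blocks, and simulateStridedKernel({240, 64}, 100000) visits each of the 100000 work units exactly once even though the launch has only 15360 threads. With the original if block, work units past the first 15360 would never be processed.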

Date: 2009-08-10 21:26
Sender: Siddharth Srinivasan

Thanks, Peter. Would it be possible to get a patch for
this for the P3 release? I ask because I think this segfault
is causing the GPU to stop responding to further OpenMM
simulations: I get "launch failure" errors after the initial
segfault. I'm hoping that this fix will allow me to proceed
without rebooting. I understand if it's complicated and I
should wait for the next release.

Date: 2009-08-10 21:15
Sender: Peter Eastman

I've checked in a fix for this, so it should work correctly in the next release.

Field        Old Value           Date                By
close_date   2009-08-14 17:52    2009-08-14 17:52    sshrinivasan
close_date   2009-08-10 21:36    2009-08-10 21:36    peastman
close_date   2009-08-10 21:26    2009-08-10 21:26    sshrinivasan
status_id    Open                2009-08-10 21:15    peastman
close_date   2009-08-10 21:15    2009-08-10 21:15    peastman