Date: 2012-06-24 22:53
Priority: 3
State: Open
Submitted by: John Chodera (jchodera)
Assigned to: Peter Eastman (peastman)
Resolution: None
Summary: Memory leak in OpenMM4.1-Linux64 release for GTX 580, OpenCL platform

Detailed description
Peter,

Levi Naden has run into a GPU memory issue when using the OpenCL platform of the OpenMM4.1-Linux64 [May 4, 2012] binary release to create and cache multiple Context objects on a machine with two GTX 580s installed.

See the attached archive, which contains two test scripts. The first [context-cache-test.py] simply tries to create and cache 24 Context objects for an implicit solvent system loaded via the OpenMM app AMBER prmtop loader. The second [context-cache-test-exceptions.py] uses the same scheme to try to create 500 copies, catching the exception raised when creation fails.
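
The caching scheme described above can be sketched roughly as follows. This is not the attached code: the helper names `cache_until_failure` and `make_openmm_context` and the file name `system.prmtop` are made up for illustration, the OBC2 implicit solvent model and the Verlet integrator are assumptions, and the imports assume the `simtk` package layout used by OpenMM releases of this period.

```python
def cache_until_failure(make_context, limit):
    """Create up to `limit` objects, keeping every one alive in a list.

    Returns (cache, error): `error` is the exception that stopped the loop,
    or None if all `limit` objects were created.
    """
    cache = []
    error = None
    for _ in range(limit):
        try:
            cache.append(make_context())
        except Exception as exc:
            error = exc
            break
    return cache, error


def make_openmm_context(prmtop_path="system.prmtop"):
    """Hypothetical factory: one implicit-solvent Context on the OpenCL platform."""
    # Imported lazily so the caching helper above can be used without OpenMM.
    from simtk import openmm as mm
    from simtk import unit
    from simtk.openmm import app

    prmtop = app.AmberPrmtopFile(prmtop_path)
    system = prmtop.createSystem(implicitSolvent=app.OBC2)  # assumed solvent model
    # Each Context needs its own Integrator instance.
    integrator = mm.VerletIntegrator(1.0 * unit.femtoseconds)
    platform = mm.Platform.getPlatformByName("OpenCL")
    return mm.Context(system, integrator, platform)
```

Calling `cache_until_failure(make_openmm_context, 500)` would mirror the second script: the length of the returned list is the number of Context objects that fit before the first failure.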

On my Mac Pro (OpenMM r3380, GTX 280 with 1 GB RAM), the first test succeeds, while the second catches an exception after creating and caching 138 Context objects. Levi's GTX 580s have 1.5 GB RAM each (specs and output of "nvidia-smi -q" and "deviceQuery" attached), but on his machine these scripts die after only 22 Context objects have been created and cached.
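
As a rough sanity check on those two counts, the sketch below computes an upper bound on the average memory per cached Context. It assumes the failures are caused by GPU memory exhaustion, that the whole card is available to the test, that "1 GB" means 1024 MB, and that the GTX 580 in use reports 1535 MB as in the deviceQuery output; the function name is made up for illustration.

```python
def max_mb_per_context(total_mb, contexts_created):
    """Upper bound on average GPU memory per cached Context, in MB."""
    return total_mb / float(contexts_created)


# GTX 280: assumed 1024 MB total, 138 Contexts before the exception.
gtx280 = max_mb_per_context(1024, 138)
# GTX 580: 1535 MB total (deviceQuery, device 0), 22 Contexts before failure.
gtx580 = max_mb_per_context(1535, 22)

print("GTX 280: at most %.1f MB per Context" % gtx280)
print("GTX 580: at most %.1f MB per Context" % gtx580)
```

Under these assumptions the two bounds differ by roughly a factor of nine for the same system, which is why the 22-Context limit looks like a leak rather than ordinary memory use.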

John

--
GPU Hardware:

3.3 GHz Quad Core i5 processor
4 GB DDR3 RAM
2x NVIDIA GTX 580 (1.5 GB RAM each)

Software:

CentOS 6.2, 64-bit

NVIDIA driver 285.05.33
CUDA Toolkit 4.2.9
CUDA SDK 4.2.9

netcdf 4.1.3
netCDF4 1.0 (python)
OpenMPI 1.6
mpi4py 1.3

Python 2.6
NumPy 1.6.1
SciPy 0.10.1

OpenMM 4.1
--

./deviceQuery Starting...

CUDA Device Query (Runtime API) version (CUDART static linking)

Found 2 CUDA Capable device(s)

Device 0: "GeForce GTX 580"
CUDA Driver Version / Runtime Version 4.1 / 4.1
CUDA Capability Major/Minor version number: 2.0
Total amount of global memory: 1535 MBytes (1609760768 bytes)
(16) Multiprocessors x (32) CUDA Cores/MP: 512 CUDA Cores
GPU Clock Speed: 1.54 GHz
Memory Clock rate: 2004.00 Mhz
Memory Bus Width: 384-bit
L2 Cache Size: 786432 bytes
Max Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536,65535), 3D=(2048,2048,2048)
Max Layered Texture Size (dim) x layers 1D=(16384) x 2048, 2D=(16384,16384) x 2048
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 32768
Warp size: 32
Maximum number of threads per block: 1024
Maximum sizes of each dimension of a block: 1024 x 1024 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 65535
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and execution: Yes with 1 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Concurrent kernel execution: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support enabled: No
Device is using TCC driver mode: No
Device supports Unified Addressing (UVA): Yes
Device PCI Bus ID / PCI location ID: 1 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

Device 1: "GeForce GTX 580"
CUDA Driver Version / Runtime Version 4.1 / 4.1
CUDA Capability Major/Minor version number: 2.0
Total amount of global memory: 1536 MBytes (1610285056 bytes)
(16) Multiprocessors x (32) CUDA Cores/MP: 512 CUDA Cores
GPU Clock Speed: 1.54 GHz
Memory Clock rate: 2004.00 Mhz
Memory Bus Width: 384-bit
L2 Cache Size: 786432 bytes
Max Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536,65535), 3D=(2048,2048,2048)
Max Layered Texture Size (dim) x layers 1D=(16384) x 2048, 2D=(16384,16384) x 2048
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 32768
Warp size: 32
Maximum number of threads per block: 1024
Maximum sizes of each dimension of a block: 1024 x 1024 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 65535
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and execution: Yes with 1 copy engine(s)
Run time limit on kernels: No
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Concurrent kernel execution: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support enabled: No
Device is using TCC driver mode: No
Device supports Unified Addressing (UVA): Yes
Device PCI Bus ID / PCI location ID: 2 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 4.1, CUDA Runtime Version = 4.1, NumDevs = 2, Device = GeForce GTX 580, Device = GeForce GTX 580

--

==============NVSMI LOG==============

Timestamp : Sun Jun 24 17:36:37 2012

Driver Version : 285.05.33

Attached GPUs : 2

GPU 0000:01:00.0
Product Name : GeForce GTX 580
Display Mode : N/A
Persistence Mode : Disabled
Driver Model
Current : N/A
Pending : N/A
Serial Number : N/A
GPU UUID : N/A
VBIOS Version : 70.10.60.00.82
Inforom Version
OEM Object : N/A
ECC Object : N/A
Power Management Object : N/A
PCI
Bus : 0x01
Device : 0x00
Domain : 0x0000
Device Id : 0x108010DE
Bus Id : 0000:01:00.0
Sub System Id : 0x15803842
GPU Link Info
PCIe Generation
Max : 2
Current : 2
Link Width
Max : 16x
Current : 8x
Fan Speed : 49 %
Performance State : N/A
Memory Usage
Total : 1535 MB
Used : 188 MB
Free : 1346 MB
Compute Mode : Default
Utilization
Gpu : N/A
Memory : N/A
Ecc Mode
Current : N/A
Pending : N/A
ECC Errors
Volatile
Single Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Total : N/A
Double Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Total : N/A
Aggregate
Single Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Total : N/A
Double Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Total : N/A
Temperature
Gpu : 61 C
Power Readings
Power Management : N/A
Power Draw : N/A
Power Limit : N/A
Clocks
Graphics : N/A
SM : N/A
Memory : N/A
Max Clocks
Graphics : N/A
SM : N/A
Memory : N/A
Compute Processes : Not Supported

GPU 0000:02:00.0
Product Name : GeForce GTX 580
Display Mode : N/A
Persistence Mode : Disabled
Driver Model
Current : N/A
Pending : N/A
Serial Number : N/A
GPU UUID : N/A
VBIOS Version : 70.10.49.00.80
Inforom Version
OEM Object : N/A
ECC Object : N/A
Power Management Object : N/A
PCI
Bus : 0x02
Device : 0x00
Domain : 0x0000
Device Id : 0x108010DE
Bus Id : 0000:02:00.0
Sub System Id : 0x15803842
GPU Link Info
PCIe Generation
Max : 2
Current : 2
Link Width
Max : 16x
Current : 8x
Fan Speed : 45 %
Performance State : N/A
Memory Usage
Total : 1535 MB
Used : 78 MB
Free : 1457 MB
Compute Mode : Default
Utilization
Gpu : N/A
Memory : N/A
Ecc Mode
Current : N/A
Pending : N/A
ECC Errors
Volatile
Single Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Total : N/A
Double Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Total : N/A
Aggregate
Single Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Total : N/A
Double Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Total : N/A
Temperature
Gpu : 54 C
Power Readings
Power Management : N/A
Power Draw : N/A
Power Limit : N/A
Clocks
Graphics : N/A
SM : N/A
Memory : N/A
Max Clocks
Graphics : N/A
SM : N/A
Memory : N/A
Compute Processes : Not Supported

Comments:

Date: 2012-06-25 18:28
Sender: John Chodera

Here is a follow-up from Levi:

--

Following that up, I built OpenMM from the source files and ran the
tests, which all came back positive; I have attached that log file as
well. I installed from this source build and tested the systems again,
with the exact same errors.

Attached Files:

Size       Name                            Date              By
317.01 KB  context-cache-test-new.tgz      2012-06-24 22:53  jchodera
21.99 KB   openmm_src_maketest_output.txt  2012-06-25 18:28  jchodera

Changes

Field       Old Value                            Date              By
File Added  464: openmm_src_maketest_output.txt  2012-06-25 18:28  jchodera
File Added  463: context-cache-test-new.tgz      2012-06-24 22:53  jchodera