Issue post-processing - svpost

Provides a system for patient-specific cardiovascular modeling and simulation.
Gaetano Aran
Posts: 4
Joined: Sat Nov 24, 2018 12:29 pm

Post by Gaetano Aran » Tue Oct 15, 2019 12:20 am

Hi,

When I run a simulation with openmpi on a cluster with 288 cores, post-processing stops at a certain time step and does not create any results.
I also ran the post-processing from the command line on another computer and hit the same problem, although it got a little further.

Thanks,
Gaetano

David Parker
Posts: 1634
Joined: Tue Aug 23, 2005 2:43 pm

Re: Issue post-processing - svpost

Post by David Parker » Thu Oct 17, 2019 9:31 pm

Hi Gaetano,

It might be that svpost is running out of memory. Try using a larger increment (-incr 100), and drop the -vtkcombo flag if you are using it.

Without the -vtkcombo flag, svpost should not allocate more memory when it processes the next time step; if it does, that is a bug.

Cheers,
Dave

Gaetano Aran
Posts: 4
Joined: Sat Nov 24, 2018 12:29 pm

Re: Issue post-processing - svpost

Post by Gaetano Aran » Fri Oct 18, 2019 3:29 am

Hi Dave,
I followed both of your suggestions, but it still doesn't work.
I checked on two different clusters.

Cheers, Gaetano

David Parker
Posts: 1634
Joined: Tue Aug 23, 2005 2:43 pm

Re: Issue post-processing - svpost

Post by David Parker » Fri Oct 18, 2019 8:26 am

Hi Gaetano,

Can you post the command you are using and the output of svpost? Are you running CentOS on the cluster or something else?

Cheers,
Dave

Gaetano Aran
Posts: 4
Joined: Sat Nov 24, 2018 12:29 pm

Re: Issue post-processing - svpost

Post by Gaetano Aran » Mon Oct 21, 2019 3:06 am

Dear Dave,
thank you for the response.
The cluster I am working on has the following characteristics: CentOS 7.4, 288 cores; the code was compiled with GCC 6.1.0 and openmpi 3.1.1.
The command I am running is:
$ svpost.exe -all -indir /288-procs_case -outdir /export -start 2000 -stop 3000 -incr 100 -vtp 018_d -vtu 018_d
This creates output files for time steps 2000 to 2600, and then I get a segmentation fault.

This is an example of what I receive:

Reducing (vin plane traction) results : /gpfs/scratch/userexternal/garan000/018_d/288-procs_case/restart.2700.205
Done reading (vin plane traction) results : /gpfs/scratch/userexternal/garan000/018_d/288-procs_case/restart.2700.205
Reducing (vin plane traction) results : /gpfs/scratch/userexternal/garan000/018_d/288-procs_case/restart.2700.206
Done reading (vin plane traction) results : /gpfs/scratch/userexternal/garan000/018_d/288-procs_case/restart.2700.206
Reducing (vin plane traction) results : /gpfs/scratch/userexternal/garan000/018_d/288-procs_case/restart.2700.207
Segmentation fault (core dumped)
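
The excerpt above ends with a "Reducing" line that has no matching "Done reading" line. A minimal sketch for pulling that last unfinished file out of a saved svpost log, assuming the log keeps the line format shown here; last_unfinished_file is a hypothetical helper name, not part of svpost:

```shell
# Print the last file svpost started reading but never finished.
# Assumes log lines of the form "<Reducing|Done reading> (...) results : <path>".
last_unfinished_file() {
    # Path from the final "Reducing" line in the log.
    last_started=$(grep 'Reducing' "$1" | tail -n 1 | sed 's/.* : //')
    # Report it only if no "Done reading" line names exactly the same path.
    if [ -n "$last_started" ] &&
        ! grep 'Done reading' "$1" | sed 's/.* : //' | grep -qFx "$last_started"; then
        echo "$last_started"
    fi
}

# Example: last_unfinished_file log
```

Run on the excerpt above, this would print the path ending in restart.2700.207, which is the file svpost was reading when it crashed.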

Cheers,
Gaetano

David Parker
Posts: 1634
Joined: Tue Aug 23, 2005 2:43 pm

Re: Issue post-processing - svpost

Post by David Parker » Mon Oct 21, 2019 11:45 am

Hi Gaetano,

I've looked in the svpost source to see what might be going on (e.g. a hard-coded value for the maximum number of processors) but did not see anything suspicious.

I've created a new executable that you can download from here: https://github.com/ktbolt/cardiovascula ... ter/svpost. I built it on Ubuntu, but I think it might work on CentOS. If not, you can download the svSolver source from here https://github.com/ktbolt/svSolver/tree ... -exception and build it.

Cheers,
Dave

Gaetano Aran
Posts: 4
Joined: Sat Nov 24, 2018 12:29 pm

Re: Issue post-processing - svpost

Post by Gaetano Aran » Wed Oct 23, 2019 5:03 am

Hi,
I tried compiling what you sent me, but I still get the same error.

Cheers,
Gaetano

David Parker
Posts: 1634
Joined: Tue Aug 23, 2005 2:43 pm

Re: Issue post-processing - svpost

Post by David Parker » Wed Oct 23, 2019 10:04 am

Hi Gaetano,

I've added some print statements to svpost on https://github.com/ktbolt/svSolver/tree ... -exception. Please download the source and rebuild svpost, then run it and send me all the output.

Can you also try to export just a single results file? Please send me all the output from svpost; I need to see which geombc.dat files are being scanned.

Another thing to do is to enable dumping a core file using ulimit -c unlimited. You can then use gdb to see where the segfault is occurring using gdb svpost.exe -c core.

Cheers,
Dave

Rodrigo Romarowski
Posts: 16
Joined: Sat Apr 15, 2017 10:06 am

Re: Issue post-processing - svpost

Post by Rodrigo Romarowski » Thu Oct 24, 2019 1:50 am

Dear Dave,

Thanks a lot for your help. I am working with Gaetano to solve this issue. We have compiled the new branch, and I attach the terminal output for each of the tests you asked for. We ran the following on a Debian cluster with the same set of files (everything compiles without problems):

log: svpost.exe -all -indir /home/garan/028_d/288-procs_case/ -outdir . -start 2000 -stop 3000 -incr 100 -vtp 028_d -vtu 028_d > log &
log2: svpost.exe -all -indir /home/garan/028_d/288-procs_case/ -outdir . -start 2600 -stop 3000 -incr 100 -vtp 028_d -vtu 028_d > log2 &
log3: svpost.exe -all -indir /home/garan/028_d/288-procs_case/ -outdir . -sn 2600 -ph > log3 &
log4: svpost.exe -all -indir /home/garan/028_d/288-procs_case/ -outdir . -sn 2300 -ph > log4 &

The fourth command is the only one that completes successfully. I am not familiar with debugging with gdb; could you please send me some more information on how to run the required test?
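
The single-step runs (log3 and log4) could be repeated across the whole range to see which time steps fail on their own. A rough sketch, assuming svpost.exe exits with a nonzero status when it crashes; failing_steps and svpost_single_step are hypothetical helper names, and the flags are only the ones already used in the commands above:

```shell
# Try each time step on its own and list the ones whose command fails.
# The command is invoked as: <command...> STEP
failing_steps() {
    start=$1; stop=$2; incr=$3
    shift 3
    step=$start
    while [ "$step" -le "$stop" ]; do
        # Discard the command's output; only its exit status matters here.
        if ! "$@" "$step" >/dev/null 2>&1; then
            echo "$step"
        fi
        step=$(( step + incr ))
    done
}

# Wrapper around the single-step command used for log3 and log4.
# The input directory is the one from this thread; adjust as needed.
svpost_single_step() {
    svpost.exe -all -indir /home/garan/028_d/288-procs_case/ -outdir . -sn "$1" -ph
}

# Example: failing_steps 2000 3000 100 svpost_single_step
```

Each step printed is one where the single-step command did not finish cleanly, which narrows down which restart files to inspect.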

Thanks (again),

Rodrigo
Attachments
logfiles.zip
four log files from svpost
(175.42 KiB) Downloaded 45 times

David Parker
Posts: 1634
Joined: Tue Aug 23, 2005 2:43 pm

Re: Issue post-processing - svpost

Post by David Parker » Thu Oct 24, 2019 4:33 pm

Hi Rodrigo,

I don't see anything wrong from the scripts. Translating to vtk results files will take up several GB of memory, but the -sn and -ph options should take just a few MB.

When you set ulimit -c unlimited the svpost program will create a file called core someplace, usually in the directory that you ran the svpost command in.

You can use the unix debugger gdb to examine the core file to see where the segfault occurs. Using the command

gdb svpost.exe -c core

brings you into gdb, which should list the program stack and show where the segfault occurred.
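
Put together, the core-dump steps above might look like the following sketch. run_with_core is a hypothetical helper, not part of svpost, and the core file name is an assumption: some systems write core.<pid> or send core files elsewhere.

```shell
# Run a command with core dumps enabled and report whether it segfaulted.
run_with_core() {
    status=0
    (
        # Raise the core size limit for this subshell only. Some nodes
        # forbid raising it, so warn instead of aborting.
        ulimit -c unlimited 2>/dev/null ||
            echo "warning: could not raise the core file size limit" >&2
        "$@"
    ) || status=$?
    # A process killed by SIGSEGV (signal 11) exits with 128 + 11 = 139.
    if [ "$status" -eq 139 ]; then
        echo "segfault: look for a core file in $(pwd)" >&2
    fi
    return "$status"
}

# Example, using a command from this thread:
#   run_with_core svpost.exe -all -indir /home/garan/028_d/288-procs_case/ -outdir . -sn 2600 -ph
# Then print the stack without entering the interactive prompt:
#   gdb svpost.exe -c core -batch -ex bt
```

The -batch and -ex bt options make gdb print the backtrace and exit, which is convenient for pasting the result into a reply.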

The only other thing I can think of is that the compute node you are running svpost on restricts the amount of memory a process can use. Typing ulimit -a will show you the limits for a process.
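
To pick the memory-related entries out of the ulimit listing, something like this sketch may help. describe_kib and show_memory_limits are hypothetical helper names, and the sketch assumes the shell reports ulimit -v and ulimit -d in KiB, as bash does:

```shell
# Format a ulimit value given in KiB as MiB, passing "unlimited" through.
describe_kib() {
    case "$1" in
        unlimited) echo "unlimited" ;;
        *) echo "$(( $1 / 1024 )) MiB" ;;
    esac
}

# Print the per-process limits most likely to affect a large svpost run.
show_memory_limits() {
    echo "virtual memory (ulimit -v): $(describe_kib "$(ulimit -v)")"
    echo "data segment   (ulimit -d): $(describe_kib "$(ulimit -d)")"
    echo "core file size (ulimit -c): $(ulimit -c)"
}

# Example: show_memory_limits
```

If the virtual memory limit is only a few GB, the conversion to vtk files could plausibly hit it; that is worth checking on the compute node itself, since limits can differ from the login node.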

Sorry this is taking so long to figure out!

Cheers,
Dave
