Issue post-processing - svpost

Posted: Tue Oct 15, 2019 12:20 am
by gaetano.a90
Hi,

I ran a simulation with OpenMPI on a cluster with 288 cores. Post-processing stops at a certain time step and does not create any results.
I also tried running the post-processing from the command line on another computer. The problem is the same, although it gets a little further.

Thanks,
Gaetano

Re: Issue post-processing - svpost

Posted: Thu Oct 17, 2019 9:31 pm
by davep
Hi Gaetano,

It might be that svpost is running out of memory. Try using a larger increment (-incr 100) and, if you are using the -vtkcombo flag, drop it.

Without the -vtkcombo flag, svpost should not allocate more memory as it processes each subsequent time step; if it does, that is a bug.

Cheers,
Dave

Re: Issue post-processing - svpost

Posted: Fri Oct 18, 2019 3:29 am
by gaetano.a90
Hi Dave,
I followed both of your suggestions, but it still doesn't work.
I checked on two different clusters.

Cheers, Gaetano

Re: Issue post-processing - svpost

Posted: Fri Oct 18, 2019 8:26 am
by davep
Hi Gaetano,

Can you post the command you are using and the output of svpost? Are you running CentOS on the cluster, or something else?

Cheers,
Dave

Re: Issue post-processing - svpost

Posted: Mon Oct 21, 2019 3:06 am
by gaetano.a90
Dear Dave,
Thank you for the response.
The cluster I am working on has the following characteristics: CentOS 7.4, 288 cores, and the code has been compiled with GCC 6.1.0 and OpenMPI 3.1.1.
The command I am running is as follows:

$ svpost.exe -all -indir /288-procs_case -outdir /export -start 2000 -stop 3000 -incr 100 -vtp 018_d -vtu 018_d

This creates output files for time steps 2000 to 2600, and then I receive a segmentation fault.

This is an example of what I receive:

Reducing (vin plane traction) results : /gpfs/scratch/userexternal/garan000/018_d/288-procs_case/restart.2700.205
Done reading (vin plane traction) results : /gpfs/scratch/userexternal/garan000/018_d/288-procs_case/restart.2700.205
Reducing (vin plane traction) results : /gpfs/scratch/userexternal/garan000/018_d/288-procs_case/restart.2700.206
Done reading (vin plane traction) results : /gpfs/scratch/userexternal/garan000/018_d/288-procs_case/restart.2700.206
Reducing (vin plane traction) results : /gpfs/scratch/userexternal/garan000/018_d/288-procs_case/restart.2700.207
Segmentation fault (core dumped)
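
One way to pin down which restart file was being read when the run stopped is to compare the last "Reducing" line with the last "Done reading" line in the saved output. The following is a minimal sketch, assuming the svpost output was redirected to a file and that the messages look like the excerpt above; the function name and the file name log are only illustrative.

```shell
# Print the last file svpost started reading and the last one it finished.
# Assumes the svpost output was saved to a file (the name is up to you).
last_started_and_finished() {
    logfile="$1"
    # Last "Reducing ..." line: the file being read when the run stopped.
    started=$(grep 'Reducing' "$logfile" | tail -n 1 | awk '{print $NF}')
    # Last "Done reading ..." line: the last file that was read completely.
    finished=$(grep 'Done reading' "$logfile" | tail -n 1 | awk '{print $NF}')
    echo "last started:  $started"
    echo "last finished: $finished"
}

# Example (commented out): last_started_and_finished log
```

If the two paths differ, the file listed as "last started" is the one to inspect first.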

Cheers,
Gaetano

Re: Issue post-processing - svpost

Posted: Mon Oct 21, 2019 11:45 am
by davep
Hi Gaetano,

I've looked in the svpost source to see what might be going on (e.g. a hard-coded value for the maximum number of processors) but did not see anything suspicious.

I've created a new executable that you can download from here https://github.com/ktbolt/cardiovascula ... ter/svpost. I built this on Ubuntu, but I think it might work on CentOS. If not, you can download the svSolver source from here https://github.com/ktbolt/svSolver/tree ... -exception and build it.

Cheers,
Dave

Re: Issue post-processing - svpost

Posted: Wed Oct 23, 2019 5:03 am
by gaetano.a90
Hi,
I tried compiling what you sent me, but I always get the same error.

Cheers,
Gaetano

Re: Issue post-processing - svpost

Posted: Wed Oct 23, 2019 10:04 am
by davep
Hi Gaetano,

I've added some print statements to svpost on https://github.com/ktbolt/svSolver/tree ... -exception. Please download the source and rebuild svpost, then run it and send me all the output.

Can you also try to export just a single results file? Please send me all the output from svpost, because I need to see which geombc.dat files are being scanned.

Another thing to do is to enable core dumps with ulimit -c unlimited. You can then see where the segfault occurs by running gdb svpost.exe -c core.

Cheers,
Dave

Re: Issue post-processing - svpost

Posted: Thu Oct 24, 2019 1:50 am
by rodrigoroma
Dear Dave,

Thanks a lot for your help. I am working with Gaetano to solve this issue. We have compiled the new branch, and I attach the terminal output for each of the tests you asked for. We ran the following on a Debian cluster with the same set of files (in any case, everything compiles perfectly):

log: svpost.exe -all -indir /home/garan/028_d/288-procs_case/ -outdir . -start 2000 -stop 3000 -incr 100 -vtp 028_d -vtu 028_d > log &
log2: svpost.exe -all -indir /home/garan/028_d/288-procs_case/ -outdir . -start 2600 -stop 3000 -incr 100 -vtp 028_d -vtu 028_d > log2 &
log3: svpost.exe -all -indir /home/garan/028_d/288-procs_case/ -outdir . -sn 2600 -ph > log3 &
log4: svpost.exe -all -indir /home/garan/028_d/288-procs_case/ -outdir . -sn 2300 -ph > log4 &

The fourth command is the only one that completes successfully. I am not familiar with debugging with gdb. Could you please send me some more information so I can run the required test?

Thanks (again),

Rodrigo

Re: Issue post-processing - svpost

Posted: Thu Oct 24, 2019 4:33 pm
by davep
Hi Rodrigo,

I don't see anything wrong from the scripts. The translation process to vtk results files will take up several GB of memory but the -sn -ph options should just take a few MB.

When you set ulimit -c unlimited the svpost program will create a file called core someplace, usually in the directory that you ran the svpost command in.

You can use the unix debugger gdb to examine the core file to see where the segfault occurs. Using the command

gdb svpost.exe -c core

brings you into gdb and should list the program stack, showing where the segfault occurred.
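
For anyone new to gdb, the steps can be sketched as a short shell session. This is only a sketch under assumptions: the core file is named core and sits in the current directory (some systems name it core.<pid> or write it elsewhere), and the helper name print_backtrace is made up for illustration. Inside gdb, the bt command prints the call stack; the batch form below does the same without entering the interactive prompt.

```shell
# Allow core files of any size in this shell. This must be set in the same
# shell, before svpost is run, or no core file will be written.
ulimit -c unlimited 2>/dev/null || echo "could not raise the core file size limit"

# Print a backtrace from a core file without entering interactive gdb.
# $1 = executable, $2 = core file; both paths are examples, not fixed names.
print_backtrace() {
    exe="$1"
    core="$2"
    if ! command -v gdb >/dev/null 2>&1; then
        echo "gdb not found" >&2
        return 1
    fi
    if [ ! -f "$core" ]; then
        echo "core file not found: $core" >&2
        return 1
    fi
    # -batch exits after running the commands; "bt" prints the call stack.
    gdb -batch -ex "bt" "$exe" "$core"
}

# Example (commented out): print_backtrace ./svpost.exe core
```

The backtrace is most readable when svpost was built with debug symbols (for example with -g); without them gdb may show only addresses.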

The only other thing I can think of is that the compute node you are running svpost on restricts the amount of memory a process can use. Typing ulimit -a will show you the limits for a process.
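
A quick check of those limits could look like the sketch below. It assumes a POSIX-style shell where ulimit -v reports the virtual memory limit in kilobytes; the helper name report_vmem_limit is made up for illustration. Limits on a compute node can differ from those on a login node, so it is worth running this in the same environment where svpost runs.

```shell
# Show all per-process limits that apply in the current shell.
ulimit -a

# Report the virtual memory limit; "unlimited" means ulimit imposes no cap.
report_vmem_limit() {
    limit=$(ulimit -v)
    if [ "$limit" = "unlimited" ]; then
        echo "virtual memory: unlimited"
    else
        # ulimit -v reports kilobytes; convert to MB for readability.
        echo "virtual memory: $((limit / 1024)) MB"
    fi
}

report_vmem_limit
```

If the reported limit is smaller than the several GB the vtk translation needs, that alone could explain a crash partway through the run.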

Sorry this is taking so long to figure out!

Cheers,
Dave