Automating Batch Simulations

David Parker
Posts: 1716
Joined: Tue Aug 23, 2005 2:43 pm

Re: Automating Batch Simulations

Post by David Parker » Wed Oct 21, 2020 11:38 am

Hi Shannen,

Running svpost with the following options will write all of the results to a single file.

Code: Select all

    -vtkcombo \
    -vtu all_results.vtu  \
    -vtp all_results.vtp
However, it is often easier to visualize results when they are written to multiple files; if that produces too many files, use a larger -incr value to write out fewer time steps.

To output OSI and TAWSS results, set the -start value to be non-zero, e.g. the number of the first restart file after zero.
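For example, a complete command might look like the following. This is only a sketch; the -start, -stop, and -incr values are illustrative and should be adjusted to match your own restart file numbers.

Code: Select all

svpost -all -indir ./ -outdir ./ -start 10 -stop 100 -incr 10 -vtkcombo -vtu all_results.vtu -vtp all_results.vtp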

Cheers,
Dave

Shannen Kizilski
Posts: 17
Joined: Thu Dec 08, 2016 3:39 pm

Re: Automating Batch Simulations

Post by Shannen Kizilski » Thu Oct 22, 2020 10:18 am

Hi Dave,

Thanks, I'm now able to convert the results to a single file! I want to plot pressure vs. time for a periodic flow simulation, so I need data at small time increments to capture the full curve without losing much information.

Another issue I'm encountering with svpost is that it often hits a segmentation fault at some point if I try to convert anything other than just -sol.

I ran a simulation for 250 timesteps and made a restart file for each step. I can see in my folder that I have restart.n.1 for n=0 to 250. But if I run svpost in that folder with:

Code: Select all

svpost -all -start 1 -stop 250 -incr 1 -indir ./ -outdir ./ -vtkcombo -vtu all_results.vtu -vtp all_results.vtp
I get the output:

Code: Select all

Will reduce all available data
Opening .//restart.1.1 to scan for number of variables...
Opening .//geombc.dat.1 to scan for number of processors and global modes...
Reducing : .//geombc.dat.1
Number of solution variables found (5)
Number of nodes found (75824)
Number of elements found (428442)
Number of processors used (1)
Create vtkPoints with 75824 nodes.
Create tets for 428442 elements.
Reducing : wall properties from .//geombc.dat.1
No wall properties found from .//geombc.dat.1. Will search from restart files.
Reducing (solution) results : .//restart.1.1
Done reading (solution) results : .//restart.1.1
Reducing (time derivative of solution) results : .//restart.1.1
Done reading (time derivative of solution) results : .//restart.1.1
Reducing (displacement) results : .//restart.1.1
NOTE: No (displacement) in .//restart.1.1
Reducing (varwallprop) results : .//restart.0.1
NOTE: No (varwallprop) in .//restart.0.1
Reducing (vin plane traction) results : .//restart.1.1
Done reading (vin plane traction) results : .//restart.1.1
Reducing (rin plane traction) results : .//restart.1.1
Done reading (rin plane traction) results : .//restart.1.1
Reducing (vwall shear stresses) results : .//restart.1.1
Done reading (vwall shear stresses) results : .//restart.1.1
Reducing (rwall shear stresses) results : .//restart.1.1
Done reading (rwall shear stresses) results : .//restart.1.1
Reducing (ybar) results : .//restart.1.1
NOTE: No (ybar) in .//restart.1.1
No ybar in step 1
Reducing (average speed) results : .//restart.1.1
Done reading (average speed) results : .//restart.1.1
Reducing (average pressure) results : .//restart.1.1
Done reading (average pressure) results : .//restart.1.1
Reducing (speed error) results : .//restart.1.1
NOTE: No (speed error) in .//restart.1.1
Reducing (pressure error) results : .//restart.1.1
NOTE: No (pressure error) in .//restart.1.1
Reducing (solution) results : .//restart.2.1
Done reading (solution) results : .//restart.2.1
Reducing (time derivative of solution) results : .//restart.2.1
Done reading (time derivative of solution) results : .//restart.2.1
Reducing (displacement) results : .//restart.2.1
NOTE: No (displacement) in .//restart.2.1
Reducing (vin plane traction) results : .//restart.2.1
Done reading (vin plane traction) results : .//restart.2.1
Reducing (rin plane traction) results : .//restart.2.1
Done reading (rin plane traction) results : .//restart.2.1
Reducing (vwall shear stresses) results : .//restart.2.1
Done reading (vwall shear stresses) results : .//restart.2.1
Reducing (rwall shear stresses) results : .//restart.2.1
Done reading (rwall shear stresses) results : .//restart.2.1
Reducing (ybar) results : .//restart.2.1
NOTE: No (ybar) in .//restart.2.1
No ybar in step 2
Reducing (average speed) results : .//restart.2.1
Done reading (average speed) results : .//restart.2.1
Reducing (average pressure) results : .//restart.2.1
Done reading (average pressure) results : .//restart.2.1
Reducing (speed error) results : .//restart.2.1
NOTE: No (speed error) in .//restart.2.1
Reducing (pressure error) results : .//restart.2.1
NOTE: No (pressure error) in .//restart.2.1
Reducing (solution) results : .//restart.3.1
Done reading (solution) results : .//restart.3.1
Reducing (time derivative of solution) results : .//restart.3.1
Done reading (time derivative of solution) results : .//restart.3.1
Reducing (displacement) results : .//restart.3.1
NOTE: No (displacement) in .//restart.3.1
Reducing (vin plane traction) results : .//restart.3.1
Done reading (vin plane traction) results : .//restart.3.1
Reducing (rin plane traction) results : .//restart.3.1
Done reading (rin plane traction) results : .//restart.3.1
Reducing (vwall shear stresses) results : .//restart.3.1
Done reading (vwall shear stresses) results : .//restart.3.1
Reducing (rwall shear stresses) results : .//restart.3.1
Done reading (rwall shear stresses) results : .//restart.3.1
Reducing (ybar) results : .//restart.3.1
NOTE: No (ybar) in .//restart.3.1
No ybar in step 3
Reducing (average speed) results : .//restart.3.1
Done reading (average speed) results : .//restart.3.1
Reducing (average pressure) results : .//restart.3.1
Done reading (average pressure) results : .//restart.3.1
Reducing (speed error) results : .//restart.3.1
NOTE: No (speed error) in .//restart.3.1
Reducing (pressure error) results : .//restart.3.1
NOTE: No (pressure error) in .//restart.3.1
Reducing (solution) results : .//restart.4.1
Done reading (solution) results : .//restart.4.1
Reducing (time derivative of solution) results : .//restart.4.1
Done reading (time derivative of solution) results : .//restart.4.1
Reducing (displacement) results : .//restart.4.1
NOTE: No (displacement) in .//restart.4.1
Reducing (vin plane traction) results : .//restart.4.1
Done reading (vin plane traction) results : .//restart.4.1
Reducing (rin plane traction) results : .//restart.4.1
NOTE: No (rin plane traction) in .//restart.4.1
Reducing (vwall shear stresses) results : .//restart.4.1
NOTE: No (vwall shear stresses) in .//restart.4.1
Reducing (rwall shear stresses) results : .//restart.4.1
NOTE: No (rwall shear stresses) in .//restart.4.1
Reducing (ybar) results : .//restart.4.1
NOTE: No (ybar) in .//restart.4.1
No ybar in step 4
Reducing (average speed) results : .//restart.4.1
NOTE: No (average speed) in .//restart.4.1
Reducing (average pressure) results : .//restart.4.1
NOTE: No (average pressure) in .//restart.4.1
Reducing (speed error) results : .//restart.4.1
NOTE: No (speed error) in .//restart.4.1
Reducing (pressure error) results : .//restart.4.1
NOTE: No (pressure error) in .//restart.4.1
Reducing (solution) results : .//restart.5.1
Done reading (solution) results : .//restart.5.1
Reducing (time derivative of solution) results : .//restart.5.1
Done reading (time derivative of solution) results : .//restart.5.1
Reducing (displacement) results : .//restart.5.1
/usr/local/bin/svpost: line 58:  3490 Segmentation fault      (core dumped) $SV_HOME/bin/svpost $*
If I run without the -all option, I get the same error. In a previous simulation I ran, the same error would happen during restart.3.1, so it's not happening only at a certain time step.

Does this error mean that an error occurred during the simulation when the restart files were being generated? Is there a way to inspect the restart files to see if they are corrupted or if certain data are missing from them? The pressure/flow data look reasonable when I convert with the -sol option only, but I also need the WSS and OSI values for my project.

Thanks,

Shannen

David Parker
Posts: 1716
Joined: Tue Aug 23, 2005 2:43 pm

Re: Automating Batch Simulations

Post by David Parker » Mon Oct 26, 2020 11:26 am

Hi Shannen,

I think you are correct; the svpost failure seems to be caused by an error occurring during the simulation when a restart file is being generated. On an HPC cluster this could happen if one of the multi-process jobs is killed, but you seem to be running on a single processor, so that is not the case here.

I don't know of a way to check the restart files directly; you could check whether any of them are a lot smaller than the rest.
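For example, sorting the restart files by size should make a truncated one stand out (a sketch, assuming you run it from the simulation directory):

Code: Select all

# list restart files largest first; truncated files end up at the bottom
ls -lS restart.*.1 | tail -5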

Be careful when rerunning simulations in the same directory: be sure to remove the old geombc, restart, and numstart files.
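For example (run from inside the simulation directory, and double-check the patterns before deleting; numstart.dat is the usual file name, and you will need to re-run svpre afterwards to regenerate the initial restart.0.1 and geombc files):

Code: Select all

# remove outputs from the previous run before starting a new one
rm -f restart.*.* geombc.dat.* numstart.dat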

If you upload your Simulation directory someplace I can download it I'll have a look.

Cheers,
Dave

Shannen Kizilski
Posts: 17
Joined: Thu Dec 08, 2016 3:39 pm

Re: Automating Batch Simulations

Post by Shannen Kizilski » Mon Oct 26, 2020 12:18 pm

Dave,

I am running the simulations on my personal computer, but I have been using mpiexec to run the code due to issues with the non-MPI solver discussed in a previous thread.
How does svpost detect how many processors were used to run a job? Do I need to specify something about the number of processors in the svpre or input files for everything to be run and saved correctly?
I have noticed that when I run my simulation via the following command:

Code: Select all

mpiexec -np 4 svsolver
It starts with the following output text:

Code: Select all

The process ID for myrank (0) is (16940).


The number of processes is 1.


The process ID for myrank (0) is (16947).


The number of processes is 1.


The process ID for myrank (0) is (16951).


The number of processes is 1.


The process ID for myrank (0) is (16952).


The number of processes is 1.

Solver Input Files listed as below:
------------------------------------
 Local Config: solver.inp 
Solver Input Files listed as below:
------------------------------------
 Local Config: solver.inp 
Solver Input Files listed as below:
------------------------------------
 Local Config: solver.inp 
Solver Input Files listed as below:
------------------------------------
 Local Config: solver.inp 
 Default Input File: Not Setup.
It's odd to me that all of the processes claim to be rank 0, but it proceeds to run the simulation to completion and give the expected pressure and flow results, so it can only be so wrong...

Anyway, one of my simulation folders is available here, with the restart files that fail to convert starting at the 5th timestep: https://drive.google.com/drive/folders/ ... sp=sharing

Let me know if you figure anything out!

Thank you,

Shannen

David Parker
Posts: 1716
Joined: Tue Aug 23, 2005 2:43 pm

Re: Automating Batch Simulations

Post by David Parker » Tue Oct 27, 2020 2:35 pm

Hi Shannen,

The current version of svpost does not report any errors that it encounters when converting the restart files. I've added error checking and recovery so that it does not fail when encountering a corrupted restart file.

Running the new svpost, I see

Code: Select all

Reducing (displacement) results : .//restart.5.1
ERROR parsing header: Unexpected end of line. 
Reducing (vin plane traction) results : .//restart.5.1
ERROR parsing header: Unexpected end of line. 
This means that results were not correctly written to these restart files. It is not clear how this can happen running on a PC, unless the solver was somehow interrupted and restarted. I was able to convert your files using -start 20.
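That is, your command from above with the start value moved past the corrupted steps:

Code: Select all

svpost -all -start 20 -stop 250 -incr 1 -indir ./ -outdir ./ -vtkcombo -vtu all_results.vtu -vtp all_results.vtp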

Note that people typically don't write simulation results for every time step; writing every 20 or 50 steps is common, depending on the length (the number of time steps) of the simulation.

When svsolver starts, it asks MPI how many processors were specified (the -np parameter). For some reason MPI thinks that you are running four svsolver programs, each using a single processor, which is curious. What OS are you running? If you have ParaView installed, make sure you are not using its version of mpiexec. Running

Code: Select all

which mpiexec
will show you where mpiexec is being found.

The MPI problem might explain the corrupted restart files if you had four versions of svsolver writing to the same file.

Cheers,
Dave

Shannen Kizilski
Posts: 17
Joined: Thu Dec 08, 2016 3:39 pm

Re: Automating Batch Simulations

Post by Shannen Kizilski » Tue Oct 27, 2020 3:36 pm

Hi Dave,

Thank you for your patience while troubleshooting these errors with me.

I'm running Ubuntu 18.04.5 LTS, 64 bit.
My output for which mpiexec is: /usr/bin/mpiexec
Could I be missing any supporting libraries that help svsolver to communicate with MPI?

Do you think that writing restarts for every step is causing the system to trip up and make mistakes when writing the files? Honestly, I don't understand how other people are content with seeing such infrequent snippets of their data. For example, by converting every step (with a timestep of 0.004s), I can plot pressure versus time at the inlet:
[pressure_allRestarts.png: inlet pressure vs. time, converted at every restart]
But if I only converted every 20th step, I would lose a ton of information about the peak pressure and the overall shape of the pressure wave:
[pressure_every20restarts.png: inlet pressure vs. time, converted at every 20th restart]
For my project, I want to know the range and spread of values like pressure and WSS, so I need data with fine enough time resolution that I don't mistakenly conclude the minimum pressure is 90 mmHg instead of 65 mmHg, as in the example above. Is there an alternative to "converting results" that would let me extract just summarized data at more time steps, so the full spatial mesh of results could be converted only for the times when I want to visualize fields? I could convert every 20th timestep if I ran the simulation with a 20x finer stepsize, but that doesn't sound very efficient.

Thank you again,

Shannen

Daniel Emerson
Posts: 12
Joined: Fri Jul 17, 2020 4:33 pm

Re: Automating Batch Simulations

Post by Daniel Emerson » Wed Oct 28, 2020 9:08 am

Shannen,

I don't know if this will help, but I was having issues due to the version of MPI called by my Linux system when using the "mpiexec" command. Beyond the "which mpiexec" command, some of the following might help.

In my case, my system was using openmpi. I found this out with the following command:

Code: Select all

mpiexec --version
In my case this gave the following output:

Code: Select all

mpiexec (OpenRTE) 2.1.1

Report bugs to http://www.open-mpi.org/community/help/
To see where MPICH was installed, I entered the following command:

Code: Select all

locate mpiexec
And I received the output below:

Code: Select all

/etc/alternatives/mpiexec
/etc/alternatives/mpiexec.1.gz
/home/tong/Software/MATLAB_19/bin/mw_mpiexec
/home/tong/Software/MATLAB_19/bin/glnxa64/mpiexec
/home/tong/Software/MATLAB_19/bin/glnxa64/mpiexec.hydra
/home/tong/Software/MATLAB_19/bin/glnxa64/mpiexec2
/usr/bin/mpiexec
/usr/bin/mpiexec.hydra
/usr/bin/mpiexec.mpich
/usr/bin/mpiexec.openmpi
/usr/local/sv/svsolver/2019-02-07/bin/mpiexec
/usr/local/sv/svsolver/2019-02-07/bin/mpiexec.hydra
/usr/local/sv/svsolver/2019-02-07/bin/mpiexec.mpich
/usr/share/man/man1/mpiexec.1.gz
/usr/share/man/man1/mpiexec.hydra.1.gz
/usr/share/man/man1/mpiexec.mpich.1.gz
/usr/share/man/man1/mpiexec.openmpi.1.gz
Looks like the MPICH installation is available as

Code: Select all

mpiexec.mpich
This is how I call svSolver now without issue:

Code: Select all

mpiexec.mpich -n 24 /usr/local/sv/svsolver/2019-02-07/svsolver
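Since /etc/alternatives/mpiexec shows up in the list above, you can probably also switch the system-wide default through Debian/Ubuntu's alternatives mechanism instead of spelling out mpiexec.mpich each time. I haven't tried this myself, and the group name mpirun (with mpiexec linked to it) is an assumption, so inspect it before changing anything:

Code: Select all

# show which MPI the alternatives system currently points at
update-alternatives --display mpirun

# interactively select the MPICH variant as the default
sudo update-alternatives --config mpirun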
Hope this helps,
Dan

Shannen Kizilski
Posts: 17
Joined: Thu Dec 08, 2016 3:39 pm

Re: Automating Batch Simulations

Post by Shannen Kizilski » Wed Oct 28, 2020 10:27 am

Dan,

Thank you for that suggestion! I ran the commands to learn more about my MPI version, and I have the same situation as you: OpenRTE, and then a list of several mpiexec versions. In the GUI, I point SimVascular to /usr/bin/mpiexec.mpich, so it makes sense that I need to tell it to do the same when running through the command line.

I'm running a simulation now with mpiexec.mpich, and I'm optimistic that the problem is resolved because the simulation now starts with:

Code: Select all

The process ID for myrank (0) is (3689).


The number of processes is 4.


The process ID for myrank (1) is (3688).


The process ID for myrank (2) is (3691).


The process ID for myrank (3) is (3690).
instead of all of the processes being rank 0.

I'll follow up if I still have issues with converting the results, but I think this might've solved it.

Thanks again Dan and Dave!

- Shannen

David Parker
Posts: 1716
Joined: Tue Aug 23, 2005 2:43 pm

Re: Automating Batch Simulations

Post by David Parker » Wed Oct 28, 2020 10:48 am

Hi Dan,

Having both OpenMPI and MPICH installed does indeed cause problems. Thanks for posting the solution!

I've created a program to compute the average flow files; you can find it at https://github.com/ktbolt/cardiovascula ... flow-files. Let me know if it works for you.

Cheers,
Dave

David Parker
Posts: 1716
Joined: Tue Aug 23, 2005 2:43 pm

Re: Automating Batch Simulations

Post by David Parker » Wed Oct 28, 2020 10:57 am

Hi Shannen,

The frequency of writing restart files depends on what your simulation is trying to capture and what your time step is relative to that. Since the time step you are using is large compared to what you are simulating, you are correct that it makes sense to write results frequently.

Writing results every time step should not cause problems. I think running the job as Dan suggested will solve the problem with corrupt restart files.

Cheers,
Dave
