Hi Julian,
Which svSolver build are you using, the one for Ubuntu 18 or for Ubuntu 20?
It's not clear to me what is happening. When running multiple jobs, be careful that each job starts from a clean simulations directory and writes its results to a different location (i.e. a separate directory); a rough sketch of what I mean follows.
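Something along these lines is what I have in mind (purely illustrative; the job names, input file names, and solver path are placeholders for whatever your setup actually uses):

Code: Select all

# illustrative only: give each job its own clean directory and its own results
SVSOLVER=/usr/local/sv/svsolver/2021-09-30/svsolver   # adjust to your install
for job in job_a job_b; do
    rm -rf "$job" && mkdir -p "$job"      # start each job from a clean directory
    cp solver.inp *.dat "$job"/           # solver input files (names are placeholders)
    (cd "$job" && mpiexec -np 12 "$SVSOLVER" > solver.log 2>&1)
done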
If you can upload your simulations directory, including your bash script, I'll see if I can reproduce the problem.
Cheers,
Dave
- Julian Suk
- Posts: 4
- Joined: Fri Nov 06, 2020 12:42 am
Re: Jump in Wss contour
Hi Dave,
I am using svSolver for Ubuntu 20.04 on both my machines. For multiple jobs, I create a clean simulations directory and write to different locations.
I have been able to work around the issue by using SingularityCE and an image I found online (https://hub.docker.com/r/unlhcc/simvascular), which contains an old version of svSolver and the matching MPI version, and by substituting the solver command as follows:
Code: Select all
# original command (svsolver 2021-09-30 launched with orterun), now commented out:
#/usr/bin/orterun -np 12 /usr/local/sv/svsolver/2021-09-30/svsolver
# replacement: run the old svsolver inside the Singularity container with its matching MPI
singularity exec --bind $PWD /home/sukjm/simvascular_latest.sif /usr/bin/mpiexec.hydra -np 12 /usr/local/sv/svsolver/2019-02-07/svsolver
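(In case it is useful: assuming the image was pulled directly from Docker Hub, the SIF file used above can be obtained with something like

Code: Select all

# assumed workflow: pull the Docker Hub image into a local SIF file
# (by default this produces simvascular_latest.sif)
singularity pull docker://unlhcc/simvascular

though the exact tag and resulting file name may differ.)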
I have uploaded a portable version of my software project (https://drive.google.com/file/d/1pQ9TVo ... sp=sharing) that reproduces the error for me. I am not sure whether it is feasible for you to debug this, since the failed write only occurs after about 4 hours of simulation on multiple processors. The server I use for these simulations runs at only 33% CPU utilisation even with 24 parallel processes and had more than 100 GB of free RAM at all times, so I suspect that write processes are interfering with each other, or something similar.
Since it works for me with the old combination of MPI and svSolver, it would be too much to ask you to track down the error, but if you have any intuition about what is going wrong, I would appreciate hearing it. I am happy to try things out in my code, since I am running these simulations now anyway!
Thanks a lot,
Julian
- David Parker
- Posts: 1757
- Joined: Tue Aug 23, 2005 2:43 pm
Re: Jump in Wss contour
Hi Julian,
Interesting, I didn't know someone had built a Docker container for SV. I am also planning on creating containers for svSolver some time in the future. Also interesting that the failure does not occur with the older svSolver version.
There doesn't seem to be anything obviously wrong with your bash scripts; they run the MPI jobs one after the other in separate directories.
Since the jobs only appear to fail after four hours, I can only guess that there is some sort of resource failure; I'm not sure what that could be (time, memory, or disk space). You can try piping stdout and stderr to a file and seeing if there is anything suspicious there, for example:
Code: Select all
# capture the solver's stdout and stderr in svsolver.log
mpiexec -np $processes "$svsolver" &> svsolver.log
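If a resource really is running out, a rough sketch like the one below (the 10-minute interval and the file name are arbitrary choices on my part) would also log free memory and disk space alongside the run, so you can see what the machine looked like just before the failure:

Code: Select all

# sketch: record free memory and disk space every 10 minutes while the solver runs
( while true; do date; free -h; df -h .; echo; sleep 600; done ) > resources.log 2>&1 &
MONITOR_PID=$!
mpiexec -np $processes "$svsolver" &> svsolver.log
kill "$MONITOR_PID"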
Cheers,
Dave