
Performance between svSolver and svFSIPlus

Posted: Tue Aug 27, 2024 9:55 am
by ibartol
Good morning,

I have a question regarding the performance of svSolver versus svFSIPlus. I was using svSolver and have now migrated to svFSIPlus, and I noticed a downgrade in the performance of the solver, so I am unsure whether I am doing something wrong in compiling it or in the way I am executing the files. Basically, I've replicated the Coronary Normal case that I was solving with svSolver in svFSIPlus. The svFSIPlus files are attached:
svFSIPlus.zip (svFSIPlus input files)
Just for comparison, I ran both solvers for 20 seconds: svFSIPlus completed 5 time steps while svSolver reached 69.

Code:

timeout 20s mpiexec -np 48 "svFSI" svFSI.xml

Code:

timeout 20s mpiexec -np 48 "svSolver"
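As an alternative to the timeout comparison, it might be fairer to fix the number of time steps in each input and time the full runs; a minimal sketch (assuming Number_of_time_steps in svFSI.xml and "Number of Timesteps" in solver.inp have been set to the same value beforehand):

Code:

# Compare total wall time for an identical, fixed number of time steps.
time mpiexec -np 48 svFSI svFSI.xml
time mpiexec -np 48 svSolver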
I've also tried running with

Code:

mpirun

but no luck. I am using svFSIPlus and svSolver compiled from source with a local VTK 8.2.0 build. This is the stack of modules I am using in the HPC environment:

Code:

Currently Loaded Modules:
  1) gcc/10.5.0-gcc-8.5.0-5heb         4) curl/7.61.1-gcc-12.3.0-5t73   7) cmake/3.27.7-gcc-12.3.0-5cfk
  2) openmpi/4.1.5_ucx1.14.1           5) ncurses/6.1-gcc-12.3.0-7zps
  3) openblas/0.3.24-gcc-12.3.0-ts56   6) zlib/1.2.13-gcc-12.3.0-c2ww
I know the MPI and CMake versions are a bit outdated; do you think that could be causing the problem?
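One extra sanity check I plan to do, since the module stack mixes GCC 8/10/12 toolchains, is to confirm that the svFSI binary actually resolves to the loaded OpenMPI and OpenBLAS at run time, e.g.:

Code:

# Verify which MPI launcher and libraries the binary picks up.
which mpiexec && mpiexec --version
ldd "$(which svFSI)" | grep -i -e mpi -e blas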

OS info:
LSB Version: :core-4.1-amd64:core-4.1-noarch
Distributor ID: Rocky
Description: Rocky Linux release 8.10 (Green Obsidian)
Release: 8.10
Codename: GreenObsidian

Thanks a lot!

Re: Performance between svSolver and svFSIPlus

Posted: Tue Aug 27, 2024 1:06 pm
by davep
Hello,

I ran your svFSIplus example and svSolver Coronary pulsatile_sim simulations on Ubuntu.

- The svFSIplus solver does indeed run roughly 10x slower, but it is interfacing with svZeroDSolver.

- svSolver is using the RCR and coronary boundary conditions it supports natively.

An interesting comparison, though; it seems svZeroDSolver is doing a lot of work.

I will investigate.

Cheers,
Dave

Re: Performance between svSolver and svFSIPlus

Posted: Tue Aug 27, 2024 2:06 pm
by ibartol
Hi David,

Thanks a lot for your quick response! Just an update on this:
- I also tried running with some generic RCR boundary conditions specified directly in the XML file (not coupled with svZeroDSolver), and that does not seem to improve things much. I'll attach the testing XML file in case you want to take a look:
svFSI.xml (RCR test input)
If you don't mind, I have another question: I saw that there is an option for scalar transport in the old svSolver. Is it possible to use it?

I saw that there is an option that can be added to the solver.inp file,

Code:

Solve Scalars : 1           # nsclrS total number of scalars must be <=4

but I am unsure how to add the BCs for that equation. If you happen to have a very basic working example of this, it would be very much appreciated.

Thanks a lot again!

Best,
Ignacio

Re: Performance between svSolver and svFSIPlus

Posted: Wed Aug 28, 2024 4:53 pm
by davep
Hi Ignacio,

Thanks for the new XML file!

I ran svFSIplus using the new XML file and do see that it runs 6x slower than svSolver. The tests we have done comparing svFSIplus and svSolver show that they normally run in about the same time.

Because svFSIplus does not support a coronary boundary condition, I am assuming that these two simulations are solving two different problems. It would be interesting to use the same simple resistance boundary conditions for both simulations and compare the execution times. This could uncover an issue with the svFSIplus linear solver, or it might indicate that a better preconditioner via Trilinos or PETSc is needed.
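If it comes to trying Trilinos, svFSIplus would need to be rebuilt with Trilinos enabled; something along these lines, where the exact CMake option name is an assumption and should be checked against the build documentation:

Code:

# Hypothetical rebuild with Trilinos support; verify the option name
# (e.g. SV_USE_TRILINOS) against the svFSIplus CMake files.
cmake -DSV_USE_TRILINOS=ON ..
make -j8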

"I saw that there is an option for scalar transport in the old svSolver. Is it possible to use it?"

svSolver does have code to solve advection-diffusion, but there is no documentation describing how to use it that I know of, and no one here has used it. I suggest using svFSIplus.

Cheers,
Dave

Re: Performance between svSolver and svFSIPlus

Posted: Thu Aug 29, 2024 8:01 am
by ibartol
Hi David,

Again, thanks a lot for taking the time to investigate this. I will take a deeper look into it today; I am unsure what is going wrong with my solver times, but I am seeing slowdowns considerably worse than 6x. I know what I will say next is not a fair comparison, but I just finished my scalar+fluid simulation yesterday. When I was solving only the fluid for the coronary geometry (default parameters in the tutorial) in svSolver, using 96 procs (2 nodes, 2 Intel Xeon 8268 Cascade Lake CPUs per node, 24 cores per CPU @ 2.90 GHz), it took around ~15 min for 6E3 external iterations.

But now that I have switched to svFSIPlus coupled with the HF (heatF) equation, the same computational power took ~60 hrs to finish. So I did a "scalability test" using 4, 12, 24, and 48 cores on a cylinder geometry with just a resistance BC and 1000 time steps, then averaged the number of time steps svFSIPlus solved per minute; see the figure below:
scalability.png (scalability test: average svFSIPlus time steps per minute vs. number of cores)
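A rough way to reproduce a steps-per-minute figure like this, assuming the last completed time-step index can be read from the end of the solver log (the exact log format may vary by version):

Code:

# Run for a fixed wall time, then read the last completed step from the log.
timeout 600s mpiexec -np 24 svFSI svFSI.xml > run.log
tail run.log   # note the last time-step index N; steps/min = N / 10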
So it seems to scale well with the number of processors. Yesterday I went ahead and submitted the same simulation I first attached (with the coronary BC) using 144 procs (3 nodes); so far it has run ~16 hrs and has reached 4.5E3 time steps. I plotted the solve time of each time step for each solver (NS and HF), and there does not seem to be a gap between the two; the HF loop usually takes <1 s to solve. Also, all the threads are carrying a balanced load; it does not seem that any processor is clogging the multi-threading ops, as happened to me before.
timeHF-NS-svFSI.png (per-time-step solve times for the NS and HF equations in svFSIPlus)
Lastly, I did some profiling using

Code:

valgrind --tool=callgrind

on a single thread (on a cylinder case). Taking a quick look, I didn't find anything suspicious, although I didn't do a deep dive into the profiling results. I am attaching the results for the functions that are taking >1% of the total instruction count. I will try changing the preconditioner, as that may be taking some extra time.
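For anyone wanting to reproduce the per-function table below, it is the kind of summary the standard callgrind tooling produces, roughly:

Code:

# Profile a single-rank run, then summarize instruction counts by function.
valgrind --tool=callgrind ./svFSI svFSI.xml
callgrind_annotate callgrind.out.* | head -30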

I will do a deeper analysis today and report back if I find anything interesting. If you can let me know which MPI version you are using for svFSIPlus, that would be great.

Code:

--------------------------------------------------------------------------------
Ir                      
--------------------------------------------------------------------------------
95,404,227,520 (100.0%)  PROGRAM TOTALS

--------------------------------------------------------------------------------
Ir                       file:function
--------------------------------------------------------------------------------
25,481,833,114 (26.71%)  /home/ibartol/svFSIplus-package/svFSIplus/Code/Source/svFSILS/omp_la.cpp:omp_la::omp_sum_v(int, int, double, Array<double>&, Array<double> const&) [/home/ibartol/svFSIplus-package/build/svFSI-build/bin/svFSI]
21,845,359,360 (22.90%)  /home/ibartol/svFSIplus-package/svFSIplus/Code/Source/svFSILS/dot.cpp:dot::fsils_nc_dot_v(int, int, Array<double> const&, Array<double> const&) [/home/ibartol/svFSIplus-package/build/svFSI-build/bin/svFSI]
 8,385,359,946 ( 8.79%)  /home/ibartol/svFSIplus-package/svFSIplus/Code/Source/svFSILS/spar_mul.cpp:spar_mul::fsils_spar_mul_vv(fsi_linear_solver::FSILS_lhsType&, Array<int> const&, Vector<int> const&, int, Array<double> const&, Array<double> const&, Array<double>&) [/home/ibartol/svFSIplus-package/build/svFSI-build/bin/svFSI]
 7,021,782,083 ( 7.36%)  /home/ibartol/svFSIplus-package/svFSIplus/Code/Source/svFSI/fluid.cpp:fluid::fluid_3d_m(ComMod&, int, int, int, double, Array<double> const&, Vector<double> const&, Vector<double> const&, Array<double> const&, Array<double> const&, Array<double> const&, Array<double> const&, Array<double> const&, Array<double> const&, Array<double>&, Array3<double>&) [/home/ibartol/svFSIplus-package/build/svFSI-build/bin/svFSI]
 3,740,615,368 ( 3.92%)  /home/ibartol/svFSIplus-package/svFSIplus/Code/Source/svFSI/fluid.cpp:fluid::fluid_3d_c(ComMod&, int, int, int, double, Array<double> const&, Vector<double> const&, Vector<double> const&, Array<double> const&, Array<double> const&, Array<double> const&, Array<double> const&, Array<double> const&, Array<double> const&, Array<double>&, Array3<double>&) [/home/ibartol/svFSIplus-package/build/svFSI-build/bin/svFSI]
 3,388,837,579 ( 3.55%)  /home/ibartol/svFSIplus-package/svFSIplus/Code/Source/svFSI/heatf.cpp:heatf::heatf_3d(ComMod&, int, double, Vector<double> const&, Array<double> const&, Array<double> const&, Array<double> const&, Array<double> const&, Array<double>&, Array3<double>&) [/home/ibartol/svFSIplus-package/build/svFSI-build/bin/svFSI]
 2,202,734,458 ( 2.31%)  ./malloc/./malloc/malloc.c:_int_free [/usr/lib/x86_64-linux-gnu/libc.so.6]
 1,766,583,006 ( 1.85%)  /home/ibartol/svFSIplus-package/svFSIplus/Code/Source/svFSI/Array.h:spar_mul::fsils_spar_mul_vv(fsi_linear_solver::FSILS_lhsType&, Array<int> const&, Vector<int> const&, int, Array<double> const&, Array<double> const&, Array<double>&)
 1,680,982,663 ( 1.76%)  /home/ibartol/svFSIplus-package/svFSIplus/Code/Source/svFSI/Array.h:gmres::gmres_s(fsi_linear_solver::FSILS_lhsType&, fsi_linear_solver::FSILS_subLsType&, int, Vector<double> const&, Vector<double>&)
 1,605,408,588 ( 1.68%)  /home/ibartol/svFSIplus-package/svFSIplus/Code/Source/svFSI/lhsa.cpp:lhsa_ns::do_assem(ComMod&, int, Vector<int> const&, Array3<double> const&, Array<double> const&) [/home/ibartol/svFSIplus-package/build/svFSI-build/bin/svFSI]
 1,477,185,778 ( 1.55%)  ./malloc/./malloc/malloc.c:malloc [/usr/lib/x86_64-linux-gnu/libc.so.6]
 1,324,011,215 ( 1.39%)  ./string/../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:__memset_sse2_unaligned [/usr/lib/x86_64-linux-gnu/libc.so.6]
 1,217,267,390 ( 1.28%)  /home/ibartol/svFSIplus-package/svFSIplus/Code/Source/svFSI/nn.cpp:nn::gnn(int, int, int, Array<double>&, Array<double>&, Array<double>&, double&, Array<double>&) [/home/ibartol/svFSIplus-package/build/svFSI-build/bin/svFSI]
   755,234,208 ( 0.79%)  ./malloc/./malloc/malloc.c:free [/usr/lib/x86_64-linux-gnu/libc.so.6]
Thanks a lot in advance!

Best,
Ignacio

Re: Performance between svSolver and svFSIPlus

Posted: Fri Aug 30, 2024 12:24 pm
by davep
Hi Ignacio,

Thanks for the detailed analysis!

We build with OpenMPI. On Ubuntu I need to set OMP_NUM_THREADS=1 when running with MPI.
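For example:

Code:

# One OpenMP thread per MPI rank avoids oversubscribing cores.
export OMP_NUM_THREADS=1
mpiexec -np 4 svFSI svFSI.xml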

I'm not sure what might be going on with the compute times. I would test running with a single processor for 10 time steps and see what the run time is, and then run using four processors for 40 time steps. The problem is not that large, so be careful about using too many processors, or communication will become a bottleneck.
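Something like this, editing the step count in svFSI.xml before each run:

Code:

# Baseline: 1 rank, 10 time steps.
time mpiexec -np 1 svFSI svFSI.xml
# Then 4 ranks, 40 time steps; compare wall time per time step.
time mpiexec -np 4 svFSI svFSI.xml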

I am going to set up and run coronary simulations with resistance boundary conditions for svFSIplus and svSolver.

Cheers,
Dave