Multicore parallel computing with Moco
- Brian Umberger
- Posts: 48
- Joined: Tue Aug 28, 2007 2:03 pm
Multicore parallel computing with Moco
Hi All,
Over the past couple of years, there have been some discussions about parallel computing in Moco in this forum. Alex Denton, a former PhD student in our group (now a postdoc at Oregon), just published a paper that investigates multicore parallel speed-up in Moco. In the paper, she addresses how parallel speed-up interacts with model complexity, movement task, temporal mesh density, and the type of initial guess. For anyone who is interested, here are the links to the open-access paper and the SimTK page, which includes some example codes.
https://onlinelibrary.wiley.com/doi/ful ... 2/cnm.3777
https://simtk.org/projects/mocoparallel
The tl;dr summary is that most problems showed diminishing returns for parallel speed-up above about 6 cores. So while there certainly may be exceptions, the primary advantage of having a machine with lots of cores seems to be the ability to solve multiple independent problems simultaneously, rather than solving a single problem really fast. While our focus was on parallel speed-up, total runtimes were also highly problem-specific, so (unfortunately) there is no substitute for spending some time seeing what works best for your problem to get the greatest computational performance.
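For anyone who wants to try that "several problems at once" workflow, here is a rough sketch (not from the paper or the SimTK examples): each problem is capped at 6 threads and the problems run in separate processes. It assumes the OpenSim Python bindings and the `parallel` property of MocoCasADiSolver (the OPENSIM_MOCO_PARALLEL environment variable does the same thing); the model files and the problem setup itself are placeholders.

```python
import multiprocessing as mp
import opensim as osim

def solve_one(model_file):
    study = osim.MocoStudy()
    problem = study.updProblem()
    problem.setModelAsCopy(osim.Model(model_file))
    # ... add goals, bounds, etc. for your particular problem ...
    solver = study.initCasADiSolver()
    solver.set_num_mesh_intervals(50)
    solver.set_parallel(6)  # cap this problem at ~6 threads
    return study.solve().success()

if __name__ == "__main__":
    models = ["subject01.osim", "subject02.osim", "subject03.osim"]  # placeholders
    with mp.Pool(processes=len(models)) as pool:
        print(pool.map(solve_one, models))
```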
Best regards,
Brian
- Pagnon David
- Posts: 86
- Joined: Mon Jan 06, 2014 3:13 am
Re: Multicore parallel computing with Moco
Thank you for sharing. This gives me the opportunity to ask: if the maximum speed is almost reached with 6 cores, does it mean that GPU computing cannot be leveraged? Do you know [about this](https://simtk.org/projects/gpuexp)?
If I understand right,
- the NLP function evaluations by CasADi can easily be parallelized
- the NLP solving by IPOPT (used within CasADi) is not easy to parallelize, and you did not investigate it
The speed-up rate mostly depends on how much time is spent in evaluating the NLP functions vs. in optimizing the solution, which is problem dependent. Does it sound accurate?
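To make sure I understand the distinction, here is a tiny stand-alone CasADi sketch (not Moco's actual internals, just an illustration): the per-mesh-point function evaluations can be mapped across threads, while the IPOPT solve that consumes them stays serial.

```python
import casadi as ca

# Toy "dynamics" function evaluated at every mesh point.
x = ca.MX.sym("x", 2)
u = ca.MX.sym("u")
f = ca.Function("f", [x, u], [ca.vertcat(x[1], u - x[0])])

n_mesh, n_threads = 25, 6
# Thread-parallel map: this is the part of the NLP evaluation that
# benefits from more cores.
F = f.map(n_mesh, "thread", n_threads)

X = ca.MX.sym("X", 2, n_mesh)
U = ca.MX.sym("U", 1, n_mesh)
Xdot = F(X, U)  # evaluates the dynamics at all mesh points in one call
```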
- Brian Umberger
- Posts: 48
- Joined: Tue Aug 28, 2007 2:03 pm
Re: Multicore parallel computing with Moco
Hi David,
I think your summary of our results is accurate.
I had not seen the GPU project you linked to. We can't say for certain based on our results, but with the current framework I'm doubtful GPU computing would be fruitful. Even in the best case the speed-up, while certainly beneficial, was far from ideal. The main caveat to that statement is there are many possible types of analyses that could be done with Moco and we only considered a subset of them. Possibly GPUs could be exploited in some other situations.
Best,
Brian
- Ross Miller
- Posts: 375
- Joined: Tue Sep 22, 2009 2:02 pm
Re: Multicore parallel computing with Moco
Sounds like I should see what the return policy is on the 12-core CPU / 30-core GPU I just bought.
- Nicholas Bianco
- Posts: 1044
- Joined: Thu Oct 04, 2012 8:09 pm
Re: Multicore parallel computing with Moco
Thanks, Brian! This is a great resource for the Moco community.
- Brian Umberger
- Posts: 48
- Joined: Tue Aug 28, 2007 2:03 pm
Re: Multicore parallel computing with Moco
Thanks, Nick!
And Ross, I don't know... that sounds like a pretty good machine to me for running several simultaneous Moco simulations, each using 4-6 cores!
Brian
- Lars D'Hondt
- Posts: 1
- Joined: Wed Nov 03, 2021 3:44 am
Re: Multicore parallel computing with Moco
Hi all,
Some performance loss can be caused by how CasADi handles parallelization under the hood. (https://github.com/casadi/casadi/blob/m ... #L658-L672)
The number of mesh intervals assigned to each parallel thread is n_mesh/n_thread, and if this is not a whole number it is rounded up. The extra function evaluations this creates are still performed, but their outputs are discarded.
Looking at figure 3 of the paper Brian linked, some of the results that fall below the curve could be due to an unfortunate combination of number of mesh intervals and number of cores/threads:
- 10 intervals and 9 cores -> dynamics are evaluated at 18 intervals
- 25 intervals and 12 cores -> dynamics are evaluated at 36 intervals
- 25 intervals and 18 cores -> dynamics are evaluated at 36 intervals
Limiting the number of additional function evaluations won't be the most impactful way to speed up simulations, but it's very low effort.
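A quick way to see the effect (just the arithmetic described above, not CasADi itself):

```python
from math import ceil

# Each thread gets ceil(n_mesh / n_threads) intervals, so the dynamics are
# effectively evaluated at ceil(n_mesh / n_threads) * n_threads points and
# the padded evaluations are discarded.
def effective_evaluations(n_mesh, n_threads):
    return ceil(n_mesh / n_threads) * n_threads

for n_mesh, n_threads in [(10, 9), (25, 12), (25, 18), (24, 12)]:
    print(f"{n_mesh} intervals on {n_threads} threads -> "
          f"{effective_evaluations(n_mesh, n_threads)} evaluations")
# 10/9 -> 18, 25/12 -> 36, 25/18 -> 36, 24/12 -> 24 (no waste)
```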
Kind regards,
Lars
- Brian Umberger
- Posts: 48
- Joined: Tue Aug 28, 2007 2:03 pm
Re: Multicore parallel computing with Moco
Hi Lars,
Thanks for these comments. We had searched without success for documentation on exactly how CasADi handles the work allocation across threads, and I guess we should have gone straight to the source code. You are right that with a typical mesh interval progression of ..., 10, 25, 50, ..., the interval counts did not align with the number of cores being used.
Your post made me curious, so I did a couple of quick checks with the 2-D predictive walking simulation. For the "10 interval, 9 core" case, increasing to 10 cores had almost no effect on the computational speed (only 0.75% faster). However, in the "25 interval, 12 core" case, dropping down to 24 intervals improved computational speed by 14%. So, the actual impact in practice seems variable, but the n_mesh/n_thread ratio is definitely worth checking and easy to control, as you noted.
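For anyone who wants to automate that check, here is a small hypothetical helper (not part of Moco) that nudges the requested number of mesh intervals to the nearest multiple of the thread count so no padded evaluations are wasted:

```python
def align_mesh_to_threads(n_mesh, n_threads):
    # Round n_mesh to the nearest multiple of n_threads (rounding down on ties).
    lower = (n_mesh // n_threads) * n_threads
    upper = lower + n_threads
    if lower > 0 and (n_mesh - lower) <= (upper - n_mesh):
        return lower
    return upper

print(align_mesh_to_threads(25, 12))  # 24
print(align_mesh_to_threads(10, 9))   # 9
```

Whether you round down or up will of course depend on the mesh density your problem actually needs, so treat this only as a starting point.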
Thanks again.
Brian
- Nicholas Bianco
- Posts: 1044
- Joined: Thu Oct 04, 2012 8:09 pm
Re: Multicore parallel computing with Moco
Great tip, Lars! Thanks for posting.