Is MOCO CPU agnostic?
- Pasha van Bijlert
- Posts: 227
- Joined: Sun May 10, 2020 3:15 am
Is MOCO CPU agnostic?
Hi all,
Perhaps the answer is obvious, but I thought I'd double-check. I'll soon be assembling a 3D/CAD workstation with an AMD Ryzen 5900X (CAD software in general benefits from a powerful CPU). Although that's not the purpose of this computer, I'll be able to run Moco optimizations on it as well, and I expect these to be substantially faster than on my 10-year-old i5 desktop. This had me wondering: can Moco (or, more specifically, CasADi) effectively use the 24 threads of that processor? Are there any compatibility issues that may occur, or am I overthinking things? Moco can make use of parallelization, correct? Is there a ceiling to this?
Best wishes,
Pasha
- Ross Miller
- Posts: 375
- Joined: Tue Sep 22, 2009 2:02 pm
Re: Is MOCO CPU agnostic?
Hi Pasha,
At least when using the CasADi solver, Moco can definitely run in parallel. On both machines I've run Moco on (both Macs), it automatically detects how many cores I have and runs two threads on each core in parallel.
I think Brian Umberger has told me previously that the speed increase from giving Moco/CasADi/IPOPT more cores levels off, and runs can even get slower beyond a certain point. Generally speaking, though, for the range of cores on most consumer-grade CPUs (6-20ish), I would expect more cores to make Moco run its IPOPT iterations faster.
Ross
Re: Is MOCO CPU agnostic?
Hi Pasha,
I can back up Ross: Moco seems to detect the available cores automatically and use them appropriately. We had some older PCs in our lab with 24 cores, and I have a newer laptop with 8 cores; my laptop performed just as quickly, if not quicker. I suspect this comes down to more than just the number of cores, but it was an interesting finding for me when deciding which setup to run simulations on.
Aaron
- Pasha van Bijlert
- Posts: 227
- Joined: Sun May 10, 2020 3:15 am
Re: Is MOCO CPU agnostic?
Hi Ross & Aaron,
Interesting, so there seems to be a sweet spot with the number of cores? Would you care to speculate as to why this is? I'd think the computational effort to parallelize something is always going to be lower than the performance benefit, but apparently not...
Aaron, was that a 24-core (thus 48-thread) CPU, or 24 threads in total? Interesting that it was outperformed by fewer cores. I suppose that the processing power per core is also a factor in how fast the calculations are performed.
Best,
Pasha
Re: Is MOCO CPU agnostic?
Hi Pasha,
It was 24 threads, so yes, I do recall that CPU having 12 cores. Those computers were much older than the laptop I was using, so I'd say the processing power per core was quite different. I've found that my laptop (Lenovo X1 Yoga) using 8 threads performs pretty well with respect to the number of iterations it gets through.
Aaron
- Nicholas Bianco
- Posts: 1044
- Joined: Thu Oct 04, 2012 8:09 pm
Re: Is MOCO CPU agnostic?
Thanks Ross and Aaron for the great info.
Pasha, regarding the "sweet spot" with the number of cores: there is some computational overhead to manage all the independent threads when parallelizing in Moco (FYI, we use CasADi's built-in parallelization tools). As you increase the number of threads, this overhead increases, which could be part of the explanation.
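To make the overhead argument concrete, here is a toy model in Python. It is only a sketch under assumptions of my own: a fixed serial part, a part that divides evenly across threads, and a thread-management cost that grows linearly with the thread count. The function names and all parameter values are made up for illustration and are not Moco or CasADi measurements.

```python
def run_time(threads, serial=1.0, parallel=64.0, overhead_per_thread=1.0):
    """Toy wall-clock time (arbitrary units) for a given thread count.

    Assumptions, not Moco measurements: a fixed serial part, a part that
    divides evenly across threads, and a management cost that grows
    linearly with the number of threads.
    """
    return serial + parallel / threads + overhead_per_thread * threads


def best_thread_count(max_threads, **model):
    """Return the thread count in 1..max_threads with the lowest modelled time."""
    return min(range(1, max_threads + 1), key=lambda n: run_time(n, **model))


if __name__ == "__main__":
    # Print the modelled time for a few thread counts, then the minimum.
    for n in (1, 4, 8, 16, 24):
        print(n, round(run_time(n), 2))
    print("sweet spot:", best_thread_count(24))
```

With these arbitrary parameters the modelled time is lowest at 8 threads and rises again toward 24. The location of the minimum depends entirely on the assumed ratio of parallel work to per-thread overhead, so it says nothing about where the sweet spot lies for a real problem.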
- Pasha van Bijlert
- Posts: 227
- Joined: Sun May 10, 2020 3:15 am
Re: Is MOCO CPU agnostic?
Hi all,
Thanks for the interesting discussion. In MATLAB you can limit the number of parallel workers; does this have a downstream effect on how many cores CasADi gets access to? If so, I could play around with it and report back.
Thanks!
Pasha
- Nicholas Bianco
- Posts: 1044
- Joined: Thu Oct 04, 2012 8:09 pm
Re: Is MOCO CPU agnostic?
Hi Pasha,
The number of cores you set for Matlab's parallelization tools (e.g., parpool) should not (I believe) affect the number of cores used by CasADi in Moco.
You can set the number of threads used by Moco through the solver's "parallel" property, using "set_parallel()". When you run a problem, Moco prints the number of threads being used for that problem to the console, so you can verify it that way.
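For anyone scripting in Python, a minimal sketch of setting this could look like the following. It assumes the OpenSim Python bindings, where a MocoStudy provides initCasADiSolver() and the solver provides set_parallel(). The helper names (moco_parallel_value, configure_solver) are hypothetical. The mapping of values (0 = serial, 1 = all cores, greater than 1 = that many threads) is my reading of the MocoCasADiSolver documentation, so please check it against your OpenSim version.

```python
import os


def moco_parallel_value(threads_wanted, logical_cpus=None):
    """Translate a desired thread count into a value for set_parallel().

    Assumed meaning of the property (verify against your OpenSim version):
    0 = not parallel, 1 = use all cores, >1 = use that many threads.
    """
    if logical_cpus is None:
        logical_cpus = os.cpu_count() or 1
    if threads_wanted <= 1:
        return 0  # a single thread means running serially
    if threads_wanted >= logical_cpus:
        return 1  # asking for everything: let Moco use all cores
    return threads_wanted


def configure_solver(study, threads_wanted):
    """Set the thread count on the CasADi solver of a MocoStudy.

    `study` is expected to be an opensim.MocoStudy; only the two calls
    named in the text above are used on it.
    """
    solver = study.initCasADiSolver()
    solver.set_parallel(moco_parallel_value(threads_wanted))
    return solver
```

In a real script you would build the study with `import opensim as osim` and `study = osim.MocoStudy()`, define the problem, call `configure_solver(study, 8)`, and then check the thread count that Moco prints to the console when solving.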
Best,
Nick
- Brian Umberger
- Posts: 48
- Joined: Tue Aug 28, 2007 2:03 pm
Re: Is MOCO CPU agnostic?
Hi All,
I was busy with the end of the semester and some admin duties, so I missed this interesting thread started by Pasha a few weeks ago. I can make two small additions.
One of my PhD students, Alex Denton, has an abstract at the ASB meeting this summer on multicore performance in Moco using the CasADi parallelization. Our preliminary results, subject to ongoing work, are that there are diminishing returns beyond approximately 10 physical cores (i.e., actual cores, not hyperthreading). Some problems show continued speedup all the way to 36 cores, but by very small amounts, and some problems do get slower with more cores due to parallel overhead, as noted by Ross and Nick. In all cases so far you would be better off with a fast 8-core processor than a slower 24-core processor, consistent with Aaron's experience. That statement is based on the fact that, for a fixed monetary cost, the number of processor cores trades off against the clock speed per core. Again, the exact numbers are subject to work still in progress.
However... having many cores can still be beneficial if you have lots of similar but independent problems to run, such as different initial guesses, different weights in the cost function, etc. The Matlab and CasADi parallelization tools function independently. So, if you have, say, 24 cores available, you could set the Moco parallel property to use 8 cores (set_parallel(8)) per optimization, and then run the multiple optimizations in parallel within a Matlab parfor loop with 3 workers. In our experience (thus far), that yields the results in a fraction of the time needed to run the multiple optimizations one after the other with set_parallel(24). The speedup in this case, at a project level rather than the level of a single optimization, can be substantial.
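For those not using Matlab, the same two-level pattern can be sketched in Python with the standard library's ProcessPoolExecutor standing in for parfor. This is an untested-with-Moco sketch, not a recommendation from our benchmarks. The names plan_workers, solve_one, and run_batch are hypothetical, and solve_one is only a placeholder for where a real script would build and solve a MocoStudy.

```python
from concurrent.futures import ProcessPoolExecutor


def plan_workers(total_cores, threads_per_solve):
    """How many optimizations can run side by side within the core budget."""
    return max(1, total_cores // threads_per_solve)


def solve_one(job):
    """Hypothetical placeholder for a single Moco optimization.

    A real script would build the MocoStudy here, inside the worker process
    (OpenSim objects may not pickle cleanly), call
    solver.set_parallel(threads) on its CasADi solver, then solve and write
    the solution to disk. Here the inputs are simply echoed back.
    """
    weight, threads = job
    return weight, threads


def run_batch(weights, total_cores=24, threads_per_solve=8):
    """Run one job per cost-function weight, a few at a time."""
    workers = plan_workers(total_cores, threads_per_solve)
    jobs = [(w, threads_per_solve) for w in weights]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # map() preserves the order of `jobs` in its results.
        return list(pool.map(solve_one, jobs))


if __name__ == "__main__":
    # 24 cores with 8 threads per solve gives 3 simultaneous workers.
    print(run_batch([0.1, 1.0, 10.0]))
```

Separate processes are used rather than threads so that each optimization has its own interpreter and its own copy of the model. The `if __name__ == "__main__":` guard is needed on platforms that start worker processes by re-importing the script.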
Best,
Brian
- Carlos Gonçalves
- Posts: 135
- Joined: Wed Jun 08, 2016 4:56 am
Re: Is MOCO CPU agnostic?
Excellent discussion, especially since my rowing simulations with metabolic goals take 8 hours on my old laptop: https://www.linkedin.com/posts/carlos-g ... 60961-c88F
Dr. Umberger, any suggestions on how to run these parallelizations in Python?
Best regards.