CUDA performance [closed]
I'm currently using ROS Electric and have a simple Gazebo model consisting of a rectangular base, three wheels (three continuous joints) and a steering joint (one revolute joint). I've set ODE to run at 100 Hz. When I use the CPU solver ("<steptype>quick</steptype>"), the simulation runs faster than with the CUDA solver ("<steptype>parallel_quick</steptype>"). I've tried tweaking the CUDA batch sizes and block sizes, but the results seem to be worse than with the default setup. I've also spawned 10 of the same vehicles to compare CPU and CUDA performance, and the CPU wins there as well. Is there something I'm missing? My guess is that the performance hit comes from the memory transfers between GPU and CPU, and that CUDA is simply not a good fit for my setup. I'm using a Quadro FX 4800 with 192 CUDA cores.
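One way to reason about the memory-transfer guess is a rough linear cost model: the GPU pays a fixed per-step overhead (host/device copies plus kernel launches) before its faster per-body work pays off, so a small scene may never reach the break-even point. The sketch below is my own simplification, not how ODE's parallel_quick solver actually schedules work, and the function name and all numbers are hypothetical placeholders rather than measurements from this setup.

```python
import math
from typing import Optional


def break_even_bodies(overhead_us: float,
                      cpu_us_per_body: float,
                      gpu_us_per_body: float) -> Optional[int]:
    """Smallest body count at which a GPU step is strictly faster than a CPU step.

    Hypothetical linear cost model (an assumption, not ODE's real behaviour):
      CPU step time = n * cpu_us_per_body
      GPU step time = overhead_us + n * gpu_us_per_body
    where overhead_us lumps together host<->device copies and kernel launches.
    """
    if gpu_us_per_body >= cpu_us_per_body:
        return None  # under this model the GPU never catches up
    # The GPU is strictly faster once n > overhead / (cpu - gpu).
    threshold = overhead_us / (cpu_us_per_body - gpu_us_per_body)
    return math.floor(threshold) + 1


if __name__ == "__main__":
    # Illustrative numbers only: 100 us fixed overhead, 2 us/body on CPU, 1 us/body on GPU.
    print(break_even_bodies(overhead_us=100.0, cpu_us_per_body=2.0, gpu_us_per_body=1.0))
```

With these made-up numbers the GPU only wins above 101 bodies; a handful of vehicles with a few links each would stay well below that, which would be consistent with the CPU winning. Real values would have to come from profiling.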
Thanks a lot, WW
That's really funny. I'm seeing the same thing on my robot using pcl_cuda. Have you profiled your code and verified that the performance drain is due to the memory transfer?