Reasons for slow responses to actions, services and messages?
Hi guys!
I'm trying to debug a tool that responds to service calls and action goals with a lag of 10 to 20 seconds, depending on the machine's load (50% to 90%). I measured this lag from the debug output, which shows when the action server receives the action goal.
What could be the reasons for this significant time lag?
In my use case (pick & place using MoveIt), the node running the action servers and service providers causes most of the load. So my first guess is that something is blocking the respective callbacks.
Is this just a matter of insufficient horsepower? Which parts of the code could be blocking or slowing down the callbacks?
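One way to picture "something is blocking the callbacks": in roscpp, `ros::spin()` services the node's global callback queue from a single thread, so one long-running callback (for example, point cloud processing) delays every service, action or subscriber callback queued behind it. The sketch below is a plain-Python simulation of that situation, not ROS code. The queue, the `None` sentinel and the callback names are hypothetical stand-ins, and whether move_group's spinner is actually configured this way is an assumption I have not checked.

```python
import queue
import threading
import time


def run_single_threaded_queue(n_clouds=3, cloud_cost_s=0.25):
    """Simulate ONE spinner thread draining a shared callback queue.

    Returns the delay (seconds) between queuing a cheap "service" callback
    and the moment the spinner actually starts running it.
    """
    callbacks = queue.Queue()
    started = {}

    def cloud_callback():
        time.sleep(cloud_cost_s)  # stands in for expensive point cloud processing

    def service_callback():
        started["service"] = time.monotonic()  # record when it finally runs

    def spinner():
        while True:
            cb = callbacks.get()
            if cb is None:  # sentinel: stop spinning
                break
            cb()

    worker = threading.Thread(target=spinner)
    worker.start()
    for _ in range(n_clouds):
        callbacks.put(cloud_callback)  # expensive work arrives first
    queued_at = time.monotonic()
    callbacks.put(service_callback)    # cheap request queued behind it
    callbacks.put(None)
    worker.join()
    return started["service"] - queued_at


if __name__ == "__main__":
    latency = run_single_threaded_queue()
    print(f"service callback waited {latency:.2f} s behind the cloud callbacks")
```

The cheap callback waits roughly `n_clouds * cloud_cost_s` even though it needs almost no CPU itself, which is the kind of lag that grows with load without the machine being saturated.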
Interestingly, the actions, services and messages of nodes running in parallel are processed fine, i.e. there is only a small delay between sending and receiving action goals, calling and answering services, and publishing and receiving messages.
Thanks for your help!
/edit:
Additional info gathered with the tools recommended in @Adolfo Rodriguez T's answer:
- top
  - with point cloud processing (pcp): ~40% idle, load average ~2.5 (quad-core i5-2500 CPU @ 3.30 GHz), move_group process at ~130% CPU
  - w/o pcp: ~90% idle, load average ~0.5
- iftop
  - with pcp: ~2 Gb/s
  - w/o pcp: ~5 Mb/s
- sockets
  - When idling, a few (~5) sockets show up every 5 to 10 seconds. For each motion plan action, a few more sockets pop up.
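The ~2 Gb/s on the loopback interface is roughly what raw point clouds alone would produce. In ROS 1, topic data between separate nodes travels over TCPROS sockets even on the same machine, and each receiving node has to deserialize every message. The estimate below assumes a depth camera streaming organized 640x480 clouds at 30 Hz with a 32-byte `point_step` (common for XYZRGB clouds); the post does not name the sensor, so these numbers are assumptions, and header and TCP overhead are ignored.

```python
def pointcloud_bandwidth_gbps(width, height, point_step_bytes, rate_hz):
    """Raw sensor_msgs/PointCloud2 payload bandwidth in Gbit/s.

    Ignores message headers and TCP/IP overhead, so it is a lower bound
    for a single publisher-subscriber connection.
    """
    bytes_per_cloud = width * height * point_step_bytes
    return bytes_per_cloud * rate_hz * 8 / 1e9


# Assumed sensor: 640x480 organized cloud, 32 bytes per XYZRGB point, 30 Hz.
print(round(pointcloud_bandwidth_gbps(640, 480, 32, 30), 2))  # 2.36
```

That is about 9.8 MB per cloud, so a single subscriber to such a stream already accounts for the observed traffic; each additional subscriber connection adds its own copy.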
As I mentioned above, top usually reports a load of around 50% to 70% on the desktop and 50% to 90% on the robot. iftop is an interesting tool! It shows that the local traffic (lo interface) goes up to 2 Gb/s as soon as point cloud processing starts.
Which process is generating the load? Is it the roscore, or one or more of the started nodes?
The rosmaster load is negligible. The main load comes from MoveIt's move_group node (more details added to the question). And only that node's topics, services and actions are processed extremely slowly. The other nodes run fine, probably because the CPU is not fully used.
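That only move_group's interfaces are slow while the CPU still has idle capacity fits callback starvation inside that one process rather than a plain lack of CPU power (one possible explanation, not a confirmed diagnosis). A common remedy in roscpp is to give latency-sensitive services and actions their own `ros::CallbackQueue` (assigned via `NodeHandle::setCallbackQueue()`) and service it with a separate `ros::AsyncSpinner`, or to spin the global queue with several threads, in which case the callbacks must be thread-safe. The plain-Python sketch below only simulates that idea: the cheap "service" callback gets its own queue and worker thread, so it no longer waits behind the expensive "point cloud" callbacks. All names are hypothetical stand-ins.

```python
import queue
import threading
import time


def spin(callbacks):
    """Drain one callback queue until a None sentinel arrives."""
    while True:
        cb = callbacks.get()
        if cb is None:
            break
        cb()


def run_separate_queues(n_clouds=3, cloud_cost_s=0.25):
    """Return the service callback's queuing delay when it has its own spinner."""
    cloud_queue = queue.Queue()
    service_queue = queue.Queue()
    started = {}

    def cloud_callback():
        time.sleep(cloud_cost_s)  # expensive point cloud processing

    def service_callback():
        started["service"] = time.monotonic()

    # One worker thread per queue, like one spinner per callback queue.
    threads = [threading.Thread(target=spin, args=(q,))
               for q in (cloud_queue, service_queue)]
    for t in threads:
        t.start()
    for _ in range(n_clouds):
        cloud_queue.put(cloud_callback)
    queued_at = time.monotonic()
    service_queue.put(service_callback)  # not stuck behind the clouds
    for q in (cloud_queue, service_queue):
        q.put(None)
    for t in threads:
        t.join()
    return started["service"] - queued_at


if __name__ == "__main__":
    print(f"service callback waited {run_separate_queues():.3f} s")
```

The trade-off is that the expensive callbacks still compete for CPU, so this helps responsiveness only while there are idle cores, which matches the ~40% idle reported by top.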