Latency when sending goal to move_base and it starting
Hi All,
I'm currently working with the ROS navigation stack. I am using move_base through its actionlib interface. The problem I am experiencing is that I seem to consistently see ~7 seconds of latency between sending the goal through the move_base action client and when the goal begins to execute. Once the server starts executing the goal, the speed is just fine and updates from the action server to client show no significant latency.
I have checked out both the navigation stack and actionlib from github so I could instrument the code to chase down the source of this latency. Specifically, what I have found is that there is a consistent ~7 seconds of latency between the time sendGoal
(in simple_action_client.h) returns to when goalCallback
(in action_server_base.h) begins to handle the "goal request". I have modified initialize()
(in action_server_imp.h) to set TCP_NODELAY on the subscriber to the "goal" topic. One thought was that this message was getting lost in the ROS transport plumbing. Again, making this change seems to have no noticeable effect on this latency issue.
I have a couple of questions:
1) Is this normal / expected behavior? 2) Where should I look next?
If more context is needed, I'm happy to provide. However, we are not doing anything "exotic" in how we are using the advertised functionality. Any help is greatly appreciated.
Best, Tom
EDIT1: I should also note that I have tried instantiating the action client both with and without the extra thread to enable/disable using the global callback queue. Again, no difference.
EDIT2:
I'm getting somewhere. My hypothesis is that, in my setup, move_base is subscribed to too many "high frequency / large data" topics. Specifically, 4 sensors sources publishing QQVGA point clouds at 30 Hz. My assumption is that the /move_base/goal
topic is sharing the same message queue as these sensor sources (??). Empirically, when I remove two of the input sensor sources, the above described latency drops from ~7 seconds to ~3 seconds. What tipped me off to this was that in a separate shell, I can rostopic echo /move_base/goal
and I see the goal published from the action client immediately whereas in the move_base process, I saw the ~7 seconds of latency.
I plan to turn my attention to tuning my sensor sources to get the responsiveness from move_base that I need while not sacrificing fidelity and our perception subsystem. In the meantime, is there a mechanism (I am not aware of one) within ROS that would allow me to partition out the action server's message queue from the sensor topics coming into move_base?
Thanks in advance for any pointers.
Interesting question. I'm not sure where to start digging further. Can you reproduce this with all nodes being on a single machine (no networking other than loopback)?
Yes. For the nodes in question, they are located on the same machine.
You can change to multithreaded spinning or even provide separate calllback queues for different subscribers, which might help. However you likely have to modify move_base yourself for that. Check http://wiki.ros.org/roscpp/Overview/Callbacks%20and%20Spinning .
Also, maybe throttling the topics that seem to clog up move_base might help. I.e. only reduce the frequency / amount of data the move_base receives, while other parts of your system receive the full sensor data.
Lastly, a faster machine might also help dramatically.
@demmeln Thanks for the comments. A combination of using a multi-threaded spinner and (down) throttling my sensors is looking promising right now. Going to do more testing before confirming a definitive resolution.