rosbag randomly missing topics on record
Hi all. We've got a very weird case. Our configuration:
- main_computer: running roscore
- slave_A_computer (ROS_MASTER_URI=main_computer)
- slave_B_computer (ROS_MASTER_URI=main_computer)
- slave_C_computer (ROS_MASTER_URI=main_computer)
- ROS: indigo
- Ubuntu 14.04 (x86)
All computers are connected in one GigE network and synced using chrony. On main_computer we run https://github.com/ros-drivers/nmea_n... and on each slave computer we run a GigE camera driver and record the image and GPS topics (along with some other low-bandwidth topics such as diagnostics and tf) with rosbag (using the C++ program, i.e. rosrun rosbag record). Recordings on all 3 computers are done simultaneously but locally on each computer.
Now the mysterious thing is that in roughly 1 out of 100 bags recorded on the slave computers, ONE of the GPS topics (out of the 4 that nmea_navsat_driver publishes) is missing. For instance, slave_A and slave_B have all the topics but on slave_C /gps/fix would be missing. This all happens on a field robot during operation, where it is impossible to actually pause and debug.
So my question is: how do I debug an issue like this? Clearly the topic is being advertised and active, since 2 computers get it. Also, since the 3rd computer gets 3 of the 4 topics from nmea_navsat_driver, the network link is up. Is it then the rosbag tool that is not able to build up all its socket connections? Can I somehow constantly log a list of active topics on every computer?
thx upfront
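On the last question (constantly logging the active topics on every computer), here is a minimal sketch of what such a logger might look like. It assumes rospy is available on each machine and that rospy.get_published_topics() can reach the master. The function names (missing_topics, fetch_active_topics, log_topics) and the output format are made up for illustration and are not part of any ROS tool. Note that it only records what the master reports as published. It does not show whether rosbag record actually connected to the publisher.

```python
import sys
import time


def missing_topics(expected, active):
    """Return the expected topics that are not in the active list, sorted."""
    return sorted(set(expected) - set(active))


def fetch_active_topics():
    """Ask the ROS master which topics are currently published.

    rospy is imported here so the rest of the sketch runs without ROS.
    """
    import rospy
    return [name for name, _msg_type in rospy.get_published_topics()]


def log_topics(expected, fetch=fetch_active_topics, period=1.0,
               iterations=None, out=None, sleep=time.sleep):
    """Poll the topic list and emit one timestamped line per poll.

    iterations=None polls forever; a number stops after that many polls.
    out is any callable taking one string; it defaults to stdout.
    """
    if out is None:
        out = lambda line: sys.stdout.write(line + "\n")
    count = 0
    while iterations is None or count < iterations:
        active = sorted(fetch())
        missing = missing_topics(expected, active)
        status = "MISSING " + ",".join(missing) if missing else "OK"
        out("{:.3f} {} active={}".format(time.time(), status, ",".join(active)))
        count += 1
        sleep(period)
```

Calling log_topics(['/gps/fix'], period=5.0) on each slave, with the output redirected to a file, would give a timeline that can be compared against the start time of each bag afterwards.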
Hi Dejan, curious problem. It happening in only 1 of 100 cases is really weird. I guess you know about
rostopic list
and
rostopic hz /gps/fix
?

Thx @Felix Endress. I know, yes, and I could for instance write a program that would monitor that. And I am pretty sure it would tell me that the topic is missing and when it happens, but then what do I do next?
Meanwhile I also found out that we are missing some other topics from the master computer as well.