Submitting an answer to my own question since I didn't get any answers.
I ended up doing pretty much what I described in my question. Full post here: https://christophebedard.com/ros-tracing-message-flow/
Here's a summary/excerpt:
In order to do what I described above, we need information on three things: connections between nodes, the state of the publisher and subscriber queues, and network packet exchanges.
We first need to know about connections between nodes. The ROS instrumentation includes a tracepoint for new connections (`new_connection`). It includes the address and port of the host and the destination, with an address:port pair corresponding to a specific publisher or subscription.
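To make this concrete, here's a minimal sketch of how a connection map could be built from `new_connection` events. This is not the actual analysis code (see the full post for that), and the event field names (`local_addr`, `local_port`, `remote_addr`, `remote_port`) are assumptions for illustration:

```python
# Minimal sketch: build a connection map from new_connection events.
# Field names (local_addr, local_port, remote_addr, remote_port) are
# hypothetical; the real tracepoint payload may differ.

def build_connection_map(events):
    """Map each local address:port pair to the remote endpoint it talks to."""
    connections = {}
    for event in events:
        if event["name"] != "new_connection":
            continue
        local = (event["local_addr"], event["local_port"])
        remote = (event["remote_addr"], event["remote_port"])
        connections[local] = remote
    return connections

# Example: one publisher-to-subscription connection.
trace = [
    {"name": "new_connection", "local_addr": "10.0.0.1", "local_port": 45000,
     "remote_addr": "10.0.0.2", "remote_port": 11311},
]
print(build_connection_map(trace))
# {('10.0.0.1', 45000): ('10.0.0.2', 11311)}
```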
We also need to build a model of the publisher and subscriber queues. To achieve this, we can leverage the relevant tracepoints. These include tracepoints for when a message is added to the queue (`publisher_message_queued`, `subscription_message_queued`), when it's dropped from the queue (`subscriber_link_message_dropped`, `subscription_message_dropped`), and when it leaves the queue, either sent over the network to the subscriber (`subscriber_link_message_write`) or handed over to a callback (`subscriber_callback_start`). We can therefore visualize the state of a queue over time!
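As an illustration, here's a simplified sketch of how queue size over time could be derived from these tracepoints. It assumes the events have already been filtered to a single queue and sorted by timestamp, which glosses over what the real analysis has to do:

```python
# Minimal sketch: reconstruct a queue's size over time from its tracepoints.
# Events are assumed pre-filtered to a single queue and sorted by timestamp.

ENQUEUE = {"publisher_message_queued", "subscription_message_queued"}
DEQUEUE = {
    "subscriber_link_message_dropped",  # dropped from a publisher-side queue
    "subscription_message_dropped",     # dropped from a subscription queue
    "subscriber_link_message_write",    # sent over the network
    "subscriber_callback_start",        # handed over to a callback
}

def queue_size_over_time(events):
    """Return a list of (timestamp, queue size) points."""
    size = 0
    points = []
    for timestamp, name in events:
        if name in ENQUEUE:
            size += 1
        elif name in DEQUEUE:
            size = max(0, size - 1)  # guard against missing enqueue events
        points.append((timestamp, size))
    return points

trace = [(0, "publisher_message_queued"),
         (5, "publisher_message_queued"),
         (7, "subscriber_link_message_write"),
         (9, "subscriber_link_message_dropped")]
print(queue_size_over_time(trace))  # [(0, 1), (5, 2), (7, 1), (9, 0)]
```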
Finally, we need information on network packet exchanges. Although this isn't strictly necessary for this kind of analysis, it allows us to reliably link a message that gets published to the message that gets received by the subscriber. This makes the analysis more robust, and it paves the way for a future critical path analysis based on this message flow analysis.
This requires us to trace both userspace (ROS) and the kernel. Fortunately, we only have to enable 2 kernel events for this: `net_dev_queue` for packet queuing and `netif_receive_skb` for packet reception. This saves us a lot of disk space, since enabling many events can generate multiple gigabytes of trace data, even when tracing for only a few seconds! Also, as the rate of generated events increases, the overhead also increases: more resources have to be allocated to the buffers to properly process those events, otherwise they can get discarded or overwritten.
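To sketch how these two kernel events tie a send to a receive: assuming each event record has been parsed down to its TCP/IP header fields (source/destination endpoints and sequence number, which is not how the raw events arrive and would require extra parsing), matching amounts to pairing a `net_dev_queue` event with the `netif_receive_skb` event that carries the same connection tuple:

```python
# Minimal sketch: link a sent packet to its reception by matching the
# TCP/IP header fields of net_dev_queue and netif_receive_skb events.
# The field names here are assumptions; real traces need header parsing.

def match_packets(sends, receives):
    """Pair each net_dev_queue event with the netif_receive_skb event
    carrying the same connection tuple and TCP sequence number."""
    pending = {}
    pairs = []
    for send in sends:
        key = (send["src"], send["dst"], send["seq"])
        pending[key] = send
    for recv in receives:
        key = (recv["src"], recv["dst"], recv["seq"])
        if key in pending:
            pairs.append((pending.pop(key), recv))
    return pairs

sends = [{"src": ("10.0.0.1", 45000), "dst": ("10.0.0.2", 11311),
          "seq": 1001, "timestamp": 10}]
receives = [{"src": ("10.0.0.1", 45000), "dst": ("10.0.0.2", 11311),
             "seq": 1001, "timestamp": 12}]
print(match_packets(sends, receives))  # one matched (send, receive) pair
```

Since the connection tuple also identifies a specific publisher/subscription pair (via the connection map above), a matched packet pair links a published message to its reception on the subscriber side.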
Result: *(screenshot: result_analysis_initial_zoom.png)*
Links to the actual code and further information are in the full post linked above.