
Watchdog to monitor node status

asked 2017-04-05 04:41:12 -0600

l4ncelot

Hi,

I'm working on an autonomous quadcopter system and I'm using ROS for my algorithms. I would like to create watchdogs that monitor all my ROS nodes. A watchdog should also be able to restart a node when it detects that the node has shut down (for whatever reason).

But I'm quite lost. Can someone give me some advice, or perhaps a reference to existing code that does exactly this?

Thank you.


3 Answers


answered 2017-04-06 05:23:21 -0600

NEngelhard

You could use a bond:

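#!/usr/bin/env python

import rospy
from bondpy import bondpy


def formed():
    # Called once both sides of the bond have connected
    print("We got a bond")


def broken():
    # Called when the partner stops sending its heartbeat
    print("SOMEONE DIED")


rospy.init_node("A", anonymous=True)
# Both processes must use the same heartbeat topic and the same bond id
b = bondpy.Bond("heartbeat_topic_name", "bond_name", on_broken=broken, on_formed=formed)
b.start()
if not b.wait_until_formed():
    raise Exception('Bond could not be formed')

rospy.spin()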

Start this node twice, wait for the bond to be established, then kill one of the two instances. After a short timeout (4 seconds) the other instance will notice that its partner is missing. With this method you don't need to poll rosnode list or even know the name of the other node: both sides just have to use the same topic and bond_name, and a callback is triggered as soon as one of them stops sending its heartbeat.
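For the watchdog use case from the question, one possible extension is a single watchdog node that holds one bond per monitored node. The following is only a sketch under assumptions: the node names, the topic name and the "bond_<name>" id scheme are hypothetical, and every monitored node must itself create the matching Bond (same topic, same id) as in the code above.

```python
import functools

# Hypothetical names of the nodes to monitor (not from the original answer).
MONITORED = ["camera_driver", "planner", "controller"]
# Assumed heartbeat topic, shared by all bonds of this watchdog.
BOND_TOPIC = "watchdog_bonds"


def bond_ids(node_names):
    """Return one unique bond id per monitored node."""
    return {name: "bond_" + name for name in node_names}


def on_broken(node_name):
    # Placeholder reaction; a real watchdog might trigger a restart here.
    print("Bond to %s is broken, the node is probably dead" % node_name)


def run_watchdog():
    # ROS imports are kept local so the helpers above can be tried without ROS.
    import rospy
    from bondpy import bondpy

    rospy.init_node("watchdog")
    bonds = []
    for name, bond_id in bond_ids(MONITORED).items():
        # bondpy calls on_broken without arguments, so bind the name here.
        bond = bondpy.Bond(BOND_TOPIC, bond_id,
                           on_broken=functools.partial(on_broken, name))
        bond.start()
        bonds.append(bond)  # keep references so the bonds are not discarded
    rospy.spin()
```

Calling run_watchdog() inside a ROS environment starts the watchdog. Binding the node name with functools.partial lets one callback serve all bonds.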


Comments

Interesting answer! I didn't know this technique! But how would you use bonding with pre-existing nodes?

gstavrinos ( 2017-04-06 05:31:37 -0600 )

So basically, if I understand it correctly, to monitor e.g. 10 different nodes I would have to create a watchdog node that is connected to each of those 10 nodes through its own bond, i.e. 10 different bonds?

l4ncelot ( 2017-04-06 06:49:50 -0600 )

To be honest, this code was the first time I used a bond :) But yes, a bond connects exactly two processes, so you will need N bonds.

NEngelhard ( 2017-04-06 09:19:29 -0600 )

Cool, I'll try that option. Thanks

l4ncelot ( 2017-04-06 09:47:09 -0600 )

@gstavrinos: the node itself has to be changed so that it creates the bond, so this won't work with unmodified existing nodes.

NEngelhard ( 2017-04-06 10:02:22 -0600 )

answered 2018-10-04 03:11:38 -0600

I know it's a bit late to answer this, but I found node_alive very helpful for getting the state of the nodes in the system. It publishes a DiagnosticArray on the /diagnostics topic with information about all the nodes visible to rosmaster and their status.
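A monitor could subscribe to /diagnostics and report every entry whose level is not OK. This is only a sketch: it assumes the standard diagnostic_msgs/DiagnosticStatus fields (name, level, message) and that a dead node shows up with a non-OK level, which I have not checked against node_alive itself. The node name "diagnostics_monitor" is made up.

```python
# Level constants as defined in diagnostic_msgs/DiagnosticStatus.
OK, WARN, ERROR, STALE = 0, 1, 2, 3


def unhealthy(statuses):
    """Return (name, level, message) for every status that is not OK."""
    return [(s.name, s.level, s.message) for s in statuses if s.level != OK]


def on_diagnostics(msg):
    # msg is a diagnostic_msgs/DiagnosticArray; msg.status is a list of
    # DiagnosticStatus entries.
    for name, level, message in unhealthy(msg.status):
        print("%s is not OK (level %d): %s" % (name, level, message))


def run_monitor():
    # ROS imports are kept local so unhealthy() can be tried without ROS.
    import rospy
    from diagnostic_msgs.msg import DiagnosticArray

    rospy.init_node("diagnostics_monitor")  # hypothetical node name
    rospy.Subscriber("/diagnostics", DiagnosticArray, on_diagnostics)
    rospy.spin()
```

Calling run_monitor() inside a ROS environment starts the subscriber. Restart logic, if wanted, would go where on_diagnostics prints.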


Comments

I'll have a look at it, thanks.

l4ncelot ( 2018-10-04 03:55:41 -0600 )

answered 2017-04-06 04:49:36 -0600

updated 2017-04-06 04:50:44 -0600

When I want to do something like that, I mainly use two techniques:

  • (This is more ROS-formal) Use the respawn="true" attribute of the roslaunch <node> tag.

e.g.

<node pkg="rospy_tutorials" type="listener" name="listener"  respawn="true" />

You can check the official roslaunch wiki page for more info.

  • (This is more hacky, but works like a charm) Use the rosnode command-line tool inside Python.

e.g.

import os

# List the nodes currently registered with the ROS master
nodes = os.popen("rosnode list").read().splitlines()
for node in nodes:
    pass  # Make your tests here

Comments

Not all nodes are removed from the list when they die, especially if they crash (that's why 'rosnode cleanup' exists). A node that is listed in 'rosnode list' is not necessarily alive!

NEngelhard ( 2017-04-06 09:21:03 -0600 )
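Because a listed node may already be dead, a check based on the node list probably needs to ping each node as well. This is a rough sketch: it assumes the ROS 1 rosnode Python module provides get_node_names() and rosnode_ping(node_name, max_count=...), as I remember it. The helper names find_unresponsive and check_ros_nodes are made up.

```python
def find_unresponsive(node_names, ping):
    """Return the names for which ping(name) is falsy."""
    return [name for name in node_names if not ping(name)]


def check_ros_nodes():
    # rosnode is the Python module behind the rosnode command-line tool;
    # it is imported here so find_unresponsive() works without ROS.
    import rosnode

    names = rosnode.get_node_names()
    # max_count=1 sends a single ping to each node instead of looping.
    return find_unresponsive(
        names, lambda name: rosnode.rosnode_ping(name, max_count=1))
```

check_ros_nodes() would return the registered nodes that did not answer a ping. Passing the ping function in as an argument also makes it easy to swap in a different liveness test.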
