Best practices or code examples of how a "complete system" manages nodes
Most good examples of ROS systems I've seen on GitHub make strong use of launch files and keep a good separation between nodes.
What I have not seen is how a complete robot system handles node faults. For example, if a node goes down while the robot is running, there are probably several possible courses of action:
- try to relaunch the node
- if after 3 attempts the node does not come back, log or report this to some notification system
While that seems self-explanatory, I am curious whether anyone has an example implementation of this.
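For concreteness, the retry-then-report policy described above could be sketched as a small supervisor process. This is a minimal sketch using only the Python standard library (not `rclpy`), and it assumes that a node exiting with code 0 has shut down on purpose while any other exit code is a fault. The function `supervise` and its parameters are hypothetical names, and the `notify` callback merely stands in for "some notification system". (ROS 2 launch files can already respawn a process via the `respawn` argument on a launch `Node` action; the sketch just makes the cap-and-escalate step explicit.)

```python
import subprocess
import sys
import time
from typing import Callable, Sequence


def supervise(
    cmd: Sequence[str],
    max_restarts: int = 3,
    restart_delay_s: float = 1.0,
    # Hypothetical stand-in for a real notification system: write to stderr.
    notify: Callable[[str], None] = lambda msg: print(msg, file=sys.stderr),
) -> int:
    """Run `cmd` as a child process and relaunch it whenever it exits with a
    non-zero status. After `max_restarts` failed relaunches, call `notify`
    once and give up. Returns the number of relaunches attempted."""
    restarts = 0
    while True:
        # Start the node and block until it exits.
        returncode = subprocess.run(list(cmd)).returncode
        if returncode == 0:
            # Assumption: a clean exit is an intentional shutdown, not a fault.
            return restarts
        if restarts >= max_restarts:
            # The node kept failing: escalate instead of looping forever.
            notify(
                f"{' '.join(cmd)} exited with code {returncode}; "
                f"giving up after {restarts} restarts"
            )
            return restarts
        restarts += 1
        # Brief pause so a node that crashes on startup does not spin the CPU.
        time.sleep(restart_delay_s)
```

Usage might look like `supervise(["ros2", "run", "my_pkg", "my_node"])`, where `my_pkg` and `my_node` are hypothetical. One design choice worth noting: this counter never resets, so a node that runs fine for hours between crashes still uses up its three restarts. A production supervisor would likely reset the count once the node has stayed up for some minimum time.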
My primary issue with ROS 2 has always been the lack of open-source "professional projects" from which one can learn what a production-grade ROS 2 code base looks like. There are quite a few using ROS Noetic out in the wild, however.