Hundreds of outbound connections to rosout, OS out of sockets [closed]
This is the output of rosnode info
of one of the nodes running in my computer, all nodes have a similar output.
Applications that usually do not use many resources (like robot_monitor) are consuming 100% of the CPU.
$ rosnode info /robot_monitor_8970_1329732898963
--------------------------------------------------------------------------------
Node [/robot_monitor_8970_1329732898963]
Publications:
* /rosout [rosgraph_msgs/Log]
Subscriptions:
* /clock [rosgraph_msgs/Clock]
* /diagnostics_agg [diagnostic_msgs/DiagnosticArray]
Services: None
contacting node http://denmark:40743/ ...
Pid: 8970
Connections:
* topic: /rosout
* to: /rosout
* direction: outbound
* transport: TCPROS
* topic: /rosout
* to: /rosout
* direction: outbound
* transport: TCPROS
* topic: /rosout
* to: /rosout
* direction: outbound
* transport: TCPROS
* topic: /rosout
* to: /rosout
* direction: outbound
* transport: TCPROS
* topic: /rosout
* to: /rosout
* direction: outbound
* transport: TCPROS
* topic: /rosout
* to: /rosout
* direction: outbound
* transport: TCPROS
* topic: /rosout
* to: /rosout
* direction: outbound
* transport: TCPROS
* topic: /rosout
* to: /rosout
* direction: outbound
* transport: TCPROS
* topic: /rosout
* to: /rosout
* direction: outbound
* transport: TCPROS
* topic: /rosout
* to: /rosout
* direction: outbound
* transport: TCPROS
* topic: /rosout
* to: /rosout
* direction: outbound
* transport: TCPROS
* topic: /rosout
* to: /rosout
* direction: outbound
* transport: TCPROS
* topic: /rosout
* to: /rosout
* direction: outbound
* transport: TCPROS
* topic: /rosout
* to: /rosout
* direction: outbound
* transport: TCPROS
* topic: /rosout
* to: /rosout
* direction: outbound
* transport: TCPROS
* topic: /rosout
* to: /rosout
* direction: outbound
* transport: TCPROS
* topic: /rosout
* to: /rosout
* direction: outbound
* transport: TCPROS
* topic: /rosout
* to: /rosout
* direction: outbound
* transport: TCPROS
* topic: /rosout
* to: /rosout
* direction: outbound
* transport: TCPROS
* topic: /rosout
* to: /rosout
* direction: outbound
* transport: TCPROS
* topic: /rosout
* to: /rosout
* direction: outbound
* transport: TCPROS
* topic: /rosout
* to: /rosout
* direction: outbound
* transport: TCPROS
* topic: /rosout
* to: /rosout
* direction: outbound
* transport: TCPROS
* topic: /rosout
* to: /rosout
* direction: outbound
* transport: TCPROS
* topic: /rosout
* to: /rosout
* direction: outbound
* transport: TCPROS
* topic: /rosout
* to: /rosout
* direction: outbound
* transport: TCPROS
* topic: /rosout
* to: /rosout
* direction: outbound
* transport: TCPROS
* topic: /rosout
* to: /rosout
* direction: outbound
* transport: TCPROS
* topic: /rosout
* to: /rosout
* direction: outbound
* transport: TCPROS
* topic: /rosout
* to: /rosout
* direction: outbound
* transport: TCPROS
* topic: /rosout
* to: /rosout
* direction: outbound
* transport: TCPROS
* topic: /rosout
* to: /rosout
* direction: outbound
* transport: TCPROS
* topic: /rosout
* to: /rosout
* direction: outbound
* transport: TCPROS
* topic: /rosout
* to: /rosout
* direction: outbound
* transport: TCPROS
* topic: /rosout
* to: /rosout
* direction: outbound
* transport: TCPROS
* topic: /rosout
* to: /rosout
* direction: outbound
* transport: TCPROS
* topic: /rosout
* to: /rosout
* direction: outbound
* transport: TCPROS
* topic: /rosout
* to: /rosout
* direction: outbound
* transport: TCPROS
* topic: /rosout
* to: /rosout
* direction: outbound
* transport: TCPROS
* topic: /rosout
* to: /rosout
* direction: outbound
* transport: TCPROS
* topic: /rosout
* to: /rosout
* direction: outbound
* transport: TCPROS
* topic: /rosout
* to: /rosout
* direction: outbound
* transport: TCPROS
* topic: /rosout
* to: /rosout
* direction: outbound
* transport: TCPROS
* topic: /rosout
* to: /rosout
* direction: outbound
* transport: TCPROS
* topic: /rosout
* to: /rosout
* direction: outbound
* transport: TCPROS
* topic: /rosout
* to: /rosout
* direction: outbound
* transport: TCPROS
* topic: /rosout
* to: /rosout
* direction: outbound
* transport: TCPROS
* topic: /rosout
* to: /rosout
* direction: outbound
* transport: TCPROS
* topic: /rosout
* to: /rosout
* direction: outbound
* transport: TCPROS
* topic: /rosout
* to: /rosout
* direction: outbound
* transport: TCPROS
* topic: /rosout
* to: /rosout
* direction: outbound
* transport: TCPROS
* topic: /rosout
* to: /rosout
* direction: outbound
* transport: TCPROS
* topic: /rosout
* to: /rosout
* direction: outbound
* transport: TCPROS
* topic: /rosout
* to: /rosout
* direction: outbound
* transport: TCPROS
* topic: /rosout
* to: /rosout ...
I'd love to be able to dig into this. Previous reports have not been able to provide reproducible test cases. Do you think you have something that is easy to reproduce? Furthermore, can you reproduce this w/o rosbridge running?
I'd second seeing if you can get this error sans rosbridge. Being so thread heavy and so severely abusive of rospy, it's a likely culprit. This is potentially related to the clients disconnecting without unsubscribing memory leak you spotted :). A fix for one may fix the other. I'll look into it.
I have not been able to determine the steps to reproduce it with or without rosbridge, but a coworker had this same issue without running rosbridge at all. He detected it because robotmonitor was using 99% CPU on top and lsof showed that it had hundreds of CLOSE-WAIT connections to localhost.
Can you at least post a the log files from the robot monitor node? Really can't help unless there's more to work with.
I've added the log output of robot_monitor. I had some errors attaching it so I had to copy it to the question.
Thanks, the log is helpful. Can you try the patch here: https://code.ros.org/trac/ros/changeset/16431 . It's done against Fuerte, but I believe it will work against diamondback as well. I do recommend moving to Electric, as this patch does not address the root (unknown) cause.
We're currently migrating to electric, but I will try this in my diamondback installation and see if I stop getting this issue. The problem is that since it's so random it might be a long time before I'm sure it is not happening.
I believe I've run into a similar issue which I was able to solve. It appears to be a symptom of multiple instances of rosout running and competing. They constantly restart in a race condition and thus create many rosout connections before they can be torn down.