Hundreds of outbound connections to rosout, OS out of sockets [closed]

asked 2012-02-19 22:37:47 -0600

Victor Lopez gravatar image

updated 2012-02-29 21:14:02 -0600

This is the output of rosnode info of one of the nodes running in my computer, all nodes have a similar output.

Applications that usually do not use many resources (like robot_monitor) are consuming 100% of the CPU.


$ rosnode info /robot_monitor_8970_1329732898963 
--------------------------------------------------------------------------------
Node [/robot_monitor_8970_1329732898963]
Publications: 
 * /rosout [rosgraph_msgs/Log]

Subscriptions: 
 * /clock [rosgraph_msgs/Clock]
 * /diagnostics_agg [diagnostic_msgs/DiagnosticArray]

Services: None


contacting node http://denmark:40743/ ...
Pid: 8970
Connections:
 * topic: /rosout
    * to: /rosout
    * direction: outbound
    * transport: TCPROS
 * topic: /rosout
    * to: /rosout
    * direction: outbound
    * transport: TCPROS
 * topic: /rosout
    * to: /rosout
    * direction: outbound
    * transport: TCPROS
 * topic: /rosout
    * to: /rosout
    * direction: outbound
    * transport: TCPROS
 * topic: /rosout
    * to: /rosout
    * direction: outbound
    * transport: TCPROS
 * topic: /rosout
    * to: /rosout
    * direction: outbound
    * transport: TCPROS
 * topic: /rosout
    * to: /rosout
    * direction: outbound
    * transport: TCPROS
 * topic: /rosout
    * to: /rosout
    * direction: outbound
    * transport: TCPROS
 * topic: /rosout
    * to: /rosout
    * direction: outbound
    * transport: TCPROS
 * topic: /rosout
    * to: /rosout
    * direction: outbound
    * transport: TCPROS
 * topic: /rosout
    * to: /rosout
    * direction: outbound
    * transport: TCPROS
 * topic: /rosout
    * to: /rosout
    * direction: outbound
    * transport: TCPROS
 * topic: /rosout
    * to: /rosout
    * direction: outbound
    * transport: TCPROS
 * topic: /rosout
    * to: /rosout
    * direction: outbound
    * transport: TCPROS
 * topic: /rosout
    * to: /rosout
    * direction: outbound
    * transport: TCPROS
 * topic: /rosout
    * to: /rosout
    * direction: outbound
    * transport: TCPROS
 * topic: /rosout
    * to: /rosout
    * direction: outbound
    * transport: TCPROS
 * topic: /rosout
    * to: /rosout
    * direction: outbound
    * transport: TCPROS
 * topic: /rosout
    * to: /rosout
    * direction: outbound
    * transport: TCPROS
 * topic: /rosout
    * to: /rosout
    * direction: outbound
    * transport: TCPROS
 * topic: /rosout
    * to: /rosout
    * direction: outbound
    * transport: TCPROS
 * topic: /rosout
    * to: /rosout
    * direction: outbound
    * transport: TCPROS
 * topic: /rosout
    * to: /rosout
    * direction: outbound
    * transport: TCPROS
 * topic: /rosout
    * to: /rosout
    * direction: outbound
    * transport: TCPROS
 * topic: /rosout
    * to: /rosout
    * direction: outbound
    * transport: TCPROS
 * topic: /rosout
    * to: /rosout
    * direction: outbound
    * transport: TCPROS
 * topic: /rosout
    * to: /rosout
    * direction: outbound
    * transport: TCPROS
 * topic: /rosout
    * to: /rosout
    * direction: outbound
    * transport: TCPROS
 * topic: /rosout
    * to: /rosout
    * direction: outbound
    * transport: TCPROS
 * topic: /rosout
    * to: /rosout
    * direction: outbound
    * transport: TCPROS
 * topic: /rosout
    * to: /rosout
    * direction: outbound
    * transport: TCPROS
 * topic: /rosout
    * to: /rosout
    * direction: outbound
    * transport: TCPROS
 * topic: /rosout
    * to: /rosout
    * direction: outbound
    * transport: TCPROS
 * topic: /rosout
    * to: /rosout
    * direction: outbound
    * transport: TCPROS
 * topic: /rosout
    * to: /rosout
    * direction: outbound
    * transport: TCPROS
 * topic: /rosout
    * to: /rosout
    * direction: outbound
    * transport: TCPROS
 * topic: /rosout
    * to: /rosout
    * direction: outbound
    * transport: TCPROS
 * topic: /rosout
    * to: /rosout
    * direction: outbound
    * transport: TCPROS
 * topic: /rosout
    * to: /rosout
    * direction: outbound
    * transport: TCPROS
 * topic: /rosout
    * to: /rosout
    * direction: outbound
    * transport: TCPROS
 * topic: /rosout
    * to: /rosout
    * direction: outbound
    * transport: TCPROS
 * topic: /rosout
    * to: /rosout
    * direction: outbound
    * transport: TCPROS
 * topic: /rosout
    * to: /rosout
    * direction: outbound
    * transport: TCPROS
 * topic: /rosout
    * to: /rosout
    * direction: outbound
    * transport: TCPROS
 * topic: /rosout
    * to: /rosout
    * direction: outbound
    * transport: TCPROS
 * topic: /rosout
    * to: /rosout
    * direction: outbound
    * transport: TCPROS
 * topic: /rosout
    * to: /rosout
    * direction: outbound
    * transport: TCPROS
 * topic: /rosout
    * to: /rosout
    * direction: outbound
    * transport: TCPROS
 * topic: /rosout
    * to: /rosout
    * direction: outbound
    * transport: TCPROS
 * topic: /rosout
    * to: /rosout
    * direction: outbound
    * transport: TCPROS
 * topic: /rosout
    * to: /rosout
    * direction: outbound
    * transport: TCPROS
 * topic: /rosout
    * to: /rosout
    * direction: outbound
    * transport: TCPROS
 * topic: /rosout
    * to: /rosout
    * direction: outbound
    * transport: TCPROS
 * topic: /rosout
    * to: /rosout
    * direction: outbound
    * transport: TCPROS
 * topic: /rosout
    * to: /rosout ...
(more)
edit retag flag offensive reopen merge delete

Closed for the following reason question is not relevant or outdated by tfoote
close date 2012-04-23 09:12:50

Comments

I'd love to be able to dig into this. Previous reports have not been able to provide reproducible test cases. Do you think you have something that is easy to reproduce? Furthermore, can you reproduce this w/o rosbridge running?

kwc gravatar image kwc  ( 2012-02-20 07:41:48 -0600 )edit

I'd second seeing if you can get this error sans rosbridge. Being so thread heavy and so severely abusive of rospy, it's a likely culprit. This is potentially related to the clients disconnecting without unsubscribing memory leak you spotted :). A fix for one may fix the other. I'll look into it.

tjay gravatar image tjay  ( 2012-02-20 09:53:39 -0600 )edit

I have not been able to determine the steps to reproduce it with or without rosbridge, but a coworker had this same issue without running rosbridge at all. He detected it because robotmonitor was using 99% CPU on top and lsof showed that it had hundreds of CLOSE-WAIT connections to localhost.

Victor Lopez gravatar image Victor Lopez  ( 2012-02-20 20:19:13 -0600 )edit

Can you at least post a the log files from the robot monitor node? Really can't help unless there's more to work with.

kwc gravatar image kwc  ( 2012-02-21 06:59:44 -0600 )edit

I've added the log output of robot_monitor. I had some errors attaching it so I had to copy it to the question.

Victor Lopez gravatar image Victor Lopez  ( 2012-02-29 21:34:36 -0600 )edit

Thanks, the log is helpful. Can you try the patch here: https://code.ros.org/trac/ros/changeset/16431 . It's done against Fuerte, but I believe it will work against diamondback as well. I do recommend moving to Electric, as this patch does not address the root (unknown) cause.

kwc gravatar image kwc  ( 2012-03-01 06:52:29 -0600 )edit

We're currently migrating to electric, but I will try this in my diamondback installation and see if I stop getting this issue. The problem is that since it's so random it might be a long time before I'm sure it is not happening.

Victor Lopez gravatar image Victor Lopez  ( 2012-03-01 20:59:03 -0600 )edit

I believe I've run into a similar issue which I was able to solve. It appears to be a symptom of multiple instances of rosout running and competing. They constantly restart in a race condition and thus create many rosout connections before they can be torn down.

Asomerville gravatar image Asomerville  ( 2012-12-12 13:33:57 -0600 )edit