message is published delayed (inconsistent)
I have a problem where pub.publish(...) gets called, but the actual publishing is not done directly. This delay in publishing happens occasionally, so it looks like a race condition. Also, when I add a delay after publishing it works correctly, no problems at all!
def vacuumActivity():
print("-before publishing-")
pub.publish("SETVACUUM0")
#time.sleep(1)
print("-after publishing-")
pose_goal = GetPoseGoal()
group.go(pose_goal, wait=True)
group.stop()
pub.publish("MOVEMENT_EXECUTED")
return
in this code pub.publish("SETVACUUM0") will occasionally be published delayed. When this happens it will be published just before pub.publish("MOVEMENT_EXECUTED"), so the next time pub.publish is called.
When I uncomment time.sleep(1) there are no issues and everything is working as expected.
I am publishing to a tcp_client_node, which is connected to a TCP/IP socket and handles messages to/from a PLC over TCP/IP. (I can add the code of that node if necessary, but I'm try to keep this post clean). My function 'vacuumActivity()' is part of 'robotACtivityExecuter()' which is called from my main:
if __name__ == "__main__":
rospy.init_node('activity_executer')
rate = rospy.rate(10)
rospy.on_shutdown(shutdownHook)
rospy.Subscriber("tcp_received", String, tcp_received_callback)
pub = rospy.Publisher("tcp_to_send", String, queue_size=10")
while not rospy.is_shutdown():
try:
robotActivityExecuter()
rate.sleep()
except rospy.ROSInterruptException:
pass
This is not a problem with starting nodes, because this behaviour is not during startup, and it happens inconsistent. This is not a problem with the queue_size; all messages are being published, but sometimes it's not directly.
Anyone who can help me how to get this to work correctly without a delay?
First: you're using Python, which is not known for its determinism. From your mention of "PLC" and the contents of the string ("SETVACUUM0") this seems to be a (traditional) automation context. Perhaps a more direct / less elaborate setup instead of using a separate node to proxy a TCP socket to a PLC via ROS topics and
rospy
would be worth investigating.Having written that: it might be you're seeing the effects of Nagle here. See rospy: Publisher Initialisation on how to disable that (
tcp_nodelay
).No guarantees though as you have a lot of moving parts (ie: intemediate steps and conversions), so there might be other causes.
Edit: re-reading your question: wouldn't a service-based design be better here?
SETVACUUM0
sounds like an action which needs to be initiated, and you have strict time-ordering requirements on actions. That's not something pub-sub was created for.Valid feedback, thank you! I 'more direct' setup is not desired, PLC is already deployed otherwise I'd prefer modbus. (or just add IO to my pc).I looked into services but I don't want to wait untill the tcp/ip call is finished, maybe I will setup a service to test the performance. Nagle sounds like a possible cause, I'll look into it. relevent nagle post
If the PLC has modbus support, couldn't you use one of the available
modbus
packages, or even integratepymodbus
directly into your node?Yes that would be possible, but the PLC has other tcp/ip clients. we want the robot also to be a tcp/ip client so there is one general protocol for all the clients connected to the PLC.
If you're happy I'm happy, but it seems strange to me to implement, debug and maintain your own protocol when you can actually use a standardised and well supported fieldbus instead.
In my own applications I always try to go for the least amount of moving parts, as they all introduce additional points of potential failure, increase maintenance overhead and attack surface.