actionlib server/client race conditions
I am working on a production-level piece of robotic software using ROS.
I've noticed in my extensive cloud testing that there are rare occasions where a SimpleActionServer runs setSucceeded(), but any getState() calls by the client still show the state as ACTIVE. I don't know exactly whether getState() looks at the client's stored info or polls the server (I'm about to dig into this code). If it's the latter, then something is fundamentally wrong with the server. If it's the former, then it's clear that actionlib can lose messages between server and client (even on the same machine), and I don't know the best way to handle this.
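To make the "former" case concrete, here is a toy model in plain Python. It is not actionlib code: ToyServer, ToyClient, and run are hypothetical names, and the model simply assumes the client caches the last state it was told about. Under that assumption, a single undelivered result message is enough to leave the client reporting ACTIVE while the server is already SUCCEEDED.

```python
from enum import Enum


class GoalState(Enum):
    ACTIVE = "ACTIVE"
    SUCCEEDED = "SUCCEEDED"


class ToyServer:
    """Hypothetical stand-in for an action server (not actionlib code)."""

    def __init__(self):
        self.state = GoalState.ACTIVE

    def set_succeeded(self):
        # Update the server's own state and emit exactly one result message.
        self.state = GoalState.SUCCEEDED
        return {"state": GoalState.SUCCEEDED}


class ToyClient:
    """Hypothetical client that only knows what it has been told."""

    def __init__(self):
        self.cached_state = GoalState.ACTIVE

    def on_result(self, msg):
        # The cache changes only when a message is actually delivered.
        self.cached_state = msg["state"]

    def get_state(self):
        # Assumption: no round trip to the server, just the cached value.
        return self.cached_state


def run(deliver):
    """Return (server state, client-visible state) after set_succeeded()."""
    server, client = ToyServer(), ToyClient()
    msg = server.set_succeeded()
    if deliver:
        client.on_result(msg)
    return server.state, client.get_state()


if __name__ == "__main__":
    print(run(deliver=True))   # both sides agree
    print(run(deliver=False))  # server succeeded, client still sees ACTIVE
```

If the real client behaves like this model, the server can be entirely correct and the symptom would still appear whenever the result never reaches the client.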
Anyone ever see these issues and find a good solution?
Not an answer, but if you have an MWE, making it available would make reproducing and debugging this much easier.
It appears that setSucceeded() on the server side occasionally does NOT change the state from ACTIVE to SUCCEEDED. That is, the client querying the server state sees ACTIVE despite the fact that I know the server ran setSucceeded(). Unfortunately this is VERY rare and hard to reproduce.
A tarball of a catkin package that contains just a simple server and client that run forever can be found here. You may have to let it run for days to see it fail, and currently there isn't anything informative in it to help diagnose why it fails. https://www.dropbox.com/s/57hm0gnv1fl...
I ran the server/client in the tarball above and can see that the server sets succeeded on the goal (and returns from setSucceeded()), but the client never sees this and has no way to see it.
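Until the root cause is found, one possible client-side guard is a deadline on the wait, so that a goal stuck in ACTIVE is detected instead of blocking forever. The sketch below is generic Python, not tied to actionlib: wait_for_terminal_state is a hypothetical helper, get_state is whatever callable reports the client-visible state, and the state strings are placeholders.

```python
import time


def wait_for_terminal_state(get_state, timeout_s, poll_s=0.05,
                            terminal=("SUCCEEDED", "ABORTED", "PREEMPTED")):
    """Poll get_state() until it returns a terminal state or time runs out.

    Returns the terminal state, or None if the deadline passes first.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        state = get_state()
        if state in terminal:
            return state
        if time.monotonic() >= deadline:
            return None  # caller decides: cancel, resend, or raise an alarm
        time.sleep(poll_s)


if __name__ == "__main__":
    # Simulated client whose state never leaves ACTIVE (the failure case).
    print(wait_for_terminal_state(lambda: "ACTIVE", timeout_s=0.2))
```

A None return does not say whether the server finished; it only says the client never learned of it within the deadline, which is the situation described above. What to do next (cancel and resend, or fail the task) depends on whether the action is safe to repeat.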