move_base acting weird with nodes running on different hosts.

asked 2016-02-08 04:56:04 -0600

Sietse

updated 2016-02-09 08:59:40 -0600

Hello All,

Using examples from "ROS by Example" to get move_base working, I find that the trajectory is erratic when the code (everything except rviz) runs on ANOTHER host. The robot gets there in the end, but it moves very strangely, as if it were "drunk"...

The commands I use are:

  roscore
  roslaunch rbx1_bringup fake_turtlebot.launch
  roslaunch rbx1_nav fake_move_base_blank_map.launch
  rosrun rviz rviz -d `rospack find rbx1_nav`/nav.rviz

When all commands run on my regular PC (i5, Ubuntu 14.04, Indigo), all is well. But when the two roslaunch commands run on another host, the strange paths appear.

That other host is a quad-core Freescale i.MX 6Quad (1 GHz, 1 GB RAM) running Ubuntu 14.04 with Indigo. There is no swapping, and nothing else is running on it.

Communication is via a 5GHz Wifi link that can sustain 15MB/sec.

I would assume that, for this simple setup, neither the performance of the ARM system nor the Wi-Fi link is the problem. What could be causing this, and how can I narrow it down further?

EDIT:

roswtf gives the SAME output in both situations. It does give the following error:

ERROR The following nodes should be connected but aren't:
 * /move_base->/move_base (/move_base/global_costmap/footprint)
 * /move_base->/move_base (/move_base/local_costmap/footprint)

but that is probably not relevant here.

move_base often does not start properly. Initialization normally ends with "odom received!", but sometimes that last line takes a long time (about 20 seconds) to appear, and sometimes it never appears at all. I already asked about this in my question "remote core with move group". When I stop the hung program, it complains "Failed to contact master", even though it has been using the master all along.

When I use the move_base parameters that come with this example, I get the following warnings while the controller and move_base are running on the ARM board:

[ WARN] [1455012218.435531811]: Map update loop missed its desired rate of 3.0000Hz... the loop actually took 1.0281 seconds
[ WARN] [1455012218.953750145]: Control loop missed its desired rate of 3.0000Hz... the loop actually took 0.5194 seconds
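The two warnings already contain what is needed to see the rate the board can actually sustain: it is simply 1 / (measured loop time). A small sketch that extracts those numbers from the log; the regular expression is written against the two lines above and is an assumption for any other message format. As far as I know, the "Map update" loop is the costmap's `update_frequency` loop and the "Control" loop is move_base's `controller_frequency` loop.

```python
import re

# Matches move_base/costmap "missed its desired rate" warnings such as
# "...desired rate of 3.0000Hz... the loop actually took 1.0281 seconds"
WARNING_RE = re.compile(
    r"(?P<loop>Map update|Control) loop missed its desired rate of "
    r"(?P<desired>[0-9.]+)Hz\.\.\. the loop actually took (?P<took>[0-9.]+) seconds"
)


def achievable_rates(log_text):
    """Return {loop name: (desired Hz, Hz that the measured loop time allows)}."""
    rates = {}
    for m in WARNING_RE.finditer(log_text):
        desired = float(m.group("desired"))
        took = float(m.group("took"))
        rates[m.group("loop")] = (desired, 1.0 / took)
    return rates


log = """\
[ WARN] [1455012218.435531811]: Map update loop missed its desired rate of 3.0000Hz... the loop actually took 1.0281 seconds
[ WARN] [1455012218.953750145]: Control loop missed its desired rate of 3.0000Hz... the loop actually took 0.5194 seconds
"""

for loop, (desired, possible) in achievable_rates(log).items():
    print(f"{loop}: asked for {desired:.1f} Hz, measured loop allows ~{possible:.2f} Hz")
```

For these two lines it reports roughly 0.97 Hz for the map update loop and 1.93 Hz for the control loop, which fits the observation that a 1 Hz rate "basically works".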

I find it strange that computing a single loop takes that much time, given that both the global and local maps are empty. Am I missing something?

If I lower the rate to 1 Hz it basically works, but it has a hard time following the track, and most of the time it behaves like a highly unstable system. After changing some more parameters I can improve it, but only slightly: it is still not usable.

If move_base only publishes a single /cmd_vel per second, I can imagine that it is difficult to keep the robot on track. Wouldn't it be better if move_base produced a complete trajectory to the final goal and updated it every second? Then the rates of move_base and of executing the trajectory could be decoupled.
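Why one command per second can make the robot look "drunk" can be shown with a deliberately simplified model: a proportional heading controller whose output is held constant until the next update (a zero-order hold). Each update multiplies the heading error by (1 - k*T), with k the gain and T the update period. For 0 < k*T < 1 the error decays smoothly, for 1 < k*T < 2 it overshoots and oscillates but still converges (it "gets there in the end"), and for k*T > 2 it grows. This is an illustration only, not move_base's actual local planner, and the gain value is hypothetical.

```python
def heading_errors(gain, period, steps, initial_error=0.5):
    """Heading error (rad) after each update of a P controller with a held output.

    At every update the controller commands omega = -gain * error; the robot
    keeps turning at that rate for a whole period, so the error becomes
    (1 - gain * period) * error.
    """
    errors = [initial_error]
    for _ in range(steps):
        omega = -gain * errors[-1]                  # command computed now...
        errors.append(errors[-1] + omega * period)  # ...and held for one period
    return errors


GAIN = 1.5  # hypothetical gain: (rad/s) of turn rate per rad of heading error

for hz in (10.0, 1.0, 0.6):
    period = 1.0 / hz
    errs = heading_errors(GAIN, period, steps=6)
    print(f"{hz:4.1f} Hz: factor {1 - GAIN * period:+.2f} per update, "
          f"first errors {[round(e, 3) for e in errs[:4]]}")
```

With this gain the error shrinks smoothly at 10 Hz, flips sign at every update while shrinking at 1 Hz, and grows at 0.6 Hz.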

In summary, two questions. Why does the "odom received!" message sometimes not appear at all, or only after a long delay? And ...


Comments

Communication is via a 5GHz Wifi link that can sustain 15GB/sec.

Are you sure that is correct? That is 15 Gigabyte per second. What kind of technology are you using?

gvdhoorn (2016-02-08 10:23:17 -0600)

Oops, 15MB/sec of course. Changed it.

Sietse (2016-02-08 11:46:32 -0600)

Do you get any useful feedback when you run roswtf?

nickw (2016-02-08 14:56:29 -0600)

I see no differences; please see the edit to my question.

Sietse (2016-02-09 08:24:40 -0600)


answered 2016-02-10 05:42:00 -0600

Sietse

updated 2016-02-10 05:44:41 -0600

I've figured it out. The control loop on the ARM board takes about 0.7 seconds, so the rate has to be set accordingly. If you do that, it just works, weirdness included. This is due to the design of move_base: it only publishes a new /cmd_vel after each loop. I now realise that this is exactly what I see.
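Setting the rate accordingly amounts to keeping the configured frequency below 1 / (loop time). A minimal sketch, assuming a handful of measured loop timings and a safety margin; the 0.8 headroom factor and the sample values are hypothetical. The result would go into move_base's `controller_frequency` parameter (and, for the map update loop, the costmap's `update_frequency`).

```python
import math


def suggest_frequency(loop_times_s, headroom=0.8):
    """Suggest a loop frequency in Hz that the measured loop times allow.

    Uses the slowest observed loop, so occasional slow iterations do not
    trigger "missed its desired rate" warnings, keeps only `headroom` of
    1 / slowest, and rounds down to 0.1 Hz so that limit is never exceeded.
    """
    slowest = max(loop_times_s)
    max_hz = 1.0 / slowest  # fastest rate the loop could keep up with
    return math.floor(max_hz * headroom * 10.0) / 10.0


# Hypothetical timings close to the ~0.7 s seen on the ARM board.
samples = [0.66, 0.70, 0.68, 0.73, 0.69]
print(suggest_frequency(samples))  # 1.0
```

Using the slowest sample rather than the mean is a deliberately conservative choice; a high percentile would be a reasonable alternative when there are many samples.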

So the question is basically answered, but I want a better behaviour of course. I see several possibilities.

The first, which I will use as a workaround, is running move_base on the main PC, and this works perfectly. But because we expect to have 10 of these robots running around, I really want the code to run on the ARM board.

Getting a more powerful computer on the robot would feel like admitting defeat, especially because my intuition says it should be possible with the current processor. A more powerful computer would also need more power, etc.

So in the end I probably need to make move_base faster. As I already wrote in my question, one possibility would be to let move_base deliver the complete path instead of /cmd_vel messages. A separate node could then use this path to generate /cmd_vel messages at a higher rate. The complete path is already available, as seen in rviz. I may design this at some point, but does something like this already exist?
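A minimal sketch of that decoupling: the slow planner only has to supply a list of waypoints, and a separate loop turns the latest list into velocity commands at a higher rate. The follower below uses pure pursuit, a standard path-tracking method chosen here purely for illustration; it is not what move_base's local planner does. All function names, parameter values and the 20 Hz rate are hypothetical, and the ROS plumbing (receiving the plan, publishing a Twist on /cmd_vel) is left out so the example runs on its own against an idealised robot model.

```python
import math


def pure_pursuit_cmd(pose, path, lookahead=0.5, speed=0.3, goal_tolerance=0.1):
    """Return one (linear, angular) velocity command using pure pursuit.

    pose: (x, y, yaw) of the robot, in the same frame as the path.
    path: list of (x, y) waypoints, e.g. the most recent plan from the planner.
    """
    x, y, yaw = pose
    gx, gy = path[-1]
    if math.hypot(gx - x, gy - y) < goal_tolerance:
        return 0.0, 0.0  # close enough to the goal: stop

    # Search from the waypoint nearest the robot, so points already passed
    # are ignored, and take the first one at least `lookahead` away.
    nearest = min(range(len(path)),
                  key=lambda i: math.hypot(path[i][0] - x, path[i][1] - y))
    tx, ty = gx, gy
    for px, py in path[nearest:]:
        if math.hypot(px - x, py - y) >= lookahead:
            tx, ty = px, py
            break

    # Express the target point in the robot frame.
    dx, dy = tx - x, ty - y
    local_x = math.cos(yaw) * dx + math.sin(yaw) * dy
    local_y = -math.sin(yaw) * dx + math.cos(yaw) * dy

    # Curvature of the arc that is tangent to the heading and hits the target.
    curvature = 2.0 * local_y / (local_x ** 2 + local_y ** 2)
    return speed, speed * curvature


def simulate(path, pose, rate_hz=20.0, duration_s=20.0):
    """Drive an ideal unicycle with the follower running at rate_hz."""
    dt = 1.0 / rate_hz
    x, y, yaw = pose
    for _ in range(int(duration_s * rate_hz)):
        v, omega = pure_pursuit_cmd((x, y, yaw), path)
        if v == 0.0:
            break
        x += v * math.cos(yaw) * dt
        y += v * math.sin(yaw) * dt
        yaw += omega * dt
    return x, y, yaw


# A straight 3 m path along x; the robot starts 0.3 m to the side of it.
straight_path = [(i * 0.1, 0.0) for i in range(31)]
final_x, final_y, _ = simulate(straight_path, (0.0, 0.3, 0.0))
print(f"stopped at ({final_x:.2f}, {final_y:.2f})")
```

In a real node the follower would keep using the last path it received until the planner delivers a new one, which is what lets the two loops run at different rates.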

Thanks for the help, Sietse


Comments

How did you install ROS on the i.MX? Was that a from-source install? If so, did you build everything with optimisations turned on?

gvdhoorn (2016-02-10 07:57:43 -0600)

I used the regular ROS repository for the ARM architecture. Is there anything to gain from a source install?

For the Ubuntu install I used the recipe from this link.

Sietse (2016-02-10 08:35:21 -0600)

Anything to gain by a source install?

No. The release binaries are already built with optimisations turned on. It's just that people sometimes forget to enable them when doing a from-source installation.

gvdhoorn (2016-02-10 08:52:28 -0600)


answered 2016-02-09 08:45:18 -0600

Stefan Kohlbrecher


Here are a few typical checklist items for a multiple machine setup:

Is the network setup correct (i.e. does communication work bidirectionally)? If it is wrong, it is sometimes possible to list topics on a machine, for instance, but no data is received when subscribing.
Is there a time sync offset? If the clocks of the machines are not synchronised, all sorts of weird things can happen because components wait for tf much longer than they should (or fail completely).
Is CPU consumption on one of the machines too high? (A low-power ARM CPU could be overburdened and unable to keep up.)
Is the bandwidth sufficient for all data to be transferred? (If the transmitted bandwidth is close to the maximum, there's a good chance of comms dropouts.)
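For the time sync item, the usual way to estimate the offset between two clocks is the NTP calculation from the four timestamps of one request/reply exchange. The sketch assumes the network delay is roughly the same in both directions; in practice tools such as ntpdate or chrony report this offset directly, so this only shows what they measure. The timestamps in the example are made up.

```python
def clock_offset(t0, t1, t2, t3):
    """NTP-style clock offset and round-trip delay.

    t0: local clock when the request was sent
    t1: remote clock when the request arrived
    t2: remote clock when the reply was sent
    t3: local clock when the reply arrived
    Returns (offset, delay); a positive offset means the remote clock is ahead.
    Assumes the request and the reply take about the same time on the network.
    """
    offset = ((t1 - t0) + (t2 - t3)) / 2
    delay = (t3 - t0) - (t2 - t1)
    return offset, delay


# Made-up exchange in milliseconds: the remote clock is 500 ms ahead, each
# direction takes 10 ms and the remote side needs 1 ms to answer.
offset_ms, delay_ms = clock_offset(100_000, 100_510, 100_511, 100_021)
print(f"offset {offset_ms} ms, round trip {delay_ms} ms")  # offset 500.0 ms, round trip 20 ms
```

An offset of hundreds of milliseconds between the robot and the PC is the kind of situation in which tf lookups start waiting or failing.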

Also, I think you refer solely to move_base here? If so, you should probably edit your question to replace all references to "move_group" (which is part of MoveIt! arm motion planning and unrelated to move_base).


Comments

Thanks. Network, time and bandwidth are all OK, I think. Only the CPU load is a bit high: the core running move_base hovers around 80% while it runs. But that probably doesn't explain why the initialization of move_base often takes so long or doesn't finish.

Sietse (2016-02-09 09:22:12 -0600)
