Bloom takes ~15 minutes to pull down rosdistro

asked 2020-04-24 14:23:13 -0500 by jbinney

I've noticed this when releasing packages for noetic. It looks like it is downloading about 140 megabytes, and GitHub is only letting me download at 50 to 200 KB/s. I'm on a gigabit connection and have fast connections to other sites.

Open a pull request from 'jonbinney/rosdistro:bloom-laser_proc-0' into 'ros/rosdistro:master'?
Continue [Y/n]? 
==> git checkout -b bloom-laser_proc-0
Switched to a new branch 'bloom-laser_proc-0'
==> Pulling latest rosdistro branch
remote: Enumerating objects: 9, done.
remote: Counting objects: 100% (9/9), done.
remote: Compressing objects: 100% (7/7), done.
Receiving objects:   7% (10814/140559), 3.65 MiB | 69.00 KiB/s

Anyone know what's going on? Is rosdistro really just that big? Any way to speed this up?
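For reference, here's one quick way to check how large the repository actually is (the commands are just for illustration):

# Clone and measure the size of the local object store
git clone https://github.com/ros/rosdistro.git /tmp/rosdistro
git -C /tmp/rosdistro count-objects -vH
# Or ask the GitHub API, which reports the repository size in kilobytes
curl -s https://api.github.com/repos/ros/rosdistro | grep '"size"'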


Comments

I haven't seen 15 minutes before, but 1-3 minutes is typical for me, which is still a little irritating. I think it's because of the crazy number of diffs it's applying from 41,000+ commits. Perhaps we should squash the first 30,000 into 1 to help.

stevemacenski (2020-04-24 16:03:37 -0500)

Interestingly, I can clone the rosdistro repo in 30 seconds. Not sure why "Pulling latest rosdistro branch" takes so long.

jbinney (2020-04-24 18:22:37 -0500)

Seems to be a GitHub issue. I just reproduced it by cloning using an xauth token - it ran very slowly. Then I tried again, and it ran quickly. It looks like GitHub has had some "degradation" events on their status page.

jbinney (2020-04-24 18:33:35 -0500)

Huh, interesting. Perhaps my internet is just crap then, or 30 seconds feels like a small eternity these days.

stevemacenski (2020-04-24 21:48:33 -0500)

In my experience, most "why does X take so long with tool Y" questions, where Y uses some part of GitHub, come down to GitHub either intentionally throttling or having some sort of transient problem.

It's like that with wstool, rosdep, Bloom and some of their friends.

gvdhoorn (2020-04-25 07:11:50 -0500)

There was an attempt to use shallow clones (https://github.com/ros-infrastructure...), but it didn't end up working for the contributor. @tfoote and I recently had a casual discussion about the sustainability of the GitHub-as-a-database approach used by the official rosdistro; the conclusion was that it can't last forever without taking some affordances, but there's no current plan to change. One thing that might be worth doing is adding additional special-case behavior to make both the content change and the pull request via the GitHub API when the target rosdistro index is hosted on GitHub, using either the repository contents API (https://developer.github.com/v3/repos...) or the Git Data API (https://developer.github.com/v3/git/), and falling back to a local clone strategy only when that fails.
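Very roughly, the contents-API route could look something like this (just a sketch; the token, file path, and payload details are illustrative, not something bloom does today):

# Fetch the current file (and its blob SHA) via the repository contents API
curl -s -H "Authorization: token $GITHUB_TOKEN" \
     https://api.github.com/repos/ros/rosdistro/contents/noetic/distribution.yaml
# Updating the file is a PUT to the same endpoint with the new content
# (base64-encoded), a commit message, the SHA from the GET above, and a
# branch name; the pull request is then opened via POST /repos/ros/rosdistro/pulls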

nuclearsandwich (2020-04-25 09:55:17 -0500)

Just to make sure someone hears this: whatever we end up doing, it would be really nice to make sure we keep history intact. So the squash suggested by @stevemacenski is not what I would like to see happen.

Future software archeology (like we did with rosinstall_generator_time_machine) is made really difficult with such operations.

+1 to see whether GH's APIs could be used for this.

gvdhoorn (2020-04-25 10:00:25 -0500)

One alternative to shallow clones might be to have bloom use git clone's "--reference" option when cloning rosdistro. Then there could be one long-lived clone in "~/.config/bloom/rosdistro" and only new commits would be pulled down each time the user made a release (rough sketch below).
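The commands might look roughly like this (the paths and the location of the long-lived clone are illustrative, not current bloom behavior):

# One-time setup: a long-lived reference repository
git clone --mirror https://github.com/ros/rosdistro.git ~/.config/bloom/rosdistro
# Each release: a fresh working clone borrows objects from the reference
# repo, so only new commits need to be downloaded
git clone --reference ~/.config/bloom/rosdistro \
    https://github.com/ros/rosdistro.git /tmp/rosdistro-release
# Keep the reference repo up to date between releases
git -C ~/.config/bloom/rosdistro fetch origin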

jbinney (2020-04-25 12:46:43 -0500)