Clean up git history #350

Closed
pitdicker opened this issue Mar 28, 2018 · 24 comments

Comments

@pitdicker
Contributor

When the repo was created, two places where rand lived in the rustc repo were combined: one from librand, and one from the standard library (writing from memory here). Things got a bit messy, and rand ended up with two git roots. git filter-branch did remove a lot, but we still have most of the commits to the rust repo until 2014.

I put some effort into cleaning things up (which took a surprising amount of work) in https://github.com/pitdicker/rand_clean_history. That was half a year ago, so it would need updating.

Do we want to clean up the git history? Right now we need a huge checkout for what could be a couple of MB. And a large part of the history is basically undecipherable now.

The downside is that it would cause breakage, and I don't know how bad. It would bring forks and branches from others badly out of sync. Would merged PRs still be as explorable via GitHub?

So I am not sure it is worth the effort, but I would certainly like to see this done.

@vks
Collaborator

vks commented Mar 29, 2018

Does that mean we should merge or close all open pull requests before doing this?

@dhardy
Member

dhardy commented Mar 29, 2018

I used qgit to look at your repo and it still shows merged branches properly. A lot of the early history seems to be reformatting, but there's also some useful stuff there, like the introduction of the Rand trait. Looks good, but quite out of date.

I also wonder about branches like the 0.3 and 0.4 branches and release tags; are these going to need recreating?

As @vks says, it doesn't seem sensible to do this migration while we have many PRs open.

@pitdicker
Contributor Author

Most of these PRs are from me, and I'll manage 😄. That leaves three PRs. With the exploration I am doing in dhardy#82, I don't think we will take #198. #144 would need some work, but shouldn't be hard to rebase. I don't care much yet for #152, but we could always recreate that one.

Recreating branches and tags if necessary should not be all that much effort.
What I'd like to know more is, do we want to do this?

@vks
Collaborator

vks commented Mar 29, 2018

> What I'd like to know more is, do we want to do this?

I would be in favor of it, assuming it is not too much work.

@pitdicker
Contributor Author

Updated https://github.com/pitdicker/rand_clean_history/. But somewhere along the way git started uploading the 100 MB of old history, and I am not yet sure why.

@pitdicker
Contributor Author

Created a new repo: https://github.com/pitdicker/rand_clean/. It is less than 2 MB.

@pitdicker
Contributor Author

This should be the procedure to bring it over to this repo (tested on my rand_clean_history repo):

git clone https://github.com/pitdicker/rand_clean
cd rand_clean
git remote add nursery https://github.com/rust-lang-nursery/rand

# From the github interface:
# - make gh-pages the default branch
# - delete all other branches

# delete all tags (one by one -- ugly!)
git push --delete nursery 0.1.1
git push --delete nursery 0.1.2
git push --delete nursery 0.1.3
git push --delete nursery 0.1.4
git push --delete nursery 0.2.0
git push --delete nursery 0.2.1
git push --delete nursery 0.3.0
git push --delete nursery 0.3.1
git push --delete nursery 0.3.10
git push --delete nursery 0.3.11
git push --delete nursery 0.3.12
git push --delete nursery 0.3.13
git push --delete nursery 0.3.14
git push --delete nursery 0.3.15
git push --delete nursery 0.3.16
git push --delete nursery 0.3.17
git push --delete nursery 0.3.18
git push --delete nursery 0.3.19
git push --delete nursery 0.3.2
git push --delete nursery 0.3.20
git push --delete nursery 0.3.21-pre.0
git push --delete nursery 0.3.22
git push --delete nursery 0.3.3
git push --delete nursery 0.3.4
git push --delete nursery 0.3.5
git push --delete nursery 0.3.6
git push --delete nursery 0.3.7
git push --delete nursery 0.3.8
git push --delete nursery 0.3.9
git push --delete nursery 0.4.0-pre.0
git push --delete nursery 0.4.1
git push --delete nursery 0.4.2
git push --delete nursery derive_rand-0.1.1
git push --delete nursery rand_core-0.1.0-pre.0
git push --delete nursery rand_derive-0.3.0
git push --delete nursery rand_derive-0.3.1
git push --delete nursery rand_macros-0.1.10
git push --delete nursery rand_macros-0.1.2
git push --delete nursery rand_macros-0.1.3
git push --delete nursery rand_macros-0.1.4
git push --delete nursery rand_macros-0.1.5
git push --delete nursery rand_macros-0.1.6
git push --delete nursery rand_macros-0.1.7
git push --delete nursery rand_macros-0.1.9

# restore branches and tags
git push nursery master 0.4 0.3 --tags

# From the github interface:
# - make master the default branch
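
The one-by-one tag deletion above could probably be collapsed into a single push. A minimal sketch, assuming a remote configured as in the commands above; `delete_all_remote_tags` is a hypothetical helper and was not run against this repo:

```shell
# Hypothetical helper: delete every tag on the given remote in one push.
# Run it from inside a clone that can reach that remote.
delete_all_remote_tags() {
    local remote="$1" refspecs
    # --refs hides the peeled "^{}" entries that annotated tags produce.
    # Field 2 of each line is the full ref name (refs/tags/<name>); a refspec
    # with an empty source side (":<ref>") tells git push to delete <ref>.
    refspecs=$(git ls-remote --tags --refs "$remote" | cut -f2 | sed 's/^/:/')
    if [ -z "$refspecs" ]; then
        echo "no tags found on $remote"
        return 0
    fi
    # Word splitting is intended here: ref names cannot contain whitespace
    # or glob characters.
    git push "$remote" $refspecs
}

# Possible usage:
#   delete_all_remote_tags nursery
```

Using full `refs/tags/...` names keeps the deletion from touching a branch that happens to share a name with a tag.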

@pitdicker
Contributor Author

@dhardy What do you think, would it be okay if I execute the commands above this afternoon? Very, very carefully and with plenty of backups...

@dhardy
Member

dhardy commented Apr 4, 2018

@pitdicker make a local clone of the old repo first, then go ahead. But you don't need to delete remote branches; you can simply overwrite: git push origin master:master (or LOCAL_ID:REMOTE_ID).

Also, please check these are actually equivalent first, i.e. git diff origin/master..master.

I think we should also keep prior history around for a while with an old-master branch. This should keep the PR diffs visible. It unfortunately means that cloning the repo will still be slow unless shallow or single-branch checkout is used: git clone https://github.com/rust-lang-nursery/rand.git --single-branch.
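
The equivalence check can be scripted so that nothing is pushed unless it passes. A minimal sketch, assuming the remote is called `origin` and the rewritten history is on the local `master`; `same_tree` is a hypothetical helper name:

```shell
# Hypothetical helper: succeed only if two revisions point at identical file
# trees, i.e. a diff between them would be empty even though their histories
# differ. Returns 2 if either revision cannot be resolved.
same_tree() {
    local a b
    a=$(git rev-parse --verify -q "$1^{tree}") || return 2
    b=$(git rev-parse --verify -q "$2^{tree}") || return 2
    [ "$a" = "$b" ]
}

# Possible usage (commented out because it rewrites the remote):
#   git fetch origin
#   same_tree origin/master master &&
#       git push origin origin/master:refs/heads/old-master &&
#       git push --force origin master:master
```

The `--force` is needed in this sketch because the rewritten `master` is not a descendant of the old one, so a plain push would be rejected as a non-fast-forward.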

@pitdicker
Contributor Author

Done. I checked with git diff, and the 0.3, 0.4 and master branches are the same.

> I think we should also keep prior history around for a while with an old-master branch. This should keep the PR diffs visible.

Yes, good idea. No need to hurry here.

@pitdicker
Contributor Author

Oops, apparently GitHub doesn't like PRs where more than 10,000 commits differ. I will rebase #320 and document the steps.

@dhardy
Member

dhardy commented Apr 4, 2018

Great. git reset --hard origin/master and my 'master' is good. git cherry-pick works to recreate PRs.
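
Recreating a PR branch on top of the rewritten history with cherry-pick might look like the sketch below. The function and argument names are hypothetical, and it assumes the PR is a linear run of commits (no merges) on top of a known commit in the old history:

```shell
# Hypothetical helper: replay the commits in <old-base>..<old-tip> onto
# <new-base>, leaving the result on a newly created branch <new-branch>.
recreate_on_new_base() {
    local old_base="$1" old_tip="$2" new_branch="$3" new_base="$4"
    git checkout -q -b "$new_branch" "$new_base" &&
        git cherry-pick "$old_base..$old_tip"
}

# Possible usage, with placeholder branch names:
#   recreate_on_new_base old-master my-feature my-feature-clean origin/master
```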

@pitdicker
Contributor Author

Okay. The way I do it may be primitive... It works for me:

[Screenshot]

# Look up the commit SHA1 where your branch diverged
# git replace <old> <new>
git replace 463491 dbc1f1
# Make the replacement permanent
git filter-branch -- --all
# Rebasing on the latest master may be a good idea
git rebase master
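
Finding the `<new>` commit for `git replace` by hand can be tedious. One possible shortcut, assuming the rewrite left the file contents at that commit unchanged, is to search the new history for a commit with the same tree. `find_same_tree` is a hypothetical helper and not part of the steps above:

```shell
# Hypothetical helper: print the newest commit reachable from <new-branch>
# whose tree is identical to that of <old-commit>; fail if there is none.
find_same_tree() {
    local want
    want=$(git rev-parse --verify -q "$1^{tree}") || return 1
    # Each log line is "<commit-sha> <tree-sha>"; stop at the first match.
    git log --format='%H %T' "$2" |
        awk -v want="$want" '$2 == want { print $1; found = 1; exit }
                             END { exit !found }'
}

# Possible usage with the abbreviated SHA from above:
#   git replace 463491 "$(find_same_tree 463491 master)"
```

If several commits share the same tree (for example after a revert), this picks the most recent one, so the result is worth checking by eye before making the replacement permanent.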

@pitdicker
Contributor Author

It is now 3½ months later, and things have gone reasonably smoothly.

I just found out the 0.1.1 tag was on its own line of commits from its own root, probably from the old_master branch. Just pushed the correct tag.

For some reason GitHub shows the derive_rand-0.1.1 tag as latest version, but only recently. No idea what's going on. It is one of the three tags with the same oldest date, 2015-02-03.

> I think we should also keep prior history around for a while with an old-master branch. This should keep the PR diffs visible.

How long should we keep the old_master branch around?

@dhardy
Member

dhardy commented Jul 23, 2018

I don't think we should care much about rand_derive tags.

old_master was supposed to allow us to look at old PRs on GitHub and still make sense of them, but it doesn't work. So there isn't really any point keeping it I think.

@mpizenberg

Hi, I just cloned the repo and was surprised that git downloaded 85 MB of data. I guess it's linked to this issue?

@dhardy
Member

dhardy commented Apr 22, 2019

Is it already so large? But yes, this effort is the reason it's not even bigger. I don't believe we can reduce it again without a lot of disruption; however, we should consider migrating some sub-crates to new repos to stop the issue from getting worse.

@mpizenberg

Some of the worst offenders seem to be large binary files, and fonts, committed around 2013 in the history.

[Image: rand-large-files]
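
A listing of this kind can be produced with plain git. A minimal sketch (not necessarily how the listing above was made); `largest_blobs` is a hypothetical helper, and the sizes it prints are uncompressed:

```shell
# Hypothetical helper: print the <count> largest blobs, as "<bytes> <path>",
# among the objects selected by the remaining git rev-list arguments.
largest_blobs() {
    local count="$1"
    shift
    # rev-list --objects prints "<sha> <path>"; %(rest) carries the path
    # through cat-file so it can be shown next to the size.
    git rev-list --objects "$@" |
        git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
        grep '^blob ' |
        cut -d' ' -f2- |
        sort -rn |
        head -n "$count"
}

# Possible usage:
#   largest_blobs 10 --all
```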

@dhardy
Member

dhardy commented Apr 22, 2019

What would you suggest doing about this? Rebasing the history again? I'm not keen; we have quite a few active PRs and many complete ones whose history would be messed up.

@pitdicker
Contributor Author

I think the old-master branch still needs to be deleted.

@mpizenberg

I don't know the rand code base at all, sorry. I just tried to pinpoint some commits that add a few MB to the repo. Since you said:

> old_master was supposed to allow us to look at old PRs on GitHub and still make sense of them, but it doesn't work. So there isn't really any point keeping it I think.

I was thinking that if those commits are from that old_master branch (which I don't know), deleting it might save quite a lot of MB.

@mpizenberg

Actually, I just checked, and most of those heavy files seem to be reachable only from old-master. Right now, cloning only the master branch of this project with git clone -b master --single-branch ... results in a 3 MB download, which is a lot better.
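
To estimate how much a branch such as old-master contributes, one could sum the sizes of the objects reachable only from it. A rough sketch; `unique_bytes` is a hypothetical helper, and it reports uncompressed object sizes, so the actual download saving will differ (and is typically smaller):

```shell
# Hypothetical helper: total uncompressed size, in bytes, of all objects
# reachable from <rev> but not from <other>.
unique_bytes() {
    # rev-list prints "<sha> <path>" for trees and blobs; keep only the sha
    # so that cat-file receives plain object names.
    git rev-list --objects "$1" --not "$2" |
        cut -d' ' -f1 |
        git cat-file --batch-check='%(objectsize)' |
        awk '{ total += $1 } END { print total + 0 }'
}

# Possible usage:
#   unique_bytes origin/old-master origin/master
```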

@dhardy
Member

dhardy commented Apr 22, 2019

Aha. Thanks @pitdicker. Well, we haven't needed that branch for a long time so we should be able to safely delete it (I will keep a local copy for a while just in case).

@dhardy
Member

dhardy commented Apr 22, 2019

$ git clone git@github.com:rust-random/rand.git
Cloning into 'rand'...
remote: Enumerating objects: 172, done.
remote: Counting objects: 100% (172/172), done.
remote: Compressing objects: 100% (98/98), done.
remote: Total 12363 (delta 113), reused 115 (delta 73), pack-reused 12191
Receiving objects: 100% (12363/12363), 4.29 MiB | 365.00 KiB/s, done.
Resolving deltas: 100% (7094/7094), done.
16:37 dhardy@underdesk:~/tmp$ du -hs rand
6.1M    rand

Seems to have fixed it. Thanks @mpizenberg for bringing this up!
