Hybrid environment setup during migration from SVN to GIT

Moving a group of roughly 150 developers from SVN to Git is a challenging task. Management wanted a proof of concept (POC) and a pilot. Team leaders wanted each team's agenda and timeline taken into consideration. And no one wanted productivity to drop more than necessary.
In an effort to make everyone happy, we established a hybrid SVN-Git environment for the transition period. It started in the POC phase and lasted until the last team was trained and moved to Git; the whole transition took a little less than a year. In this article I will describe the hybrid environment and the working methods we used during the transition period.
Since we were using a “branch on release” methodology, we planned to move only the trunk to Git. The other branches (maintenance branches of released versions) would continue to live in SVN. There was very little (if any) merging from the maintenance branches to trunk, and we wanted to keep things simple. So the main requirements were:

    1. Every change pushed to the central Git repository must be committed to SVN as well.
    2. Git users must have easy access to SVN commits from within their local Git repo.
    3. All developers (both SVN and Git users) must get immediate feedback, by email, from Jenkins on build-breaking commits.
    4. It must be possible to abandon Git and move back to SVN without losing any development history.

The resulting environment consisted of three repositories: the project's original SVN repository; a bare Git repository (git-main.git) that was the main sync point for all developers who had moved to Git and that eventually became the only code repository of the project trunk; and another Git repository that was transparent to developers and served as the sync point between SVN and the main Git repo. I will call it the ‘git-svn repo’.
The SVN server was remote and unavailable to us for administration. Both Git repositories were hosted on a RHEL5 server. The git-main repo was exposed over httpd and ssh to all developers.
The ‘git-svn repo’ was a git-svn clone of the SVN trunk URL. It was created using the git-svn tool:

git svn clone http://svn-server/svn/repo/trunk

Only trunk was cloned, so none of the git-svn configuration for branch and tag handling was needed.
The result of the git-svn clone command is a non-bare Git repository, which means that pushing to its checked-out branch is refused by default. Instead of opening the repo for pushes, we cloned it again as a bare repository:

cd /path/to/git-main/repo
git clone --bare /path/to/git-svn/repo

In the git-main repository (the new bare repo) we added another branch, “work”, and made it the current branch (by editing the HEAD file in the bare repository). This was the branch all developers pulled from and pushed to.
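A minimal sketch of that step, using scratch repositories in place of the real ones (all paths here are made up for the example). It uses git symbolic-ref, which has the same effect as editing the HEAD file by hand:

```shell
#!/bin/sh
# Sketch only: scratch repositories stand in for the real git-svn and git-main repos.
set -e
SCRATCH=$(mktemp -d)

# Stand-in for the git-svn clone: a repo with one commit on master.
git init -q "$SCRATCH/git-svn"
cd "$SCRATCH/git-svn"
git symbolic-ref HEAD refs/heads/master
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "r1 from svn"

# Bare clone that plays the role of git-main.
git clone -q --bare "$SCRATCH/git-svn" "$SCRATCH/git-main.git"
cd "$SCRATCH/git-main.git"

# Create "work" at the tip of master and make it the default branch.
git branch work master
git symbolic-ref HEAD refs/heads/work
```

New clones of the bare repository then check out work by default, while master stays available as a remote-tracking branch.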
From the point of view of a developer using Git, after cloning the git-main repo there were two remote-tracking branches: master and work. Work was the one you worked on. Master was the one representing SVN trunk. Merging from master to work (syncing the changes from SVN trunk into the working Git environment) was the developers' responsibility and was not automated.
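On the developer side, that manual sync could look roughly like the sketch below. A scratch bare repository stands in for git-main, and the commit messages and paths are invented for the example; only the last four commands are what a developer would actually type.

```shell
#!/bin/sh
# Sketch only: a scratch bare repo stands in for git-main.
set -e
SCRATCH=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Build a stand-in git-main where master (SVN trunk) is one commit ahead of work.
git init -q "$SCRATCH/seed"
cd "$SCRATCH/seed"
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m "svn r1"
git branch work
git commit -q --allow-empty -m "svn r2"
git clone -q --bare "$SCRATCH/seed" "$SCRATCH/git-main.git"
git -C "$SCRATCH/git-main.git" symbolic-ref HEAD refs/heads/work

# What a developer would do: clone (lands on work), merge SVN changes, push.
git clone -q "$SCRATCH/git-main.git" "$SCRATCH/dev"
cd "$SCRATCH/dev"
git fetch -q origin
git merge -q -m "Merge SVN trunk into work" origin/master
git push -q origin work
```

After the push, everything on master is also reachable from work, which is the state the sync script expects to find.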
However, keeping the master branch synced with the SVN trunk HEAD, and syncing the changes pushed to the work branch into SVN trunk, was the build team's responsibility. We did it using two Jenkins jobs and a sync script.
The first job, svn2git, polled SVN (at a one-minute interval) and ran a very simple script (the jobs ran on a Jenkins slave on the Git server):

#go to the git-svn repo.
cd /path/to/git-svn/repo
#make sure we are on the correct branch
git checkout master
#get SVN latest
git svn rebase
#and push it to the git main repo.
git push -f git-main master:master

For this script to run, a remote pointing to the git-main repo must be defined in the git-svn repo:

git remote add git-main file:///path/to/git-main/repo

That way we kept the master branch synced with the SVN trunk HEAD.
The second job, git2svn, polled the Git work branch (again every minute) and ran this script upon every push:

git-svn-autosync.sh </path/to/git/repo> </path/to/git-svn/repo> <branch to sync> <svn tracking branch> <git-repo remote name>

The git-svn-autosync.sh script was the heart of the sync infrastructure. It is a spinoff of a (somewhat simpler) script I found online. Unfortunately I was not able to find it again, so I can't give proper credit. Here I will only go over its main logic.
In general, the script performs these tasks (in the context of the git-svn repo):

  1. Fetch the latest commits from the git-main repository, using a temporary remote and branch.
  2. Create a log message that documents the new commits.
  3. Merge the commits into the master branch (master is the branch that tracks SVN trunk).
  4. Call git svn dcommit, which pushes the new commits to SVN.

In more detail, this is how we did it:

We use “die” to clean up and restore the environment in case of failure (a conflict in a merge or rebase is treated as a failure):

die() {
       echo "sync-git-to-svn died: $*"
       git rebase --abort  # in case a rebase failed
       git branch -D ${TMP_DST_BRANCH} # remove the temp branch we use to sync
       git remote rm ${TMP_REMOTE_NAME} # remove the temp remote we use to sync
       git reset --hard ${HEAD_SHA1} # we record the current commit before we start so we can reset to it upon failure
       exit 1
}

All work is done inside the git-svn repo (it is a non-bare repo):

cd /path/to/git-svn/repo
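The snippets in this walkthrough rely on a few variables that are set before this point. The script's real preamble is not reproduced in this article, so the following is only a guess at its shape: the variable names match the snippets, but the mapping of arguments to names, the function name init_sync_env, and the names GIT_SVN_REPO and MAIN_REMOTE are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical preamble: variable names follow the snippets in this article,
# but the argument-to-name mapping is an assumption.
init_sync_env() {
    if [ "$#" -ne 5 ]; then
        echo "usage: git-svn-autosync.sh <git repo> <git-svn repo> <branch> <svn branch> <remote name>" >&2
        return 2
    fi
    SRC_REPO_PATH=$1      # the git-main repository
    GIT_SVN_REPO=$2       # hypothetical name for the git-svn working repo
    SRC_BRANCH=$3         # branch to sync ("work" in our setup)
    SVN_SYNC_BRANCH=$4    # branch that tracks SVN trunk ("master" in our setup)
    MAIN_REMOTE=$5        # hypothetical name for the permanent remote
    TMP_REMOTE_NAME=sync-tmp-remote   # fixed names, made up for this sketch
    TMP_DST_BRANCH=sync-tmp-branch
    MSG_FILE=$(mktemp)

    cd "$GIT_SVN_REPO" || return 1
    # Remember where we started so die() can reset to it on failure.
    HEAD_SHA1=$(git rev-parse HEAD) || return 1
}
```

Recording HEAD_SHA1 before anything else is what lets die() undo a half-finished merge with git reset --hard.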

We connect to the git-main server and fetch using a temporary remote and branch. Note that this branch also acts as a sort of mutex, since we always use the same name.

git remote add ${TMP_REMOTE_NAME} ${SRC_REPO_PATH} || die 'Could not create a remote pointing to the git repo'
git branch -v ${TMP_DST_BRANCH} || die 'Could not create temporary working branch, maybe another sync is in progress?'
git fetch -v ${TMP_REMOTE_NAME} +${SRC_BRANCH}:${TMP_DST_BRANCH} || die 'Fetch from the git repo failed'
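To see why a fixed branch name works as a lock, here is a small sketch in a scratch repository (the branch name is made up for the example). Creating a branch that already exists fails, so a second run that starts before cleanup stops at that step.

```shell
#!/bin/sh
# Sketch only: shows why a fixed branch name can serve as a lock.
SCRATCH=$(mktemp -d)
git init -q "$SCRATCH/repo"
cd "$SCRATCH/repo" || exit 1
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial"

TMP_DST_BRANCH=sync-tmp-branch   # hypothetical name

# The first run takes the "lock".
git branch "$TMP_DST_BRANCH" && FIRST=acquired

# A second run that starts before cleanup fails at the same step.
if git branch "$TMP_DST_BRANCH" 2>/dev/null; then
    SECOND=acquired
else
    SECOND=refused
fi

# Cleanup releases the lock for the next run.
git branch -q -D "$TMP_DST_BRANCH"
```

One caveat that follows from the die function as written above: it deletes the temporary branch and remote unconditionally, so a run that is refused would also remove the branch and remote that the first run created. It is safer to treat the lock as advisory.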

Since we use merge (as opposed to rebase), only one new SVN commit is created for every run of this script (normally, for every push to the git-main repo). The owner of this SVN commit will be the admin user that cloned the git-svn repo. We want the meta-information from all the Git commits participating in the push to be included in the SVN commit log message. To do that, we keep a permanent remote-tracking branch that tracks the work branch. We use it as a pointer to the last commit that was merged in the previous run of this script.
Now we use it to list the commits in this merge and gather the meta-information we need (see git help log for the pretty-format syntax):

# We use --first-parent to leave out commits that arrived through merges of master into the working branch.
git log --first-parent --pretty=format:"%h %ae: %n%s%n%b=======" git-main/work..${TMP_DST_BRANCH} > ${MSG_FILE}
# Avoid an empty log, which would make merge -m fail.
echo "====" >> ${MSG_FILE}
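The effect of the range and of --first-parent is easier to see on a toy history. In the sketch below (scratch repo, invented commit messages, and a local branch last-synced standing in for git-main/work), the work branch has two feature commits plus a merge of master. The log lists the two features and the merge commit itself, but not the SVN commit that came in through the merge.

```shell
#!/bin/sh
# Sketch only: a scratch repo shows what the range and --first-parent select.
set -e
SCRATCH=$(mktemp -d)
export GIT_AUTHOR_NAME=dev GIT_AUTHOR_EMAIL=dev@example.com
export GIT_COMMITTER_NAME=dev GIT_COMMITTER_EMAIL=dev@example.com

git init -q "$SCRATCH/repo"
cd "$SCRATCH/repo"
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m "svn r1"
git branch last-synced            # stands in for git-main/work
git checkout -q -b work
git commit -q --allow-empty -m "feature A"
git checkout -q master
git commit -q --allow-empty -m "svn r2"
git checkout -q work
git merge -q --no-ff -m "merge trunk into work" master
git commit -q --allow-empty -m "feature B"

# Same log command as in the sync script, against the toy branches.
MSG_FILE=$SCRATCH/msg
git log --first-parent --pretty=format:"%h %ae: %n%s%n%b=======" \
    last-synced..work > "$MSG_FILE"
echo "====" >> "$MSG_FILE"
cat "$MSG_FILE"
```

Note that --pretty=format: does not end the last entry with a newline, so the trailing "====" is appended to the last separator line. That is harmless for a commit message.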

Now we are ready to merge:

# Make sure we work on the correct branch
git checkout ${SVN_SYNC_BRANCH} || die "Can't checkout svn sync branch."
# Make sure we are up to date with SVN (we do not use git svn rebase, to avoid creating unnecessary new git commits; git svn fetch followed by a merge does the trick).
git svn fetch || die "svn fetch failed."
git merge --ff-only remotes/git-svn || die "Failed to sync ${SVN_SYNC_BRANCH} with svn HEAD"
git merge --no-ff -v -m "$(cat "${MSG_FILE}")" ${TMP_DST_BRANCH} || die "Merge failed. There may be unresolved conflicts."

Note the use of --ff-only in the sync merge (if it is not a fast-forward, something went wrong and some manual troubleshooting is needed) and of --no-ff in the actual merge: we must have a new commit so we can put the meta-information in its log message.
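The difference between the two merge modes can be checked in a scratch repository. In this sketch the branch svn-tip stands in for remotes/git-svn and incoming stands in for the fetched work branch; both names and all commit messages are invented for the example.

```shell
#!/bin/sh
# Sketch only: contrasts --ff-only and --no-ff in a scratch repo.
set -e
SCRATCH=$(mktemp -d)
export GIT_AUTHOR_NAME=dev GIT_AUTHOR_EMAIL=dev@example.com
export GIT_COMMITTER_NAME=dev GIT_COMMITTER_EMAIL=dev@example.com

git init -q "$SCRATCH/repo"
cd "$SCRATCH/repo"
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m "svn r1"

# svn-tip moves ahead of master, as after "git svn fetch".
git branch svn-tip
git checkout -q svn-tip
git commit -q --allow-empty -m "svn r2"
git checkout -q master

# Sync merge: only moves the branch pointer, creates no commit.
git merge -q --ff-only svn-tip
AFTER_FF=$(git rev-parse HEAD)

# incoming is strictly ahead of master, so a fast-forward would be possible.
git checkout -q -b incoming
git commit -q --allow-empty -m "feature A"
git checkout -q master

# Actual merge: --no-ff forces a merge commit that carries our message.
git merge -q --no-ff -m "sync from git-main" incoming
```

After the second merge, master points at a new commit with two parents, and that new commit is the one git svn dcommit turns into the single SVN revision.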
If all went well so far, let's push it to SVN:

git svn dcommit -v --add-author-from || die "Dcommit failed."

Clean up the temporary branch and remote:

git branch -D ${TMP_DST_BRANCH}
git remote rm ${TMP_REMOTE_NAME}

Sync the (permanent) remote-tracking branch. This must be the last thing we do, since we do not want that sync to happen if the script has failed for any reason:

git fetch -v git-main
exit 0

As you can see, every change pushed to Git was committed to SVN, but the Git commits were squashed into one SVN commit per push (or per script run: we did have some failures, and then the next run covered more than one push). This was acceptable. Every “squashed” commit was also pushed back to git-main, since it was an SVN commit and therefore triggered the svn2git job. This created an anomaly, because every change appears twice in the Git history. That was acceptable as well, and git merge handled the anomaly without issues. Developers using SVN could see the meta-information of a “squashed” commit by reading the SVN commit log: the log message listed the abbreviated SHA-1, author email and comment of each “original” Git commit.
We created two Jenkins continuous integration jobs, one polling SVN and the other polling Git. The nightly build used the sources from SVN.
To summarize: I do not recommend using a hybrid environment as a standard development environment. It has too many limitations. But as a solution for a relatively long transition period it worked for us, and it enabled us to train and move the teams from SVN to Git one by one, according to each team's planned timeline, with no development downtime and a very small productivity penalty.


 

 
