Converting from CVS to Subversion

How I Converted from CVS to Subversion - 09/16/2005

- also: Setting up public read-only access and requiring https for commits in an svn repository via apache

I'm not going to go into the why of converting, just the how. I assume that if you found this page you've already debated the why and are looking for tips and gotchas in moving from cvs to svn.

First I should point out that my cvs tree may not have been normal, which may be why I had so many problems. Plus I'm picky and I wanted the data to make it in clean without having to remember corner cases about what I had to do to get it imported.

The first tool I tried was cvs2svn, a Python script that does a great job of importing cvs trees into svn. You point it at a cvs root and it imports the files into svn under trunk/, branches/, and tags/.

The problem here is that I organize my cvs tree like this:

By default cvs2svn imports everything under /trunk/ and puts tags under /tags/. Unfortunately, early on some of my programs had release tags like v_01 or v_05, which means you get /tags/v_05/ but have no idea which program it belongs to. Later I started including the program name, like xautomation-0.95, but even with that the /tags/ directory was getting very full and I didn't like the way it looked. I wanted each project to have its own trunk/, tags/, and branches/ subdirectories so there would be no confusion. This would also make it easier to make some stuff public, some stuff read only, and some stuff password protected.

After experimenting with cvs2svn I realized it didn't need a cvs root, just a part of a cvs tree to import. It recursively processes the RCS files and creates change sets from them. This meant (I thought, anyway) that I should be able to do something like:

cvs2svn --trunk=powwow/trunk --branches=powwow/branches --tags=powwow/tags -s svn cvs/c/powwow

Turns out this doesn't work, though: cvs2svn as of version 1.3 emits an error saying that it can't handle the paths. That sucked, so I thought about it some more and decided there wasn't any reason I couldn't just import to / and then move /trunk, /branches, and /tags to /projects/powwow/ or wherever they needed to go. So I built a quick perl script to do this and imported a few projects to see how it looked. It would run cvs2svn on some subdirectory of my cvs tree, like java/cheeselab3, and import that into my svn repository at /trunk. Then it would create the /projects/name directory and move /trunk, /tags, and /branches under /projects/name using the svn command line. Once I was happy with the way things looked I went ahead and imported all my cvs projects.

At this point things seemed great, but little did I know that the history of moving the projects from / to their home in /projects/name/ would bite me in the ass later. Thinking things were great, I proceeded to figure out how to set up remote svn access. One thing I was excited about was http and https access, because I hated the cvspserver and having to have ssh accounts for revision control. It seemed like overkill. Since I use debian I just had to aptitude install the apache2 and libapache2-svn packages and follow the instructions for setting up webdav and svn access from the svn book site.

Things got frustrating here because I wanted read only access available from http://svn.hoopajoo.net and write access available only over HTTPS at https://secure.hoopajoo.net, which turned out to be a little tricky. Plus I run apache 1.3 for my sites, I don't want to migrate them to apache2, and I don't want my svn repo to have some funky port on it. So after some thinking I used apache 1.3's reverse proxy from both svn.hoopajoo.net and secure.hoopajoo.net, along with some limit directives and setenvif on apache2, to make it work.

My apache2 config for this virtual host looks like this:

NameVirtualHost *:8022
<VirtualHost *:8022>
        ServerAdmin bpk@hoopajoo.net
        ServerName localhost
        ServerAlias svn.hoopajoo.net
        DocumentRoot /home/svnroot/htdocs

        <Location /svn>
                DAV svn
                SVNPath /home/svnroot/svn

                # Per directory access
                AuthzSVNAccessFile /home/svnroot/auth.conf

                Satisfy any
                Require valid-user

                # Auth info for writing
                AuthType Basic
                AuthName "Subversion repository"
                AuthUserFile /home/svnroot/htpasswd

                SetEnvIf X-Forwarded-Host "secure.hoopajoo.net" HTTPS=yes
                <LimitExcept GET PROPFIND OPTIONS REPORT>
                        # Require https for anything but reading
                        Satisfy all
                        Order Deny,Allow
                        Deny from all
                        Allow from env=HTTPS
                </LimitExcept>
        </Location>
</VirtualHost>

Under apache 1.3 which is listening on port 80 I set up:

<VirtualHost *:80>
        ServerName svn.hoopajoo.net
        ProxyPass / http://localhost:8022/
        ProxyPassReverse / http://localhost:8022/
</VirtualHost>

And in my mod_ssl.conf I added the same ProxyPass elements, but under /svn/ instead of just /. This way I could get to the apache2 webdav stuff from svn.hoopajoo.net over http, or from secure.hoopajoo.net over https. The LimitExcept block makes it so that you have to use https in order to do commits. My https hostname is secure.hoopajoo.net, and I redirect regular http requests for secure.hoopajoo.net to https on the same host, so there is no way to access the subversion repository using regular http and the secure hostname.
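
To make the intent of the SetEnvIf and LimitExcept pair easier to check, here is a minimal Python sketch of the decision they are meant to implement. It is only a model of the rule described above, not how Apache actually evaluates its config. The function name passes_https_gate and the two constants are made up for illustration, and authentication (Require valid-user and auth.conf) is deliberately left out.

```python
# Hypothetical model of the "https for writes" gate; names are illustrative.

# Methods listed in <LimitExcept ...>: these skip the https requirement.
READ_METHODS = {"GET", "PROPFIND", "OPTIONS", "REPORT"}

# Host the front-end proxy forwards for https traffic (from the SetEnvIf line).
SECURE_HOST = "secure.hoopajoo.net"


def passes_https_gate(method, forwarded_host):
    """Return True if a request gets past the transport check."""
    if method.upper() in READ_METHODS:
        return True
    # SetEnvIf sets HTTPS=yes only when X-Forwarded-Host matches the secure
    # host; any other method is denied unless that variable is set.
    # A substring test stands in for the SetEnvIf pattern match here.
    return forwarded_host is not None and SECURE_HOST in forwarded_host


print(passes_https_gate("PROPFIND", "svn.hoopajoo.net"))       # True
print(passes_https_gate("MKACTIVITY", "svn.hoopajoo.net"))     # False
print(passes_https_gate("MKACTIVITY", "secure.hoopajoo.net"))  # True
```

MKACTIVITY is used as the example write request because it is one of the WebDAV methods an svn commit sends; anything not in the read list behaves the same way in this model.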

Something very important to note in your Location directive: do not include the trailing slash, otherwise you'll get weird errors like:

apache2: subversion/libsvn_subr/path.c:376: svn_path_basename: Assertion `is_canonical (path, len)' failed.

in your error log. Apparently if you include the trailing slash, going to something like http://svn.hoopajoo.net/svn/ and trying to do a list using the svn client will fail, because internally the webdav layer converts that to the local path svn:// instead of svn:/, which fails the is_canonical check. So with the trailing slash anything under /, like /projects, will work as expected, but trying to run any svn command on the root of your subversion tree will fail. Irritatingly.
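
Since this mistake is easy to make and hard to spot from the error message, a small check over the config text can catch it early. The following is a rough Python sketch, not part of my setup: the function name locations_with_trailing_slash is invented here, and it only looks at plain <Location ...> lines, not LocationMatch or included files.

```python
import re

# Matches an opening <Location path> tag and captures the path.
LOCATION_RE = re.compile(r"<Location\s+([^>\s]+)\s*>", re.IGNORECASE)


def locations_with_trailing_slash(config_text):
    """Return Location paths that end in a slash (the bare root "/" is fine)."""
    bad = []
    for match in LOCATION_RE.finditer(config_text):
        path = match.group(1).strip('"')
        if len(path) > 1 and path.endswith("/"):
            bad.append(path)
    return bad


sample = """
<Location /svn/>
        DAV svn
</Location>
"""
print(locations_with_trailing_slash(sample))  # ['/svn/']
```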

At this point I thought I was home free. Read only access was available over regular http for anyone under /svn/. Well, that's not entirely true: I started the read only access point under /svn/projects/, which is where my FOSS projects live. I also set up password protected directories using my auth.conf file. One thing to watch out for there is to not include trailing slashes on directory paths when setting permissions either. For example, under a path written with a trailing slash, a rule like:

bpk = rw

does not work for some operations, and the authentication does weird things. So don't include that trailing slash. Also worth noting: if you override all users' access on a directory using * =, then you also need to restate permissions there for any users that were set earlier in the tree. For example:

[/]
* =
bpk = rw

[/public]
* = r

will fail to let bpk have rw access to /public, since the * = r on /public applies to bpk too and makes it read only. This seems obvious to me now but was very frustrating at the time. I kind of assumed that giving bpk = rw at / would give bpk rw access on the whole tree. And it would have, except that the * = r overrides that. After wrestling with the cvs conversion and the bizarre apache2 webdav assertion error, I was getting pretty irritated. The fix is of course to do:

[/]
* =
bpk = rw

[/public]
* = r
bpk = rw
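
The lookup rule behind this gotcha can be modeled roughly as follows. This is a simplified Python sketch based on the behavior I saw, not Subversion's actual code: it assumes the closest section that mentions the user (by name or via *) decides, that rights from all matching lines in that section are combined, and it ignores groups entirely. The function name effective_access and the dictionaries are invented for the example.

```python
# Simplified, hypothetical model of path-based authz lookup (no groups).

def effective_access(rules, user, path):
    """Return "", "r", or "rw" for user at path.

    rules maps a section path (no trailing slash) to {name: rights},
    where name is a user name or "*" and rights is "", "r", or "rw".
    """
    current = path.rstrip("/") or "/"
    while True:
        section = rules.get(current, {})
        # Both the user's own line and the "*" line apply to the user.
        matched = [rights for name, rights in section.items()
                   if name in (user, "*")]
        if matched:
            # The closest section that mentions the user decides; rights
            # granted by any matching line in it are combined.
            return "".join(flag for flag in "rw"
                           if any(flag in rights for rights in matched))
        if current == "/":
            return ""  # nothing matched anywhere: no access
        current = current.rsplit("/", 1)[0] or "/"


broken = {"/": {"*": "", "bpk": "rw"}, "/public": {"*": "r"}}
fixed = {"/": {"*": "", "bpk": "rw"}, "/public": {"*": "r", "bpk": "rw"}}

print(effective_access(broken, "bpk", "/public"))  # r
print(effective_access(fixed, "bpk", "/public"))   # rw
```

In the broken layout the /public section already matches bpk through the * line, so the lookup never walks up to / where bpk = rw lives.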

OK, so at this point I thought I was done. I had /projects set to read only except for me, and / was set to readable only by me so no one can list the root (I'm crazy like that). So I went to test anonymous access and pulled out the files fine... then I tried to pull a log on a file. I got nothing. Well, not nothing really, but basically nothing. It would say that there were 4 log entries or something, but they would all be blank. At this point I freaked out a bit. I had already committed some data to the repository and now I didn't even know if it had retained my history. Frantically, I tried to check the history several ways (via http, https, authenticated, not, etc.) and it didn't work. So I logged in as root and used file:/// access to the repository and, lo and behold, all my history was there.

Puzzled, I tried some more things, then remembered reading about "peg revisions" in the svn book, and how different files may have occupied the same path at different times. In my case, all my imports started as /trunk, /tags, and /branches. When I added read-only access to those directories, all my history suddenly came back over http. Well, this is OK, I thought, but after a bit I realized something bad: some of my projects are password protected. Those were at one point under /trunk, and I have to make /trunk readable to allow history to work for other projects... so if you peg the revision you can pull out /trunk from before those files were moved into the password protected area. Well crap, now I'm screwed.

By this time I was about to ditch svn and head back to cvs land in disgust, but I remembered one other command from skimming through the help documents: svnadmin dump. I dumped my svn repository and checked out the human readable format. It looked easy to understand, and initially my intention was to write a perl script to munge the paths so they never existed in /trunk, etc. I started working out a script when a better idea hit me, though. After checking out how to load a dump file, I saw you could specify a parent directory to put the dump into. This was what was going to save me and my sanity.

The new plan was to use cvs2svn to export a cvs directory into a temporary svn repo, dump that repo out using svnadmin dump, and load it into my new master svn repo using svnadmin load with a --parent-dir prefix. I did some tests and, amazingly, this worked. Everything imported under its own directory, never having existed in some bizarre location in the past and opening up security holes. So after all this I made a script that takes a directory from a cvs repository, the svn repository you want to put it in, and the subdir that it should live in. I then made a tab-delimited file listing all the various cvs directories and destination svn directories, and I ended up with these scripts.
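
Before the real scripts, here is the three-step plan boiled down to a minimal Python sketch. It is an illustration only, not what I actually ran: the function names import_commands and run_import are invented, the temporary repo name tmpsvn is just a default, and it assumes the parent directory already exists in the target repository. It pipes the dump straight into the load instead of going through a dump file.

```python
import subprocess


def import_commands(cvs_dir, svn_repo, parent_dir, tmp_repo="tmpsvn"):
    """Build the three command lines of the convert, dump, load plan."""
    convert = ["cvs2svn", "-s", tmp_repo, cvs_dir]
    dump = ["svnadmin", "dump", tmp_repo]
    load = ["svnadmin", "load", "--parent-dir", parent_dir, svn_repo]
    return convert, dump, load


def run_import(cvs_dir, svn_repo, parent_dir, tmp_repo="tmpsvn"):
    """Run the plan, feeding the dump output directly into the load."""
    convert, dump, load = import_commands(cvs_dir, svn_repo, parent_dir, tmp_repo)
    subprocess.run(convert, check=True)
    dumper = subprocess.Popen(dump, stdout=subprocess.PIPE)
    subprocess.run(load, stdin=dumper.stdout, check=True)
    dumper.stdout.close()
    if dumper.wait() != 0:
        raise RuntimeError("svnadmin dump failed")


# Show the commands without running anything.
for command in import_commands("cvs/bpk/c/powwow", "svn", "projects/powwow"):
    print(" ".join(command))
```

Because every path in the dump is loaded under the parent directory, the project never has a history at /trunk in the master repository, which is the whole point of the exercise.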



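doimp

#!/usr/bin/perl
# doimp: read "cvsdir<TAB>svnlocation" lines and import each project.
use strict;
use warnings;
use FindBin;

my $svnroot = "$FindBin::Bin/svn";
while( <> ) {
        chomp;
        my( $proj, $location ) = split( /\t+/ );

        next unless $location;

        # Create each level of the destination path so svnadmin load
        # has an existing parent directory to load under
        my @dirs = split( /\//, $location );
        print "Creating parent path for $location...\n";
        my $path = "/";
        for( @dirs ) {
                $path .= "$_/";
                print "  > $path\n";
                system( "svn", "mkdir", "-m", "preparing import tree", "file://$svnroot/$path" );
        }

        print "Importing $proj to $location\n";
        system( "./mycvs2svn", "$proj", $svnroot, "$location" );
}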



mycvs2svn

#!/usr/bin/perl
# mycvs2svn: convert one cvs directory and load it under a prefix in svn.
use strict;
use warnings;

my( $from, $to, $module ) = @ARGV;

die( qq{
Usage: $0 fromcvsdir tosvnrepo prefix

} ) unless $to;

# Default the prefix to the last component of the cvs directory
$from .= "/" unless $from =~ m!/$!;
if( ! $module ) {
        ( $module ) = $from =~ m!^.*/(.*)/!;
}

print "Executing cvs2svn...\n";
system( "rm", "-rf", "tmpsvn" );
system( "cvs2svn", "--force-branch", "DEFAULT", "-s", "tmpsvn", $from );

print "Dumping temp svn\n";
system( "svnadmin dump tmpsvn > tmpsvn.dump" );

print "Setting up new module dir and importing...\n";
#system( "svn", "mkdir", "-m", "creating module $module", "file://$to/$module" );
system( "svnadmin load --parent-dir $module $to < tmpsvn.dump" );

# Remove the branch/tag names cvs2svn creates from cvs internals;
# svn just complains about the ones that do not exist
print "Deleting cvs cruft...\n";
for my $move ( qw/trunk tags branches/ ) {
        for my $cruft ( qw/DEFAULT START HEAD/ ) {
                system( "svn", "delete", "-m", "removing cvs cruft", "file://$to/$module/$move/$cruft" );
        }
}

imports.txt (tab delimited)

cvs/bpk/c/powwow                      projects/powwow
cvs/bpk/java/cheeselab3               projects/cheeselab3

The doimp script is fed imports.txt either via STDIN or as a command line argument (perl's <> operator handles both), and it initializes the parent directories for mycvs2svn to load the cvs data into. mycvs2svn calls cvs2svn to create a temp svn repository from the source cvs directory, dumps it out, then loads it into the final svn repository under the prefix given on its command line, which doimp takes from the config file. The working directory should contain the cvs directory, the new subversion repo (named svn), and the scripts. It went something like this:

$ svnadmin create svn
$ ./doimp imports.txt
Creating parent path for foss/termkey...
  > /foss/

Committed revision 1.
  > /foss/termkey/

Committed revision 2.
Importing cvs/bpk/c/termkey to foss/termkey
Executing cvs2svn...
----- pass 1 -----
Examining all CVS ',v' files...

... a bunch more stuff in here ...

<<< Started new transaction, based on original revision 2
     * adding path : foss/termkey/trunk/Makefile ... done.
     * adding path : foss/termkey/trunk/keytable.h ... done.
     * adding path : foss/termkey/trunk/mktable ... done.
     * adding path : foss/termkey/trunk/termkey.c ... done.

------- Committed new rev 4 (loaded from original rev 2) >>>

<<< Started new transaction, based on original revision 3
     * adding path : foss/termkey/branches/HEAD ...COPIED... done.

------- Committed new rev 5 (loaded from original rev 3) >>>

<<< Started new transaction, based on original revision 4

------- Committed new rev 6 (loaded from original rev 4) >>>

<<< Started new transaction, based on original revision 5
     * adding path : foss/termkey/branches/DEFAULT ...COPIED... done.

------- Committed new rev 7 (loaded from original rev 5) >>>

Deleting cvs cruft...
svn: URL 'DEFAULT' does not exist
svn: URL 'START' does not exist
svn: URL 'HEAD' does not exist
svn: URL 'DEFAULT' does not exist
svn: URL 'START' does not exist
svn: URL 'HEAD' does not exist

Committed revision 8.
svn: URL 'START' does not exist

Committed revision 9.
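
The "Creating parent path" part of that run is the step that makes --parent-dir work, since each level has to be committed before the load. As a rough illustration of that loop, here is a Python sketch. The names parent_paths and mkdir_commands are invented for the example, and unlike the perl loop it leaves off the trailing slash on each path.

```python
def parent_paths(location):
    """Return every directory level of location, outermost first."""
    paths = []
    current = ""
    for part in location.strip("/").split("/"):
        if not part:
            continue  # skip empty pieces from doubled slashes
        current += "/" + part
        paths.append(current)
    return paths


def mkdir_commands(repo_root, location):
    """Build one 'svn mkdir' command per level, using a file:// URL."""
    return [
        ["svn", "mkdir", "-m", "preparing import tree",
         "file://" + repo_root + path]
        for path in parent_paths(location)
    ]


print(parent_paths("foss/termkey"))  # ['/foss', '/foss/termkey']
```

Each level becomes its own commit, which matches the "Committed revision 1" and "Committed revision 2" lines in the transcript.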

And now I finally have a pristine svn repository that retains all my cvs history and is laid out the way I want, with decent access controls. The final format worked out something like: