Background
I am trying to salvage code from a CVS repo. I am using reposurgeon for the purpose, and I have tried the following tools to get myself a git-fast-import stream:
- cvs-fast-export, which errors out (alleged cyclic branch, but doesn't provide details)
- cvs2git followed by git-fast-export, which mashes things up beyond comprehension
- git-cvsimport followed by git-fast-export, which creates the best results so far, but also ends up putting stuff on branches where it doesn't belong
This CVS repo has been run on a variety of CVS versions, and tags and branches have been forcibly moved. I know that this means I cannot salvage those branches and tags anymore, but so be it.
Nevertheless I have half a dozen branches (out of many, many more), plus MAIN, which I am interested in retaining during conversion into a git-fast-import stream. My target VCS is not Git, but the point is that reposurgeon handles its input this way and outputs this way, too.
In order to make sense of the artifacts and clean as much of the old stuff (including orphaned revisions) out in a pre-processing stage by means of rcs -o<rev> (of course on a copy of my repo ;)), I need to understand how the innards of the rcsfile format work.
Parsing is a piece of cake after modifying the rcsfile.py module from rcsgrep. But that doesn't yet provide me with any information about what the revision numbers, especially those without a corresponding delta+log, mean.
What I see
According to the RCS files man page, there shouldn't be a case where the third segment of a revision ID is 0. Yet I see exactly that condition.
Here is what I did (as an experiment).
- On MAIN: commit a file (1.1)
- From MAIN: branch to BranchX (1.1)
- On BranchX: change the file (1.1.2.1)
- On BranchX: change the file again (1.1.2.2)
- On MAIN: change the file (1.2)
- On MAIN: tag the file foobar (1.2)
- From MAIN: branch to BranchX, moving the branch tag (1.2), effectively orphaning the previous branch at 1.1.2.x
- On BranchX: delete the file (1.2.2.1)
- On MAIN: change the file (1.3)
- On MAIN: forcibly tag the file foobar (1.3)
- On MAIN: change the file (1.4)
- On MAIN: tag the file foobarbaz (1.4)
As you can see in the list above and also in the fully reproduced file below, there is no revision 1.2.0.2 in the form of a delta with a log.
Now my questions
If I branch off revision x.y freshly (no file changes!), the resulting revision ID is x.y.0.2. That is similar to the mysterious revision ID I am seeing and asking about.
- Does the 0 indicate that the file doesn't have deltas, such that I have to go back to its ancestor for the actual contents?
- Or does the 0 simply indicate the "root" of that branch, with the fourth segment being the latest revision on that branch?
Can anyone shed light on these questions or point to more comprehensive material than the above linked man page?
Below is the full RCS file:
head 1.4;
access;
symbols
foobarbaz:1.3
foobar:1.4
BranchX:1.2.0.2;
locks; strict;
comment @# @;
1.4
date 2014.12.11.13.46.46; author username; state Exp;
branches;
next 1.3;
1.3
date 2014.12.11.13.44.49; author username; state Exp;
branches;
next 1.2;
1.2
date 2014.12.11.13.39.31; author username; state Exp;
branches
1.2.2.1;
next 1.1;
1.1
date 2014.12.11.13.31.41; author username; state Exp;
branches
1.1.2.1;
next ;
1.1.2.1
date 2014.12.11.13.34.36; author username; state Exp;
branches;
next 1.1.2.2;
1.1.2.2
date 2014.12.11.13.35.08; author username; state Exp;
branches;
next ;
1.2.2.1
date 2014.12.11.13.42.32; author username; state dead;
branches;
next ;
desc
@@
1.4
log
@Change on MAIN
@
text
@NOTE: this file will be removed!
Another change on MAIN@
1.3
log
@Change on MAIN
@
text
@d3 1
a3 1
ANother change on MAIN@
1.2
log
@Change on MAIN
@
text
@d3 1
a3 1
File on MAIN will be forcibly tagged X again ... how does this affect the rev ID?@
1.2.2.1
log
@Removing the two files from X
@
text
@@
1.1
log
@Adding the experiment file
@
text
@d3 1
a3 1
Introducing file on MAIN@
1.1.2.1
log
@Changing the file on the X branch
@
text
@d3 1
a3 1
Changing on X branch@
1.1.2.2
log
@Another change on the X branch
@
text
@d3 1
a3 1
Another change on the X branch@
Okay, turns out the answer to this is buried deep down in the CVS source code.
For starters here are the important files if you are looking at the CVS source tree:
- src/rcs.c
- src/rcs.h
- doc/RCSFILES
In addition to that you have the rcsfile(5) man page. And don't forget to use grep to the utmost extent (unless you have something more sophisticated at your disposal, that is).
The gist:
- A branch number has the form x.y.z, e.g. 1.1.2, which is a branch off of revision 1.1.
- A magic branch number has the form x.y.0.z, e.g. 1.1.0.2, where 0 is a magic value defined as RCS_MAGIC_BRANCH in the CVS code. Note that no delta will ever have the third segment set to 0, as these are "virtual revision numbers".
- z (third segment of a branch number, fourth of a virtual revision number) will only ever be an even number greater than or equal to two: assert((z >= 2) && (z % 2 == 0))
- 1 is also reserved for vendor branches, as per the comment in rcs.h (see below).
- Check the symbols list in the admin section of the RCS file (e.g. via rlog -h <file>, if you don't want to parse it) for revisions which have the second-to-last segment set to 0. That is, you have a revision that would match the (PCRE) regular expression (?:\d+\.\d+\.)+0\.\d+ (hope I got that right).
From a comment in rcs.h
Interesting functions using RCS_MAGIC_BRANCH are RCS_tag2rev() and RCS_gettag.
From comments in rcs.c
Comment on RCS_magicrev():
The answers
- Does the 0 indicate that the file doesn't have deltas, such that I have to go back to its ancestor for the actual contents?
If the second-to-last segment is 0, the revision number is a virtual revision number used to make a "reservation" for a branch number.
- Or does the 0 simply indicate the "root" of that branch, with the fourth segment being the latest revision on that branch?
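To make the numbering rules from the gist concrete, here is a minimal Python sketch run against the symbols list of the experiment file. The helper names (parse_symbols, is_magic_branch, real_branch_number, on_branch) are made up for illustration and are not CVS functions; only the value of RCS_MAGIC_BRANCH mirrors the constant in the CVS source. It assumes the admin section comes first in the file and that symbols are written as name:revision, as in the file above.

```python
import re

RCS_MAGIC_BRANCH = 0  # mirrors the constant of the same name in CVS's rcs.h


def parse_symbols(rcs_text):
    """Return {symbol: revision} from the first 'symbols ...;' phrase."""
    match = re.search(r"\bsymbols\b([^;]*);", rcs_text)
    if match is None:
        return {}
    return dict(entry.split(":", 1) for entry in match.group(1).split())


def is_magic_branch(rev):
    """True for virtual revision numbers of the form x.y.0.z."""
    parts = [int(p) for p in rev.split(".")]
    return (len(parts) >= 4 and len(parts) % 2 == 0
            and parts[-2] == RCS_MAGIC_BRANCH)


def real_branch_number(magic_rev):
    """Drop the magic 0: x.y.0.z becomes the branch number x.y.z."""
    parts = magic_rev.split(".")
    return ".".join(parts[:-2] + parts[-1:])


def on_branch(delta_rev, branch):
    """True if a delta such as 1.2.2.1 sits directly on branch 1.2.2."""
    return delta_rev.rsplit(".", 1)[0] == branch


# Admin section and delta numbers taken from the experiment file above.
admin = """head 1.4;
access;
symbols
foobarbaz:1.3
foobar:1.4
BranchX:1.2.0.2;
locks; strict;
"""
deltas = ["1.4", "1.3", "1.2", "1.1", "1.1.2.1", "1.1.2.2", "1.2.2.1"]

for name, rev in parse_symbols(admin).items():
    if is_magic_branch(rev):
        branch = real_branch_number(rev)
        members = [d for d in deltas if on_branch(d, branch)]
        print(name, "reserves branch", branch, "with deltas", members)
    else:
        print(name, "is a plain tag on", rev)
```

For the experiment file this reports BranchX as reserving branch 1.2.2, whose only delta is 1.2.2.1. The deltas 1.1.2.1 and 1.1.2.2 match no symbol, which is how the orphaned branch shows up.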