In the RCS file format in a CVS repo, what does x.y.0.2 as a revision indicate?


Background

I am trying to salvage code from a CVS repo. I am using reposurgeon for this purpose, and I have tried the following tools to get a git-fast-import stream:

  • cvs-fast-export, which errors out (it reports a cyclic branch but gives no details)
  • cvs2git followed by git-fast-export, which mangles the history beyond recognition
  • git-cvsimport followed by git-fast-export, which gives the best results so far, but still puts commits on branches where they don't belong.

This CVS repo has been used with a variety of CVS versions, and tags and branches have been forcibly moved. I know this means I cannot salvage those branches and tags anymore. So be it.

Nevertheless, there are half a dozen branches (out of many, many more), plus MAIN, that I want to retain when converting to a git-fast-import stream. My target VCS is not Git, but reposurgeon both reads its input and writes its output in that format.

To make sense of the artifacts and to clean out as much of the old stuff as possible (including orphaned revisions) in a pre-processing stage using rcs -o<rev> (on a copy of my repo, of course ;)), I need to understand the innards of the RCS file format.

Parsing is a piece of cake after modifying the rcsfile.py module from rcsgrep. But that alone doesn't tell me what the revision numbers mean, especially those without a corresponding delta and log.

What I see

According to the rcsfile(5) man page, there shouldn't be a case where the third segment of a revision ID is 0. Yet I see exactly that condition.

Here is what I did (as an experiment).

  1. On MAIN: commit a file (1.1)
  2. From MAIN: branch to BranchX (1.1)
  3. On BranchX: change the file (1.1.2.1)
  4. On BranchX: change the file again (1.1.2.2)
  5. On MAIN: change the file (1.2)
  6. On MAIN: tag the file foobar (1.2)
  7. From MAIN: branch to BranchX, moving the branch tag (1.2), effectively orphaning the previous branch at 1.1.2.x
  8. On BranchX: delete the file (1.2.2.1)
  9. On MAIN: change the file (1.3)
  10. On MAIN: forcibly tag the file foobar (1.3)
  11. On MAIN: change the file (1.4)
  12. On MAIN: tag the file foobarbaz (1.4)

As you can see in the list above and in the full RCS file below, there is no delta (with a log entry) for revision 1.2.0.2.

Now my questions

If I freshly branch off revision x.y (without changing the file), the branch symbol points to x.y.0.2, which has the same form as the mysterious revision ID I am asking about.

  • Does the 0 indicate that the file doesn't have deltas, such that I have to go back to its ancestor for the actual contents?
  • Or does the 0 simply indicate the "root" of that branch, with the fourth segment being the latest revision on that branch?

Can anyone shed light on these questions or point to more comprehensive material than the man page mentioned above?


Below is the full RCS file:

head    1.4;
access;
symbols
    foobarbaz:1.3
    foobar:1.4
    BranchX:1.2.0.2;
locks; strict;
comment @# @;


1.4
date    2014.12.11.13.46.46;    author username;    state Exp;
branches;
next    1.3;

1.3
date    2014.12.11.13.44.49;    author username;    state Exp;
branches;
next    1.2;

1.2
date    2014.12.11.13.39.31;    author username;    state Exp;
branches
    1.2.2.1;
next    1.1;

1.1
date    2014.12.11.13.31.41;    author username;    state Exp;
branches
    1.1.2.1;
next    ;

1.1.2.1
date    2014.12.11.13.34.36;    author username;    state Exp;
branches;
next    1.1.2.2;

1.1.2.2
date    2014.12.11.13.35.08;    author username;    state Exp;
branches;
next    ;

1.2.2.1
date    2014.12.11.13.42.32;    author username;    state dead;
branches;
next    ;


desc
@@


1.4
log
@Change on MAIN
@
text
@NOTE: this file will be removed!

Another change on MAIN@


1.3
log
@Change on MAIN
@
text
@d3 1
a3 1
ANother change on MAIN@


1.2
log
@Change on MAIN
@
text
@d3 1
a3 1
File on MAIN will be forcibly tagged X again ... how does this affect the rev ID?@


1.2.2.1
log
@Removing the two files from X
@
text
@@


1.1
log
@Adding the experiment file
@
text
@d3 1
a3 1
Introducing file on MAIN@


1.1.2.1
log
@Changing the file on the X branch
@
text
@d3 1
a3 1
Changing on X branch@


1.1.2.2
log
@Another change on the X branch
@
text
@d3 1
a3 1
Another change on the X branch@
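To pull the symbols out of an admin section like the one above without a full parser, a minimal sketch along these lines may be enough. The helper name parse_symbols is hypothetical and unrelated to the rcsfile.py API; it assumes the input starts with the admin section, and relies on the fact that RCS symbol names cannot contain whitespace or ':'.

```python
import re


def parse_symbols(admin: str) -> dict[str, str]:
    """Return {symbol: revision} from the 'symbols ... ;' phrase of an RCS admin section."""
    # The phrase runs from the keyword 'symbols' at the start of a line to the next ';'.
    m = re.search(r"^symbols\b([^;]*);", admin, re.MULTILINE)
    if m is None:
        return {}
    symbols = {}
    for entry in m.group(1).split():
        # Each entry is name:revision; RCS symbol names cannot contain ':'.
        name, _, rev = entry.partition(":")
        symbols[name] = rev
    return symbols


header = """head    1.4;
access;
symbols
    foobarbaz:1.3
    foobar:1.4
    BranchX:1.2.0.2;
locks; strict;
"""

print(parse_symbols(header))
# {'foobarbaz': '1.3', 'foobar': '1.4', 'BranchX': '1.2.0.2'}
```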

Answer by 0xC0000022L

Okay, it turns out the answer to this is buried deep in the CVS source code.

For starters, here are the relevant files in the CVS source tree:

  • src/rcs.c
  • src/rcs.h
  • doc/RCSFILES

In addition, there is the rcsfile(5) man page. And don't forget to use grep to the utmost extent (unless you have something more sophisticated at your disposal, that is).

The gist:

  • A branch number consists of an odd number of segments (three or more), i.e. x.y.z, e.g. 1.1.2, which is a branch off revision 1.1.
    • The symbol for such a branch points to the revision x.y.0.z, e.g. 1.1.0.2, where 0 is the magic value defined as RCS_MAGIC_BRANCH in the CVS code. No delta will ever have a 0 in that position (the third segment here), because these are "virtual revision numbers" that only appear in the symbols list.
  • In stock CVS, z (the third segment of a branch number, the fourth of a virtual revision number) will only ever be an even number greater than or equal to 2:
    • assert((z >= 2) && (z % 2 == 0))
  • Branch number 1 is reserved for vendor branches, as per the comment in rcs.h (see below).
  • To check for a branch, look in the symbols list in the admin section of the RCS file (e.g. via rlog -h <file>, if you don't want to parse it) for revisions whose second-to-last segment is 0, i.e. revisions matching the (PCRE) regular expression ^(?:\d+\.\d+\.)+0\.\d+$.
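The rules above can be turned into a small classifier for the revisions in the symbols list. This is a sketch, not code from CVS; the function name classify is made up, and it treats any number with an odd segment count as a plain branch, which is true of the vendor branch 1.1.1 but would also cover branches created with plain RCS.

```python
import re

# Anchored form of the regular expression from the list above: an even number of
# segments whose second-to-last segment is 0 (RCS_MAGIC_BRANCH), e.g. 1.2.0.2.
MAGIC_BRANCH = re.compile(r"^(?:\d+\.\d+\.)+0\.\d+$")


def classify(rev: str) -> tuple[str, str]:
    """Classify a symbol's revision as ('branch', number) or ('revision', number)."""
    parts = rev.split(".")
    if MAGIC_BRANCH.match(rev):
        # Virtual revision x.y.0.z: drop the magic 0 to get the real branch x.y.z.
        return "branch", ".".join(parts[:-2] + parts[-1:])
    if len(parts) % 2 == 1:
        # Odd segment count: an ordinary branch number, e.g. the vendor branch 1.1.1.
        return "branch", rev
    # Even segment count without the magic 0: a tag on an actual delta.
    return "revision", rev


for name, rev in {"foobarbaz": "1.3", "foobar": "1.4", "BranchX": "1.2.0.2"}.items():
    print(name, *classify(rev))
# foobarbaz revision 1.3
# foobar revision 1.4
# BranchX branch 1.2.2
```

The same function handles nested branches: 1.2.2.1.0.4 maps to the branch 1.2.2.1.4.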

From a comment in rcs.h

CVS reserves all even-numbered branches for its own use. "magic" branches (see rcs.c) are contained as virtual revision numbers (within symbolic tags only) off the RCS_MAGIC_BRANCH, which is 0. CVS also reserves the ".1" branch for vendor revisions. So, if you do your own branching, you should limit your use to odd branch numbers starting at 3.

Interesting functions using RCS_MAGIC_BRANCH are RCS_tag2rev() and RCS_gettag().

From comments in rcs.c

Comment on RCS_magicrev():

Return a "magic" revision as a virtual branch off of REV for the RCS file. A "magic" revision is one which is unique in the RCS file. By unique, I mean we return a revision which:

  • has a branch of 0 (see rcs.h RCS_MAGIC_BRANCH)
  • has a revision component which is not an existing branch off REV
  • has a revision component which is not an existing magic revision
  • is an even-numbered revision, to avoid conflicts with vendor branches

The first point is what makes it "magic".

As an example, if we pass in 1.37 as REV, we will look for an existing branch called 1.37.2. If it did not exist, we would look for an existing symbolic tag with a numeric part equal to 1.37.0.2. If that didn't exist, then we know that the 1.37.2 branch can be reserved by creating a symbolic tag with 1.37.0.2 as the numeric part.
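The search described in that comment boils down to a short loop. The Python below is a loose paraphrase for illustration, not a port of rcs.c; magic_rev and its set-based inputs are hypothetical stand-ins for looking up existing branches and walking the symbols list.

```python
def magic_rev(rev: str, branches: set[str], symbol_revs: set[str]) -> str:
    """Reserve a branch off REV following the rules in the RCS_magicrev() comment.

    rev: a revision (not a branch number), e.g. "1.37"
    branches: branch numbers that already have deltas, e.g. {"1.37.2"}
    symbol_revs: revision numbers already used by symbolic tags
    """
    if len(rev.split(".")) % 2 != 0:
        raise ValueError(f"{rev} looks like a branch number, not a revision")
    z = 2  # even numbers only: .1 is the vendor branch, odd numbers are left to users
    while True:
        # Free only if it is neither an existing branch nor an existing magic tag.
        if f"{rev}.{z}" not in branches and f"{rev}.0.{z}" not in symbol_revs:
            return f"{rev}.0.{z}"
        z += 2


print(magic_rev("1.37", set(), set()))          # 1.37.0.2
print(magic_rev("1.37", {"1.37.2"}, set()))     # 1.37.0.4
```

Under this sketch's rules, the file from the question gives magic_rev("1.1", {"1.1.2"}, {"1.2.0.2"}) == "1.1.0.4": the orphaned deltas 1.1.2.1 and 1.1.2.2 still occupy branch 1.1.2, so a new branch off 1.1 skips that number even though no symbol refers to it any more.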

[...]

Note: We assume that REV is an RCS revision and not a branch number.

The answers

  • Does the 0 indicate that the file doesn't have deltas, such that I have to go back to its ancestor for the actual contents?
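    • Basically yes. When the second-to-last segment is 0, the revision number is a virtual revision number used to "reserve" a branch number; there is no delta behind it. Until something is committed on the branch x.y.z, its contents are those of the branch point x.y; commits on it become x.y.z.1, x.y.z.2, and so on (in your file, 1.2.2.1).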
  • Or does the 0 simply indicate the "root" of that branch, with the fourth segment being the latest revision on that branch?
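    • No. The fourth segment is the branch number z from x.y.z, not a revision on the branch; the latest revision on the branch is the highest x.y.z.N delta, if there is one.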
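Put together, the two answers suggest how to find what a branch tag actually refers to, and which branches no symbol refers to any more (the orphans one might prune with rcs -o). A minimal sketch with hypothetical helper names, working on plain sets of delta numbers rather than a parsed file; it assumes stock CVS numbering along a branch and does not follow nested branches.

```python
# Delta numbers and symbols taken from the example RCS file in the question.
example_deltas = {"1.1", "1.2", "1.3", "1.4", "1.1.2.1", "1.1.2.2", "1.2.2.1"}
example_symbols = {"foobarbaz": "1.3", "foobar": "1.4", "BranchX": "1.2.0.2"}


def branch_tip(magic: str, deltas: set[str]) -> str:
    """Revision holding the current contents of the branch behind magic tag x.y.0.z."""
    parts = magic.split(".")
    if len(parts) < 4 or parts[-2] != "0":
        raise ValueError(f"{magic} is not a magic branch tag")
    branch_point = ".".join(parts[:-2])      # x.y
    branch = f"{branch_point}.{parts[-1]}"   # x.y.z
    # Deltas directly on the branch have exactly one more segment: x.y.z.N
    on_branch = [d for d in deltas
                 if d.startswith(branch + ".") and d.count(".") == branch.count(".") + 1]
    if not on_branch:
        # Nothing committed yet: the branch is only a reservation,
        # so its contents are those of the branch point.
        return branch_point
    # Assumes stock CVS numbering (N = 1, 2, 3, ... along the branch).
    return max(on_branch, key=lambda d: int(d.rsplit(".", 1)[1]))


def orphaned_branches(symbols: dict[str, str], deltas: set[str]) -> set[str]:
    """Branch numbers that carry deltas but are referenced by no symbol."""
    tagged = set()
    for rev in symbols.values():
        parts = rev.split(".")
        if len(parts) >= 4 and parts[-2] == "0":
            tagged.add(".".join(parts[:-2] + parts[-1:]))  # magic tag x.y.0.z -> x.y.z
        elif len(parts) % 2 == 1:
            tagged.add(rev)  # plain branch number, e.g. vendor branch 1.1.1
        elif len(parts) >= 4:
            tagged.add(".".join(parts[:-1]))  # a tag on a branch revision keeps the branch
    # A delta with four or more segments lives on the branch named by dropping its last segment.
    on_branches = {d.rsplit(".", 1)[0] for d in deltas if d.count(".") >= 3}
    return on_branches - tagged


print(branch_tip("1.2.0.2", example_deltas))                # 1.2.2.1
print(branch_tip("1.4.0.2", example_deltas))                # 1.4
print(orphaned_branches(example_symbols, example_deltas))   # {'1.1.2'}
```

For the example file, BranchX resolves to 1.2.2.1, whose state is dead, i.e. the file was removed on that branch in step 8, and 1.1.2 (deltas 1.1.2.1 and 1.1.2.2) is reported as orphaned, matching step 7. The 1.4.0.2 call stands in for a hypothetical fresh branch off 1.4 with no commits yet.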