
cvs2svn: Got "Malformed RCS delta" error while reproducing a newer revision file from an older revision file #18

Open
futatuki opened this issue Dec 2, 2021 · 2 comments

Comments

@futatuki

futatuki commented Dec 2, 2021

(This was reported by @jun66j5 on Twitter, in Japanese, with a larger example.)

It seems cvs2svn cannot handle some RCS files that contain an "a" command adding an incomplete line (one that does not end with '\n') before the last line of the target file.

Here is an example file irregular.txt,v:

head    1.5;
access;
symbols;
locks; strict;
comment @# @;


1.5
date    2021.11.26.04.16.45;    author futatuki;        state Exp;
branches;
next    1.4;

1.4
date    2021.11.26.04.14.23;    author futatuki;        state Exp;
branches;
next    1.3;

1.3
date    2021.11.26.03.46.06;    author futatuki;        state Exp;
branches;
next    1.2;

1.2
date    2021.11.26.03.41.39;    author futatuki;        state Exp;
branches;
next    1.1;

1.1
date    2021.11.26.03.34.28;    author futatuki;        state Exp;
branches;
next    ;


desc
@create a new file irregular.txt
@


1.5
log
@r1.5:
@
text
@aaa
bbb
ccc
hhh@


1.4
log
@r1.4:
@
text
@d4 1
a4 1
ggg@


1.3
log
@r1.3: s/eee/fff/
@
text
@d3 1
a3 1
fff@


1.2
log
@r1.2: s/ddd/eee/
@
text
@d3 1
a3 1
eee@

1.1
log
@r1.1
@
text
@d3 1
a3 1
ddd@

In each revision's diff text, the last 'a' command adds an incomplete line. In particular, in the delta that produces r1.3 from r1.4, the diff replaces an existing complete line with an incomplete one, and as a result that line is joined to the following existing line. (I don't believe CVS/RCS itself would produce such an unusual diff, but other files like this do exist in a real-world repository, and CVS/RCS can handle them.)
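To make the joining effect concrete, here is a minimal Python sketch (not cvs2svn's code) that applies an RCS delta to a list of logical lines, where each line keeps its own terminating '\n' and an incomplete line simply lacks one. The names `split_logical_lines` and `apply_rcs_delta` are hypothetical. The sketch assumes the usual RCS delta semantics: `dN M` deletes M lines starting at line N, `aN M` inserts the following M lines after line N, and N always refers to the text before the delta is applied. Whether RCS/CVS render every such case identically is not asserted here; this only models the logical-line view.

```python
def split_logical_lines(text):
    """Split on newline characters only, keeping each terminator.

    A trailing unterminated fragment becomes its own (incomplete) line.
    """
    parts = text.split("\n")
    lines = [part + "\n" for part in parts[:-1]]
    if parts[-1]:
        lines.append(parts[-1])  # incomplete line: no trailing '\n'
    return lines


def apply_rcs_delta(lines, delta):
    """Apply an RCS delta ('dN M' / 'aN M' commands) to a list of logical lines.

    N always refers to the numbering of the input lines, so the input is walked
    once with a cursor while the output list is built.
    """
    commands = split_logical_lines(delta)
    result, cursor, i = [], 0, 0
    while i < len(commands):
        op = commands[i][0]
        start, count = (int(n) for n in commands[i][1:].split())
        i += 1
        if op == "d":
            # Delete `count` lines starting at 1-based line `start`.
            if start - 1 < cursor:
                raise ValueError("Commands out of order")
            if start - 1 + count > len(lines):
                raise ValueError("Deletion beyond file end")
            result.extend(lines[cursor:start - 1])
            cursor = start - 1 + count
        elif op == "a":
            # Insert the next `count` delta lines after line `start`.
            if start < cursor:
                raise ValueError("Commands out of order")
            if start > len(lines):
                raise ValueError("Addition beyond file end")
            if i + count > len(commands):
                raise ValueError("Truncated addition")
            result.extend(lines[cursor:start])
            cursor = start
            result.extend(commands[i:i + count])  # may end with an incomplete line
            i += count
        else:
            raise ValueError(f"Unknown RCS command {op!r}")
    result.extend(lines[cursor:])
    return result


# Reverse deltas from irregular.txt,v: each string is the text between the @ delimiters.
r15 = split_logical_lines("aaa\nbbb\nccc\nhhh")
r14 = apply_rcs_delta(r15, "d4 1\na4 1\nggg")
r13 = apply_rcs_delta(r14, "d3 1\na3 1\nfff")
print(r13)                 # ['aaa\n', 'bbb\n', 'fff', 'ggg']
print(repr("".join(r13)))  # 'aaa\nbbb\nfffggg'
```

In this model r1.3 still has four logical lines, but its flat text has only three newline-separated lines, because the incomplete "fff" runs straight into "ggg".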

With a sample CVS repository containing only this irregular.txt,v (apart from CVSROOT), cvs2svn fails with an error:

$  cvs2svn -s irregular-rcs-test-broken.svn /home/cvs/irregular-rcs-test
Writing temporary files to '/tmp/cvs2svn-Tlxzju'
Be sure to use --tmpdir='/tmp/cvs2svn-Tlxzju' if you need to resume this conversion.
----- pass 1 (CollectRevsPass) -----
Examining all CVS ',v' files...
/home/cvs/irregular-rcs-test/irregular.txt,v
/home/cvs/irregular-rcs-test/CVSROOT/checkoutlist,v

  ...

Starting Subversion r7 / 7
ERROR: Malformed RCS delta in /home/cvs/irregular-rcs-test/irregular.txt,v, revision 1.5: Deletion beyond file end

In the incomplete Subversion repository irregular-rcs-test-broken.svn, the r1.1 to r1.4 revisions of irregular.txt are exactly equal to those in the original CVS repository.

Although I also think those RCS deltas are malformed, I think it would be better to continue building the new revisions for the Subversion repository, because the newer revision files produced before the malformed delta is reached are correct.
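One plausible way to reach "Deletion beyond file end" is sketched below in Python. This is a simplified illustration, not a trace of cvs2svn's actual code path. Once an incomplete line sits in the middle of a revision, joining the logical lines into flat text and re-splitting that text merges two logical lines into one. The file then appears one line shorter, so a later `d` command can point past its end. `split_on_newlines` and `check_delete` are hypothetical helpers, and the `d4 1` command is a hypothetical example rather than one of the deltas in irregular.txt,v.

```python
def split_on_newlines(text):
    """Split flat text into lines, keeping '\n'; an unterminated tail stays a line."""
    parts = text.split("\n")
    lines = [part + "\n" for part in parts[:-1]]
    if parts[-1]:
        lines.append(parts[-1])
    return lines


def check_delete(lines, start, count):
    """Validate an RCS 'd<start> <count>' command against the given lines."""
    if start < 1 or start - 1 + count > len(lines):
        raise ValueError("Deletion beyond file end")


# Logical lines of r1.3 rebuilt from the reverse deltas: "fff" is incomplete.
logical = ["aaa\n", "bbb\n", "fff", "ggg"]

# Round-tripping through flat text merges "fff" and "ggg" into one line.
resplit = split_on_newlines("".join(logical))

check_delete(logical, 4, 1)      # accepted: the logical view has 4 lines
try:
    check_delete(resplit, 4, 1)  # rejected: the re-split view has only 3 lines
except ValueError as exc:
    print(exc)                   # prints: Deletion beyond file end
```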

I'll submit a PR for the issue.

@mhagger
Owner

mhagger commented Dec 2, 2021

Thanks for the careful report and the research. Your conclusions could be corroborated if you try converting the same file, first with the --use-rcs and then with the --use-cvs option, to see if those conversions give the result that you expect. If they do, then that's pretty conclusive proof that the --use-internal-co version needs to be adjusted. If they fail, or if they give different results, then the analysis might need to be adjusted.

@futatuki
Author

futatuki commented Dec 2, 2021

Thank you for the suggestion. With both the --use-rcs and the --use-cvs options, cvs2svn produced what I expected: all revisions r1.1 to r1.5 of irregular.txt check out correctly from the converted repository as r3 to r7.

The dump files produced with --use-cvs and with --use-rcs differ only in the UUID and the svn:date of r1:

--- irregular-rcs-test-cvs-dump.txt     2021-12-02 19:59:31.895085000 +0900
+++ irregular-rcs-test-rcs-dump.txt     2021-12-02 19:59:46.421823000 +0900
@@ -1,6 +1,6 @@
 SVN-fs-dump-format-version: 2
 
-UUID: 7caf5ac0-5e53-ec11-9708-001c4272a9f0
+UUID: 78e207b6-5e53-ec11-9708-001c4272a9f0
 
 Revision-number: 0
 Prop-content-length: 56
@@ -9,7 +9,7 @@
 K 8
 svn:date
 V 27
-2021-12-02T10:58:16.104635Z
+2021-12-02T10:57:58.781240Z
 PROPS-END
 
 Revision-number: 1
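A claim like "the dumps differ only in UUID and svn:date" can be checked mechanically by masking those values before comparing. The following is a hedged Python sketch. It assumes the textual layout visible in the dump diff above: a `UUID: ...` header line, and each `svn:date` property written as a `svn:date` line, a `V <length>` line, and then the timestamp. The function names are hypothetical.

```python
import difflib


def normalize_dump(text):
    """Mask the repository UUID and every svn:date value in a dump's text."""
    lines = text.splitlines()
    out = []
    i = 0
    while i < len(lines):
        line = lines[i]
        if line.startswith("UUID: "):
            out.append("UUID: <masked>")
        elif (line == "svn:date" and i + 2 < len(lines)
              and lines[i + 1].startswith("V ")):
            # Keep the property name and length line; mask the timestamp.
            out.extend([line, lines[i + 1], "<masked>"])
            i += 3
            continue
        else:
            out.append(line)
        i += 1
    return out


def remaining_differences(dump_a, dump_b):
    """Unified-diff lines that remain after masking UUID and svn:date."""
    return list(difflib.unified_diff(
        normalize_dump(dump_a), normalize_dump(dump_b), lineterm=""))
```

The observation holds if reading both dumps (for example with `open(path, encoding="utf-8", errors="surrogateescape").read()`) gives an empty `remaining_differences(...)`. This line-based masking is deliberately simple: binary file contents, or a file that itself contains a literal `svn:date` line, could confuse it.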

And regarding --use-internal-co, I wrote above:

"In the incomplete Subversion repository irregular-rcs-test-broken.svn, the r1.1 to r1.4 revisions of irregular.txt are exactly equal to those in the original CVS repository."

This was incorrect. Only r1.1 (r3) was the same; all the other revisions are broken.

After applying PR #19, cvs2svn with --use-internal-co converts the repository successfully, and its dump file differs only in the repository UUID and the svn:date of r1.

futatuki added a commit to futatuki/cvs2svn that referenced this issue Dec 20, 2021
…t_db.

As the return value of RCSStream.invert_diff() is meant to be applied
not to flat text content but to the internal logical lines in
RCSStream, which may contain unterminated logical lines at any
position, we should use the internal logical lines in RCSStream as
the base from which the next newer revision is built.
This commit implements that by splitting the checkout() method in
TextRecord into checkout(), for external use as before, and
checkout_as_lines(), for internal use.

* cvs2svn_lib/checkout_internal.py
  (TextRecord.checkout_as_lines): New method. Replacement for the
    checkout() method that returns the internal lines in RCSStream
    instead of plain text.
  (TextRecord.checkout):
    Use checkout_as_lines() for default implementation.
  (FullTextRecord.checkout, DeltaTextRecord.checkout):
    Removed in favor of the default implementation.
  (FullTextRecord.checkout_as_lines):
    New method. Same logic as the former checkout() method.
  (DeltaTextRecord.checkout_as_lines):
    New method. Same logic as the former checkout() method, but
    uses the internal lines in rcs_stream instead of its text.
  (_Sink.set_revision_info):
    Record the internal lines in rcs_stream instead of its text
    at revision 1.1.

* cvs2svn_lib/rcs_stream.py
  (RCSStream.__init__):
    Allow lines to be set directly, as an alternative to text.
  (RCSStream.get_lines):
    New method.
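The idea behind the fix can be illustrated with a small, hypothetical Python sketch. The classes below are only loosely modeled on the structure the commit message describes; they are not the actual cvs2svn classes. A record exposes `checkout_as_lines()`, `checkout()` is derived from it by joining, and each revision is built from the older record's logical lines instead of its re-split flat text. For brevity, the sketch walks the reverse deltas stored in irregular.txt,v, whereas cvs2svn applies the output of `RCSStream.invert_diff()` in the forward direction. Every delta in that file has the form `dN 1` / `aN 1` plus one line, so each one is modeled here as "replace logical line N" via a hypothetical `replace_line` helper.

```python
def split_logical_lines(text):
    """Split on '\n' only, keeping terminators; an unterminated tail is one line."""
    parts = text.split("\n")
    lines = [part + "\n" for part in parts[:-1]]
    if parts[-1]:
        lines.append(parts[-1])
    return lines


def replace_line(lines, n, new_line):
    """Apply a 'dN 1' + 'aN 1' delta: replace logical line n (1-based)."""
    if not 1 <= n <= len(lines):
        raise ValueError("Deletion beyond file end")
    return lines[:n - 1] + [new_line] + lines[n:]


class TextRecord:
    """Hypothetical base class: checkout() is derived from checkout_as_lines()."""

    def checkout_as_lines(self):
        raise NotImplementedError

    def checkout(self):
        return "".join(self.checkout_as_lines())


class FullTextRecord(TextRecord):
    def __init__(self, text):
        self._lines = split_logical_lines(text)

    def checkout_as_lines(self):
        return list(self._lines)


class DeltaTextRecord(TextRecord):
    def __init__(self, base, n, new_line):
        self.base, self.n, self.new_line = base, n, new_line

    def checkout_as_lines(self):
        # Build on the base record's logical lines, never on its flat text.
        return replace_line(self.base.checkout_as_lines(), self.n, self.new_line)


def checkout_by_resplitting(record):
    """The problematic pattern: re-split each older revision's flat text."""
    if isinstance(record, DeltaTextRecord):
        base_lines = split_logical_lines(checkout_by_resplitting(record.base))
        return "".join(replace_line(base_lines, record.n, record.new_line))
    return record.checkout()


r15 = FullTextRecord("aaa\nbbb\nccc\nhhh")
r14 = DeltaTextRecord(r15, 4, "ggg")
r13 = DeltaTextRecord(r14, 3, "fff")
r12 = DeltaTextRecord(r13, 3, "eee")

print(repr(r12.checkout()))                # 'aaa\nbbb\neeeggg'
print(repr(checkout_by_resplitting(r12)))  # 'aaa\nbbb\neee' ("ggg" is lost)
```

Under this model, the re-splitting path silently drops "ggg" once r1.3's incomplete "fff" has been glued to it. This mirrors the report that every revision after the first was broken under --use-internal-co.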