Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

issue #18:RCSStream: Fix issue on inserting incomplete line on RCS diff #19

Closed

Conversation

futatuki
Copy link

@futatuki futatuki commented Dec 2, 2021

The issue was that the reversed edits object were created against "logical lines" which can contains incomplete lines other than the last line created from latest revision with applying for each diff text through the revisions, while it always applied to "physical lines" that newly created by msplit function from its target file.

This commit changes it: make reversed edits object enable to apply to "physical lines".

  • cvs2svn_lib/rcs_stream.py
    (generate_edits_from_blocks): Connect incomplete lines in old_lines
    and new_lines in command blocks.
    (RCSStream.generate_blocks): Add special case handling in 'a' command.

* cvs2svn_lib/rcs_stream.py
  (generate_edits_from_blocks): Connect incomplete lines in old_lines
    and new_lines in command blocks.
  (RCSStream.generate_blocks): Add special case handling in 'a' command.
@mhagger
Copy link
Owner

mhagger commented Dec 2, 2021

Thanks for your contribution! I can easily believe that the handling of unterminated lines is incorrect, but it's been ages since I've looked at this code, so it could take a while to figure out the context again.

Would it be possible to add a test to the test suite covering this problem? This would help document exactly what the problem is, and also prevent regressions in the future. Ideally, show the CVS commands that would be needed to create the repo that triggers the bug (e.g., in a comment in the test).

/cc @ossilator, who I think did a lot of work in this area back in the day.

@ossilator
Copy link

oh my, a blast from the past! 😅

while my memory is still rather exceptional, it's not that exceptional - i'd have to investigate this from scratch to get a grip again, which is not a priority for me right now (hot project stack depth is currently 3).

i can tell right away that the patch is rather inefficient and fiddly, though. try to address the problem closer to the source. yeah, that might be a somewhat bigger change ... or quite the opposite.

@futatuki
Copy link
Author

futatuki commented Dec 2, 2021

Would it be possible to add a test to the test suite covering this problem? This would help document exactly what the problem is, and also prevent regressions in the future.

Unfortunately I'm not familiar with the test suites of this package yet, so it takes for a while to add a test. (As I knew how rcsparse works because of my code tweaking experience on ViewVC, I could find the cause and could show a concept how to fix. So I read very limited part of the code only.)

Ideally, show the CVS commands that would be needed to create the repo that triggers the bug (e.g., in a comment in the test).

As I don't know the condition that CVS(or RCS) makes such a RCS file, the way to create such a repository I know is only cvs -d REPOS init ; cp irregular.txt,v REPOS where irregular.txt,v is a crafted RCS file I described in issue #18.

@futatuki
Copy link
Author

futatuki commented Dec 3, 2021

I found the fix in d42a9c7 is still incomplete. Once an unterminated line is inserted before the last line, unterminated line can be at any position in self._lines.

So we should check it everywhere in RCSstream.generate_blocks(), if it is used for generating invert diff (say, called from RCSstream.apply_and_invert_edits()).

@futatuki
Copy link
Author

futatuki commented Dec 7, 2021

Once an unterminated line is inserted before the last line, unterminated line can be at any position in self._lines.

As a new revision diffs is always built from a complete file text and older diffs are not affected by newer diffs, I believe that a unterminated line can exist only as a last line in internal logical lines which CVS/RCS hold when calculate a past revision file from latest one.

So checking unterminated lines everywhere in self._lines is usefull only when we want to emulate CVS/RCS behaviour against some broken or crafted RCS files.
(although I've already implemented it...)

@futatuki
Copy link
Author

i can tell right away that the patch is rather inefficient and fiddly, though.

Actually with this PR patch, it is about 1.5 times slower in pass 4 FilterSymbolsPass than the case without the patch, even this patch does not check more complex cases as I mentioned above. Also, I tried to implement cheks for all case of appearance of unterminated line with optimization all I could, but it is still 1.5 times slower than original one.

So I'll try another approach: use logical lines derived from trunk HEAD via deltas instead of flat text content of each revision file at any time to derive newer revision content, in checkout_internal.py. (generate_blobs.py also uses RCSStream, but it does not use the invert_diff method)

@futatuki
Copy link
Author

So I'll try another approach: use logical lines derived from trunk HEAD via deltas instead of flat text content of each revision file at any time to derive newer revision content, in checkout_internal.py. (generate_blobs.py also uses RCSStream, but it does not use the invert_diff method)

As I've implemented it as PR #22, I close this PR.

@futatuki futatuki closed this Dec 20, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants