Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: sort the order of bulk-renamed files #1801

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

yw1ee
Copy link

@yw1ee yw1ee commented Oct 17, 2024

Resolve #1766 . Sort bulk-renamed files in a similar way to topological sort.

@sxyazi
Copy link
Owner

sxyazi commented Oct 17, 2024

Thanks for the PR! Could you please add tests to the sort() function?

@yw1ee
Copy link
Author

yw1ee commented Oct 18, 2024

Sure! I add a very simple unittest for sort.

@sxyazi
Copy link
Owner

sxyazi commented Oct 19, 2024

Thanks for the change! I added more tests and found that the last two tests are failing. Could you please fix them?

Specifically, the second test is producing unstable sorting results. It could be:

&[("3", "4"), ("2", "3"), ("1", "3")]

or

&[("3", "4"), ("1", "3"), ("2", "3")]

This might be related to the HashMap being used, which has an unpredictable order. Switching it to a BTreeMap or IndexMap might solve the issue.

As for the last test, it hits todo!("Consider cycle"). Normally, a->b and b->a shouldn't happen, but since the data comes from user input, it's an uncontrollable facto, so we need to handle this case to avoid crashing the program here.

@yw1ee
Copy link
Author

yw1ee commented Oct 20, 2024

I also had some concerns while writing the code, so thank you for creating tests for that. I have two questions.

  • input validation
    • I think the input of second test case is invalid. 1 -> 3 and 2 -> 3 cannot do in one bulk rename. In the sort function, is it enough to just return the order without validation? I think it will be catched at maybe_exist and not be renamed, so is it ok?
  • how to handle error
    • I think it is sub-problem of the first question. As you mentioned, we can handle the error when cyclic rename detected. But I'm not sure about how to handle this invalidate input and what is proper behavior of sort function.

So, could I ask you, the author of this project, to clearly define the roles and responsibilities of the sort function? If it doesn't really matter, I'll work on it at my own discretion.

@sxyazi
Copy link
Owner

sxyazi commented Oct 20, 2024

I haven't fully understood the implementation of sort, but I think we should try to keep the user's input order and only prioritize certain items on top of that, like:

a1 -> b
a2 -> b
b  -> c

In this case, we have a1 -> b and a2 -> b appearing twice, which means the data itself is invalid.

However, we don't need to take responsibility for invalid data; that's the user's responsibility, and the program should only handle valid data, as we already know that b -> c can be prioritized, so the sorted result should be:

b  -> c
a1 -> b
a2 -> b

Invalid data remain unchanged and are passed to the upper layer for handling, which in this case is maybe_exist.

@yw1ee
Copy link
Author

yw1ee commented Oct 20, 2024

So do you mean that we will rename a valid subset of the user's input even if the user's full input set is invalid (duplicate target name or cyclic rename)? For that purpose, the role of sort function is just prioritize some rename that could be executed without any concerns, right? Please let me know if I have misunderstood anything.

@sxyazi
Copy link
Owner

sxyazi commented Oct 20, 2024

Yes, the sorting function should only elevate those valid renaming items that have no side effects and can be executed early to the front of the execution queue, assisting the user as much as possible (rather than taking over the user's work and responsibilities) in doing all valid operations, while leaving invalid operations for higher-level resolution — here, that means throwing errors to the user for manual intervention.

@yw1ee
Copy link
Author

yw1ee commented Oct 20, 2024

One thing I would like to suggest is that the sort function returns a separate rename set containing a cycle. And in the above function, if there is a cycle, how about showing the list to the user on stderr to indicate that the cycle exists and just return (or user interaction to do only valid things or not?).

let (valid, cycle) = Self::sort(old, new);
if !cycle.is_empty() {
    writeln!(stderr, "cyclic rename detected");
    for (old, new) in cycle {
        wrintln!(stderr, "{old} -> {new}");
    }
    return Ok(()); // or user interaction to do or not
}

@sxyazi
Copy link
Owner

sxyazi commented Oct 20, 2024

Sorry - no. I don't really see the practical significance of "cycle detection" for users, as it's more of an internal implementation detail. The program just needs to ensure that it safely handles the valid data provided by the user, while leaving the invalid data for the user to handle manually. Introducing this detection seems to discard the entire bulk rename operation, rather than just the invalid parts

@yw1ee
Copy link
Author

yw1ee commented Oct 20, 2024

I misthought. I thought that if there is a cycle, the rename will be performed and all the files will be messed up, but it will not be performed because it is caught in maybe_exist.

I make result consistent and just return cyclic rename entries instead of panic. Please check it. Thanks for your comment.

@sxyazi
Copy link
Owner

sxyazi commented Oct 21, 2024

Thanks! I tested the latest changes, and it seems the ordering issue still exists. For example, this test will fail:

#[rustfmt::skip]
cmp(
    &[("b", "b_"), ("a", "a_"), ("c", "c_")],
    &[("b", "b_"), ("a", "a_"), ("c", "c_")],
);

@yw1ee
Copy link
Author

yw1ee commented Oct 22, 2024

I miss reserving user input order. I fix it!

@sxyazi sxyazi force-pushed the main branch 5 times, most recently from 9ddaba4 to ef1a31a Compare October 26, 2024 09:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Analyze the renaming order of bulk-renamed files to avoid conflicts
2 participants