Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move Files and Directories #64

Merged
merged 4 commits into from
Jul 23, 2019
Merged

Move Files and Directories #64

merged 4 commits into from
Jul 23, 2019

Conversation

dmichalowicz
Copy link
Contributor

@dmichalowicz dmichalowicz commented Jul 16, 2019

Add support for moving files and directories.

try:
super(PostgresTestCase, self).test_rename_file()
finally:
del HybridContentsManager.get_file_id
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This setup isn't great but I didn't have any other ideas at the moment other than doing this mocking, or removing this check: https://github.com/quantopian/pgcontents/pull/64/files#diff-ecde3e655b312f914854e679ceacaeccR195

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the test_rename_file test in PostgresContentsManagerTestCase could be re-written to not need file id?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok

@dmichalowicz
Copy link
Contributor Author

@ssanderson The build is still failing but this should still be ready for a first look.

@dmichalowicz
Copy link
Contributor Author

I'm pretty sure I did something wrong with managing connections to the database, so the tests are hitting the limit of the maximum number of allowed connections.


# Move the populated directory over to "biz".
renamed = cm.rename_file('foo/bar/populated_dir', 'biz/populated_dir')
assert renamed == 1
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably misleading now that I think about it...

Copy link
Contributor

@ssanderson ssanderson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dmichalowicz left some comments for you.

@@ -221,6 +221,7 @@ def _extra_root_dirs(self):

save = path_dispatch2('save', 'model', True)
rename = path_dispatch_old_new('rename', False)
rename_file = path_dispatch_old_new('rename_file', False)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's up with this addition? Is there some new codepath now that needs this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is needed due to the fact that one of the hybrid manager tests subclasses PostgresContentsManagerTestCase, which now call rename_file directly.

@@ -376,26 +376,80 @@ def save(self, model, path):
model['message'] = validation_message
return model

@outside_root_to_404
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's up with this refactor? Are there cases where we now need the ability to rename multiple files?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed in slack, going to remove this in favor of renaming multiple files by making multiple http requests to rename_file.

Returns
-------
renamed : int
The count of paths that were successfully renamed.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this has to be len(old_paths) right? When would a user need this return value?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They wouldn't, it's mainly for testing purposes. I'm going to either change rename_file to return nothing, or to return a simple string saying whether we "renamed_file" or "renamed_directory".


Parameters
----------
old_paths : list
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like a dict mapping old -> new might be a simpler API here? Do we need order here?

assert nb_model['name'] == 'Untitled.ipynb'

# A simple rename of the file within the same directory.
file_id = cm.get_file_id('Untitled.ipynb')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than depend on get_file_id for this test (which isn't part of the contents spec; we added it to make notebook sharing work internally at Quantopian I believe), could we just set the content of the file when we create it and make assertions about that? Or do we want to assert specifically that this operation doesn't change the file id?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason to use Untitled.ipynb here vs. nb_name, which is already in scope?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or do we want to assert specifically that this operation doesn't change the file id?

Nope, it was more just an easy way to check we're dealing with the right file. I think it's fine to simply remove.

I'll change this to nb_name

try:
super(PostgresTestCase, self).test_rename_file()
finally:
del HybridContentsManager.get_file_id
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the test_rename_file test in PostgresContentsManagerTestCase could be re-written to not need file id?

def test_move_empty_directory(self):
cm = self.contents_manager

parent_folder_model = cm.new_untitled(type='directory')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same notes as above here re: actually naming these.


# A simple rename of the file within the same directory.
file_id = cm.get_file_id('Untitled.ipynb')
renamed = cm.rename_file(nb_path, 'new_name.ipynb')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test should probably be using rename, not rename_file. rename is the API called by the server; it's responsible for both moving files and moving checkpoints. rename_file is just the file-moving portion of that responsibility. We generally test with rename elsewhere in this suite.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed that but actually thought calling rename_file was better/more explicit considering it's the function we are overriding. I can change this to rename, but if that's always the preferred method I kind of wish rename_file was made private 😕

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a language like C++ or Java, rename_file would probably have been "protected" (i.e., visible to subclasses, but not to code outside the class hierarchy); it's publicly-named on the base class because it's designed for subclasses to override, but it's not really public for external consumers.

for old_path in old_api_paths:
with assertRaisesHTTPError(self, 404):
cm.get(old_path)

Copy link
Contributor

@ssanderson ssanderson Jul 18, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing I think we're missing here is a set of tests asserting that checkpoints get moved correctly when files move. The best model for that would probably be to add a test similar to test_checkpoints_follow_file (defined in test_contents_api in the upstream jupyter notebook test suite) and add it to our _APITestBase that defines a suite of tests that should be passed by all of our implementations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call.

# manager that implements it.
def test_move_multiple_objects(self):
pass

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need a test asserting that HybridContentsManager doesn't allow file renames between sub-managers? I think that behavior is still preserved here, but it doesn't look like we have any coverage for it, which we probably should.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(This would probably want to go in MultiRootTestCase).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ya I can add that

@@ -374,6 +411,9 @@ def teardown_class(cls):
super(PostgresContentsFileCheckpointsAPITest, cls).teardown_class()
cls.td.cleanup()

def test_checkpoints_move_with_file(self):
pass
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this test fails for this case because the FileCheckpoints class is not recognizing any checkpoints when renaming a file. See https://github.com/jupyter/notebook/blob/master/notebook/services/contents/filecheckpoints.py#L98-L99. I'm not sure what to do about this without making upstream changes, but I can imagine that method wanting to list out all checkpoints for the given path if the path is a directory.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, yeah, that looks like it's a bug upstream. I opened #68 to track this. Can you add a comment pointing to that issue? I'll push an issue and/or bugfix upstream.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ya I'll add a comment.

@dmichalowicz
Copy link
Contributor Author

Thinking more about the database connection issues, I suppose it's also possible there's no actual problem with my changes but rather the tests I added simply pushed the total number of connections over the maximum? Even if that's the case I'm still not so sure how to debug.

@dmichalowicz
Copy link
Contributor Author

@ssanderson I rebased this and the build is fixed, and I removed the "renaming multiple files at once" stuff, so it's ready for another look when you get a chance.

Copy link
Contributor

@ssanderson ssanderson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dmichalowicz couple small questions. Otherwise LGTM.

except (FileExists, DirectoryExists):
self.already_exists(path)
except RenameRoot as e:
self.do_409(str(e))
except (web.HTTPError, PathOutsideRoot):
raise
except Exception as e:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do uncaught exceptions not get logged by default by tornado? I thought they did.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like they do (I didn't know this), I'm just going to remove this except clause entirely.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, despite this fact, I think the manual logging here is clearer/more explicit about there being an issue in our code and about what exact rename call caused the problem. So I don't think it hurts to keep this. I also noticed this is what save does above.

@@ -197,6 +197,43 @@ def test_delete_non_empty_dir(self):
# be deleted)
super(_APITestBase, self).test_delete_non_empty_dir()

def test_checkpoints_move_with_file(self):
# Read initial file state.
self.api.read('foo/a.ipynb')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this call doing anything?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope I had copied one of the notebook tests to model off of but this isn't needed.

@@ -374,6 +411,9 @@ def teardown_class(cls):
super(PostgresContentsFileCheckpointsAPITest, cls).teardown_class()
cls.td.cleanup()

def test_checkpoints_move_with_file(self):
pass
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, yeah, that looks like it's a bug upstream. I opened #68 to track this. Can you add a comment pointing to that issue? I'll push an issue and/or bugfix upstream.

@dmichalowicz dmichalowicz merged commit cf7bc6d into master Jul 23, 2019
@dmichalowicz dmichalowicz deleted the move-files-rebased branch July 23, 2019 17:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants