Skip to content
This repository has been archived by the owner on Aug 3, 2023. It is now read-only.

fix: skip kv pairs that have already been uploaded #1970

Merged
merged 1 commit into from
Jun 25, 2021

Conversation

nilslice
Copy link
Contributor

This adds an optional param to the function which creates the kv pairs for upload. The logic was pulled from the filter_files function, which itself has been removed (along with its test).

The primary motivation here is to skip any kv pair that has already been uploaded (checked by listing keys from the namespace), so we don't grow the vec so needlessly large.

@nilslice nilslice requested a review from a team as a code owner June 25, 2021 16:12
@nilslice
Copy link
Contributor Author

would be great if someone could verify that the asset_manifest should contain the full list of file metadata even when we can skip the pre-existing kv pair.

Copy link
Contributor

@jyn514 jyn514 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand why the test was removed - would it be possible to test directory_keys_values instead?

@@ -95,15 +97,23 @@ pub fn directory_keys_values(

validate_key_size(&key)?;

// asset manifest should always contain all files
asset_manifest.insert(url_safe_path, key.clone());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if there are duplicate keys? Will we upload both and the second gets overridden?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can there be duplicates? these are based on files on disk no?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not familiar with this service 😅 if there can't be duplicates it's not worth worrying about.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm not that familiar with it either, but the code suggests that it's iterating over files and creating a bundle to upload. the filepaths which ultimately determine the key & value would be unique

@@ -64,6 +65,7 @@ pub fn add_namespace(user: &GlobalUser, target: &mut Target, preview: bool) -> R
pub fn directory_keys_values(
target: &Target,
directory: &Path,
exclude: Option<&HashSet<String>>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be nice to remove the Option - if callers don't want to ignore any keys, they can pass an empty set.

Suggested change
exclude: Option<&HashSet<String>>,
exclude: &HashSet<String>,

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

create a new HashSet and pass a ref to it, instead of None?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, exactly: &HashSet::new()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see the benefit, can you elaborate?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Option<HashSet> has two ways to represent "don't exclude any files": Some(HashSet::new()) and None. HashSet only has one.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also removes the if let Some(...) line below, which decreases the indentation a little.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm. yea, i suppose. None is pretty clear from an intent standpoint. Going to leave as-is, and if its worthwhile we can revisit

use std::path::Path;

#[test]
fn it_can_filter_preexisting_files() {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this would be too painful to make work with directory_keys_values for now? Would be good to add back in later with a mock Walk or something.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, exactly -- the old test really didn't prove much... and with the logic embedded in the other function now it's a bit hairy to test. We'd want to change the implementation to make it testable but since this is just a quick patch I think it's ok.

cc/ @jyn514 (w/r/t your question above)

@nilslice nilslice merged commit 0b14af2 into master Jun 25, 2021
@delete-merged-branch delete-merged-branch bot deleted the sm/pre-filter-uploads branch June 25, 2021 18:49
This was referenced Jun 29, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants