Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add filename sanitizing to adapters #113

Merged
merged 4 commits into from
Nov 9, 2022
Merged

Add filename sanitizing to adapters #113

merged 4 commits into from
Nov 9, 2022

Conversation

masaball
Copy link
Contributor

@masaball masaball commented Oct 27, 2022

This PR moves filename sanitizing from the ffmpeg and passthrough adapters into a separate module. Additional sanitizing is added and applied in the two listed adapters as well as the AWS adapters.

Sanitizing is strict, replacing any special characters outside the allowed set with _ and removing all excess periods. The allowed set is A-Z, a-z, 0-9, ., -, and _. This is a stricter version of the list provided by AWS for object-keys. Additionally, / is not sanitized because of how Avalon filenames are formatted when passed to Active Encode.

Add failing ffmpeg tests

Add filename sanitizing to ffmpeg and aws adapters

Appease rubocop and adjust sanitizing

Sanitzing adjustment

Appease rubocop

Adjust Sanitization

Add additional sanitizing
Copy link
Member

@cjcolvar cjcolvar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about making ActiveEncode be the path for calling the sanitization methods?

ActiveEncode.sanitize_filename(input_url)

This would avoid having to require the new module in all of the adapters and would make it easily available outside of ActiveEncode's codebase for use in apps that use ActiveEncode. It would make method calls in adapters longer but I think it would avoid some bloat since the methods are generic and the adapters don't really need to provide those methods.

I think this is possible by adding the require to lib/active_encode.rb then declaring the module to extend the sanitization module:

module ActiveEncode
  extend ActiveEncode::FilenameSanitizer
end

lib/active_encode/filename_sanitizer.rb Outdated Show resolved Hide resolved
Copy link
Member

@cjcolvar cjcolvar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes look good to me, but I'm a little worried that there is the possibility that previously created encodes could be orphaned since this changes where ActiveEncode looks for files. Do you think this could possibly happen? If so, we could make this a major release due to the change in behavior.

@masaball
Copy link
Contributor Author

masaball commented Nov 1, 2022

I think the FileLocator is only really used by the engine adapters when doing the encoding and I don't think that previously created encodes reference it. But I may have overlooked something in the code or how external apps/prior encodes interact with it. So, I hope its not the case but it orphaning some previous encodes is a possibility.

Co-authored-by: Chris Colvard <[email protected]>

If a file is already successfully in an s3 bucket and is not being moved
to a new bucket we do not need to do anything with the filename. If the
file is being moved to a new bucket, we should sanitize it to the
stricter set of allowed characters.
Copy link
Member

@cjcolvar cjcolvar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Thanks for your effort on this!

@cjcolvar cjcolvar merged commit 34c0828 into main Nov 9, 2022
@cjcolvar cjcolvar deleted the filename_sanitization branch November 9, 2022 17:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants