Rework displacement filter to sample-based approach #3311

Merged (5 commits) on Sep 8, 2021

Conversation

klecki (author) commented Sep 3, 2021

Description

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Refactoring (Redesign of existing code that doesn't affect functionality)
  • Other (e.g. Documentation, Tests, Configuration)

What happened in this PR

Refactoring:

  • Get rid of unnecessary DataDependantSetup
  • Introduce SetupImpl
  • Change pass-by-pointer to pass-by-reference
  • Rework GPU Op to process flattened blocks instead
    of whole images accessed via an offset into one global
    batch pointer.
  • Instead of accessing the underlying contiguous
    TL buffer, access individual samples in the GPU Op
    (see the sketch after this list).
  • Rework CPU Op to use HostWorkspace
  • Keep the optimized implementation for aligned
    samples.
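
A minimal sketch of the flattened-block, per-sample approach described above (illustrative only; all type and kernel names here are hypothetical, not the actual DALI implementation):

#include <cstdint>

struct BlockDesc {
  int sample_idx;      // which sample this flattened block belongs to
  int64_t start, end;  // flat element range within that sample
};

struct SampleDesc {
  const uint8_t *in;   // per-sample input pointer - no offset into a global batch buffer
  uint8_t *out;        // per-sample output pointer
  int64_t shape[3];    // H, W, C of this sample
};

__global__ void ApplyDisplacement(const SampleDesc *samples, const BlockDesc *blocks) {
  const BlockDesc &blk = blocks[blockIdx.x];
  const SampleDesc &sample = samples[blk.sample_idx];
  for (int64_t out_idx = blk.start + threadIdx.x; out_idx < blk.end; out_idx += blockDim.x) {
    // The real operator computes a displaced source coordinate here;
    // an identity copy stands in for that per-element work.
    sample.out[out_idx] = sample.in[out_idx];
  }
}

Because each CUDA block works on a flat element range of a single sample, the samples no longer have to live in one contiguous TensorList allocation.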

Fixes:

  • Fix masking in CPU Op - access the mask as int instead
    of bool, to be consistent with the default value and the GPU Op.
  • Fix masking in CPU Op - the copy no longer tries to access
    a stream in the CPU workspace, which caused an error.

Tests:

  • Add cases to the water test to cover more code paths:
    • prime-sized image to fall into the non-optimized
      kernel
    • mask support
    • both input types: uint8 and float32.

Signed-off-by: Krzysztof Lecki [email protected]

Additional information

  • Affected modules and functionalities:
    Old displacement filter implementation, basis for Water, Sphere and Jitter.

  • Key points relevant for the review:

Checklist

Tests

  • Existing tests apply
  • New tests added
    • Python tests
    • GTests
    • Benchmark
    • Other
  • N/A

Documentation

  • Existing documentation applies
  • Documentation updated
    • Docstring
    • Doxygen
    • RST
    • Jupyter
    • Other
  • N/A

DALI team only

Requirements

  • Implements new requirements
  • Affects existing requirements
  • N/A

REQ IDs: N/A

JIRA TASK: N/A

klecki (author) commented Sep 3, 2021

!build

dali-automaton: CI MESSAGE: [2908674]: BUILD STARTED

dali-automaton: CI MESSAGE: [2908674]: BUILD FAILED

klecki (author) commented Sep 6, 2021

!build

dali-automaton: CI MESSAGE: [2923073]: BUILD STARTED

Review thread on this diff excerpt:
__device__ __host__ inline void operator()(Displacement &displace, const void *raw_params) {
  const auto *const params = reinterpret_cast<const typename Displacement::Param *>(raw_params);
  displace.param = params[n];
mzient (reviewer): A cast from void* should be a static_cast; reinterpret_cast is not needed here.

Suggested change:
- const auto *const params = reinterpret_cast<const typename Displacement::Param *>(raw_params);
+ const auto *const params = static_cast<const typename Displacement::Param *>(raw_params);

klecki (author): done

Comment on lines +83 to +85
const int H = sample.shape[0];
const int W = sample.shape[1];
const int C = sample.shape[2];
mzient (reviewer): These could be fast_div in sample.shape...

klecki (author): I won't be trying to benchmark that.

Comment on lines 95 to 97
const int c = out_idx % C;
const int w = (out_idx / C) % W;
const int h = (out_idx / W / C);
mzient (reviewer):
Suggested change:
- const int c = out_idx % C;
- const int w = (out_idx / C) % W;
- const int h = (out_idx / W / C);
+ int64_t idx = out_idx;
+ const int c = idx % C;
+ idx /= C;
+ const int w = idx % W;
+ idx /= W;
+ const int h = idx;

or at least

Suggested change:
- const int c = out_idx % C;
- const int w = (out_idx / C) % W;
- const int h = (out_idx / W / C);
+ const int c = out_idx % C;
+ const int w = (out_idx / C) % W;
+ const int h = (out_idx / C / W);

The way it was written, it prevented optimization of the last two divisions: h was computed as out_idx / W / C, so the compiler could not reuse the out_idx / C quotient it already needs for w; dividing by C first lets that common subexpression be shared.

klecki (author): done

mzient (reviewer) left a review: Nitpicks, mostly.

dali-automaton: CI MESSAGE: [2923073]: BUILD FAILED

Comment on lines 223 to 224
flat_block_setup_(32),
channel_block_setup_(32) {
mzient (reviewer, Sep 6, 2021): Why not just put it in the member definition?

klecki (author): No particular reason, but due to the intricacies of C++ I tried it and could not use just flat_block_setup_ = {32}; it had to be FlatBlockSetup flat_block_setup_ = FlatBlockSetup(32);. Not sure which one is nicer.

klecki (author): Hmm, I can use flat_block_setup_{32}.
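
For context, a minimal sketch of the C++ rule being discussed (illustrative only; it assumes the real FlatBlockSetup has an explicit single-argument constructor): a default member initializer written as copy-list-initialization cannot call an explicit constructor, while direct-list-initialization can.

struct FlatBlockSetup {  // hypothetical stand-in for the real type
  explicit FlatBlockSetup(int block_volume) : block_volume(block_volume) {}
  int block_volume;
};

struct Op {
  // FlatBlockSetup a_ = {32};             // ill-formed: copy-list-init cannot select an explicit ctor
  FlatBlockSetup b_ = FlatBlockSetup(32);  // OK: explicit construction (copy/move elided in C++17)
  FlatBlockSetup c_{32};                   // OK: direct-list-initialization
};

Either of the two working forms keeps the constant out of the constructor's initializer list.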

@@ -208,7 +219,10 @@ class DisplacementFilter<GPUBackend, Displacement,
explicit DisplacementFilter(const OpSpec &spec) :
Operator(spec),
displace_(spec),
interp_type_(spec.GetArgument<DALIInterpType>("interp_type")) {
interp_type_(spec.GetArgument<DALIInterpType>("interp_type")),
mzient (reviewer): The order of initialization is wrong - and this could be an assignment inside the constructor body: the type of the variable is trivial, whereas the initialization expression is not.

klecki (author): Fixed the order.
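
A short sketch of the rule behind this comment (illustrative names only): non-static data members are initialized in declaration order, not in the order they appear in the member-initializer list, so a mismatched list is misleading and triggers -Wreorder.

struct Example {
  int a_;  // declared first, so always initialized first
  int b_;  // declared second, so always initialized second

  // -Wreorder warns here: b_ is listed first, but a_ is still initialized first,
  // so a_(b_ + 1) would read b_ before it has been initialized.
  explicit Example(int x) : b_(x), a_(b_ + 1) {}
};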

klecki (author) commented Sep 8, 2021

!build

dali-automaton: CI MESSAGE: [2937973]: BUILD STARTED

dali-automaton: CI MESSAGE: [2937973]: BUILD PASSED

klecki merged commit f76a60e into NVIDIA:main on Sep 8, 2021
JanuszL mentioned this pull request on Mar 30, 2022