
perf: add parallelization #34

Draft · wants to merge 1 commit into main

Conversation

@gabyx (Contributor) commented Jul 12, 2024

Proposed Changes

  • Add parallelization via rayon, which lets you convert an iterator into a
    ParallelIterator that runs on rayon's thread pool.
    Note: rio's into_iterator is written somewhat oddly; I don't see the
    intention behind storing a full Vec<T>.

  • Add a CLI flag --parallel which enables parallelization.

  • Whether the parallelization pays off has yet to be tested. The Mutex on the
    Writer is probably not ideal; a better option might be slog's async writer,
    which is buffered and Send + Sync.

Types of Changes

What types of changes does your code introduce? Put an x in the boxes that
apply

  • A bug fix (non-breaking change which fixes an issue).
  • A new feature (non-breaking change which adds functionality).
  • A breaking change (fix or feature that would cause existing
    functionality to not work as expected).
  • A non-productive update (documentation, tooling, etc. if none of the
    other choices apply).

Checklist

Put an x in the boxes that apply. You can also fill these out after creating
the PR. If you're unsure about any of them, don't hesitate to ask. We're here to
help! This is simply a reminder of what we are going to look for before merging
your code.

  • I have read the
    CONTRIBUTING
    guidelines.
  • I have added tests that prove my fix is effective or that my feature
    works.
  • I have added the necessary documentation (if appropriate).

Further Comments

@gabyx gabyx marked this pull request as draft July 12, 2024 08:49
@gabyx gabyx requested a review from cmdoret July 12, 2024 09:05
@gabyx gabyx marked this pull request as ready for review July 12, 2024 09:05
@cmdoret changed the title from "feat: add parallelization for first-pass" to "perf: add parallelization for first-pass" Jul 12, 2024
@cmdoret (Member) left a comment


The clippy settings and code cleanup are welcome additions, but the parallelization of tripsu index degrades performance on my side (see comments).

src/pass_first.rs — review thread resolved
src/rules.rs — review thread resolved
@gabyx changed the title from "perf: add parallelization for first-pass" to "perf: add parallelization" Jul 12, 2024
@gabyx (Contributor, Author) commented Jul 12, 2024

@cmdoret, @supermaxiste: the write call on Writer was wrong, and only clippy caught it. write returns the number of bytes written, and that return value was ignored; I had assumed any problem would surface as an error, but a partial write is also possible. The correct function here is write_all. 👍

fix: correct all clippy errors

fix: add CLI argument to setup the thread pool

fix: CLI handling

fix: wrong argument

docs(readme): add copyright notice (#35)

fix: parallelize also second pass

fix: correct output in `write` -> `write_all`

- `write` may return after writing only part of the buffer (it reports the
  number of bytes written), which makes it the wrong function here.

fix: remove from first-pass since no benefit
@gabyx gabyx marked this pull request as draft July 15, 2024 14:32
@gabyx (Contributor, Author) commented Jul 15, 2024

@gabyx: This PR is a draft and should be revisited once parallelization is needed, after further work on the pseudonymization function.
