
rwlock: flush and fsync #3076
Merged: 1 commit merged into iterative:master from rwlock_sync on Jan 11, 2020
Conversation

@efiop (Contributor) commented Jan 6, 2020

It is important that we ensure that rwlocked contents are flushed to
disk, because otherwise other rwlock readers might get outdated state.

Kudos @shcheklein for catching this one.

  • ❗ Have you followed the guidelines in the Contributing to DVC list?

  • 📖 Check this box if this PR does not require documentation updates, or if it does and you have created a separate PR in dvc.org with such updates (or at least opened an issue about it in that repo). Please link below to your PR (or issue) in the dvc.org repo.

  • ❌ Have you checked DeepSource, CodeClimate, and other sanity checks below? We consider their findings recommendatory and don't expect everything to be addressed. Please review them carefully and fix those that actually improve code or fix bugs.

Thank you for the contribution - we'll try to review it as soon as possible. 🙏

@efiop requested review from a user, Suor, and pared on January 6, 2020 at 21:55
@ghost left a comment:

Maybe add a link to the issue, @efiop?

This one is hard to understand at first, even with the comment 😅

@efiop (Contributor, Author) commented Jan 6, 2020

@MrOutis There is no issue; it is just common practice to ensure that data hits the disk. From the fsync docs:

If you’re starting with a buffered Python file object f, first do f.flush(), and then do os.fsync(f.fileno()), to ensure that all internal buffers associated with f are written to disk.
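
A minimal sketch of that recipe applied when writing out a lock-style file (the helper name and the JSON format are illustrative, not taken from the actual change):

```python
import json
import os

def dump_and_sync(path, contents):
    # Illustrative helper, not the actual DVC code: write the contents and
    # make sure they reach the disk before anyone else reads the file.
    with open(path, "w", encoding="utf-8") as fobj:
        json.dump(contents, fobj)
        fobj.flush()             # userspace buffer -> kernel page cache
        os.fsync(fobj.fileno())  # kernel page cache -> storage device
```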

@Suor (Contributor) commented Jan 7, 2020

I don't think this is the issue. You can't get old data from the fs just because you don't fsync. fsync is only needed if you care about the disk losing power, which is not our case.

Sure, we need to be sure Python is passing things to the fs, but fd.close() should handle that.
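
As a rough illustration of this point (a sketch, not DVC code; the filename is made up): closing the file already flushes Python's userspace buffer into the kernel, so a subsequent local read sees the new contents even without fsync. fsync only adds the guarantee that the kernel pushes the data out to the storage device.

```python
import os

# Exiting the `with` block closes the file, which flushes Python's buffer
# into the kernel page cache; a local reader then sees the new contents.
with open("rwlock.tmp", "w") as fobj:
    fobj.write("new state")

with open("rwlock.tmp") as fobj:
    assert fobj.read() == "new state"

os.remove("rwlock.tmp")
```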

@Suor (Contributor) left a comment:

So I suggest dropping it.

@efiop (Contributor, Author) commented Jan 7, 2020

> I don't think this is the issue. You can't get old data from the fs just because you don't fsync. fsync is only needed if you care about the disk losing power, which is not our case.

@Suor Imagine some network fs where you have a local cache and others won't see the changes until you fsync. And, well, not breaking the file on a power outage is nice too 🙂

That being said, those concerns apply to regular stage files as well, since we rely on them to build the DAG and check for conflicts, and we are not protecting them with fsync. So this PR addresses the potential issue, but only partially. So far we haven't seen any related issues reported, but these types of issues are really hard to debug, and often only a skilled developer who knows what to look for would think of fsync. So it is entirely possible that users have run into them but never reported them, or never realised what caused them. 🤔

@Suor (Contributor) commented Jan 9, 2020

@efiop A power outage only matters for databases that serve users, which is not our case. I.e. a user gets an OK, thinks their change was written, and acts on it, but the change is later lost because the data was never actually written to disk before the power went out. If there is no user, there is no discrepancy, and we don't need to protect against it.

So this is a non-issue for us.

If that is not enough, then this fix won't help anyway, since we don't flush and fsync all the other things like stage files, and user code doesn't flush and fsync its outputs.

@efiop (Contributor, Author) commented Jan 10, 2020

@Suor As I noted above, a power outage is not our main concern; wonky filesystems are. And I would rather have this fsync than not. Just think of it as another level of protection.

@efiop merged commit 3e97886 into iterative:master on Jan 11, 2020
@Suor (Contributor) commented Jan 13, 2020

This doesn't add any new level of protection while tricking people into thinking it does. A dangerous situation.

@efiop deleted the rwlock_sync branch on January 13, 2020 at 09:23
@efiop (Contributor, Author) commented Jan 13, 2020

@Suor It only protects the rwlock from wonky filesystems, not other things. Not dangerous by itself.

@Suor (Contributor) commented Jan 13, 2020

@efiop A file close causes a commit on NFS, same as a flush, so it doesn't add any new protection for the rwlock file either. Even if it did, that would be misleading, since everything else is not protected: the fact that the rwlock is fsynced doesn't protect against outputs being written simultaneously.

@efiop (Contributor, Author) commented Jan 13, 2020

> A file close causes a commit on NFS, same as a flush, so it doesn't add any new protection for the rwlock file either.

@Suor If it were that way, no one would be doing fsync manually.

> the fact that the rwlock is fsynced doesn't protect against outputs being written simultaneously.

I don't think that is true. The flock that we use for .dvc/lock doesn't suffer from those issues AFAIK, and the rwlock is now covered, so the only place we are not covered is stage file dumps. So I agree this PR is not crucial, but it doesn't hurt either. Please feel free to create a PR to revert it if you disagree.
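
For readers unfamiliar with the distinction being drawn here, below is a minimal sketch of advisory flock-based locking in Python (Unix-only). It only illustrates the mechanism and is not necessarily how DVC implements .dvc/lock; the lock state is kept by the kernel rather than in the file's contents, which is presumably why the flush/fsync concern does not apply to it.

```python
import fcntl

# Take an exclusive advisory lock on the lock file. Other processes using
# the same protocol block on their own flock() call until we release it.
with open(".dvc/lock", "a") as lockfile:
    fcntl.flock(lockfile, fcntl.LOCK_EX)   # blocks until the lock is free
    try:
        ...  # work that must not run concurrently with other processes
    finally:
        fcntl.flock(lockfile, fcntl.LOCK_UN)
```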
