recover datastore on corruption #15

Merged
Stebalien merged 1 commit into master from feat/recover on Jun 3, 2018

Conversation

@Stebalien (Member)

(but not if we're opening it read-only)

fixes ipfs/kubo#4981

  • Should this be this automatic?
  • Should we log (probably)?
  • Should we recover when in read-only mode (probably not)?

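For context, here is a minimal sketch of the open-then-recover pattern this PR describes, using goleveldb's RecoverFile (which rebuilds the manifest, as noted later in the thread). The helper name, logging, and read-only check are illustrative assumptions rather than the PR's exact diff; they reflect the answers discussed below: log the recovery and skip it when the datastore is opened read-only.

```go
// A minimal sketch of the open-then-recover pattern discussed in this PR,
// assuming the datastore wraps goleveldb. The helper name and log message
// are illustrative, not the PR's actual diff.
package example

import (
	"log"

	"github.com/syndtr/goleveldb/leveldb"
	ldberrors "github.com/syndtr/goleveldb/leveldb/errors"
	"github.com/syndtr/goleveldb/leveldb/opt"
)

// openWithRecovery opens the database and, if goleveldb reports corruption
// and we are not in read-only mode, rebuilds the manifest via RecoverFile.
func openWithRecovery(path string, opts *opt.Options) (*leveldb.DB, error) {
	db, err := leveldb.OpenFile(path, opts)
	if err != nil && ldberrors.IsCorrupted(err) && !(opts != nil && opts.ReadOnly) {
		log.Printf("leveldb datastore at %s appears corrupted, attempting recovery", path)
		return leveldb.RecoverFile(path, opts)
	}
	return db, err
}
```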
@ghost assigned Stebalien May 5, 2018
@ghost added the status/in-progress label May 5, 2018
@djdv commented May 5, 2018

I wonder if it would be best to just warn and have recovery as a non-automatic function.
In a go-ipfs context, ipfs could warn the user of corruption and suggest or prompt them to run a repo fsck, and have fsck trigger the actual recovery.

However, if there are no drawbacks to recovering automatically, then maybe it would be better to just do it implicitly when corruption is detected.

@Stebalien (Member, Author)

I think the reason this recovery function is separate is so that one can warn the user that we're recovering (may take some time). We could just suggest ipfs repo fsck but, really, I prefer to make stuff like this automatic. The fact that you have to run fsck after a filesystem corruption has always bugged me. Modern filesystems like btrfs do this for you.

OT: Given your lockfile fixes, we can probably remove ipfs repo fsck at this point (although keeping it may be a good idea regardless "just because").

@djdv commented May 5, 2018

> The fact that you have to run fsck after a filesystem corruption has always bugged me. Modern filesystems like btrfs do this for you.

Agreed, sounds good to me. The only thing on my mind was convention, but I didn't stop to think about whether it was a good convention. I can't think of a situation where you would not want to recover corrupted data, but users should be made aware that corrupt data was detected.


> we can probably remove ipfs repo fsck at this point

I believe so. If all platforms store lock files in a temporary manner (like in memory), there shouldn't be a scenario where a lock file already exists without an instance running.

However, it doesn't hurt to keep it for portability's sake. Without it, we would have to test Lock() on a lot of other platforms. The current lock implementation uses the os package as a fallback, which should be consistent across platforms.
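For illustration, here is a rough sketch of the kind of os-package fallback lock being described. This is not the repo's actual Lock() implementation, and the "repo.lock" file name and tryLock helper are assumptions. Creation with O_EXCL fails if the lock file is already present, which is why a stale, leftover lock file would block a new instance.

```go
// Rough illustration of an os-package fallback lock; not the repo's actual
// lock code. The "repo.lock" name and tryLock helper are assumptions.
package example

import (
	"fmt"
	"os"
	"path/filepath"
)

// tryLock creates the lock file with O_EXCL, so it fails if another
// instance already holds (or left behind) the lock.
func tryLock(repoPath string) (release func() error, err error) {
	lockPath := filepath.Join(repoPath, "repo.lock")
	f, err := os.OpenFile(lockPath, os.O_CREATE|os.O_EXCL|os.O_WRONLY, 0644)
	if err != nil {
		if os.IsExist(err) {
			return nil, fmt.Errorf("repo at %s is already locked", repoPath)
		}
		return nil, err
	}
	return func() error {
		f.Close()                  // close the handle first
		return os.Remove(lockPath) // removing the file releases the lock
	}, nil
}
```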

@kevina (Contributor) commented May 5, 2018

> Should this be this automatic?

I would say only if the recovery is 100% safe and won't lead to data loss. If the damage is extensive and some data can't be recovered, then it should abort and let the user deal with the situation manually.

@Stebalien (Member, Author)

@kevina I would hope the recovery is safe. It looks like it just deletes and rebuilds the manifest.

@kevina (Contributor) commented May 6, 2018

Then that should be okay. I agree that we should automatically recover from stale lockfiles. My main concern is when the recovery process can make things worse; a good example is a filesystem fsck recovering deleted files, but that doesn't look like the case here.

@Stebalien requested a review from kevina June 3, 2018 03:36
@kevina (Contributor) left a comment

This seems simple and safe enough.

@Stebalien merged commit 0193241 into master Jun 3, 2018
@ghost removed the status/in-progress label Jun 3, 2018
@Stebalien deleted the feat/recover branch August 27, 2018 19:14