Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make ipfs filestore verify process files sequentially #3922

Closed
dokterbob opened this issue May 13, 2017 · 9 comments
Closed

Make ipfs filestore verify process files sequentially #3922

dokterbob opened this issue May 13, 2017 · 9 comments
Assignees

Comments

@dokterbob
Copy link
Contributor

Version information:

go-ipfs version: 0.4.8-
Repo version: 5
System version: amd64/darwin
Golang version: go1.8

Type: Enhancement

Severity: Medium

Description:

Currently, ipfs filestore verify verifies blocks ordered by hash. This causes an immense amount of unnecessary seeks which, especially on devices with rotating disks, makes verification much slower than it could be and unncessarily stresses system resources as well as causing wear.

If, instead, blocks would be orderd by file (and order therein) this would cause sequential reads which ought to be much faster.

Having no knowledge of the underlying implementation, the question would be: what's necessary to implement this? Are there any architectural features limiting the implementation of this? Is it realistic to do a full sort before the verify procedure and, if not, what could be alternatives (i.e. partial sorting, multiple sorting queues etc)?

@kevina
Copy link
Contributor

kevina commented May 13, 2017

The easiest way to implement this would be to read all the hashes, then do a full sort based on the filename and offset. It will be memory intensive but could be implemented as an option.

@dokterbob
Copy link
Contributor Author

Good idea to make it optional. Might it not be easier/better to do the sorting on the datastore side, though?

@kevina kevina self-assigned this May 18, 2017
@kevina
Copy link
Contributor

kevina commented May 18, 2017

Sorting on the datastore side would effectively mean sorting based on the value of the key, so there isn't any real benefit there.

@whyrusleeping should I just go ahead and implement a sort?

@whyrusleeping
Copy link
Member

@kevina Hrm... I think theres something here that we can do better. Given the root hash of a file stored in the filestore, we could do a graph traversal to iterate over the blocks on disk sequentially. That seems like the right way to approach this, but now we need a way to get those hashes. This could be done by a scrape of all the hashes in the repo, checking if each is a unixfs file, and then for each unixfs file we find, check if it contains at least one filestore node.

I'm not sure if that would be more efficient than enumerating and sorting everything in the filestore, but thats my thoughts on the matter

@dokterbob
Copy link
Contributor Author

dokterbob commented May 18, 2017 via email

@kevina
Copy link
Contributor

kevina commented May 19, 2017

@whyrusleeping yes I considered that but rejected it.

The first problem is that we don't store the roots so finding them will be rather expensive. We can't tell by just the keys so we have to read the value from the flatfs datastore to determine if it is a unixfs file and then have to check if the children are in the filestore: until raw leaves becomes very common, this could be very expensive. There is also the problem @dokterbob pointed out that the performance will depend on what is stored outside the filestore.

But even if we have a list of roots stored somewhere there is another problem. This second problem is that to do it the way you suggested we could be checking the same key twice, to avoid that we need to keep some sort of data structure in memory of the keys we visited, this negates much of the memory improvident of doing a graph traversal.

This it is my opinion simply scanning the entire filestore, storing the path and key in a vector or list and then sorting that list based on the path would be the best way to handle it with our current implementation.

@whyrusleeping
Copy link
Member

Alright, you guys have convinced me. Sorting that way seems good

@kevina
Copy link
Contributor

kevina commented May 23, 2017

@dokterbob could you try out #3938 and let me know if it works okay for you. the usage would be "ipfs filestore verify --file-order". I image this will only make a difference on devices with rotating disks on SDD it might be slower as seeking isn't an issue.

@dokterbob
Copy link
Contributor Author

Thanks guys! 🥇

@NiKiZe NiKiZe mentioned this issue Jul 26, 2018
2 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants