Improve ipfs add performance for large trees of files & directories #6523

Open · dirkmc opened this issue Jul 16, 2019 · 71 comments
Labels: kind/enhancement (a net-new feature or improvement to an existing feature), topic/meta

dirkmc (Contributor) commented Jul 16, 2019

Calling ipfs add on a directory with a large tree of sub-directories and files is slow. This use case is particularly important for file-system based package managers.

Background

  • IPFS deals with immutable blocks. Blocks are stored in the blockstore.
  • The UnixFS package breaks files up into chunks and converts them to IPLD objects.
  • The DAG Service stores IPLD objects in the blockstore.

The Mutable File System (MFS) is an abstraction that presents IPFS as a file system. For example consider the following directory structure with associated hashes:

animals         <Qm1234...>
  land          <Qm5678...>
    dogs.txt    <Qm9012...>
    cats.txt    <Qm3456...>
  ocean         <Qm7890...>
    fish.txt    <Qm4321...>

If the contents of fish.txt changes, the CID for fish.txt will also change. The link from ocean → fish will change, so the CID for ocean will change. The link from animals → ocean will change so the CID for animals will change. MFS manages those links and the propagation of changes up the tree.
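
To make the propagation concrete, here is a tiny sketch in Go, assuming the go-merkledag and go-unixfs helpers (dag.NodeWithData, unixfs.FilePBData, unixfs.FolderPBData). It only illustrates that a parent's CID is derived from its links; it is not part of the add code path:

```go
package main

import (
	"fmt"

	dag "github.com/ipfs/go-merkledag"
	unixfs "github.com/ipfs/go-unixfs"
)

func fileNode(contents string) *dag.ProtoNode {
	return dag.NodeWithData(unixfs.FilePBData([]byte(contents), uint64(len(contents))))
}

func main() {
	// ocean embeds a (name, CID) link to fish.txt ...
	fish := fileNode("original fish")
	ocean := dag.NodeWithData(unixfs.FolderPBData())
	_ = ocean.AddNodeLink("fish.txt", fish)
	fmt.Println("ocean: ", ocean.Cid())

	// ... so changing fish.txt changes its CID, which changes ocean's bytes
	// and therefore ocean's CID (and so on up to animals).
	fish2 := fileNode("new fish")
	ocean2 := dag.NodeWithData(unixfs.FolderPBData())
	_ = ocean2.AddNodeLink("fish.txt", fish2)
	fmt.Println("ocean':", ocean2.Cid())
}
```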

Algorithm

ipfs add uses the MFS package to add files and directories to the IPFS blockstore. To add a directory with a large tree of sub-directories and files:

  • Create an MFS root for the root directory (animals in the example above)
  • Recurse through the directory structure in depth-first fashion.
    For each directory:
    • Create a corresponding empty directory in MFS, e.g. animals/ocean.
      This adds the empty directory to the blockstore.
    • For each file in the directory, e.g. animals/ocean/fish.txt:
      • Read the file contents
      • Convert the contents into a chunked IPLD node
      • Add the IPLD node and all its chunks to the blockstore
      • Create the directory in MFS if it doesn't already exist (†), e.g. animals/ocean
      • Add the IPLD node representing the file to MFS at the correct path (e.g. animals/ocean/fish.txt)
        Note: this adds the IPLD node's root block to the blockstore again
  • Recurse through the MFS representation of the directory structure
    • For each directory, call directory.GetNode()
      Note that at this stage the links to files in the directories have been created, so the directory written here has a different CID than the empty directory created before the files were added. Calling directory.GetNode() (confusingly) writes the directory, with its links to files, to the blockstore.

(†) Although we've already created the directory, it's necessary to ensure it exists again before adding the file: after every 256k files are processed, MFS's internal directory cache is dereferenced to allow for golang garbage collection, so a previously created directory may no longer be cached. (A simplified sketch of this whole flow follows.)
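
For concreteness, here is a simplified sketch of this flow in Go, assuming the 2019-era go-mfs, go-unixfs and go-ipfs-chunker APIs (mfs.NewRoot, mfs.Mkdir, mfs.PutNode, importer.BuildDagFromReader). It is illustrative rather than the actual coreunix adder; GC pausing, progress reporting and the 256k-file cache flush are omitted:

```go
package main

import (
	"context"
	"os"
	"path/filepath"

	chunker "github.com/ipfs/go-ipfs-chunker"
	ipld "github.com/ipfs/go-ipld-format"
	mfs "github.com/ipfs/go-mfs"
	unixfs "github.com/ipfs/go-unixfs"
	importer "github.com/ipfs/go-unixfs/importer"
)

// addTree mirrors the steps above: create an MFS root, walk the tree depth-first
// creating directories and chunking files, then flush the MFS tree at the end.
func addTree(ctx context.Context, dserv ipld.DAGService, base string) (ipld.Node, error) {
	// Create an MFS root for the top-level directory.
	root, err := mfs.NewRoot(ctx, dserv, unixfs.EmptyDirNode(), nil)
	if err != nil {
		return nil, err
	}

	// Depth-first walk: directories are created (empty) in MFS, files are
	// chunked into an IPLD DAG and then linked in at their path.
	err = filepath.Walk(base, func(p string, info os.FileInfo, werr error) error {
		if werr != nil {
			return werr
		}
		rel, rerr := filepath.Rel(base, p)
		if rerr != nil || rel == "." {
			return rerr
		}
		rel = filepath.ToSlash(rel)
		if info.IsDir() {
			// Writes an empty directory node to the blockstore.
			return mfs.Mkdir(root, "/"+rel, mfs.MkdirOpts{Mkparents: true})
		}
		f, ferr := os.Open(p)
		if ferr != nil {
			return ferr
		}
		defer f.Close()
		nd, berr := importer.BuildDagFromReader(dserv, chunker.DefaultSplitter(f))
		if berr != nil {
			return berr
		}
		// Linking the file means its root block is written again when the
		// parent directory is flushed.
		return mfs.PutNode(root, "/"+rel, nd)
	})
	if err != nil {
		return nil, err
	}

	// Second recursion: flushing the root walks the cached directories and
	// writes each one (now containing its links) to the blockstore.
	return root.GetDirectory().GetNode()
}
```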

Areas for Improvement

  • The IPLD Node root for a file is added to the blockstore twice
    • When the file is converted to an IPLD node
    • When the file is added to MFS
    • Note: @Stebalien points out that this is mitigated by the fact that we check whether we already have a block before writing it (in the blockservice itself)
  • The MFS directory structure is kept in memory
  • Recursion over the directory structure happens twice
    • While reading the structure from input
    • While writing out the directories to the blockstore
  • The progress indicator pauses for a long period when it is almost at 100% while the directories are being written to the blockstore

Proposed Improvements

The above issues would be mitigated if we interacted directly with the UnixFS API instead of with the MFS API (a rough sketch follows the list below):

  • Recurse once over the directory structure
  • Add files as we go
  • Add directories once all their files have been added
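
A rough sketch of that single pass, talking to go-unixfs directly instead of MFS (uio.NewDirectory, importer.BuildDagFromReader and the DAGService are the assumed 2019-era APIs; this is illustrative, not the eventual implementation):

```go
package main

import (
	"context"
	"io/ioutil"
	"os"
	"path/filepath"

	chunker "github.com/ipfs/go-ipfs-chunker"
	ipld "github.com/ipfs/go-ipld-format"
	importer "github.com/ipfs/go-unixfs/importer"
	uio "github.com/ipfs/go-unixfs/io"
)

// addDirOnce recurses over the tree exactly once: files are added as they are
// encountered, and each directory node is written once, after all its children.
func addDirOnce(ctx context.Context, dserv ipld.DAGService, dirPath string) (ipld.Node, error) {
	dir := uio.NewDirectory(dserv)

	entries, err := ioutil.ReadDir(dirPath)
	if err != nil {
		return nil, err
	}
	for _, ent := range entries {
		childPath := filepath.Join(dirPath, ent.Name())
		var nd ipld.Node
		if ent.IsDir() {
			// Recurse first, so the child directory is only written once its
			// own contents are in place (no empty-directory writes).
			nd, err = addDirOnce(ctx, dserv, childPath)
		} else {
			f, ferr := os.Open(childPath)
			if ferr != nil {
				return nil, ferr
			}
			nd, err = importer.BuildDagFromReader(dserv, chunker.DefaultSplitter(f))
			f.Close()
		}
		if err != nil {
			return nil, err
		}
		if err = dir.AddChild(ctx, ent.Name(), nd); err != nil {
			return nil, err
		}
	}

	// Write this directory to the blockstore exactly once, links included.
	nd, err := dir.GetNode()
	if err != nil {
		return nil, err
	}
	return nd, dserv.Add(ctx, nd)
}
```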

Future Work

It has been noted that disk throughput and CPU usage are not close to maximum while adding large numbers of files. Future work should focus on analyzing these findings.

dirkmc added the kind/enhancement label on Jul 16, 2019

dirkmc (Contributor, Author) commented Jul 16, 2019

@Stebalien & @magik6k this analysis came from spelunking through the source code. Please let me know if there's anything there that doesn't sound right. In particular I'd like a second opinion on the theory (untested) that branches of the directory tree may be missed when writing the directories to the blockstore

Stebalien (Member) commented:

This is mostly correct.

The IPLD Node root for a file is added to the blockstore twice

This should be slightly mitigated by the fact that we check if we already have a block before writing it (in the blockservice itself).

The second recursion over the directory structure will therefore miss these branches and they will not be written to the blockstore

Not quite. We flush everything first. Unfortunately, this does mean that we end up storing a bunch of intermediate directories that we don't need.


For some context, we primarily use MFS to make it easy to pause for GC. However, honestly, that probably isn't worth it given the performance hit. I agree we should explore using unixfs directly.

dirkmc (Contributor, Author) commented Jul 16, 2019

We flush everything first

Do you mean when we call root.FlushMemFree()? It calls dir.Flush() but AFAICT that will only Flush the root dir (not any child dirs)

Stebalien (Member) commented:

It will recursively flush. Flush calls dir.GetNode() which calls dir.sync() which walks all the cached entries, calling GetNode on each.

Finally, it calls GetNode() on the unixfs directory itself then adds that to the DAGService.
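
To make the shape of that recursion concrete, a tiny sketch (the dirState type and its fields are made-up stand-ins for MFS's cached directory state, not the actual go-mfs implementation):

```go
package main

import (
	"context"

	ipld "github.com/ipfs/go-ipld-format"
)

// dirState is an illustrative stand-in for an MFS directory's in-memory state.
type dirState struct {
	children  []*dirState      // cached child directories
	buildNode func() ipld.Node // builds the unixfs node with the current links
}

// flush syncs the cached children first (GetNode on each), then writes this
// directory's own node to the DAGService - the same order described above.
func flush(ctx context.Context, dserv ipld.DAGService, d *dirState) (ipld.Node, error) {
	for _, child := range d.children {
		if _, err := flush(ctx, dserv, child); err != nil {
			return nil, err
		}
	}
	nd := d.buildNode()
	return nd, dserv.Add(ctx, nd) // the directory block is written here
}
```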

Stebalien (Member) commented:

(unless I'm missing something)

dirkmc (Contributor, Author) commented Jul 16, 2019

Ah yes you're right, thanks 👍

I'll remove that section from the original issue above.

dirkmc (Contributor, Author) commented Jul 16, 2019

@Stebalien given this analysis, my uneducated guess is that changing ipfs add to use the UnixFS API directly is going to give us a moderate performance improvement, maybe on the order of 10% - 20%. Would your guess fall somewhere in that range?

magik6k (Member) commented Jul 16, 2019

We should measure how often the datastore is called when adding content with different layouts (a counting wrapper like the sketch below would do).

  • Count Put/Get/Has when adding
    • Many tiny files (ideally above the 256k threshold, but that may not be practical)
    • 1000 100kb files
      • in lots of directories
    • 100 1mb files
    • 10 10mb files
    • 1 100mb file
    • (some variations based on directory depth)

My bet would be that for large directories / lots of directories, Has calls will create a significant overhead.
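
One way to get those counts, as a hedged sketch: wrap the repo's datastore in a counter before running the add. The signatures assume the 2019-era go-datastore interface (no context argument); the embedded Datastore provides the remaining methods unchanged:

```go
package main

import (
	"sync/atomic"

	datastore "github.com/ipfs/go-datastore"
)

// countingDatastore counts Get/Has/Put calls made against the wrapped datastore.
type countingDatastore struct {
	datastore.Datastore
	gets, has, puts int64
}

func (c *countingDatastore) Get(k datastore.Key) ([]byte, error) {
	atomic.AddInt64(&c.gets, 1)
	return c.Datastore.Get(k)
}

func (c *countingDatastore) Has(k datastore.Key) (bool, error) {
	atomic.AddInt64(&c.has, 1)
	return c.Datastore.Has(k)
}

func (c *countingDatastore) Put(k datastore.Key, v []byte) error {
	atomic.AddInt64(&c.puts, 1)
	return c.Datastore.Put(k, v)
}
```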

magik6k (Member) commented Jul 16, 2019

Another note on using unixfs directly: it would make it easier to add small files in parallel. Each call to addFile could try to buffer, say, 100kb of data, and if an entire file fits in that buffer, add it asynchronously (which, when adding lots of small files, should help with waiting on IO). A rough sketch is below.
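
A rough, hedged sketch of that idea (addFile, the 100kb limit, and the worker channel are illustrative, not existing go-ipfs APIs; the importer/chunker calls are the assumed go-unixfs ones):

```go
package main

import (
	"bytes"
	"io"
	"sync"

	chunker "github.com/ipfs/go-ipfs-chunker"
	ipld "github.com/ipfs/go-ipld-format"
	importer "github.com/ipfs/go-unixfs/importer"
)

const smallFileLimit = 100 << 10 // ~100kb, per the suggestion above

// addFile buffers up to smallFileLimit bytes; if the whole file fits, the add
// is handed to a bounded set of workers so small files overlap their IO.
// The workers channel doubles as a semaphore; errs collects async failures.
func addFile(dserv ipld.DAGService, r io.Reader, workers chan struct{}, wg *sync.WaitGroup, errs chan<- error) error {
	buf := make([]byte, smallFileLimit+1)
	n, err := io.ReadFull(r, buf)
	if err != nil && err != io.EOF && err != io.ErrUnexpectedEOF {
		return err
	}

	if n <= smallFileLimit {
		wg.Add(1)
		workers <- struct{}{} // blocks when all workers are busy (backpressure)
		go func(data []byte) {
			defer wg.Done()
			defer func() { <-workers }()
			if _, aerr := importer.BuildDagFromReader(dserv, chunker.DefaultSplitter(bytes.NewReader(data))); aerr != nil {
				errs <- aerr
			}
		}(buf[:n])
		return nil
	}

	// Large file: fall back to the synchronous path, replaying the bytes we
	// already consumed ahead of the rest of the reader.
	_, err = importer.BuildDagFromReader(dserv,
		chunker.DefaultSplitter(io.MultiReader(bytes.NewReader(buf[:n]), r)))
	return err
}
```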

dirkmc (Contributor, Author) commented Jul 16, 2019

I was thinking along those lines - maybe a queue for incoming files and a queue for outgoing writes, with backpressure on each.

dirkmc (Contributor, Author) commented Jul 16, 2019

we primarily use MFS to make it easy to pause for GC

@Stebalien can we just add the current root for any in-progress ipfs add to bestEffortRoots?

Stebalien (Member) commented:

Probably? We really do need some kind of internal in-memory pinning system (which shouldn't be too hard). The primary issue is that the importer would need to be GC aware so it can pause.

dirkmc (Contributor, Author) commented Jul 16, 2019

We can just do something similar to what ipfs add does now, right? I.e. check periodically whether GC has been requested, and if so, create a temporary fake tree and pin the root.

Stebalien (Member) commented:

Yeah, you're right.

Kubuxu (Member) commented Jul 17, 2019

If there is a significant Has performance hit, the bloom filter cache in blockstore will help a lot.

magik6k (Member) commented Jul 17, 2019

In most cases where we call Has while adding, the thing is already in the blockstore (likely in all cases), so Bloom filter won't really help here since we still need to check in the underlying blockstore.

(hack / PoC idea: check whether Has always returns true when adding; if that's the case, create a blockstore wrapper which replaces Has with a hardcoded return true (sketched below). If that works we can quickly measure the overhead of all the Has calls, and see if we're fixing the right thing)
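
A sketch of that wrapper (the type name is made up; signatures assume the 2019-era go-ipfs-blockstore interface, without a context argument):

```go
package main

import (
	cid "github.com/ipfs/go-cid"
	blockstore "github.com/ipfs/go-ipfs-blockstore"
)

// alwaysHasBlockstore delegates everything to the wrapped blockstore but
// hardcodes Has to true, so the cost of the real Has lookups can be measured
// by their absence.
type alwaysHasBlockstore struct {
	blockstore.Blockstore
}

func (b *alwaysHasBlockstore) Has(c cid.Cid) (bool, error) {
	return true, nil // pretend every block is already present
}
```

As the next comment points out, the blockservice's own Has check would then skip the actual writes, so for a real measurement the check would have to be disabled in the blockservice instead.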

Kubuxu (Member) commented Jul 17, 2019

Has is called before adding a block to the blockstore (always). If you always return true, it won't add the files to the datastore.

magik6k (Member) commented Jul 18, 2019

Yeah, that happens in blockservice, and it's easy to disable that check there (or just override Has in blockservice)

(https://github.com/ipfs/go-blockservice/blob/master/blockservice.go#L141, https://github.com/ipfs/go-blockservice/blob/master/blockservice.go#L173)

dirkmc (Contributor, Author) commented Jul 18, 2019

These data were generated with this script running on my laptop. It creates count x <file size> files of random data in a single directory (in memory), creates an Adder with an in-memory datastore and calls AddAllAndPin() on the directory.

       Name          Has      Get      Put   Objects      In-Memory         Badger
 1 x      100M:     1230        3      411         2    281.674222ms     301.512398ms
 10 x      10M:     1248       12      417        11    250.425109ms     393.516821ms
 100 x      1M:     1518      102      507       101    260.626669ms     404.573155ms
 1024 x   100k:     3090     1026     1031      1025    375.101811ms     647.932807ms
 10240 x   10k:    30738    10242    10247     10241    1.314768727s     3.186459614s
 20480 x    5k:    61458    20482    20487     20481    3.366398431s     6.938569625s
 51200 x    2k:   153618    51202    51207     51201    27.019718374s    36.833479179s
 68266 x  1.5k:   204816    68268    68273     68267    1m2.91561757s    1m38.99733823s
 81920 x 1.25k:   245778    81922    81927     81921    2m5.529349576s   2m52.524424918s
 102400 x   1k:   307218   102402   102407    102401    4m6.188296442s          †

Edit: I added the timings for Badger DB
(†) The final Badger test timed out after 10 minutes

Objects is the number of Objects output to the response stream.

Kubuxu (Member) commented Jul 18, 2019

We really need tracing in those code paths; that would let us know the precise cause.

It might be worth comparing that with badger.

dirkmc (Contributor, Author) commented Jul 18, 2019

@Kubuxu good call 👍

I'm continuing to work on generating some useful profiling information; I just wanted to get this out as a start, since @magik6k had some questions about how many calls we're making to Has/Get/Put.

dirkmc (Contributor, Author) commented Jul 18, 2019

Also, please let me know if there are other things you'd like to see stats on; you're more familiar with where the bottlenecks are most likely to be.

Kubuxu (Member) commented Jul 18, 2019

You might also want to get stats from localhost:5001/debug/metrics/prometheus; it has stats on how much total time was spent in the datastore for a given call.

dirkmc (Contributor, Author) commented Jul 18, 2019

I added the timings for Badger DB to the table in my comment above

Kubuxu (Member) commented Jul 18, 2019

Very interesting.

dirkmc (Contributor, Author) commented Jul 19, 2019

I wanted to test the impact of adding many files in a directory tree with a varying branch out factor.
For example 16 files in a directory tree with

  • a branch out factor of 4 would have 2 levels of directories
  • a branch out factor of 2 would have 4 levels of directories
    (all the files are at the leaves of the directory tree)

It doesn't seem to have a big performance impact.

In these graphs the branch out factor (items / directory) is on the x-axis and duration is on the y-axis, where each line represents a <total number of files> x <file size>:

(screenshot of graphs omitted)

These graphs show Has / Get / Put / Object counts for 68,266 x 1.5k files and for 102,400 x 1k files, for several different branch-out factors (x-axis)

(screenshot of graphs omitted)

(Spreadsheet / Code)

dirkmc (Contributor, Author) commented Jul 21, 2019

I ran some simulations for varying numbers of files of size 1k, with a few different branch-out factors. Again the branch-out factor doesn't appear to make a difference, and add time appears to grow polynomially with the number of files (Spreadsheet - see Sheet 2).

(screenshot of graph omitted)

warpfork (Member) commented:

Are there any good comparisons we could make with things that are non-ipfs, to get some other baselines to contrast things with?

I'm loving these graphs, and they're super awesome for helping us see some things about our own code paths, but after ogling for a while, I realized the number of zeros on most of the axes are hard for my brain to parse -- are these comparable to how fast a plain filesystem would flush? If there's a constant factor difference, that's useful to know; the rough order of mag of that constant factor would also be useful to know.

(plain filesystem flush might not be a fair comparison -- we're hashing, etc, And That's Different; maybe 'git add' or operations like that would be better? There's also a pure golang git implementation out there, which could be amusing if we want to avoid comparing directly to one of the most hyper-optimized c things on the globe.)

aschmahmann (Contributor) commented Oct 23, 2019

Did some analysis here and I'm having trouble reproducing a slow ipfs add.

Definition: A slow ipfs add is one where recursively adding a directory D takes more than 50% longer than directly copying D (e.g. cp -r, dd, robocopy, etc.). In practice I've rarely seen anything more than 10-20%, but 50% should protect us from measurement variance.

Test setup: I've been running the tests on an 8TB HDD with the benchmark below, on Windows 10:

CrystalDiskMark 6.0.2 x64 (C) 2007-2018 hiyohiyo
                          Crystal Dew World : https://crystalmark.info/
-----------------------------------------------------------------------
* MB/s = 1,000,000 bytes/s [SATA/600 = 600,000,000 bytes/s]
* KB = 1000 bytes, KiB = 1024 bytes

   Sequential Read (Q= 32,T= 1) :   252.674 MB/s
  Sequential Write (Q= 32,T= 1) :   252.276 MB/s
  Random Read 4KiB (Q=  8,T= 8) :     1.329 MB/s [    324.5 IOPS]
 Random Write 4KiB (Q=  8,T= 8) :     1.219 MB/s [    297.6 IOPS]
  Random Read 4KiB (Q= 32,T= 1) :     1.287 MB/s [    314.2 IOPS]
 Random Write 4KiB (Q= 32,T= 1) :     1.142 MB/s [    278.8 IOPS]
  Random Read 4KiB (Q=  1,T= 1) :     0.756 MB/s [    184.6 IOPS]
 Random Write 4KiB (Q=  1,T= 1) :     1.055 MB/s [    257.6 IOPS]

  Test : 1024 MiB [E: 0.0% (0.3/7452.0 GiB)] (x3)  [Interval=5 sec]
  Date : 2019/10/07 15:20:04
    OS : Windows 10 Professional [10.0 Build 18362] (x64)

Test 1: Add arch repo to go-ipfs 0.4.22 and also copy paste the repo

I used WSL to rsync the repo, but because of WSL weirdness the directory symlinks are annoying to work with so I only interacted with the pool directory (which holds 80% of the data as shown in ipfs-inactive/package-managers#79).

Note: robocopy is generally considered a pretty fast Windows copy-paste utility, but if someone has a better alternative I'm open to it.

Folder Size: 38.6 GB

Powershell:

ipfs init --profile=badgerds --empty-repo
ipfs pin rm QmUNLLsPACCz1vLxQVkXqqLX5R1X345qqfHbsf67hvA3Nn #removes only pin
ipfs repo gc

Measure-Command{ipfs22 add -r --offline --silent .\addtest\arch\arch\pool\};Measure-Command {robocopy /MIR /NFL /NDL .\addtest\arch\arch\pool\ .\testarch\} 

Results (minutes:seconds):
ipfs add : 08:48
robocopy: 08:33

Test 2: Add extrapolated arch repo to go-ipfs 0.4.22 and also copy paste the repo

I tried running @dirkmc's tests above and got similar results to his on both the add and the copy-paste. I then modified the tests to more precisely follow the distribution from ipfs-inactive/package-managers#79. I unfortunately neglected to include the functionality allowing file sizes that are not powers of 2. I doubt that will really change the results, but I will post an update after I re-run the code.

Folder parameters:
Max directory width : 32
Number of files : 256k
Resulting directory size : 319GB

Powershell:

ipfs init --profile=badgerds --empty-repo
ipfs pin rm QmUNLLsPACCz1vLxQVkXqqLX5R1X345qqfHbsf67hvA3Nn #removes only pin
ipfs repo gc

Measure-Command{ipfs22 add -r --offline --silent .\addtest\256k\};Measure-Command {robocopy /MIR /NFL /NDL .\addtest\256k\ .\test256k\}

Results (hours:minutes:seconds):
ipfs add : 01:28:02
robocopy: 01:23:31

Update: Ran the code again with some variance in file sizes (for each file of size 2^i, a size was chosen uniformly from the range 2^i ± 2^(i-1)). For a 312GB directory the results were:

ipfs add : 01:20:24
robocopy : 01:11:10

Kubuxu (Member) commented Oct 23, 2019

@aschmahmann Windows is well known for its slow filesystem operations. I would suggest trying on Linux.

aschmahmann (Contributor) commented:

@Kubuxu while it's possible that Windows is simply not hitting issues that OS X/Linux do, either because its filesystem is slow enough to mask them or because there's some bug in OS X/Linux not present in Windows, I have yet to see any evidence that this is a problem on OS X/Linux either.

Additionally, the arch test is basically the same as the one @dirkmc and @andrew ran and runs in a similar amount of time. So at least at the 40 GB range the speed differences don't appear to be super noticeable.

dirkmc (Contributor, Author) commented Oct 24, 2019

@aschmahmann could you try replicating this walk-through that Andrew wrote up: ipfs-inactive/package-managers#18

aschmahmann (Contributor) commented Oct 24, 2019

@dirkmc I can try, but it's going to be a little rough depending on how accurate I want the replication to be. As I alluded to above, I rsynced the Ubuntu repo (1.2 TB) in WSL, which took quite a long time, but the WSL links won't resolve in Windows. I'm going to see if there's a way I can do one of:

A) Turn the WSL links to Windows links and use IPFS on Windows
B) Try to get WSL data into Linux and run the whole thing in Linux
C) Just use IPFS in WSL, but after doing some performance checks blocked by #4242 (although maybe WSL2 would work)

aschmahmann (Contributor) commented:

Test 3: Add entire ubuntu repo to go-ipfs 0.4.22 and also copy paste the repo

I deleted the WSL links and rsync'd as admin and got normal Windows links! (Windows normally requires admin to create symlinks; this shouldn't be necessary in Windows 10 developer mode, but it was required nonetheless.) So, with my now Windows-compatible WSL-rsync'd ubuntu repo...

Folder parameters:
Number of files : 958,560
Number of directories: 59,057
Resulting directory size : 1.24TB

Powershell:

ipfs init --profile=badgerds --empty-repo
ipfs pin rm QmUNLLsPACCz1vLxQVkXqqLX5R1X345qqfHbsf67hvA3Nn #removes only pin
ipfs repo gc

Measure-Command{ipfs22 add -r --offline --silent .\ubuntu\};Measure-Command {robocopy /MIR /NFL /NDL .\ubuntu\ .\testubuntu}

Results (hours:minutes:seconds):
ipfs add : 05:34:06
robocopy: (will test tomorrow, but expected to have similar performance since IPFS Add is running at the same ~60MB/s rate as it and robocopy did in the other tests)

Looks like ipfs add time is pretty comparable to copy-paste. While we might be able to do better (e.g. since we're using a database and don't need many small files), that seems out of scope: as long as add performance is reasonable, what we really need to work on is update performance (which @djdv is kindly working on via his ipfs mount implementation work).

Something I noticed towards the very end of the test, when I came back to my machine, was that my RAM was maxed out (I have only 16GB), I was hitting many hard faults, and disk read speeds were very low. I did some brief looking at the RAM with VMMap and it looks like a lot of it is from Badger (1.3 TB of mapped files, but also 10GB of RAM allocated). It looks like we're probably using more RAM than we should, but not necessarily by a huge margin. For those of you more experienced than me at looking through these RAM allocation things, I've attached a dump I took towards the end of the program running: ipfs22.txt.

Given that this issue was filed because of the ubuntu add experiment being slow and that I'm having trouble replicating it on reasonable hardware I recommend we pause this issue for now until a user comes back to us with a data set that they're having trouble importing.

Note, I haven't tested this on Linux or OS X yet, but as @Kubuxu mentioned Windows is supposed to be the slow one and it's doing just fine 😄. If you feel like testing on another OS is important here, please post away and we can discuss.

@Stebalien @dirkmc thoughts?

Stebalien (Member) commented:

Two possible reasons:

  1. We've updated badger since those tests were run.
  2. "Syncing" may behave differently. I wonder if badger is unsafe on windows? Note: if this turns out to be the reason, we may be able to achieve the same thing by using one massive transaction for the entire add process on linux. That should prevent us from writing blocks synchronously, as far as I know.

Note: WRT linux/windows filesystem operations, badger shouldn't hit the windows slowness issues as it doesn't perform tons of filesystem operations. Instead, it just writes to mmapped files.

dirkmc (Contributor, Author) commented Oct 25, 2019

It would be great if it's performing well enough that we can reprioritize other projects above this one; I'm glad you did this research.

It would be good to understand why the use case Andrew was looking at was so slow. It looks like he was doing this on Ubuntu, so maybe there's an issue on Linux that doesn't manifest on Windows? It's also possible that the hardware he was using wasn't up to the task, perhaps because it was severely memory limited or had a very slow disk.

aschmahmann (Contributor) commented Oct 25, 2019

It looks from ipfs-inactive/package-managers#18 like his machine had plenty of RAM, 32GB (that was my initial thought too). No idea what the drive performance was, though; it was a cloud box, but the data wasn't readily visible when I went to the website.

If I have time I will try to run this on Ubuntu over the weekend (just got to get my machine set up). I'll probably use NTFS drivers instead of ext4 to avoid another 6hr file copy, though (unless we think ext4 is potentially the problem).

magik6k (Member) commented Oct 26, 2019

Note that NTFS on Linux has very different performance characteristics than ext4. From the datastore benchmarking dataset (done on a c5d instance on AWS with an NVMe SSD; not the greatest/cleanest dataset, but it gives the idea): https://ipfs.io/ipfs/QmPjUTbqYAsHfeuSsqZtPLZueH4r9oPnyj1kKckca9XANJ

Badger: (graph omitted)
FlatFs: (graph omitted)
LevelDB: (graph omitted)

Also some notes:

  • In my experience Windows is really slow when it comes to small IOs.
    • WSL being way slower than that.
  • If we are testing adding small files, those tests should happen on fast SSDs, which is where the problem is most visible; testing on HDDs with 1000x fewer IOPS than a decent SSD won't reveal any problems.

aschmahmann (Contributor) commented:

@magik6k

If we are testing adding small files, those tests should happen on fast SSDs, which is where the problem is most visible, testing on HDDs with 1000x less IOPS than a decent SSD won't reveal any problems

If the problem only exists on SSDs then, according to @Stebalien, it's fine to deprioritize this issue, since if we're really focusing on 1TB+ package mirrors they will likely be stored on HDDs (the issue that spun this off, ipfs-inactive/package-managers#18, also used HDDs).

Unless I'm misreading your first graph (add performance on badger) the results are confusing. It looks like NTFS is by far the fastest and for some reason ext4 performance is by far the worst. Is this accurate? If so then perhaps we're dealing with badger+ext4 issues.

magik6k (Member) commented Oct 27, 2019

It looks like NTFS is by far the fastest and for some reason ext4 performance is by far the worst

That's the correct observation, though it may indicate some issues with how the NTFS implementation on Linux handles sync writes (given that it's faster than everything else too).

If the problem only exists on SSDs then according to @Stebalien it's fine to deprioritize this issue, since if we're really focusing on 1TB+ package mirrors they will likely be stored on HDDs

That's fair, and I agree with this decision, but we should keep in mind that there are scenarios where we can still get better.

aschmahmann (Contributor) commented Oct 28, 2019

Test 1-ext4: Add arch repo to go-ipfs 0.4.22 and also copy paste the repo

I rsync'd the data from the NTFS partition to a blank ext4 partition on the same drive. Then tried ipfs add with Badger, and cp -a. This is running Ubuntu 19.10 on the same machine as the other tests.

Powershell:

ipfs init --profile=badgerds --empty-repo
ipfs pin rm QmUNLLsPACCz1vLxQVkXqqLX5R1X345qqfHbsf67hvA3Nn #removes only pin
ipfs repo gc

Measure-Command {ipfs22 add -r --offline --silent ./arch/}
Measure-Command {cp -a ./arch/ ./testarch/}

Results (hours:minutes:seconds):
ipfs add : 0:24:16
ipfs add with sync off: 0:11:00
ipfs add with sync + journaling off: 0:10:16
ipfs add using flatfs: 1:24:57
cp -a: 0:8:12

This seems to match our expectation of ext4+badger being very slow, and not @dirkmc's and @andrew's results. @dirkmc any idea if the partitions were ext4 formatted and if they were running on HDDs or SSDs?

Note: Given how poorly ext4+badger performed compared to NTFS in @magik6k's benchmark, we might have expected an even larger gap than just 3x. @magik6k were the NTFS results collected on Linux or Windows? Also, of course, those benchmarks were done on SSDs (which badger is optimized for) instead of HDDs, so they may not translate super well. Do we have HDD benchmarks or the code to generate them?

Addendum: I've tried a number of different tests to poke at this (flatfs+ext4 is even slower than badger despite the benchmarks; copy-paste to NTFS is fast, but adding to NTFS from Linux is slow; etc.). It looks like this may have to do with Windows vs Linux, where ipfs+badger+Windows is fine but ipfs+badger+Linux is slow. Not yet sure if it's us using badger improperly or a badger issue. A big 👍 for developers using multiple OSes; otherwise we probably wouldn't learn things like this.

dirkmc (Contributor, Author) commented Oct 28, 2019

Agreed, these are really useful findings, thanks Adin.

My tests were all done on my mac with an SSD and plenty of RAM. I'm not sure about the characteristics of the machine Andrew was using originally.

magik6k (Member) commented Oct 28, 2019

were the NTFS results collected on Linux or Windows?

That was on one of the Debian versions AWS provided at the time, not sure exactly which one

aschmahmann (Contributor) commented Oct 29, 2019

@dirkmc @magik6k thanks for the info.

I ran more tests and Ubuntu 19.10+badger+ext4 performs much much better if you turn off syncwrites in badger (doable as an IPFS config option), and performance continues to increase if you turn off ext4 journaling (I updated results in the post above).

Test 3-ext4: Add entire ubuntu repo to go-ipfs 0.4.22

I then went and ran the ubuntu repo test after turning off sync writes and ext4 journaling. For those interested in turning off journaling I followed the slightly wrong instructions at https://foxutech.com/how-to-disable-enable-journaling/. Instructions for posterity:

MyExt4MountPt = /media/me/myextpartiton
MyExt4Partition = /dev/sda3
#umount MyExt4MountPt 
#tune2fs -O ^has_journal MyExt4Partition
#e2fsck -f MyExt4Partition #optional health check, bc why not

reboot

#dmesg | grep EXT4 #this should show the MyExt4Partition has journaling disabled

Powershell:

ipfs init --profile=badgerds --empty-repo
ipfs pin rm QmUNLLsPACCz1vLxQVkXqqLX5R1X345qqfHbsf67hvA3Nn #removes only pin
ipfs repo gc
#manually edited IPFS config file to use "syncWrites" : false, could probably have used the cli

Measure-Command{ipfs22 add -r --offline --silent .\ubuntu\}

Results (hours:minutes:seconds):
ipfs add : 05:51:34

This super snazzy result (plus some profiling I did with pprof - thanks @dirkmc for the easy setup) indicates that the problem is that Linux+ext4+badger is really bad with sync writes turned on. There are a number of ways we can deal with this going forward.

Two main camps going forward are probably:

  1. Figure out why this is happening: is it Linux or ext4 (or the combination) that's causing the badger problem? Do they have a bug, or is it intrinsic to how Linux and/or ext4 work? This issue seems to be what we're running into: "Why on linux slower than on win10?" dgraph-io/badger#1084
  2. Not worry about why it happens, and just make our transactions better so that we effectively emulate syncWrites = false when doing ipfs add (a rough sketch is below).

My plan is to go for option 2 as it is the fastest way to get results. Have also posted on the badger issue and maybe they'll have some advice and/or a fix.
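
As a hedged sketch of what option 2 could look like: group many block writes into one datastore batch/transaction and commit periodically, so there is one sync point per group rather than per block. This uses go-datastore's Batching interface with its pre-context signatures and is illustrative only, not the change that was actually made:

```go
package main

import (
	datastore "github.com/ipfs/go-datastore"
)

// putBlocksBatched writes pre-encoded blocks through a datastore batch,
// committing every flushEvery puts so writes are synced in groups rather
// than one at a time.
func putBlocksBatched(ds datastore.Batching, blocks map[datastore.Key][]byte, flushEvery int) error {
	b, err := ds.Batch()
	if err != nil {
		return err
	}
	n := 0
	for k, v := range blocks {
		if perr := b.Put(k, v); perr != nil {
			return perr
		}
		n++
		if n%flushEvery == 0 {
			if cerr := b.Commit(); cerr != nil {
				return cerr
			}
			if b, err = ds.Batch(); err != nil {
				return err
			}
		}
	}
	return b.Commit() // final partial batch
}
```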

dirkmc (Contributor, Author) commented Oct 29, 2019

Amazing ❤️

momack2 (Contributor) commented Apr 15, 2020

@aschmahmann - I created a placeholder graph here for us to organize this information. Can you help add in the right data for 0.5 vs 0.4.23 using badger vs flat fs on linux?

phillmac commented Aug 7, 2020

Just my 2¢: is it possible to include benchmarks for ext4, btrfs, and optionally NTFS as the underlying filesystem? I'm in the position of deploying some VMs dedicated purely to IPFS-related tasks, and having a recommended filesystem type for maximum performance seems appropriate and useful imho.
Edit: or, if a recommendation isn't a good idea, at least a comparison so that an informed choice can be made.

Stebalien (Member) commented:

Unfortunately, running these benchmarks is a manual process so we're not going to be able to re-run them on all filesystems. However, if you want to try a multi-filesystem benchmark, I'd be interested in the results.

(in general, I'd recommend against NTFS).

alexlatif commented:

Has this been abandoned?
