
Fix quadratic algorithm in merge_streams #314

Open
lars-t-hansen opened this issue Dec 14, 2023 · 4 comments
Labels
component:sonalyze sonalyze/* pri:low task:enhancement New feature or request

Comments

@lars-t-hansen
Collaborator

lars-t-hansen commented Dec 14, 2023

For the longer load reports, more than 50% of the running time is spent in merge_streams, though perf is not much help in pinpointing anything more specific than that.

Adding a little debug printing we find:

ml1.hpc.uio.no: MERGING 222 STREAMS, OUTER ITER 6936
ml8.hpc.uio.no: MERGING 9746 STREAMS, OUTER ITER 8619
ml4.hpc.uio.no: MERGING 1476 STREAMS, OUTER ITER 7461
ml9.hpc.uio.no: MERGING 61 STREAMS, OUTER ITER 8631
ml7.hpc.uio.no: MERGING 661 STREAMS, OUTER ITER 8632
ml6.hpc.uio.no: MERGING 42846 STREAMS, OUTER ITER 8631
ml3.hpc.uio.no: MERGING 629 STREAMS, OUTER ITER 8631
ml2.hpc.uio.no: MERGING 24 STREAMS, OUTER ITER 8631

That's more streams than I expected, but there is one stream per job and this is data for a full month, so in hindsight it makes sense.

The merging is a nested loop operating on the set of streams, each sorted by ascending time. The outer loop first finds the lowest timestamp among the timestamps at the heads of the streams, and then it collects and removes all records, across all the streams, with roughly that time. It repeats until all streams are empty. The number of iterations of the outer loop corresponds roughly to the number of time steps in the window of data we're looking at - 30 days with 5-minute intervals would yield exactly 8640 time steps.
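To make the shape of the problem concrete, here is a minimal sketch of the structure described above. The names and the fixed tolerance are hypothetical - the real merge_streams has much more involved selection logic - and streams are reduced to plain vectors of timestamps:

```rust
// Minimal sketch of the nested-loop merge (hypothetical names; the real
// code merges full sample records, not bare timestamps). Each stream is a
// Vec of timestamps sorted ascending; idx[i] is the read cursor into stream i.
fn merge_streams(streams: &[Vec<u64>]) -> Vec<Vec<u64>> {
    let mut idx = vec![0usize; streams.len()];
    let mut out = Vec::new();
    loop {
        // Outer loop, part 1: find the lowest head timestamp - an O(k) scan.
        let mut min_t = u64::MAX;
        for (i, s) in streams.iter().enumerate() {
            if idx[i] < s.len() {
                min_t = min_t.min(s[idx[i]]);
            }
        }
        if min_t == u64::MAX {
            break; // all streams exhausted
        }
        // Part 2: collect records from every stream whose head is "roughly"
        // min_t (here: within a fixed tolerance) and advance those cursors.
        const TOLERANCE: u64 = 150; // hypothetical, e.g. half a 5-minute step
        let mut group = Vec::new();
        for (i, s) in streams.iter().enumerate() {
            while idx[i] < s.len() && s[idx[i]] <= min_t + TOLERANCE {
                group.push(s[idx[i]]);
                idx[i] += 1;
            }
        }
        out.push(group);
    }
    out
}
```

Both parts of the outer loop touch all k streams, so the cost is O(t·k) over t time steps - and since the number of streams grows with the window size, effectively quadratic.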

The data structure is a vector of vectors for the streams, and a vector of ints for indices into those streams.

The first loop is therefore a linear search across all the vectors, but it could probably be replaced by some kind of priority queue - we're only looking for a single minimum value, yet we have to inspect the head of every stream.

The second loop is trickier because the selection logic is far from straightforward, but again, if the streams were kept sorted with the lowest leading timestamp first, we could probably pick off the streams at the head and ignore the rest instead of looping over everything. That said, sample times will be highly correlated, and in practice the second loop may pick up records from many or indeed all of the streams.
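The priority-queue idea for the first loop can be sketched with std::collections::BinaryHeap (names hypothetical; this shows only the min-extraction structure, not the grouping of roughly-equal timestamps, which is where the caveat above bites):

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

// Sketch: keep (head timestamp, stream index) pairs in a min-heap so that
// finding the lowest timestamp is O(log k) instead of an O(k) scan.
fn merge_with_heap(streams: &[Vec<u64>]) -> Vec<u64> {
    let mut idx = vec![0usize; streams.len()];
    // Reverse turns Rust's max-heap into a min-heap.
    let mut heap: BinaryHeap<Reverse<(u64, usize)>> = BinaryHeap::new();
    for (i, s) in streams.iter().enumerate() {
        if !s.is_empty() {
            heap.push(Reverse((s[0], i)));
        }
    }
    let mut out = Vec::new();
    while let Some(Reverse((t, i))) = heap.pop() {
        out.push(t);
        // Advance the cursor of the stream we popped from and re-insert
        // its new head, if any.
        idx[i] += 1;
        if idx[i] < streams[i].len() {
            heap.push(Reverse((streams[i][idx[i]], i)));
        }
    }
    out
}
```

Note that if each time step really does touch most streams, the heap pays O(log k) per record touched and the win over the linear scan shrinks accordingly.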

@lars-t-hansen lars-t-hansen added task:enhancement New feature or request component:sonalyze sonalyze/* labels Dec 14, 2023
@lars-t-hansen lars-t-hansen self-assigned this Dec 14, 2023
@lars-t-hansen lars-t-hansen mentioned this issue Dec 14, 2023
@lars-t-hansen
Collaborator Author

lars-t-hansen commented Dec 14, 2023

While there are micro-optimizations that apply in the inner loops of merge_streams, the essential insight is probably that there is usually a smallish number of jobs - really O(1) - active at any one time, and the trick to performance is to manage the set of sample streams so that we only look at the ones that are running: by looking at all streams at every time step we make the algorithm O(t^2), where t is the number of time steps. If we can concentrate on the live streams we can use the naive algorithms.

The set of streams can probably be divided into three: those in the past, those currently running, and those in the future. Streams can be moved from one set to another (this is awkward because of Rust's ownership rules and because these data structures are just Vec<Box>, not Vec<Rc>), and the scans would only process the running set.
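A sketch of the three-set idea, under simplifying assumptions (streams as sorted timestamp vectors, a fixed step size, hypothetical names): streams sorted by their first timestamp serve as the "future" set, a list of indices holds the "running" set, and exhausted streams simply drop out into the "past". Using indices sidesteps the ownership awkwardness of moving the streams themselves between sets:

```rust
// Sketch of the past/running/future partition. Only running streams are
// scanned at each time step; the future set is consumed front-to-back.
fn merge_partitioned(streams: &mut Vec<Vec<u64>>, step: u64) -> Vec<Vec<u64>> {
    // Sort so streams[next_future..] is the future set; empties sort last.
    streams.sort_by_key(|s| s.first().copied().unwrap_or(u64::MAX));
    let mut idx = vec![0usize; streams.len()];
    let mut next_future = 0;
    let mut active: Vec<usize> = Vec::new(); // indices of running streams
    let mut out = Vec::new();
    let mut t = match streams.first().and_then(|s| s.first()) {
        Some(&t0) => t0,
        None => return out, // no streams, or all empty
    };
    loop {
        // Move streams whose first timestamp has arrived into the running set.
        while next_future < streams.len()
            && streams[next_future].first().map_or(true, |&t0| t0 < t + step)
        {
            if !streams[next_future].is_empty() {
                active.push(next_future);
            }
            next_future += 1;
        }
        if active.is_empty() && next_future == streams.len() {
            break; // everything has moved into the past
        }
        // Collect this time step's records from running streams only.
        let mut group = Vec::new();
        active.retain(|&i| {
            while idx[i] < streams[i].len() && streams[i][idx[i]] < t + step {
                group.push(streams[i][idx[i]]);
                idx[i] += 1;
            }
            idx[i] < streams[i].len() // exhausted streams fall into the past
        });
        if !group.is_empty() {
            out.push(group);
        }
        t += step;
    }
    out
}
```

Per time step this touches only the running streams plus any newly activated ones, so if the number of concurrently live jobs really is O(1), the whole merge is roughly O(t + k log k) for the sort rather than O(t·k).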

All attempts at implementing this efficiently enough to beat the highly optimized naive loops have so far met with failure. I think in some cases the resulting code is so complicated that it confuses Rust's bounds check optimization, but that is really TBD.

I'll test the optimizations for the naive loops, which make a fair amount of difference on their own, and then land those and leave this bug open.

@lars-t-hansen
Collaborator Author

I merged the micro-optimizations. They give us about a 25% improvement on 30 days of data and 10% on 90 days (the big-Oh strikes back); the new output was bitwise identical to the old and there didn't seem to be any reason to hold this back.

@lars-t-hansen lars-t-hansen removed their assignment Dec 18, 2023
@lars-t-hansen
Collaborator Author

I think we'll back-burner what remains here until we have more data from Fox, Saga et al. We want to know what a typical input size is. It could well be that the number of streams on the ML systems is very high because these are systems without a batch queue.

@lars-t-hansen lars-t-hansen changed the title Speed up merge_streams Fix quadratic algorithm in merge_streams Dec 18, 2023
@lars-t-hansen
Collaborator Author

Finally, some credible data from Fox. Running a 90-day report remotely with the sonalyzed cache enabled takes about 3 minutes - quite a long time, and much longer than for the ML systems (about 20s for the same report, iirc). For Betzy this will not do. We could hack around it for Betzy by partitioning in various ways - entire nodes are allocated to jobs, so if a node is involved with one job at one time it is not involved with the others - but it would be better to fix the algorithm somehow.
