Long-period naicreport will not scale to fram or betzy (or maybe even saga) #620

Open
lars-t-hansen opened this issue Oct 18, 2024 · 3 comments
Labels: component:sonalyze, pri:high, task:bug, task:enhancement

Comments

@lars-t-hansen (Collaborator)

Already for Fox, the scripts that compute the 3-month load are quite expensive: they pull in a lot of data, and the computation of system load from sonar data is superlinear (#314). Though we may be able to make the algorithm something closer to linear, the underlying strategy of recomputing everything from sample data every time we need it is not sensible in the long term. The issue is really a consequence of a very flexible query model, in which we can ask for the load incurred by arbitrary users on arbitrary hosts over an arbitrary time span. This is powerful and we shouldn't abandon it, but it need not be the main engine for simpler day-to-day tasks where we compute the same result every day.

There is the possibility of caching more computed results (#506), but we could also change the strategy for how we handle load data for these reports (also mentioned in #517). It's hard to say how urgent this is. For the time being, we'd best not implement load reports for the big iron for more than the last week.

@lars-t-hansen added the task:bug, task:enhancement, and component:sonalyze labels on Oct 18, 2024
@lars-t-hansen (Collaborator, Author) commented Oct 24, 2024

The main culprit is simply this:

sonalyze load \
  -fmt=json,datetime,cpu,mem,gpu,gpumem,rcpu,rmem,rres,rgpu,rgpumem,gpus,host \
  -from 90d -daily \
  ... 

@lars-t-hansen (Collaborator, Author)

The profiling strategy is:

  • instrument the code so that each remote command creates a new profile (a sketch of one way to do this follows the list)
  • bring up an instrumented server on the normal data store, with a generous memory allowance
  • run the above query twice, once to warm up the cache and a second time to get a useful profile
  • if the useful profile shows a lot of I/O, increase the cache size and repeat
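
For the first bullet, a minimal Go sketch of per-command CPU profiling with runtime/pprof might look as follows; SONALYZE_PROFILE_DIR and withCommandProfile are illustrative names, not taken from the sonalyze sources:

package main

import (
    "fmt"
    "os"
    "path/filepath"
    "runtime/pprof"
    "sync/atomic"
    "time"
)

// profileDir and seq are illustrative; the real instrumentation may be wired differently.
var (
    profileDir = os.Getenv("SONALYZE_PROFILE_DIR")
    seq        atomic.Int64
)

// withCommandProfile runs cmd while writing a fresh CPU profile for it, so
// that every remote command ends up in its own profile file.
func withCommandProfile(name string, cmd func() error) error {
    if profileDir == "" {
        return cmd() // profiling disabled
    }
    fn := filepath.Join(profileDir,
        fmt.Sprintf("%s-%d-%d.pprof", name, time.Now().Unix(), seq.Add(1)))
    f, err := os.Create(fn)
    if err != nil {
        return err
    }
    defer f.Close()
    if err := pprof.StartCPUProfile(f); err != nil {
        return err
    }
    defer pprof.StopCPUProfile()
    return cmd()
}

func main() {
    _ = withCommandProfile("load", func() error {
        // ... run the actual command here ...
        return nil
    })
}

Note that runtime/pprof allows only one active CPU profile per process, so concurrent commands would have to be serialized while profiling; the resulting files can then be inspected with go tool pprof.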

@lars-t-hansen (Collaborator, Author)

Well, no surprises there:

(pprof) top 10
Showing nodes accounting for 175.20s, 96.62% of 181.32s total
Dropped 190 nodes (cum <= 0.91s)
Showing top 10 nodes out of 19
      flat  flat%   sum%        cum   cum%
   174.42s 96.19% 96.19%    174.89s 96.45%  sonalyze/sonarlog.mergeStreams
     0.51s  0.28% 96.48%      3.04s  1.68%  sonalyze/sonarlog.createInputStreams
     0.27s  0.15% 96.62%      1.91s  1.05%  runtime.scanobject
         0     0% 96.62%    178.83s 98.63%  main.(*daemonCommandLineHandler).HandleCommand
         0     0% 96.62%    178.83s 98.63%  main.(*standardCommandLineHandler).HandleCommand
         0     0% 96.62%    178.83s 98.63%  main.localAnalysis
         0     0% 96.62%    178.83s 98.63%  net/http.(*ServeMux).ServeHTTP
         0     0% 96.62%    178.83s 98.63%  net/http.(*conn).serve
         0     0% 96.62%    178.83s 98.63%  net/http.HandlerFunc.ServeHTTP
         0     0% 96.62%    178.83s 98.63%  net/http.serverHandler.ServeHTTP

Assuming the current structure, the code looks about as well optimized as it can be. As expected, about 74s of the 174s is spent in the second inner loop, "Loop across streams to find smallest head", and about 91s is spent near the top of the third inner loop, "Now select values from all streams", either setting up the loop or in the first test, which filters out almost everything (most streams we look at will indeed start in the future, so it should). 74 + 91 = 165s, so that accounts for the bulk of the time; the rest falls to the body of the third loop below the initial filter.
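
For reference, a minimal, hypothetical sketch of this general merge structure (not the actual sonarlog.mergeStreams code; the Sample type and the summing of loads are placeholders):

package main

import "fmt"

// Sample is an illustrative stand-in for a sonar sample record.
type Sample struct {
    Time int64
    Load float64
}

// mergeByScanning merges time-sorted streams by repeatedly scanning every
// stream: first find the smallest head timestamp, then visit all streams
// again to select the samples at that time.  With S streams and T distinct
// timestamps this costs O(S*T) even though most streams contribute nothing
// at any given time, which is where the superlinear behavior comes from.
func mergeByScanning(streams [][]Sample) []float64 {
    const sentinel = int64(1) << 62
    heads := make([]int, len(streams))
    var out []float64
    for {
        // Loop across streams to find the smallest head.
        smallest := sentinel
        for i, s := range streams {
            if heads[i] < len(s) && s[heads[i]].Time < smallest {
                smallest = s[heads[i]].Time
            }
        }
        if smallest == sentinel {
            return out // all streams exhausted
        }
        // Now select values from all streams.  Most streams have no sample
        // at this time (their head lies in the future) and are filtered out,
        // but we still pay for inspecting every one of them.
        var sum float64
        for i, s := range streams {
            if heads[i] < len(s) && s[heads[i]].Time == smallest {
                sum += s[heads[i]].Load
                heads[i]++
            }
        }
        out = append(out, sum)
    }
}

func main() {
    a := []Sample{{Time: 0, Load: 1}, {Time: 60, Load: 2}}
    b := []Sample{{Time: 60, Load: 3}}
    fmt.Println(mergeByScanning([][]Sample{a, b})) // [1 5]
}

Keeping the stream heads in a priority queue (container/heap) would bring each step down from O(S) to O(log S), which is presumably the sort of change "something closer to linear" in the opening comment refers to.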
