
Reduce memory usage for timeseries jac computation #1001

Merged: 5 commits from johnjasa:jac_sparse_fix into OpenMDAO:master on Oct 16, 2023

Conversation

@johnjasa (Member) commented Oct 12, 2023

Summary

After a good amount of digging into cases that scaled poorly as the number of procs increased, I found a toarray() call that converted a sparse array to dense before converting it back to sparse. In the case of @kanekosh's run script, with a relatively high num_segments and a memory-expensive parallel ODE, this caused a large increase in memory usage during setup().

The new implementation keeps the jac in sparse format throughout. I've changed it in the two places where this conversion happened -- though maybe those files could be combined into one? Or @robfalck, were they kept as two separate files on purpose?

Prior setup() mem usage: [memory profile figure: without_fix]

Mem usage for setup() with the fix: [memory profile figure: with_fix]

This PR does not address the potentially large memory usage in final_setup(); that is a separate but related issue concerning how PETScVectors are created and used. We should discuss further whether there are action items there.
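For illustration, here is a minimal sketch (not the actual dymos code) of the kind of pattern being fixed, assuming SciPy sparse matrices: the dense round trip materializes the full Jacobian block in memory, while the sparse-only path never leaves CSR format.

```python
import scipy.sparse as sp

# Hypothetical per-segment block of a timeseries Jacobian.
seg_mat = sp.random(50, 50, density=0.1, format='csr')

# The pattern removed in this PR: toarray() materializes the full dense
# block before it is converted back to sparse, so memory scales with the
# full matrix size.
dense_block = sp.block_diag([seg_mat] * 20).toarray()
jac_via_dense = sp.csr_matrix(dense_block)

# The sparse-only path: stay in sparse format throughout, so memory scales
# with the number of nonzeros instead.
jac_sparse = sp.block_diag([seg_mat] * 20, format='csr')

# Both paths produce the same Jacobian; only the peak memory differs.
assert (jac_via_dense != jac_sparse).nnz == 0
```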

Related Issues

Backwards incompatibilities

None

New Dependencies

None

@kanekosh (Contributor) commented:

Thank you @johnjasa ! This fixes the issue I was facing.

Here is a summary of the memory usage for my Dymos+OAS case. The memory usage is shown as a percentage of the total memory on my machine (64 GB).

Before this fix (Dymos 1.9.0):

| n_procs | setup | run_model | compute_totals |
|---------|-------|-----------|----------------|
| 1       | 12.1% | 5.5%      | 10.6%          |
| 2       | 20.6% | 5.2%      | 11.0%          |
| 4       | 37.6% | 5.6%      | 12.0%          |
| 8       | 72.0% | 7.2%      | 13.6%          |
| 16      | runs out of memory during setup | – | – |

And with this fix:

| n_procs | setup | run_model | compute_totals |
|---------|-------|-----------|----------------|
| 1       | 3.8%  | 4.4%      | 10.6%          |
| 2       | 4.2%  | 5.2%      | 11.0%          |
| 4       | 4.8%  | 5.6%      | 12.0%          |
| 8       | 6.3%  | 7.2%      | 13.6%          |
| 16      | 9.6%  | 11.2%     | 16.0%          |

As far as I can observe, final_setup() uses approximately the same amount of memory as run_model and is not a bottleneck. But I've only monitored memory with top at a 0.1 s sampling interval, so I might have missed a short spike during final_setup.
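As a side note, a background sampler makes it easier to catch short-lived spikes than watching top. A minimal sketch, assuming psutil is available (psutil and the problem instance `p` are not part of this PR, just illustration):

```python
import threading
import time

import psutil  # assumption: convenient way to sample RSS faster than top refreshes


def sample_rss(stop_event, samples, interval=0.01):
    """Append this process's resident set size (bytes) every `interval` seconds."""
    proc = psutil.Process()
    samples.append(proc.memory_info().rss)
    while not stop_event.is_set():
        samples.append(proc.memory_info().rss)
        time.sleep(interval)


samples, stop = [], threading.Event()
sampler = threading.Thread(target=sample_rss, args=(stop, samples))
sampler.start()
# p.final_setup()  # the phase under test (hypothetical problem instance `p`)
stop.set()
sampler.join()
print(f'peak RSS during the sampled phase: {max(samples) / 1e9:.2f} GB')
```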


```python
if rate:
    mat = self.differentiation_matrix
else:
    mat = self.interpolation_matrix

for i in range(size):
    if _USE_SPARSE:
```
Review comment (Contributor):

The _USE_SPARSE setting was just used to quickly toggle between the sparse and dense implementations and compare performance. Let's remove the _USE_SPARSE variable and just assume we always use the sparse path. Any performance benefit of the dense version typically only shows up in smaller problems.
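For reference, a rough sketch (not the actual dymos change) of what the snippet above could reduce to once the toggle is gone, assuming SciPy; the matrix names mirror the snippet, while the block_diag assembly is only for illustration:

```python
import scipy.sparse as sp


def assemble_jac_block(differentiation_matrix, interpolation_matrix, rate, size):
    """Sparse-only version of the snippet above: pick the matrix and tile it
    block-diagonally for each of the `size` scalar entries, never densifying."""
    mat = differentiation_matrix if rate else interpolation_matrix
    return sp.block_diag([mat] * size, format='csr')
```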

@coveralls

Coverage Status

coverage: 92.55% (+0.02%) from 92.53% when pulling fba061d on johnjasa:jac_sparse_fix into 9e02030 on OpenMDAO:master.

@robfalck robfalck merged commit 2953f09 into OpenMDAO:master Oct 16, 2023
10 checks passed
Development

Successfully merging this pull request may close these issues.

Running out of memory with MPI (parallel trajectory optimization)
4 participants