Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add level-3 BLAS triangular Sylvester equation solver #651

Merged
merged 8 commits into from
Oct 6, 2022

Conversation

angsch
Copy link
Collaborator

@angsch angsch commented Feb 20, 2022

Description

Currently the only way to solve the triangular Sylvester equation AX + XB = C is via dtrsyl, which solely relies in level-2 BLAS. This PR intends to contribute a level-3 BLAS version.

The triangular Sylvester equation has been recognized to be prone to overflow. For that purpose, dtrsyl utilizes a scaling factor to represent the solution as (scale^{-1} * X) and solve the scaled equation AX + XB = scale * C. Due to the scaling factor, there is some flexibility in the representation of the solution. The proposed level-3 BLAS version computes the scaling factors based on upper bounds of blocks to enable level-3 BLAS. The scaling is typically slightly more aggressive so that an alternatively scaled final solution is computed. This is no problem as long as the scaling factor does not get flushed to zero. Indeed, the assumption in Anderson's original paper on solving scaled systems is that scale is in (0,1] and not [0,1].

In experiments the flushing occasionally happens for both dtrsyl and the proposed solver dtrsyl3, but dtrsyl3 reaches the stage earlier. In view of the intended changes of the handling of NaN, inf etc. I would be grateful to get some feedback on how to address the problem.

Our paper proposes to use integer scaling factors of the form 2^{(intscale} and solve AX + XB = 2^{intscale} * C. The vastly increased representational range rules out any flushing during the computation. The problem with the scaling factors affects dlatrs and dtrevc, too. There may be more routines. Is there an interest in using integer scaling factors and changing the function signature/documentation?

Alternatively, I would revise the PR and add a workaround for the flushing problem present in the attached code.

Once the decision on the scalings is taken, I will fix the other data types.

Checklist

  • The documentation has been updated.
  • If the PR solves a specific issue, it is set to be closed on merge.

@codecov
Copy link

codecov bot commented Aug 2, 2022

Codecov Report

Base: 0.00% // Head: 0.00% // No change to project coverage 👍

Coverage data is based on head (d83c3fd) compared to base (801ac2f).
Patch coverage: 0.00% of modified lines in pull request are covered.

❗ Current head d83c3fd differs from pull request most recent head 970a772. Consider uploading reports for the commit 970a772 to get more accurate results

Additional details and impacted files
@@            Coverage Diff            @@
##           master     #651     +/-   ##
=========================================
  Coverage    0.00%    0.00%             
=========================================
  Files        1894     1904     +10     
  Lines      184078   186552   +2474     
=========================================
- Misses     184078   186552   +2474     
Impacted Files Coverage Δ
SRC/clatrs3.f 0.00% <0.00%> (ø)
SRC/ctrsyl3.f 0.00% <0.00%> (ø)
SRC/dlarmm.f 0.00% <0.00%> (ø)
SRC/dlatrs3.f 0.00% <0.00%> (ø)
SRC/dtrsyl3.f 0.00% <0.00%> (ø)
SRC/ilaenv.f 0.00% <0.00%> (ø)
SRC/slarmm.f 0.00% <0.00%> (ø)
SRC/slatrs3.f 0.00% <0.00%> (ø)
SRC/strsyl3.f 0.00% <0.00%> (ø)
SRC/zlatrs3.f 0.00% <0.00%> (ø)
... and 1 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

Force compatibility with [ds]trsyl. Use two floating-point
scaling factors (rather than integer scaling factors).
This does not eliminate the problem that scalings can be flushed,
making any result useless. That problem could be eliminated
by replacing the floating-point scale factor with an integer
scale factor.
The tests of ztrsyl via zget35 are on tiny matrices that fall into the
unblocked section of ztrsyl3. Add a new test file that checks ztrsyl(3)
for larger matrices and their compatibility: Every problem that is
solvable by ztrsyl must be solvable by ztrsyl3.
An extensions of LATRS solving T*x = s*x to many right-hand sides,
allowing the usage of BLAS-3.

The new algorithm tends to use less aggressive scaling,
in particular for larger matrices.
@angsch
Copy link
Collaborator Author

angsch commented Sep 25, 2022

DLATRS3
Performance results originally presented at the BLIS retreat, and augmented with results for reference BLAS

dlatrs3

DTRSYL3

dtrsyl3

The two solvers are a drop-in replacement of the existing BLAS-2 routines and dedicate quite a bit of code to computing a similar result as the BLAS-2 counterpart.The question on whether or not the scale factor should be of the same data type as the other data structures remains. The advantage of moving to the integer scale factor alpha = 2**j is that flushing of the scale factor to zero can no longer happen as a result of limitations of the floating-point range.

I would keep the representation by the BLAS-2 and the BLAS-3 version compatible. Then the disadvantage is a change of the function signature of LATRS and TRSYL. One option is to revisit the code later when an upgrade to LAPACK 4.0 is due and then switch the representation of the solution from (1/scale) * x to (1/(2**scale)) * x,where scale is int, later.

@angsch angsch marked this pull request as ready for review September 25, 2022 16:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants