Renovate SP800-53 production pipeline converting to OSCAL from docx source #25

Closed
3 tasks done
wendellpiez opened this issue Oct 1, 2020 · 15 comments

@wendellpiez
Contributor

wendellpiez commented Oct 1, 2020

Errors in production of SP800-53 Revision 5 in OSCAL, both final and earlier (FPD) versions, show the process needs to be addressed for robustness and maintainability.

A new design will shorten and simplify the initial extraction by performing a generic conversion from the Word document (docx) into HTML, which contains all the necessary information for the OSCAL, and then processing it through a chain of cleanup and enhancement filters. By using the open-source XSweet utility, we can do this end-to-end in XSLT with no other language or application dependencies.

The new pipeline should produce valid and correct OSCAL for the current (final) version of SP800-53 Rev 5, with the same UUIDs as the presently published version (for minimum destabilization). Going forward, it should also be able to produce the same correct outputs for future revisions (with minimal adjustment and assuming consistent formatting in the input data), with UUIDs maintained or refreshed as required.

Issues #16 and #23 can also be addressed in this work, if not already resolved.

Criteria for acceptance:

  • The new pipeline (including an XSweet first step) is able to produce valid and correct OSCAL from the best-available docx version of SP800-53 rev 5.
  • Optionally, UUIDs can be preserved from an older catalog provided to the pipeline as a secondary input. The same object (e.g., a resource element) should be given the same UUID on its uuid flag. As an alternative, all UUIDs can be refreshed.
  • The pipeline has been committed to a GitHub repository for maintenance. (An internal NIST repository is fine inasmuch as this is not a general-purpose or general-use tool.)
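
The optional UUID-preservation criterion can be sketched as follows. This is only an illustration in Python, not the XSLT used in the actual pipeline. It assumes that a resource can be matched between the old and new catalogs by its title text; that matching key, and the function names, are hypothetical. Resources with no match get a fresh UUID, and calling the function without an old catalog refreshes all of them.

```python
import uuid
import xml.etree.ElementTree as ET


def local(tag):
    """Strip any '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def resource_key(resource):
    """Hypothetical matching key: the text of the resource's title child."""
    for child in resource:
        if local(child.tag) == "title":
            return (child.text or "").strip()
    return None


def carry_over_uuids(new_root, old_root=None):
    """Reuse UUIDs from old_root where a resource matches; otherwise mint new ones."""
    old = {}
    if old_root is not None:
        for el in old_root.iter():
            if local(el.tag) == "resource" and el.get("uuid"):
                key = resource_key(el)
                if key:
                    old[key] = el.get("uuid")
    for el in new_root.iter():
        if local(el.tag) == "resource":
            # Keep the old UUID for the same object, else generate a fresh one.
            el.set("uuid", old.get(resource_key(el)) or str(uuid.uuid4()))
    return new_root
```

Matching on a stable key rather than on document position means that inserting or reordering resources in a later revision does not shift UUIDs onto the wrong objects.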

Producing a valid NVD XML representation of an input catalog -- since we will no longer do this at the beginning -- is not a goal of this Issue, and should be tracked separately. It makes sense to build this as a conversion from the OSCAL produced by this pipeline.

@wendellpiez
Contributor Author

This is nominally complete, in the sense that an XSLT pipeline in production (and committed to an internal repo) appears to produce valid and correct outputs faithfully representing SP800-53 rev 5 as published.

The file produced is now under review internally (see Gitter for share) starting with @david-waltermire-nist and @iMichaela.

@wendellpiez
Contributor Author

Latest on this, a version that replaces "curly quotes" with markup (<q> element tagging).
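
As a rough illustration of the quote replacement (the real pipeline does this in XSLT, so this Python is only a stand-in), a regular expression can wrap balanced pairs of curly double quotes in q tags. It assumes quotes are not nested and that the input text is already XML-escaped; the function name is hypothetical.

```python
import re

# Matches a left curly quote, a run of non-quote characters, then a right curly quote.
CURLY_PAIR = re.compile("\u201c([^\u201c\u201d]*)\u201d")


def curly_to_q(text):
    """Replace each balanced pair of curly double quotes with <q>...</q> markup."""
    return CURLY_PAIR.sub(r"<q>\1</q>", text)
```

Unbalanced quotes are left untouched, so they remain visible for manual review instead of producing malformed markup.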

@iMichaela
Contributor

10/15/2020

@wendellpiez Here are the current findings of the first-round review. I will continue adding to the list below as I go through the document.

  • all guidance statements are missing the IDs altogether.
  • all sort-id are now represented with lower cases and not upper cases as in the other versions.
  • all the IDs for item are missing the inherited "_stm" substring of the id.
  • (line 154) missing references of AC-2
  • all internal references pointing to items that have IDs with errors need to be updated when the IDs are corrected

What are the comments with numeric values e.g.

@wendellpiez
Contributor Author

Thanks! these can also be added to a Schematron filter to help enforce consistency in future.
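
Before writing the Schematron, the checks from the review list could be prototyped along these lines. This is a sketch in Python, not Schematron. It assumes parts are identified by a name attribute ("guidance", "item"), that sort-id is carried on a prop either as a value attribute or as text content, and it takes the item-ID marker as a parameter, using the string from this thread as its default. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip any '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def check_ids(root, item_marker="_stm"):
    """Return a list of human-readable problems found in the catalog tree."""
    problems = []
    for el in root.iter():
        name = local(el.tag)
        if name == "part":
            part_name, part_id = el.get("name"), el.get("id")
            if part_name == "guidance" and not part_id:
                problems.append("guidance part without id")
            if part_name == "item" and item_marker not in (part_id or ""):
                problems.append(f"item id {part_id!r} lacks {item_marker!r}")
        elif name == "prop" and el.get("name") == "sort-id":
            # Accept either serialization of the property value (assumption).
            value = el.get("value") or (el.text or "")
            if value != value.lower():
                problems.append(f"sort-id {value!r} is not lower-case")
    return problems
```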

@wendellpiez
Contributor Author

wendellpiez commented Oct 16, 2020

Okay, I made these repairs. Thanks!

  • guidance statements now have IDs
  • We had sort-id all lower-case in Rev 4. I think it is better as it is explicitly case-normalized.
  • Added _stm substring to IDs on item parts (thank you)
  • Discovered that 'references' links had been inadvertently moved -- incorrectly grouped in with the last control enhancement.
  • Internal cross-references still check out so this should be okay.

A latest version will be attached here.

@wendellpiez
Contributor Author

Zipped for your safety -
rev5-oscal-latest-20201016.zip

@wendellpiez
Contributor Author

wendellpiez commented Oct 16, 2020

Outstanding issues for quality check:

Broken links

Three links (given as <a> elements) have no targets. This is formally valid (since HTML-valid), but still an error. These should be eliminated or targeted.

The links are here:

  • AC-20(3) Discussion points to AC-20(6)
  • CA-7 Discussion points to SC-18c (near the end)
  • PM-31 Discussion also points to SC-18c

There is no AC-20(6) enhancement, and no item c under SC-18. The links are probably intended to indicate something nearby.
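
Dangling links of this kind can be found mechanically. The following Python sketch (a stand-in for a Schematron rule, with a hypothetical function name) collects every id in the document and reports internal links whose fragment does not match one. It assumes internal links are written as href="#some-id"; the ids in any sample data are illustrative.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip any '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def broken_internal_links(root):
    """Return the href of each link with no href or with a dangling '#id' target."""
    ids = {el.get("id") for el in root.iter() if el.get("id")}
    broken = []
    for el in root.iter():
        if local(el.tag) != "a":
            continue
        href = el.get("href")
        if not href:
            broken.append(href)  # link with no target at all
        elif href.startswith("#") and href[1:] not in ids:
            broken.append(href)  # internal link to a nonexistent id
    return broken
```

External links (anything not starting with "#") are deliberately ignored here, since checking them requires network access and a different kind of test.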

@wendellpiez
Contributor Author

wendellpiez commented Oct 22, 2020

We have found a couple more issues:

  • In some controls, 'withdrawn' is not being picked up correctly (it's coming into a part)
  • Some diagnostic comments are coming through
  • On parameters called from other parameters, @depends-on is not being set

@wendellpiez
Contributor Author

Now that I have made the Schematron, I am glad I did, since there is a new latest version.

rev5-oscal-latest-20201022.zip

@wendellpiez
Contributor Author

wendellpiez commented Oct 28, 2020

rev5-oscal-latest-20201028.zip

Latest corrections:

  • broken @href targets found and detected
    • including many internal cross references
  • parameter ids for parameters referenced in other parameters

Couple of outstanding questions:

  • Persistence of UUIDs across documents?
  • Presentation of square brackets with internal links inline?

wendellpiez added a commit to wendellpiez/oscal-content that referenced this issue Oct 30, 2020
wendellpiez added a commit to wendellpiez/oscal-content that referenced this issue Nov 4, 2020
…on rules (introducing cosmetic whitespace), so should address usnistgov#29
@wendellpiez
Contributor Author

@iMichaela @david-waltermire-nist -- looking yet one more time I de-confused myself wrt the brackets-around-links issue, and the brackets are now gone from inline links in the SP800-53 Rev 5 (thank you Michaela). Rev 4 in any case did not have so many inline links.

Also @brianrufgsa and @tcorsa since I found that doing this made our whitespace problem worse in a few places (#29), I added new custom serialization logic at the end of the pipeline. So we are no longer using defaults (and if there are further adjustments to whitespace called for we have more leverage). (This is not actually done yet, but it is better as far as the data is concerned.)

@wendellpiez
Contributor Author

wendellpiez commented Nov 4, 2020

Whitespace is fine but for some reason we are getting duplicate rlink elements in back matter resources.

This error appears to go back a ways (at least a couple of weeks) which would help explain why checks against regression have not caught it.

I will add a Schematron to detect this issue in future, and repair the data.
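
The duplicate check itself is simple. A Python sketch of the logic (not the Schematron, and with a hypothetical function name) follows. It assumes two rlink elements count as duplicates when they sit in the same resource and share both href and media-type.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip any '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def duplicate_rlinks(root):
    """Return (resource uuid, href) for each repeated rlink within a resource."""
    duplicates = []
    for resource in root.iter():
        if local(resource.tag) != "resource":
            continue
        seen = set()
        for child in resource:
            if local(child.tag) != "rlink":
                continue
            key = (child.get("href"), child.get("media-type"))
            if key in seen:
                duplicates.append((resource.get("uuid"), child.get("href")))
            seen.add(key)
    return duplicates
```

Running a check like this against every pipeline output would catch a regression of this kind as soon as it is introduced, instead of weeks later.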

wendellpiez added a commit to wendellpiez/oscal-content that referenced this issue Nov 5, 2020
@wendellpiez
Contributor Author

wendellpiez commented Nov 5, 2020

Update Nov 5 2020

The catalog and profiles in PR #32 look good after most recent improvements but please double check, including diffing with previous versions. Still awaiting word from FISMA team on remaining link issues.

Also from recent discussions it appears we may have work to do on the PRIVACY baseline. (Separate issue?)

So here we are:

  • resolve link issues or determine to leave them
  • review catalog one more time since most recent changes (punctuation/whitespace cleanup)
  • run profile resolution and check results
  • potentially, return to PRIVACY baseline(s) issue

@wendellpiez
Contributor Author

Update Dec 3 2020

Presuming that builds work over PR #32 (which contains current pipeline output), and that profile resolution produces expected results, this Issue is complete.

The conversion pipelines, including docx->OSCAL catalog and profiles, are being maintained in an internal repository.

Development of this pipeline through the QA process for Rev 5 has suggested more use for Schematron to validate the correctness of pipeline outputs. Suggest a new Issue for this. The repo already has some Schematron; we could use more (to check on things like file reference integrity).

@david-waltermire david-waltermire linked a pull request Dec 3, 2020 that will close this issue
2 tasks
david-waltermire pushed a commit that referenced this issue Dec 4, 2020
…(introducing cosmetic whitespace), so should address #29
@david-waltermire
Contributor

This was addressed with the merge of PR #32.

3 participants