All hands on deck: tool doc updates #3853
Comments
I've tentatively categorized the tools, and they are listed in spreadsheet format at: https://docs.google.com/a/broadinstitute.org/spreadsheets/d/19SvP6DHyXewm8Cd47WsM3NUku_czP2rkh4L_6fd-Nac/edit?usp=sharing
To view docs, build with |
@vdauwera The tools are categorized and listed in the Google Spreadsheet above. It is waiting for you to assign tech leads to tools for documentation. One thing that @chandrans brought to my attention is that for BaseRecalibrator one of the parameters (
What the gatkDocs look like as of commit of
|
Geraldine says she is busy catching up this week so I think it best if tech leads assign the tools to members of their teams @droazen @cwhelan @samuelklee @ldgauthier @vruano @yfarjoun @LeeTL1220. |
If we can agree on tool categorization sooner rather than later, this gives @cmnbroad time to engineer any changes that need engineering. |
Any chance we could break off legacy CNV tools into their own group? There are many more of them than there will be in the new pipelines---and many of them are experimental, deprecated, unsupported, or for validation only---that I think it makes sense to hide them and perhaps be less stringent about their documentation requirements. Anything we can do to reduce the support burden before release would be great. |
I just learned that KEBAB case is different from SNAKE case @cmnbroad. Sorry if KEBAB is offensive @cmnbroad but it is meant to clarify syntax (e.g. https://lodash.com/docs#kebabCase). To be clear, Geraldine wants KEBAB case that uses hyphens, and not SNAKE case, which uses underscores.
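To make the kebab/snake distinction concrete, here is a minimal sketch of converting existing camelCase or snake_case names to kebab-case. `ArgumentNameStyle` and `toKebabCase` are hypothetical names made up for illustration; they are not part of GATK or Barclay, and the sketch does not encode any project guideline beyond "hyphens, not underscores".

```java
import java.util.Locale;

/** Hypothetical helper for illustration only; not part of GATK or Barclay. */
final class ArgumentNameStyle {
    private ArgumentNameStyle() {}

    /** Converts a camelCase or snake_case name to kebab-case (lowercase words joined by hyphens). */
    static String toKebabCase(final String name) {
        return name
                // Split where a lowercase letter or digit is followed by a capital: maxOpen -> max-Open
                .replaceAll("([a-z0-9])([A-Z])", "$1-$2")
                // Split a run of capitals from a following capitalized word: VCFFile -> VCF-File
                .replaceAll("([A-Z]+)([A-Z][a-z])", "$1-$2")
                // Snake case uses underscores; kebab case uses hyphens
                .replace('_', '-')
                .toLowerCase(Locale.ROOT);
    }

    public static void main(final String[] args) {
        System.out.println(toKebabCase("maxOpenFiles")); // prints max-open-files
        System.out.println(toKebabCase("input_list"));   // prints input-list
    }
}
```

A mechanical conversion like this is only a starting point; names still need a human pass for readability and consistency across tools.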
@vruano will describe how he uses constants to manage parameters. |
Since we are going to change many of those argument names (camel-back to kebab-case), I think we should take this opportunity to use constants to specify argument names in the code and use them in our test code, so that further changes in argument names don't break tests. Take CombineReadCounts as an example; an extract is enclosed below. It might also be beneficial to add public constants for the default values.
public final class CombineReadCounts extends CommandLineProgram {

    // Argument names are exposed as constants so that tests can reference them.
    public static final String READ_COUNT_FILES_SHORT_NAME = StandardArgumentDefinitions.INPUT_SHORT_NAME;
    public static final String READ_COUNT_FILES_FULL_NAME = StandardArgumentDefinitions.INPUT_LONG_NAME;
    public static final String READ_COUNT_FILE_LIST_SHORT_NAME = "inputList";
    public static final String READ_COUNT_FILE_LIST_FULL_NAME = READ_COUNT_FILE_LIST_SHORT_NAME;
    public static final String MAX_GROUP_SIZE_SHORT_NAME = "MOF";
    public static final String MAX_GROUP_SIZE_FULL_NAME = "maxOpenFiles";

    // Default values can be exposed the same way.
    public static final int DEFAULT_MAX_GROUP_SIZE = 100;

    @Argument(
        doc = "Coverage files to combine, they must contain all the targets in the input file (" +
            TargetArgumentCollection.TARGET_FILE_LONG_NAME + ") and in the same order",
        shortName = READ_COUNT_FILE_LIST_SHORT_NAME,
        fullName = READ_COUNT_FILE_LIST_FULL_NAME,
        optional = true
    )
    protected File coverageFileList;

    @Argument(
        doc = READ_COUNT_FILES_DOCUMENTATION,
        shortName = READ_COUNT_FILES_SHORT_NAME,
        fullName = READ_COUNT_FILES_FULL_NAME,
        optional = true
    )
    protected List<File> coverageFiles = new ArrayList<>();

    @Argument(
        doc = "Maximum number of files to combine simultaneously.",
        shortName = MAX_GROUP_SIZE_SHORT_NAME,
        fullName = MAX_GROUP_SIZE_FULL_NAME,
        optional = false
    )
    protected int maxMergeSize = DEFAULT_MAX_GROUP_SIZE;

    // orElse(null) rather than orElseGet(null): orElseGet expects a Supplier and
    // would throw a NullPointerException when there are no input files.
    @ArgumentCollection
    protected TargetArgumentCollection targetArguments = new TargetArgumentCollection(() ->
        composeAndCheckInputReadCountFiles(coverageFiles, coverageFileList).stream().findFirst().orElse(null));

    @Argument(
        doc = "Output file",
        shortName = StandardArgumentDefinitions.OUTPUT_SHORT_NAME,
        fullName = StandardArgumentDefinitions.OUTPUT_LONG_NAME,
        optional = false
    )
    protected File outputFile; |
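On the test side, the same idea might look like the sketch below: the test builds its command line from the tool's constants, so a rename (for example `maxOpenFiles` to a kebab-case name) only touches the constant. Everything here is a stand-in written for illustration: `CombineReadCountsArgs`, `TestCommandLine`, and the `"output"` value are assumptions, not the actual GATK classes or test utilities.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a tool class that exposes its argument names as constants. */
final class CombineReadCountsArgs {
    static final String MAX_GROUP_SIZE_FULL_NAME = "maxOpenFiles"; // renaming happens here only
    static final String OUTPUT_FULL_NAME = "output";               // assumed value, for illustration
    static final int DEFAULT_MAX_GROUP_SIZE = 100;

    private CombineReadCountsArgs() {}
}

/** Hypothetical minimal command-line builder for tests. */
final class TestCommandLine {
    private final List<String> args = new ArrayList<>();

    /** Appends "--name value" to the command line and returns this builder for chaining. */
    TestCommandLine add(final String fullName, final Object value) {
        args.add("--" + fullName);
        args.add(String.valueOf(value));
        return this;
    }

    /** Returns a copy of the accumulated arguments. */
    List<String> toList() {
        return new ArrayList<>(args);
    }
}

final class ConstantsInTestsDemo {
    /** Builds test arguments without hard-coding any argument name as a string literal. */
    static List<String> buildArgs(final String outputPath) {
        return new TestCommandLine()
                .add(CombineReadCountsArgs.MAX_GROUP_SIZE_FULL_NAME, CombineReadCountsArgs.DEFAULT_MAX_GROUP_SIZE)
                .add(CombineReadCountsArgs.OUTPUT_FULL_NAME, outputPath)
                .toList();
    }

    public static void main(final String[] args) {
        System.out.println(buildArgs("combined.tsv")); // prints [--maxOpenFiles, 100, --output, combined.tsv]
    }
}
```

The test never spells out `"maxOpenFiles"` itself, which is the point: only the constant's definition changes when the argument is renamed.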
@samuelklee Because our repo is open-source, even if we hide them from the docs, users end up asking questions about them. So no to hiding any tool that is in the repo.
Even when we deprecate a tool or feature, we give people fair warning that the tool/feature will be deprecated before literally removing it from the codebase.
Besides the
|
Fair points. I agree that legacy tools/versions that are part of a canonical or relatively widely used pipeline should have good documentation. However, there are many of the CNV tools that are basically prototypes---they have never been part of a pipeline, have no tutorial materials, and the chances that any external users have actually used them are probably extremely low. The sooner they are deprecated, the less the overall burden on both comms and methods---I don't think comms should need to feel protective of code or tools that developers are willing to scrap wholesale! I'd like to cordon off or hide such tools so the program group doesn't get too cluttered---if we can do this in a way that doesn't require @cmnbroad to add more categories, that would be great. For example, we will have 5 tools that one might reasonably try to use for segmentation (PerformSegmentation, ModelSegments, PerformAlleleFractionSegmentation, PerformCopyRatioSegmentation, and PerformJointSegmentation). The first two are part of the legacy and new pipelines, respectively, but the last 3 were experimental prototypes. I think it's definitely confusing to have these 3 presented in the program group, and treating them the same as the other tools in terms of documentation is just extra work for everyone. In any case, I definitely think an additional program group to separate the legacy and new tools is warranted, since many of the updated tools in the new pipeline have very similar names to the legacy tools. If this is OK with everyone, I'll just add a "LegacyCopyNumber" program group, which I don't think should require extra work on anyone else's part. |
Hiding / deprecating tools and their docs
@samuelklee To add to @sooheelee's answer, if there are any tools that you definitely want gone and already have a replacement for, I would encourage you to kill them off (i.e., delete them from the code) before the 4.0 launch. While we're still in beta we can remove anything at the drop of a hat. Once 4.0 is out, we'll have a deprecation policy (exact details TBD) that will allow us to prune unwanted tools over time, but it will be less trivial. And as Soo Hee said, everything that's in the current code release MUST be documented. We used to hide tools/docs in the past and it caused us more headaches than not. That being said, as part of that TBD deprecation policy it will probably make sense to make a "Deprecated" program group where tools go to die. If there are tools you plan to kill but don't want to do it before 4.0 is released for whatever reason, you could put them there. Documentation standards can be less stringent for tools in that bucket. To be clear, I think the deprecation group name should be generic, i.e., not named to match any particular use case or functionality. That will help us avoid seeing deprecation buckets proliferate for each variant class/use case. Does that sound like a reasonable compromise? |
Guidelines for converting arguments to kebab case
We're not following an external spec doc, so here are some guidelines to follow instead. Keep in mind that the main thing we're going for here is readability and consistency across tools, not absolute purity, so feel free to raise discussion on any cases where you feel the guidelines should be relaxed. Some things are more negotiable than others.
|
Using constants for argument names
Sounds like a fantastic idea -- I encourage everyone to follow @vruano's lead on this one. |
OK, great---I'll issue some PRs to delete some of the prototype tools soon and update the spreadsheet accordingly. A non-CNV-specific "Deprecated" program group seems reasonable to me if there is enough demand. If this is the only way to delineate the legacy CNV + ACNV pipeline from the new pipeline, I'm OK with it---but we should probably make the situation clear at any workshops, presentations, etc. between now and release that might focus on the legacy pipeline. On a different note, are there any conventions for short names that we should follow? |
I propose that we still hide the example walkers from the command line and the docs. They are meant only for developers, to show how to use certain kinds of walkers and to provide a running tool for integration tests. Having them on the command line will lead users to run them as tools instead of using them for development purposes... In addition, I think this is a good moment to create a sub-module structure (as I suggested in #3838) that separates artifacts for different pipelines/framework bits (e.g., engine, Spark-engine, experimental, example-code, CNV pipeline, general-tools, etc.). For the purposes of this issue, this would be useful for setting documentation guidelines in each of the sub-modules: e.g., example-code should be documented for developers, but not for the final user; the experimental module should have the |
A couple of comments:
|
To clarify the build process noted above "view local index in browser" means open the index.html file at gatk/build/docs/gatkdoc/ |
The standard arguments for each tool are listed with that tool's arguments (if you look at the doc for a particular tool, you'll see an "Optional Common Arguments" heading, with the shared, common arguments listed there). The GATK4 doc system doesn't generate a separate page for these like GATK3 did, and I think doing so would be of questionable value, since there are several classes of tools, each of which has its own set of "common" arguments (GATK Walker tools, GATK Spark tools, Picard tools, and some GATK "cowboy" tools that do their own thing). We did discuss an alternative design a while back with @droazen and @vdauwera, but that was never implemented, and was a variant of the current design where the common args are included with each tool. |
@cmnbroad and @vdauwera Barclay doesn't pull the
Doesn't seem right to duplicate the same information in a tool doc, once in the asterisked javaDoc portion and once in USAGE_DETAILs for whatever system creates this view, which I understand will fall by the wayside someday in favor of Picard documentation being offered only through https://software.broadinstitute.org/gatk/. Seems we should use the asterisked gatkDoc portion for the GATK-specific documentation we want, e.g. commands that invoke Picard tools through the gatk launch script using GATK4 syntax, and pull the rest of the documentation from the
I've prioritized Picard tools in a second tab of the shared Google spreadsheet towards Picard doc updates. Please let me know how we want to approach Picard tool doc updates @vdauwera. |
I guess we need to do some work on the doclet code: abstract out the example code and use templates to transform it into the appropriate format/syntax, depending on which project is generating the documentation. Alternatively, and only if the documentation HTML is well formed (XHTML-like), we could in theory use XSLT stylesheets to convert embedded code examples encoded in XML/XHTML into the concrete syntax. Most major browsers support XSLT. EDIT: The XSLT solution probably won't work, since even if we change the output from HTML to XHTML, injecting the javadoc's HTML would probably break the XHTML. |
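As a rough illustration of the template idea (not of the actual doclet code), an example command could be stored once in a syntax-neutral form and rendered per project. `ExampleCommand`, `Syntax`, and `render` are hypothetical names invented for this sketch; it assumes only the two familiar command shapes, Picard's `KEY=value` style and the `gatk ToolName -KEY value` style, and it handles one value per argument.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical syntax-neutral example command; not part of Barclay or the GATK doclet. */
final class ExampleCommand {
    enum Syntax { PICARD_LEGACY, GATK4 }

    private final String toolName;
    // LinkedHashMap keeps arguments in the order they were added.
    private final Map<String, String> arguments = new LinkedHashMap<>();

    ExampleCommand(final String toolName) {
        this.toolName = toolName;
    }

    /** Adds one argument by short name; a repeated name overwrites the earlier value. */
    ExampleCommand with(final String shortName, final String value) {
        arguments.put(shortName, value);
        return this;
    }

    /** Renders the same example in the requested command-line syntax. */
    String render(final Syntax syntax) {
        switch (syntax) {
            case PICARD_LEGACY:
                return "java -jar picard.jar " + toolName + " " + arguments.entrySet().stream()
                        .map(e -> e.getKey() + "=" + e.getValue())
                        .collect(Collectors.joining(" "));
            case GATK4:
                return "gatk " + toolName + " " + arguments.entrySet().stream()
                        .map(e -> "-" + e.getKey() + " " + e.getValue())
                        .collect(Collectors.joining(" "));
            default:
                throw new IllegalArgumentException("Unsupported syntax: " + syntax);
        }
    }

    public static void main(final String[] args) {
        final ExampleCommand example = new ExampleCommand("MergeVcfs")
                .with("I", "input.vcf")
                .with("O", "output.vcf");
        System.out.println(example.render(Syntax.PICARD_LEGACY)); // prints java -jar picard.jar MergeVcfs I=input.vcf O=output.vcf
        System.out.println(example.render(Syntax.GATK4));         // prints gatk MergeVcfs -I input.vcf -O output.vcf
    }
}
```

Keeping the example as data rather than as pre-formatted text is what lets each project pick its own rendering without duplicating the example in two places.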
Thanks for taking this on @vruano. |
I created a separate issue for this: #3932 |
If your PR is ready for review, please paste the PR's URL into the spreadsheet, as @vruano did for Picard MergeVcfs, AND tag @sooheelee so I can find it easily. |
Just to be clear folks, we are using
|
Mutect2 Filters list is
|
Those should show up in the M2 tool doc -- or FilterMutectCalls, whatever tool actually takes those args |
Yes, they show up in FilterMutectCalls. Thanks for that pointer. |
Just a reminder to omit |
@vdauwera We need to also generate docs for Picard metrics. I think we can classify them separately like we do @cmnbroad Does Barclay pull these in as well? |
@sooheelee If you add |
@cmnbroad - Cannot be done annotating them as |
Since I have your attention on the matter, let me test what this looks like. |
Hidden tools are coming out of the woodwork and need classification. |
If at least one example command for a Spark tool does not utilize Spark, then I think we need this statement in each tool doc (to which it applies):
copy-pastable format:
This tutorial isn't the best, as it only focuses on Google Dataproc, but it's what we have currently to get people started. Let's just say it's a placeholder until we get something better up. |
@yfarjoun @vdauwera I've refined the tool categorization based on feedback on the tentative categorization. Thank you @yfarjoun for the review and feedback. The refinement is reflected in the new tabbed sheet in the shared Google Spreadsheet: Here is a summary of the changes.
Let us know your thoughts. Thank you. |
Hey developers working on GATK4 doc updates (tagging tech leads here) @cwhelan @samuelklee @ldgauthier @vruano @yfarjoun @LeeTL1220. Just a reminder (from @vdauwera and @droazen) that we want to get rid of many of the short form arguments and only keep the long form for the more obscure arguments. Meaning we keep both forms for commonly used and understood arguments. |
Alright, everyone. If you can have your tooldoc updates merged by this Friday, that would be great. That gives me time to take stock of what is missing and update the documentation accordingly over the weekend. |
The Picard Program Group assignments broadinstitute/picard#1043 PR has been merged. |
Thanks everyone for your contributions to updating the tool docs for yesterday's GATK4.0 release. In my brief surveys of the status of tooldoc updates, I noticed a number of tools without example commands. I will fill in these missing ones going forward. Thanks again for your efforts. |
Thank you everyone for your contributions towards this documentation effort.
Instructions from @vdauwera to follow at this Google doc. Favorite tool doc examples from @vdauwera NOW in her SOP doc.
Spreadsheet from @sooheelee to be posted here.