\documentclass[12pt]{article}
\paperwidth=8.5in
\paperheight=11in
\textwidth=6.0in
\oddsidemargin=0.25in % use built-in offset of 1 inch for left margin
\evensidemargin=0.25in % ditto for even pages
\textheight=8.5in
\topmargin=0in % use built-in offset of 1 inch
\headheight=0in % no headers in this document
\headsep=0in % no headers in this document
\begin{document}
\begin{center}
{\bf\Large Status of Hall D \\ Responses to Recommendations \\ from the 12~GeV Software and Computing Reviews} \\
\large
\medskip
Mark Ito, David Lawrence, Curtis Meyer \\
\medskip
July 10, 2015 \\
\end{center}
\section{Director's Review of 12~GeV Software and Computing -- June~7--8,~2012}
\begin{center}\tt
https://www.jlab.org/indico/conferenceDisplay.py?confId=4
\end{center}
Committee Recommendations:
\begin{enumerate}
\item Presentations in future reviews should cover end user
utilization of and experience with the software in more
detail. Talks from end users on usage experience with the software
and analysis infrastructure would be beneficial.
{\bf Complete.} At the following review (November 2013) we
reported on the analysis workshop held in July 2013 and had a
presentation from Justin Stevens (then a postdoc at MIT) on the user
experience.
\item Once a modest all-way data path is established, plan a mock data
challenge with fake data, in particular with nominal data rates from
GlueX.
{\bf Complete.} We have since completed two data challenges and are
now analyzing real data on a regular basis.
\item Nightly builds are performed by some; we recommend them for all.
{\bf Complete.} At the time of the review we had already been doing
nightly builds for some years and we have continued to do so.
\item Evaluate standard code evaluation tools, such as valgrind,
clang's scan-build, cppcheck, Gooda for inclusion in the software
development cycle. We suggest looking at an Insure++ license as
well.
{\bf Open.} Work has started on a regular valgrind suite but is not
yet complete. (A sketch of such a check appears after this list.)
\item Run a code validation suite such as valgrind as part of the
routine software release procedure.
{\bf Open.} See response to the previous item.
\item Give full and early consideration to file management, cataloging
and data discovery by physicists doing analysis. Report on this area
in future reviews.
{\bf Open.} This effort is still in the design stage.
\item Investigate the feasibility of event-based parallelization of
C++ analysis in a multi-core batch environment.
{\bf Complete.} We have been using the JANA framework for
reconstruction and data analysis; it is multi-threaded by design. (A
generic sketch of this event-based pattern appears after this list.)
\item Intensify efforts on the HRS tracking development, including
calibration and alignment procedures. Define performance milestones
which allow time to explore alternatives if problems arise.
{\bf Not applicable to Hall D.}
\item Study the SBS track reconstruction algorithm efficiency under
higher background conditions. It would be useful to know at what
level of background the existing algorithm stops functioning.
{\bf Not applicable to Hall D.}
\item Develop requirements for the SBS algorithm performance, along
with a development timeline and a responsible contact. Requirements
should include alignment and calibration.
{\bf Not applicable to Hall D.}
\item A series of scaling tests ramping up using the LQCD farm should
be planned and undertaken. Tests should begin soon; don't wait for
completion of the software 18 months before startup.
{\bf Complete.} LQCD farm nodes were used as part of the computing
resources for the data challenges.
\item Seriously consider using ROOT as the file format in order to
make use of the steady advances in its I/O capabilities.
{\bf Obsolete.} We have been using the Hall D Data Model (HDDM)
format for reconstructed results and are happy with its performance.
\item The costs and sustainability of supporting two languages,
relative to the advantages, should be regularly assessed as the
community of users grows, code development practices become clearer,
the framework matures further, etc.
{\bf Not applicable to Hall D.} Our code base is exclusively C++.
\item With the somewhat aggressive schedule leading up to December
2013, make sure to engage a reasonable number of early adopters to
stress test the new framework.
{\bf Complete.} The framework has been stressed both in the sense
that (a) a large fraction of the collaboration have been using it
for data reconstruction and analysis and (b) its large-scale
performance has been tested in data challenges.
\item Re-use existing efforts from Hall A to decode CODA-formatted
data in ROOT.
{\bf Open.}
\item If resources are limited, the Fortran-based SHMS reconstruction
should be a low priority.
{\bf Not applicable to Hall D.}
\item While we encourage the move to git as a code management system,
be sure not to underestimate the extent of the paradigm
shift. Identify a workflow model for your use of git. Communicate
clearly the new paradigm (easy branching, no central repository,
etc.). Set up (or link to) tutorials for users with a mapping of
routine CVS tasks to their git equivalents (such as cvs diff,
etc.). Document or link to documentation for standard git tasks
without obvious equivalent in CVS or SVN, such as git rebase, or
bisect.
{\bf Open.} We are in the process of converting from Subversion to
Git; the switch-over is scheduled for July 15, 2015. (A starting
point for the requested command mapping appears after this list.)
\item A series of scale tests ramping up using JLab's LQCD farm should
be planned and conducted.
{\bf Repeat of a previous item.}
\item The data volume and processing scale of GlueX is substantial but
plans for data management and workload management systems supporting
the operational scale were not made clear. They should be carefully
developed.
{\bf Open.} We have a design for these areas, but tools are still
being developed and tested.
\item Consider ROOT (with its schema evolution capabilities) as a
possible alternative for the HDDM DST format.
{\bf Complete.} We have decided to stay with HDDM.
\item To ensure a smooth transition from development and deployment to
operations, particularly for Halls B and D, an explicitly planned
program of data challenges, directed both at exercising the
performance of the full analysis chain and at exercising the scaling
behavior and effectiveness of the computing model at scales
progressively closer to operating scale, is recommended. We heard
more explicit plans from Hall D than from Hall B in this
respect. This data challenge program should be underway now, and
should not await the full completion of the offline software.
{\bf Complete.} See previous responses.
\item To ensure a smooth transition from development and deployment to
operations...
{\bf Repeat of previous item.}
\item In response to the question as to how the computing budget is
scrubbed, the answer received was that scrubbing happens through
this review. This review hasn't examined the requirements and
associated budget sufficiently for this to be considered a
scrubbing. Also it is not clear that an overall optimization of the
computing models, associated resource requirements, and required
budget levels has been done. A process should exist whereby this
optimization takes place. For example, are the relative roles of disk
and tape optimal for making analysis as effective as possible,
within budgetary constraints?
{\bf Open.} We have an informal process, but need to develop a system for revising estimates as we go forward with the program.
\item The measures being planned to render LQCD resources usable by
the 12 GeV community should have high priority.
{\bf Complete.} See previous responses.
\end{enumerate}
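\medskip\noindent
Regarding items 4 and 5 above: a nightly valgrind check could be as
simple as the fragment below. This is a minimal sketch, not the
actual suite under development; the program name and input file are
placeholders.
\begin{verbatim}
# Hypothetical nightly memory check (program and input are placeholders).
valgrind --tool=memcheck --leak-check=full --error-exitcode=1 \
         ./hd_root test_events.evio
\end{verbatim}
The non-zero exit code on error would let the nightly-build scripts
flag a failing run automatically.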
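\medskip\noindent
Regarding item 7: the fragment below is {\em not} the JANA API; it is
a generic C++ sketch of the pattern JANA implements, namely worker
threads pulling events from a shared queue and processing each event
independently.
\begin{verbatim}
// Generic sketch of event-based parallelism (not the JANA API).
#include <algorithm>
#include <iostream>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

int main() {
    std::queue<int> events;                  // stand-in for an event source
    for (int i = 0; i < 1000; ++i) events.push(i);

    std::mutex m;                            // guards the shared queue
    long processed = 0;

    auto worker = [&]() {
        for (;;) {
            int evt;
            {
                std::lock_guard<std::mutex> lock(m);
                if (events.empty()) return;  // no more events: thread exits
                evt = events.front();
                events.pop();
            }
            // "Reconstruction" placeholder: events are independent, so
            // no locking is needed while one is being processed.
            volatile double x = 0;
            for (int k = 0; k < 1000; ++k) x += evt * 0.5;
            std::lock_guard<std::mutex> lock(m);
            ++processed;
        }
    };

    unsigned n = std::max(1u, std::thread::hardware_concurrency());
    std::vector<std::thread> pool;
    for (unsigned i = 0; i < n; ++i) pool.emplace_back(worker);
    for (auto& t : pool) t.join();

    std::cout << "processed " << processed << " events\n";
    return 0;
}
\end{verbatim}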
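\medskip\noindent
Regarding item 17: one possible starting point for the requested
mapping of routine Subversion (or CVS) tasks to their Git equivalents
is sketched below; it covers only the most common commands.
\begin{verbatim}
svn checkout <url>    ->  git clone <url>
svn update            ->  git pull
svn diff              ->  git diff
svn add <file>        ->  git add <file>
svn commit            ->  git commit   (local; publish with git push)
svn log               ->  git log
svn blame             ->  git blame
\end{verbatim}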
\section{Director's Review of 12~GeV Software and Computing -- November~25--26,~2013}
\begin{center}\tt
https://www.jlab.org/indico/conferenceDisplay.py?confId=55
\end{center}
Committee Recommendations for Hall D:
\begin{enumerate}
\item Event tagging in the HLT is recommended as a mechanism for
separating calibration data samples into streams for use in a prompt
calibration loop.
{\bf Open.} The online group is planning a facility for event tagging.
\item We recommend against consideration of SRM. The LHC grid
community is moving away from it as a heavy and expendable layer.
{\bf Obsolete.} To date, a suitable replacement for use on the grid
has not been deployed.
\item We recommend against consideration of LFC. It will soon have no
LHC users and will be deprecated.
{\bf Complete.} LFC is no longer being considered.
\item We recommend TagFS be examined as a possible file metadata
catalog solution.
{\bf Open.} We have not yet had a need for the TagFS service (event
distribution based on tags), and work has not started.
\end{enumerate}
\section{Director's Review of 12~GeV Software and Computing -- February~10--11,~2015}
\begin{center}\tt
https://www.jlab.org/indico/conferenceDisplay.py?confId=93
\end{center}
Committee Recommendations:
\begin{enumerate}
\item It seems that some combination of code analysis tools such as
cppcheck and valgrind are being used by all experiments. The applied
tools should be unified to some extent to capture a larger phase
space of potential programs, such as using clang's scan-build
feature. It would be beneficial if a professional code analysis tool
such as Coverity were licensed and made centrally available.
{\bf Open.} We are still exploring options. (Typical invocations of
the tools named above are sketched after this list.)
\item Those groups that have not yet set up nightly rebuilds should do
so, and flag the checked-in code that caused the rebuild to fail.
{\bf Complete.} Hall D does regular nightly builds.
\item Clarify for the users the role of time stamps and run
numbers. Unless the condition is varying too rapidly, we recommend
using run numbers as a primary key for constants. Treat the time as
secondary information to be stored with the collection of
constants.
{\bf Complete.} Our Run Conditions Database (RCDB) supports marking
data items with both run numbers and time stamps; either or both can
be used. (A hypothetical illustration of this keying scheme appears
after this list.)
\item Explore the use of Analysis Trains in collaboration with GlueX
so the technology is in place once the data becomes available.
{\bf Open.} We are doing regular (bi-weekly) reconstruction passes
on the commissioning data we have taken and are planning a
calibration train. We hope to leverage this experience for regular
data reconstruction in the future.
\item Establish milestones for the migration to Geant4, prioritized
appropriately considering other activities and the needs of physics
running, and identify more manpower to complete the milestones.
{\bf Open.} A team has started work and is reporting progress
regularly to the Offline Software Working Group.
\item Establish a strategy and timescale for meeting data
management/cataloging needs, exploring whether common tools can be
part of the strategy.
{\bf Open.} Planning is still in the early stages.
\item Raise the priority of investigating and tracking performance
problems with profiling tools. The current choice of valgrind is
heavy. Consider using a sampling profiler, and even better, consult
with the HPC staff to both borrow a licensed commercial tool and get
help in understanding the results.
{\bf Open.} This area needs more attention at present. (A minimal
sampling-profiler example appears after this list.)
\item Explore, ideally in collaboration with Hall B, the use of
Analysis Trains which have become the backbone of user data analysis
at other facilities. Even if the current data sets are small enough
to be kept disk-resident entirely, this is likely to change in the
future. Trains are ideal to make the best use of scarce resources,
such as tape bandwidth. Assign a person to be responsible for the
maintenance of train-managed data sets.
{\bf Open.} See response to a previous item.
\item As you move from the era of data challenges to that of data
taking you should transition the people you have operating the
challenges to a computing operations group that is responsible for
both the reconstruction of collected data and the creation of Monte
Carlo samples for analysis. If you decide that analysis trains are
useful, the computing operations group would also ensure that the
coordination and services required are available.
{\bf Complete.} The transition is in progress now that we have
commissioning data. Many of the personnel devoted to the data
challenges are now working on repeated analysis of the data in hand.
\end{enumerate}
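\medskip\noindent
Regarding item 1 above: for reference, typical invocations of the
static-analysis tools named there look like the lines below; the
source directory and build command are placeholders.
\begin{verbatim}
cppcheck --enable=warning,performance src/   # static analysis of the sources
scan-build make                              # clang analyzer wrapped around a build
\end{verbatim}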
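\medskip\noindent
Regarding item 3: the fragment below is a hypothetical illustration
of run-number-primary keying, {\em not} the actual RCDB interface;
the names ConditionsDB, ConstantSet, and get\_constants are invented
for the example.
\begin{verbatim}
// Hypothetical sketch of run-number-primary keying (not the RCDB API).
// Constants are looked up by run number; the time stamp is stored
// alongside the constants rather than used as the lookup key.
#include <ctime>
#include <map>
#include <vector>

struct ConstantSet {
    std::vector<double> values;  // the constants themselves
    std::time_t recorded_at;     // secondary info: when they were recorded
};

// Run number is the primary key.
using ConditionsDB = std::map<int, ConstantSet>;

const ConstantSet& get_constants(const ConditionsDB& db, int run) {
    return db.at(run);           // the run number alone identifies the set
}

int main() {
    ConditionsDB db;
    db[1234] = ConstantSet{{1.0, 2.0, 3.0}, std::time(nullptr)};
    return get_constants(db, 1234).values.size() == 3 ? 0 : 1;
}
\end{verbatim}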
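\medskip\noindent
Regarding item 7: on Linux, one widely available sampling profiler is
perf. A minimal session might look like the lines below; the program
name and input file are placeholders.
\begin{verbatim}
perf record -g ./hd_root run001234.evio   # sample call stacks during the job
perf report                               # browse the hottest call paths
\end{verbatim}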
\end{document}