\documentclass[12pt]{article}
\paperwidth=8.5in
\paperheight=11in
\textwidth=6.0in
\oddsidemargin=0.25in % use built-in offset of 1 inch for left margin
\evensidemargin=0.25in % ditto for even pages
\textheight=8.5in
\topmargin=0in % use built-in offset of 1 inch
\headheight=0in % no headers in this document
\headsep=0in % no headers in this document
\begin{document}
\begin{center}
{\bf\Large Status of Hall D \\ Responses to Recommendations \\ from the 12~GeV Software and Computing Reviews} \\
\large
\medskip
Mark Ito, David Lawrence, Curtis Meyer \\
\medskip
July 10, 2015 \\
\end{center}
\section{Director's Review of 12~GeV Software and Computing -- June~7--8,~2012}
\begin{center}\tt
https://www.jlab.org/indico/conferenceDisplay.py?confId=4
\end{center}
Committee Recommendations:
\begin{enumerate}
\item Presentations in future reviews should cover end user
utilization of and experience with the software in more
detail. Talks from end users on usage experience with the software
and analysis infrastructure would be beneficial.
{\bf Complete.} At the following review (November 2013) we
reported on the analysis workshop held in July 2013 and had a
presentation from Justin Stevens (then a postdoc at MIT) on the user
experience.
\item Once a modest all-way data path is established, plan a mock data
challenge with fake data, in particular with nominal data rates from
GlueX.
{\bf Complete.} We have since completed two data challenges and are
now analyzing real data on a regular basis.
\item Nightly builds are performed by some; we recommend them for all.
{\bf Complete.} At the time of the review we had already been doing
nightly builds for some years and we have continued to do so.
\item Evaluate standard code evaluation tools, such as valgrind,
clang's scan-build, cppcheck, Gooda for inclusion in the software
development cycle. We suggest looking at an Insure++ license as
well.
{\bf Open.} Work has started on a regular valgrind suite but is not
yet complete. (A sketch of such a check appears after this list.)
\item Run a code validation suite such as valgrind as part of the
routine software release procedure.
{\bf Open.} See response to the previous item.
\item Give full and early consideration to file management, cataloging
and data discovery by physicists doing analysis. Report on this area
in future reviews.
{\bf Open.} This effort is still in the design stage.
\item Investigate the feasibility of event-based parallelization of
C++ analysis in a multi-core batch environment.
{\bf Complete.} We have been using the JANA framework for
reconstruction and data analysis; it is multi-threaded by design. (A
generic sketch of this event-based pattern appears after this list.)
\item Intensify efforts on the HRS tracking development, including
calibration and alignment procedures. Define performance milestones
which allow time to explore alternatives if problems arise.
{\bf Not applicable to Hall D.}
\item Study the SBS track reconstruction algorithm efficiency under
higher background conditions. It would be useful to know at what
level of background the existing algorithm stops functioning.
{\bf Not applicable to Hall D.}
\item Develop requirements for the SBS algorithm performance, along
with a development timeline and a responsible contact. Requirements
should include alignment and calibration.
{\bf Not applicable to Hall D.}
\item A series of scaling tests ramping up using the LQCD farm should
be planned and undertaken. Tests should begin soon; don't wait for
completion of the software 18 months before startup.
{\bf Complete.} LQCD farm nodes were used as part of the computing
resources for the data challenges.
\item Seriously consider using ROOT as the file format in order to
make use of the steady advances in its I/O capabilities.
{\bf Obsolete.} We have been using the Hall D Data Model (HDDM)
format for reconstructed results and are happy with its performance.
\item The costs and sustainability of supporting two languages,
relative to the advantages, should be regularly assessed as the
community of users grows, code development practices become clearer,
the framework matures further, etc.
{\bf Not applicable to Hall D.} Our code base is exclusively C++.
\item With the somewhat aggressive schedule leading up to December
2013, make sure to engage a reasonable number of early adopters to
stress test the new framework.
{\bf Complete.} The framework has been stressed both in the sense
that (a) a large fraction of the collaboration have been using it
for data reconstruction and analysis and (b) its large-scale
performance has been tested in data challenges.
\item Re-use existing efforts from Hall A to decode CODA-formatted
data in ROOT.
{\bf Open.}
\item If resources are limited, the Fortran-based SHMS reconstruction
should be a low priority.
{\bf Not applicable to Hall D.}
\item While we encourage the move to git as a code management system,
be sure not to underestimate the extent of the paradigm
shift. Identify a workflow model for your use of git. Communicate
clearly the new paradigm (easy branching, no central repository,
etc.). Set up (or link to) tutorials for users with a mapping of
routine CVS tasks to their git equivalents (such as cvs diff,
etc.). Document or link to documentation for standard git tasks
without obvious equivalent in CVS or SVN, such as git rebase, or
bisect.
{\bf Open.} We are in the process of converting from Subversion to
Git; the switch-over is scheduled for July 15, 2015. (A starting
point for the requested command mapping appears after this list.)
\item A series of scale tests ramping up using JLab's LQCD farm should
be planned and conducted.
{\bf Repeat of a previous item.}
\item The data volume and processing scale of GlueX is substantial but
plans for data management and workload management systems supporting
the operational scale were not made clear. They should be carefully
developed.
{\bf Open.} We have a design for these areas, but tools are still
being developed and tested.
\item Consider ROOT (with its schema evolution capabilities) as a
possible alternative for the HDDM DST format.
{\bf Complete.} We have decided to stay with HDDM.
\item To ensure a smooth transition from development and deployment to
operations, particularly for Halls B and D, an explicitly planned
program of data challenges, directed both at exercising the
performance of the full analysis chain and at exercising the scaling
behavior and effectiveness of the computing model at scales
progressively closer to operating scale, is recommended. We heard
more explicit plans from Hall D than from Hall B in this
respect. This data challenge program should be underway now, and
should not await the full completion of the offline software.
{\bf Complete.} See previous responses.
\item To ensure a smooth transition from development and deployment to
operations...
{\bf Repeat of previous item.}
\item In response to the question as to how the computing budget is
scrubbed, the answer received was that scrubbing happens through
this review. This review hasn't examined the requirements and
associated budget sufficiently for this to be considered a
scrubbing. Also it is not clear that an overall optimization of the
computing models, associated resource requirements, and required
budget levels has been done. A process should exist whereby this
optimization takes place. For example, are the relative roles of disk
and tape optimal for making analysis as effective as possible,
within budgetary constraints?
{\bf Open.} We have an informal process, but need to develop a system for revising estimates as we go forward with the program.
\item The measures being planned to render LQCD resources usable by
the 12 GeV community should have high priority.
{\bf Complete.} See previous responses.
\end{enumerate}
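\medskip\noindent
Regarding items 4 and 5 above: a nightly valgrind check could be as
simple as the fragment below. This is a minimal sketch, not the
actual suite under development; the program name and input file are
placeholders.
\begin{verbatim}
# Hypothetical nightly memory check (program and input are placeholders).
valgrind --tool=memcheck --leak-check=full --error-exitcode=1 \
         ./hd_root test_events.evio
\end{verbatim}
The non-zero exit code on error would let the nightly-build scripts
flag a failing run automatically.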
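\medskip\noindent
Regarding item 7: the fragment below is {\em not} the JANA API; it is
a generic C++ sketch of the pattern JANA implements, namely worker
threads pulling events from a shared queue and processing each event
independently.
\begin{verbatim}
// Generic sketch of event-based parallelism (not the JANA API).
#include <algorithm>
#include <iostream>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

int main() {
    std::queue<int> events;                  // stand-in for an event source
    for (int i = 0; i < 1000; ++i) events.push(i);

    std::mutex m;                            // guards the shared queue
    long processed = 0;

    auto worker = [&]() {
        for (;;) {
            int evt;
            {
                std::lock_guard<std::mutex> lock(m);
                if (events.empty()) return;  // no more events: thread exits
                evt = events.front();
                events.pop();
            }
            // "Reconstruction" placeholder: events are independent, so
            // no locking is needed while one is being processed.
            volatile double x = 0;
            for (int k = 0; k < 1000; ++k) x += evt * 0.5;
            std::lock_guard<std::mutex> lock(m);
            ++processed;
        }
    };

    unsigned n = std::max(1u, std::thread::hardware_concurrency());
    std::vector<std::thread> pool;
    for (unsigned i = 0; i < n; ++i) pool.emplace_back(worker);
    for (auto& t : pool) t.join();

    std::cout << "processed " << processed << " events\n";
    return 0;
}
\end{verbatim}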
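\medskip\noindent
Regarding item 17: one possible starting point for the requested
mapping of routine Subversion (or CVS) tasks to their Git equivalents
is sketched below; it covers only the most common commands.
\begin{verbatim}
svn checkout <url>    ->  git clone <url>
svn update            ->  git pull
svn diff              ->  git diff
svn add <file>        ->  git add <file>
svn commit            ->  git commit   (local; publish with git push)
svn log               ->  git log
svn blame             ->  git blame
\end{verbatim}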
\section{Director's Review of 12~GeV Software and Computing -- November~25--26,~2013}
\begin{center}\tt
https://www.jlab.org/indico/conferenceDisplay.py?confId=55
\end{center}
Committee Recommendations for Hall D:
\begin{enumerate}
\item Event tagging in the HLT is recommended as a mechanism for
separating calibration data samples into streams for use in a prompt
calibration loop.
{\bf Open.} The online group is planning a facility for event tagging.
\item We recommend against consideration of SRM. The LHC grid
community is moving away from it as a heavy and expendable layer.
{\bf Obsolete.} To date, a suitable replacement for use on the grid
has not been deployed.
\item We recommend against consideration of LFC. It will soon have no
LHC users and will be deprecated.
{\bf Complete.} LFC is no longer being considered.
\item We recommend TagFS be examined as a possible file metadata
catalog solution.
{\bf Open.} We have not yet had a need for the TagFS service (event
distribution based on tags), and work has not started.
\end{enumerate}
\section{Director's Review of 12~GeV Software and Computing -- February~10--11,~2015}
\begin{center}\tt
https://www.jlab.org/indico/conferenceDisplay.py?confId=93
\end{center}
Committee Recommendations:
\begin{enumerate}
\item It seems that some combination of code analysis tools such as
cppcheck and valgrind are being used by all experiments. The applied
tools should be unified to some extent to capture a larger phase
space of potential programs, such as using clang's scan-build
feature. It would be beneficial if a professional code analysis tool
such as Coverity were licensed and made centrally available.
{\bf Open.} We are still exploring options. (Typical invocations of
the tools named above are sketched after this list.)
\item Those groups that have not yet set up nightly rebuilds should do
so, and flag the checked-in code that caused the rebuild to fail.
{\bf Complete.} Hall D does regular nightly builds.
\item Clarify for the users the role of time stamps and run
numbers. Unless the condition is varying too rapidly, we recommend
using run numbers as a primary key for constants. Treat the time as
secondary information to be stored with the collection of
constants.
{\bf Complete.} Our Run Conditions Database (RCDB) supports marking
data items with both run numbers and time stamps; either or both can
be used. (A hypothetical illustration of this keying scheme appears
after this list.)
\item Explore the use of Analysis Trains in collaboration with GlueX
so the technology is in place once the data becomes available.
{\bf Open.} We are doing regular (bi-weekly) reconstruction passes
on the commissioning data we have taken and are planning a
calibration train. We hope to leverage this experience for regular
data reconstruction in the future.
\item Establish milestones for the migration to Geant4, prioritized
appropriately considering other activities and the needs of physics
running, and identify more manpower to complete the milestones.
{\bf Open.} A team has started work and is reporting progress
regularly to the Offline Software Working Group.
\item Establish a strategy and timescale for meeting data
management/cataloging needs, exploring whether common tools can be
part of the strategy.
{\bf Open.} Planning is still in the early stages.
\item Raise the priority of investigating and tracking performance
problems with profiling tools. The current choice of valgrind is
heavy. Consider using a sampling profiler, and even better, consult
with the HPC staff to both borrow a licensed commercial tool and get
help in understanding the results.
{\bf Open.} This area needs more attention at present. (A minimal
sampling-profiler example appears after this list.)
\item Explore, ideally in collaboration with Hall B, the use of
Analysis Trains which have become the backbone of user data analysis
at other facilities. Even if the current data sets are small enough
to be kept disk-resident entirely, this is likely to change in the
future. Trains are ideal to make the best use of scarce resources,
such as tape bandwidth. Assign a person to be responsible for the
maintenance of train-managed data sets.
{\bf Open.} See response to a previous item.
\item As you move from the era of data challenges to that of data
taking you should transition the people you have operating the
challenges to a computing operations group that is responsible for
both the reconstruction of collected data and the creation of Monte
Carlo samples for analysis. If you decide that analysis trains are
useful, the computing operations group would also ensure that the
coordination and services required are available.
{\bf Complete.} The transition is in progress now that we have
commissioning data. Many of the personnel devoted to the data
challenges are now working on repeated analysis of the data in hand.
\end{enumerate}
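\medskip\noindent
Regarding item 1 above: for reference, typical invocations of the
static-analysis tools named there look like the lines below; the
source directory and build command are placeholders.
\begin{verbatim}
cppcheck --enable=warning,performance src/   # static analysis of the sources
scan-build make                              # clang analyzer wrapped around a build
\end{verbatim}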
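\medskip\noindent
Regarding item 3: the fragment below is a hypothetical illustration
of run-number-primary keying, {\em not} the actual RCDB interface;
the names ConditionsDB, ConstantSet, and get\_constants are invented
for the example.
\begin{verbatim}
// Hypothetical sketch of run-number-primary keying (not the RCDB API).
// Constants are looked up by run number; the time stamp is stored
// alongside the constants rather than used as the lookup key.
#include <ctime>
#include <map>
#include <vector>

struct ConstantSet {
    std::vector<double> values;  // the constants themselves
    std::time_t recorded_at;     // secondary info: when they were recorded
};

// Run number is the primary key.
using ConditionsDB = std::map<int, ConstantSet>;

const ConstantSet& get_constants(const ConditionsDB& db, int run) {
    return db.at(run);           // the run number alone identifies the set
}

int main() {
    ConditionsDB db;
    db[1234] = ConstantSet{{1.0, 2.0, 3.0}, std::time(nullptr)};
    return get_constants(db, 1234).values.size() == 3 ? 0 : 1;
}
\end{verbatim}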
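\medskip\noindent
Regarding item 7: on Linux, one widely available sampling profiler is
perf. A minimal session might look like the lines below; the program
name and input file are placeholders.
\begin{verbatim}
perf record -g ./hd_root run001234.evio   # sample call stacks during the job
perf report                               # browse the hottest call paths
\end{verbatim}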
\end{document}