
Developer Overview

Brian "Moses" Hall edited this page Apr 13, 2020 · 1 revision

Locations

Development is hosted on punch.umdl.umich.edu alongside other HathiTrust software. Production and Training are hosted at babel.hathitrust.org/crms.

Locations of interest:

bin/ - cron scripts and configuration files
cgi/ - CRMS-specific Perl modules and page templates
prep/ - log files and `.rights` export files in development, no longer much used
web/ - images, JavaScripts, and CSS assets

CRMS.pm

Most core functionality lives in the Perl module cgi/CRMS.pm, which implements the CRMS Perl object. This object is available to all of the CGI and other scripts that make up the system.
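
To make the shared-object idea concrete, here is a minimal, self-contained sketch of a procedural helper module that receives the CRMS object as $self and calls back into it. The CRMS package below is only a stand-in, and the method GetConfigValue, the routine DescribeVolume, and the volume id are hypothetical; the real constructor and methods in cgi/CRMS.pm differ.

```perl
use strict;
use warnings;

# Stand-in for the CRMS class in cgi/CRMS.pm (hypothetical; the real
# constructor and methods differ).
package CRMS;

sub new {
  my ($class, %args) = @_;
  return bless {%args}, $class;
}

# Hypothetical core method that helper modules call back into.
sub GetConfigValue {
  my ($self, $name) = @_;
  return $self->{config}{$name};
}

# Hypothetical procedural module in the style of cgi/Metadata.pm: no class
# of its own, just subroutines whose first argument is the CRMS object.
package Metadata;

sub DescribeVolume {
  my ($self, $id) = @_;
  # $self is the CRMS object, so core functionality is one method call away.
  my $host = $self->GetConfigValue('host') // 'unknown';
  return "$id on $host";
}

package main;

my $crms = CRMS->new(config => { host => 'babel.hathitrust.org' });
print Metadata::DescribeVolume($crms, 'mdp.39015012345678'), "\n";
```

Because the helper never stores its own copy of the object, every module sees the same database handle and configuration that the calling script set up.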

Most other modules in cgi/ (such as cgi/Metadata.pm and the project modules in cgi/Project/) are passed a reference to the CRMS object as $self, so that they can invoke core functionality in CRMS.pm simply by calling $self->SomeMethod(), as if the module's subroutines were additional methods of the CRMS object. Almost all of these modules are simply collections of procedural routines with no class affiliation.

Web Interface

CRMS uses Template Toolkit to embed suitably marked-up Perl inside HTML templates. The base URL points at the CGI script crms/cgi/crms, which in most cases loads the template file (suffix .tt) named by the page parameter p=. For example, the URL for the Review page is https://babel.hathitrust.org/crms/cgi/crms?p=review, which runs the crms/cgi/crms script and instructs it to load the template file crms/cgi/review.tt. If there is no page parameter, CRMS loads the landing page home.tt.
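
A minimal sketch of how a p= value might be turned into a template name, including the fallback to home.tt described above. The function name TemplateForPage is hypothetical, and the check that restricts p= to word characters is an added assumption (a common guard against path traversal), not something this page says crms/cgi/crms does.

```perl
use strict;
use warnings;

# Hypothetical mapping from the p= page parameter to a template in cgi/.
sub TemplateForPage {
  my ($page) = @_;
  # No page parameter: load the landing page.
  return 'home.tt' if !defined $page || $page eq '';
  # Assumed guard: only plain identifiers, so p= cannot name a path.
  return unless $page =~ /\A\w+\z/;
  return "$page.tt";
}

print TemplateForPage('review'), "\n";   # prints review.tt
print TemplateForPage(undef), "\n";      # prints home.tt
```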

Some page parameters do not correspond to actual .tt files but are instead processed inside crms/cgi/crms. For example, p=confirmReview tells crms/cgi/crms to store a user's review and load the Review page with a new volume from the queue; there is no confirmReview.tt file. p=confirmReview is handled as a special case because the choice of destination page depends on the form parameters (Submit to keep reviewing, Cancel to go to the home page).
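
The confirmReview special case can be sketched as a small dispatch routine. This is a hypothetical illustration: the parameter name cancel, the routine DestinationAfterConfirm, and the callback used to store the review are assumptions, and so is the choice that Cancel discards the review rather than saving it.

```perl
use strict;
use warnings;

# Hypothetical sketch of the p=confirmReview special case. It has no template
# of its own; the page to show next depends on which button was pressed.
# Assumptions: a 'cancel' parameter marks the Cancel button, and Cancel
# discards the review instead of storing it.
sub DestinationAfterConfirm {
  my ($params, $store_review) = @_;
  # Cancel: skip saving and return to the landing page.
  return 'home' if $params->{cancel};
  # Submit: store the review, then show the Review page with a new volume.
  $store_review->($params);
  return 'review';
}

my @stored;
my $page = DestinationAfterConfirm({ submit => 1 }, sub { push @stored, $_[0] });
print "$page\n";   # prints review
```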

Background Processing

Overnight processing is handled by the cron job bin/overnight.pl. The script runs twice each night: the first run does everything except load candidates, because the HathiTrust Bib API is not populated until after the first run completes; candidates are loaded by the second run.

This and other command-line scripts are documented separately.

CRMS-US and CRMS-World via SYS

The notions of CRMS-US vs. CRMS-World, SYS, and the -x SYS command-line parameter are all obsolete. There is now a single system, but remnants of World-specific behavior may still lurk in the codebase and databases.

Training Area

The CRMS training area is a production environment with a separate database used to train reviewers. It is also used for demonstration when it is desirable to be able to submit a review without "polluting" the production system with bogus information. The site is at https://crms-training.babel.hathitrust.org/crms/cgi/crms. Its cron job for nightly processing is disabled unless there is an ongoing training session, and it does not have a cron job for updating candidates. The script bin/training.pl is run when necessary to populate the site with reviews from production.

Projects

Projects can inherit from the cgi/Project.pm class. See the various examples in cgi/Project/.

Projects have three broad responsibilities: evaluating candidacy (EvaluateCandidacy()), processing CGI parameters from a user's review (ValidateSubmission() and ExtractReviewData()), and displaying a review interface (ReviewPartials()).
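
A skeleton of a project subclass, showing the four methods named above. It is a sketch under stated assumptions: the tiny Project base class stands in for cgi/Project.pm, CGI parameters are modeled as a plain hash reference, and the method signatures, return values, the pre-1930 candidacy rule, and the partial names are all hypothetical.

```perl
use strict;
use warnings;

# Minimal stand-in for cgi/Project.pm (hypothetical).
package Project;

sub new { my ($class, %args) = @_; return bless {%args}, $class; }
# Defaults: accept nothing and show nothing unless a subclass overrides.
sub EvaluateCandidacy { return 0; }
sub ReviewPartials    { return (); }

# Hypothetical project subclass.
package Project::Example;
use parent -norequire, 'Project';

# Candidacy: accept only volumes published before 1930 (illustrative rule).
sub EvaluateCandidacy {
  my ($self, $record) = @_;
  return (defined $record->{pub_date} && $record->{pub_date} < 1930) ? 1 : 0;
}

# Check the reviewer's CGI parameters; return an error message, or undef if OK.
sub ValidateSubmission {
  my ($self, $cgi) = @_;
  return 'You must select a rights attribute.' unless $cgi->{rights};
  return;
}

# Pull out the data to be stored with the review.
sub ExtractReviewData {
  my ($self, $cgi) = @_;
  return { note => $cgi->{note} // '' };
}

# Templates from cgi/partial/ for the Review page's operational pane.
sub ReviewPartials {
  return qw(top.tt bibdata.tt rights_form.tt);   # hypothetical names
}

package main;
```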

Partials

The Review page template review.tt uses a partials system to populate its operational pane with resources and controls appropriate to the project's scope. The project's ReviewPartials() method is expected to return a list of templates (from cgi/partial/) to be displayed. Taken together, these partials must supply the CGI parameters for the reviewer's submission so that the project subclass can extract the relevant data to store in the database; typically, therefore, one of the partials will be an HTML form.

Production and Training via CRMS_INSTANCE

Production and training instances are both first-class CRMS systems running the same codebase. A&E URL rewrite rules set the environment variable CRMS_INSTANCE to either production or crms-training. CRMS.pm uses this value to determine which database to connect to, and for a few other purposes. From the command line, the -p flag signals a run against the production database. Even cron jobs that run on production servers need this flag; we don't want the software guessing whether it is running in production.

Databases (DB name and server alias)

development: crms on mysql-htdev
production: crms on mysql-sdr
training: crms_training on mysql-sdr
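
A sketch of selecting a database from CRMS_INSTANCE (web) or the -p flag (command line), using the database names and server aliases from the list above. The functions ConnectionFor and InstanceFromArgs are hypothetical, and treating an unset or unrecognized instance as development is an assumption.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# DB names and server aliases per instance (from the list above).
my %DATABASES = (
  development => { name => 'crms',          host => 'mysql-htdev' },
  production  => { name => 'crms',          host => 'mysql-sdr' },
  training    => { name => 'crms_training', host => 'mysql-sdr' },
);

# Hypothetical: map a CRMS_INSTANCE value to connection details.
sub ConnectionFor {
  my ($instance) = @_;
  $instance //= '';
  return $DATABASES{production} if $instance eq 'production';
  return $DATABASES{training}   if $instance eq 'crms-training';
  # Assumption: anything else, including an unset variable, means development.
  return $DATABASES{development};
}

# Hypothetical: production only when -p is passed explicitly, never guessed.
sub InstanceFromArgs {
  my (@args) = @_;
  my $production = 0;
  GetOptionsFromArray(\@args, 'p' => \$production) or die "bad options\n";
  return $production ? 'production' : undef;
}
```

In a web request this might be called as ConnectionFor($ENV{CRMS_INSTANCE}); in a command-line script, as ConnectionFor(InstanceFromArgs(@ARGV)), so that the production database is chosen only when -p is given.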

Staging and Deploying to Production

On Punch/Grog:

/htapps/test.babel/mdp-tools/scripts/stage-app crms
/htapps/test.babel/mdp-tools/scripts/deploy-app crms

Debugging

Most core CRMS algorithms can be tested by calling them from a Perl script on the development server; some (like GetNextItemForReview()) accept additional debugging parameters not intended for use from the CGI environment. Most CRMS debugging has historically been done by "printf debugging."

To inspect MySQL activity in the web app, add the URL parameters debugSql=1 and debugVar=1 to display all database queries executed by the page. Calls to [% crms.Debug() %] within Template code emit the accumulated SQL queries and parameters, and dump variables of interest.
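
The debugSql idea, accumulating queries and their parameters and rendering them on request, can be sketched as a small logging class. The QueryLog package and its methods are hypothetical and only illustrate the pattern; they are not CRMS's actual Debug() implementation, and the SQL is illustrative.

```perl
use strict;
use warnings;

# Hypothetical query log: records SQL and bind parameters while enabled.
package QueryLog;

sub new {
  my ($class, %args) = @_;
  return bless { enabled => $args{enabled} ? 1 : 0, entries => [] }, $class;
}

# Record one query; a no-op unless debugging is enabled.
sub Record {
  my ($self, $sql, @params) = @_;
  return unless $self->{enabled};
  push @{ $self->{entries} }, { sql => $sql, params => [@params] };
  return;
}

# Render the accumulated queries, one per line, with quoted parameters.
sub Debug {
  my ($self) = @_;
  my @lines;
  for my $e (@{ $self->{entries} }) {
    my $p = join ', ', map { defined $_ ? "'$_'" : 'NULL' } @{ $e->{params} };
    push @lines, "$e->{sql} [$p]";
  }
  return join "\n", @lines;
}

package main;

# e.g. enabled when debugSql=1 appears in the URL
my $log = QueryLog->new(enabled => 1);
$log->Record('SELECT * FROM queue WHERE id=?', 'mdp.39015012345678');
print $log->Debug(), "\n";
```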