This repository contains a sample of the graphs I've produced throughout my career.
This figure plots hazard ratios from an analysis of the effect of age at spay/neuter on the risk of subsequent weight gain in dogs. The work is part of the data analysis consulting I've done for the Alliance for Contraception in Cats and Dogs. (Code for this project can be found at https://github.com/jsahrma/accd-oo.) The model underlying the analyses contains several interaction terms, which makes it impossible to interpret any single coefficient on its own. Rather, effects are contingent on a combination of age, sex, and breed size. To ease interpretation, I created effect plots that display hazard ratios for a wide array of characteristics.
The graph below was constructed so that within sex and breed size class, a dog at one year of age is used as the reference. Points above 1.0 on the
The decision whether and when to spay/neuter a dog requires balancing many different factors; these results provide additional information on just one aspect of that decision.
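To make the role of the interaction terms concrete, here is a minimal sketch of how a hazard ratio relative to a one-year-old reference dog can be computed from a model with age-by-sex and age-by-size interactions. It is written in Python for illustration and is not the project's code. The coefficient values, the two-level breed size variable, and the function names are all hypothetical; the real model has its own terms and estimates.

```python
import math

# Hypothetical log-hazard coefficients (NOT estimates from the actual analysis).
COEF = {
    "age": -0.10,        # per year of age at spay/neuter
    "male": 0.20,
    "large": 0.15,
    "age:male": 0.03,    # interaction: age effect differs by sex
    "age:large": -0.05,  # interaction: age effect differs by breed size
}


def log_hazard(age, male, large, coef=COEF):
    """Linear predictor for one covariate profile (the baseline hazard cancels in ratios)."""
    return (
        coef["age"] * age
        + coef["male"] * male
        + coef["large"] * large
        + coef["age:male"] * age * male
        + coef["age:large"] * age * large
    )


def hazard_ratio(age, male, large, ref_age=1.0, coef=COEF):
    """Hazard ratio for a dog at `age` versus `ref_age`, within the same sex and size class."""
    return math.exp(
        log_hazard(age, male, large, coef) - log_hazard(ref_age, male, large, coef)
    )


# The same age contrast (3 years vs. 1 year) gives a different ratio in each subgroup.
for male in (0, 1):
    for large in (0, 1):
        print(f"male={male} large={large} HR={hazard_ratio(3.0, male, large):.3f}")
```

Because the sex and size main effects cancel within a subgroup, only the age coefficient and the relevant interaction terms drive each ratio, which is why a single coefficient cannot be read off in isolation.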
This plot was made as part of a (currently unpublished) project investigating the timing of childhood vaccinations, and in particular patterns of delay (or outright skipping) of different vaccines, in a sample of privately insured children. The data were deidentified and a small amount of noise was added to the vaccination dates prior to plotting.
The motivation for this graph was to explore how vaccination histories could be used to cluster patients into groups. It uses daily indicator variables for presence/absence of each vaccine in the first year of life. If patients likely to delay or skip vaccines could be identified using early vaccine histories, timely reminders could be provided or outreach efforts used to address concerns of parents.
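As a rough illustration of the idea, the sketch below builds a daily indicator vector for one vaccine and compares two histories by the share of days on which they disagree. Several things here are assumptions rather than details of the project: that an indicator switches from 0 to 1 on the day of the first dose and stays at 1, that histories are compared with a simple mismatch fraction, and the function names themselves. The actual clustering method is not described here.

```python
def daily_indicators(first_dose_day, start_day=30, end_day=365):
    """One 0/1 flag per day of life from start_day through end_day.

    Assumption: the flag is 0 until the first dose is received and 1 from
    that day onward. Pass None for a child who never received the vaccine.
    """
    return [
        int(first_dose_day is not None and day >= first_dose_day)
        for day in range(start_day, end_day + 1)
    ]


def mismatch_fraction(a, b):
    """Share of days on which two equal-length histories disagree."""
    if len(a) != len(b):
        raise ValueError("histories must cover the same days")
    return sum(x != y for x, y in zip(a, b)) / len(a)


# Hypothetical children: dose at about 2 months, at about 6 months, and never.
on_time = daily_indicators(60)
delayed = daily_indicators(180)
skipped = daily_indicators(None)

for label, history in [("delayed", delayed), ("skipped", skipped)]:
    print(label, round(mismatch_fraction(on_time, history), 3))
```

A pairwise distance like this could feed a standard clustering routine. The sketch does not handle missing days (for example, after insurance disenrollment), which a real analysis would need to address.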
Patient histories are shown on the horizontal axis; presence/absence of each vaccine from 30 days to one year of life is shown on the vertical axis. Blue regions represent absence of the vaccine, orange regions represent presence of the vaccine, and white regions represent missing data (i.e., insurance disenrollment). The size of the data set makes direct visual inspection of the clusters difficult, so this plot contains a random sample of 1,000 patient histories (with the data set in /ecv/data/ itself being a 1% sample of the complete analytic file). Even so, we can see a cluster of patients with a large proportion of their histories colored blue, indicating delay or skipping of recommended vaccinations.
This map was prepared as an example for the course "Using Administrative Data for Health Services Research" at the Washington University in St. Louis School of Medicine. The goal was to demonstrate data set merging and basic mapping using county-level opioid-related admissions for the state of Florida in 2019.
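The merging step can be sketched as a left join of two county-level tables on a shared county code, followed by a rate calculation. This is a Python illustration only, not the course material. The field names, county codes, counts, and populations are invented for the example and are not real Florida data.

```python
# Invented county-level inputs keyed by a county code (not real data).
admissions = [
    {"fips": "12001", "admissions": 120},
    {"fips": "12003", "admissions": 15},
]
population = [
    {"fips": "12001", "population": 250_000},
    {"fips": "12003", "population": 30_000},
    {"fips": "12005", "population": 175_000},  # no admissions record
]


def left_join(left, right, key):
    """Keep every row of `left`, adding the fields of the matching `right` row if any."""
    lookup = {row[key]: row for row in right}
    return [{**row, **lookup.get(row[key], {})} for row in left]


def add_rate(rows, per=100_000):
    """Add admissions per `per` residents; None where the county had no match."""
    out = []
    for row in rows:
        count = row.get("admissions")
        rate = None if count is None else per * count / row["population"]
        out.append({**row, "rate": rate})
    return out


merged = add_rate(left_join(population, admissions, "fips"))
for row in merged:
    print(row["fips"], row["rate"])
```

Starting the join from the population table keeps counties with no admissions record in the result, so they can be drawn on a map as missing rather than silently dropped.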
The course teaches students at the School of Medicine the basics of working with administrative healthcare data. A major part of that is learning the programming skills needed to open, manipulate, and analyze data sources. The course transitioned from SAS to R in the Spring 2022 semester, and this example was designed for a module on special topics. Updated course materials are currently being migrated to this repository.
Note: This is not meant as a rigorous or authoritative analysis of this issue.
The forest plot below summarizes the results of an extensive set of analyses comparing safety outcomes in a sample of privately insured children across the U.S. diagnosed with common bacterial and viral infections. (Results of the full study were published in the journal JAMA Network Open in May 2022.) Patients were grouped into cohorts based on the type of infection diagnosed, and treatment was defined as appropriate (guideline-recommended therapy for bacterial infection cohorts and no antibiotic for viral infection cohorts) or inappropriate (non-guideline-recommended therapy for bacterial infection cohorts and any antibiotic for viral infection cohorts). Weights derived from propensity scores were used to achieve exchangeability between treatment groups within each cohort. Cox proportional hazards models were then used to compare risk of each safety outcome following initial treatment.
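As an illustration of how propensity score weights can balance treatment groups, the sketch below uses inverse probability of treatment weights, one common choice. The specific weighting scheme used in the study is not stated here, so this should be read as an assumption, and the toy data, propensity values, and function names are hypothetical.

```python
def iptw_weights(treated, propensity):
    """Inverse probability of treatment weights: 1/p if treated, 1/(1-p) otherwise."""
    return [
        1.0 / p if t else 1.0 / (1.0 - p)
        for t, p in zip(treated, propensity)
    ]


def group_mean(values, weights, treated, group):
    """Weighted mean of `values` among rows whose treatment flag equals `group`."""
    rows = [(v, w) for v, w, t in zip(values, weights, treated) if t == group]
    return sum(v * w for v, w in rows) / sum(w for _, w in rows)


# Toy data: a binary covariate x that strongly predicts treatment.
x = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
treated = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]
propensity = [0.8] * 5 + [0.2] * 5  # P(treated | x), known exactly in this toy example

weights = iptw_weights(treated, propensity)
unweighted = [1.0] * len(x)

for label, w in [("unweighted", unweighted), ("weighted", weights)]:
    m1 = group_mean(x, w, treated, 1)
    m0 = group_mean(x, w, treated, 0)
    print(f"{label}: mean x treated={m1:.2f}, untreated={m0:.2f}")
```

Before weighting, the covariate is distributed very differently in the two groups; after weighting, both groups have the same mean. In an analysis like the one described, weights of this kind would then be supplied to the outcome model.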
Much of the work in creating the plot is encapsulated by the forestplot function defined in a package of the same name, although considerable work is required to prepare the text and estimates for plotting. Other adornments were added using the lower-level functions provided by the grid package. Text annotations were added to the top of the plot to provide context for the event rate calculations shown in the middle two columns and to indicate which regions of the forest plot represent a protective effect of appropriate or inappropriate treatment. Gray backgrounds were added to help visually distinguish between the bacterial infection and viral infection cohorts.