NYUtriton

Authors: Eric K. Oermann, Anas Abidin

NYUtriton (pronounced “nutrition”) deploys NYUtron on Triton. It is the accompanying code repository for the paper "Health system scale language models are general purpose clinical prediction engines".

This repository includes code for standing up NYUtron with built-in pre-processing, as well as interfaces for FHIR and EPIC Nebula.

This README focuses on running the system within NYU.

We are currently using OLAB-1 as our production system, pending more hardware.

Requirements

nvidia-docker

Steps to start the system and deploy a model on OLAB-1

Copy over codebase and models

Log in to OLAB-1 and forward the following ports for the monitoring services:

ssh USERNAME@IP -L 9090:IP:9090 -L 3000:IP:3000
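Before opening the monitoring dashboards locally, it can help to confirm that the forwarded ports are accepting connections. The sketch below is not part of the repository; `port_is_open` is a hypothetical helper. Note that it only confirms the local end of each tunnel: ssh accepts a local connection even if nothing is listening on the remote side (9090 is the Prometheus default port).

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Hypothetical helper, not part of NYUtriton.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, timeout, unresolvable host, ...
        return False


if __name__ == "__main__":
    # 9090 and 3000 are the local ports forwarded by the ssh command above.
    for port in (9090, 3000):
        state = "accepting connections" if port_is_open("localhost", port) else "not reachable"
        print(f"localhost:{port} is {state}")
```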

As the root user, clone the NYUtriton repository and copy the models you want to deploy into ./hf_models. If the models come from Hugging Face, you will need git-lfs to fetch the model weight files.

PAT=<YOUR ACCESS TOKEN>
git clone https://${PAT}@github.com/nyuolab/NYUtriton.git

Convert to accelerators and stage

Convert each model into an accelerated inference format with the conversion script, convert_hf_to_trt.sh, supplying the model name and the model task (one of the Hugging Face model tasks). For example:

MODEL_NAME=nyutron_readmission
MODEL_TASK=sequence-classification
bash convert_hf_to_trt.sh ${MODEL_NAME} ${MODEL_TASK}

Next, run the deployment script, deploy_model.py, to stage each model you want to deploy:

MODEL_NAME=nyutron_readmission
MODEL_TYPE=onnx
python deploy_model.py --model_name ${MODEL_NAME} --model_type ${MODEL_TYPE}
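This README does not describe what deploy_model.py writes, but Triton itself expects each model to live in its own directory of the model repository, with a `config.pbtxt` and numbered version subdirectories (the ONNX Runtime backend looks for `model.onnx` by default). The sketch below is a hypothetical equivalent for a single ONNX file, not the repository's implementation. `stage_onnx_model` and its defaults are assumptions; the `<model_name>_<model_type>` directory name is inferred from the `nyutron_readmission_onnx` name used in the curl example. The minimal config leaves inputs and outputs for Triton to auto-complete, which in older releases requires starting tritonserver with `--strict-model-config=false`; a nonzero `max_batch_size` also assumes the model's first dimension is a dynamic batch dimension.

```python
import shutil
from pathlib import Path


def stage_onnx_model(onnx_file: str, repo: str, model_name: str,
                     model_type: str = "onnx", version: int = 1,
                     max_batch_size: int = 8) -> Path:
    """Copy an ONNX file into a Triton model-repository layout.

    Hypothetical helper; produces:
        <repo>/<model_name>_<model_type>/config.pbtxt
        <repo>/<model_name>_<model_type>/<version>/model.onnx
    """
    # Directory name mirrors the <model_name>_<model_type> pattern (an assumption).
    model_dir = Path(repo) / f"{model_name}_{model_type}"
    version_dir = model_dir / str(version)
    version_dir.mkdir(parents=True, exist_ok=True)

    # The ONNX Runtime backend loads <version>/model.onnx by default.
    shutil.copyfile(onnx_file, version_dir / "model.onnx")

    # Minimal config: input/output sections are omitted for auto-completion.
    config = (
        f'name: "{model_dir.name}"\n'
        'platform: "onnxruntime_onnx"\n'
        f"max_batch_size: {max_batch_size}\n"
    )
    (model_dir / "config.pbtxt").write_text(config)
    return model_dir
```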

Deploy and test

Now launch Triton in production mode with GPU device 1 assigned:

./start_triton_server.sh triton2202_nemo:latest "device=1" tritonmodelrepo --p #prod
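Triton can take a while to load models after the container starts. Assuming the HTTP endpoint is on port 8000, as in the query examples in this README, readiness can be polled at `/v2/health/ready`, which Triton's HTTP API answers with status 200 once the server is ready. `wait_for_triton` is a hypothetical helper, not part of the repository.

```python
import time
import urllib.request


def wait_for_triton(base_url: str = "http://localhost:8000",
                    timeout_s: float = 120.0, poll_s: float = 2.0) -> bool:
    """Poll Triton's readiness endpoint until it returns HTTP 200 or time runs out.

    Hypothetical helper; returns True if the server became ready in time.
    """
    url = f"{base_url.rstrip('/')}/v2/health/ready"
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=poll_s) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # URLError/HTTPError (e.g. 503 while loading) and connection errors
            # are all OSError subclasses: the server is not ready yet.
            pass
        time.sleep(poll_s)
    return False


if __name__ == "__main__":
    print("ready" if wait_for_triton() else "timed out waiting for Triton")
```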

You can build test queries with the serialize_txt.py script by specifying the port (-p), the string payload (-l), the model name (-m), and the model type (-t):

python serialize_txt.py -p 8000 -l "test payload is test payload" -m nyutron_readmission -t onnx
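For reference, Triton's HTTP API accepts JSON inference requests at `/v2/models/<model>/infer`, with strings sent as the `BYTES` datatype. The sketch below shows that request shape; it is not what serialize_txt.py does internally. The input tensor name `TEXT`, the `[batch, 1]` shape, and the helper names are hypothetical and must be changed to match the config of whichever model in tritonmodelrepo accepts raw text.

```python
import json
import urllib.request


def build_text_infer_request(texts, input_name="TEXT"):
    """Build a Triton/KServe v2 JSON inference body carrying strings as BYTES.

    `input_name` and the [batch, 1] shape are assumptions; they must match the model config.
    """
    return {
        "inputs": [
            {
                "name": input_name,
                "shape": [len(texts), 1],
                "datatype": "BYTES",
                "data": list(texts),
            }
        ]
    }


def infer(base_url, model, texts, input_name="TEXT"):
    """POST the request to /v2/models/<model>/infer and return the decoded JSON response."""
    body = json.dumps(build_text_infer_request(texts, input_name)).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url.rstrip('/')}/v2/models/{model}/infer",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

A call would look like `infer("http://localhost:8000", "<text-model-name>", ["test payload is test payload"])`, where the model name is a placeholder.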

or inspect the model configuration through Triton's REST API:

curl -v IP:8000/v2/models/nyutron_readmission_onnx/config
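The same endpoint can be read from Python to list a model's input and output tensors. This is a sketch with hypothetical helpers (`fetch_model_config`, `summarize_config`). It reads the `input`/`output` lists that Triton's model-config JSON contains, and converts `dims` with `int()` in case they arrive as strings.

```python
import json
import urllib.request


def fetch_model_config(base_url, model):
    """GET /v2/models/<model>/config and return the decoded JSON (hypothetical helper)."""
    with urllib.request.urlopen(f"{base_url.rstrip('/')}/v2/models/{model}/config") as resp:
        return json.load(resp)


def summarize_config(cfg):
    """Return (name, platform_or_backend, rows), one row per tensor: (kind, name, dtype, dims)."""
    rows = []
    for kind in ("input", "output"):
        for tensor in cfg.get(kind, []):
            dims = [int(d) for d in tensor.get("dims", [])]
            rows.append((kind, tensor.get("name"), tensor.get("data_type"), dims))
    return cfg.get("name"), cfg.get("platform") or cfg.get("backend"), rows


if __name__ == "__main__":
    cfg = fetch_model_config("http://localhost:8000", "nyutron_readmission_onnx")
    name, platform, rows = summarize_config(cfg)
    print(name, platform)
    for row in rows:
        print(*row)
```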

Start monitoring services

Lastly, start the monitoring services:

./start_prometheus_service.sh --prod
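Once Prometheus is running, its HTTP API can be queried through the forwarded port 9090 at `/api/v1/query`. The sketch below assumes the Prometheus configuration scrapes Triton's metrics endpoint (port 8002 by default), so Triton metrics such as `nv_inference_request_success` are available. That scrape setup is an assumption about this repository's prometheus configuration, and the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def query_url(expr, base_url="http://localhost:9090"):
    """Build a Prometheus instant-query URL for a PromQL expression."""
    return f"{base_url.rstrip('/')}/api/v1/query?" + urllib.parse.urlencode({"query": expr})


def prometheus_query(expr, base_url="http://localhost:9090"):
    """Run an instant query and return the decoded JSON response."""
    with urllib.request.urlopen(query_url(expr, base_url)) as resp:
        return json.load(resp)


def vector_values(payload):
    """Map each series' sorted label tuple to its float value for an instant-vector result."""
    if payload.get("status") != "success":
        raise RuntimeError(payload.get("error", "Prometheus query failed"))
    values = {}
    for series in payload["data"]["result"]:
        labels = tuple(sorted(series["metric"].items()))
        # Each sample is [unix_timestamp, "value-as-string"].
        values[labels] = float(series["value"][1])
    return values


if __name__ == "__main__":
    result = prometheus_query("nv_inference_request_success")
    for labels, value in vector_values(result).items():
        print(dict(labels), value)
```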
