v0.9.9 - Updated docs with vLLM server integration
noamgat committed Apr 24, 2024
1 parent 40b62f3 commit ec208a7
Showing 3 changed files with 28 additions and 1 deletion.
3 changes: 3 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,8 @@
# LM Format Enforcer Changelog

## v0.9.9
- Updated README with vLLM OpenAI Server Inference integration

## v0.9.8
- [#80] JSONSchemaParser: lists would allow an opening comma before the first element if there was whitespace before it

24 changes: 24 additions & 0 deletions README.md
@@ -94,6 +94,30 @@ You can also [view the notebook in GitHub](https://github.com/noamgat/lm-format-

For the different ways to integrate with huggingface transformers, see the [unit tests](https://github.com/noamgat/lm-format-enforcer/blob/main/tests/test_transformerenforcer.py).

## vLLM Server Integration

LM Format Enforcer is integrated into the [vLLM](https://github.com/vllm-project/vllm) inference server. vLLM includes an [OpenAI-compatible server](https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html) with added capabilities that allow using LM Format Enforcer without writing custom inference code.

You can use LM Format Enforcer with the vLLM OpenAI server either server-wide, by adding this [vLLM command line parameter](https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html#command-line-arguments-for-the-server):

```--guided-decoding-backend lm-format-enforcer```

or on a per-request basis, by adding the `guided_decoding_backend` [extra parameter](https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html#extra-parameters-for-chat-api) to the request:

```python
from openai import OpenAI

# Point the client at a running vLLM OpenAI-compatible server.
# The base URL and API key below are the typical local-server defaults; adjust as needed.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

completion = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.2",
    messages=[
        {"role": "user", "content": "Classify this sentiment: vLLM is wonderful!"}
    ],
    extra_body={
        "guided_choice": ["positive", "negative"],
        "guided_decoding_backend": "lm-format-enforcer"
    }
)
```
JSON Schema and regex decoding are also supported, via the `guided_json` and `guided_regex` parameters.
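
As an illustrative sketch (the schema and prompt below are examples, not part of the library; `client` is the same client as above), a `guided_json` request could look like this:

```python
# Illustrative example: constrain the response to a small JSON schema.
sentiment_schema = {
    "type": "object",
    "properties": {
        "sentiment": {"type": "string", "enum": ["positive", "negative"]},
        "confidence": {"type": "number"}
    },
    "required": ["sentiment"]
}

completion = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.2",
    messages=[
        {"role": "user", "content": "Classify this sentiment as JSON: vLLM is wonderful!"}
    ],
    extra_body={
        "guided_json": sentiment_schema,
        "guided_decoding_backend": "lm-format-enforcer"
    }
)
```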

## How does it work?

The library works by combining a character-level parser and a tokenizer prefix tree into a smart token-filtering mechanism.
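
As a rough conceptual sketch (the names below are illustrative, not the library's actual API): at each decoding step, the tokenizer prefix tree is walked character by character, and only tokens whose characters the parser accepts remain allowed.

```python
# Conceptual sketch only -- the names and structure here are illustrative,
# not the library's real classes or functions.
from dataclasses import dataclass, field

@dataclass
class TrieNode:
    """One node of the tokenizer prefix tree: maps a character to a child node."""
    children: dict = field(default_factory=dict)
    token_ids: list = field(default_factory=list)  # tokens whose text ends exactly here

def allowed_token_ids(parser, node, allowed=None):
    """Collect token ids whose characters the current parser state would accept."""
    if allowed is None:
        allowed = []
    allowed.extend(node.token_ids)  # these tokens are fully consistent with the text so far
    for char, child in node.children.items():
        next_parser = parser.add_character(char)  # returns None if char is illegal here
        if next_parser is not None:
            allowed_token_ids(next_parser, child, allowed)
    return allowed
```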
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -1,6 +1,6 @@
[tool.poetry]
name = "lm-format-enforcer"
version = "0.9.8"
version = "0.9.9"
description = "Enforce the output format (JSON Schema, Regex etc) of a language model"
authors = ["Noam Gat <[email protected]>"]
license = "MIT"
