Skip to content

Commit

Permalink
databricks: Add partner package directory and ChatDatabricks implemen…
Browse files Browse the repository at this point in the history
…tation (langchain-ai#25430)

### Summary

Create `langchain-databricks` as a new partner packages. This PR does
not migrate all existing Databricks integration, but the package will
eventually contain:

* `ChatDatabricks` (implemented in this PR)
* `DatabricksVectorSearch`
* `DatabricksEmbeddings`
* ~`UCFunctionToolkit`~ (will be done after UC SDK work which
drastically simplify implementation)

Also, this PR does not add integration tests yet. This will be added
once the Databricks test workspace is ready.

Tagging @efriis as POC


### Tracker
[✍️] Create a package and imgrate ChatDatabricks
[ ] Migrate DatabricksVectorSearch, DatabricksEmbeddings, and their docs
~[ ] Migrate UCFunctionToolkit and its doc~
[ ] Add provider document and update README.md
[ ] Add integration tests and set up secrets (after moved to an external
package)
[ ] Add deprecation note to the community implementations.

---------

Signed-off-by: B-Step62 <[email protected]>
Co-authored-by: Erick Friis <[email protected]>
  • Loading branch information
B-Step62 and efriis authored Aug 22, 2024
1 parent fb1d67e commit 3981d73
Show file tree
Hide file tree
Showing 20 changed files with 3,699 additions and 13 deletions.
2 changes: 1 addition & 1 deletion docs/Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -18,7 +18,7 @@ for dir; do \
if find "$$dir" -maxdepth 1 -type f \( -name "pyproject.toml" -o -name "setup.py" \) | grep -q .; then \
echo "$$dir"; \
fi \
done' sh {} + | grep -vE "airbyte|ibm|couchbase" | tr '\n' ' ')
done' sh {} + | grep -vE "airbyte|ibm|couchbase|databricks" | tr '\n' ' ')

PORT ?= 3001

Expand Down
20 changes: 8 additions & 12 deletions docs/docs/integrations/chat/databricks.ipynb
Original file line number Diff line number Diff line change
Expand Up @@ -31,7 +31,7 @@
"\n",
"| Class | Package | Local | Serializable | Package downloads | Package latest |\n",
"| :--- | :--- | :---: | :---: | :---: | :---: |\n",
"| [ChatDatabricks](https://api.python.langchain.com/en/latest/chat_models/langchain_community.chat_models.databricks.ChatDatabricks.html) | [langchain-community](https://api.python.langchain.com/en/latest/community_api_reference.html) | ❌ | beta | ![PyPI - Downloads](https://img.shields.io/pypi/dm/langchain-community?style=flat-square&label=%20) | ![PyPI - Version](https://img.shields.io/pypi/v/langchain-community?style=flat-square&label=%20) |\n",
"| [ChatDatabricks](https://api.python.langchain.com/en/latest/chat_models/langchain_community.chat_models.databricks.ChatDatabricks.html) | [langchain-databricks](https://api.python.langchain.com/en/latest/databricks_api_reference.html) | ❌ | beta | ![PyPI - Downloads](https://img.shields.io/pypi/dm/langchain-databricks?style=flat-square&label=%20) | ![PyPI - Version](https://img.shields.io/pypi/v/langchain-databricks?style=flat-square&label=%20) |\n",
"\n",
"### Model features\n",
"| [Tool calling](/docs/how_to/tool_calling/) | [Structured output](/docs/how_to/structured_output/) | JSON mode | [Image input](/docs/how_to/multimodal_inputs/) | Audio input | Video input | [Token-level streaming](/docs/how_to/chat_streaming/) | Native async | [Token usage](/docs/how_to/chat_token_usage_tracking/) | [Logprobs](/docs/how_to/logprobs/) |\n",
Expand Down Expand Up @@ -99,7 +99,7 @@
"source": [
"### Installation\n",
"\n",
"The LangChain Databricks integration lives in the `langchain-community` package. Also, `mlflow >= 2.9 ` is required to run the code in this notebook."
"The LangChain Databricks integration lives in the `langchain-databricks` package."
]
},
{
Expand All @@ -108,7 +108,7 @@
"metadata": {},
"outputs": [],
"source": [
"%pip install -qU langchain-community mlflow>=2.9.0"
"%pip install -qU langchain-databricks"
]
},
{
Expand All @@ -133,7 +133,7 @@
"metadata": {},
"outputs": [],
"source": [
"from langchain_community.chat_models import ChatDatabricks\n",
"from langchain_databricks import ChatDatabricks\n",
"\n",
"chat_model = ChatDatabricks(\n",
" endpoint=\"databricks-dbrx-instruct\",\n",
Expand Down Expand Up @@ -245,9 +245,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Invocation (streaming)\n",
"\n",
"`ChatDatabricks` supports streaming response by `stream` method since `langchain-community>=0.2.1`."
"## Invocation (streaming)"
]
},
{
Expand Down Expand Up @@ -299,7 +297,7 @@
"* An LLM was registered and deployed to [a Databricks serving endpoint](https://docs.databricks.com/machine-learning/model-serving/index.html) via MLflow. The endpoint must have OpenAI-compatible chat input/output format ([reference](https://mlflow.org/docs/latest/llms/deployments/index.html#chat))\n",
"* You have [\"Can Query\" permission](https://docs.databricks.com/security/auth-authz/access-control/serving-endpoint-acl.html) to the endpoint.\n",
"\n",
"Once the endpoint is ready, the usage pattern is completely same as Foundation Models."
"Once the endpoint is ready, the usage pattern is identical to that of Foundation Models."
]
},
{
Expand Down Expand Up @@ -332,7 +330,7 @@
"\n",
"First, create a new Databricks serving endpoint that proxies requests to the target external model. The endpoint creation should be fairy quick for proxying external models.\n",
"\n",
"This requires registering OpenAI API Key in Databricks secret manager with the following comment:\n",
"This requires registering your OpenAI API Key within the Databricks secret manager as follows:\n",
"```sh\n",
"# Replace `<scope>` with your scope\n",
"databricks secrets create-scope <scope>\n",
Expand Down Expand Up @@ -417,8 +415,6 @@
"metadata": {},
"outputs": [],
"source": [
"from langchain_community.chat_models.databricks import ChatDatabricks\n",
"\n",
"llm = ChatDatabricks(endpoint=\"databricks-meta-llama-3-70b-instruct\")\n",
"tools = [\n",
" {\n",
Expand Down Expand Up @@ -461,7 +457,7 @@
"source": [
"## API reference\n",
"\n",
"For detailed documentation of all ChatDatabricks features and configurations head to the API reference: https://api.python.langchain.com/en/latest/chat_models/langchain_community.chat_models.ChatDatabricks.html"
"For detailed documentation of all ChatDatabricks features and configurations head to the API reference: https://api.python.langchain.com/en/latest/chat_models/langchain_databricks.chat_models.ChatDatabricks.html"
]
}
],
Expand Down
1 change: 1 addition & 0 deletions libs/partners/databricks/.gitignore
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
__pycache__
21 changes: 21 additions & 0 deletions libs/partners/databricks/LICENSE
Original file line number Diff line number Diff line change
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2024 LangChain, Inc.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
62 changes: 62 additions & 0 deletions libs/partners/databricks/Makefile
Original file line number Diff line number Diff line change
@@ -0,0 +1,62 @@
.PHONY: all format lint test tests integration_tests docker_tests help extended_tests

# Default target executed when no arguments are given to make.
all: help

# Define a variable for the test file path.
TEST_FILE ?= tests/unit_tests/
integration_test integration_tests: TEST_FILE = tests/integration_tests/


# unit tests are run with the --disable-socket flag to prevent network calls
test tests:
poetry run pytest --disable-socket --allow-unix-socket $(TEST_FILE)

# integration tests are run without the --disable-socket flag to allow network calls
integration_test integration_tests:
poetry run pytest $(TEST_FILE)

######################
# LINTING AND FORMATTING
######################

# Define a variable for Python and notebook files.
PYTHON_FILES=.
MYPY_CACHE=.mypy_cache
lint format: PYTHON_FILES=.
lint_diff format_diff: PYTHON_FILES=$(shell git diff --relative=libs/partners/databricks --name-only --diff-filter=d master | grep -E '\.py$$|\.ipynb$$')
lint_package: PYTHON_FILES=langchain_databricks
lint_tests: PYTHON_FILES=tests
lint_tests: MYPY_CACHE=.mypy_cache_test

lint lint_diff lint_package lint_tests:
poetry run ruff check .
poetry run ruff format $(PYTHON_FILES) --diff
poetry run ruff check --select I $(PYTHON_FILES)
mkdir -p $(MYPY_CACHE); poetry run mypy $(PYTHON_FILES) --cache-dir $(MYPY_CACHE)

format format_diff:
poetry run ruff format $(PYTHON_FILES)
poetry run ruff check --select I --fix $(PYTHON_FILES)

spell_check:
poetry run codespell --toml pyproject.toml

spell_fix:
poetry run codespell --toml pyproject.toml -w

check_imports: $(shell find langchain_databricks -name '*.py')
poetry run python ./scripts/check_imports.py $^

######################
# HELP
######################

help:
@echo '----'
@echo 'check_imports - check imports'
@echo 'format - run code formatters'
@echo 'lint - run linters'
@echo 'test - run unit tests'
@echo 'tests - run unit tests'
@echo 'test TEST_FILE=<test_file> - run all tests in file'
24 changes: 24 additions & 0 deletions libs/partners/databricks/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,24 @@
# langchain-databricks

This package contains the LangChain integration with Databricks

## Installation

```bash
pip install -U langchain-databricks
```

And you should configure credentials by setting the following environment variables:

* TODO: fill this out

## Chat Models

`ChatDatabricks` class exposes chat models from Databricks.

```python
from langchain_databricks import ChatDatabricks

llm = ChatDatabricks()
llm.invoke("Sing a ballad of LangChain.")
```
15 changes: 15 additions & 0 deletions libs/partners/databricks/langchain_databricks/__init__.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,15 @@
from importlib import metadata

from langchain_databricks.chat_models import ChatDatabricks

try:
__version__ = metadata.version(__package__)
except metadata.PackageNotFoundError:
# Case where package metadata is not available.
__version__ = ""
del metadata # optional, avoids polluting the results of dir(__package__)

__all__ = [
"ChatDatabricks",
"__version__",
]
Loading

0 comments on commit 3981d73

Please sign in to comment.