optimize rerank with backend ref (#579)
* add rerank with neural speed

Signed-off-by: Dong, Bo1 <[email protected]>

* add the code

Signed-off-by: Dong, Bo1 <[email protected]>

* add the code

Signed-off-by: Dong, Bo1 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Dong, Bo1 <[email protected]>

* fix mismatched response format w/wo streaming guardrails (#568)

* fix mismatched response format w/wo streaming  guardrails

* fix & debug

* fix & rm debug

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Dong, Bo1 <[email protected]>

* Fix guardrails out handle logics for space linebreak and quote (#571)

* fix mismatched response format w/wo streaming  guardrails

* fix & debug

* fix & rm debug

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* debug

* debug

* debug

* fix pre-space and linebreak

* fix pre-space and linebreak

* fix single/double quote

* fix single/double quote

* remove debug

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Dong, Bo1 <[email protected]>

* BUG FIX: LVM security fix (#572)

* add url validator

Signed-off-by: BaoHuiling <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add validation for video_url

Signed-off-by: BaoHuiling <[email protected]>

---------

Signed-off-by: BaoHuiling <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Dong, Bo1 <[email protected]>

* Modify output messages. (#569)

* Reduced output.

Signed-off-by: zepan <[email protected]>

* Output the location where the modified Dockerfile file is referenced.

Signed-off-by: zepan <[email protected]>

* for test

Signed-off-by: zepan <[email protected]>

* Restore test file.

Signed-off-by: zepan <[email protected]>

---------

Signed-off-by: zepan <[email protected]>
Signed-off-by: Dong, Bo1 <[email protected]>

* refine logging code. (#559)

* add ut and refine logging code.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update microservice port.

---------

Co-authored-by: root <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Dong, Bo1 <[email protected]>

* adding lancedb to langchain vectorstores (#291)

* adding lancedb to langchain vectorstores

Signed-off-by: sharanshirodkar7 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: sharanshirodkar7 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: lvliang-intel <[email protected]>
Signed-off-by: Dong, Bo1 <[email protected]>

* Refine Dataprep Milvus MS (#570)

Signed-off-by: letonghan <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Dong, Bo1 <[email protected]>

* final version

Signed-off-by: Dong, Bo1 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Dong, Bo1 <[email protected]>

* update the readme

Signed-off-by: Dong, Bo1 <[email protected]>

* add the sign

Signed-off-by: Dong, Bo1 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Dong, Bo1 <[email protected]>

* fix error for pre ci

Signed-off-by: Dong, Bo1 <[email protected]>

* add the ut

Signed-off-by: Dong, Bo1 <[email protected]>

* update docker file

Signed-off-by: Dong, Bo1 <[email protected]>

* update CI test log achieve (#577)

Signed-off-by: chensuyue <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Dong, Bo1 <[email protected]>

* Multimodal dataprep (#575)

* multimodal embedding for MM RAG for videos

Signed-off-by: Tiep Le <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* develop data prep first commit

Signed-off-by: Tiep Le <[email protected]>

* develop dataprep microservice for multimodal data

Signed-off-by: Tiep Le <[email protected]>

* multimodal langchain for dataprep

Signed-off-by: Tiep Le <[email protected]>

* update README

Signed-off-by: Tiep Le <[email protected]>

* update README

Signed-off-by: Tiep Le <[email protected]>

* update README

Signed-off-by: Tiep Le <[email protected]>

* update README

Signed-off-by: Tiep Le <[email protected]>

* cosmetic

Signed-off-by: Tiep Le <[email protected]>

* test for multimodal dataprep

Signed-off-by: Tiep Le <[email protected]>

* update test

Signed-off-by: Tiep Le <[email protected]>

* update test

Signed-off-by: Tiep Le <[email protected]>

* update test

Signed-off-by: Tiep Le <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* cosmetic update

Signed-off-by: Tiep Le <[email protected]>

* remove langsmith

Signed-off-by: Tiep Le <[email protected]>

* update API to remove /dataprep from API names and remove langsmith

Signed-off-by: Tiep Le <[email protected]>

* update test

Signed-off-by: Tiep Le <[email protected]>

* update the error message per PR reviewer

Signed-off-by: Tiep Le <[email protected]>

---------

Signed-off-by: Tiep Le <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Dong, Bo1 <[email protected]>

* add: Pathway vector store and retriever as LangChain component (#342)

* nb

Signed-off-by: Berke <[email protected]>

* init changes

Signed-off-by: Berke <[email protected]>

* docker

Signed-off-by: Berke <[email protected]>

* example data

Signed-off-by: Berke <[email protected]>

* docs(readme): update, add commands

Signed-off-by: Berke <[email protected]>

* fix: formatting, data sources

Signed-off-by: Berke <[email protected]>

* docs(readme): update instructions, add comments

Signed-off-by: Berke <[email protected]>

* fix: rm unused parts

Signed-off-by: Berke <[email protected]>

* fix: image name, compose env vars

Signed-off-by: Berke <[email protected]>

* fix: rm unused part

Signed-off-by: Berke <[email protected]>

* fix: logging name

Signed-off-by: Berke <[email protected]>

* fix: env var

Signed-off-by: Berke <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Berke <[email protected]>

* fix: rename pw docker

Signed-off-by: Berke <[email protected]>

* docs(readme): update input sources

Signed-off-by: Berke <[email protected]>

* nb

Signed-off-by: Berke <[email protected]>

* init changes

Signed-off-by: Berke <[email protected]>

* fix: formatting, data sources

Signed-off-by: Berke <[email protected]>

* docs(readme): update instructions, add comments

Signed-off-by: Berke <[email protected]>

* fix: rm unused part

Signed-off-by: Berke <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Berke <[email protected]>

* fix: rename pw docker

Signed-off-by: Berke <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Berke <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* feat: mv vector store, naming, clarify instructions, improve ingestion components

Signed-off-by: Berke <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* tests: add pw retriever test
fix: update docker to include libmagic

Signed-off-by: Berke <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* implement suggestions from review, entrypoint, reqs, comments, https_proxy.

Signed-off-by: Berke <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix: update docker tags in test and readme

Signed-off-by: Berke <[email protected]>

* tests: add separate pathway vectorstore test

Signed-off-by: Berke <[email protected]>

---------

Signed-off-by: Berke <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Sihan Chen <[email protected]>
Signed-off-by: Dong, Bo1 <[email protected]>

* Add local Rerank microservice for VideoRAGQnA (#496)

* initial commit

Signed-off-by: BaoHuiling <[email protected]>

* save

Signed-off-by: BaoHuiling <[email protected]>

* add readme, test script, fix bug

Signed-off-by: BaoHuiling <[email protected]>

* update video URL

Signed-off-by: BaoHuiling <[email protected]>

* use default

Signed-off-by: BaoHuiling <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update core dependency

Signed-off-by: BaoHuiling <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* use p 5000

Signed-off-by: BaoHuiling <[email protected]>

* use 5037

Signed-off-by: BaoHuiling <[email protected]>

* update ctnr name

Signed-off-by: BaoHuiling <[email protected]>

* remove langsmith

Signed-off-by: BaoHuiling <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add rerank algo desc in readme

Signed-off-by: BaoHuiling <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: BaoHuiling <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: chen, suyue <[email protected]>
Signed-off-by: Dong, Bo1 <[email protected]>

* Add Scan Container. (#560)

Signed-off-by: zepan <[email protected]>
Signed-off-by: Dong, Bo1 <[email protected]>

* fix SearchedMultimodalDoc in docarray (#583)

Signed-off-by: BaoHuiling <[email protected]>
Signed-off-by: Dong, Bo1 <[email protected]>

* update image build yaml (#529)

Signed-off-by: chensuyue <[email protected]>
Signed-off-by: zepan <[email protected]>
Signed-off-by: Dong, Bo1 <[email protected]>

* add microservice for intent detection (#131)

* add microservice for intent detection

Signed-off-by: Liangyx2 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update license copyright

Signed-off-by: Liangyx2 <[email protected]>

* add ut

Signed-off-by: Liangyx2 <[email protected]>

* refine

Signed-off-by: Liangyx2 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update folder

Signed-off-by: Liangyx2 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix test

Signed-off-by: Liangyx2 <[email protected]>

---------

Signed-off-by: Liangyx2 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Dong, Bo1 <[email protected]>

* Make the scanning method optional. (#580)

Signed-off-by: zepan <[email protected]>
Signed-off-by: Dong, Bo1 <[email protected]>

* add code owners (#586)

Signed-off-by: Dong, Bo1 <[email protected]>

* remove revision for tei (#584)

Signed-off-by: letonghan <[email protected]>
Signed-off-by: Dong, Bo1 <[email protected]>

* Bug fix (#591)

* Check if the document exists.

Signed-off-by: zepan <[email protected]>

* Add flag output.

Signed-off-by: zepan <[email protected]>

* Modify nginx readme.

Signed-off-by: zepan <[email protected]>

* Modify document detection logic

Signed-off-by: zepan <[email protected]>

---------

Signed-off-by: zepan <[email protected]>
Signed-off-by: Dong, Bo1 <[email protected]>

* fix ut issue

Signed-off-by: Dong, Bo1 <[email protected]>

* merge the main

Signed-off-by: Dong, Bo1 <[email protected]>

* align with new pipeline

Signed-off-by: Dong, Bo1 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* align with newest pipeline

Signed-off-by: Dong, Bo1 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* upload code

Signed-off-by: Dong, Bo1 <[email protected]>

* update the ut

Signed-off-by: Dong, Bo1 <[email protected]>

* add docker path

Signed-off-by: Dong, Bo1 <[email protected]>

* add the docker path

Signed-off-by: Dong, Bo1 <[email protected]>

---------

Signed-off-by: Dong, Bo1 <[email protected]>
Signed-off-by: BaoHuiling <[email protected]>
Signed-off-by: zepan <[email protected]>
Signed-off-by: sharanshirodkar7 <[email protected]>
Signed-off-by: letonghan <[email protected]>
Signed-off-by: chensuyue <[email protected]>
Signed-off-by: Tiep Le <[email protected]>
Signed-off-by: Berke <[email protected]>
Signed-off-by: Liangyx2 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Sihan Chen <[email protected]>
Co-authored-by: Huiling Bao <[email protected]>
Co-authored-by: ZePan110 <[email protected]>
Co-authored-by: lkk <[email protected]>
Co-authored-by: root <[email protected]>
Co-authored-by: Sharan Shirodkar <[email protected]>
Co-authored-by: lvliang-intel <[email protected]>
Co-authored-by: Letong Han <[email protected]>
Co-authored-by: chen, suyue <[email protected]>
Co-authored-by: Tiep Le <[email protected]>
Co-authored-by: berkecanrizai <[email protected]>
Co-authored-by: Liangyx2 <[email protected]>
Co-authored-by: kevinintel <[email protected]>
15 people authored Sep 10, 2024
1 parent 2c48bc8 commit d76751a
Showing 13 changed files with 497 additions and 0 deletions.
8 changes: 8 additions & 0 deletions .github/workflows/docker/compose/reranks-compose-cd.yaml
@@ -14,3 +14,11 @@ services:
    build:
      dockerfile: comps/reranks/langchain-mosec/docker/Dockerfile
    image: ${REGISTRY:-opea}/reranking-langchain-mosec:${TAG:-latest}
  reranking-mosec-neural-speed:
    build:
      dockerfile: comps/reranks/neural-speed/docker/Dockerfile
    image: ${REGISTRY:-opea}/reranking-mosec-neural-speed:${TAG:-latest}
  reranking-mosec-neural-speed-endpoint:
    build:
      dockerfile: comps/reranks/neural-speed/neuralspeed-docker/Dockerfile
    image: ${REGISTRY:-opea}/reranking-mosec-neural-speed-endpoint:${TAG:-latest}
32 changes: 32 additions & 0 deletions comps/reranks/neural-speed/README.md
@@ -0,0 +1,32 @@
# build Mosec endpoint docker image

```
docker build --build-arg http_proxy=$http_proxy --build-arg https_proxy=$https_proxy -t langchain-mosec:neuralspeed-reranks -f comps/reranks/neural-speed/neuralspeed-docker/Dockerfile .
```

# build Reranking microservice docker image

```
docker build --build-arg http_proxy=$http_proxy --build-arg https_proxy=$https_proxy -t opea/reranking-langchain-mosec:neuralspeed -f comps/reranks/neural-speed/docker/Dockerfile .
```

Note: Please contact us to request model files before building images.

# launch Mosec endpoint docker container

```
docker run -d --name="reranking-langchain-mosec-endpoint" -p 6001:8000 langchain-mosec:neuralspeed-reranks
```

# launch Reranking microservice docker container

```
export MOSEC_RERANKING_ENDPOINT=http://127.0.0.1:6001
docker run -d --name="reranking-langchain-mosec-server" -e http_proxy=$http_proxy -e https_proxy=$https_proxy -p 6000:8000 --ipc=host -e MOSEC_RERANKING_ENDPOINT=$MOSEC_RERANKING_ENDPOINT opea/reranking-langchain-mosec:neuralspeed
```

# run client test

```
curl http://localhost:6000/v1/reranking -X POST -d '{"initial_query":"What is Deep Learning?", "retrieved_docs": [{"text":"Deep Learning is not..."}, {"text":"Deep learning is..."}]}' -H 'Content-Type: application/json'
```
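
The curl test above can also be issued from Python. The following is a minimal sketch using only the standard library; the helper names `build_rerank_request` and `rerank` are hypothetical, and the default URL simply assumes the `6000:8000` port mapping from the launch command above.

```python
import json
import urllib.request

# Assumed URL: matches the 6000:8000 port mapping used when launching the microservice.
RERANK_URL = "http://localhost:6000/v1/reranking"


def build_rerank_request(query, docs, url=RERANK_URL):
    """Build (but do not send) a POST request mirroring the curl example."""
    payload = {
        "initial_query": query,
        "retrieved_docs": [{"text": doc} for doc in docs],
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def rerank(query, docs, url=RERANK_URL, timeout=30):
    """Send the request and return the raw response body as text.

    Requires the reranking container to be running and reachable at `url`.
    """
    request = build_rerank_request(query, docs, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling `rerank("What is Deep Learning?", ["Deep Learning is not...", "Deep learning is..."])` sends the same body as the curl command; the response format is not specified here, so the sketch returns it unparsed.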
2 changes: 2 additions & 0 deletions comps/reranks/neural-speed/__init__.py
@@ -0,0 +1,2 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
31 changes: 31 additions & 0 deletions comps/reranks/neural-speed/docker/Dockerfile
@@ -0,0 +1,31 @@

# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

FROM langchain/langchain:latest

RUN apt-get update -y && apt-get install -y --no-install-recommends --fix-missing \
    libgl1-mesa-glx \
    libjemalloc-dev \
    vim

RUN useradd -m -s /bin/bash user && \
    mkdir -p /home/user && \
    chown -R user /home/user/

USER user

COPY comps /home/user/comps

RUN pip install --no-cache-dir --upgrade pip && \
    pip install --no-cache-dir -r /home/user/comps/reranks/neural-speed/requirements.txt

RUN pip3 install llmspec mosec msgspec httpx requests
RUN pip3 install torch==2.2.2 --trusted-host download.pytorch.org --index-url https://download.pytorch.org/whl/cpu

ENV PYTHONPATH=$PYTHONPATH:/home/user

WORKDIR /home/user/comps/reranks/neural-speed

ENTRYPOINT ["python", "reranking_neuralspeed_svc.py"]

22 changes: 22 additions & 0 deletions comps/reranks/neural-speed/docker/docker_compose_embedding.yaml
@@ -0,0 +1,22 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

version: "3.8"

services:
  reranking:
    image: opea/reranking-langchain-mosec:neuralspeed
    container_name: reranking-langchain-mosec-server
    ports:
      - "6000:8000"
    ipc: host
    environment:
      http_proxy: ${http_proxy}
      https_proxy: ${https_proxy}
      MOSEC_RERANKING_ENDPOINT: ${MOSEC_RERANKING_ENDPOINT}
      LANGCHAIN_API_KEY: ${LANGCHAIN_API_KEY}
    restart: unless-stopped

networks:
  default:
    driver: bridge
27 changes: 27 additions & 0 deletions comps/reranks/neural-speed/neuralspeed-docker/Dockerfile
@@ -0,0 +1,27 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

FROM ubuntu:22.04
ARG DEBIAN_FRONTEND=noninteractive

ENV GLIBC_TUNABLES=glibc.cpu.x86_shstk=permissive

COPY comps /root/comps
COPY neural_speed-0.1.dev45+g41ea0aa-cp310-cp310-linux_x86_64.whl /root/
COPY bge-large-r-q8.bin /root/
COPY libstdc++.so.6 /root/

RUN apt update && apt install -y python3 python3-pip
RUN pip3 install -r /root/comps/reranks/neural-speed/neuralspeed-docker/requirements.txt
RUN pip3 install llmspec mosec msgspec httpx requests
RUN pip3 install /root/neural_speed-0.1.dev45+g41ea0aa-cp310-cp310-linux_x86_64.whl

RUN cd /root/ && export HF_ENDPOINT=https://hf-mirror.com && huggingface-cli download --resume-download BAAI/bge-reranker-large --local-dir /root/bge-reranker-large


ENV LD_PRELOAD=/root/libstdc++.so.6


WORKDIR /root/comps/reranks/neural-speed/neuralspeed-docker

CMD ["python3", "server.py"]
35 changes: 35 additions & 0 deletions comps/reranks/neural-speed/neuralspeed-docker/client.py
@@ -0,0 +1,35 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

import os
from http import HTTPStatus

import httpx
import msgspec
import requests

req = {
    "query": "talk is cheap, show me the code",
    "docs": [
        "what a nice day",
        "life is short, use python",
        "early bird catches the worm",
    ],
}

httpx_response = httpx.post("http://127.0.0.1:8080/inference", content=msgspec.msgpack.encode(req))

requests_response = requests.post("http://127.0.0.1:8080/inference", data=msgspec.msgpack.encode(req))

MOSEC_RERANKING_ENDPOINT = os.environ.get("MOSEC_RERANKING_ENDPOINT", "http://127.0.0.1:8080")

request_url = MOSEC_RERANKING_ENDPOINT + "/inference"
print(f"request_url = {request_url}")
resp_3 = requests.post(request_url, data=msgspec.msgpack.encode(req))

if httpx_response.status_code == HTTPStatus.OK and requests_response.status_code == HTTPStatus.OK:
    print(f"OK: \n {msgspec.msgpack.decode(httpx_response.content)}")
    print(f"OK: \n {msgspec.msgpack.decode(requests_response.content)}")
    print(f"OK: \n {msgspec.msgpack.decode(resp_3.content)}")
else:
    print(f"err[{httpx_response.status_code}] {httpx_response.text}")
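
The client above reads `MOSEC_RERANKING_ENDPOINT` from the environment and appends `/inference` by string concatenation. A small sketch of that lookup as a reusable helper follows; `inference_url` is a hypothetical name, and stripping trailing slashes is an addition not present in the committed client.

```python
import os


def inference_url(env=None, default="http://127.0.0.1:8080"):
    """Return the /inference URL, honoring MOSEC_RERANKING_ENDPOINT if set."""
    env = os.environ if env is None else env
    base = env.get("MOSEC_RERANKING_ENDPOINT", default)
    # Strip trailing slashes so the joined URL never contains "//inference".
    return base.rstrip("/") + "/inference"
```

Passing a plain dict as `env` makes the lookup easy to exercise without touching the real process environment.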
45 changes: 45 additions & 0 deletions comps/reranks/neural-speed/neuralspeed-docker/client_multibatch.py
@@ -0,0 +1,45 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

from http import HTTPStatus
from threading import Thread

import httpx
import msgspec

req = {
    "query": "talk is cheap, show me the code",
    "docs": [
        "what a nice day",
        "life is short, use python",
        "early bird catches the worm",
    ],
}
reqs = []
BATCH = 32
for i in range(BATCH):
    reqs.append(msgspec.msgpack.encode(req))


def post_func(threadIdx):
    resp = httpx.post("http://127.0.0.1:8080/inference", content=reqs[threadIdx])
    ret = f"thread {threadIdx} \n"
    if resp.status_code == HTTPStatus.OK:
        ret += f"OK: {msgspec.msgpack.decode(resp.content)['scores']}"
    else:
        ret += f"err[{resp.status_code}] {resp.text}"
    print(ret)


threads = []
for i in range(BATCH):
    t = Thread(
        target=post_func,
        args=[
            i,
        ],
    )
    threads.append(t)

for i in range(BATCH):
    threads[i].start()
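
The multi-batch client above starts one thread per request and prints each result as it arrives. An alternative sketch, not part of this commit, uses `concurrent.futures.ThreadPoolExecutor` to fan out the same calls and collect the results in input order. `fan_out` and `fake_send` are hypothetical names; `fake_send` stands in for the real `httpx.post` call so the sketch runs without a server.

```python
from concurrent.futures import ThreadPoolExecutor


def fan_out(send, payloads, max_workers=32):
    """Call send(payload) concurrently and return results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as `payloads`.
        return list(pool.map(send, payloads))


def fake_send(payload):
    """Hypothetical stand-in for the HTTP call; returns one score per doc."""
    return {"scores": [0.0] * len(payload["docs"])}
```

Replacing `fake_send` with a function that posts the msgpack-encoded payload to the endpoint would reproduce the behavior of the threaded client while also waiting for every request to finish.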
16 changes: 16 additions & 0 deletions comps/reranks/neural-speed/neuralspeed-docker/requirements.txt
@@ -0,0 +1,16 @@
--extra-index-url https://download.pytorch.org/whl/cpu
accelerate
cmake
datasets
huggingface_hub
matplotlib
numpy
peft
protobuf<3.20
py-cpuinfo
sentencepiece
tiktoken
torch
transformers
transformers_stream_generator
zipfile38
91 changes: 91 additions & 0 deletions comps/reranks/neural-speed/neuralspeed-docker/server.py
@@ -0,0 +1,91 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

import os
import time
from typing import Any, List

import numpy
from mosec import Server, Worker, get_logger
from mosec.mixin import TypedMsgPackMixin
from msgspec import Struct
from neural_speed import Model
from transformers import AutoModelForSequenceClassification, AutoTokenizer

logger = get_logger()

INFERENCE_BATCH_SIZE = 128
INFERENCE_MAX_WAIT_TIME = 10
INFERENCE_WORKER_NUM = 1
INFERENCE_CONTEXT = 512

TorchModel = "/root/bge-reranker-large"
NS_Bin = "/root/bge-large-r-q8.bin"

NS_Model = "bert"


class Request(Struct, kw_only=True):
    query: str
    docs: List[str]


class Response(Struct, kw_only=True):
    scores: List[float]


class Inference(TypedMsgPackMixin, Worker):

    def __init__(self):
        super().__init__()
        self.tokenizer = AutoTokenizer.from_pretrained(TorchModel)
        self.model = Model()
        self.model.init_from_bin(
            NS_Model,
            NS_Bin,
            batch_size=INFERENCE_BATCH_SIZE,
            n_ctx=INFERENCE_CONTEXT + 2,
        )

    def forward(self, data: List[Request]) -> List[Response]:
        batch = len(data)
        ndoc = []
        inps = []
        # Flatten every (query, doc) pair and record how many docs each request carried.
        for request in data:
            inp = [[request.query, doc] for doc in request.docs]
            inps.extend(inp)
            ndoc.append(len(request.docs))
        outs = []
        # Run the flattened pairs through the model in chunks of INFERENCE_BATCH_SIZE.
        for i in range(0, len(inps), INFERENCE_BATCH_SIZE):
            inp_bs = inps[i : i + INFERENCE_BATCH_SIZE]
            inputs = self.tokenizer(
                inp_bs, padding=True, truncation=True, max_length=INFERENCE_CONTEXT, return_tensors="pt"
            )
            st = time.time()
            output = self.model(
                **inputs,
                reinit=True,
                logits_all=True,
                continuous_batching=False,
                ignore_padding=True,
            )
            logger.info(f"Total batch {batch} input shape {inputs.input_ids.shape} time {time.time()-st}")
            outs.append(output)
        ns_outputs = numpy.concatenate(outs, axis=0)
        resps = []
        pos = 0
        # Split the flat score array back into one Response per request.
        for i in range(batch):
            resp = Response(scores=ns_outputs[pos : pos + ndoc[i]].tolist())
            pos += ndoc[i]
            resps.append(resp)
        return resps


if __name__ == "__main__":
    INFERENCE_BATCH_SIZE = int(os.environ.get("MAX_BATCH_SIZE", 128))
    INFERENCE_MAX_WAIT_TIME = int(os.environ.get("MAX_WAIT_TIME", 1))
    server = Server()
    server.append_worker(
        Inference, max_batch_size=INFERENCE_BATCH_SIZE, max_wait_time=INFERENCE_MAX_WAIT_TIME, num=INFERENCE_WORKER_NUM
    )
    server.run()
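
The bookkeeping in the `forward` method above (flatten all query/document pairs, score them as one flat sequence, then split the scores back per request) can be illustrated without the model. The following is a minimal pure-Python sketch; `flatten_pairs` and `split_scores` are hypothetical helper names, requests are modeled as `(query, docs)` tuples rather than `Request` structs, and the scores are made-up placeholders, not model output.

```python
def flatten_pairs(requests):
    """Flatten (query, docs) requests into [query, doc] pairs plus per-request doc counts."""
    pairs, ndoc = [], []
    for query, docs in requests:
        pairs.extend([query, doc] for doc in docs)
        ndoc.append(len(docs))
    return pairs, ndoc


def split_scores(scores, ndoc):
    """Split a flat score sequence back into one list per request."""
    out, pos = [], 0
    for n in ndoc:
        out.append(list(scores[pos : pos + n]))
        pos += n
    return out


requests = [("q1", ["a", "b"]), ("q2", ["c"])]
pairs, ndoc = flatten_pairs(requests)
# Placeholder scores standing in for the model output, one per pair.
per_request = split_scores([0.1, 0.2, 0.3], ndoc)
```

Because the split relies only on the recorded counts, the flat score sequence must contain exactly one score per pair, in the same order in which the pairs were flattened.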
11 changes: 11 additions & 0 deletions comps/reranks/neural-speed/requirements.txt
@@ -0,0 +1,11 @@
docarray[full]
fastapi
langchain
langchain_community
openai
opentelemetry-api
opentelemetry-exporter-otlp
opentelemetry-sdk
prometheus-fastapi-instrumentator
shortuuid
uvicorn