Extend Qanary Python MT components for multiple source and target languages #369

Merged
merged 22 commits into from
Sep 5, 2024
Commits
f511be8 allow for multiple source and target languages (heinpa, Aug 13, 2024)
0f9bd4c look for existing texts with language in kg (heinpa, Aug 15, 2024)
edcc680 move creation of insert queries to qanary-helpers (heinpa, Aug 19, 2024)
9e8ad79 update init of translation options and add tests (heinpa, Aug 20, 2024)
8df515c add translation endpoints (heinpa, Aug 20, 2024)
d26dfc1 remove langid (heinpa, Aug 20, 2024)
84eb6c2 allow configuration of source and target languages (heinpa, Aug 20, 2024)
9b2435c allow configuration of source and target languages (heinpa, Aug 20, 2024)
f3bcd9c allow configuration of source and target languages (heinpa, Aug 20, 2024)
fa8f595 add fastapi support to LibreTranslate component (heinpa, Aug 22, 2024)
643fcff add fastapi support to NLLB component (heinpa, Aug 22, 2024)
447f0ca add fastapi support to MBart component (heinpa, Aug 22, 2024)
532d286 add fastapi support to Helsinki component (heinpa, Aug 22, 2024)
cc4d7fa check for language support on translate endpoints (heinpa, Aug 22, 2024)
b749236 setup LibreTranslate for multi-language deployment (heinpa, Aug 26, 2024)
5803fc0 setup NLLB for multi-language deployment (heinpa, Aug 26, 2024)
1ce43e0 setup MBart for multi-language deployment (heinpa, Aug 26, 2024)
d158d92 setup Helsinki for multi-language deployment (heinpa, Aug 26, 2024)
67514b3 remove testpypi import (heinpa, Aug 26, 2024)
57e9533 remove old qanary-helpers from requirements file (heinpa, Sep 3, 2024)
9e106af ensure consistent format of parameter names (heinpa, Sep 5, 2024)
2a8d949 enable for auto-deployment (heinpa, Sep 5, 2024)
13 changes: 10 additions & 3 deletions qanary-component-MT-Python-HelsinkiNLP/Dockerfile
@@ -1,14 +1,21 @@
-FROM python:3.7
+FROM python:3.10

 COPY requirements.txt ./

 RUN pip install --upgrade pip
-RUN pip install -r requirements.txt; exit 0
-RUN pip install gunicorn
+RUN pip install -r requirements.txt

 COPY component component
+COPY utils utils
 COPY run.py boot.sh ./

+# to allow preconfigured images
+ARG SOURCE_LANGUAGE
+ARG TARGET_LANGUAGE
+
+ENV SOURCE_LANGUAGE=$SOURCE_LANGUAGE
+ENV TARGET_LANGUAGE=$TARGET_LANGUAGE
+
 RUN chmod +x boot.sh

 ENTRYPOINT ["./boot.sh"]
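Because `SOURCE_LANGUAGE` and `TARGET_LANGUAGE` are declared with `ARG` and no default, an image built without `--build-arg` still gets both `ENV` variables, only set to the empty string. Code that reads this configuration therefore has to treat an empty value as "not configured". The following is a minimal Python sketch of that idea, together with the emptiness check that `boot.sh` applies to its required variables. The helper names `missing_required_vars` and `optional_language` are hypothetical and not part of the component, and lower-casing the language code is an assumption.

```python
from typing import List, Mapping, Optional

# Variables that boot.sh rejects when unset or empty (copied from its required_vars list).
REQUIRED_VARS = (
    "SPRING_BOOT_ADMIN_URL",
    "SERVER_HOST",
    "SERVER_PORT",
    "SPRING_BOOT_ADMIN_USERNAME",
    "SPRING_BOOT_ADMIN_PASSWORD",
    "SERVICE_NAME_COMPONENT",
    "SERVICE_DESCRIPTION_COMPONENT",
)


def missing_required_vars(env: Mapping[str, str]) -> List[str]:
    """Return required variables that are unset or empty, mirroring `-z` in boot.sh."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


def optional_language(env: Mapping[str, str], name: str) -> Optional[str]:
    """Read an optional language code such as SOURCE_LANGUAGE.

    An empty value (what `ARG X` followed by `ENV X=$X` yields when no
    --build-arg is passed) is treated as "not configured".
    Lower-casing the code is an assumption of this sketch.
    """
    value = env.get(name, "").strip().lower()
    return value or None
```

With an image built via `docker build --build-arg SOURCE_LANGUAGE=de --build-arg TARGET_LANGUAGE=en .`, `optional_language(os.environ, "SOURCE_LANGUAGE")` would return `"de"`. Without the build arguments it returns `None`, which corresponds to the "use all configured languages" behaviour described in the README.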
45 changes: 36 additions & 9 deletions qanary-component-MT-Python-HelsinkiNLP/README.md
@@ -54,8 +54,9 @@ SPRING_BOOT_ADMIN_CLIENT_INSTANCE_SERVICE-BASE-URL=http://public-component-host:
 SPRING_BOOT_ADMIN_USERNAME=admin
 SPRING_BOOT_ADMIN_PASSWORD=admin
 SERVICE_NAME_COMPONENT=MT-Helsinki-NLP
-SERVICE_DESCRIPTION_COMPONENT=Translates question to English
+SERVICE_DESCRIPTION_COMPONENT=Translates questions
 SOURCE_LANGUAGE=de
+TARGET_LANGUAGE=en
 ```

 The parameters description:
@@ -68,7 +69,8 @@ The parameters description:
 * `SPRING_BOOT_ADMIN_CLIENT_INSTANCE_SERVICE-BASE-URL` -- the URL of your Qanary component (has to be visible to the Qanary pipeline)
 * `SERVICE_NAME_COMPONENT` -- the name of your Qanary component (for better identification)
 * `SERVICE_DESCRIPTION_COMPONENT` -- the description of your Qanary component
-* `SOURCE_LANGUAGE` -- (optional) the source language of the text (the component will use langdetect if no source language is given)
+* `SOURCE_LANGUAGE` -- (optional) the default source language of the translation
+* `TARGET_LANGUAGE` -- (optional) the default target language of the translation

 4. Build the Docker image:

@@ -82,18 +84,43 @@ docker-compose build .
 docker-compose up
 ```

-After execution, component creates Qanary annotation in the Qanary triplestore:
+After successful execution, the component creates the following Qanary annotation in the Qanary triplestore:
 ```
 GRAPH <uuid> {
-?a a qa:AnnotationOfQuestionLanguage .
-?a qa:translationResult "translation result" .
-?a qa:sourceLanguage "ISO_639-1 language code" .
-?a oa:annotatedBy <urn:qanary:app_name> .
-?a oa:annotatedAt ?time .
-}
+?a a qa:AnnotationOfQuestionTranslation .
+?a oa:hasTarget <urn:myQanaryQuestion> .
+?a oa:hasBody "translation_result"@<ISO_639-1 language code> .
+?a oa:annotatedBy <urn:qanary:app_name> .
+?a oa:annotatedAt ?time .
+}
 ```

+### Support for multiple Source and Target Languages
+
+This component relies on the presence of one or more existing annotations that associate a question text with a language.
+This can be in the form of an `AnnotationOfQuestionLanguage`, as created by LD components, or an `AnnotationOfQuestionTranslation`, as created by MT components.
+
+It supports multiple combinations of source and target languages.
+You can specify a desired source and target language independently, or simply use all available language pairings.
+
+If a `SOURCE_LANGUAGE` is set, then only texts in this specific language are considered for translation.
+If none is set, then all configured source languages are used to find candidates for translation.
+
+Similarly, if a `TARGET_LANGUAGE` is set, then texts are only translated into that language.
+If none is set, then the texts are translated into all target languages that are supported for their respective source language.
+
+Note that while configured source languages naturally determine the possible target languages,
+the configured target languages also determine which source languages can be supported!
+
+### Pre-configured Docker Images
+
+You may use the included file `docker-compose-pairs.yml` to build a list of images that are preconfigured for specific language pairs.
+Note that if you intend to use these containers at the same time, you need to assign a different `SERVER_PORT` value to each image.
+
+```bash
+docker-compose -f docker-compose-pairs.yml build
+```

 ## How To Test This Component

 This component uses [pytest](https://docs.pytest.org/).
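The annotation pattern documented in the README above can be written as a SPARQL `INSERT` query. The sketch below builds such a query as a plain string. The function name `build_translation_insert` is hypothetical (the PR moves query creation into qanary-helpers, whose API is not shown here), and the `qa:`/`oa:` prefix IRIs are the ones commonly used in Qanary, assumed rather than taken from this diff. The translated text is escaped for a double-quoted literal, and the language tag is restricted to a two-letter ISO 639-1 code as the README describes.

```python
import re

# Two lowercase letters, e.g. "de" or "en" (ISO 639-1).
_ISO_639_1 = re.compile(r"[a-z]{2}")


def _escape_literal(text: str) -> str:
    """Escape text for use inside a double-quoted SPARQL string literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def build_translation_insert(graph: str, question_uri: str, translation: str,
                             lang: str, component: str) -> str:
    """Return an INSERT query for a qa:AnnotationOfQuestionTranslation (hypothetical helper)."""
    if not _ISO_639_1.fullmatch(lang):
        raise ValueError(f"expected a two-letter ISO 639-1 code, got {lang!r}")
    body = _escape_literal(translation)
    # Prefix IRIs below are assumptions, not taken from the PR.
    return f"""PREFIX qa: <http://www.wdaqua.eu/qa#>
PREFIX oa: <http://www.w3.org/ns/openannotation/core/>
INSERT {{
  GRAPH <{graph}> {{
    ?a a qa:AnnotationOfQuestionTranslation ;
       oa:hasTarget <{question_uri}> ;
       oa:hasBody "{body}"@{lang} ;
       oa:annotatedBy <urn:qanary:{component}> ;
       oa:annotatedAt ?time .
  }}
}}
WHERE {{
  BIND (IRI(CONCAT("urn:qanary:annotation:", STRUUID())) AS ?a)
  BIND (NOW() AS ?time)
}}
"""
```

Minting the annotation IRI with `STRUUID()` in the `WHERE` clause keeps the query self-contained; a real component might instead let its helper library create the IRI.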
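The selection rules in the README section "Support for multiple Source and Target Languages" can be expressed as a small pure function. This sketch assumes the source-to-target table is the `SUPPORTED_LANGS` literal that `boot.sh` passes to `load_models_and_tokenizers`; `translation_pairs` and `plan_translations` are hypothetical names, and the input is assumed to be a list of (text, language) tuples already read from existing language or translation annotations.

```python
from typing import Dict, List, Optional, Tuple

# Source -> supported targets, copied from the SUPPORTED_LANGS literal in boot.sh.
SUPPORTED_LANGS: Dict[str, List[str]] = {
    "en": ["de", "fr", "ru", "es"],
    "de": ["en", "fr", "es"],
    "fr": ["en", "de", "ru", "es"],
    "ru": ["en", "fr", "es"],
    "es": ["en", "de", "fr", "es"],
}


def translation_pairs(source: Optional[str] = None,
                      target: Optional[str] = None,
                      supported: Dict[str, List[str]] = SUPPORTED_LANGS) -> List[Tuple[str, str]]:
    """(source, target) pairs allowed by the configured languages.

    No source configured: every known source. No target configured:
    every target supported for that source.
    """
    sources = [source] if source else list(supported)
    return [
        (src, tgt)
        for src in sources
        for tgt in supported.get(src, [])
        if not target or tgt == target
    ]


def plan_translations(texts: List[Tuple[str, str]],
                      source: Optional[str] = None,
                      target: Optional[str] = None) -> List[Tuple[str, str, str]]:
    """Turn (text, language) candidates into (text, source, target) jobs."""
    pairs = translation_pairs(source, target)
    return [(text, src, tgt) for text, lang in texts for src, tgt in pairs if src == lang]
```

For example, `translation_pairs(target="ru")` yields only `("en", "ru")` and `("fr", "ru")`: fixing the target to Russian rules out German, Russian and Spanish source texts, which is the README's point that configured target languages also constrain the usable source languages.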
35 changes: 26 additions & 9 deletions qanary-component-MT-Python-HelsinkiNLP/boot.sh
@@ -1,16 +1,33 @@
-#!/bin/sh
+#!/bin/bash
+export $(grep -v "^#" < .env)
+
+# check required parameters
+declare -a required_vars=(
+    "SPRING_BOOT_ADMIN_URL"
+    "SERVER_HOST"
+    "SERVER_PORT"
+    "SPRING_BOOT_ADMIN_USERNAME"
+    "SPRING_BOOT_ADMIN_PASSWORD"
+    "SERVICE_NAME_COMPONENT"
+    "SERVICE_DESCRIPTION_COMPONENT"
+)

-export $(grep -v '^#' .env | xargs)
+for param in ${required_vars[@]};
+do
+    if [[ -z ${!param} ]]; then
+        echo "Required variable \"$param\" is not set!"
+        echo "The required variables are: ${required_vars[@]}"
+        exit 4
+    fi
+done

-echo Downloading the model
-python -c "from transformers.models.marian.modeling_marian import MarianMTModel; from transformers.models.marian.tokenization_marian import MarianTokenizer; supported_langs = ['ru', 'es', 'de', 'fr']; models = {lang: MarianMTModel.from_pretrained('Helsinki-NLP/opus-mt-{lang}-en'.format(lang=lang)) for lang in supported_langs}; tokenizers = {lang: MarianTokenizer.from_pretrained('Helsinki-NLP/opus-mt-{lang}-en'.format(lang=lang)) for lang in supported_langs}"
-echo Downloading the model finished
+echo Downloading the models
+
+python -c "from utils.model_utils import load_models_and_tokenizers; SUPPORTED_LANGS = { 'en': ['de', 'fr', 'ru', 'es'], 'de': ['en', 'fr', 'es'], 'fr': ['en', 'de', 'ru', 'es'], 'ru': ['en', 'fr', 'es'], 'es': ['en', 'de', 'fr', 'es'], }; load_models_and_tokenizers(SUPPORTED_LANGS); "

+echo Downloading the model finished

 echo The port number is: $SERVER_PORT
 echo The host is: $SERVER_HOST
 echo The Qanary pipeline URL is: $SPRING_BOOT_ADMIN_URL
-if [ -n $SERVER_PORT ]
-then
-    exec gunicorn -b :$SERVER_PORT --access-logfile - --error-logfile - run:app # refer to the gunicorn documentation for more options
-fi
+exec uvicorn run:app --host 0.0.0.0 --port $SERVER_PORT --log-level warning
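The new `boot.sh` pre-downloads one model per language pair via `load_models_and_tokenizers` from `utils.model_utils`, whose code is not part of this excerpt. Assuming the pairs follow the same `Helsinki-NLP/opus-mt-{src}-{tgt}` naming that the removed one-liner used for `{lang}-en`, a sketch could look like the following. `helsinki_model_names` and `load_all_models` are hypothetical, and whether every listed pair is actually published on the Hugging Face Hub is not verified here.

```python
from typing import Dict, List, Tuple


def helsinki_model_names(supported: Dict[str, List[str]]) -> Dict[str, str]:
    """Map "src-tgt" keys to Hub model IDs, assuming the opus-mt-{src}-{tgt} scheme."""
    return {
        f"{src}-{tgt}": f"Helsinki-NLP/opus-mt-{src}-{tgt}"
        for src, targets in supported.items()
        for tgt in targets
    }


def load_all_models(supported: Dict[str, List[str]]) -> Tuple[dict, dict]:
    """Download (or load from the local cache) one model and tokenizer per pair.

    transformers is imported lazily so the naming helper works without it.
    """
    from transformers import MarianMTModel, MarianTokenizer

    models, tokenizers = {}, {}
    for key, name in helsinki_model_names(supported).items():
        models[key] = MarianMTModel.from_pretrained(name)
        tokenizers[key] = MarianTokenizer.from_pretrained(name)
    return models, tokenizers
```

`from_pretrained` stores downloaded files in the local Hugging Face cache, so repeated loads within the same container (or with a persisted cache directory) reuse them instead of downloading again.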
32 changes: 19 additions & 13 deletions qanary-component-MT-Python-HelsinkiNLP/component/__init__.py
@@ -1,27 +1,33 @@
-from component.mt_helsinki_nlp import mt_helsinki_nlp_bp
-from flask import Flask
+from component import mt_helsinki_nlp
+from fastapi import FastAPI
+from fastapi.responses import RedirectResponse, Response

-version = "0.1.2"
+version = "0.2.0"

 # default config file (use -c parameter on command line to specify a custom config file)
 configfile = "app.conf"

 # endpoint for health information of the service required for Spring Boot Admin server callback
-healthendpoint = "/health"
-
-aboutendpoint = "/about"
+HEALTHENDPOINT = "/health"
+ABOUTENDPOINT = "/about"
+# TODO: add languages endpoint?

 # initialize Flask app and add the externalized service information
-app = Flask(__name__)
-app.register_blueprint(mt_helsinki_nlp_bp)
+app = FastAPI(docs_url="/swagger-ui.html")
+app.include_router(mt_helsinki_nlp.router)
+
+
+@app.get("/")
+async def main():
+    return RedirectResponse("/about")


-@app.route(healthendpoint, methods=['GET'])
+@app.get(HEALTHENDPOINT, description="Shows the status of the component")
 def health():
     """required health endpoint for callback of Spring Boot Admin server"""
-    return "alive"
+    return Response("alive", media_type="text/plain")

-@app.route(aboutendpoint, methods=['GET'])
+@app.get(ABOUTENDPOINT, description="Shows a description of the component")
 def about():
-    """required about endpoint for callback of Spring Boot Admin server"""
-    return "about"
+    """required about endpoint for callback of Spring Boot Admin server"""
+    return Response("Translates questions into English", media_type="text/plain")