Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add hyperlinks and paths validation. #699

Merged
merged 33 commits into from
Sep 19, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
33 commits
Select commit Hold shift + click to select a range
e2e8ae5
Add hyperlinks and paths validation.
ZePan110 Sep 18, 2024
baac2d4
Fix format issue.
ZePan110 Sep 18, 2024
dbe26a9
Change runs-on
ZePan110 Sep 18, 2024
a9652e2
Add hyperlinks and paths validation.
ZePan110 Sep 18, 2024
3f17566
Fix format issue.
ZePan110 Sep 18, 2024
2824ad9
Change runs-on
ZePan110 Sep 18, 2024
d7a11f9
Merge branch 'zepan/dev' of https://github.com/opea-project/GenAIComp…
ZePan110 Sep 18, 2024
c43daee
Change link head.
ZePan110 Sep 18, 2024
259d73c
Fix issue.
ZePan110 Sep 18, 2024
4aa58dd
Add output.
ZePan110 Sep 18, 2024
456a36b
Change serch rules.
ZePan110 Sep 18, 2024
6dca896
Change output and fix error
ZePan110 Sep 18, 2024
508bc6f
For test
ZePan110 Sep 18, 2024
9ca7881
Fix error
ZePan110 Sep 18, 2024
0fcaefc
Fix error.
ZePan110 Sep 18, 2024
aea7650
Fix error.
ZePan110 Sep 18, 2024
afe7d0f
test.
ZePan110 Sep 18, 2024
c433bfb
Fix issue and add output
ZePan110 Sep 18, 2024
d496b44
Fix issue and test
ZePan110 Sep 18, 2024
ea148ae
Merge branch 'main' into zepan/dev
ZePan110 Sep 18, 2024
8b7e98e
Add PR's own detection.
ZePan110 Sep 18, 2024
6a1854d
reduce output
ZePan110 Sep 18, 2024
f1415a4
Remove debug code.
ZePan110 Sep 18, 2024
a1f699e
test
ZePan110 Sep 18, 2024
2febdd8
test.
ZePan110 Sep 18, 2024
0bffcfc
Compatible with the origin of PR.
ZePan110 Sep 19, 2024
9f1a77e
Ignore links that require verification by a real person.
ZePan110 Sep 19, 2024
2a17a87
Merge branch 'zepan/dev' of https://github.com/opea-project/GenAIComp…
ZePan110 Sep 19, 2024
fc54a05
Change the judgment method.
ZePan110 Sep 19, 2024
a5bae96
Add need ignore link.
ZePan110 Sep 19, 2024
cfbedb3
Merge branch 'main' into zepan/dev
ZePan110 Sep 19, 2024
790ac04
Change runs-on.
ZePan110 Sep 19, 2024
4f6177c
Redefine output.
ZePan110 Sep 19, 2024
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
121 changes: 121 additions & 0 deletions .github/workflows/pr-dockerfile-path-scan.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -156,3 +156,124 @@ jobs:
echo "Please modify the corresponding README in GenAIExamples repo and ask [email protected] for final confirmation."
exit 1
fi

check-the-validity-of-hyperlinks-in-README:
runs-on: ubuntu-latest
steps:
- name: Clean Up Working Directory
run: sudo rm -rf ${{github.workspace}}/*

- name: Checkout Repo GenAIComps
uses: actions/checkout@v4

- name: Check the Validity of Hyperlinks
# ignore_links=("https://platform.openai.com/docs/api-reference/fine-tuning"
# "https://platform.openai.com/docs/api-reference/"
# "https://openai.com/index/whisper/"
# "https://platform.openai.com/docs/api-reference/chat/create")
run: |
cd ${{github.workspace}}
fail="FALSE"
url_lines=$(grep -Eo '\]\(http[s]?://[^)]+\)' --include='*.md' -r .)
if [ -n "$url_lines" ]; then
for url_line in $url_lines; do
url=$(echo "$url_line"|cut -d '(' -f2 | cut -d ')' -f1|sed 's/\.git$//')
path=$(echo "$url_line"|cut -d':' -f1 | cut -d'/' -f2-)
if [[ "https://platform.openai.com/docs/api-reference/fine-tuning" == "$url" || "https://platform.openai.com/docs/api-reference/" == "$url" || "https://openai.com/index/whisper/" == "$url" || "https://platform.openai.com/docs/api-reference/chat/create" == "$url" ]]; then
echo "Link "$url" from ${{github.workspace}}/$path need to be verified by a real person."
else
response=$(curl -L -s -o /dev/null -w "%{http_code}" "$url")
if [ "$response" -ne 200 ]; then
echo "**********Validation failed, try again**********"
response_retry=$(curl -s -o /dev/null -w "%{http_code}" "$url")
if [ "$response_retry" -eq 200 ]; then
echo "*****Retry successfully*****"
else
echo "Invalid link from ${{github.workspace}}/$path: $url"
fail="TRUE"
fi
fi
fi
done
fi

if [[ "$fail" == "TRUE" ]]; then
exit 1
else
echo "All hyperlinks are valid."
fi
shell: bash

check-the-validity-of-relative-path:
runs-on: ubuntu-latest
steps:
- name: Clean up Working Directory
run: sudo rm -rf ${{github.workspace}}/*

- name: Checkout Repo GenAIComps
uses: actions/checkout@v4

- name: Checking Relative Path Validity
run: |
cd ${{github.workspace}}
fail="FALSE"
repo_name=${{ github.event.pull_request.head.repo.full_name }}
if [ "$(echo "$repo_name"|cut -d'/' -f1)" != "opea-project" ]; then
owner=$(echo "${{ github.event.pull_request.head.repo.full_name }}" |cut -d'/' -f1)
branch="https://github.com/$owner/GenAIComps/tree/${{ github.event.pull_request.head.ref }}"
else
branch="https://github.com/opea-project/GenAIComps/blob/${{ github.event.pull_request.head.ref }}"
fi
link_head="https://github.com/opea-project/GenAIComps/blob/main"
png_lines=$(grep -Eo '\]\([^)]+\)' --include='*.md' -r .|grep -Ev 'http')
if [ -n "$png_lines" ]; then
for png_line in $png_lines; do
refer_path=$(echo "$png_line"|cut -d':' -f1 | cut -d'/' -f2-)
png_path=$(echo "$png_line"|cut -d '(' -f2 | cut -d ')' -f1)
if [[ "${png_path:0:1}" == "/" ]]; then
check_path=${{github.workspace}}$png_path
elif [[ "${png_path:0:1}" == "#" ]]; then
check_path=${{github.workspace}}/$refer_path$png_path
else
check_path=${{github.workspace}}/$(dirname "$refer_path")/$png_path
fi
real_path=$(realpath $check_path)
if [ $? -ne 0 ]; then
echo "Path $png_path in file ${{github.workspace}}/$refer_path does not exist"
fail="TRUE"
else
url=$link_head$(echo "$real_path" | sed 's|.*/GenAIComps||')
response=$(curl -I -L -s -o /dev/null -w "%{http_code}" "$url")
if [ "$response" -ne 200 ]; then
echo "**********Validation failed, try again**********"
response_retry=$(curl -s -o /dev/null -w "%{http_code}" "$url")
if [ "$response_retry" -eq 200 ]; then
echo "*****Retry successfully*****"
else
echo "Retry failed. Check branch ${{ github.event.pull_request.head.ref }}"
url_dev=$branch$(echo "$real_path" | sed 's|.*/GenAIComps||')
response=$(curl -I -L -s -o /dev/null -w "%{http_code}" "$url_dev")
if [ "$response" -ne 200 ]; then
echo "**********Validation failed, try again**********"
response_retry=$(curl -s -o /dev/null -w "%{http_code}" "$url_dev")
if [ "$response_retry" -eq 200 ]; then
echo "*****Retry successfully*****"
else
echo "Invalid path from ${{github.workspace}}/$refer_path: $png_path"
fail="TRUE"
fi
else
echo "Check branch ${{ github.event.pull_request.head.ref }} successfully."
fi
fi
fi
fi
done
fi

if [[ "$fail" == "TRUE" ]]; then
exit 1
else
echo "All hyperlinks are valid."
fi
shell: bash
2 changes: 1 addition & 1 deletion comps/dataprep/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -11,7 +11,7 @@ apt-get install libreoffice

## Use LVM (Large Vision Model) for Summarizing Image Data

Occasionally unstructured data will contain image data, to convert the image data to the text data, LVM can be used to summarize the image. To leverage LVM, please refer to this [readme](../lvms/README.md) to start the LVM microservice first and then set the below environment variable, before starting any dataprep microservice.
Occasionally unstructured data will contain image data, to convert the image data to the text data, LVM can be used to summarize the image. To leverage LVM, please refer to this [readme](../lvms/llava/README.md) to start the LVM microservice first and then set the below environment variable, before starting any dataprep microservice.

```bash
export SUMMARIZE_IMAGE_VIA_LVM=1
Expand Down
2 changes: 1 addition & 1 deletion comps/finetuning/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -219,7 +219,7 @@ curl http://${your_ip}:8015/v1/finetune/list_checkpoints -X POST -H "Content-Typ

### 3.4 Leverage fine-tuned model

After fine-tuning job is done, fine-tuned model can be chosen from listed checkpoints, then the fine-tuned model can be used in other microservices. For example, fine-tuned reranking model can be used in [reranks](../reranks/README.md) microservice by assign its path to the environment variable `RERANK_MODEL_ID`, fine-tuned embedding model can be used in [embeddings](../embeddings/README.md) microservice by assign its path to the environment variable `model`, LLMs after instruction tuning can be used in [llms](../llms/README.md) microservice by assign its path to the environment variable `your_hf_llm_model`.
After fine-tuning job is done, fine-tuned model can be chosen from listed checkpoints, then the fine-tuned model can be used in other microservices. For example, fine-tuned reranking model can be used in [reranks](../reranks/fastrag/README.md) microservice by assign its path to the environment variable `RERANK_MODEL_ID`, fine-tuned embedding model can be used in [embeddings](../embeddings/README.md) microservice by assign its path to the environment variable `model`, LLMs after instruction tuning can be used in [llms](../llms/text-generation/README.md) microservice by assign its path to the environment variable `your_hf_llm_model`.

## 🚀4. Descriptions for Finetuning parameters

Expand Down
2 changes: 1 addition & 1 deletion comps/guardrails/llama_guard/langchain/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -51,7 +51,7 @@ curl 127.0.0.1:8088/generate \

### 1.4 Start Guardrails Service

Optional: If you have deployed a Guardrails model with TGI Gaudi Service other than default model (i.e., `meta-llama/Meta-Llama-Guard-2-8B`) [from section 1.2](## 1.2 Start TGI Gaudi Service), you will need to add the eviornment variable `SAFETY_GUARD_MODEL_ID` containing the model id. For example, the following informs the Guardrails Service the deployed model used LlamaGuard2:
Optional: If you have deployed a Guardrails model with TGI Gaudi Service other than default model (i.e., `meta-llama/Meta-Llama-Guard-2-8B`) [from section 1.2](#12-start-tgi-gaudi-service), you will need to add the eviornment variable `SAFETY_GUARD_MODEL_ID` containing the model id. For example, the following informs the Guardrails Service the deployed model used LlamaGuard2:

```bash
export SAFETY_GUARD_MODEL_ID="meta-llama/Meta-Llama-Guard-2-8B"
Expand Down
2 changes: 1 addition & 1 deletion comps/vectorstores/pathway/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@
Set the environment variables for Pathway, and the embedding model.

> Note: If you are using `TEI_EMBEDDING_ENDPOINT`, make sure embedding service is already running.
> See the instructions under [here](../../../retrievers/langchain/pathway/README.md)
> See the instructions under [here](../../retrievers/pathway/langchain/README.md)

```bash
export PATHWAY_HOST=0.0.0.0
Expand Down