Add hyperlinks and paths validation. (#699)
* Add hyperlinks and paths validation.

Signed-off-by: ZePan110 <[email protected]>

* Fix format issue.

Signed-off-by: ZePan110 <[email protected]>

* Change runs-on

Signed-off-by: ZePan110 <[email protected]>

* Add hyperlinks and paths validation.

Signed-off-by: ZePan110 <[email protected]>

* Fix format issue.

Signed-off-by: ZePan110 <[email protected]>

* Change runs-on

Signed-off-by: ZePan110 <[email protected]>

* Change link head.

Signed-off-by: ZePan110 <[email protected]>

* Fix issue.

Signed-off-by: ZePan110 <[email protected]>

* Add output.

Signed-off-by: ZePan110 <[email protected]>

* Change search rules.

Signed-off-by: ZePan110 <[email protected]>

* Change output and fix error

Signed-off-by: ZePan110 <[email protected]>

* For test

Signed-off-by: ZePan110 <[email protected]>

* Fix error

Signed-off-by: ZePan110 <[email protected]>

* Fix error.

Signed-off-by: ZePan110 <[email protected]>

* Fix error.

Signed-off-by: ZePan110 <[email protected]>

* test.

Signed-off-by: ZePan110 <[email protected]>

* Fix issue and add output

Signed-off-by: ZePan110 <[email protected]>

* Fix issue and test

Signed-off-by: ZePan110 <[email protected]>

* Add PR's own detection.

Signed-off-by: ZePan110 <[email protected]>

* reduce output

Signed-off-by: ZePan110 <[email protected]>

* Remove debug code.

Signed-off-by: ZePan110 <[email protected]>

* test

Signed-off-by: ZePan110 <[email protected]>

* test.

Signed-off-by: ZePan110 <[email protected]>

* Compatible with the origin of PR.

Signed-off-by: ZePan110 <[email protected]>

* Ignore links that require verification by a real person.
Restore test files.

Signed-off-by: ZePan110 <[email protected]>

* Change the judgment method.

Signed-off-by: ZePan110 <[email protected]>

* Add need ignore link.

Signed-off-by: ZePan110 <[email protected]>

* Change runs-on.

Signed-off-by: ZePan110 <[email protected]>

* Redefine output.

Signed-off-by: ZePan110 <[email protected]>

---------

Signed-off-by: ZePan110 <[email protected]>
ZePan110 committed Sep 19, 2024
1 parent e29865e commit ccdd2d0
Showing 5 changed files with 125 additions and 4 deletions.
121 changes: 121 additions & 0 deletions .github/workflows/pr-dockerfile-path-scan.yaml
@@ -156,3 +156,124 @@ jobs:
             echo "Please modify the corresponding README in GenAIExamples repo and ask [email protected] for final confirmation."
             exit 1
           fi
+  check-the-validity-of-hyperlinks-in-README:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Clean Up Working Directory
+        run: sudo rm -rf ${{github.workspace}}/*
+
+      - name: Checkout Repo GenAIComps
+        uses: actions/checkout@v4
+
+      - name: Check the Validity of Hyperlinks
+        # ignore_links=("https://platform.openai.com/docs/api-reference/fine-tuning"
+        #               "https://platform.openai.com/docs/api-reference/"
+        #               "https://openai.com/index/whisper/"
+        #               "https://platform.openai.com/docs/api-reference/chat/create")
+        run: |
+          cd ${{github.workspace}}
+          fail="FALSE"
+          url_lines=$(grep -Eo '\]\(http[s]?://[^)]+\)' --include='*.md' -r .)
+          if [ -n "$url_lines" ]; then
+            for url_line in $url_lines; do
+              url=$(echo "$url_line"|cut -d '(' -f2 | cut -d ')' -f1|sed 's/\.git$//')
+              path=$(echo "$url_line"|cut -d':' -f1 | cut -d'/' -f2-)
+              if [[ "https://platform.openai.com/docs/api-reference/fine-tuning" == "$url" || "https://platform.openai.com/docs/api-reference/" == "$url" || "https://openai.com/index/whisper/" == "$url" || "https://platform.openai.com/docs/api-reference/chat/create" == "$url" ]]; then
+                echo "Link "$url" from ${{github.workspace}}/$path need to be verified by a real person."
+              else
+                response=$(curl -L -s -o /dev/null -w "%{http_code}" "$url")
+                if [ "$response" -ne 200 ]; then
+                  echo "**********Validation failed, try again**********"
+                  response_retry=$(curl -s -o /dev/null -w "%{http_code}" "$url")
+                  if [ "$response_retry" -eq 200 ]; then
+                    echo "*****Retry successfully*****"
+                  else
+                    echo "Invalid link from ${{github.workspace}}/$path: $url"
+                    fail="TRUE"
+                  fi
+                fi
+              fi
+            done
+          fi
+          if [[ "$fail" == "TRUE" ]]; then
+            exit 1
+          else
+            echo "All hyperlinks are valid."
+          fi
+        shell: bash
+
+  check-the-validity-of-relative-path:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Clean up Working Directory
+        run: sudo rm -rf ${{github.workspace}}/*
+
+      - name: Checkout Repo GenAIComps
+        uses: actions/checkout@v4
+
+      - name: Checking Relative Path Validity
+        run: |
+          cd ${{github.workspace}}
+          fail="FALSE"
+          repo_name=${{ github.event.pull_request.head.repo.full_name }}
+          if [ "$(echo "$repo_name"|cut -d'/' -f1)" != "opea-project" ]; then
+            owner=$(echo "${{ github.event.pull_request.head.repo.full_name }}" |cut -d'/' -f1)
+            branch="https://github.com/$owner/GenAIComps/tree/${{ github.event.pull_request.head.ref }}"
+          else
+            branch="https://github.com/opea-project/GenAIComps/blob/${{ github.event.pull_request.head.ref }}"
+          fi
+          link_head="https://github.com/opea-project/GenAIComps/blob/main"
+          png_lines=$(grep -Eo '\]\([^)]+\)' --include='*.md' -r .|grep -Ev 'http')
+          if [ -n "$png_lines" ]; then
+            for png_line in $png_lines; do
+              refer_path=$(echo "$png_line"|cut -d':' -f1 | cut -d'/' -f2-)
+              png_path=$(echo "$png_line"|cut -d '(' -f2 | cut -d ')' -f1)
+              if [[ "${png_path:0:1}" == "/" ]]; then
+                check_path=${{github.workspace}}$png_path
+              elif [[ "${png_path:0:1}" == "#" ]]; then
+                check_path=${{github.workspace}}/$refer_path$png_path
+              else
+                check_path=${{github.workspace}}/$(dirname "$refer_path")/$png_path
+              fi
+              real_path=$(realpath $check_path)
+              if [ $? -ne 0 ]; then
+                echo "Path $png_path in file ${{github.workspace}}/$refer_path does not exist"
+                fail="TRUE"
+              else
+                url=$link_head$(echo "$real_path" | sed 's|.*/GenAIComps||')
+                response=$(curl -I -L -s -o /dev/null -w "%{http_code}" "$url")
+                if [ "$response" -ne 200 ]; then
+                  echo "**********Validation failed, try again**********"
+                  response_retry=$(curl -s -o /dev/null -w "%{http_code}" "$url")
+                  if [ "$response_retry" -eq 200 ]; then
+                    echo "*****Retry successfully*****"
+                  else
+                    echo "Retry failed. Check branch ${{ github.event.pull_request.head.ref }}"
+                    url_dev=$branch$(echo "$real_path" | sed 's|.*/GenAIComps||')
+                    response=$(curl -I -L -s -o /dev/null -w "%{http_code}" "$url_dev")
+                    if [ "$response" -ne 200 ]; then
+                      echo "**********Validation failed, try again**********"
+                      response_retry=$(curl -s -o /dev/null -w "%{http_code}" "$url_dev")
+                      if [ "$response_retry" -eq 200 ]; then
+                        echo "*****Retry successfully*****"
+                      else
+                        echo "Invalid path from ${{github.workspace}}/$refer_path: $png_path"
+                        fail="TRUE"
+                      fi
+                    else
+                      echo "Check branch ${{ github.event.pull_request.head.ref }} successfully."
+                    fi
+                  fi
+                fi
+              fi
+            done
+          fi
+          if [[ "$fail" == "TRUE" ]]; then
+            exit 1
+          else
+            echo "All hyperlinks are valid."
+          fi
+        shell: bash
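
The hyperlink job above iterates with `for url_line in $url_lines`, which splits on any whitespace, so a Markdown path or link target containing a space would be cut into pieces. A minimal sketch of a line-oriented alternative follows, assuming bash and GNU grep. `extract_links` and `url_ok` are hypothetical helper names that do not appear in the commit, and the 15-second timeout is an illustrative value.

```bash
#!/usr/bin/env bash
# Sketch only: these helpers are hypothetical and not part of the workflow.

# Print "file<TAB>url" for every external Markdown link below a directory.
extract_links() {
  local root="${1:-.}" sep=':](' line file url
  # grep -o prints "path:](url)" per match; reading line by line keeps spaces intact.
  grep -rEo --include='*.md' '\]\(https?://[^)]+\)' "$root" |
    while IFS= read -r line; do
      file="${line%%"$sep"*}"   # text before the first ":]("
      url="${line#*"$sep"}"     # text after the first ":]("
      url="${url%')'}"          # drop the closing parenthesis
      printf '%s\t%s\n' "$file" "${url%.git}"
    done
}

# Return 0 when the URL answers 200 within two attempts (one retry, as in the job).
url_ok() {
  local url="$1" attempt code
  for attempt in 1 2; do
    code=$(curl -L -s -o /dev/null -w '%{http_code}' --max-time 15 "$url") || true
    [ "$code" = "200" ] && return 0
  done
  return 1
}

# Example use (performs network requests):
#   extract_links . | while IFS=$'\t' read -r file url; do
#     url_ok "$url" || echo "Invalid link from $file: $url"
#   done
```

Following redirects on both attempts keeps the retry consistent with the first request; the job as committed passes `-L` only on the first `curl` call.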
2 changes: 1 addition & 1 deletion comps/dataprep/README.md
@@ -11,7 +11,7 @@ apt-get install libreoffice

## Use LVM (Large Vision Model) for Summarizing Image Data

-Occasionally unstructured data will contain image data, to convert the image data to the text data, LVM can be used to summarize the image. To leverage LVM, please refer to this [readme](../lvms/README.md) to start the LVM microservice first and then set the below environment variable, before starting any dataprep microservice.
+Occasionally unstructured data will contain image data, to convert the image data to the text data, LVM can be used to summarize the image. To leverage LVM, please refer to this [readme](../lvms/llava/README.md) to start the LVM microservice first and then set the below environment variable, before starting any dataprep microservice.

```bash
export SUMMARIZE_IMAGE_VIA_LVM=1
2 changes: 1 addition & 1 deletion comps/finetuning/README.md
@@ -219,7 +219,7 @@ curl http://${your_ip}:8015/v1/finetune/list_checkpoints -X POST -H "Content-Typ

### 3.4 Leverage fine-tuned model

-After fine-tuning job is done, fine-tuned model can be chosen from listed checkpoints, then the fine-tuned model can be used in other microservices. For example, fine-tuned reranking model can be used in [reranks](../reranks/README.md) microservice by assign its path to the environment variable `RERANK_MODEL_ID`, fine-tuned embedding model can be used in [embeddings](../embeddings/README.md) microservice by assign its path to the environment variable `model`, LLMs after instruction tuning can be used in [llms](../llms/README.md) microservice by assign its path to the environment variable `your_hf_llm_model`.
+After fine-tuning job is done, fine-tuned model can be chosen from listed checkpoints, then the fine-tuned model can be used in other microservices. For example, fine-tuned reranking model can be used in [reranks](../reranks/fastrag/README.md) microservice by assign its path to the environment variable `RERANK_MODEL_ID`, fine-tuned embedding model can be used in [embeddings](../embeddings/README.md) microservice by assign its path to the environment variable `model`, LLMs after instruction tuning can be used in [llms](../llms/text-generation/README.md) microservice by assign its path to the environment variable `your_hf_llm_model`.

## 🚀4. Descriptions for Finetuning parameters

2 changes: 1 addition & 1 deletion comps/guardrails/llama_guard/langchain/README.md
@@ -51,7 +51,7 @@ curl 127.0.0.1:8088/generate \

### 1.4 Start Guardrails Service

-Optional: If you have deployed a Guardrails model with TGI Gaudi Service other than default model (i.e., `meta-llama/Meta-Llama-Guard-2-8B`) [from section 1.2](## 1.2 Start TGI Gaudi Service), you will need to add the eviornment variable `SAFETY_GUARD_MODEL_ID` containing the model id. For example, the following informs the Guardrails Service the deployed model used LlamaGuard2:
+Optional: If you have deployed a Guardrails model with TGI Gaudi Service other than default model (i.e., `meta-llama/Meta-Llama-Guard-2-8B`) [from section 1.2](#12-start-tgi-gaudi-service), you will need to add the eviornment variable `SAFETY_GUARD_MODEL_ID` containing the model id. For example, the following informs the Guardrails Service the deployed model used LlamaGuard2:

```bash
export SAFETY_GUARD_MODEL_ID="meta-llama/Meta-Llama-Guard-2-8B"
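
The guardrails change above replaces a raw heading used as a link target, `(## 1.2 Start TGI Gaudi Service)`, with its anchor form, `(#12-start-tgi-gaudi-service)`. The sketch below approximates how such an anchor can be derived from a heading and compared against the headings of a file. It is a simplification for ASCII headings (lowercase, drop punctuation other than `-` and `_`, turn spaces into hyphens) and not GitHub's complete rule; `heading_slug` and `anchor_exists` are hypothetical names that do not appear in the commit.

```bash
#!/usr/bin/env bash
# Sketch only: approximates GitHub-style heading anchors for ASCII headings.

# Turn "## 1.2 Start TGI Gaudi Service" into "12-start-tgi-gaudi-service".
heading_slug() {
  # Step 1: strip the leading "#" markers. Step 2: lowercase.
  # Step 3: remove everything except letters, digits, space, "_" and "-",
  #         then replace each space with a hyphen.
  printf '%s' "$1" |
    sed -E 's/^#+[[:space:]]*//' |
    tr '[:upper:]' '[:lower:]' |
    sed -E 's/[^a-z0-9 _-]//g; s/ /-/g'
}

# Return 0 when some heading in the file produces the given anchor.
anchor_exists() {
  local file="$1" anchor="${2#\#}" heading
  while IFS= read -r heading; do
    [ "$(heading_slug "$heading")" = "$anchor" ] && return 0
  done < <(grep -E '^#{1,6} ' "$file")
  return 1
}
```

A check along these lines could complement the relative-path job, which, as written, resolves a `#` link as a path and does not appear to compare it with the headings of the referring file. Headings inside fenced code blocks and duplicate headings (which receive numeric suffixes on GitHub) are not handled here.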
2 changes: 1 addition & 1 deletion comps/vectorstores/pathway/README.md
@@ -3,7 +3,7 @@
Set the environment variables for Pathway, and the embedding model.

> Note: If you are using `TEI_EMBEDDING_ENDPOINT`, make sure embedding service is already running.
-> See the instructions under [here](../../../retrievers/langchain/pathway/README.md)
+> See the instructions under [here](../../retrievers/pathway/langchain/README.md)
```bash
export PATHWAY_HOST=0.0.0.0
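
The pathway fix above is a relative-path correction: from `comps/vectorstores/pathway/README.md`, three `..` segments climb to the repository root, while two stay inside `comps/`. A minimal offline sketch of that resolution rule follows. `link_target_exists` is a hypothetical helper, not the workflow's code, and it checks only the local checkout rather than issuing the `curl` requests the job performs.

```bash
#!/usr/bin/env bash
# Sketch only: hypothetical helper that mirrors the three path cases in the job.
# Arguments: repository root, referring file (relative to the root), link target.
link_target_exists() {
  local root="$1" referrer="$2" target="${3%%\#*}"   # ignore any "#anchor" suffix
  local candidate
  case "$target" in
    '') candidate="$root/$referrer" ;;                       # pure "#anchor": same file
    /*) candidate="$root$target" ;;                          # rooted at the repository
    *)  candidate="$root/$(dirname "$referrer")/$target" ;;  # relative to the referrer
  esac
  [ -e "$candidate" ]
}

# Example use from a checkout of the repository:
#   link_target_exists . comps/vectorstores/pathway/README.md \
#     ../../retrievers/pathway/langchain/README.md && echo "ok"
```

Testing existence with `[ -e ... ]` lets the filesystem resolve the `..` segments, so the sketch does not depend on `realpath` being available.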
