[Bug]: Docker image build does not call warm_up_vectordb, so nltk.download("punkt") is not run ahead of time #1971

Open
awwaawwa opened this issue Sep 20, 2024 · 2 comments


@awwaawwa
Contributor

Installation Method and Platform

Others (Please Describe)

Version

Latest | 最新版

OS

Docker

Describe the bug

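Dockerfiles such as docs/GithubAction+NoLocal+Latex only run the following during the build, so warm_up_vectordb (and its nltk.download("punkt")) is never called: RUN python3 -c 'from check_proxy import warm_up_modules; warm_up_modules()'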

Screen Shot

With a poor network connection, running the Docker container hangs at this step:
[Screenshot: CleanShot 2024-09-20 at 12 50 46@2x]

Terminal Traceback & Material to Help Reproduce Bugs (if any)

No response

@awwaawwa
Contributor Author

Changing that line to the following should fix it:

RUN python3 -c 'from check_proxy import warm_up_modules, warm_up_vectordb; warm_up_modules(); warm_up_vectordb()'

@hongyi-zhao
Collaborator

hongyi-zhao commented Sep 20, 2024

For running directly from source, I ended up using the following approach:

$ cat nltk_data_setup.sh 
#!/bin/bash

#https://www.nltk.org/nltk_data/
# Base directory: adjust to match your own environment:
BASE_DIR="$HOME/.pyenv/versions/gpt_academic/lib/python3.11/site-packages/llama_index/core/_static/nltk_cache"

# Map each download URL to its target path under BASE_DIR
declare -A downloads=(
    ["https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/packages/corpora/stopwords.zip"]="corpora/stopwords"
    ["https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/packages/tokenizers/punkt.zip"]="tokenizers/punkt"
    ["https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/packages/tokenizers/punkt_tab.zip"]="tokenizers/punkt_tab"
)

# Create a temporary directory for the downloaded zips
TEMP_DIR=$(mktemp -d)

# Download one archive and extract it into BASE_DIR
download_and_extract() {
    local url="$1"
    local path="$2"
    local filename=$(basename "$url")
    local target_dir="$BASE_DIR/$path"

    # If the target directory already exists, delete it first
    if [ -d "$target_dir" ]; then
        echo "Removing existing directory: $target_dir"
        rm -rf "$target_dir"
    fi

    echo "Downloading $url..."
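    # socks5h://127.0.0.1:16668 is a local proxy; change it or drop -x for your setup.
    # -f makes curl fail on HTTP errors instead of saving the error page as the zip.
    if ! curl -f -x socks5h://127.0.0.1:16668 -L "$url" -o "$TEMP_DIR/$filename"; then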
        echo "Failed to download $url"
        return 1
    fi

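    # The nltk_data zips contain a top-level folder (e.g. punkt/), so extract into
    # the parent directory to get $BASE_DIR/tokenizers/punkt/... rather than .../punkt/punkt/...
    local parent_dir
    parent_dir=$(dirname "$target_dir")
    echo "Extracting to $parent_dir..."
    mkdir -p "$parent_dir"
    if ! unzip -o "$TEMP_DIR/$filename" -d "$parent_dir"; then
        echo "Failed to extract $filename"
        return 1
    fi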

    echo "Extraction complete for $filename"
}

# Main routine
main() {
    for url in "${!downloads[@]}"; do
        download_and_extract "$url" "${downloads[$url]}"
    done

    # Clean up the temporary directory
    rm -rf "$TEMP_DIR"
    echo "All downloads and extractions completed."
}

# Run the main routine
main
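After running the script, it can help to check that each resource ended up directly under BASE_DIR and not one directory level deeper. The helper below is a hypothetical sketch, not part of gpt_academic, llama_index or NLTK. The marker paths it checks (corpora/stopwords/english, tokenizers/punkt/english.pickle, tokenizers/punkt_tab/english) are assumptions about what the nltk_data zips contain, so adjust them if your copies differ.

```shell
#!/bin/bash
# Hypothetical helper: checks one assumed marker path per NLTK resource
# directly under the given base directory, and prints ok/missing for each.
# Returns 0 if all markers exist, 1 otherwise.
check_nltk_layout() {
    local base="$1"
    local status=0
    local marker
    for marker in \
        corpora/stopwords/english \
        tokenizers/punkt/english.pickle \
        tokenizers/punkt_tab/english; do
        # -e accepts both regular files and directories
        if [ -e "$base/$marker" ]; then
            echo "ok:      $marker"
        else
            echo "missing: $marker"
            status=1
        fi
    done
    return "$status"
}

# Usage (BASE_DIR as in nltk_data_setup.sh):
# check_nltk_layout "$BASE_DIR" || echo "Some NLTK resources are missing or nested"
```

A nested layout such as tokenizers/punkt/punkt/english.pickle is reported as missing, because the check only looks at the exact expected path.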
