Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Question]java连接polardb,执行sql频繁报Cause: com.aliyun.polardb.util.PSQLException: ERROR: out of memory #508

Open
sise1369 opened this issue May 18, 2024 · 9 comments
Labels
question Further information is requested

Comments

@sise1369
Copy link

Describe the problem
com.aliyun.polardb.util.PSQLException: ERROR: out of memory
详细:Failed on request of size 16 in memory context "CachedPlanQuery".
at com.aliyun.polardb.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2763)
at com.aliyun.polardb.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2372)
at com.aliyun.polardb.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:350)
at com.aliyun.polardb.jdbc.PgStatement.executeInternal(PgStatement.java:488)
at com.aliyun.polardb.jdbc.PgStatement.execute(PgStatement.java:405)
at com.aliyun.polardb.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:245)
at com.aliyun.polardb.jdbc.PgPreparedStatement.execute(PgPreparedStatement.java:234)
...

服务器是96核,128GB内存
调整以下参数,仍然还是会报内存溢出。特来咨询下,还有其他解决办法吗
shared_buffers=12GB
work_mem = 24MB
maintenance_work_mem = 128MB
autovacuum_work_mem = 128MB

@sise1369 sise1369 added the question Further information is requested label May 18, 2024
@polardb-bot
Copy link

polardb-bot bot commented May 18, 2024

Hi @sise1369 ~ Thanks for opening this issue! 🎉

Please make sure you have provided enough information for subsequent discussion.

We will get back to you as soon as possible. ❤️

@mrdrivingduck
Copy link
Member

@sise1369 Could you monitor the memory usage of your machine at the moment when there is an OOM happening?

@Mr-TTWang
Copy link

image
遇到这样的问题

@Mr-TTWang
Copy link

@mrdrivingduck see pic above, please help

@liuchengshan-lcs
Copy link

方便介绍一下你的操作流程吗?我们看下能否进行分析/复现。目前日志信息显示sharedserver在进行shmem_alloc的时候遇到oom,进而出发回退机制,算符合预期。引起oom的原因目前看是CachedPlanQuery,但是具体原因可能还需要您提供一些核心操作。

@liuchengshan-lcs
Copy link

方便介绍一下你的操作流程吗?我们看下能否进行分析/复现。目前日志信息显示sharedserver在进行shmem_alloc的时候遇到oom,进而出发回退机制,算符合预期。引起oom的原因目前看是CachedPlanQuery,但是具体原因可能还需要您提供一些核心操作。

您可以在相同操作下把shared server功能屏蔽一下,可以判断是否shared server带来的问题,目前看起来应该和这个关系不大

@Mr-TTWang
Copy link

方便介绍一下你的操作流程吗?我们看下能否进行分析/复现。目前日志信息显示sharedserver在进行shmem_alloc的时候遇到oom,进而出发回退机制,算符合预期。引起oom的原因目前看是CachedPlanQuery,但是具体原因可能还需要您提供一些核心操作。

您可以在相同操作下把shared server功能屏蔽一下,可以判断是否shared server带来的问题,目前看起来应该和这个关系不大

我是使用的 binary那个镜像 然后只启动另一个primary, 然后启动脚本是

#!/bin/bash

polar_data_dir=${POLARDB_DATA_DIR}
sudo mkdir -p ${polar_data_dir}
sudo chmod a+wr ${polar_data_dir}
sudo chown -R postgres:postgres ${polar_data_dir}

primary_datadir="$polar_data_dir/primary_datadir"
shared_datadir="$polar_data_dir/shared_datadir"

polardb_init() {
    primary_port=${POLARDB_PORT:-5432}

    # primary private dir and shared dir
    rm -rf ${primary_datadir}
    rm -rf ${shared_datadir}
    mkdir -p ${primary_datadir}
    mkdir -p ${shared_datadir}

    # initdb
    initdb -k -U postgres -D ${primary_datadir}

    # default GUCs
    echo "polar_enable_shared_storage_mode = on" >> ${primary_datadir}/postgresql.conf
    echo "polar_hostid = 1" >> ${primary_datadir}/postgresql.conf
    echo "max_connections = 2000" >> ${primary_datadir}/postgresql.conf
    echo "polar_wal_pipeline_enable = true" >> ${primary_datadir}/postgresql.conf
    echo "polar_create_table_with_full_replica_identity = off" >> ${primary_datadir}/postgresql.conf
    echo "logging_collector = on" >> ${primary_datadir}/postgresql.conf
    echo "log_directory = 'pg_log'" >> ${primary_datadir}/postgresql.conf

    echo "shared_buffers = '2GB'" >> ${primary_datadir}/postgresql.conf
    echo "synchronous_commit = on" >> ${primary_datadir}/postgresql.conf
    echo "full_page_writes = off" >> ${primary_datadir}/postgresql.conf
    echo "autovacuum_naptime = 10min" >> ${primary_datadir}/postgresql.conf
    echo "max_worker_processes = 32" >> ${primary_datadir}/postgresql.conf
    echo "polar_use_statistical_relpages = off" >> ${primary_datadir}/postgresql.conf
    echo "polar_enable_persisted_buffer_pool = off" >> ${primary_datadir}/postgresql.conf
    echo "polar_nblocks_cache_mode = 'all'" >> ${primary_datadir}/postgresql.conf
    echo "polar_enable_replica_use_smgr_cache = on" >> ${primary_datadir}/postgresql.conf
    echo "polar_enable_standby_use_smgr_cache = on" >> ${primary_datadir}/postgresql.conf
    echo "polar_enable_flashback_log = on" >> ${primary_datadir}/postgresql.conf
    echo "polar_enable_fast_recovery_area = on" >> ${primary_datadir}/postgresql.conf

    # storage-related GUCs
    disk_name=`echo ${shared_datadir} | cut -d '/' -f2`
    echo "polar_vfs.localfs_mode = true" >> ${primary_datadir}/postgresql.conf
    echo "polar_enable_localfs_test_mode = on" >> ${primary_datadir}/postgresql.conf
    echo "polar_enable_shared_storage_mode = on" >> ${primary_datadir}/postgresql.conf
    echo "listen_addresses = '*'" >> ${primary_datadir}/postgresql.conf
    echo "polar_disk_name = '$disk_name'" >> ${primary_datadir}/postgresql.conf
    echo "polar_datadir = 'file-dio://$shared_datadir'" >> ${primary_datadir}/postgresql.conf

    # preload extensions
    echo "shared_preload_libraries = 'polar_px,polar_vfs,polar_worker,pg_stat_statements,auth_delay,auto_explain,polar_monitor_preload,polar_stat_sql,timescaledb'" >> ${primary_datadir}/postgresql.conf

    # shared dir initialization
    polar-initdb.sh ${primary_datadir}/ ${shared_datadir}/ localfs

    # allow external connections
    echo "host all all 0.0.0.0/0 md5" >> ${primary_datadir}/pg_hba.conf

    echo "port = $primary_port" >> ${primary_datadir}/postgresql.conf
    echo "polar_hostid = 100" >> ${primary_datadir}/postgresql.conf
    echo "full_page_writes = off" >> ${primary_datadir}/postgresql.conf

    # PX related GUCs
    echo "polar_enable_px=0" >> ${primary_datadir}/postgresql.conf
    echo "polar_px_enable_check_workers=0" >> ${primary_datadir}/postgresql.conf
    echo "polar_px_enable_replay_wait=1" >> ${primary_datadir}/postgresql.conf
    echo "polar_px_dop_per_node=3" >> ${primary_datadir}/postgresql.conf
    echo "polar_px_max_workers_number=0" >> ${primary_datadir}/postgresql.conf
    echo "polar_px_enable_cte_shared_scan=1" >> ${primary_datadir}/postgresql.conf
    echo "polar_px_enable_partition=1" >> ${primary_datadir}/postgresql.conf
    echo "polar_px_enable_left_index_nestloop_join=1" >> ${primary_datadir}/postgresql.conf
    echo "polar_px_wait_lock_timeout=1800000" >> ${primary_datadir}/postgresql.conf
    echo "polar_px_enable_partitionwise_join=1" >> ${primary_datadir}/postgresql.conf
    echo "polar_px_optimizer_multilevel_partitioning=1" >> ${primary_datadir}/postgresql.conf
    echo "polar_px_max_slices=1000000" >> ${primary_datadir}/postgresql.conf
    echo "polar_px_enable_adps=1" >> ${primary_datadir}/postgresql.conf
    echo "polar_px_enable_adps_explain_analyze=1" >> ${primary_datadir}/postgresql.conf
    echo "polar_trace_heap_scan_flow=1" >> ${primary_datadir}/postgresql.conf
    echo "polar_px_enable_spi_read_all_namespaces=1" >> ${primary_datadir}/postgresql.conf

    # Shared server GUCs
    echo "polar_enable_shared_server = on" >> ${primary_datadir}/postgresql.conf
    echo "polar_enable_shm_aset = on" >> ${primary_datadir}/postgresql.conf

    # start up primary node
    pg_ctl -D ${primary_datadir} start

    # create default user with password, if specified
    if [[ ${POLARDB_USER} == "postgres" ]];
    then
        if [[ -n ${POLARDB_PASSWORD} ]];
        then
            psql -h 127.0.0.1 -p $primary_port -d postgres -c "ALTER ROLE ${POLARDB_USER} PASSWORD '${POLARDB_PASSWORD}'"
        fi
    elif [[ -n ${POLARDB_USER} ]];
    then
        if [[ -n ${POLARDB_PASSWORD} ]];
        then
            psql -h 127.0.0.1 -p $primary_port -d postgres -c "CREATE ROLE ${POLARDB_USER} PASSWORD '${POLARDB_PASSWORD}' SUPERUSER LOGIN"
        fi
    fi
}

# If the data volume is empty, we will try to initdb here.
if [ -z "$(ls -A $polar_data_dir)" ];
then
    polardb_init
else
    rm -f $shared_datadir/DEATH # solve the restarting error
    echo "waiting starting"
    sleep 60s
    pg_ctl -D ${primary_datadir} start
fi

tail -f /dev/null

# Stop PolarDB-PG.
if [ ! $(pg_ctl -D ${primary_datadir} status | grep -q "server is running") ];
then
    pg_ctl -D ${primary_datadir} stop
fi

C++ 服务连接倒是正常,但是java使用postgres的jdbc连接的时候 会有那种问题,只是猜测,因为我们登录界面调用java接口无法查询到数据库信息,无法登录。
下午调整了

shared_buffers; 4GB
work_mem; 256MB
maintenance_work_mem; 1GB
还是存在问题,然后
昨晚我把 #polar_enable_shm_aset = on 注释掉了
没再报错。

不知道使用单实例部署,shared相关需要怎么处理和配置,或者还是有啥别的原因..

@liuchengshan-lcs
Copy link

polar_enable_shm_aset参数表示是否开启全局内存。根据您的描述来看,可能是jdbc的客户端链接在某些条件下和内核不兼容?会触发全局内存的OOM。您那边临时先关闭全局内存的参数进行下测试和使用?我们也尝试一下。

@Mr-TTWang
Copy link

polar_enable_shm_aset参数表示是否开启全局内存。根据您的描述来看,可能是jdbc的客户端链接在某些条件下和内核不兼容?会触发全局内存的OOM。您那边临时先关闭全局内存的参数进行下测试和使用?我们也尝试一下。

好的,等你们消息

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
None yet
Development

No branches or pull requests

4 participants