
tidb-server memory usage goes to 50G #862

Closed
chentao opened this issue Aug 31, 2018 · 15 comments

@chentao

chentao commented Aug 31, 2018

Environment:
Five machines, each with 32 cores, 128 GB RAM, and a 3.5 TB SSD:
A: tidb-server, pd
B: monitoring, pd
C: tikv, pd
D: tikv
E: tikv

TiDB version: 2.0.6, deployed with ansible, all default configuration.

The Java client accesses TiDB via javax.jdo (DataNucleus 3.2.9) with 20 concurrent connections.
The large table has about 20 million rows.

Under normal conditions, simple SELECT and INSERT operations run at about 4k QPS and tidb-server memory stays under 100 MB.
In one particular business scenario there are many SELECTs with LEFT OUTER JOIN and WHERE clauses, plus a small number of DELETEs; see the TiDB log files for the exact SQL:
tidb.log.TAIL.txt
tidb_slow_query.log

Related schemas:
hive-schema-1.1.0.mysql.sql.txt
SQLs extracted from the logs (some very long ones are truncated):
these_sqls.zip

In this scenario, tidb-server memory grows from 2 GB to 50 GB within tens of seconds, while CPU utilization actually drops somewhat.
At that point, heap SVGs captured with go pprof show two places with large memory usage:

  • fetchInnerRows invoked by HashJoinExec

  • grpc.RecvMsg

Two heap SVGs taken one after the other are attached:
svg.zip
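
For reference, heap SVGs like the attached ones can be captured from the tidb-server status port with Go's pprof tooling; a minimal sketch, assuming the default status port 10080 and that the go toolchain plus graphviz are installed on the client machine:

# Fetch the current heap profile from the TiDB status port and render it as SVG
go tool pprof -svg http://<tidb-server-ip>:10080/debug/pprof/heap > heap.svg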

The problem is reliably reproducible. The same scenario runs successfully and quickly on MySQL.

Thanks.

@jackysp jackysp changed the title from "tidb-server process memory usage exceeds 50G" to "tidb-server memory usage goes to 50G" Aug 31, 2018
@jackysp
Member

jackysp commented Aug 31, 2018

Thanks for your feedback, @chentao
PTAL @zz-jason @XuHuaiyu .

@XuHuaiyu
Contributor

Hi @chentao, thank you for your feedback.

  1. From your description, the question is: why does TiDB's memory usage grow quickly while the CPU usage drops?
  • HashJoin reads all the data of the inner table into memory to build the hash table, so the memory usage grows.
  • You may try set @@tidb_hash_join_concurrency = 32, rerun the test, and check the CPU usage (see the sketch after this list).
  2. It would help if you could provide both the CPU profile and the memory profile taken while the workload is executing, ideally sampled over a longer period than the result you provided before.
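
A minimal sketch of the suggested check (the SELECT simply verifies the new value before the workload is re-run):

-- Raise the hash join concurrency for the current session
set @@tidb_hash_join_concurrency = 32;
-- Verify that the new value is in effect
select @@tidb_hash_join_concurrency;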

@chentao
Author

chentao commented Aug 31, 2018

@XuHuaiyu thank you for the guidance.

  1. Yes.
  2. I ran set @@global.tidb_hash_join_concurrency = 32 in another session, which I assume applies to all sessions. Then I reproduced the problem with the same load and, while memory was growing, captured heap/cpu/block profiles three times:
    first-at-25G.tar.gz
    second-at-35G.tar.gz
    third-at-40G.tar.gz

Hope it helps.

@zz-jason
Member

zz-jason commented Sep 4, 2018

@chentao What are the execution plans of the following two SQLs?

SELECT /*+ TIDB_INLJ(A0, B0) */
       `B0`.`INPUT_FORMAT`,`B0`.`IS_COMPRESSED`,`B0`.`IS_STOREDASSUBDIRECTORIES`,`B0`.`LOCATION`,`B0`.`NUM_BUCKETS`,`B0`.`OUTPUT_FORMAT`,`B0`.`SD_ID`,
       `C0`.`CREATE_TIME`,`C0`.`LAST_ACCESS_TIME`,`C0`.`OWNER`,`C0`.`RETENTION`,`C0`.`TBL_NAME`,`C0`.`TBL_TYPE`,`C0`.`TBL_ID`
FROM `PARTITIONS` `A0`
LEFT OUTER JOIN `SDS` `B0`
ON `A0`.`SD_ID` = `B0`.`SD_ID`
LEFT OUTER JOIN `TBLS` `C0`
ON `A0`.`TBL_ID` = `C0`.`TBL_ID`
WHERE `A0`.`PART_ID` = 30305567

SELECT `B0`.`INPUT_FORMAT`,`B0`.`IS_COMPRESSED`,`B0`.`IS_STOREDASSUBDIRECTORIES`,`B0`.`LOCATION`,`B0`.`NUM_BUCKETS`,`B0`.`OUTPUT_FORMAT`,`B0`.`SD_ID`,
       `C0`.`CREATE_TIME`,`C0`.`LAST_ACCESS_TIME`,`C0`.`OWNER`,`C0`.`RETENTION`,`C0`.`TBL_NAME`,`C0`.`TBL_TYPE`,`C0`.`TBL_ID`
FROM `PARTITIONS` `A0`
LEFT OUTER JOIN `SDS` `B0`
ON `A0`.`SD_ID` = `B0`.`SD_ID`
LEFT OUTER JOIN `TBLS` `C0`
ON `A0`.`TBL_ID` = `C0`.`TBL_ID`
WHERE `A0`.`PART_ID` = 30305567

The problem is most likely that the hash join consumes too much memory while building the hash table. Increasing the hash join concurrency does not help reduce memory usage, and the CPU utilization probably stays low because TiDB is still pulling the inner table's data to build the hash table, so raising that concurrency likely won't help much either.

Taking this SQL as an example, you can try an index join to reduce memory usage and speed up execution (an EXPLAIN sketch to verify the plan follows the query below):

SELECT /*+ TIDB_INLJ(A0, B0) */
       `B0`.`INPUT_FORMAT`,`B0`.`IS_COMPRESSED`,`B0`.`IS_STOREDASSUBDIRECTORIES`,`B0`.`LOCATION`,`B0`.`NUM_BUCKETS`,`B0`.`OUTPUT_FORMAT`,`B0`.`SD_ID`,
       `C0`.`CREATE_TIME`,`C0`.`LAST_ACCESS_TIME`,`C0`.`OWNER`,`C0`.`RETENTION`,`C0`.`TBL_NAME`,`C0`.`TBL_TYPE`,`C0`.`TBL_ID`
FROM `PARTITIONS` `A0`
LEFT OUTER JOIN `SDS` `B0`
ON `A0`.`SD_ID` = `B0`.`SD_ID`
LEFT OUTER JOIN `TBLS` `C0`
ON `A0`.`TBL_ID` = `C0`.`TBL_ID`
WHERE `A0`.`PART_ID` = 30305567
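
To confirm whether the hint takes effect, check the EXPLAIN output of the hinted query; a sketch with a trimmed column list (when the hint is honored, the plan should contain an IndexJoin operator instead of a hash join):

EXPLAIN
SELECT /*+ TIDB_INLJ(A0, B0) */ `B0`.`SD_ID`, `C0`.`TBL_ID`
FROM `PARTITIONS` `A0`
LEFT OUTER JOIN `SDS` `B0` ON `A0`.`SD_ID` = `B0`.`SD_ID`
LEFT OUTER JOIN `TBLS` `C0` ON `A0`.`TBL_ID` = `C0`.`TBL_ID`
WHERE `A0`.`PART_ID` = 30305567;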

@chentao
Author

chentao commented Sep 4, 2018

@zz-jason The execution plan is shown in the screenshot:
capture_004

How does TiDB decide which join method to use? Is there any documentation about this? Since we access TiDB through DataNucleus, adding hints to the SQL is not very convenient.

Thanks!

@chentao
Author

chentao commented Sep 5, 2018

@zz-jason Please take a look at this SQL:
select PARTITIONS.PART_ID, SDS.SD_ID, SDS.CD_ID, SERDES.SERDE_ID, PARTITIONS.CREATE_TIME, PARTITIONS.LAST_ACCESS_TIME, SDS.INPUT_FORMAT, SDS.IS_COMPRESSED, SDS.IS_STOREDASSUBDIRECTORIES, SDS.LOCATION, SDS.NUM_BUCKETS, SDS.OUTPUT_FORMAT, SERDES.NAME, SERDES.SLIB from PARTITIONS left outer join SDS on PARTITIONS.SD_ID = SDS.SD_ID left outer join SERDES on SDS.SERDE_ID = SERDES.SERDE_ID where PART_ID in (30335325,30335386,30334288,30334351,30334395,30334456,30334512,30331757,30334556,30334098,30334149,30334185,30334234,30334901,30335120,30335178,30335220,30331814,30335275,30336817,30336870,30335672,30335880,30335930,30331859,30335985,30336035,30332160,30337056,30337104,30337303,30337348,30337396,30337446,30336914,30336955,30337001,30336366,30336421,30336467,30335437,30335497,30335553,30335613,30336091,30336145,30336199,30336255,30336313,30332107,30336522,30336575,30336624,30336697,30336754,30337502,30337548,30337773,30337826,30334618,30334678,30334739,30334782,30334837,30354270,30333876,30354317,30354379,30354441,30354503,30354549,30354600,30354645,30354693,30354748,30354790,30333932,30354827,30354884,30354934,30354992,30355034,30355097,30355157,30355223,30355280,30355341,30333980,30355385,30355454,30355495,30355551,30355607,30355662,30355720,30355768,30355826,30355860,30331694,30334041) order by PART_NAME asc;

The three tables involved, PARTITIONS, SDS and SERDES, each have about 50 million rows.
On TiDB it takes 55 seconds, and during execution the tidb-server process memory grows to about 20 GB. Following the method you suggested, as shown below, there is no improvement, and I can see no difference between the two execution plans.

select /*+ TIDB_INLJ(PARTITIONS, SDS, SERDES) */ PARTITIONS.PART_ID, SDS.SD_ID, SDS.CD_ID, SERDES.SERDE_ID, PARTITIONS.CREATE_TIME, PARTITIONS.LAST_ACCESS_TIME, SDS.INPUT_FORMAT, SDS.IS_COMPRESSED, SDS.IS_STOREDASSUBDIRECTORIES, SDS.LOCATION, SDS.NUM_BUCKETS, SDS.OUTPUT_FORMAT, SERDES.NAME, SERDES.SLIB from PARTITIONS left outer join SDS on PARTITIONS.SD_ID = SDS.SD_ID left outer join SERDES on SDS.SERDE_ID = SERDES.SERDE_ID where PART_ID in (30335325,30335386,30334288,30334351,30334395,30334456,30334512,30331757,30334556,30334098,30334149,30334185,30334234,30334901,30335120,30335178,30335220,30331814,30335275,30336817,30336870,30335672,30335880,30335930,30331859,30335985,30336035,30332160,30337056,30337104,30337303,30337348,30337396,30337446,30336914,30336955,30337001,30336366,30336421,30336467,30335437,30335497,30335553,30335613,30336091,30336145,30336199,30336255,30336313,30332107,30336522,30336575,30336624,30336697,30336754,30337502,30337548,30337773,30337826,30334618,30334678,30334739,30334782,30334837,30354270,30333876,30354317,30354379,30354441,30354503,30354549,30354600,30354645,30354693,30354748,30354790,30333932,30354827,30354884,30354934,30354992,30355034,30355097,30355157,30355223,30355280,30355341,30333980,30355385,30355454,30355495,30355551,30355607,30355662,30355720,30355768,30355826,30355860,30331694,30334041) order by PART_NAME asc;

The execution plans are shown in the screenshots:
capture_111
capture_222

Thanks!

@chentao
Author

chentao commented Sep 17, 2018

@zz-jason Please take a look at this problem:

In a transaction containing 107 SQL statements, the following SQL runs very slowly. Profiling shows that AppendPartialRow, called from HashJoinExec, uses 13.9 GB of memory, and the slow log shows the SQL takes more than 2 minutes.
Executed on its own, the SQL is fast, and it is also fast when the WHERE IN list contains fewer values (< 100).
After adding the TIDB_INLJ hint, the behavior is the same; judging from the profile the hint does not seem to take effect and hash join is still used.
The isolation level is set to repeatable-read.

The SQL from the slow log:
2018/09/17 10:21:20.823 adapter.go:364: [warning] [SLOW_QUERY] cost_time:2m29.713209735s succ:true con:3342031 user:[email protected] txn_start_ts:402954840985567235 database:hive_test table_ids:[107,127],sql:select /*+ TIDB_INLJ(PARTITIONS, SDS) */ "PARTITIONS"."PART_ID", "SDS"."SD_ID", "SDS"."CD_ID", "SDS"."SERDE_ID", "PARTITIONS"."CREATE_TIME", "PARTITIONS"."LAST_ACCESS_TIME", "SDS"."INPUT_FORMAT", "SDS"."IS_COMPRESSED", "SDS"."IS_STOREDASSUBDIRECTORIES", "SDS"."LOCATION", "SDS"."NUM_BUCKETS", "SDS"."OUTPUT_FORMAT" from "PARTITIONS" left outer join "SDS" on "PARTITIONS"."SD_ID" = "SDS"."SD_ID" where "PART_ID" in (34561180,34560848,34561181,34561182,34561183,34561184,34561185,34561186,34561187,34561188,34561189,34561190,34560849,34561191,34561192,34561193,34561194,34561195,34561196,34561197,34561198,34561199,34561200,34560850,34561201,34561202,34561203,34561204,34561205,34561206,34561207,34561208,34561209,34561210,34560815,34560851,34560852,34560853,34560854,34560855,34560856,34560857,34560858,34560859,34560860,34560816,34560861,34560862,34560863,34560864,34560865,34560866,34560867,34560868,34560869,34560870,34560817,34560871,34560872,34560873,34560874,34560875,34560876,34560877,34560878,34560879,34560880,34560818,34560881,34560882,34560883,34560884,34560885,34560886,34560887,34560888,34560889,34560890,34560819,34560891,34560892,34560893,34560894,34560895,34560896,34560897,34560898,34560899,34560900,34560820,34560901,34560902,34560903,34560904,34560905,34560906,34560907,34560908,34560909,34560910) order by "PART_NAME" asc

tidb.log of the transaction:
tidb-log.txt

tidb-server heap profile:
profile009 (2).zip

@zz-jason
Member

@chentao Please provide the execution plan of this SQL:

select /*+ TIDB_INLJ(PARTITIONS, SDS) */
    "PARTITIONS"."PART_ID", "SDS"."SD_ID", "SDS"."CD_ID", "SDS"."SERDE_ID", "PARTITIONS"."CREATE_TIME", "PARTITIONS"."LAST_ACCESS_TIME", "SDS"."INPUT_FORMAT", "SDS"."IS_COMPRESSED", "SDS"."IS_STOREDASSUBDIRECTORIES", "SDS"."LOCATION", "SDS"."NUM_BUCKETS", "SDS"."OUTPUT_FORMAT"
from "PARTITIONS"
left outer join "SDS"
on "PARTITIONS"."SD_ID" = "SDS"."SD_ID"
where "PART_ID" in (34561180,34560848,34561181,34561182,34561183,34561184,34561185,34561186,34561187,34561188,34561189,34561190,34560849,34561191,34561192,34561193,34561194,34561195,34561196,34561197,34561198,34561199,34561200,34560850,34561201,34561202,34561203,34561204,34561205,34561206,34561207,34561208,34561209,34561210,34560815,34560851,34560852,34560853,34560854,34560855,34560856,34560857,34560858,34560859,34560860,34560816,34560861,34560862,34560863,34560864,34560865,34560866,34560867,34560868,34560869,34560870,34560817,34560871,34560872,34560873,34560874,34560875,34560876,34560877,34560878,34560879,34560880,34560818,34560881,34560882,34560883,34560884,34560885,34560886,34560887,34560888,34560889,34560890,34560819,34560891,34560892,34560893,34560894,34560895,34560896,34560897,34560898,34560899,34560900,34560820,34560901,34560902,34560903,34560904,34560905,34560906,34560907,34560908,34560909,34560910)
order by "PART_NAME" asc;

@alivxxx
Contributor

alivxxx commented Sep 27, 2018

@chentao Please also provide the statistics of the PARTITIONS and SDS tables; you can get them with curl -G "http://${tidb-server-ip}:${tidb-server-status-port}/stats/dump/${db_name}/${table_name}" > ${table_name}_stats.json.

@chentao
Author

chentao commented Sep 27, 2018

@zz-jason @lamxTyler
The execution plan and statistics of that SQL are as follows:
capture_tidb-111
stats.tar.gz

@zz-jason
Member

@chentao Could you check how large the join result set is:

select /*+ TIDB_INLJ(PARTITIONS, SDS) */ count(*)
from "PARTITIONS"
left outer join "SDS"
on "PARTITIONS"."SD_ID" = "SDS"."SD_ID"
where "PART_ID" in (34561180,34560848,34561181,34561182,34561183,34561184,34561185,34561186,34561187,34561188,34561189,34561190,34560849,34561191,34561192,34561193,34561194,34561195,34561196,34561197,34561198,34561199,34561200,34560850,34561201,34561202,34561203,34561204,34561205,34561206,34561207,34561208,34561209,34561210,34560815,34560851,34560852,34560853,34560854,34560855,34560856,34560857,34560858,34560859,34560860,34560816,34560861,34560862,34560863,34560864,34560865,34560866,34560867,34560868,34560869,34560870,34560817,34560871,34560872,34560873,34560874,34560875,34560876,34560877,34560878,34560879,34560880,34560818,34560881,34560882,34560883,34560884,34560885,34560886,34560887,34560888,34560889,34560890,34560819,34560891,34560892,34560893,34560894,34560895,34560896,34560897,34560898,34560899,34560900,34560820,34560901,34560902,34560903,34560904,34560905,34560906,34560907,34560908,34560909,34560910)
order by "PART_NAME" asc;

@chentao
Author

chentao commented Sep 27, 2018

@zz-jason The result set is quite small, at most a few thousand rows. The two source tables are roughly the same size, about 100 million rows each, but the same problem also occurred when they had only about 5 million rows.

@winoros
Member

winoros commented Sep 27, 2018

@chentao
Sorry, I previously missed that this statement is part of a transaction. Once a transaction has performed a write, TiDB has to handle the transaction's uncommitted data, so it adds a UnionScan operator on top of the data reader operator to merge the data read from KV with the uncommitted data in the transaction (you can see this operator in the EXPLAIN result). Limited by the current implementation in TiDB, IndexNestedLoopJoin cannot be used when a UnionScan operator is present; a sketch of how to observe this follows below.
We need some time to make TiDB able to use index join even when a UnionScan operator exists.
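
A sketch of how this can be observed (table and column names are reused from the queries above; exact operator names may vary by version):

-- Any uncommitted write in the transaction marks the touched table as dirty,
-- so subsequent reads of that table gain a UnionScan operator in the plan.
BEGIN;
DELETE FROM `PARTITIONS` WHERE `PART_ID` = 1;
EXPLAIN SELECT * FROM `PARTITIONS` WHERE `PART_ID` = 2; -- plan now contains UnionScan
ROLLBACK;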

Apart from this SQL, are there any other problems?

Also, could you tell us whether the write volume of a single transaction is large? If only this SQL in the transaction performs poorly, the write volume on the SDS table specifically would be most relevant.

@chentao
Author

chentao commented Sep 27, 2018

@winoros Yes, it is inside a transaction. The transaction generally contains no inserts; roughly 80% of the statements are SELECT, 15% DELETE, and the remaining 5% UPDATE. As for the number of SQL statements per transaction, I have tested anywhere from 100+ up to several thousand, and the problem occurs in all cases.

Thanks for the explanation above, it cleared up my confusion.

Apart from this, there are no other problems at the moment. It seems that for now I can only try to avoid such large-table joins inside transactions and rewrite the joins into multiple SQL statements. I have already rewritten several business processes this way and the results are quite good.

By the way, the following (unimportant) SQL works in MySQL but fails in TiDB:
CREATE TABLE temp_table AS SELECT 1 AS TEST_COL;

@winoros
Member

winoros commented Sep 27, 2018

@chentao OK, we will handle this case as soon as possible. Since it involves the design of the operator execution logic, we need some time to change the related code.

The CREATE TABLE TABLE_NAME AS ... syntax is not supported yet, but pingcap/tidb#7787 is adding the equivalent CREATE TABLE TABLE_NAME SELECT ... syntax. If all goes well, this syntax should be supported soon. A workaround is sketched below.
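
Until then, a possible workaround sketch (the column type here is an assumption; adjust it to match the actual SELECT output):

-- Create the target table explicitly, then populate it with INSERT ... SELECT
CREATE TABLE temp_table (TEST_COL BIGINT);
INSERT INTO temp_table SELECT 1;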
