Time-out issue when running native erlang views in 2.x on #1008

sklassen · 2017-11-18T13:48:38Z

I am seeing a timeout when indexing a view written in native Erlang. The Erlang view works with 1.6.x for databases of any size; the view also works on 2.x for databases with fewer records or smaller documents. Run against a database with many large(ish) documents, it produces the following error:

[error] 2017-11-17T03:00:11.072015Z couchdb@localhost <0.12.1196> 19a93c5b89 rexi_server throw:{timeout,{gen_server,call,[<0.9106.1195>,{prompt,[...]}}]}]}]}} [{couch_mrview_util,get_view,4,[{file,"src/couch_mrview_util.erl"},{line,56}]},{couch_mrview,query_view,6,[{file,"src/couch_mrview.erl"},{line,244}]},{rexi_server,init_p,3,[{file,"src/rexi_server.erl"},{line,139}]}]

If you later try to call a view that failed due to a timeout, you get a second error:

[error] 2017-11-17T05:01:27.670419Z couchdb@localhost <0.26156.1198> d00b01bc7d rexi_server exit:timeout [{rexi,init_stream,1,[{file,"src/rexi.erl"},{line,256}]},{rexi,stream2,3,[{file,"src/rexi.erl"},{line,204}]},{fabric_rpc,view_cb,2,[{file,"src/fabric_rpc.erl"},{line,286}]},{couch_mrview,map_fold,3,[{file,"src/couch_mrview.erl"},{line,503}]},{couch_mrview_util,fold_fun,4,[{file,"src/couch_mrview_util.erl"},{line,360}]},{couch_btree,stream_kv_node2,8,[{file,"src/couch_btree.erl"},{line,783}]},{couch_btree,stream_kp_node,7,[{file,"src/couch_btree.erl"},{line,710}]},{couch_btree,fold,4,[{file,"src/couch_btree.erl"},{line,217}]}]

I suspect the gen_server timeout needs to be extended. With multiple nodes, some index tasks might be preempted and thus time out.

I used the Ubuntu package couchdb 2.1.1-1 on xenial; I also replicated the same error on an earlier 2.0 version running under snap.

gregoryjgarcia0 · 2018-02-08T20:02:09Z

I have that same error message in my log. Is your CPU usage getting spiked really hard by the erlang process too? That's the problem I'm trying to solve. I'm using 2.1.1

dc0d · 2018-03-01T06:19:05Z

Same error (on verify installation, installed via snap on Ubuntu 16.04). Also it installs 2.0.0 instead of 2.1.1.

davisp · 2018-03-09T17:52:48Z

Does anyone have a way to reproduce this? The view engine changed between 1.6 and 2.x, but the way Erlang functions are invoked shouldn't be any different, so that's a bit odd. I'd also be interested in which version of Erlang is used.

janl · 2018-03-09T18:03:33Z

@davisp see #1142 for more context

davisp · 2018-03-09T18:31:25Z

Aha, updated there, but this also seems like two different errors to me.

ghost · 2018-04-10T10:34:40Z

Same error, Erlang 6.2

[error] 2018-04-10T10:10:59.860288Z nonode@nohost <0.14711.9> bcb49f8b6b rexi_server: from: nonode@nohost(<0.32062.6>) mfa: fabric_rpc:reduce_view/4 throw:{timeout,{gen_server,call,[couch_proc_manager,{get_proc,<<"javascript">>},5000]}} [{couch_mrview_util,get_view_index_state,5,[{file,"src/couch_mrview_util.erl"},{line,101}]},{couch_mrview_util,get_view,4,[{file,"src/couch_mrview_util.erl"},{line,45}]},{couch_mrview,query_view,6,[{file,"src/couch_mrview.erl"},{line,244}]},{rexi_server,init_p,3,[{file,"src/rexi_server.erl"},{line,139}]}]

janl · 2018-07-08T12:16:41Z

Closing until a reproducible test case is provided.

sklassen · 2018-11-07T19:50:11Z

Hi @janl , @wohali , @davisp

I have created a test suite that generates the timeout mentioned above.

https://github.com/sklassen/couchapp-erlang-example.git

There is a Python script that generates a large number of (fairly big) documents. There is a JavaScript and an Erlang version of each view. Try the script with 5 documents to see it function. Then try 500 and you should see timeouts on the Erlang view. The JavaScript view may crash too, restarting the server.
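For readers who want a feel for the kind of input involved without cloning the repository, here is a minimal sketch of a generator for large, nested documents. It is not the script from the repository above; the function names, the `payload` field, and the default depth and width are all assumptions chosen for illustration.

```python
import random
import string


def make_nested(depth, width, rng):
    """Build a dict nested `depth` levels deep with `width` keys per level."""
    if depth == 0:
        # Leaf: a random 64-character string to give the document some bulk.
        return "".join(rng.choices(string.ascii_lowercase, k=64))
    return {f"k{i}": make_nested(depth - 1, width, rng) for i in range(width)}


def make_docs(count, depth=4, width=6, seed=0):
    """Return `count` documents, each with width**depth string leaves.

    The field name `payload` and the id scheme are hypothetical.
    """
    rng = random.Random(seed)  # seeded so repeated runs give identical docs
    return [
        {"_id": f"doc-{i:06d}", "payload": make_nested(depth, width, rng)}
        for i in range(count)
    ]
```

Wrapping the result as `{"docs": make_docs(500)}` and serializing it with `json.dumps` gives a body in the shape CouchDB's `_bulk_docs` endpoint accepts. Raising `depth` or `width` grows both document size and nesting, which are the two factors discussed in this thread.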

I last ran it on CouchDB 2.2 on Ubuntu 18.04, from the http://apache.bintray.com/couchdb-deb bionic package. I ran it on a NUC 7i with 4 cores and 15G of memory (n=1,q=8). I've also seen it on (n=1,q=1) and over three NUCs (n=3,q=8). The same Erlang views ran on 1.6.x without issue.

Perhaps there is a configurable timeout that needs tweaking?

Caesar305 · 2019-10-21T06:15:45Z

Are there any known workarounds for this issue?

sklassen · 2019-10-21T10:54:34Z

I can confirm I still see the same problem with the test suite above with a doc count of 500. When I run it there is no longer an error message; the process runs for some time until it quietly runs out of memory and restarts. (I am using the snap installation on Ubuntu 19.04, version 2.3.1; erts-8.3.5.4; n=1; q=8 on a NUC with 8 cores and 15GB).

I don't see the problem with larger databases with smaller documents. I suspect it also isn't only the size of the documents, but also the depth of nested structures. Memory management between erlang and the NIF is the likely culprit.
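To check whether a given database falls into the "deeply nested" category, one could measure nesting depth per document before indexing. The helper below is a small sketch of that idea; `nesting_depth` is a hypothetical name, and no particular depth threshold is known to trigger the problem.

```python
def nesting_depth(value):
    """Return how many levels of dicts/lists are nested inside `value`."""
    if isinstance(value, dict):
        children = value.values()
    elif isinstance(value, list):
        children = value
    else:
        return 0  # scalars (str, int, None, ...) add no depth
    # An empty container still counts as one level.
    return 1 + max((nesting_depth(c) for c in children), default=0)
```

Pairing this with `len(json.dumps(doc))` gives a rough size-and-depth profile for each document, which makes it easier to compare databases that index cleanly with ones that do not.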

In my real-life database, as a workaround, I did a bit of everything: i) increased memory; ii) increased nodes n=5 (shared the problem around); iii) decreased the document size; iv) re-ran indexing multiple times. In my case, it now works on the second or third attempt of a full index. Incremental indexing is fine.
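The "re-run indexing until it succeeds" part of that workaround can be automated. The sketch below assumes a hypothetical zero-argument callable `query_view` that queries the view (which triggers an index build) and raises on a timeout or other failure; the attempt count and delay are arbitrary defaults, not values taken from this thread.

```python
import time


def retry_until_indexed(query_view, attempts=3, delay_s=5.0, sleep=time.sleep):
    """Call `query_view()` until it succeeds or `attempts` is exhausted.

    `query_view` is a hypothetical callable supplied by the caller; it should
    raise an exception when the view request fails or times out.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return query_view()
        except Exception as exc:  # broad on purpose: any failure means retry
            last_error = exc
            if attempt < attempts:
                sleep(delay_s)  # give the server time to recover or restart
    raise RuntimeError(
        f"view still failing after {attempts} attempts"
    ) from last_error
```

This only papers over the symptom in the same way the manual retries did. It does nothing about the underlying memory growth.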

Caesar305 · 2019-10-21T15:45:02Z

Our installation is not big; it just has over 100 databases, each maybe 200MB in size with a few thousand documents. This issue occurs randomly for us: one of the nodes simply starts responding to requests very slowly (over 20 seconds of delay). When looking at the processes, I see 2 couchdb processes pegging a few CPUs. The logs show messages similar to the OP's. We are running 32GB RAM, 16 processors, and 1TB SSD drives. I'm not sure what I can tweak to help remedy this.

sklassen · 2020-03-26T05:57:03Z

Hi @janl , @wohali , @davisp

This problem disappeared after I rebuilt couchdb using jiffy 1.04 (see davisp/jiffy@0ba322e). Thanks @davisp for the fix.

wohali added the dbcore label Jan 16, 2018

janl added this to the 2.2.0 milestone Mar 5, 2018

janl added the need more info label Jul 8, 2018

janl closed this as completed Jul 8, 2018

wohali reopened this Nov 7, 2018

wohali removed the need more info label Nov 7, 2018

wohali modified the milestones: 2.2.0, 3.0.0 Jul 11, 2019

wohali closed this as completed Mar 26, 2020
