Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] MS SQL database backend reason: {port_exit,memory_allocation_failed} with Erlang >25.3 #4178

Closed
sando38 opened this issue Mar 19, 2024 · 5 comments
Labels
Database:MSSQL Microsoft SQL server Kind:Bug

Comments

@sando38
Copy link
Contributor

sando38 commented Mar 19, 2024

Environment

  • ejabberd version: 24.02
  • Erlang version: 26.2
  • OS: Linux (Debian12) and Alpine Linux 3.19
  • Installed from: source | ghcr.io/processone/ejabberd:24.02

Configuration (only if needed): grep -Ev '^$|^\s*#' ejabberd.yml

loglevel: info

Errors from error.log/crash.log

2024-03-16T20:39:12.6496590Z 2024-03-16 20:32:36.414879+00:00 [info] <0.423.0>@gen_mod:start_modules/0:141 Loading modules for 8310292872-12.example.com
2024-03-16T20:39:12.6498391Z 2024-03-16 20:32:36.449292+00:00 [info] <0.374.0>@ejabberd_mnesia:create/2:270 Creating Mnesia ram table 'pubsub_last_item'
2024-03-16T20:39:12.6500208Z 2024-03-16 20:32:36.461769+00:00 [info] <0.374.0>@ejabberd_mnesia:create/2:270 Creating Mnesia ram table 'mod_register_ip'
2024-03-16T20:39:12.6502116Z 2024-03-16 20:32:36.488646+00:00 [info] <0.608.0>@mod_stun_disco:parse_listener/1:626 Going to offer STUN service: 10.244.3.7:3478 (udp)
2024-03-16T20:39:12.6504254Z 2024-03-16 20:32:36.551778+00:00 [info] <0.374.0>@ejabberd_mnesia:create/2:270 Creating Mnesia ram table 'muc_online_room'
2024-03-16T20:39:12.6506198Z 2024-03-16 20:32:36.643316+00:00 [error] <0.649.0> ODBC: received unexpected info: {tcp_closed,#Port<0.91>}
2024-03-16T20:39:12.6507055Z 
2024-03-16T20:39:12.6508629Z 2024-03-16 20:32:36.643508+00:00 [error] <0.632.0>@ejabberd_sql:check_error/2:1404 SQL query 'select name, opts from muc_room where host='conference.8310292872-12.example.com'' failed: connection_closed
2024-03-16T20:39:12.6511449Z 2024-03-16 20:32:36.643598+00:00 [error] <0.632.0>@ejabberd_sql:check_error/2:1397 SQL query 'Q80627668' at {mod_muc_sql,{293,23}} failed: connection_closed
2024-03-16T20:39:12.6513413Z 2024-03-16 20:32:36.643986+00:00 [error] <0.649.0>@gen_server:error_info/8:1391 ** Generic server <0.649.0> terminating 
2024-03-16T20:39:12.6514552Z ** Last message in was {#Port<0.69>,{exit_status,2}}
2024-03-16T20:39:12.6515218Z ** When Server state == {state,#Port<0.69>,
2024-03-16T20:39:12.6515798Z                                {<0.632.0>,
2024-03-16T20:39:12.6516387Z                                 #Ref<0.3142729776.3845652482.250765>},
2024-03-16T20:39:12.6517084Z                                <0.632.0>,undefined,on,false,false,off,
2024-03-16T20:39:12.6517768Z                                connected,undefined,0,
2024-03-16T20:39:12.6518400Z                                [#Port<0.67>,#Port<0.68>],
2024-03-16T20:39:12.6519032Z                                #Port<0.90>,#Port<0.91>}
2024-03-16T20:39:12.6519621Z ** Reason for termination ==
2024-03-16T20:39:12.6520117Z ** {port_exit,memory_allocation_failed}
2024-03-16T20:39:12.6520518Z 
2024-03-16T20:39:12.6521166Z 2024-03-16 20:32:36.645543+00:00 [error] <0.649.0>@proc_lib:crash_report/4:584 CRASH REPORT:
2024-03-16T20:39:12.6522026Z   crasher:
2024-03-16T20:39:12.6522380Z     initial call: odbc:init/1
2024-03-16T20:39:12.6522837Z     pid: <0.649.0>
2024-03-16T20:39:12.6523221Z     registered_name: []
2024-03-16T20:39:12.6523768Z     exception exit: {port_exit,memory_allocation_failed}
2024-03-16T20:39:12.6524678Z       in function  gen_server:handle_common_reply/8 (gen_server.erl, line 1226)
2024-03-16T20:39:12.6525483Z     ancestors: [odbc_sup,<0.344.0>]
2024-03-16T20:39:12.6525987Z     message_queue_len: 1
2024-03-16T20:39:12.6526524Z     messages: [{'EXIT',#Port<0.69>,normal}]
2024-03-16T20:39:12.6527072Z     links: [<0.632.0>,<0.345.0>]
2024-03-16T20:39:12.6527535Z     dictionary: []
2024-03-16T20:39:12.6527907Z     trap_exit: true
2024-03-16T20:39:12.6528287Z     status: running
2024-03-16T20:39:12.6528658Z     heap_size: 6772
2024-03-16T20:39:12.6529036Z     stack_size: 28
2024-03-16T20:39:12.6529418Z     reductions: 19518
2024-03-16T20:39:12.6529805Z   neighbours:
2024-03-16T20:39:12.6530031Z 
2024-03-16T20:39:12.6530830Z 2024-03-16 20:32:36.649964+00:00 [warning] <0.632.0>@ejabberd_sql:handle_reconnect/2:491 mssql connection failed:
2024-03-16T20:39:12.6531919Z ** Reason: {port_exit,memory_allocation_failed}
2024-03-16T20:39:12.6532510Z ** Retry after: 5 seconds
2024-03-16T20:39:12.6533452Z 2024-03-16 20:32:36.649979+00:00 [error] <0.345.0>@supervisor:do_restart/3:759 SUPERVISOR REPORT:
2024-03-16T20:39:12.6534377Z     supervisor: {local,odbc_sup}
2024-03-16T20:39:12.6534891Z     errorContext: child_terminated
2024-03-16T20:39:12.6535462Z     reason: {port_exit,memory_allocation_failed}
2024-03-16T20:39:12.6536057Z     offender: [{pid,<0.649.0>},
2024-03-16T20:39:12.6536516Z                {id,[]},
2024-03-16T20:39:12.6537002Z                {mfargs,{odbc,start_link_sup,undefined}},
2024-03-16T20:39:12.6537632Z                {restart_type,temporary},
2024-03-16T20:39:12.6538183Z                {significant,false},
2024-03-16T20:39:12.6538677Z                {shutdown,7000},
2024-03-16T20:39:12.6539153Z                {child_type,worker}]
2024-03-16T20:39:12.6539496Z 
2024-03-16T20:39:12.6540870Z 2024-03-16 20:32:36.720319+00:00 [warning] <0.675.0>@ejabberd_captcha:get_prog_name/0:444 The option captcha_cmd is not configured, but some module wants to use the CAPTCHA feature.
2024-03-16T20:39:12.6543495Z 2024-03-16 20:32:36.754287+00:00 [info] <0.190.0>@ejabberd_cluster_mnesia:wait_for_sync/1:123 Waiting for Mnesia synchronization to complete
2024-03-16T20:39:12.6546970Z 2024-03-16 20:32:36.772555+00:00 [warning] <0.511.0>@ejabberd_pkix:log_warnings/1:393 Invalid certificate in /opt/ejabberd/certs/custom-cert/cert.pem: at line 1: unused private key
2024-03-16T20:39:12.6549459Z 2024-03-16 20:32:36.793597+00:00 [warning] <0.511.0>@ejabberd_pkix:check_domain_certfiles/1:312 No certificate found matching 8310292872-12.example.com
2024-03-16T20:39:12.6551565Z 2022024-03-16 20:32:36.643316+00:00 [error] <0.649.0> ODBC: received unexpected info: {tcp_closed,#Port<0.91>}
2024-03-16T20:39:12.6552430Z 
2024-03-16T20:39:12.6553964Z 2024-03-16 20:32:36.643508+00:00 [error] <0.632.0>@ejabberd_sql:check_error/2:1404 SQL query 'select name, opts from muc_room where host='conference.8310292872-12.example.com'' failed: connection_closed
2024-03-16T20:39:12.6556589Z 2024-03-16 20:32:36.643598+00:00 [error] <0.632.0>@ejabberd_sql:check_error/2:1397 SQL query 'Q80627668' at {mod_muc_sql,{293,23}} failed: connection_closed
2024-03-16T20:39:12.6558551Z 2024-03-16 20:32:36.643986+00:00 [error] <0.649.0>@gen_server:error_info/8:1391 ** Generic server <0.649.0> terminating 
2024-03-16T20:39:12.6559688Z ** Last message in was {#Port<0.69>,{exit_status,2}}
2024-03-16T20:39:12.6560349Z ** When Server state == {state,#Port<0.69>,
2024-03-16T20:39:12.6560925Z                                {<0.632.0>,
2024-03-16T20:39:12.6561508Z                                 #Ref<0.3142729776.3845652482.250765>},
2024-03-16T20:39:12.6562205Z                                <0.632.0>,undefined,on,false,false,off,
2024-03-16T20:39:12.6562880Z                                connected,undefined,0,
2024-03-16T20:39:12.6563515Z                                [#Port<0.67>,#Port<0.68>],
2024-03-16T20:39:12.6564143Z                                #Port<0.90>,#Port<0.91>}
2024-03-16T20:39:12.6564731Z ** Reason for termination ==
2024-03-16T20:39:12.6565221Z ** {port_exit,memory_allocation_failed}
2024-03-16T20:39:12.6565620Z 
2024-03-16T20:39:12.6566287Z 2024-03-16 20:32:36.645543+00:00 [error] <0.649.0>@proc_lib:crash_report/4:584 CRASH REPORT:
2024-03-16T20:39:12.6567211Z   crasher:
2024-03-16T20:39:12.6567598Z     initial call: odbc:init/1
2024-03-16T20:39:12.6568080Z     pid: <0.649.0>
2024-03-16T20:39:12.6568491Z     registered_name: []
2024-03-16T20:39:12.6569067Z     exception exit: {port_exit,memory_allocation_failed}
2024-03-16T20:39:12.6570030Z       in function  gen_server:handle_common_reply/8 (gen_server.erl, line 1226)
2024-03-16T20:39:12.6570912Z     ancestors: [odbc_sup,<0.344.0>]
2024-03-16T20:39:12.6571486Z     message_queue_len: 1
2024-03-16T20:39:12.6572147Z     messages: [{'EXIT',#Port<0.69>,normal}]
2024-03-16T20:39:12.6572771Z     links: [<0.632.0>,<0.345.0>]
2024-03-16T20:39:12.6573276Z     dictionary: []
2024-03-16T20:39:12.6573697Z     trap_exit: true
2024-03-16T20:39:12.6574131Z     status: running
2024-03-16T20:39:12.6574544Z     heap_size: 6772
2024-03-16T20:39:12.6574968Z     stack_size: 28
2024-03-16T20:39:12.6575383Z     reductions: 19518
2024-03-16T20:39:12.6575809Z   neighbours:
2024-03-16T20:39:12.6576051Z 
2024-03-16T20:39:12.6576798Z 2024-03-16 20:32:36.649979+00:00 [error] <0.345.0>@supervisor:do_restart/3:759 SUPERVISOR REPORT:
2024-03-16T20:39:12.6577739Z     supervisor: {local,odbc_sup}
2024-03-16T20:39:12.6578269Z     errorContext: child_terminated
2024-03-16T20:39:12.6578835Z     reason: {port_exit,memory_allocation_failed}
2024-03-16T20:39:12.6579427Z     offender: [{pid,<0.649.0>},
2024-03-16T20:39:12.6579877Z                {id,[]},
2024-03-16T20:39:12.6580372Z                {mfargs,{odbc,start_link_sup,undefined}},
2024-03-16T20:39:12.6580992Z                {restart_type,temporary},
2024-03-16T20:39:12.6581555Z                {significant,false},
2024-03-16T20:39:12.6582045Z                {shutdown,7000},
2024-03-16T20:39:12.6582536Z                {child_type,worker}]
2024-03-16T20:39:12.6583207Z 
2024-03-16T20:39:12.6584421Z 4-03-16 20:32:36.793707+00:00 [warning] <0.511.0>@ejabberd_pkix:check_domain_certfiles/1:312 No certificate found matching pubsub.8310292872-12.example.com
2024-03-16T20:39:12.6587024Z 2024-03-16 20:32:36.793821+00:00 [warning] <0.511.0>@ejabberd_pkix:check_domain_certfiles/1:312 No certificate found matching upload.8310292872-12.example.com
2024-03-16T20:39:12.6589450Z 2024-03-16 20:32:36.793968+00:00 [warning] <0.511.0>@ejabberd_pkix:check_domain_certfiles/1:312 No certificate found matching conference.8310292872-12.example.com
2024-03-16T20:39:12.6592258Z 2024-03-16 20:32:36.794082+00:00 [warning] <0.511.0>@ejabberd_pkix:check_domain_certfiles/1:312 No certificate found matching proxy.8310292872-12.example.com
2024-03-16T20:39:12.6595095Z 2024-03-16 20:32:36.795105+00:00 [info] <0.190.0>@ejabberd_app:start/2:63 ejabberd 24.2.0 is started in the node '[email protected]' in 1.31s
2024-03-16T20:39:12.6597822Z 2024-03-16 20:32:36.795411+00:00 [info] <0.672.0>@ejabberd_listener:init/4:163 Start accepting TCP connections at 10.244.3.7:7777 for mod_proxy65_stream
2024-03-16T20:39:12.6600095Z 2024-03-16 20:32:36.795486+00:00 [info] <0.505.0>@ejabberd_listener:init/4:163 Start accepting TCP connections at 127.0.0.1:5281 for ejabberd_http
2024-03-16T20:39:12.6602346Z 2024-03-16 20:32:36.795559+00:00 [info] <0.506.0>@ejabberd_listener:init/4:163 Start accepting TCP connections at [::]:5222 for ejabberd_c2s
2024-03-16T20:39:12.6604556Z 2024-03-16 20:32:36.795579+00:00 [info] <0.508.0>@ejabberd_listener:init/4:163 Start accepting TLS connections at [::]:5443 for ejabberd_http
2024-03-16T20:39:12.6607033Z 2024-03-16 20:32:36.795590+00:00 [info] <0.510.0>@ejabberd_listener:init/4:129 Start accepting UDP connections at [::]:3478 for ejabberd_stun
2024-03-16T20:39:12.6609256Z 2024-03-16 20:32:36.795567+00:00 [info] <0.507.0>@ejabberd_listener:init/4:163 Start accepting TCP connections at [::]:5280 for ejabberd_http
2024-03-16T20:39:12.6611427Z 2024-03-16 20:32:36.795583+00:00 [info] <0.509.0>@ejabberd_listener:init/4:163 Start accepting TCP connections at [::]:5269 for ejabberd_s2s_in
2024-03-16T20:39:12.6613508Z 2024-03-16 20:32:37.706391+00:00 [info] <0.505.0>@ejabberd_listener:accept/7:344 (<0.745.0>) Accepted connection 127.0.0.1:47566 -> 127.0.0.1:5281

Bug description

I started updating the ejabberd helm chart to newest 24.02. While doing that and updating the corresponding container image I found that there is a problem with MS SQL. Therefore, I tested the official ghcr.io/processone/ejabberd:24.02 as well to see if it depends on my custom image, but it does not (see error message above).

reason: {port_exit,memory_allocation_failed}

What solved the issue was reverting the container build from erlang 26.2 / elixir 16.1 to erlang 25.3 / elixir 14.4 . I haven't tested anything in between.

It most likely is unrelated to Alpine packages, because I also tried it with a custom Debian based variant with the same combination erlang 26.2 / elixir 16.1 .

@sando38
Copy link
Contributor Author

sando38 commented Mar 19, 2024

@badlop as discussed

@badlop badlop added Database:MSSQL Microsoft SQL server Kind:Bug labels Mar 20, 2024
@badlop
Copy link
Member

badlop commented Apr 5, 2024

Erlang versions affected

I've tested compilation of ejabberd master git from source code with different Erlang/OTP versions and the results are:

  • OTP 25.3: mssql connects correctly, i can register account and login successfully, and the MSSQL database shows it
  • OTP 26.0-rc1: same success
  • OTP 26.0-rc2: same success
  • OTP 26.0-rc3: Error ODBC: received unexpected info: {tcp_closed,#Port<0.75>}
  • OTP 26.0: same error
  • OTP 26.2: same error
  • OTP 26.2.3: same error
  • OTP 27.0-rc1: same error

Same problem with the ghcr.io/processone/ejabberd:latest container image.


Problematic commit

Using git bisect on erlang, I found that the problematic commit is erlang/otp@53ef5df

Using erlang OTP-27.0-rc1 and reverting that commit, ejabberd works perfectly with mssql.


Minimal reproduction steps

Right now the only way to reproduce the problem is using ejabberd... I found no way to provoke the problem in odbc without using ejabberd.


Hypothesis

Looking at
https://erlang.org/documentation/doc-15.0-rc2/lib/odbc-2.14.3/doc/html/getting_started.html#using-the-erlang-api

Trying to find some reproduction steps that wouldn't involve ejabberd:

Setup:

cat >"/tmp/odbcinst.ini" <<-'EOF'
[ODBC]
Driver = libtdsodbc.so
EOF

This works correctly and returns {updated,undefined}:

erl

os:putenv("ODBCSYSINI", "/tmp/").
odbc:start().
{ok, Ref} = odbc:connect("DRIVER=ODBC;SERVER=mssql;DATABASE=ejabberd_test;UID=ejabberd_test;PWD=ejabberd_Test1;PORT=1433;CLIENT_CHARSET=UTF-8;", []).

erlang:process_info(Ref).

odbc:sql_query(Ref, "CREATE TABLE EMPLOYEE (NR integer, FIRSTNAME  char varying(20), LASTNAME  char varying(20), GENDER char(1), PRIMARY KEY(NR))").

odbc:describe_table(Ref, "EMPLOYEE").

Interestingly, if we provoke some error in erlang, the odbc query fails with {error,connection_closed}:

erl

os:putenv("ODBCSYSINI", "/tmp/").
odbc:start().
{ok, Ref} = odbc:connect("DRIVER=ODBC;SERVER=mssql;DATABASE=ejabberd_test;UID=ejabberd_test;PWD=ejabberd_Test1;PORT=1433;CLIENT_CHARSET=UTF-8;", []).

erlang:process_info(Ref).
odbc:describe_table(Ref, "EMPLOYEE").

aa:bb().

erlang:process_info(Ref).
odbc:describe_table(Ref, "EMPLOYEE").

One wild hypothesis: maybe, when using erlang 26.0-rc2 or higher, ejabberd provokes some error that closes the odbc connection. That error is not provoked with older erlang versions. And the error itself is being silenced, so we are not aware of it.

@petertirrell
Copy link

I think I am experiencing this exact same issue, on ejabberd 24.02. We're using the OTP that came with that. Is the current solution to try and load and use a specific working version of OTP?

@badlop
Copy link
Member

badlop commented May 13, 2024

I think I am experiencing this exact same issue, on ejabberd 24.02. We're using the OTP that came with that.

Aha, the binary installers and containers for ejabberd 24.02 include Erlang/OTP 26.2, and are affected by this problem.

Is the current solution to try and load and use a specific working version of OTP?

Yes, either using Erlang 26.0-rc2 or older...

Or compile Erlang/OTP from source with a small patch that revers the problematic commit.

I've applied that solution in branch 24.02-otp-patch-mssql. This is just ejabberd 24.02 with a small patch for Erlang 26.2 that reverts the problematic commit.

There are available binary installers and containers. I've tried the amd64 container and seems to work: at least it starts, connects to MSSQL and I can register an account.

@badlop
Copy link
Member

badlop commented May 28, 2024

As a temporary workaround, the ejabberd binary installers now include a patch that reverts the problematic commit in Erlang/OTP.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Database:MSSQL Microsoft SQL server Kind:Bug
Projects
None yet
Development

No branches or pull requests

3 participants