Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

after upgraded ioredis from 4.10.0 to 4.11.1 Error: connect ETIMEDOUT #918

Closed
p3x-robot opened this issue Jul 2, 2019 · 46 comments · Fixed by #925
Closed

after upgraded ioredis from 4.10.0 to 4.11.1 Error: connect ETIMEDOUT #918

p3x-robot opened this issue Jul 2, 2019 · 46 comments · Fixed by #925

Comments

@p3x-robot
Copy link

error:

Jul 02 21:23:28 server boot[8665]: [2019/07/02 21:23:28.994] [MASTER] [PID: 008665]  [ERROR] [CORE] [SERVICE] [REDIS] error
Jul 02 21:23:28 server boot[8665]: [2019/07/02 21:23:28.996] [MASTER] [PID: 008665]  [ERROR] 7/2/2019, 9:23:28 PM unhandledRejection Error: connect ETIMEDOUT
Jul 02 21:23:28 server boot[8665]:     at Socket.<anonymous> (/var/p3x/server.patrikx3.com/node_modules/ioredis/built/redis/index.js:270:31)
Jul 02 21:23:28 server boot[8665]:     at Object.onceWrapper (events.js:288:20)
Jul 02 21:23:28 server boot[8665]:     at Socket.emit (events.js:200:13)
Jul 02 21:23:28 server boot[8665]:     at Socket._onTimeout (net.js:434:8)
Jul 02 21:23:28 server boot[8665]:     at listOnTimeout (internal/timers.js:531:17)
Jul 02 21:23:28 server boot[8665]:     at processTimers (internal/timers.js:475:7) {
Jul 02 21:23:28 server boot[8665]:   errorno: 'ETIMEDOUT',
Jul 02 21:23:28 server boot[8665]:   code: 'ETIMEDOUT',
Jul 02 21:23:28 server boot[8665]:   syscall: 'connect'
Jul 02 21:23:28 server boot[8665]: } Promise {
Jul 02 21:23:28 server boot[8665]:   <rejected> Error: connect ETIMEDOUT
Jul 02 21:23:28 server boot[8665]:       at Socket.<anonymous> (/var/p3x/server.patrikx3.com/node_modules/ioredis/built/redis/index.js:270:31)
Jul 02 21:23:28 server boot[8665]:       at Object.onceWrapper (events.js:288:20)
Jul 02 21:23:28 server boot[8665]:       at Socket.emit (events.js:200:13)
Jul 02 21:23:28 server boot[8665]:       at Socket._onTimeout (net.js:434:8)
Jul 02 21:23:28 server boot[8665]:       at listOnTimeout (internal/timers.js:531:17)
Jul 02 21:23:28 server boot[8665]:       at processTimers (internal/timers.js:475:7) {
Jul 02 21:23:28 server boot[8665]:     errorno: 'ETIMEDOUT',
Jul 02 21:23:28 server boot[8665]:     code: 'ETIMEDOUT',
Jul 02 21:23:28 server boot[8665]:     syscall: 'connect'
Jul 02 21:23:28 server boot[8665]:   }
Jul 02 21:23:28 server boot[8665]: }

With 4.10.0 it works.

How I connect:

const settings = {
   "url2": "redis://:[email protected]:6379/0",
   "url": "redis+unix://:password@/var/run/redis/redis-server.sock?db=0"
}
return new Redis(settings.url);

with 4.10.0 i can connect either url, with 4.11.0 i get timeout.

@p3x-robot
Copy link
Author

p3x-robot commented Jul 2, 2019

is there any debug info to see what is going on on the behind?

because on my laptop it can connect, but with on my server it is giving this error, but when i revert to 4.10.0, it works. so werid!

@luin
Copy link
Collaborator

luin commented Jul 2, 2019

Debug info can be enabled by setting the env variable DEBUG to ioredis:*:

DEBUG=ioredis:* node yourapp.js

@p3x-robot
Copy link
Author

thanks.

I connect with 4.11.1, but I cannot connect, no more debug info, so bad:

[2019/07/02 21:38:33.399] [MASTER] [PID: 012619]  [INFO] [CORE] [SERVICE] [REDIS] started
[2019/07/02 21:38:33.400] [MASTER] [PID: 012619]  [INFO] [LOADER] [SERVICE] [CORE] [REDIS] alias: corifeus.core.redis  
  ioredis:redis status[/var/run/redis/redis-server.sock]: [empty] -> connecting +0ms
  ioredis:connection error: {"stack":"Error: connect ETIMEDOUT\n    at Socket.<anonymous> (/var/p3x/server.patrikx3.com/node_modules/ioredis/built/redis/index.js:270:31)\n    at Object.onceWrapper (events.js:288:20)\n    at Sock ... <REDACTED full-length="459"> +0ms
[2019/07/02 21:38:43.435] [MASTER] [PID: 012619]  [ERROR] [CORE] [SERVICE] [REDIS] error
[2019/07/02 21:38:43.440] [MASTER] [PID: 012619]  [ERROR] 7/2/2019, 9:38:43 PM unhandledRejection Error: connect ETIMEDOUT
    at Socket.<anonymous> (/var/p3x/server.patrikx3.com/node_modules/ioredis/built/redis/index.js:270:31)
    at Object.onceWrapper (events.js:288:20)
    at Socket.emit (events.js:200:13)
    at Socket._onTimeout (net.js:434:8)
    at listOnTimeout (internal/timers.js:531:17)
    at processTimers (internal/timers.js:475:7) {
  errorno: 'ETIMEDOUT',
  code: 'ETIMEDOUT',
  syscall: 'connect'
} Promise {
  <rejected> Error: connect ETIMEDOUT
      at Socket.<anonymous> (/var/p3x/server.patrikx3.com/node_modules/ioredis/built/redis/index.js:270:31)
      at Object.onceWrapper (events.js:288:20)
      at Socket.emit (events.js:200:13)
      at Socket._onTimeout (net.js:434:8)
      at listOnTimeout (internal/timers.js:531:17)
      at processTimers (internal/timers.js:475:7) {
    errorno: 'ETIMEDOUT',
    code: 'ETIMEDOUT',
    syscall: 'connect'
  }
}
  ioredis:redis status[/var/run/redis/redis-server.sock]: connecting -> close +10s
  ioredis:connection reconnect in 50ms +43ms
  ioredis:redis status[/var/run/redis/redis-server.sock]: close -> reconnecting +0ms
  ioredis:redis status[/var/run/redis/redis-server.sock]: reconnecting -> connecting +53ms
[2019/07/02 21:38:43.610] [MASTER] [PID: 012619]  [LOG] 250 2.0.0 Ok: queued as 7C5783413CC
[2019/07/02 21:38:43.612] [MASTER] [PID: 012619]  [LOG] 250 2.0.0 Ok: queued as 7BFEF340C4F
  ioredis:connection error: {"stack":"Error: connect ETIMEDOUT\n    at Socket.<anonymous> (/var/p3x/server.patrikx3.com/node_modules/ioredis/built/redis/index.js:270:31)\n    at Object.onceWrapper (events.js:288:20)\n    at Sock ... <REDACTED full-length="459"> +10s
[2019/07/02 21:38:53.517] [MASTER] [PID: 012619]  [ERROR] [CORE] [SERVICE] [REDIS] error
  ioredis:redis status[/var/run/redis/redis-server.sock]: connecting -> close +10s
  ioredis:connection reconnect in 100ms +5ms
  ioredis:redis status[/var/run/redis/redis-server.sock]: close -> reconnecting +0ms
  ioredis:redis status[/var/run/redis/redis-server.sock]: reconnecting -> connecting +101ms
[2019/07/02 21:38:53.670] [MASTER] [PID: 012619]  [LOG] 250 2.0.0 Ok: queued as 8B2A8340C4F
  ioredis:connection error: {"stack":"Error: connect ETIMEDOUT\n    at Socket.<anonymous> (/var/p3x/server.patrikx3.com/node_modules/ioredis/built/redis/index.js:270:31)\n    at Object.onceWrapper (events.js:288:20)\n    at Sock ... <REDACTED full-length="459"> +10s
[2019/07/02 21:39:03.634] [MASTER] [PID: 012619]  [ERROR] [CORE] [SERVICE] [REDIS] error
  ioredis:redis status[/var/run/redis/redis-server.sock]: connecting -> close +10s
  ioredis:connection reconnect in 150ms +4ms
  ioredis:redis status[/var/run/redis/redis-server.sock]: close -> reconnecting +1ms

with using ip and port, same error:

[2019/07/02 21:40:20.741] [MASTER] [PID: 012984]  [INFO] [LOADER] [SERVICE] [CORE] [REDIS] alias: corifeus.core.redis  
  ioredis:redis status[127.0.0.1:6379]: [empty] -> connecting +0ms
  ioredis:connection error: {"stack":"Error: connect ETIMEDOUT\n    at Socket.<anonymous> (/var/p3x/server.patrikx3.com/node_modules/ioredis/built/redis/index.js:270:31)\n    at Object.onceWrapper (events.js:288:20)\n    at Sock ... <REDACTED full-length="459"> +0ms
[2019/07/02 21:40:30.757] [MASTER] [PID: 012984]  [ERROR] [CORE] [SERVICE] [REDIS] error
[2019/07/02 21:40:30.758] [MASTER] [PID: 012984]  [ERROR] 7/2/2019, 9:40:30 PM unhandledRejection Error: connect ETIMEDOUT
    at Socket.<anonymous> (/var/p3x/server.patrikx3.com/node_modules/ioredis/built/redis/index.js:270:31)
    at Object.onceWrapper (events.js:288:20)
    at Socket.emit (events.js:200:13)
    at Socket._onTimeout (net.js:434:8)
    at listOnTimeout (internal/timers.js:531:17)
    at processTimers (internal/timers.js:475:7) {
  errorno: 'ETIMEDOUT',
  code: 'ETIMEDOUT',
  syscall: 'connect'
} Promise {
  <rejected> Error: connect ETIMEDOUT
      at Socket.<anonymous> (/var/p3x/server.patrikx3.com/node_modules/ioredis/built/redis/index.js:270:31)
      at Object.onceWrapper (events.js:288:20)
      at Socket.emit (events.js:200:13)
      at Socket._onTimeout (net.js:434:8)
      at listOnTimeout (internal/timers.js:531:17)
      at processTimers (internal/timers.js:475:7) {
    errorno: 'ETIMEDOUT',
    code: 'ETIMEDOUT',
    syscall: 'connect'
  }
}
  ioredis:redis status[127.0.0.1:6379]: connecting -> close +10s
  ioredis:connection reconnect in 50ms +14ms
  ioredis:redis status[127.0.0.1:6379]: close -> reconnecting +1ms
  ioredis:redis status[127.0.0.1:6379]: reconnecting -> connecting +50ms

When I revert again to 4.10.0, it works, how come?

@p3x-robot
Copy link
Author

on 4.10.0, my logs shows it is working:

Jul 02 21:42:03 server boot[13616]: [2019/07/02 21:42:03.056] [MASTER] [PID: 013616]  [INFO] [CORE] [SERVICE] [REDIS] started
Jul 02 21:42:03 server boot[13616]: [2019/07/02 21:42:03.057] [MASTER] [PID: 013616]  [INFO] [LOADER] [SERVICE] [CORE] [REDIS] alias: corifeus.core.redis
Jul 02 21:42:03 server boot[13616]: [2019/07/02 21:42:03.064] [MASTER] [PID: 013616]  [INFO] [CORE] [SERVICE] [REDIS] ready
Jul 02 21:42:03 server boot[13616]: [2019/07/02 21:42:03.064] [MASTER] [PID: 013616]  [INFO]
Jul 02 21:42:03 server boot[13616]: [2019/07/02 21:42:03.065] [MASTER] [PID: 013616]  [INFO] [LOADER] [SERVICE] [CORE] [REDIS] settings: corifeus.core.service.redis.settings

what did i do wrong?

@luin
Copy link
Collaborator

luin commented Jul 2, 2019

Can you connect via IP & port successfully? What the debug logs with 4.10.0?

@p3x-robot
Copy link
Author

p3x-robot commented Jul 2, 2019

of course i can connect (ip and porot) even via my laptop. that is what is super weird:
the output:

2019/07/02 21:47:31.723] [WORKER 002] [PID: 018259]  [INFO] 
[2019/07/02 21:47:31.723] [WORKER 002] [PID: 018259]  [INFO] [LOADER] [SERVICE] [CORE] loading: redis
  ioredis:redis write command[/var/run/redis/redis-server.sock]: 0 -> auth([ 'password' ]) +1ms
[2019/07/02 21:47:31.723] [WORKER 002] [PID: 018259]  [INFO] [LOADER] [SERVICE] [CORE] [REDIS] prefix: [CORE] [SERVICE] [REDIS] 
[2019/07/02 21:47:31.723] [WORKER 002] [PID: 018259]  [INFO] [CORE] [SERVICE] [REDIS] started
  ioredis:redis write command[/var/run/redis/redis-server.sock]: 0 -> info([]) +2ms
[2019/07/02 21:47:31.725] [WORKER 002] [PID: 018259]  [INFO] [LOADER] [SERVICE] [CORE] [REDIS] alias: corifeus.core.redis  
  ioredis:redis status[/var/run/redis/redis-server.sock]: connect -> ready +2ms
  ioredis:redis status[/var/run/redis/redis-server.sock]: [empty] -> connecting +0ms
[2019/07/02 21:47:31.726] [WORKER 003] [PID: 018265]  [INFO] [CORE] [SERVICE] [REDIS] ready
[2019/07/02 21:47:31.728] [WORKER 003] [PID: 018265]  [INFO] 

@p3x-robot
Copy link
Author

all same nodejs 12.5.0 both laptop and server. why does it not give more debug info. it just timeouts, but only with 4.11.1. no one will be able to help with this weird error. this looks right and everything is perfect.
just timeout on server on ioredis 4.11.1, btuhuuhuhu

@luin
Copy link
Collaborator

luin commented Jul 2, 2019

Yes, that's really strange. The related part is https://github.com/luin/ioredis/blob/v4.11.1/lib/redis/index.ts#L278-L307. We invoke Stream#setTimeout() with 10s (which is the default settings). The callback is triggered for unknown reasons.

@p3x-robot
Copy link
Author

p3x-robot commented Jul 2, 2019

how can i debug the actual connecting is not working?
i can telnet:

root@server:/var/p3x/server.patrikx3.com# telnet localhost 6379
Trying ::1...
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
exit
-ERR unknown command `exit`, with args beginning with: 
quit
+OK
Connection closed by foreign host.

@p3x-robot
Copy link
Author

where can i see the actual connecting is happening, becuase there is something that is not working.

@p3x-robot
Copy link
Author

it is working up to 4.10.4, there is something that changed since 4.11.0, that is what is start stop working.

@p3x-robot
Copy link
Author

can i compare 4.10.4 vs 4.11.0 connecting? something is wrong

@p3x-robot
Copy link
Author

i can see it is missing a steram.data:

        stream.on('data', eventHandler.dataHandler(_this)); - missing in 4.11

image
image

@luin
Copy link
Collaborator

luin commented Jul 3, 2019

i can see it is missing a steram.data:

Listeners for "data" are set up on the "connect" event in v4.11.0. That may not be the reason for that, though, could you please add a stream.on('data', () => {}) in the corresponding position of v4.11.0 to see whether it works again?

with 4.10.0 i can connect either url, with 4.11.0 i get timeout.

I'd like to confirm that in 4.11.0, did "redis://:[email protected]:6379/0" and "redis+unix://:password@/var/run/redis/redis-server.sock?db=0" both emit timeouts, or only one emited and the other connected successfully?

new Redis(settings.url);

Could you log the Redis#options to see if the URL was parsed correctly? Like:

const redis = new Redis(settings.url)
console.log(redis.options)

@p3x-robot
Copy link
Author

the only difference is, that on ubuntu it is working in latest ioredis version, but the server is on debian buster.

@p3x-robot
Copy link
Author

p3x-robot commented Jul 3, 2019

See en the log a werid log <REDACTED full-length="506">:

[2019/07/03 04:51:48.171] [MASTER] [PID: 010424]  [INFO] [CORE] [SERVICE] [REDIS] construct redis redis://:[email protected]:6379/0
  ioredis:redis status[127.0.0.1:6379]: [empty] -> connecting +0ms
[2019/07/03 04:51:48.173] [MASTER] [PID: 010424]  [LOG] {
  password: 'password',
  db: 0,
  host: '127.0.0.1',
  port: 6379,
  family: 4,
  connectTimeout: 10000,
  retryStrategy: [Function: retryStrategy],
  keepAlive: 0,
  noDelay: true,
  connectionName: null,
  sentinels: null,
  name: null,
  role: 'master',
  sentinelRetryStrategy: [Function: sentinelRetryStrategy],
  natMap: null,
  enableTLSForSentinelMode: false,
  updateSentinels: true,
  dropBufferSupport: false,
  enableOfflineQueue: true,
  enableReadyCheck: true,
  autoResubscribe: true,
  autoResendUnfulfilledCommands: true,
  lazyConnect: false,
  keyPrefix: '',
  reconnectOnError: null,
  readOnly: false,
  stringNumbers: false,
  maxRetriesPerRequest: 20,
  maxLoadingRetryTime: 10000,
  showFriendlyErrorStack: false
}
  ioredis:connection error: {"stack":"Error: connect ETIMEDOUT\n    at Socket.<anonymous> (/media/var/patrikx3/projects/patrikx3/corifeus/corifeus-app-server-patrikx3/node_modules/ioredis/built/redis/index.js:270:31)\n    at Obj ... <REDACTED full-length="506"> +0ms
[2019/07/03 04:51:58.187] [MASTER] [PID: 010424]  [ERROR] [CORE] [SERVICE] [REDIS] error
[2019/07/03 04:51:58.188] [MASTER] [PID: 010424]  [ERROR] 7/3/2019, 4:51:58 AM unhandledRejection Error: connect ETIMEDOUT
    at Socket.<anonymous> (/media/var/patrikx3/projects/patrikx3/corifeus/corifeus-app-server-patrikx3/node_modules/ioredis/built/redis/index.js:270:31)
    at Object.onceWrapper (events.js:288:20)
    at Socket.emit (events.js:200:13)
    at Socket._onTimeout (net.js:434:8)
    at listOnTimeout (internal/timers.js:531:17)
    at processTimers (internal/timers.js:475:7) {
  errorno: 'ETIMEDOUT',
  code: 'ETIMEDOUT',
  syscall: 'connect'
}

@p3x-robot
Copy link
Author

of course on debian 4.10.4, it works perfectly.

@p3x-robot
Copy link
Author

i added stream.on('data', () => {}), no change.

@p3x-robot
Copy link
Author

patrikx3@server:~/projects/patrikx3/corifeus/corifeus-app-server-patrikx3$ telnet localhost 6379
Trying ::1...
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
quit
+OK
Connection closed by foreign host.
patrikx3@server:~/projects/patrikx3/corifeus/corifeus-app-server-patrikx3$ 

@p3x-robot
Copy link
Author

oh yes, both ways is timeout (socket vs port),

only other error was when, i ran without permission on the boot program and was giving permission error, but i run on server with sudo and the permissions is satisfied.

@p3x-robot
Copy link
Author

of course on systemd, i use a differen user, that is in the redis group so there is no socket permission problem, but as now i am running it with in cli. i have to run on 4.10.4 on systemd, otherwise it is not working. so weird. i added DEBUG=*, but there is no other debug info either, where the connection is wrong. i see 4.11.x has lots of new changes, so that is what is wrong on debian.

@p3x-robot
Copy link
Author

Weird error: <REDACTED full-length="506">, see at the end of the log!

 ioredis:connection error: {"stack":"Error: connect ETIMEDOUT\n    at Socket.<anonymous> (/media/var/patrikx3/projects/patrikx3/corifeus/corifeus-app-server-patrikx3/node_modules/ioredis/built/redis/index.js:273:31)\n    at Obj ... <REDACTED full-length="506"> +10s

@luin
Copy link
Collaborator

luin commented Jul 3, 2019

<REDACTED full-length="506"> means the log was too large (506 bytes) so it was redacted before logged. It's expected.

Well, this error is pretty hard to debug without a reproducible example. I'll set up a Debian buster on my side to see whether I can reproduce the issue. Let me know for further discoveries if you got time to debug on this issue 😄 .

@p3x-robot
Copy link
Author

there is a docker error, with redacted error, but on docker and ubuntu:
docker/for-linux#535

@p3x-robot
Copy link
Author

very rarely, i could connect, but when i control + c, and run again, same timeout error.
what is weird is that i have other program and that is 4.11.1 and it works without issues. this is super hard to understand why it is not working.

@p3x-robot
Copy link
Author

ok i found the error, when i set the host to localhost, instead of 127.0.0.1, it works right away. so weird and with socket is again timeout. so i keep the localhost and via port.

@luin
Copy link
Collaborator

luin commented Jul 3, 2019

ok i found the error, when i set the host to localhost, instead of 127.0.0.1, it works right away. so weird and with socket is again timeout. so i keep the localhost and via port.

Did 127.0.0.1 work with 4.10.4 in the same program?

@p3x-robot
Copy link
Author

ok, that is the issue, not working with socket anymore (v4.10.0) no issues, and if i use an ip address, giving timeout, if i use localhost, then it works fully!
i fixed it, but ugly hack and not using socket, which was supposed to be faster.

@p3x-robot
Copy link
Author

in 4.10.4 it was working with even socket and 127.0.0.1, with 4.11.1, i could only connect with host and localhost.

@p3x-robot
Copy link
Author

this error is super hard to find the solution, as i have another program and i connect via 192.168.23.54 and it works!

@luin
Copy link
Collaborator

luin commented Jul 3, 2019

Are Node.js versions same when switching between 4.10.4 & 4.11.1?

@p3x-robot
Copy link
Author

both nodejs 12.5.0, same versions yes.

@p3x-robot
Copy link
Author

patrikx3@bitang:~$ node -v
v12.5.0
patrikx3@bitang:~$ uname -a
Linux bitang 4.15.0-54-generic #58-Ubuntu SMP Mon Jun 24 10:55:24 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
patrikx3@bitang:~$ 
patrikx3@server:~/projects/patrikx3$ node -v
v12.5.0
patrikx3@server:~/projects/patrikx3$ uname -a
Linux server 4.19.0-5-amd64 #1 SMP Debian 4.19.37-5 (2019-06-19) x86_64 GNU/Linux
patrikx3@server:~/projects/patrikx3$ 

i think it is a warning people for same error, but the code in ioredis is perfect, it is supposed to work and other program it also working any ip address. it is not even worth it trying to fix it as it is correct, just it is a good not for other people that are getting timeouts without any solutions and at least i found that with localhost host it started working.

@jeremytm
Copy link

jeremytm commented Jul 9, 2019

I'm having this issue as well, in AWS using lambda to run code, with redis hosted in a single-node cluster in elasticache. I can confirm it breaks between ioredis versions 4.10.4 and 4.11.0.

I will try to get a log with debugging turned on shortly.

@jeremytm
Copy link

jeremytm commented Jul 9, 2019

Notes:

  • doesn't happen locally when connecting via NAT, ever
  • only happens sometimes in AWS lambda when connecting direct to cluster
  • doesn't happen on subsequent runs in lambda (once lambda function is warmed up), it happens on first lambda invoke only
  • we normally open two connections to different clusters, and this issue happens much more frequently when we do so, however, for simplicity I have confirmed the issue with just one connection for the log below.

Here's a log with debugging:

START RequestId: 30fc396c-0ce0-4c5b-8ecd-7fd2249ab08f Version: $LATEST
hosts [{"host":"rugby-cache-0001-001.e5di5i.0001.use1.cache.amazonaws.com","port":6379}]
options {"enableReadyCheck":false,"slotsRefreshInterval":60000,"slotsRefreshTimeout":11000}
2019-07-09T02:02:49.036Z ioredis:cluster status: [empty] -> connecting
2019-07-09T02:02:49.056Z ioredis:cluster resolved hostname rugby-cache-0001-001.e5di5i.0001.use1.cache.amazonaws.com to IP 172.30.100.206
2019-07-09T02:02:49.057Z ioredis:cluster:connectionPool Reset with [ { host: '172.30.100.206', port: 6379 } ]
2019-07-09T02:02:49.058Z ioredis:cluster:connectionPool Connecting to 172.30.100.206:6379 as master
2019-07-09T02:02:49.059Z ioredis:redis status[172.30.100.206:6379]: [empty] -> wait
2019-07-09T02:02:49.059Z ioredis:cluster getting slot cache from 172.30.100.206:6379
2019-07-09T02:02:49.060Z ioredis:redis status[172.30.100.206:6379 (ioredisClusterRefresher)]: [empty] -> wait
2019-07-09T02:02:49.062Z ioredis:redis status[172.30.100.206:6379 (ioredisClusterRefresher)]: wait -> connecting
2019-07-09T02:02:49.062Z ioredis:redis queue command[172.30.100.206:6379 (ioredisClusterRefresher)]: 0 -> cluster([ 'slots' ])
2019-07-09T02:02:49.063Z ioredis:cluster:subscriber selected a subscriber 172.30.100.206:6379
2019-07-09T02:02:49.063Z ioredis:redis status[172.30.100.206:6379 (ioredisClusterSubscriber)]: [empty] -> wait
2019-07-09T02:02:49.063Z ioredis:cluster:subscriber started
2019-07-09T02:02:59.080Z ioredis:connection error: Error: connect ETIMEDOUT
    at Timer.processTimers (timers.js:223:10)1:20)ioredis/built/redis/index.js:269:31)is] Unhandled error event: Error: connect ETIMEDOUT
2019-07-09T02:02:59.081Z ioredis:redis status[172.30.100.206:6379 (ioredisClusterRefresher)]: connecting -> close
2019-07-09T02:02:59.082Z ioredis:connection skip reconnecting because `retryStrategy` is not a function
2019-07-09T02:02:59.082Z ioredis:redis status[172.30.100.206:6379 (ioredisClusterRefresher)]: close -> end
    at TCP._handle.close (net.js:597:12)js:441:20)ioredis/built/redis/event_handler.js:80:20)
       at TCP._handle.close (net.js:597:12) }:441:20)ioredis/built/redis/event_handler.js:80:20)/async.js:17:14)4)
2019-07-09T02:02:59.085Z ioredis:cluster:connectionPool Reset with []
2019-07-09T02:02:59.085Z ioredis:cluster:connectionPool Disconnect 172.30.100.206:6379 because the node does not hold any slot
2019-07-09T02:02:59.085Z ioredis:redis status[172.30.100.206:6379]: wait -> close
2019-07-09T02:02:59.085Z ioredis:connection skip reconnecting since the connection is manually closed.
2019-07-09T02:02:59.085Z ioredis:redis status[172.30.100.206:6379]: close -> end
2019-07-09T02:02:59.085Z ioredis:cluster:subscriber subscriber has left, selecting a new one...
2019-07-09T02:02:59.098Z ioredis:redis status[172.30.100.206:6379 (ioredisClusterSubscriber)]: wait -> close
2019-07-09T02:02:59.098Z ioredis:connection skip reconnecting since the connection is manually closed.
2019-07-09T02:02:59.098Z ioredis:redis status[172.30.100.206:6379 (ioredisClusterSubscriber)]: close -> end
2019-07-09T02:02:59.098Z ioredis:cluster:subscriber selecting subscriber failed since there is no node discovered in the cluster yet
2019-07-09T02:02:59.099Z ioredis:cluster status: connecting -> close
    port: 6379 } ]gy called 1 [ { host: 'rugby-cache-0001-001.e5di5i.0001.use1.cache.amazonaws.com',
2019-07-09T02:02:59.100Z ioredis:cluster status: close -> reconnecting
redis reconnecting
2019-07-09T02:02:59.100Z ioredis:cluster connecting failed: Error: None of startup nodes is available
2019-07-09T02:02:59.150Z ioredis:cluster Cluster is disconnected. Retrying after 50ms
2019-07-09T02:02:59.151Z ioredis:cluster status: reconnecting -> connecting
2019-07-09T02:02:59.151Z ioredis:cluster resolved hostname rugby-cache-0001-001.e5di5i.0001.use1.cache.amazonaws.com to IP 172.30.100.206
2019-07-09T02:02:59.151Z ioredis:cluster:connectionPool Reset with [ { host: '172.30.100.206', port: 6379 } ]
2019-07-09T02:02:59.152Z ioredis:cluster:connectionPool Connecting to 172.30.100.206:6379 as master
2019-07-09T02:02:59.152Z ioredis:redis status[172.30.100.206:6379]: [empty] -> wait
2019-07-09T02:02:59.152Z ioredis:cluster:subscriber a new node is discovered and there is no subscriber, selecting a new one...
2019-07-09T02:02:59.152Z ioredis:cluster:subscriber selected a subscriber 172.30.100.206:6379
2019-07-09T02:02:59.152Z ioredis:redis status[172.30.100.206:6379 (ioredisClusterSubscriber)]: [empty] -> wait
2019-07-09T02:02:59.153Z ioredis:cluster getting slot cache from 172.30.100.206:6379
2019-07-09T02:02:59.153Z ioredis:redis status[172.30.100.206:6379 (ioredisClusterRefresher)]: [empty] -> wait
2019-07-09T02:02:59.153Z ioredis:redis status[172.30.100.206:6379 (ioredisClusterRefresher)]: wait -> connecting
2019-07-09T02:02:59.153Z ioredis:redis queue command[172.30.100.206:6379 (ioredisClusterRefresher)]: 0 -> cluster([ 'slots' ])
2019-07-09T02:02:59.153Z ioredis:redis status[172.30.100.206:6379 (ioredisClusterSubscriber)]: wait -> close
2019-07-09T02:02:59.153Z ioredis:connection skip reconnecting since the connection is manually closed.
2019-07-09T02:02:59.154Z ioredis:redis status[172.30.100.206:6379 (ioredisClusterSubscriber)]: close -> end
2019-07-09T02:02:59.154Z ioredis:cluster:subscriber selected a subscriber 172.30.100.206:6379
2019-07-09T02:02:59.154Z ioredis:redis status[172.30.100.206:6379 (ioredisClusterSubscriber)]: [empty] -> wait
2019-07-09T02:02:59.154Z ioredis:cluster:subscriber started
2019-07-09T02:02:59.156Z ioredis:redis status[172.30.100.206:6379 (ioredisClusterRefresher)]: connecting -> connect
2019-07-09T02:02:59.156Z ioredis:redis status[172.30.100.206:6379 (ioredisClusterRefresher)]: connect -> ready
2019-07-09T02:02:59.156Z ioredis:connection set the connection name [ioredisClusterRefresher]
2019-07-09T02:02:59.157Z ioredis:redis write command[172.30.100.206:6379 (ioredisClusterRefresher)]: 0 -> client([ 'setname', 'ioredisClusterRefresher' ])
2019-07-09T02:02:59.158Z ioredis:connection send 1 commands in offline queue
2019-07-09T02:02:59.158Z ioredis:redis write command[172.30.100.206:6379 (ioredisClusterRefresher)]: 0 -> cluster([ 'slots' ])
2019-07-09T02:02:59.161Z ioredis:cluster cluster slots result count: 1
2019-07-09T02:02:59.161Z ioredis:cluster cluster slots result [0]: slots 0~16383 served by 172.30.100.206:6379,172.30.100.206:6379
2019-07-09T02:02:59.178Z ioredis:cluster:connectionPool Reset with [ { host: '172.30.100.206', port: 6379, readOnly: false },
{ host: '172.30.100.206', port: 6379, readOnly: true } ]
2019-07-09T02:02:59.178Z ioredis:cluster:connectionPool Change role of 172.30.100.206:6379 to slave
2019-07-09T02:02:59.178Z ioredis:redis status[172.30.100.206:6379]: wait -> connecting
2019-07-09T02:02:59.178Z ioredis:redis queue command[172.30.100.206:6379]: 0 -> readonly([])
2019-07-09T02:02:59.178Z ioredis:cluster status: connecting -> connect
2019-07-09T02:02:59.178Z ioredis:cluster status: connect -> ready
redis connected
2019-07-09T02:02:59.181Z ioredis:cluster status: ready -> disconnecting
2019-07-09T02:02:59.181Z ioredis:redis status[172.30.100.206:6379 (ioredisClusterSubscriber)]: wait -> close
2019-07-09T02:02:59.182Z ioredis:connection skip reconnecting since the connection is manually closed.
2019-07-09T02:02:59.182Z ioredis:redis status[172.30.100.206:6379 (ioredisClusterSubscriber)]: close -> end
2019-07-09T02:02:59.182Z ioredis:cluster:subscriber stopped
2019-07-09T02:02:59.196Z ioredis:redis queue command[172.30.100.206:6379]: 0 -> quit([])
Completed
2019-07-09T02:02:59.197Z ioredis:redis status[172.30.100.206:6379 (ioredisClusterRefresher)]: ready -> close
2019-07-09T02:02:59.197Z ioredis:connection skip reconnecting since the connection is manually closed.
2019-07-09T02:02:59.197Z ioredis:redis status[172.30.100.206:6379 (ioredisClusterRefresher)]: close -> end
END RequestId: 30fc396c-0ce0-4c5b-8ecd-7fd2249ab08f
REPORT RequestId: 30fc396c-0ce0-4c5b-8ecd-7fd2249ab08f	Duration: 10281.17 ms	Billed Duration: 10300 ms 	Memory Size: 512 MB	Max Memory Used: 87 MB

@luin
Copy link
Collaborator

luin commented Jul 11, 2019

@jeremytm Sorry for the late response. It's a little hard to debug without a reproducible example. I tried to test on my side but failed. What the Node.js version did your lambda used?

@jeremytm
Copy link

I will make a new test function, strip our project back just enough so it's still reproducible and I will provide sample project, and probably a video of how to get it to break. Should be today or tomorrow.

@AVVS
Copy link
Collaborator

AVVS commented Jul 11, 2019

This actually breaks my tests in the incomplete cluster as well, cant connect indefinitely when upgrading from 4.10.4 to 4.11.x

Generally I spin https://hub.docker.com/r/makeomatic/redis-cluster and then try to connect to it using lazyConnect: true & manual connect

Unless cluster is stabilized & running (ie enough time had passed for it to start) ioredis will hang indefinitely

@jeremytm
Copy link

jeremytm commented Jul 11, 2019

@AVVS try using enableReadyCheck: false in your options. There is a bug (which I haven't reported yet) where if this is true, the connection will continually and reliably fail to become ready with cluster. (Disclaimer: Only tested with NAT)

@silverwind
Copy link
Contributor

silverwind commented Jul 12, 2019

I also noticed ioredis-related script hangs showing up with 4.11.1 with a single local redis server, no cluster or replication on Node.js 12.5.0. It only very rarely shows in my tests, like one in 50 calls of a rather complex integration testing script. Can not reproduce via trivial scripts.

Edit: output of DEBUG=ioredis:* node test.js when it hangs:

$ DEBUG=ioredis:* node test.js
  ioredis:redis status[localhost:6379]: [empty] -> connecting +0ms
  ioredis:redis queue command[localhost:6379]: 0 -> ping([]) +2ms
  ioredis:redis status[127.0.0.1:6379]: connecting -> connect +48ms
  ioredis:redis write command[127.0.0.1:6379]: 0 -> info([]) +1ms
  ioredis:redis status[127.0.0.1:6379]: connect -> ready +71ms
  ioredis:connection send 1 commands in offline queue +0ms
  ioredis:redis write command[127.0.0.1:6379]: 0 -> ping([]) +1ms

After the ping, I send a hgetall which never shows up in the log on a failed run.

@jeremytm
Copy link

In my testing today I have whittled it down to being related to the Bluebird promise library, or at least related somehow to promises. If I remove Bluebird and use native promises, ioredis 4.11.x runs fine over 300 connection attempts, every time. With Bluebird it fails easily on one of the connection attempts, and we get ETIMEDOUT.

ioredis 4.10.4 works fine with Bluebird promises. So something in ioredis 4.11.x doesn't play nice with Bluebird for sure.

I have to wrap up work for this week, but I will finalise the sample code and post it up on Monday.

@p3x-robot
Copy link
Author

@jeremytm thanks so much for you help, because i could not fix it with properly. besides i want to use sockets vs using ports as socket is much faster. but what was weird was that i had to use localhost vs 127.0.0.1 started working, but it was such as ugly fix.

@luin
Copy link
Collaborator

luin commented Jul 13, 2019

@jeremytm Thanks for the details and narrowing down the causes to the problem to Bluebird! I have successfully reproduced the issue just now according to your investigation. It turns out that there's a race condition when making connections in 4.11.x, which will happen easier with Bluebird & Unix socks.

I'll release a new version to fix this issue today.

@luin luin added bug and removed help wanted labels Jul 13, 2019
luin added a commit that referenced this issue Jul 13, 2019
There's a race condition when making connections in 4.11.x,
  which will happen easier with Bluebird & Unix socks.

Thank you to @jeremytm & @p3x-robot.

Close #918
@luin luin closed this as completed in #925 Jul 13, 2019
luin added a commit that referenced this issue Jul 13, 2019
There's a race condition when making connections in 4.11.x,
  which will happen easier with Bluebird & Unix socks.

Thank you to @jeremytm & @p3x-robot.

Close #918
ioredis-robot pushed a commit that referenced this issue Jul 13, 2019
## [4.11.2](v4.11.1...v4.11.2) (2019-07-13)

### Bug Fixes

* ETIMEDOUT error with Bluebird when connecting. ([#925](#925)) ([4bce38b](4bce38b)), closes [#918](#918)
@ioredis-robot
Copy link
Collaborator

🎉 This issue has been resolved in version 4.11.2 🎉

The release is available on:

Your semantic-release bot 📦🚀

@p3x-robot
Copy link
Author

i have confirmed, it is working now with 4.11.2 via socket! :)
thanks so much!

@jeremytm
Copy link

Confirmed. Thanks heaps!

janus-dev87 added a commit to janus-dev87/ioredis-work that referenced this issue Mar 1, 2024
## [4.11.2](redis/ioredis@v4.11.1...v4.11.2) (2019-07-13)

### Bug Fixes

* ETIMEDOUT error with Bluebird when connecting. ([#925](redis/ioredis#925)) ([4bce38b](redis/ioredis@4bce38b)), closes [#918](redis/ioredis#918)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment