
Chime chat(messaging) messagingSessionDidStop did not fire on disconnect #2372

Closed
4 tasks done
singulli1 opened this issue Jul 20, 2022 · 33 comments
Labels
messaging-service Messaging service + Disperse

Comments

@singulli1

singulli1 commented Jul 20, 2022

What happened and what did you expect to happen?

From time to time, the chat WebSocket disconnects.

We were able to capture a disconnect in the below log file:

[Image: log screenshot]

The log shows a WebSocket close with code 1006, a new connection being opened, and then another WebSocket close with code 4401.

The documentation (https://docs.aws.amazon.com/chime-sdk/latest/dg/websockets.html) mentions handling disconnects.

I'm assuming that a disconnect will fire the messagingSessionDidStop event of the MessagingSessionObserver. However, in the log file, I didn't see this happening.

What event should we use to initiate a reconnect? I'm assuming it's the messagingSessionDidStop event, but this isn't explicitly stated in the documentation.

Have you reviewed our existing documentation?

Reproduction steps

See above

Amazon Chime SDK for JavaScript version

2.24.0

What browsers are you seeing the problem on?

chrome

Browser version

Version 103.0.5060.114

Meeting and Attendee ID Information.

No response

Browser console logs

see above

@devalevenkatesh devalevenkatesh added the messaging-service Messaging service + Disperse label Jul 25, 2022
@devalevenkatesh
Contributor

Maybe this was fixed in 2.30.1. I have tagged the messaging team for more help on this.

Could you please try upgrading to 2.30.1 and check whether that fixes it?

@singulli1
Author

I will. While I'm doing this, can you answer the following?

From AWS documentation:
https://docs.aws.amazon.com/chime-sdk/latest/dg/websockets.html

When the application uses a close code to reconnect, the application should:

  1. Call the GetMessagingSessionEndpoint again to obtain a new base URL.
  2. Refresh the IAM credentials if they've expired.
  3. Connect via the WebSocket.
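A minimal TypeScript sketch of those three steps, assuming hypothetical hooks (`credentialsExpired`, `refreshCredentials`, `getMessagingSessionEndpoint`, `connect`) that an app would wire to its own AWS client and credential source. It is not amazon-chime-sdk-js code (the SDK reconnects internally). One deliberate deviation from the listed order: credentials are refreshed before the endpoint is fetched, because the GetMessagingSessionEndpoint call is itself signed with those credentials.

```typescript
// Hypothetical dependencies; a real app would wrap its AWS client,
// its credential source, and the WebSocket connection logic.
interface ReconnectDeps {
  credentialsExpired(): boolean;
  refreshCredentials(): Promise<void>;
  getMessagingSessionEndpoint(): Promise<string>; // returns the base URL
  connect(endpointUrl: string): Promise<void>;
}

// Sketch of the documented reconnect steps (credentials refreshed first).
async function reconnect(deps: ReconnectDeps): Promise<string> {
  if (deps.credentialsExpired()) {
    await deps.refreshCredentials();
  }
  // Always fetch a fresh base URL: the endpoint may change between connections.
  const endpointUrl = await deps.getMessagingSessionEndpoint();
  await deps.connect(endpointUrl);
  return endpointUrl;
}
```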

What event is supposed to surface the close code, and how are we supposed to listen for it?
Thanks,

@devalevenkatesh
Contributor

When you are using amazon-chime-sdk-js, reconnection is handled internally: canReconnect decides whether to reconnect, and your messagingSessionDidStartConnecting observer should be triggered when reconnecting. I believe in your case the reconnection was triggered, but there was a bug, which @daviwith fixed in PR #2193. That PR was released in 2.30.1, hence the suggestion to upgrade and see whether that fixes the issue.
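A minimal sketch of an observer that makes those lifecycle callbacks visible in the application's logs. `SessionLifecycleObserver` is a local stand-in that mirrors the optional callbacks on `MessagingSessionObserver` (assumed signatures: `messagingSessionDidStartConnecting(reconnecting)`, `messagingSessionDidStart()`, `messagingSessionDidStop(event)`), so the example compiles without the SDK installed; `createLifecycleObserver` is a hypothetical helper.

```typescript
// Local stand-in mirroring the optional lifecycle callbacks of
// MessagingSessionObserver (assumed signatures), so this compiles standalone.
interface SessionLifecycleObserver {
  messagingSessionDidStartConnecting?(reconnecting: boolean): void;
  messagingSessionDidStart?(): void;
  messagingSessionDidStop?(event: { code: number; reason: string }): void;
}

// Hypothetical helper: logs every transition so reconnects show up in app logs.
function createLifecycleObserver(log: (line: string) => void): SessionLifecycleObserver {
  return {
    messagingSessionDidStartConnecting(reconnecting: boolean): void {
      // true when the SDK is retrying on its own after a drop
      log(reconnecting ? 'reconnecting' : 'connecting');
    },
    messagingSessionDidStart(): void {
      log('started');
    },
    messagingSessionDidStop(event: { code: number; reason: string }): void {
      // The close code explains why the session stopped (1006 = abnormal closure).
      log(`stopped: ${event.code} ${event.reason}`.trim());
    },
  };
}

// Example wiring: collect lines in memory instead of a real logger.
const lines: string[] = [];
const observer = createLifecycleObserver((line) => lines.push(line));
```

Because TypeScript typing is structural, an object like this should also be accepted by `session.addObserver(...)` next to the application's message observer, since a real `CloseEvent` carries both `code` and `reason`.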

@singulli1
Author

We updated amazon-chime-sdk-js to 3.6.0.
We have 25 users chatting heavily in NM.
Things now seem to work for a while, then fall apart.

I'm attaching an example log file:
chatlog3.txt

Any help would be greatly appreciated.

@dpwspoon
Contributor

The fix has not been made in the 3.x line yet. It is present in the latest version, 2.30.1.

We are working on the 3.x line; an initial attempt was made here: #2180. It had to be reverted because amazon-chime-sdk-js 3.x uses the new AWS JavaScript v3 client. The v3 AWS client is modular and has (1) a new Command-based API and (2) a backwards-compatible API. We are still figuring out how to make the fix work regardless of which of those two APIs is supplied to amazon-chime-sdk-js. The initial fix only worked with the Command-based API.
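A sketch of resolving the endpoint URL regardless of which client shape is supplied. `CommandStyleClient`, `MethodStyleClient`, and `resolveEndpointUrl` are hypothetical names; `makeCommand` stands in for something like `() => new GetMessagingSessionEndpointCommand({})` from `@aws-sdk/client-chime-sdk-messaging`. The method-style call is checked first because the v3 aggregated `ChimeSDKMessaging` client also exposes `send`, and AWS SDK v2 results are unwrapped because v2 returns an `AWS.Request` with a `.promise()` method rather than a Promise.

```typescript
// Shape of the GetMessagingSessionEndpoint response.
type EndpointResponse = { Endpoint?: { Url?: string } };

// Structural stand-ins for the two client shapes (assumptions, not SDK typings).
interface CommandStyleClient {
  send(command: unknown): Promise<EndpointResponse>;
}
interface MethodStyleClient {
  getMessagingSessionEndpoint(params: object): unknown;
}

// Hypothetical helper: returns Endpoint.Url from whichever client shape was supplied.
async function resolveEndpointUrl(
  client: CommandStyleClient | MethodStyleClient,
  makeCommand: () => unknown, // e.g. () => new GetMessagingSessionEndpointCommand({})
): Promise<string> {
  let response: EndpointResponse;
  // Check the method first: the v3 aggregated client also has send().
  if ('getMessagingSessionEndpoint' in client) {
    const result = client.getMessagingSessionEndpoint({}) as
      | Promise<EndpointResponse>
      | { promise(): Promise<EndpointResponse> };
    // AWS SDK v2 returns an AWS.Request (unwrap with .promise()); v3 returns a Promise.
    response = 'promise' in result ? await result.promise() : await result;
  } else {
    response = await client.send(makeCommand());
  }
  const url = response.Endpoint?.Url;
  if (!url) {
    throw new Error('GetMessagingSessionEndpoint response has no Endpoint.Url');
  }
  return url;
}
```

Awaiting a v2 `AWS.Request` directly, without `.promise()`, just yields the request object, whose `Endpoint` property is undefined.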

@dpwspoon
Contributor

Updated PR for the 3.x line that fixes reconnecting after a 4401 close: https://github.com/aws/amazon-chime-sdk-js/pull/2400/files

@singulli1
Author

This is great news.
How long (best guess) until this is available in a release? We are trying to decide if we should revert back to 2.30.1 or wait for 3.x to be fixed.
Thanks so much for your help :)

@dpwspoon
Contributor

The fix was released in 3.7.0 here: https://github.com/aws/amazon-chime-sdk-js/tree/v3.7.0.

@singulli1
Author

We applied the fix, and we are now getting tons of error messages intermittently.
See the attached log file.

This is a huge deal for us.

chatLog4.txt

@dpwspoon
Contributor

Can you show how you are initializing the MessagingSessionConfiguration and how you are importing the AWS client?

@dpwspoon dpwspoon reopened this Aug 24, 2022
@singulli1
Author

singulli1 commented Aug 24, 2022 via email

@singulli1
Author

Message Session Connection:

async connect(currentUser: User, currentTenant: Tenant) {

this.initialize();
console.log(`Starting Chat Session`);

this.currentUser = currentUser;
this.currentTenant = currentTenant;

// loggers start
this.consoleLogger = new ConsoleLogger('SDK Chat', LogLevel.INFO);

const url = environment.logging_endpoint;

this.postLogger = new CustPOSTLogger({
  url: url,
  headersFn: this.headersFn.bind(this),
  logLevel: LogLevel.INFO,
  metadata: {
    feature: 'SDK chat',
    sessionId: this.sessionId,
    userName: this.currentUser.userName,
    tenantID: this.currentTenant.ID,
    browserVersion: this.defaultBrowserBehaviour.version(),
    browserName: this.defaultBrowserBehaviour.name(),
  }
});

this.multiLogger = new MultiLogger(this.consoleLogger, this.postLogger);
// loggers end


await this.setUsers();
await this.dataService.createAppInstanceUser(currentUser.ID, User.getfmlName(currentUser)).toPromise();
await this.setAWSCredentials();

const response = await this.chimeAPIService.getMessagingSessionEndpoint();

this.endpoint = response?.Endpoint?.Url;

const sessionConfig = new MessagingSessionConfiguration(
  this.chimeAPIService.createMemberArn(currentUser.ID),
  this.sessionId,
  this.endpoint,
  new AWS.ChimeSDKMessaging()
);

this.session = new DefaultMessagingSession(
  sessionConfig,
  this.multiLogger
);


this.session.addObserver(this.messageObserver);
await this.session.start();

await this.getChannels();
this.setDefaultActiveChannel();
this.createTop5Channels();
this.broadcastUserPresence('Online');

this.pingChatSubscription = this.intervalService.pingChat.subscribe(() => {
  this.sendPing();
});

this.connected = true;

}

@singulli1
Author

imports

import { v4 as uuid } from 'uuid';
import AWS from 'aws-sdk';
import { ChimeAPIService } from './chimeAPI.service';
import { Injectable, Output, EventEmitter } from '@angular/core';
import { chimeChatConfigPsykdesk, environment } from '../../environments/environment';
import { DataService } from '../data-services/data.service';
import { User, Tenant } from '../data-services/patient-utils';
import { Channel1On1Member, ChannelMemberMeta, ChannelMessage} from './chat.objects';

import {
LogLevel,
ConsoleLogger,
DefaultMessagingSession,
MessagingSessionConfiguration,
MessagingSessionObserver,
Message,
MultiLogger,

} from 'amazon-chime-sdk-js';
import { ChannelMembershipForAppInstanceUserSummary, ChannelMessageSummary } from 'aws-sdk/clients/chime';
import { Subscription } from 'rxjs';
import { IntervalService } from '../auth-services/interval.service';
import { isAfter, differenceInSeconds, isEqual, isSameDay } from 'date-fns';
import { AmplifyService } from '../auth-services/amplify.service';
import CustPOSTLogger from '../custLogger/custPostLogger';
import { DefaultBrowserBehavior } from 'amazon-chime-sdk-js';

@singulli1
Author

singulli1 commented Aug 24, 2022

If we revert to 2.31.0 will we have the same issue?

@dpwspoon
Contributor

dpwspoon commented Aug 24, 2022

What version of the AWS client are you using?

import AWS from 'aws-sdk';

Can you debug where "Messaging Session failed to resolve endpoint" is logged and see which object it is reading that turns out to be undefined?

The demo here shows how it works with the 3.x AWS SDK: https://github.com/aws/amazon-chime-sdk-js/blob/v3.7.0/demos/browser/app/messagingSession/messagingSession.ts#L64, but I would like to dive deeper into what you are seeing as well.

@singulli1
Author

package.json: "aws-sdk": "^2.1148.0",

npm list aws-sdk
C:\Users\scott\projectsAngular\psykdesk-mark7>npm list aws-sdk
[email protected] C:\Users\scott\projectsAngular\psykdesk-mark7
`-- [email protected]

@singulli1
Author

These logs come from the server in our production environment, and they are sporadic: not everyone is having the issue.

So the question is how to reproduce this in order to debug it. Throttling the network or taking the browser offline and then back online may do it, but I have other processes running in my app that will throw errors, so it is hard to isolate just the messaging behavior.

Can you simulate the "WebSocket close: 1006" error when you test, for example an unreliable connection where the amazon-chime-sdk-js ping does not get a response?

When you do, what happens when the sdk tries to reconnect?

@dpwspoon
Contributor

dpwspoon commented Aug 24, 2022

1006 means closed abnormally and is normally due to a network disconnect. You can likely simulate by running the app on wifi and then turning off the wifi connection.

If that was the case, it would match the "WebSocket close: 4999 Failed to get messaging session endpoint URL" entries, which come from the reconnect being retried. The expectation is that those will repeat while network connectivity is down. When network connectivity is restored, the WebSocket will automatically reconnect (with the latest fix).

@singulli1
Author

In the 3.x code (the demo showing how it works with the 3.x AWS SDK), you have the line:

this.configuration = new MessagingSessionConfiguration(this.userArn, this.sessionId, undefined, chime);
How are you getting the session endpoint? Looks like it is getting passed in as undefined.

@dpwspoon
Contributor

In the newer versions the chime client is used internally to resolve the session endpoint. This is done because on reconnect the endpoint might change. You can still pass it in, but it will only be used for the initial connect.
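A sketch of the rule described here, assuming the configured URL is consulted only on the first connection attempt and every reconnect asks the client for a fresh endpoint. `endpointForAttempt` and `resolveFromClient` are hypothetical names; the SDK's actual internals may differ.

```typescript
// Hypothetical: a configured URL is honoured only on the initial connect;
// reconnects always ask the client for a fresh endpoint.
async function endpointForAttempt(
  configuredUrl: string | undefined,
  isReconnect: boolean,
  resolveFromClient: () => Promise<string>, // would wrap GetMessagingSessionEndpoint
): Promise<string> {
  if (configuredUrl && !isReconnect) {
    return configuredUrl;
  }
  // Endpoints can change between connections, so never reuse a stale one.
  return resolveFromClient();
}
```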

@singulli1
Author

If I pass in the endpoint as undefined in the MessagingSessionConfiguration:

const sessionConfig = new MessagingSessionConfiguration(
  this.chimeAPIService.createMemberArn(currentUser.ID),
  this.sessionId,
  undefined, // this.endpoint,
  new AWS.ChimeSDKMessaging()
);

I get

2022-08-24T17:23:32.041Z [ERROR] SDK Chat - Messaging Session failed to resolve endpoint: TypeError: Cannot read properties of undefined (reading 'Url')
2022-08-24T17:23:32.044Z [INFO] SDK Chat - WebSocket close: 4999 Failed to get messaging session endpoint URL

This is not what I should be getting, correct?

@singulli1
Author

Do I need to install AWS SDK version 3.x as well as amazon-chime-sdk-js?

@dpwspoon
Contributor

dpwspoon commented Aug 25, 2022

This is not what I should be getting, correct?

That is correct; we will look into getting this fixed. The PR here has both tests and manual test instructions for the AWS SDK 3.x client, covering both of its APIs: the new Command-based API and the backwards-compatible API. Based on what you are saying, the backwards-compatible path that was tested with the 3.x AWS SDK does not seem to work when an AWS SDK 2.x client is used.

I will see if I can reproduce this and get a fix that makes AWS SDK 2.x work with amazon-chime-sdk-js 3.x. I do see, however, that there appears to be a slight difference in how the backwards-compatible API is implemented between the AWS SDK 2.x Chime client (the only one amazon-chime-sdk-js 2.x supports) and the 3.x client.

Do I need to install AWS SDK version 3.x as well as amazon-chime-sdk-js?

That is likely your quickest path forward, or else try amazon-chime-sdk-js 2.x with AWS SDK 2.x.

Thanks for pointing this out.

@singulli1
Author

singulli1 commented Aug 25, 2022

Updated to AWS SDK 3.x using "@aws-sdk/client-chime-sdk-messaging": "^3.154.0" and moved to prod last night.
FYI, we also had to rework our code as part of the upgrade to 3.x.

We're watching the logs.

We were getting some looping when the credentials expire, but this may be due to problems with our shutdown logic.
chatLog5.txt

However, you may want to check what happens in the reconnect logic when credentials expire. You may want to end the reconnect attempt loop if the credentials have expired.

@dpwspoon
Contributor

dpwspoon commented Aug 25, 2022

The AWS client that you pass in can be configured with logic to refresh the credentials when they expire. Depending on how the credentials are initialized, this can be configured automatically, but it sometimes requires manual configuration. The retry logic is somewhat configurable here:

reconnectTimeoutMs: number = 10 * 1000;

I would be interested to know whether different options would be of use. Perhaps a maximum number of retries before giving up, or giving up if we can detect that credentials are not configured to refresh?
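A sketch of the kind of policy suggested here: capped exponential backoff, a hard attempt limit, an overall timeout, and an early exit on `ExpiredTokenException` when nothing is set up to refresh credentials. All field and function names are hypothetical and are not `MessagingSessionConfiguration` options; the 10-second timeout only borrows the `reconnectTimeoutMs` default quoted above.

```typescript
// Hypothetical retry options (not MessagingSessionConfiguration fields).
interface RetryPolicy {
  timeoutMs: number;               // give up once this long has passed since the disconnect
  baseDelayMs: number;             // first backoff delay
  maxDelayMs: number;              // cap on any single delay
  maxAttempts: number;             // "max retry before giving up"
  credentialsRefreshable: boolean; // can the client's credential provider refresh?
}

// Returns the delay in ms before the next attempt, or null to stop retrying.
function nextReconnectDelay(
  attempt: number,                 // attempts already made, starting at 0
  elapsedMs: number,               // time since the disconnect
  lastErrorName: string | undefined,
  policy: RetryPolicy,
): number | null {
  // Without a way to refresh, every retry after ExpiredTokenException fails the same way.
  if (lastErrorName === 'ExpiredTokenException' && !policy.credentialsRefreshable) {
    return null;
  }
  if (attempt >= policy.maxAttempts || elapsedMs >= policy.timeoutMs) {
    return null;
  }
  // Exponential backoff: base, 2x, 4x, ... capped at maxDelayMs.
  return Math.min(policy.baseDelayMs * 2 ** attempt, policy.maxDelayMs);
}

const policy: RetryPolicy = {
  timeoutMs: 10 * 1000,
  baseDelayMs: 1000,
  maxDelayMs: 5000,
  maxAttempts: 5,
  credentialsRefreshable: false,
};
```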

@singulli1
Author

Is there a reason you would want to continue retrying if credentials are expired (i.e. after receiving an ExpiredTokenException)? I'm not sure offhand.

Also, do you have any example code for refreshing credentials in version 3.x? We destroy and recreate ChimeSDKMessagingClient, ChimeSDKMessaging, and DefaultMessagingSession with new credentials 5 minutes before they expire, but were wondering if there is a cleaner way...

@singulli1
Author

singulli1 commented Aug 26, 2022

Looks like we've captured the looping scenario in chatlog5.txt attached above.

  1. lock computer

  2. browser sleeps, all timers shut down

  3. get WebSocket close: 1006 event

  4. wait for credentials to expire (say overnight)

  5. (next day) unlock computer

  6. reconnect logic starts up

  7. get "Messaging Session failed to resolve endpoint: ExpiredTokenException: The security token included in the request is expired"

  8. get "WebSocket close: 4999 Failed to get messaging session endpoint URL"

  9. then this starts looping until browser is closed.

  10. even calling DefaultMessagingSession.stop doesn't seem to break the loop.

    this.session.stop();
    this.session.removeObserver(this.messageObserver);
    this.session = undefined;
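A sketch of a reconnect loop that a stop request can always break, assuming hypothetical injected hooks (`attempt`, `isRetryable`, `sleep`) and a stop signal shaped like `AbortSignal` (anything with an `aborted` flag). The loop re-checks the signal after every wait, so a stop requested during a backoff ends it on the next iteration. This is illustrative only, not the fix that shipped in the SDK.

```typescript
// Anything with an `aborted` flag works, e.g. new AbortController().signal.
type StopSignal = { readonly aborted: boolean };

// Hypothetical loop: retries until connected, stopped, or the error is not retryable.
async function reconnectUntilStopped(
  attempt: () => Promise<void>,
  isRetryable: (err: unknown) => boolean,
  sleep: (ms: number) => Promise<void>,
  signal: StopSignal,
  delayMs = 1000,
): Promise<'connected' | 'stopped' | 'gave-up'> {
  // The condition is re-checked after every sleep, so a stop during backoff wins.
  while (!signal.aborted) {
    try {
      await attempt();
      return 'connected';
    } catch (err) {
      if (!isRetryable(err)) {
        return 'gave-up';
      }
    }
    await sleep(delayMs);
  }
  return 'stopped';
}
```

In a browser, `sleep` could be `(ms) => new Promise((resolve) => setTimeout(resolve, ms))`, and stopping the session would call `abort()` on the controller that owns the signal.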

What should be happening in this scenario with 3.x?

Thanks again for all your help.

@KevinCGH

KevinCGH commented Sep 1, 2022

Any updates? I have the same problem.

FYI: the SDK version I used.

@manasisurve
Contributor

We have been able to identify the root cause of the infinite reconnect loop on credential expiry and are working on a fix. In the meantime, if credentials are refreshed periodically prior to expiry, the above scenario will not be hit. How are you refreshing credentials with the aws sdk v3 upgrade? Are you using aws Cognito identity?

@singulli1
Author

Approach to refreshing credentials in 3.x

Currently, we have a custom timer that runs every minute and checks whether the credentials are expired. If they are getting close to expiration, we:

  1. stop the DefaultMessagingSession
  2. "teardown" the DefaultMessagingSession, ChimeSDKMessagingClient, ChimeSDKMessaging objects
  3. get new credentials by calling a server process that uses STS.assumeRole to retrieve new credentials
  4. recreate the DefaultMessagingSession, ChimeSDKMessagingClient, and ChimeSDKMessaging objects with the new credentials
  5. start the DefaultMessagingSession
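A sketch of that timer-driven rotation, assuming hypothetical hooks for each of the five steps and the 5-minute margin mentioned earlier in the thread. It only shows the ordering and the "close to expiration" check; the hook names are illustrative, not SDK APIs.

```typescript
// Credentials as returned by the app's backend (only expiration matters here).
interface TempCredentials {
  expiration: Date;
}

// Hypothetical hooks, one per step in the list above.
interface RotationSteps {
  stopSession(): Promise<void>;
  teardownClients(): void;
  fetchCredentials(): Promise<TempCredentials>; // e.g. server-side STS.assumeRole
  recreateClients(credentials: TempCredentials): void;
  startSession(): Promise<void>;
}

// Rotate when the credentials expire within this margin (5 minutes).
const REFRESH_MARGIN_MS = 5 * 60 * 1000;

// Called from the once-a-minute timer; returns the expiration now in effect.
async function maybeRotate(expiration: Date, now: Date, steps: RotationSteps): Promise<Date> {
  if (expiration.getTime() - now.getTime() > REFRESH_MARGIN_MS) {
    return expiration; // still comfortably valid, nothing to do
  }
  await steps.stopSession();
  steps.teardownClients();
  const credentials = await steps.fetchCredentials();
  steps.recreateClients(credentials);
  await steps.startSession();
  return credentials.expiration;
}
```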

Is there a more elegant way to refresh the credentials in 3.x?
Thanks again for your help.

@dpwspoon
Contributor

dpwspoon commented Sep 9, 2022

I believe it works as follows. With the AWS SDK 3.x, you can provide the client with a credential provider. The credential provider is a function that returns a promise of credentials, and the credentials can have an expiration set. When the client calls the APIs, it gets and stores credentials from the credential provider. When those credentials expire (based on the expiration), the client gets new credentials from the provider.

I used the following code snippet to test/demonstrate:

let firstCall = true;
const credentialsThatAreActuallyValidFor1Hour = await this.fetchCredentials();

// A credential provider just is a function that returns a promise with credentials
const credentialProviderExample = function() {
  return new Promise((resolve, reject)=> {
    console.log("Credential Provider called to resolve credentials, is first call " + firstCall);
    
    // credential expiration is a Date object (getTime() is called internally)
    const credentialsExpireAt = new Date();
    credentialsExpireAt.setSeconds(credentialsExpireAt.getSeconds() + 1)
    credentialsThatAreActuallyValidFor1Hour.expiration = credentialsExpireAt;

    // To demonstrate refreshing credentials, always return expired credentials on first call
    if (firstCall) {    
      firstCall = false;
      setTimeout(()=> {
        resolve(credentialsThatAreActuallyValidFor1Hour)
      }, 2000)
    } else {
      resolve(credentialsThatAreActuallyValidFor1Hour);
    }
  });
}

// Initialize client with credential provider
const chime = new ChimeSDKMessagingClient({ region: 'us-east-1', credentials: credentialProviderExample });

// not shown, make an API call

and got the following on the console:

Calling API
Credential Provider called to resolve credentials, is first call true
Credential Provider called to resolve credentials, is first call false
API Called successfully
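A sketch of a credential provider for the setup described earlier (a server process calling `STS.assumeRole`), assuming a hypothetical `fetchFromBackend` that forwards STS-style PascalCase fields (`AccessKeyId`, `SecretAccessKey`, `SessionToken`, `Expiration`). It maps them to the camelCase shape used in the snippet above and lets concurrent callers share one in-flight request. Setting `expiration` is what lets the client know when to call the provider again.

```typescript
// STS AssumeRole-style fields, as a backend might forward them (assumption).
interface StsStyleCredentials {
  AccessKeyId: string;
  SecretAccessKey: string;
  SessionToken: string;
  Expiration: string | Date;
}

// Camel-case shape used by AWS SDK v3 credential providers.
interface ProviderCredentials {
  accessKeyId: string;
  secretAccessKey: string;
  sessionToken?: string;
  expiration?: Date;
}

// Hypothetical factory: wraps a backend call as a v3-style credential provider.
function backendCredentialProvider(
  fetchFromBackend: () => Promise<StsStyleCredentials>,
): () => Promise<ProviderCredentials> {
  let inFlight: Promise<ProviderCredentials> | undefined;
  return () => {
    // Concurrent callers share one request instead of each hitting the backend.
    if (!inFlight) {
      inFlight = fetchFromBackend().then(
        (c) => {
          inFlight = undefined;
          return {
            accessKeyId: c.AccessKeyId,
            secretAccessKey: c.SecretAccessKey,
            sessionToken: c.SessionToken,
            // expiration tells the client when to call this provider again
            expiration: c.Expiration instanceof Date ? c.Expiration : new Date(c.Expiration),
          };
        },
        (err: unknown) => {
          inFlight = undefined;
          throw err;
        },
      );
    }
    return inFlight;
  };
}
```

It would be passed the same way as in the snippet above, e.g. `new ChimeSDKMessagingClient({ region: 'us-east-1', credentials: backendCredentialProvider(fetchFromBackend) })`, which avoids tearing down and recreating the clients on a timer.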

@singulli1
Author

singulli1 commented Sep 11, 2022 via email

@paparekh-amzn
Contributor

The snippet from dpwspoon above shows how to refresh credentials; we don't expect any of these issues to be hit if credentials are refreshed that way.

Release 3.8.0 fixes the following:

  • Break infinite reconnect loop in messagingSession on timeout
  • Add backwards compatibility with AWS JS SDK V2 for getMessagingSessionEndpoint
  • Fix behavior of web socket disconnects before a session is connected
    • The Session.start() promise will fail in this scenario, and the promise will reject with the close code
    • If a web socket disconnects after the session has connected, messagingSessionDidStop will be fired
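A sketch of handling the first case on the caller's side, assuming only that `start()` returns a Promise. `StartableSession` and `startOrReport` are hypothetical names, and the rejection value (described as the close code) is passed through without assuming its exact shape. Disconnects after a successful start would still be handled in `messagingSessionDidStop`.

```typescript
// Minimal structural stand-in for a messaging session (assumption).
interface StartableSession {
  start(): Promise<void>;
}

// Hypothetical helper: returns true if the session started, false if start() rejected.
async function startOrReport(
  session: StartableSession,
  onStartFailed: (reason: unknown) => void,
): Promise<boolean> {
  try {
    await session.start();
    return true;
  } catch (reason) {
    // Disconnect before the session connected: report it and let the caller decide.
    onStartFailed(reason);
    return false;
  }
}
```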

Feel free to reopen.
