Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bug: token refresh failure after container inactivity in Serverless environment using ycm.WithCredentials() #1465

Open
jaasco opened this issue Sep 16, 2024 · 0 comments
Labels
bug Something isn't working

Comments

@jaasco
Copy link

jaasco commented Sep 16, 2024

Bug Report

YDB GO SDK version:
3.76.5

Environment

Yandex Cloud Serverless Container
OS: Linux (within container)
CPU Architecture: x86_64

Current behavior:

When using ycm.WithCredentials() from the package github.com/ydb-platform/ydb-go-yc-metadata in a Serverless Container with provision_policy { min_instances = 1 }, the application runs normally for 12 hours. After 12 hours, the following error occurs during YDB requests:
transport/Unauthenticated (code = 16, source error = "rpc error: code = Unauthenticated desc = Unauthenticated", address: "ydb.serverless.yandexcloud.net:2135")
This issue arises because the Go routine responsible for refreshing the token stops when the container becomes inactive, and it does not restart upon the container becoming active again.

Expected behavior:

The token refresh routine should continue to function, even after container inactivity, ensuring that token expiration is properly handled and avoiding unauthenticated errors.

Steps to reproduce:

  1. Deploy an application using YDB GO SDK with ycm.WithCredentials() in a Yandex Cloud Serverless Container.
package main

import (
  "context"
  "os"

  "github.com/ydb-platform/ydb-go-sdk/v3"
  ycm "github.com/ydb-platform/ydb-go-yc-metadata"
)

func main() {
  ctx, cancel := context.WithCancel(context.Background())
  defer cancel()
  db, err := ydb.Open(ctx,
    os.Getenv("YDB_CONNECTION_STRING"),
    ycm.WithCredentials(),
    ycm.WithInternalCA(), // append Yandex Cloud certificates
  )
  if err != nil {
    panic(err)
  }
  defer db.Close(ctx)
  ...
}
  1. Set provision_policy { min_instances = 1 } for the container.
  2. Wait for 12 hours while the container becomes inactive and is reactivated.
  3. Observe the token expiration and the subsequent Unauthenticated errors in logs when attempting to access YDB.

Related code:

func (m *InstanceServiceAccountCredentials) refreshLoop() {
	defer m.timer.Stop()

	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	for {
		select {
		case <-m.done:
			// Set up error
			m.mu.Lock()
			m.token, m.err = "", &createTokenError{
				Cause:  errClosed,
				Reason: errClosed.Error(),
			}
			m.mu.Unlock()
			return
		case <-m.timer.C:
			m.refreshOnce(ctx)
		}
	}
}

Other information:

The issue was mitigated in our project by rewriting the logic around ycm.WithCredentials() to ensure the token refresh is retriggered even after container inactivity like this:

func (m *InstanceServiceAccountLazyCredentials) Token(ctx context.Context) (token string, err error) {
	onDone := trace.TraceOnGetToken(m.trace, &ctx)
	defer func() {
		onDone(m.token, m.err)
	}()

	// check token before launching loop
	m.mu.RLock()
	token, err = m.token, m.err
	expiry := m.expiry
	m.mu.RUnlock()

	fmt.Printf("Token() call: token is \"%s\", err is %v, expiry is %s\n", masked(token), err, expiry.Format("2006-01-02 15:04:05"))

	// there was some error with the token, so it is unusable now. Avoid refreshing as it might cause log explosion
	if err != nil {
		fmt.Printf("Token() call: err is %v, return empty token\n", err)
		return "", err
	}

	fmt.Printf("Token() call: err is nil, expiry is %s, time.Now() is %s\n", expiry.Format("2006-01-02 15:04:05"), time.Now().Format("2006-01-02 15:04:05"))

	// if the token is empty or is near its expiration time, refresh it
	if token == "" || time.Now().Unix() > expiry.Unix() {
		fmt.Printf("Token() call: calling refresh\n")
		m.refreshOnce(ctx) // updates m.token, m.err, m.expiry
	} else {
		fmt.Printf("Token() call: token is not empty, now is %d, expiry is %d\n", time.Now().Unix(), expiry.Unix())
	}

	return m.token, m.err
}

...

// Perform single refresh iteration.
// If token was obtained:
// 1. Clear current err;
// 2. Set up new token and expiration;
// Otherwise, if current token has expired, clear it and set up err.
func (m *InstanceServiceAccountLazyCredentials) refreshOnce(ctx context.Context) {
	now := time.Now()
	fmt.Printf("Token() call: calling metaCall %s\n", m.metadataURL)
	tok, err := m.metaCall(ctx, m.metadataURL, false)
	fmt.Printf("Token() call: metadata token: %s, expires in %v, error: %v\n", masked(tok.Token), tok.ExpiresIn, err)

	// Call has been performed, now updating fields
	m.mu.Lock()
	defer m.mu.Unlock()

	if err != nil {
		m.token = ""
		m.err = err
		return
	}

	// Renew values.
	m.token, m.expiry, m.err = tok.Token, now.Add(tok.ExpiresIn).Add(-time.Minute*time.Duration(715)), nil
}
@jaasco jaasco added the bug Something isn't working label Sep 16, 2024
@jaasco jaasco changed the title bug: Token refresh failure after container inactivity in Serverless environment using ycm.WithCredentials() bug: token refresh failure after container inactivity in Serverless environment using ycm.WithCredentials() Sep 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

1 participant