
fix(llm): address an issue where saved AI API keys are not carried over to the next sessions #76

Closed
jooyoungseo opened this issue Aug 14, 2024 · 9 comments · Fixed by #102
Labels: backlog (Low priority issues)

Comments

@jooyoungseo (Member)

Reproducible Steps

  1. Create any plots and display via maidr.show()

  2. Hit H key and provide API keys for both OpenAI and Google Gemini

  3. Click Save and Close

  4. If prompted, save the password in your browser

  5. Open LLM from the interactive plot area via Ctrl+Shift+/ (on Windows) or Alt+Shift+/ (on Mac)

  6. Make sure the AI responses are working

  7. Close the maidr browser and exit the current Python REPL

  8. Repeat the steps above and check whether the AI API keys are preserved across sessions

Current Behavior

API keys are not carried over to subsequent sessions even when they are saved in the browser

Suggested Solution

  1. When maidr.show() is executed, search for the following env variables via os.getenv():
  • OPENAI_API_KEY

  • GOOGLE_API_KEY

  2. Use these keys as a fallback
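The fallback lookup could be sketched as follows (a minimal illustration only; `get_llm_api_keys` is a hypothetical helper name, not part of maidr's API):

```python
import os

def get_llm_api_keys():
    """Fetch LLM API keys from environment variables as a fallback.

    Returns a dict containing only the variables that are actually set
    to a non-empty value.
    """
    keys = {}
    for env_var in ("OPENAI_API_KEY", "GOOGLE_API_KEY"):
        value = os.getenv(env_var)
        if value:  # skip unset or empty variables
            keys[env_var] = value
    return keys
```

The caller would consult this dict only when no key was saved in the browser session.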
@jooyoungseo (Member Author)

I have noticed some possible issues with the proposal of using the env variables:

  1. Missing API: Currently, maidr.js does not have an API property in its JSON schema, so we cannot directly pass the keys from Python to the browser frontend.

  2. Security risk: We cannot simply bundle the API keys inside the JSON, which would open a security hole.

@dakshpokar Please investigate more reliable solutions. Ideally, we want to make the maidr.js session cookie reliable, and this issue needs to be further investigated from the upstream end.

@dakshpokar (Collaborator)

dakshpokar commented Aug 28, 2024

Greetings Professor @jooyoungseo,

Upon investigating the issue, I discovered that the changing port number with each run in interactive mode causes the loss of local storage variables. This is because the local storage, where we store the OPENAI and GEMINI keys in the settings_data field, is bound to the domain and port combination. As a result, each new run with a different port results in the loss of stored keys.

As you discussed in the previous comment, I also thought of passing it from Python Binder to Browser Frontend, but that could potentially be a security risk.

Another solution I explored was storing the keys upon initial entry and passing them to subsequent runs. However, this is not feasible due to the Same-Origin Policy.

I also found a way to share local storage data across domains (and ports), but it requires knowing in advance which domain the data should be injected into. Since every run uses a unique port, we cannot know this, so this option is eliminated.

Lastly, I considered using cookies, as their Same Origin Policy is based on the domain name, not the port. Since our domain remains 'localhost', this could work. However, there is a risk that any web application could access the cookies and retrieve the keys, making this approach unsuitable for our needs.

I am thinking of some other solutions as well and will keep you posted, but this is what I have found so far. I will spend tomorrow looking for a reliable fix.

Best regards,
Daksh Pokar

@jooyoungseo (Member Author)

@dakshpokar Can we pin down the port number to make it stable? What's the tradeoff?

@dakshpokar (Collaborator)

Greetings Professor @jooyoungseo,
Port selection is handled by py-htmltools internally. Since we don't have control over this, pinning down the port number doesn't seem feasible.

@dakshpokar (Collaborator)

dakshpokar commented Aug 28, 2024

Professor @jooyoungseo,
I did think of one solution that just might work, but it requires storing the OpenAI and Gemini keys in Python first. On every run, we can inject the keys into the localStorage of the current browser instance. We can easily do so with the following JS:

function addKeyValueLocalStorage(iframeId, key, value) {
  const iframe = document.getElementById(iframeId);

  if (iframe && iframe.contentWindow) {
    try {
      iframe.contentWindow.localStorage.setItem(key, value);
    } catch (error) {
      console.error('Error accessing iframe localStorage:', error);
    }
  } else {
    console.error('Iframe not found or inaccessible.');
  }
}

addKeyValueLocalStorage('myIframe', 'openAIKey', <<fetch securely from python binder>>);

Within the onload attribute of the iframe, we can add the above JS snippet.


This works perfectly; we just have to ask the user in the Python binder for the OpenAI and/or Gemini keys. Once that is done, we can store these keys in an encrypted manner and fetch them whenever an instance is run.
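On the Python side, the injected snippet could be generated roughly as follows (an illustrative sketch, not maidr's actual code; `build_localstorage_injection` is a hypothetical helper, and `json.dumps` is used to safely quote the values embedded in the generated JavaScript):

```python
import json

def build_localstorage_injection(iframe_id: str, key: str, value: str) -> str:
    """Build a JS snippet that writes one key into an iframe's localStorage.

    json.dumps() escapes each Python string into a valid, safely quoted
    JavaScript string literal, so keys containing quotes or backslashes
    cannot break out of the generated script.
    """
    return (
        "(function() {"
        f" const iframe = document.getElementById({json.dumps(iframe_id)});"
        " if (iframe && iframe.contentWindow) {"
        f" iframe.contentWindow.localStorage.setItem({json.dumps(key)}, {json.dumps(value)});"
        " }"
        " })();"
    )
```

The returned string would then be attached to the iframe's onload handler, as described above.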

cc: @SaaiVenkat

@jooyoungseo (Member Author)

@dakshpokar Why do we have to ask users to enter the keys each time? Could we just fetch the keys from users' env variables for both OpenAI and Gemini?

@dakshpokar (Collaborator)

Yes, Professor @jooyoungseo, we will store the keys in environment variables and will not ask for them each time. I would like clarification on when to request the keys: if a user denies the request initially, should we ask again? I am considering the best approach to ensure a positive experience for our target audience.

@jooyoungseo (Member Author)

@dakshpokar -- I would rather include instructions on how to add OPENAI_API_KEY and GOOGLE_API_KEY to env variables in our user guide. Like other libraries (e.g., openai, langchain), we don't need to implement an interactive prompt.

Just fetch OPENAI_API_KEY and GOOGLE_API_KEY from the user's environment variables at this point, please.

@dakshpokar (Collaborator)

Sure Professor @jooyoungseo that works!

@jooyoungseo jooyoungseo added the backlog Low priority issues label Sep 12, 2024
jooyoungseo pushed a commit that referenced this issue Sep 13, 2024

## Description
This pull request fixes the handling of API keys for LLMs in the code.
It adds a JavaScript script to handle the API keys for LLMs and
initializes the LLM secrets in the MAIDR instance. The script injects
the LLM API keys into the MAIDR instance and sets the appropriate
settings based on the presence of the Gemini and OpenAI API keys. This
ensures that the LLM functionality works correctly with the updated API
key handling.

closes #76 

## Type of Change

- [x] Bug fix
- [ ] New feature
- [ ] Breaking change (fix or feature that would cause existing
functionality to not work as expected)
- [ ] Documentation update

## Checklist

- [x] My code follows the style guidelines of this project
- [x] I have performed a self-review of my code
- [x] I have commented my code, particularly in hard-to-understand areas
- [x] I have made corresponding changes to the documentation
- [x] My changes generate no new warnings
- [x] Any dependent changes have been merged and published in downstream
modules

# Pull Request

## Description
1. Added a new method called `initialize_llm_secrets()` in
environment.py which fetches the keys from the environment variable.
2. Injected the script when the maidr iframe loads initially.
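A rough sketch of what such a method might look like (the actual implementation lives in environment.py; the settings field names `openAIAuthKey`, `geminiAuthKey`, and `LLMModel` below are assumptions made for illustration, not confirmed maidr.js schema):

```python
import json
import os

def initialize_llm_secrets_sketch() -> str:
    """Illustrative sketch: collect LLM keys from the environment and
    serialize a settings payload for injection into the maidr iframe."""
    openai_key = os.getenv("OPENAI_API_KEY")
    gemini_key = os.getenv("GOOGLE_API_KEY")

    settings = {
        "openAIAuthKey": openai_key or "",
        "geminiAuthKey": gemini_key or "",
        # Enable whichever LLM(s) have a key present.
        "LLMModel": "multi" if openai_key and gemini_key
        else "gemini" if gemini_key
        else "openai",
    }
    return json.dumps(settings)
```

The resulting JSON string would be written into the iframe's localStorage when the iframe first loads, restoring the keys regardless of which port the session runs on.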

## Checklist

- [x] I have read the [Contributor Guidelines](../CONTRIBUTING.md).
- [x] I have performed a self-review of my own code and ensured it
follows the project's coding standards.
- [x] I have tested the changes locally following
`ManualTestingProcess.md`, and all tests related to this pull request
pass.
- [x] I have commented my code, particularly in hard-to-understand
areas.
- [x] I have updated the documentation, if applicable.
- [x] I have added appropriate unit tests, if applicable.
