NEWS.org

Version 0.17.5

  • Update list of Ollama models for function calling.
  • Centralize the model list so that providers like Vertex and Open AI-compatible libraries can have more accurate context lengths and capabilities.
  • Update default Gemini chat model to Gemini 1.5 Pro.

Version 0.17.4

  • Fix problem with Open AI’s llm-chat-token-limit.
  • Fix Open AI and Gemini’s parallel function calling.
  • Add variable llm-prompt-default-max-tokens to put a cap on the number of tokens regardless of model size.
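
A minimal sketch of using the new cap (the variable name is from this entry; the value 4096 is purely illustrative):

```elisp
;; Cap prompt filling at 4096 tokens regardless of the model's
;; own context size (value chosen for illustration only).
(setq llm-prompt-default-max-tokens 4096)
```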

Version 0.17.3

  • More fixes with Claude and Ollama function calling conversation, thanks to Paul Nelson.
  • Make llm-chat-streaming-to-point more efficient, just inserting new text, thanks to Paul Nelson.
  • Don’t output streaming information when llm-debug is true, since it tended to be overwhelming.

Version 0.17.2

  • Fix compiled functions not being evaluated in llm-prompt.
  • Use Ollama’s new embed API instead of the obsolete one.
  • Fix Claude function calling conversations.
  • Fix issue in Open AI streaming function calling.
  • Update Open AI and Claude default chat models to the latest models.

Version 0.17.1

  • Support Ollama function calling, for models which support it.
  • Make sure every model, even unknown models, returns some value for llm-chat-token-limit.
  • Add token count for llama3.1 model.
  • Make llm-capabilities work model-by-model for embeddings and functions.

Version 0.17.0

  • Introduced llm-prompt for prompt management and creation from generators.
  • Removed Gemini and Vertex token counting, because llm-prompt uses token counting often, and it is better to have a quick estimate than a more expensive, more accurate count.
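
A hedged sketch of the new llm-prompt workflow: the macro and function names below follow the llm-prompt documentation, but the exact fill syntax may differ by version, and `my-provider` is assumed to be an already-configured provider.

```elisp
;; Define a named prompt template with a fill-in "ticket"
;; (the {{text}} syntax is per the llm-prompt docs).
(llm-defprompt my-summary-prompt
  "Summarize the following text concisely: {{text}}")

;; Fill the template against a provider; the keyword argument
;; name matches the ticket name in the template.
(llm-prompt-fill 'my-summary-prompt my-provider
                 :text "Some long document contents here.")
```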

Version 0.16.2

  • Fix Open AI’s gpt-4o context length, which is lower for most paying users than the max.

Version 0.16.1

  • Add support for HTTP / HTTPS proxies.

Version 0.16.0

  • Add “non-standard params” to set per-provider options.
  • Add default parameters for chat providers.
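
A sketch of what these two features might look like when configuring a provider; the slot names below are illustrative and may differ by provider and version, so check the llm README for the exact keywords.

```elisp
(require 'llm-ollama)

;; Illustrative only: exact slot names may vary.
(setq my-provider
      (make-llm-ollama
       :chat-model "llama3"
       ;; Provider-wide default parameter for chat calls:
       :default-chat-temperature 0.2
       ;; "Non-standard params": options the llm API does not
       ;; model directly, passed through to the provider.
       :default-chat-non-standard-params '(("num_ctx" . 8192))))
```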

Version 0.15.0

  • Move to plz backend, which uses curl. This helps move this package to a stronger foundation backed by parsing to spec. Thanks to Roman Scherer for contributing the plz extensions that enable this, which are currently bundled in this package but will eventually become their own separate package.
  • Add model context information for Open AI’s GPT 4-o.
  • Add model context information for Gemini’s 1.5 models.

Version 0.14.2

  • Fix mangled copyright line (needed to get ELPA version unstuck).
  • Fix Vertex response handling bug.

Version 0.14.1

  • Fix various issues with the 0.14 release.

Version 0.14

  • Introduce a new way of creating prompts: llm-make-chat-prompt, deprecating the older ways.
  • Improve Vertex error handling.
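
A minimal sketch of the llm-make-chat-prompt API introduced here; the keyword arguments follow the llm README, and `my-provider` is assumed to be an already-configured provider.

```elisp
(require 'llm)

;; Build a prompt with optional context and parameters, then
;; send it synchronously to a provider.
(let ((prompt (llm-make-chat-prompt
               "What is the capital of France?"
               :context "You are a terse geography assistant."
               :temperature 0.1)))
  (llm-chat my-provider prompt))
```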

Version 0.13

  • Add Claude’s new support for function calling.
  • Refactor of providers to centralize embedding and chat logic.
  • Remove connection buffers after use.
  • Fixes to provide more specific error messages for most providers.

Version 0.12.3

  • Refactor of warn-nonfree methods.
  • Add non-free warnings for Gemini and Claude.

Version 0.12.2

  • Send connection issues to error callbacks, and fix an error handling issue in Ollama.
  • Fix issue where, in some cases, streaming does not work the first time attempted.

Version 0.12.1

  • Fix issue in llm-ollama with not using provider host for sync embeddings.
  • Fix issue in llm-openai where it was incompatible with some Open AI-compatible backends due to assumptions about inconsequential JSON details.

Version 0.12.0

  • Add provider llm-claude, for Anthropic’s Claude.

Version 0.11.0

  • Introduce function calling, now available only in Open AI and Gemini.
  • Introduce llm-capabilities, which returns a list of extra capabilities for each backend.
  • Fix issue with logging when we weren’t supposed to.
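
A sketch of checking capabilities before relying on them; the `function-calls` symbol is per the llm README, and `my-provider` is assumed to be an already-configured provider.

```elisp
;; llm-capabilities returns a list of capability symbols for
;; the given backend; branch on it before using a feature.
(when (member 'function-calls (llm-capabilities my-provider))
  (message "This provider supports function calling"))
```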

Version 0.10.0

  • Introduce llm logging (for help with developing against llm), set llm-log to non-nil to enable logging of all interactions with the llm package.
  • Change the default interaction with ollama to one more suited for conversations (thanks to Thomas Allen).
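
Enabling the new logging is a one-liner, per this entry (the log buffer name is not specified here and may vary by version):

```elisp
;; Log all interactions with the llm package, useful when
;; developing against it.
(setq llm-log t)
```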

Version 0.9.1

  • Default to the new “text-embedding-3-small” model for Open AI. Important: Anyone who has stored embeddings should either regenerate embeddings (recommended) or hard-code the old embedding model (“text-embedding-ada-002”).
  • Fix response breaking when prompts run afoul of Gemini / Vertex’s safety checks.
  • Change Gemini streaming to use the correct URL. This doesn’t seem to have an effect on behavior.

Version 0.9

  • Add llm-chat-token-limit to find the token limit based on the model.
  • Add request timeout customization.
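
A sketch of using both additions; `my-provider` is assumed to be an already-configured provider, and the timeout variable name below is an assumption based on the llm source, so verify it against your version.

```elisp
;; Ask the provider for its model's token limit.
(llm-chat-token-limit my-provider)

;; Raise the HTTP request timeout (seconds); variable name
;; may differ by version.
(setq llm-request-timeout 60)
```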

Version 0.8

  • Allow users to change the Open AI URL, to allow for proxies and other services that re-use the API.
  • Add llm-name and llm-cancel-request to the API.
  • Standardize handling of how context, examples and history are folded into llm-chat-prompt-interactions.

Version 0.7

  • Upgrade Google Cloud Vertex to Gemini - previous models are no longer available.
  • Added the gemini provider, an alternate endpoint with alternate (and easier) authentication and setup compared to Cloud Vertex.
  • Provide default for llm-chat-async to fall back to streaming if not defined for a provider.

Version 0.6

  • Add provider llm-llamacpp.
  • Fix issue with Google Cloud Vertex not responding to messages with a system interaction.
  • Fix use of (pos-eol) which is not compatible with Emacs 28.1.

Version 0.5.2

  • Fix incompatibility with older Emacs introduced in Version 0.5.1.
  • Add support for Google Cloud Vertex model text-bison and variants.
  • llm-ollama can now be configured with a scheme (http vs https).

Version 0.5.1

  • Implement token counting for Google Cloud Vertex via their API.
  • Fix issue with Google Cloud Vertex erroring on multibyte strings.
  • Fix issue with small bits of missing text in Open AI and Ollama streaming chat.

Version 0.5

  • Fixes for conversation context storage, requiring clients to handle ongoing conversations slightly differently.
  • Fixes for proper sync request http error code handling.
  • llm-ollama can now be configured with a different hostname.
  • Callbacks now always attempt to run in the client’s original buffer.
  • Add provider llm-gpt4all.

Version 0.4

  • Add helper function llm-chat-streaming-to-point.
  • Add provider llm-ollama.

Version 0.3

  • Streaming support in the API, and for the Open AI and Vertex models.
  • Properly encode and decode in utf-8 so double-width or other character sizes don’t cause problems.

Version 0.2.1

  • Changes in how we make and listen to requests, in preparation for streaming functionality.
  • Fix overzealous change hook creation when using async llm requests.

Version 0.2

  • Remove the dependency on non-GNU request library.