Ollama Python Library

The Ollama Python library provides the easiest way to integrate Python 3.8+ projects with Ollama.

Install

pip install ollama

Usage

import ollama
response = ollama.chat(model='llama3.1', messages=[
  {
    'role': 'user',
    'content': 'Why is the sky blue?',
  },
])
print(response['message']['content'])

Streaming responses

Response streaming can be enabled by setting stream=True. Function calls then return a Python generator, where each part is an object in the stream.

import ollama

stream = ollama.chat(
    model='llama3.1',
    messages=[{'role': 'user', 'content': 'Why is the sky blue?'}],
    stream=True,
)

for chunk in stream:
  print(chunk['message']['content'], end='', flush=True)
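If the complete reply is also needed once streaming finishes, the chunks can be accumulated as they arrive. The following is a minimal sketch, assuming every chunk has the `['message']['content']` shape used in the loop above. `collect_stream` is a hypothetical helper name, not part of the library, and the stand-in chunks exist only so the sketch runs without a server.

```python
def collect_stream(stream, echo=False):
    """Join the text of every chunk in a chat stream into one string.

    `stream` is any iterable of chunks shaped like those produced by
    ollama.chat(..., stream=True). Illustrative helper, not a library API.
    """
    parts = []
    for chunk in stream:
        text = chunk['message']['content']
        if echo:
            # Show the text as soon as it arrives, as in the loop above.
            print(text, end='', flush=True)
        parts.append(text)
    return ''.join(parts)


# Stand-in chunks so the sketch runs without a server; with a live server,
# pass the result of ollama.chat(..., stream=True) instead.
fake_stream = [
    {'message': {'role': 'assistant', 'content': 'Rayleigh '}},
    {'message': {'role': 'assistant', 'content': 'scattering.'}},
]
full_text = collect_stream(fake_stream)
```

Because the helper accepts any iterable, the same function can be exercised in tests without contacting a model.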

API

The Ollama Python library's API is designed around the Ollama REST API.

Chat

ollama.chat(model='llama3.1', messages=[{'role': 'user', 'content': 'Why is the sky blue?'}])

Generate

ollama.generate(model='llama3.1', prompt='Why is the sky blue?')

List

ollama.list()

Show

ollama.show('llama3.1')

Create

modelfile='''
FROM llama3.1
SYSTEM You are mario from super mario bros.
'''

ollama.create(model='example', modelfile=modelfile)

Copy

ollama.copy('llama3.1', 'user/llama3.1')

Delete

ollama.delete('llama3.1')

Pull

ollama.pull('llama3.1')

Push

ollama.push('user/llama3.1')

Embed

ollama.embed(model='llama3.1', input='The sky is blue because of rayleigh scattering')

Embed (batch)

ollama.embed(model='llama3.1', input=['The sky is blue because of rayleigh scattering', 'Grass is green because of chlorophyll'])
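A common next step with batch embeddings is to compare the vectors. The sketch below assumes the response stores one vector per input under an `'embeddings'` key. `cosine_similarity` is a hypothetical helper, and the short stand-in vectors replace a real `ollama.embed` call so the example runs without a server.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Stand-in for ollama.embed(model=..., input=[...]); real vectors are
# much longer. The 'embeddings' key is an assumption about the response.
response = {'embeddings': [[1.0, 0.0], [0.0, 2.0], [3.0, 0.0]]}
first, second, third = response['embeddings']

same_direction = cosine_similarity(first, third)
orthogonal = cosine_similarity(first, second)
```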

Ps

ollama.ps()

Custom client

A custom client can be created with the following fields:

host: The Ollama host to connect to
timeout: The timeout for requests

from ollama import Client
client = Client(host='http://localhost:11434')
response = client.chat(model='llama3.1', messages=[
  {
    'role': 'user',
    'content': 'Why is the sky blue?',
  },
])

Async client

import asyncio
from ollama import AsyncClient

async def chat():
  message = {'role': 'user', 'content': 'Why is the sky blue?'}
  response = await AsyncClient().chat(model='llama3.1', messages=[message])
  print(response['message']['content'])

asyncio.run(chat())

Setting stream=True modifies functions to return a Python asynchronous generator:

import asyncio
from ollama import AsyncClient

async def chat():
  message = {'role': 'user', 'content': 'Why is the sky blue?'}
  async for part in await AsyncClient().chat(model='llama3.1', messages=[message], stream=True):
    print(part['message']['content'], end='', flush=True)

asyncio.run(chat())
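One reason to prefer the async client is that several requests can be in flight at once. The sketch below sends a list of prompts concurrently with `asyncio.gather`. `ask_all` and `FakeAsyncClient` are hypothetical names introduced for illustration. The fake client only mimics the `chat` call shape shown above so the example runs without a server; in real use an `AsyncClient()` instance would be passed instead.

```python
import asyncio


async def ask_all(client, model, prompts):
    """Send every prompt concurrently and return the replies in prompt order."""
    async def ask(prompt):
        message = {'role': 'user', 'content': prompt}
        response = await client.chat(model=model, messages=[message])
        return response['message']['content']

    # gather() returns results in the same order as its arguments.
    return await asyncio.gather(*(ask(p) for p in prompts))


class FakeAsyncClient:
    """Stand-in for ollama.AsyncClient; it echoes the prompt in upper case."""

    async def chat(self, model, messages):
        await asyncio.sleep(0)  # yield control, as a real network call would
        text = messages[-1]['content'].upper()
        return {'message': {'role': 'assistant', 'content': text}}


replies = asyncio.run(ask_all(FakeAsyncClient(), 'llama3.1', ['one', 'two']))
```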

Errors

Errors are raised if requests return an error status or if an error is detected while streaming.

import ollama

model = 'does-not-yet-exist'

try:
  ollama.chat(model)
except ollama.ResponseError as e:
  print('Error:', e.error)
  if e.status_code == 404:
    ollama.pull(model)
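The pattern above can be extended to retry the request once the pull completes. The following is a sketch under stated assumptions: `chat_with_pull` is a hypothetical helper, and it checks for a `status_code` attribute on the exception rather than importing `ollama.ResponseError`, so that it can be run here against a fake client. `FakeError` and `FakeClient` are stand-ins, not library classes.

```python
def chat_with_pull(client, model, messages):
    """Call client.chat; on a 404, pull the model and try once more."""
    try:
        return client.chat(model=model, messages=messages)
    except Exception as e:
        # ollama.ResponseError carries status_code; anything that is not
        # a 404 is re-raised unchanged.
        if getattr(e, 'status_code', None) != 404:
            raise
    client.pull(model)
    return client.chat(model=model, messages=messages)


class FakeError(Exception):
    """Stand-in for ollama.ResponseError."""

    def __init__(self, status_code):
        super().__init__('model not found')
        self.status_code = status_code


class FakeClient:
    """Stand-in client that fails with 404 until the model is pulled."""

    def __init__(self):
        self.pulled = set()

    def chat(self, model, messages):
        if model not in self.pulled:
            raise FakeError(404)
        return {'message': {'role': 'assistant', 'content': 'ok'}}

    def pull(self, model):
        self.pulled.add(model)


client = FakeClient()
reply = chat_with_pull(client, 'does-not-yet-exist', [])
```

In real code, catching `ollama.ResponseError` directly, as in the example above, is the narrower and safer choice. The broad `except` is used here only to keep the sketch free of a server dependency.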
