Ollama integration
Ollama is a powerful, open source tool for running language models locally and generating text from a given prompt. The Aspire Ollama integration provides a way to host Ollama models using the docker.io/ollama/ollama container image and access them via the OllamaSharp client.
Hosting integration
The Ollama hosting integration models an Ollama server as the OllamaResource type, and provides the ability to add models to the server using the AddModel extension method, which represents the model as an OllamaModelResource type. To access these types and APIs, install the 📦 CommunityToolkit.Aspire.Hosting.Ollama NuGet package in the AppHost project:
aspire add communitytoolkit-ollama

The Aspire CLI is interactive; be sure to select the appropriate search result when prompted:
Select an integration to add:
> communitytoolkit-ollama (CommunityToolkit.Aspire.Hosting.Ollama)
> Other results listed as selectable options...

Alternatively, reference the package directly, either with a file-based app directive or a PackageReference in the project file:

#:package CommunityToolkit.Aspire.Hosting.Ollama@*

<PackageReference Include="CommunityToolkit.Aspire.Hosting.Ollama" Version="*" />

Add Ollama resource
In the AppHost project, register and consume the Ollama integration using the AddOllama extension method to add the Ollama container to the application builder. You can then add models to the container, which are downloaded and run when the container starts, using the AddModel extension method:
var builder = DistributedApplication.CreateBuilder(args);
var ollama = builder.AddOllama("ollama");
var phi35 = ollama.AddModel("phi3.5");
var exampleProject = builder.AddProject<Projects.ExampleProject>()
                            .WithReference(phi35);

builder.Build().Run();

Alternatively, if you want to use a model from the Hugging Face model hub, you can use the AddHuggingFaceModel extension method:
var llama = ollama.AddHuggingFaceModel("llama", "bartowski/Llama-3.2-1B-Instruct-GGUF:IQ4_XS");

When Aspire adds a container image to the AppHost, as shown in the preceding example with the docker.io/ollama/ollama image, it creates a new Ollama instance on your local machine.
Download the LLM
When the Ollama container for this integration first spins up, it downloads the configured LLMs. The progress of this download displays in the State column for this integration on the Aspire dashboard.
Cache the LLM
One or more LLMs are downloaded into the container that Ollama runs from, and by default this container is ephemeral. If you need to persist one or more LLMs across container restarts, mount a volume into the container using the WithDataVolume method:
var ollama = builder.AddOllama("ollama")
                    .WithDataVolume();

var llama = ollama.AddModel("llama3");

Use GPUs when available
One or more LLMs are downloaded into the container that Ollama runs from, and by default this container runs on the CPU. If you need to run the container on a GPU, pass a parameter to the container runtime arguments.
Docker:
var ollama = builder.AddOllama("ollama")
                    .AddModel("llama3")
                    .WithContainerRuntimeArgs("--gpus=all");

For more information, see GPU support in Docker Desktop.
Podman:
var ollama = builder.AddOllama("ollama")
                    .AddModel("llama3")
                    .WithContainerRuntimeArgs("--device", "nvidia.com/gpu=all");

For more information, see GPU support in Podman.
Hosting integration health checks
The Ollama hosting integration automatically adds a health check for the Ollama server and model resources. For the Ollama server, a health check is added to verify that the Ollama server is running and that a connection can be established to it. For the Ollama model resources, a health check is added to verify that the model is running and that the model is available, meaning the resource will be marked as unhealthy until the model has been downloaded.
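Because a model resource stays unhealthy until its download completes, a dependent project can be made to wait for it before starting. The following is a minimal sketch in the AppHost, assuming the phi3.5 model from the earlier example; WaitFor holds the project until the referenced resource reports healthy:

var builder = DistributedApplication.CreateBuilder(args);

var ollama = builder.AddOllama("ollama");
var phi35 = ollama.AddModel("phi3.5");

// The project starts only after the model resource reports healthy,
// that is, once the model download has finished.
builder.AddProject<Projects.ExampleProject>()
       .WithReference(phi35)
       .WaitFor(phi35);

builder.Build().Run();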
Open WebUI support
The Ollama integration also provides support for running Open WebUI and having it communicate with the Ollama container:
var ollama = builder.AddOllama("ollama")
                    .AddModel("llama3")
                    .WithOpenWebUI();

Client integration
To get started with the Aspire OllamaSharp integration, install the 📦 CommunityToolkit.Aspire.OllamaSharp NuGet package in the client-consuming project, that is, the project for the application that uses the Ollama client:
dotnet add package CommunityToolkit.Aspire.OllamaSharp

Alternatively, reference the package directly, either with a file-based app directive or a PackageReference in the project file:

#:package CommunityToolkit.Aspire.OllamaSharp@*

<PackageReference Include="CommunityToolkit.Aspire.OllamaSharp" Version="*" />

Add Ollama client API
In the Program.cs file of your client-consuming project, call the AddOllamaClientApi extension to register an IOllamaClientApi for use via the dependency injection container. If the resource provided in the AppHost, and referenced in the client-consuming project, is an OllamaModelResource, then the AddOllamaClientApi method will register the model as the default model for the IOllamaClientApi:
builder.AddOllamaClientApi("llama3");

After adding IOllamaClientApi to the builder, you can get the IOllamaClientApi instance using dependency injection. For example, to retrieve the client instance from a service:
public class ExampleService(IOllamaClientApi ollama)
{
    // Use ollama...
}
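To sketch how the injected client might actually be used, the example below streams a completion through OllamaSharp's Chat helper. This is an illustration only; the Chat type and its SendAsync streaming method are assumed from OllamaSharp, and the exact names can differ between package versions:

using System.Text;
using OllamaSharp;

public class ExampleService(IOllamaClientApi ollama)
{
    public async Task<string> AskAsync(string prompt)
    {
        // Assumed: Chat wraps the client and targets its default model.
        var chat = new Chat(ollama);

        var answer = new StringBuilder();
        await foreach (var token in chat.SendAsync(prompt))
        {
            answer.Append(token);
        }

        return answer.ToString();
    }
}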
Add keyed Ollama client API

There might be situations where you want to register multiple IOllamaClientApi instances with different connection names. To register keyed Ollama clients, call the AddKeyedOllamaClientApi method:
builder.AddKeyedOllamaClientApi(name: "chat");
builder.AddKeyedOllamaClientApi(name: "embeddings");

Then you can retrieve the IOllamaClientApi instances using dependency injection. For example, to retrieve the clients from an example service:
public class ExampleService(
    [FromKeyedServices("chat")] IOllamaClientApi chatOllama,
    [FromKeyedServices("embeddings")] IOllamaClientApi embeddingsOllama)
{
    // Use ollama...
}

Configuration
The Ollama client integration provides multiple configuration approaches and options to meet the requirements and conventions of your project.
Use a connection string
When using a connection string from the ConnectionStrings configuration section, you can provide the name of the connection string when calling the AddOllamaClientApi method:
builder.AddOllamaClientApi("llama");

Then the connection string will be retrieved from the ConnectionStrings configuration section:
{
  "ConnectionStrings": {
    "llama": "Endpoint=http://localhost:1234;Model=llama3"
  }
}

Integration with Microsoft.Extensions.AI
The 📦 Microsoft.Extensions.AI NuGet package provides an abstraction over the Ollama client API, using generic interfaces. OllamaSharp supports these interfaces, and they can be registered by chaining either the IChatClient or IEmbeddingGenerator<string, Embedding<float>> registration methods to the AddOllamaClientApi method.
To register an IChatClient, chain the AddChatClient method to the AddOllamaClientApi method:
builder.AddOllamaClientApi("llama")
       .AddChatClient();

Similarly, to register an IEmbeddingGenerator, chain the AddEmbeddingGenerator method:
builder.AddOllamaClientApi("llama")
       .AddEmbeddingGenerator();

After adding IChatClient to the builder, you can get the IChatClient instance using dependency injection. For example, to retrieve the chat client from a service:
public class ExampleService(IChatClient chatClient)
{
    // Use chat client...
}
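As a sketch of how the chat client might be consumed, the following assumes the GetResponseAsync extension method from Microsoft.Extensions.AI (method names have changed across its preview releases, so adjust to the version you reference):

using Microsoft.Extensions.AI;

public class ExampleService(IChatClient chatClient)
{
    public async Task<string> AskAsync(string question)
    {
        // Sends a single user message to the default model and returns the reply text.
        var response = await chatClient.GetResponseAsync(question);
        return response.Text;
    }
}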
Add keyed Microsoft.Extensions.AI clients

There might be situations where you want to register multiple AI client instances with different connection names. To register keyed AI clients, use the keyed versions of the registration methods:
builder.AddOllamaClientApi("chat")
       .AddKeyedChatClient("chat");
builder.AddOllamaClientApi("embeddings")
       .AddKeyedEmbeddingGenerator("embeddings");

Then you can retrieve the AI client instances using dependency injection. For example, to retrieve the clients from an example service:
public class ExampleService(
    [FromKeyedServices("chat")] IChatClient chatClient,
    [FromKeyedServices("embeddings")] IEmbeddingGenerator<string, Embedding<float>> embeddingGenerator)
{
    // Use AI clients...
}
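As a final sketch, the keyed embedding generator can be used to produce a vector for a piece of text. This assumes the GenerateAsync method defined by IEmbeddingGenerator<string, Embedding<float>> in Microsoft.Extensions.AI; the EmbeddingExampleService class and its EmbedAsync helper are illustrative names only:

using Microsoft.Extensions.AI;
using Microsoft.Extensions.DependencyInjection;

public class EmbeddingExampleService(
    [FromKeyedServices("embeddings")] IEmbeddingGenerator<string, Embedding<float>> embeddingGenerator)
{
    public async Task<ReadOnlyMemory<float>> EmbedAsync(string text)
    {
        // GenerateAsync accepts a batch of inputs; a single value is sent here.
        var embeddings = await embeddingGenerator.GenerateAsync([text]);
        return embeddings[0].Vector;
    }
}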