Ollama on Windows

Ollama setup for local summarization on Windows

Overview

Ollama runs open-weight language models on your PC without sending prompts to a cloud API. If you want local summarization (turning transcripts into structured notes on your machine), Ollama is a straightforward starting point on Windows.

This guide covers installation, pulling a first model, and verifying that inference works. It does not cover Nucleate installation; see Nucleate installation when you are ready to connect the two.

Prerequisites

Windows 10 or 11 (64-bit)
16 GB RAM recommended for 7–8B parameter models; more helps if you run transcription and summarization at the same time
NVIDIA GPU with 8 GB+ VRAM: optional but strongly recommended for reasonable speed; CPU-only inference works but is slow for long summaries
Disk space: plan for 5–10 GB per model you keep installed
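
The disk-space guideline above can be checked before pulling anything. This is a minimal sketch, assuming models are stored under your home directory (Ollama's default on Windows unless the location has been changed); the function names and the 10 GB per-model figure are illustrative, taken from the upper end of the estimate above.

```python
import shutil
from pathlib import Path

GIB = 1024 ** 3  # bytes per GiB


def free_gib(path: str) -> float:
    """Return the free disk space on the drive holding `path`, in GiB."""
    return shutil.disk_usage(path).free / GIB


def has_room_for_models(path: str, models: int = 1, gib_per_model: float = 10.0) -> bool:
    """True if the drive has at least `models * gib_per_model` GiB free.

    The 10 GiB default is an assumption based on the 5-10 GB estimate above.
    """
    return free_gib(path) >= models * gib_per_model


if __name__ == "__main__":
    home = str(Path.home())  # assumed model location; adjust if relocated
    print(f"{free_gib(home):.1f} GiB free on the drive holding {home}")
    print("Room for one model:", has_room_for_models(home))
```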

For a full hardware picture when using Nucleate, see the compatibility cheat sheet.

Install Ollama

Download the Windows installer from ollama.com/download.
Run the installer and accept the defaults. Ollama installs a background app and adds the ollama CLI to your PATH.
After install, Ollama usually starts automatically. Confirm the tray icon is present, or open Ollama from the Start menu.

Pull a summarization model

Nucleate defaults to Mistral 7B on mid-range hardware and Qwen3 8B when VRAM allows. Either is a sensible first pull for general summarization:

ollama pull mistral

For higher-end GPUs:

ollama pull qwen3:8b

The first pull downloads several gigabytes. Subsequent models reuse shared layers when possible.

Verify local inference

Run a short prompt to confirm the model loads and responds:

ollama run mistral "Summarize in three bullets: local LLMs keep data on your machine."

You should see a streamed reply in the terminal. If the model loads and completes, Ollama is working.
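
The same check can be scripted against Ollama's local HTTP API, which is how other tools talk to it. The sketch below is a minimal example, assuming Ollama is listening on its default address (http://localhost:11434) and that mistral has already been pulled. The helper names build_generate_payload and summarize are illustrative and are not part of Ollama or Nucleate.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default local address


def build_generate_payload(model: str, text: str) -> dict:
    """Build the JSON body for a non-streaming /api/generate request."""
    prompt = f"Summarize in three bullets: {text}"
    return {"model": model, "prompt": prompt, "stream": False}


def summarize(text: str, model: str = "mistral",
              base_url: str = OLLAMA_URL, timeout: float = 120.0) -> str:
    """Send the prompt to a running Ollama instance and return its reply.

    Raises urllib.error.URLError if Ollama is not reachable.
    """
    body = json.dumps(build_generate_payload(model, text)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # With "stream": False the server answers with one JSON object
        # whose "response" field holds the generated text.
        return json.loads(response.read())["response"]
```

With Ollama running, calling summarize("local LLMs keep data on your machine.") should return the model's reply as a single string. Setting stream to False gives up the streamed output in exchange for one JSON response, which is simpler to parse in a script.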

To list installed models:

ollama list
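
A script can make the same check through the local API's /api/tags endpoint, which returns the installed models as JSON. This is a rough sketch, assuming the default address http://localhost:11434; the function names are hypothetical, and has_model deliberately accepts a bare name such as mistral as a match for any installed tag of that model.

```python
import json
import urllib.request


def fetch_tags(base_url: str = "http://localhost:11434", timeout: float = 5.0) -> dict:
    """Fetch the installed-model listing from a running Ollama instance."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as response:
        return json.load(response)


def installed_model_names(tags: dict) -> list[str]:
    """Extract sorted model names (for example 'mistral:latest') from the listing."""
    return sorted(model["name"] for model in tags.get("models", []))


def has_model(tags: dict, wanted: str) -> bool:
    """True if `wanted` is installed, matching either 'name:tag' or the bare name."""
    return any(
        name == wanted or name.split(":", 1)[0] == wanted
        for name in installed_model_names(tags)
    )
```

For example, has_model(fetch_tags(), "mistral") tells a setup script whether it still needs to run the pull step.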

Keep Ollama running

Many tools, including Nucleate, expect Ollama to be available in the background. The Windows installer configures Ollama to start automatically when you sign in. If summarization fails with a connection error, check that the Ollama tray app is running and restart it from the Start menu.

Note: Ollama stays running after you close other apps. That is normal; it keeps model load times down for the next request.
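
To tell a connection error apart from other failures, you can probe the local server directly. The sketch below assumes Ollama's default address (http://localhost:11434), where the root URL answers with a plain status message when the server is up. The function names and retry values are illustrative, not something Nucleate or Ollama provides.

```python
import time
import urllib.request


def is_ollama_up(base_url: str = "http://localhost:11434", timeout: float = 2.0) -> bool:
    """Return True if something answers HTTP 200 at the Ollama address."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError, refused connections, and timeouts are all OSError subclasses.
        return False


def wait_for_ollama(attempts: int = 5, delay: float = 1.0,
                    base_url: str = "http://localhost:11434") -> bool:
    """Poll a few times, which helps right after starting the tray app."""
    for _ in range(attempts):
        if is_ollama_up(base_url):
            return True
        time.sleep(delay)
    return False
```

If wait_for_ollama() returns False, the tray app is most likely not running.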

Troubleshooting

ollama is not recognized
Close and reopen your terminal after install, or log out and back in so PATH updates apply.
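
A quick way to confirm whether the CLI is visible on PATH is to look it up from a script. This is a small sketch using Python's standard library; find_ollama_cli is a hypothetical helper name. A Python process inherits PATH from the terminal that launched it, so run it from a freshly opened terminal.

```python
import shutil
from typing import Optional


def find_ollama_cli(name: str = "ollama") -> Optional[str]:
    """Return the full path to the CLI if it is on PATH, otherwise None."""
    return shutil.which(name)


if __name__ == "__main__":
    location = find_ollama_cli()
    if location:
        print(f"ollama found at {location}")
    else:
        print("ollama is not on PATH in this session")
```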

Very slow responses on CPU only
You are likely on integrated graphics or an unsupported GPU. Use a smaller model (mistral rather than larger Qwen variants) or expect long runtimes on long transcripts. See hardware prerequisites for Nucleate-specific GPU notes.

Out of memory / model fails to load
Try a smaller quantization of the same model, or use Mistral 7B instead of larger models. Close other GPU-heavy apps before loading.

Firewall or corporate policy blocks download
Pull models on an unrestricted network first, or install manually following Ollama’s documentation.

Using this with Nucleate

Nucleate uses Ollama for local daily, weekly, and monthly summarization. On first launch it can auto-install Ollama if missing, but installing ahead of time avoids surprises during setup.

After Ollama is running:

Install Nucleate and complete first-time setup.
In AI Engine / AI Model Settings, confirm the summarizer points at your preferred Ollama model.
See Software prerequisites for in-app model customization and status indicators.

Nucleate will start Ollama if it is not already running and will rebuild model bindings when you change user modes.