Ollama setup for local summarization on macOS (Apple Silicon)
Overview
On Apple Silicon Macs, Ollama uses Metal for GPU-accelerated inference. It is a common way to run local summarization models without cloud APIs, whether you use Nucleate, another app, or the terminal directly.
This guide covers installation on macOS, pulling a first model, and verifying inference. Nucleate-specific install steps are on the installation page.
Prerequisites
- macOS 12 or later on Apple Silicon (M1, M2, M3, M4, or newer)
- 16 GB unified memory recommended; 8 GB works for smaller models but leaves little headroom when transcription and summarization overlap
- Disk space: 5–10 GB per model
Intel Macs are not covered here; Ollama supports them with CPU inference, but Nucleate targets Apple Silicon for macOS.
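You can check these prerequisites from Terminal. The sketch below is a minimal example and is not part of Ollama or Nucleate. The function name `check_prereqs` is hypothetical, and its thresholds simply mirror the list above. If the terminal itself runs under Rosetta, `uname -m` reports `x86_64`, so run the check from a native terminal.

```shell
#!/usr/bin/env bash
# Sketch of a prerequisite check (hypothetical helper, not an Ollama tool).
# Thresholds mirror this guide: Apple Silicon (arm64), macOS 12+, 8 GB minimum, 16 GB recommended.
check_prereqs() {
  local arch="$1" macos_major="$2" mem_gb="$3"
  if [[ "$arch" != "arm64" ]]; then
    echo "FAIL: not Apple Silicon (arch=$arch)"
    return 1
  fi
  if (( macos_major < 12 )); then
    echo "FAIL: macOS $macos_major is older than 12"
    return 1
  fi
  if (( mem_gb < 8 )); then
    echo "FAIL: ${mem_gb} GB memory is below 8 GB"
    return 1
  elif (( mem_gb < 16 )); then
    echo "OK (tight): ${mem_gb} GB, prefer smaller models"
  else
    echo "OK: ${mem_gb} GB unified memory"
  fi
}

# Query the live system only on macOS, so the function stays testable elsewhere.
if [[ "$(uname -s)" == "Darwin" ]]; then
  arch=$(uname -m)                                      # "arm64" on Apple Silicon
  macos_major=$(sw_vers -productVersion | cut -d. -f1)  # e.g. "14" from "14.5"
  mem_gb=$(( $(sysctl -n hw.memsize) / 1024 / 1024 / 1024 ))  # bytes -> GB
  check_prereqs "$arch" "$macos_major" "$mem_gb"
  df -h / | tail -n 1                                   # free space on the startup volume
fi
```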
Install Ollama
- Download the macOS app from ollama.com/download.
- Open the .dmg, drag Ollama to Applications, and launch it.
- Approve the security prompt if macOS asks whether to open an app downloaded from the internet.
- Ollama runs from the menu bar. Leave it open while you pull models and run tests.
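Alternatively, install the command-line tool via Homebrew:
```shell
brew install ollama
```
The Homebrew formula installs the CLI rather than the menu bar app, so start the server by running ollama serve in a terminal (or brew services start ollama to keep it running in the background).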
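With either install route, the CLI talks to a local server that listens on http://127.0.0.1:11434 by default. Below is a minimal sketch for the terminal-only route. It assumes that default address and the `curl` that ships with macOS. The script prints the CLI version, starts `ollama serve` in the background if nothing answers, and then polls the `/api/version` endpoint until the server responds. The helper name `wait_for_ollama`, the log path, and the 30-second timeout are illustrative choices, not Ollama defaults.

```shell
#!/usr/bin/env bash
# Sketch: make sure a local Ollama server answers before pulling models.
# Assumes the default address; adjust base_url if you changed OLLAMA_HOST.

# wait_for_ollama BASE_URL [TIMEOUT_SECONDS] (hypothetical helper)
# Polls GET BASE_URL/api/version once per second until it succeeds or the timeout expires.
wait_for_ollama() {
  local base_url="$1" timeout="${2:-30}" waited=0
  until curl -sf --max-time 2 "$base_url/api/version" >/dev/null 2>&1; do
    if (( waited >= timeout )); then
      echo "Ollama did not respond at $base_url within ${timeout}s" >&2
      return 1
    fi
    sleep 1
    waited=$(( waited + 1 ))
  done
  echo "Ollama is up at $base_url"
}

base_url="http://127.0.0.1:11434"
if command -v ollama >/dev/null 2>&1; then
  ollama -v   # print the installed version
  # Start a background server only if nothing is answering yet (e.g. the menu bar app is closed).
  if ! curl -sf --max-time 2 "$base_url/api/version" >/dev/null 2>&1; then
    nohup ollama serve >/tmp/ollama-serve.log 2>&1 &
  fi
  wait_for_ollama "$base_url" 30
else
  echo "ollama not found on PATH; install it first" >&2
fi
```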
Pull a summarization model
For most Apple Silicon Macs, Mistral 7B is a balanced default:
```shell
ollama pull mistral
```
On M-series chips with 16 GB or more unified memory, Qwen3 8B can produce higher-quality summaries at the cost of speed:
```shell
ollama pull qwen3:8b
```
First-time downloads are large; use Wi‑Fi or wired Ethernet.
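The memory guidance above can be turned into a small helper that picks a model before pulling it. This sketch encodes only this guide's rule of thumb (Qwen3 8B at 16 GB or more, Mistral otherwise). It is not Nucleate's own tier selection. The function name `pick_model` is hypothetical. If speed matters more to you than summary quality, you may prefer Mistral even on larger machines.

```shell
#!/usr/bin/env bash
# Sketch: choose a model from unified memory (hypothetical helper).
# Rule of thumb from this guide: qwen3:8b at 16 GB or more, mistral below that.
pick_model() {
  local mem_gb="$1"
  if (( mem_gb >= 16 )); then
    echo "qwen3:8b"
  else
    echo "mistral"
  fi
}

# Only pull on a Mac that already has the ollama CLI installed.
if [[ "$(uname -s)" == "Darwin" ]] && command -v ollama >/dev/null 2>&1; then
  mem_gb=$(( $(sysctl -n hw.memsize) / 1024 / 1024 / 1024 ))
  model=$(pick_model "$mem_gb")
  echo "Pulling $model for ${mem_gb} GB of unified memory"
  ollama pull "$model"
fi
```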
Verify local inference
```shell
ollama run mistral "Summarize in three bullets: local LLMs keep data on your Mac."
```
You should see a streamed response. List installed models with:
```shell
ollama list
```
Performance expectations
Apple Silicon is capable but generally slower per token than a mid-range NVIDIA desktop GPU for the same model size. That is a tradeoff many accept for silence, power efficiency, and an all-local workflow.
Nucleate includes an in-app Overdrive toggle and performance tooling; see Performance benchmarks for example numbers from a Mac mini M4.
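To compare your own Mac with published numbers, measure generation speed directly. `ollama run --verbose` prints an eval rate after each response. You can also derive the same figure from the REST API: a non-streaming `/api/generate` response includes `eval_count` (generated tokens) and `eval_duration` (in nanoseconds). The sketch below assumes the default server address, `mistral` already pulled, and `jq` installed for JSON handling. `tokens_per_sec` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: measure generation speed (tokens per second) via the Ollama REST API.

# tokens_per_sec COUNT DURATION_NS -> prints tokens/s with two decimals (hypothetical helper).
tokens_per_sec() {
  awk -v n="$1" -v ns="$2" 'BEGIN {
    if (ns <= 0) exit 1              # avoid dividing by zero on a bad response
    printf "%.2f\n", n / (ns / 1e9)  # eval_duration is in nanoseconds
  }'
}

# Live measurement; skipped if jq is missing or the server or model is unavailable.
prompt='Summarize in three bullets: local LLMs keep data on your Mac.'
if command -v jq >/dev/null 2>&1 &&
  resp=$(curl -sf --max-time 300 http://127.0.0.1:11434/api/generate \
    -d "$(jq -n --arg p "$prompt" '{model: "mistral", prompt: $p, stream: false}')"); then
  read -r count dur <<<"$(jq -r '"\(.eval_count) \(.eval_duration)"' <<<"$resp")"
  echo "mistral: $(tokens_per_sec "$count" "$dur") tokens/s"
fi
```

Run it a few times and ignore the first result, which can include model load time.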
Troubleshooting
Ollama not running / connection refused
Launch Ollama from Applications and confirm its icon appears in the menu bar. If requests hang after the Mac wakes from sleep, quit and restart the app.
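If you cannot tell whether the server is down or merely stuck, `curl`'s exit status distinguishes the two symptoms. Exit code 7 means the connection failed: nothing is listening, which is the connection-refused case. Exit code 28 means the request timed out, which is the hang case. The sketch below assumes the default address. `diagnose_curl_exit` and its messages are illustrative.

```shell
#!/usr/bin/env bash
# Sketch: tell "not running" apart from "hanging" using curl's exit status.

# diagnose_curl_exit CODE -> prints a suggested next step (hypothetical helper).
diagnose_curl_exit() {
  case "$1" in
    0)  echo "Ollama is responding." ;;
    7)  echo "Connection refused: launch Ollama from Applications (or run 'ollama serve')." ;;
    28) echo "Request timed out: quit and relaunch Ollama, especially after sleep." ;;
    *)  echo "curl failed with exit code $1; check the URL and network settings." ;;
  esac
}

# Probe the default address with a short timeout, then explain the result.
rc=0
curl -s --max-time 5 http://127.0.0.1:11434/api/version >/dev/null 2>&1 || rc=$?
diagnose_curl_exit "$rc"
```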
Model download fails or stalls
Check free disk space and network stability. Retry ollama pull with a stable connection.
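The sketch below turns that advice into a script. It assumes models live in the default location under `~/.ollama`; if you relocated them with `OLLAMA_MODELS`, check that volume instead. The script shows free space on the home volume, then retries the pull a few times with a pause between attempts. The `retry` helper, attempt count, and delay are illustrative.

```shell
#!/usr/bin/env bash
# Sketch: check free space, then retry a pull a few times with a pause between tries.

# retry ATTEMPTS DELAY_SECONDS CMD [ARGS...] -> runs CMD until it succeeds (hypothetical helper).
retry() {
  local attempts="$1" delay="$2" i
  shift 2
  for (( i = 1; i <= attempts; i++ )); do
    "$@" && return 0
    if (( i < attempts )); then
      echo "Attempt $i of $attempts failed; retrying in ${delay}s" >&2
      sleep "$delay"
    fi
  done
  return 1
}

if command -v ollama >/dev/null 2>&1; then
  df -h ~ | tail -n 1              # models are stored under ~/.ollama by default
  retry 3 10 ollama pull mistral
fi
```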
Summaries extremely slow
Use Mistral instead of larger Qwen variants, close memory-heavy apps, and make sure you are running native Apple Silicon builds, not x86 tools translated by Rosetta.
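To confirm that nothing in the chain is translated, check both the shell and the `ollama` binary. On Apple Silicon, the sysctl key `sysctl.proc_translated` reads 1 for a process running under Rosetta, and `file` reports which architectures a binary contains. In the sketch below, `binary_arch` is a hypothetical helper that classifies `file` output.

```shell
#!/usr/bin/env bash
# Sketch: confirm the shell and the ollama binary run natively on Apple Silicon.

# binary_arch "<output of file>" -> native | x86-only | unknown (hypothetical helper)
binary_arch() {
  case "$1" in
    *arm64*)  echo "native" ;;    # arm64 slice present (arm64-only or universal binary)
    *x86_64*) echo "x86-only" ;;  # would run under Rosetta translation
    *)        echo "unknown" ;;
  esac
}

if [[ "$(uname -s)" == "Darwin" ]]; then
  # sysctl.proc_translated is 1 when the current process runs under Rosetta.
  if [[ "$(sysctl -n sysctl.proc_translated 2>/dev/null)" == "1" ]]; then
    echo "This shell runs under Rosetta; open a native (arm64) terminal."
  fi
  if bin=$(command -v ollama); then
    # -L follows symlinks (e.g. a Homebrew or /usr/local/bin link), -b omits the file name.
    echo "ollama: $(binary_arch "$(file -bL "$bin")")"
  fi
fi
```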
Prefer not to use Nucleate’s auto-installer
On first launch, Nucleate on macOS can run Homebrew and sudo steps to install FFmpeg and other dependencies. If you want to avoid that path, install Ollama manually first; see the warning on the installation page.
Using this with Nucleate
Nucleate detects Ollama at launch and starts it if needed. The Hub indicator reflects Ollama status.
Recommended path:
- Install and verify Ollama using this guide.
- Install Nucleate.
- Review Software prerequisites for default models (Qwen3 vs Mistral) and custom model names.
On startup, Nucleate selects a model tier based on your hardware. You can override it anytime in AI Engine / AI Model Settings.