
Self-Hosting Ollama: Run AI Models Locally

Ollama lets you run large language models such as Llama, Mistral, and Gemma locally, so you can use AI without sending data to cloud APIs.

Tags: ollama, ai, llm, privacy

What Is Ollama?


Ollama makes it easy to run large language models locally. Download a model, run it, and interact via API or CLI.


Available Models


  • Llama 3 (8B, 70B) — Meta's open model
  • Mistral (7B) and Mixtral (8x7B)
  • Gemma (2B, 7B) — Google's open model
  • Phi-3 — Microsoft's small model
  • CodeLlama — code-focused
  • Qwen — Alibaba's model
  • And many more community models
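
Each of these can be pulled by name from the Ollama model library. A quick sketch (tags such as gemma:2b pick a specific size; check the library for current names):

bash

# pull a few models by name; a tag after ":" selects a particular size or variant
ollama pull llama3
ollama pull mistral
ollama pull gemma:2b
ollama pull phi3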

Getting Started


bash

ollama run llama3


That's it. Ollama downloads the model and starts a chat session.
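
Beyond a one-off chat, a few CLI commands cover day-to-day model management; a minimal sketch (model names are just examples):

bash

# download a model without starting a chat session
ollama pull mistral

# list models installed locally
ollama list

# remove a model you no longer need
ollama rm mistral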


API


Ollama provides an OpenAI-compatible API:

bash

curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3", "messages": [{"role": "user", "content": "Hello"}]}'


Use Cases


  • Private AI assistant (no data leaves your server)
  • Code completion and review
  • Document summarization
  • Data extraction
  • Chatbots for internal tools
  • RAG (Retrieval-Augmented Generation)
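
For RAG specifically, Ollama can also generate embeddings locally. A minimal sketch using the /api/embeddings endpoint, assuming a dedicated embedding model such as nomic-embed-text from the Ollama library:

bash

# pull an embedding model (name is an example from the Ollama library)
ollama pull nomic-embed-text

# request an embedding vector for a piece of text
curl http://localhost:11434/api/embeddings \
  -d '{"model": "nomic-embed-text", "prompt": "TinyPod runs Ollama locally"}'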

Hardware Requirements


  • 7B models: 8 GB RAM, runs on CPU
  • 13B models: 16 GB RAM
  • 70B models: 64 GB RAM or GPU
  • GPU recommended for speed (NVIDIA CUDA, Apple Metal)
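
If you are unsure whether a model fits your hardware, Ollama can report what is loaded and what a model contains; a small sketch:

bash

# show currently loaded models and how much memory each is using
ollama ps

# inspect a model's parameter count and quantization before committing to it
ollama show llama3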

Ollama + Lobe Chat / Open WebUI


Combine Ollama with a web UI:

1. Ollama runs the model

2. Lobe Chat or Open WebUI provides the ChatGPT-like interface

3. Everything runs locally (see the Docker sketch below)
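
A minimal sketch for running Open WebUI in Docker alongside a locally installed Ollama, using the image and port mapping from Open WebUI's own documentation (adjust ports and volume names to taste):

bash

# run Open WebUI on port 3000 and let it reach the host's Ollama at host.docker.internal:11434
docker run -d \
  -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  --name open-webui \
  ghcr.io/open-webui/open-webui:main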


Deployment


Deploy on TinyPod with sufficient RAM. CPU-only is fine for smaller models.
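
If other services (such as a web UI on another machine) need to reach Ollama, it has to listen beyond localhost. A minimal sketch using the OLLAMA_HOST environment variable, which the server reads at startup:

bash

# bind the Ollama server to all interfaces instead of 127.0.0.1 only
# keep this behind a firewall or reverse proxy; the API has no authentication of its own
OLLAMA_HOST=0.0.0.0:11434 ollama serve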


Ollama democratizes AI. Run capable models on commodity hardware with complete privacy.