team@tinypod.app

Running AI Models Locally: Self-Hosting Ollama and Open WebUI

Run ChatGPT-like AI locally without sending data to OpenAI. Self-host Ollama with Open WebUI for private, uncensored AI.

Tags: ai, ollama, open-webui, llm

Why Self-Host AI?


Using ChatGPT or Claude means sending your data to external servers. For many use cases — analyzing confidential documents, coding with proprietary code, healthcare data — that's not acceptable.


Self-hosting AI models with Ollama gives you:

  • **Privacy**: Data never leaves your server
  • **No API costs**: Run unlimited queries
  • **No censorship**: Use uncensored model variants
  • **Speed**: Low latency for local inference
  • **Offline**: Works without internet access

What Is Ollama?


Ollama is a tool for running large language models locally. It handles model downloading, optimization, and serving with a simple API. Think of it as Docker for AI models.
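That API is plain HTTP, served on port 11434 by default. As a minimal sketch (assuming a local Ollama server with a model such as llama3.1 already pulled), a one-shot completion looks like this; the payload builder is split out so the request shape can be inspected without a running server:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint


def build_request(model: str, prompt: str) -> dict:
    """Build the JSON payload Ollama's /api/generate endpoint expects."""
    return {
        "model": model,    # e.g. "llama3.1" -- must already be pulled
        "prompt": prompt,
        "stream": False,   # return one complete response, not a token stream
    }


def ask(model: str, prompt: str) -> str:
    """Send a prompt to a local Ollama server and return the reply text."""
    payload = json.dumps(build_request(model, prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]


# Example (requires a running Ollama server):
# print(ask("llama3.1", "Why is the sky blue?"))
```

Setting `"stream": False` trades responsiveness for simplicity; the default streaming mode returns one JSON object per token, which is what chat UIs like Open WebUI consume.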


What Is Open WebUI?


Open WebUI (formerly Ollama WebUI) provides a ChatGPT-like interface for interacting with Ollama models. It supports chat history, multiple conversations, document upload, and more.


Available Models


Ollama supports dozens of models:

  • **Llama 3.1**: Meta's latest, great general-purpose model
  • **Mistral**: Fast and efficient, good for coding
  • **CodeLlama**: Specialized for code generation
  • **Phi-3**: Microsoft's efficient small model
  • **Gemma**: Google's open-weight model

Resource Requirements


AI models are resource-intensive:

| Model Size | RAM Required | Quality |
|------------|--------------|---------|
| 7B parameters | 8 GB | Good for simple tasks |
| 13B parameters | 16 GB | Good general quality |
| 70B parameters | 48 GB | Near GPT-4 quality |

For most self-hosting scenarios, 7B-13B models on a server with 8-16 GB RAM provide excellent results.
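The table's numbers follow from a back-of-the-envelope rule: weight memory is roughly parameters × bits-per-weight ÷ 8, plus runtime overhead. A small sketch (the 4.5 bits-per-weight default matches Q4_K_M-style quantization; the 20% overhead factor is a ballpark assumption, not an official Ollama figure):

```python
def model_ram_gb(params_billions: float, bits_per_weight: float = 4.5,
                 overhead: float = 1.2) -> float:
    """Rough RAM needed to run a model.

    Weights take parameters * bits-per-weight / 8 bytes; we then add ~20%
    for the KV cache and runtime (the overhead factor is an assumption).
    """
    weight_gb = params_billions * bits_per_weight / 8
    return round(weight_gb * overhead, 1)


# A 4-bit-quantized 7B model fits comfortably inside an 8 GB box:
print(model_ram_gb(7))
# A 70B model at the same quantization needs a far larger server:
print(model_ram_gb(70))
```

This also shows why quantization matters: the same 7B model at full 16-bit precision (`model_ram_gb(7, bits_per_weight=16)`) would no longer fit in 8 GB.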


Deploying on TinyPod


1. Find the "Ollama + Open WebUI" template in the directory
2. Deploy with at least 2 cores and 4 GB RAM (8 GB or more is more comfortable for 7B models)
3. Access Open WebUI at your subdomain
4. Pull a model: the UI lets you download models with one click
5. Start chatting!
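Under the hood, a template like this is typically a two-container Docker Compose stack: one service running Ollama, and Open WebUI pointed at it via its `OLLAMA_BASE_URL` setting. A minimal sketch (service names, volume name, and host port are illustrative, not taken from the actual TinyPod template):

```yaml
services:
  ollama:
    image: ollama/ollama
    volumes:
      - ollama-data:/root/.ollama   # persist downloaded models across restarts

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434  # reach Ollama over the Compose network
    ports:
      - "3000:8080"                 # Open WebUI listens on 8080 inside the container
    depends_on:
      - ollama

volumes:
  ollama-data:
```

Keeping the model volume persistent matters: multi-gigabyte model downloads would otherwise be lost every time the Ollama container is recreated.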


Use Cases


Private Code Assistant

Use CodeLlama or Mistral for code completion, review, and debugging without sending your code to external APIs.


Document Analysis

Upload PDFs and documents, ask questions, get summaries. Everything stays on your server.


Data Extraction

Process sensitive data through AI without compliance concerns.


Learning and Experimentation

Try different models, fine-tune for your use case, experiment without per-query costs.


Tips for Best Performance


  • Start with smaller models (7B) and upgrade if quality isn't sufficient
  • Use quantized models such as Q4_K_M: they need roughly a quarter of the memory of full 16-bit weights, with minimal quality loss
  • Set appropriate context window sizes — larger contexts use more memory
  • Consider GPU acceleration for production workloads
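The context-window tip can be quantified: the KV cache stores two hidden-state-sized vectors (keys and values) per layer for every token in the context, so its memory grows linearly with context length. A sketch with layer and width defaults in the ballpark of a 7-8B model (assumptions; it also ignores grouped-query attention, which modern models use to shrink this cache several-fold):

```python
def kv_cache_gb(context_tokens: int, layers: int = 32, hidden: int = 4096,
                bytes_per_value: int = 2) -> float:
    """Rough KV-cache size: 2 tensors (K and V) per layer, one hidden-sized
    vector per token, stored at 16-bit precision. Defaults approximate a
    7-8B model and ignore grouped-query attention (assumptions)."""
    total_bytes = 2 * layers * context_tokens * hidden * bytes_per_value
    return round(total_bytes / 1024**3, 2)


# Cache memory scales linearly with context length, on top of model weights:
print(kv_cache_gb(2048))   # a modest 2k-token context
print(kv_cache_gb(32768))  # a 32k context costs 16x as much
```

This is why a model that fits at a 2k context can run out of memory at 32k: raising the context window is a memory decision, not just a quality one.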