
Prometheus and Grafana: The Self-Hosted Monitoring Stack

The industry-standard monitoring duo. Prometheus collects metrics, Grafana visualizes them. Here's how to set it up for your infrastructure.

Tags: monitoring, prometheus, grafana

The Monitoring Stack


Prometheus and Grafana together form the most popular open-source monitoring solution. Prometheus handles data collection and storage. Grafana handles visualization and alerting.


How Prometheus Works


Prometheus uses a pull model: it scrapes metrics from your services at regular intervals.


Metrics Endpoints

Applications expose metrics over HTTP at a /metrics endpoint. Prometheus scrapes this endpoint at a configurable interval, typically every 15 to 60 seconds, and stores each sample it reads.
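To make the scrape target concrete, here is a minimal sketch of a service that answers /metrics in the Prometheus text format, using only the Python standard library. The metric name app_requests_total, the port 8000, and the helpers render_metrics and serve are assumptions made for this example. A real service would more likely use an official client library.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-process counter; a real app would increment it per request.
REQUEST_COUNT = {"value": 0}


def render_metrics(count: int) -> str:
    """Render a single counter in the Prometheus text exposition format."""
    return (
        "# HELP app_requests_total Total requests handled.\n"
        "# TYPE app_requests_total counter\n"
        f"app_requests_total {count}\n"
    )


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Only the scrape path is served; everything else is a 404.
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(REQUEST_COUNT["value"]).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Block forever, answering scrapes on the given (assumed) port."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling serve() starts the endpoint. Prometheus would then be pointed at that host and port as a scrape target.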


Time Series Data

Every metric is a time series: a sequence of timestamped values. "CPU usage was 45% at 10:00, 52% at 10:15, 38% at 10:30."
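One way to picture this is a metric name, a set of labels, and a list of (timestamp, value) samples. The sketch below models the CPU example above. The class TimeSeries, the metric name cpu_usage_percent, and the label instance="web-1" are illustrative assumptions, not Prometheus internals.

```python
from dataclasses import dataclass, field


@dataclass
class TimeSeries:
    """A named, labelled sequence of (seconds, value) samples."""
    name: str
    labels: dict
    samples: list = field(default_factory=list)

    def append(self, timestamp: float, value: float) -> None:
        self.samples.append((timestamp, value))

    def latest(self):
        """Most recent sample, or None if nothing has been recorded."""
        return self.samples[-1] if self.samples else None

    def average(self) -> float:
        """Mean of all recorded values (0.0 for an empty series)."""
        if not self.samples:
            return 0.0
        return sum(v for _, v in self.samples) / len(self.samples)


# Timestamps are seconds after 10:00, so 900 is 10:15 and 1800 is 10:30.
cpu = TimeSeries("cpu_usage_percent", {"instance": "web-1"})
cpu.append(0, 45.0)
cpu.append(900, 52.0)
cpu.append(1800, 38.0)
```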


PromQL

Prometheus has its own query language, PromQL, for analyzing metrics. Use it to calculate averages, rates, and percentiles, and to compare related series against each other.
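A common PromQL query is a rate over a counter, for example rate(http_requests_total[5m]), where http_requests_total is just a conventional example metric name. The sketch below shows roughly what such a rate means: the per-second increase of a counter across a window, treating any drop as a counter reset. It is a simplification under those assumptions. The real function also extrapolates to the window boundaries, which this version does not attempt.

```python
def simple_rate(samples):
    """Per-second increase of a counter over (timestamp, value) samples.

    A value lower than its predecessor is treated as a counter reset,
    so the new value is counted as an increase from zero.
    Returns None when there are too few samples to compute a rate.
    """
    if len(samples) < 2:
        return None
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        if cur >= prev:
            increase += cur - prev
        else:
            increase += cur  # counter restarted from zero
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else None


# Scrapes every 15 seconds, with a process restart before the fourth one.
scrapes = [(0, 100), (15, 130), (30, 160), (45, 10), (60, 40)]
requests_per_second = simple_rate(scrapes)
```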


How Grafana Works


Dashboards

Dashboards are visual representations of your metrics, built from panels such as line charts, gauges, tables, and heatmaps.


Data Sources

Grafana connects to Prometheus (and 40+ other data sources). One dashboard can combine data from multiple sources.


Alerting

Set conditions on any metric. When CPU exceeds 80% for 5 minutes, send an alert via Slack, email, PagerDuty, etc.
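The "for 5 minutes" part matters: the condition has to hold continuously before anything is sent, which filters out short spikes. Here is a minimal sketch of that logic, assuming samples arrive as (timestamp in seconds, value) pairs. The function name alert_firing and its defaults are made up for this example and are not Grafana's API.

```python
def alert_firing(samples, threshold=80.0, hold_seconds=300):
    """True if every sample since some point has exceeded `threshold`
    and that unbroken breach has lasted at least `hold_seconds`."""
    breach_start = None
    for timestamp, value in samples:
        if value > threshold:
            if breach_start is None:
                breach_start = timestamp  # breach begins here
        else:
            breach_start = None  # any dip resets the timer
    if breach_start is None:
        return False
    return samples[-1][0] - breach_start >= hold_seconds


# One sample per minute for five minutes.
sustained = [(0, 85), (60, 90), (120, 88), (180, 91), (240, 86), (300, 93)]
with_dip = [(0, 85), (60, 90), (120, 70), (180, 91), (240, 86), (300, 93)]
```

With these inputs, the sustained series would fire and the one with a dip at the two-minute mark would not, because its timer restarts at 180 seconds.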


Essential Dashboards


Node Exporter Dashboard

System-level metrics: CPU, RAM, disk, network for each server.


Container Dashboard

Per-container metrics: CPU usage, memory, network I/O, restart count.


Application Dashboard

Request rate, error rate, latency (the RED method). Per-endpoint breakdown.
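As a rough illustration of what an application dashboard aggregates, the sketch below computes rate, errors, and duration per endpoint from a list of request records. The Request type, the red_by_endpoint helper, and the choice to count only 5xx responses as errors are assumptions for this example. In practice these numbers would come from PromQL queries over your application's metrics.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Request:
    endpoint: str
    status: int
    duration_ms: float


def red_by_endpoint(requests, window_seconds):
    """Return {endpoint: {"rate", "error_rate", "mean_duration_ms"}}.

    Rates are per second over the window; errors are 5xx responses.
    """
    grouped = defaultdict(list)
    for req in requests:
        grouped[req.endpoint].append(req)

    summary = {}
    for endpoint, reqs in grouped.items():
        errors = sum(1 for r in reqs if r.status >= 500)
        summary[endpoint] = {
            "rate": len(reqs) / window_seconds,
            "error_rate": errors / window_seconds,
            "mean_duration_ms": sum(r.duration_ms for r in reqs) / len(reqs),
        }
    return summary


# Four requests observed in a 10-second window.
observed = [
    Request("/api", 200, 100.0),
    Request("/api", 200, 200.0),
    Request("/api", 500, 300.0),
    Request("/login", 200, 50.0),
]
red = red_by_endpoint(observed, window_seconds=10)
```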


Deploying on TinyPod


The Grafana + Prometheus template deploys both services pre-configured:

1. Find the template in the TinyPod directory

2. Deploy with one click

3. Access Grafana at your subdomain

4. Import community dashboards from grafana.com


Key Metrics to Monitor


For Every Server

  • CPU usage (percent)
  • Memory usage (used vs available)
  • Disk usage (used vs total)
  • Network throughput (in/out)
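To show what "used vs total" means in practice, here is a small sketch that reads disk usage with the Python standard library and turns it into a percentage. The helper names percent_used and disk_usage_percent are assumptions for this example. On a real server an exporter such as Node Exporter collects these numbers for you.

```python
import shutil


def percent_used(used: float, total: float) -> float:
    """Return used/total as a percentage, or 0.0 when total is zero."""
    return 100.0 * used / total if total else 0.0


def disk_usage_percent(path: str = "/") -> float:
    """Percentage of the filesystem containing `path` that is in use."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free
    return percent_used(usage.used, usage.total)
```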

For Every Application

  • Request rate (requests/second)
  • Error rate (errors/second)
  • Response time (p50, p95, p99)
  • Active connections
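The p50, p95, and p99 figures are percentiles: the response time that 50%, 95%, or 99% of requests stayed at or below. Here is a minimal sketch using the nearest-rank method over raw latencies. The function name percentile is an assumption for this example. Prometheus itself usually estimates percentiles from histogram buckets rather than raw samples, so its results are approximations.

```python
import math


def percentile(values, pct: int):
    """Nearest-rank percentile: the smallest value such that at least
    `pct` percent of the data is less than or equal to it."""
    if not values:
        raise ValueError("percentile of empty data")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical latencies of 1 ms through 100 ms, one request each.
latencies_ms = list(range(1, 101))
p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
p99 = percentile(latencies_ms, 99)
```

Tracking p95 and p99 alongside p50 matters because a healthy median can hide a slow tail that a small share of users still hits.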

For Databases

  • Query latency
  • Connection count
  • Cache hit rate
  • Replication lag (if applicable)
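Cache hit rate is usually derived from two counters, hits and misses, compared across a time window. Here is a short sketch, assuming you have two snapshots of those counters. The function name and parameters are hypothetical, and the actual counter names depend on your database.

```python
def cache_hit_ratio(hits_before, misses_before, hits_after, misses_after):
    """Fraction of lookups served from cache between two counter snapshots.

    Returns None when no lookups happened in the window.
    """
    hits = hits_after - hits_before
    misses = misses_after - misses_before
    lookups = hits + misses
    return hits / lookups if lookups > 0 else None


# 900 hits and 100 misses occurred between the two snapshots.
ratio = cache_hit_ratio(1000, 100, 1900, 200)
```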

Start with the basics and add more metrics as you identify what matters for your specific workload.