Prometheus + Grafana: Self-Hosted Monitoring Guide | TinyPod

The industry-standard monitoring duo. Prometheus collects metrics, Grafana visualizes them. Here's how to set it up for your infrastructure.

The Monitoring Stack

Prometheus and Grafana together form the most popular open-source monitoring solution. Prometheus handles data collection and storage. Grafana handles visualization and alerting.

How Prometheus Works

Prometheus uses a pull model: it scrapes metrics from your services at regular intervals.
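
To make the pull model concrete, here is a minimal Python sketch of what one scrape involves: fetch a target's metrics page over HTTP and parse the "name value" lines. The function names are hypothetical and the parser is deliberately simplified (it assumes no trailing timestamps on sample lines); Prometheus itself does much more than this.

```python
import urllib.request


def parse_metrics(text: str) -> dict[str, float]:
    """Parse 'name value' lines, skipping comments and blank lines.

    Simplification: labels stay part of the key, and lines are assumed
    to carry no optional timestamp after the value.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the last space so label values containing spaces survive.
        name, _, value = line.rpartition(" ")
        samples[name] = float(value)
    return samples


def scrape(url: str, timeout: float = 5.0) -> dict[str, float]:
    """Pull one snapshot of metrics from a target URL (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_metrics(response.read().decode("utf-8"))
```

Calling scrape() on a schedule against each target and storing the results with a timestamp is the essence of pull-based collection.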

Metrics Endpoints

Applications expose metrics at an HTTP endpoint, conventionally /metrics. Prometheus scrapes this endpoint at a configurable interval, typically every 15 to 60 seconds, and stores the samples.
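
As a rough illustration, the Python below serves a single counter at /metrics in the Prometheus text exposition format, using only the standard library. The metric name app_requests_total, the port 8000, and the helper names are assumptions made for this sketch; a real service would more likely use an official client library.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-process counter; a real app would increment it per request.
REQUEST_COUNT = {"value": 0}


def render_metrics(count: int) -> str:
    """Render one counter in the Prometheus text exposition format."""
    return (
        "# HELP app_requests_total Total requests handled.\n"
        "# TYPE app_requests_total counter\n"
        f"app_requests_total {count}\n"
    )


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Only /metrics is served; everything else is a 404.
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(REQUEST_COUNT["value"]).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Block forever, serving /metrics on the given port (assumed value)."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling serve() starts the endpoint; pointing a Prometheus scrape job at that host and port would then collect the counter.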

Time Series Data

Every metric is stored as a time series: a sequence of timestamped values identified by a metric name and a set of labels. For example: "CPU usage was 45% at 10:00, 52% at 10:15, 38% at 10:30."
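
One way to picture this is as a small data structure. The sketch below is not how Prometheus stores data internally; the class, the metric name cpu_usage_percent, and the host label are hypothetical and exist only to show the shape of a series.

```python
from dataclasses import dataclass, field


@dataclass
class TimeSeries:
    """A metric name, identifying labels, and time-ordered samples."""
    name: str
    labels: dict[str, str]
    samples: list[tuple[float, float]] = field(default_factory=list)

    def append(self, timestamp: float, value: float) -> None:
        # Samples must arrive in strictly increasing time order.
        if self.samples and timestamp <= self.samples[-1][0]:
            raise ValueError("samples must be appended in time order")
        self.samples.append((timestamp, value))

    def latest(self) -> tuple[float, float]:
        if not self.samples:
            raise LookupError("series has no samples yet")
        return self.samples[-1]


# The three readings from the example above, 15 minutes (900 s) apart.
cpu = TimeSeries("cpu_usage_percent", {"host": "web-1"})
for offset, value in [(0, 45.0), (900, 52.0), (1800, 38.0)]:
    cpu.append(float(offset), value)
```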

PromQL

Prometheus has its own query language, PromQL, for analyzing metrics. Use it to calculate averages, rates, and percentiles, and to aggregate or compare series across labels.
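
A typical PromQL expression is rate(http_requests_total[5m]), which gives the per-second increase of a counter over the last five minutes (the metric name here is an assumed example). The Python sketch below shows the idea behind that calculation in simplified form: it handles counter resets but, unlike the real rate() function, does not extrapolate to the edges of the window.

```python
def simple_rate(samples):
    """Approximate per-second rate of a counter.

    samples: list of (timestamp_seconds, counter_value), ordered by time.
    Returns None when there are too few samples to compute a rate.
    """
    if len(samples) < 2:
        return None
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop means the counter was reset (e.g. a restart): count from zero.
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else None
```

For a counter that climbs from 0 to 120 over 60 seconds, simple_rate returns 2.0 requests per second.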

How Grafana Works

Dashboards

Visual representations of your metrics. Line charts, gauges, tables, heatmaps, and more.

Data Sources

Grafana connects to Prometheus (and 40+ other data sources). One dashboard can combine data from multiple sources.

Alerting

Set conditions on any metric. For example, when CPU usage stays above 80% for 5 minutes, send an alert via Slack, email, PagerDuty, or another notification channel.
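
The "for 5 minutes" part matters: a single spike should not page anyone. The Python sketch below illustrates that logic under simple assumptions (ordered samples, a fixed threshold); it is not Grafana's actual evaluation engine, and the function name is hypothetical.

```python
def should_fire(samples, threshold=80.0, for_seconds=300.0):
    """Return True if the value has stayed above threshold for the full duration.

    samples: list of (timestamp_seconds, value), ordered by time.
    """
    if not samples:
        return False
    latest_ts = samples[-1][0]
    streak_start = None
    # Walk backwards to find where the current unbroken breach began.
    for ts, value in reversed(samples):
        if value <= threshold:
            break
        streak_start = ts
    if streak_start is None:
        return False  # the most recent sample is not in breach
    return latest_ts - streak_start >= for_seconds
```

With one sample per minute, six consecutive readings of 85% fire the alert, while a dip to 70% in the middle restarts the clock.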

Essential Dashboards

Node Exporter Dashboard

System-level metrics: CPU, RAM, disk, network for each server.

Container Dashboard

Per-container metrics: CPU usage, memory, network I/O, restart count.

Application Dashboard

Request rate, error rate, and latency, following the RED method (Rate, Errors, Duration), with a per-endpoint breakdown.
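
To show what a per-endpoint RED breakdown computes, here is a small Python sketch over raw request records. The record shape and the choice to treat status codes of 500 and above as errors are assumptions for illustration; in practice these numbers would come from PromQL queries over your application's metrics.

```python
from collections import defaultdict


def red_summary(requests, window_seconds):
    """Summarize Rate, Errors, and Duration per endpoint.

    requests: iterable of (endpoint, status_code, duration_seconds)
    observed during a window of window_seconds.
    """
    grouped = defaultdict(list)
    for endpoint, status, duration in requests:
        grouped[endpoint].append((status, duration))

    summary = {}
    for endpoint, rows in grouped.items():
        errors = sum(1 for status, _ in rows if status >= 500)  # assumed error rule
        durations = [duration for _, duration in rows]
        summary[endpoint] = {
            "rate": len(rows) / window_seconds,        # requests per second
            "errors": errors / window_seconds,         # errors per second
            "duration_avg": sum(durations) / len(durations),
        }
    return summary
```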

Deploying on TinyPod

The Grafana + Prometheus template deploys both services pre-configured:

1. Find the template in the TinyPod directory

2. Deploy with one click

3. Access Grafana at your subdomain

4. Import community dashboards from grafana.com

Key Metrics to Monitor

For Every Server

CPU usage (percent)

Memory usage (used vs available)

Disk usage (used vs total)

Network throughput (in/out)
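
As a small example of how one of the server metrics above is derived, the Python sketch below computes disk usage as a percentage from used and total bytes. It reads the values with the standard library; in a Prometheus setup the same numbers would normally come from Node Exporter. The function names are hypothetical.

```python
import shutil


def percent_used(used: int, total: int) -> float:
    """Convert used/total bytes into a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * used / total


def disk_usage_percent(path: str = "/") -> float:
    """Percentage of the filesystem containing path that is in use."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return percent_used(usage.used, usage.total)
```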

For Every Application

Request rate (requests/second)

Error rate (errors/second)

Response time (p50, p95, p99)

Active connections
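
The p50, p95, and p99 response times in the list above are percentiles. The Python sketch below computes them from raw latencies with the nearest-rank method, chosen here for simplicity. Prometheus typically estimates percentiles from histogram buckets with histogram_quantile instead, so its results can differ slightly.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile: the smallest value with at least p% of data at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def latency_summary(latencies):
    """Return the three percentiles commonly shown on latency dashboards."""
    return {f"p{p}": percentile(latencies, p) for p in (50, 95, 99)}
```

A high p99 next to a low p50 usually means most requests are fast but a small share are very slow, which an average would hide.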

For Databases

Query latency

Connection count

Cache hit rate

Replication lag (if applicable)

Start with the basics and add more metrics as you identify what matters for your specific workload.
