Learning Prometheus, Thanos & Loki: Monitoring & Logging Notes from a Beginner

What is Prometheus?

Prometheus is an open-source time-series database (TSDB) built for monitoring and alerting. It collects numeric metrics from various systems at regular intervals and stores them with timestamps and labels. These metrics can then be queried using PromQL, its powerful and flexible query language.

Prometheus works especially well in cloud-native environments and has strong support for Kubernetes. In many setups, it's managed using the Prometheus Operator, which simplifies deployment and configuration.

Prometheus Components

Prometheus Server

The core of Prometheus. Responsible for scraping metrics from configured endpoints and storing them in its internal time-series database (TSDB).
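To make "time series with labels" concrete, here is a toy in-memory model of the data the server stores. It is not how the real TSDB is implemented (no compression, blocks, or index). It only shows that a series is identified by its metric name plus its full label set, and holds (timestamp, value) samples. The class name `ToyTSDB` and the example metric are illustrative.

```python
from collections import defaultdict


class ToyTSDB:
    """Toy in-memory model of the Prometheus data model (not the real TSDB).

    A series is identified by its metric name plus its label set and holds
    an append-only list of (timestamp_ms, value) samples.
    """

    def __init__(self):
        self.series = defaultdict(list)

    @staticmethod
    def key(name, labels):
        # Sorting makes {a="1",b="2"} and {b="2",a="1"} the same series.
        return (name, tuple(sorted(labels.items())))

    def append(self, name, labels, ts_ms, value):
        self.series[self.key(name, labels)].append((ts_ms, float(value)))

    def select(self, name, **matchers):
        # Equality matchers only, like http_requests_total{method="GET"}.
        for (series_name, labels), samples in self.series.items():
            label_map = dict(labels)
            if series_name == name and all(
                label_map.get(k) == v for k, v in matchers.items()
            ):
                yield label_map, samples
```

Appending `http_requests_total{method="GET",code="200"}` twice (with labels in either order) and `http_requests_total{method="POST",code="200"}` once yields two series, and selecting `method="GET"` returns only the first.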

PromQL

The query language used to select and analyze time-series data. It supports filtering by labels, aggregation, and arithmetic across series.
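A common PromQL expression is `rate(http_requests_total[5m])`, the per-second increase of a counter over the last five minutes. The sketch below is a simplified version of that idea: it handles counter resets (a value that drops is treated as a restart from zero) but, unlike PromQL's real `rate()`, does not extrapolate to the edges of the window. The function name is illustrative.

```python
def simple_rate(samples):
    """Per-second increase of a counter over the given samples.

    samples: list of (timestamp_seconds, value), sorted by time.
    Simplified: handles counter resets but, unlike PromQL's rate(),
    does not extrapolate to the edges of the range window.
    """
    if len(samples) < 2:
        return None  # a rate needs at least two samples
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop means the counter reset (e.g., a process restart), so the
        # new value itself is the increase since the reset.
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed
```

For samples 100, 130, 10, 40 taken 15 seconds apart, the increases are 30, 10 (after the reset), and 30, so the rate is 70 / 45 ≈ 1.56 per second.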

Prometheus Scraping

Prometheus collects data by pulling (scraping) metrics from HTTP endpoints at regular intervals. Targets and intervals are defined in a YAML configuration file.
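A scrape is an HTTP GET (by default to the target's `/metrics` path) that returns metrics in Prometheus's plain-text exposition format. The simplified parser below shows what Prometheus reads from each line: a metric name, optional labels, a value, and an optional timestamp. It assumes label values contain no escaped quotes or commas, which the real format allows; the function and sample metrics are illustrative.

```python
import re

# One sample line of the text exposition format, e.g.
#   http_requests_total{method="GET",code="200"} 1027 1700000000000
# Simplified: no escaped quotes or commas inside label values.
LINE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>[^}]*)\})?'
    r'\s+(?P<value>\S+)(?:\s+(?P<ts>-?\d+))?$'
)
LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_scrape(body):
    samples = []
    for raw in body.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):  # skip blanks, # HELP and # TYPE
            continue
        m = LINE.match(line)
        if m is None:
            raise ValueError(f"unparseable line: {line!r}")
        labels = dict(LABEL.findall(m.group("labels") or ""))
        ts = int(m.group("ts")) if m.group("ts") else None
        samples.append((m.group("name"), labels, float(m.group("value")), ts))
    return samples
```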

Alertmanager

Manages alerts triggered by Prometheus rules. It handles deduplication, grouping, silencing, and routes notifications to external systems such as Slack, email, or PagerDuty.
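Grouping and deduplication are the two ideas here that are easiest to misread. The toy model below (not Alertmanager's code) drops alerts whose label sets are identical and then groups the rest by a chosen set of labels, one notification per group. The default `("alertname", "cluster")` mirrors the style of Alertmanager's `group_by` route setting; the alert labels are made up.

```python
from collections import defaultdict


def group_alerts(alerts, group_by=("alertname", "cluster")):
    """Toy model of Alertmanager-style deduplication and grouping.

    alerts: list of label dicts. Identical label sets (the same alert
    delivered more than once) collapse into one; the rest are grouped by
    the values of the group_by labels, one notification per group.
    """
    seen = set()
    groups = defaultdict(list)
    for labels in alerts:
        fingerprint = tuple(sorted(labels.items()))
        if fingerprint in seen:
            continue  # duplicate: this exact alert is already queued
        seen.add(fingerprint)
        group_key = tuple(labels.get(k, "") for k in group_by)
        groups[group_key].append(labels)
    return dict(groups)
```

With two `HighCPU` alerts from different instances in cluster `eu` (one of them sent twice) and one `DiskFull` alert in `us`, this yields two groups: one with two alerts and one with a single alert.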

Exporters

Software components that expose metrics from third-party services (e.g., databases, hardware), so Prometheus can scrape them.
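At its core, an exporter reads values from a system that doesn't speak Prometheus natively and serves them over HTTP in the text exposition format. The sketch below covers only the rendering step; the metric `db_connections_open` and its value are hypothetical, and a real exporter would normally use an official client library (such as `prometheus_client` for Python) and serve the output at an endpoint like `/metrics`.

```python
def render_metrics(metrics):
    """Render metrics in the Prometheus text exposition format.

    metrics: list of (name, help_text, type, [(labels_dict, value), ...]).
    Minimal sketch: label values are assumed not to need escaping.
    """
    lines = []
    for name, help_text, mtype, samples in metrics:
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} {mtype}")
        for labels, value in samples:
            if labels:
                pairs = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
                lines.append(f"{name}{{{pairs}}} {value}")
            else:
                lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"
```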

Pushgateway

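Used when jobs can't be scraped directly (for example, short-lived batch jobs that exit before the next scrape). The job pushes its metrics to the Pushgateway, which holds them so Prometheus can scrape the gateway like any other target.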
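The Pushgateway stores pushed metrics in groups addressed by URL path, `/metrics/job/<job>` optionally followed by extra `/<label>/<value>` pairs. A `PUT` replaces every metric in that group, while a `POST` replaces only metrics with the same names as those pushed. The sketch below builds such a request with the standard library without sending it; the host `pushgateway.example:9091` (9091 is the Pushgateway's default port), the job name, and the metric are hypothetical.

```python
import urllib.request
from urllib.parse import quote


def build_push_request(gateway, job, body, grouping=None):
    """Build (but do not send) a request that pushes metrics to a Pushgateway.

    The group is addressed as /metrics/job/<job>[/<label>/<value>...].
    PUT replaces the whole group; POST would replace only the metric
    names included in the body.
    """
    path = "/metrics/job/" + quote(job, safe="")
    for label, value in (grouping or {}).items():
        path += f"/{quote(label, safe='')}/{quote(value, safe='')}"
    return urllib.request.Request(
        gateway.rstrip("/") + path,
        data=body.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "text/plain; version=0.0.4"},
    )
```

Passing the result to `urllib.request.urlopen(req)` would perform the push, which requires a running Pushgateway.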

Prometheus Operator

A Kubernetes-native tool for automating the deployment and management of Prometheus and Alertmanager instances within Kubernetes environments.

Prometheus Storage

Prometheus stores metrics in its built-in time-series database. It is optimized for fast writes and queries but is not designed for long-term data retention. For long-term storage, tools like Thanos are commonly used.
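Local retention is time-based (15 days by default, set with `--storage.tsdb.retention.time`), and the TSDB drops data in whole time-ranged blocks once they age out. The sketch below shows that idea under a simplifying assumption: age is measured against a supplied `now`, whereas Prometheus measures it relative to its newest data. Block names are made up.

```python
from datetime import datetime, timedelta, timezone


def blocks_to_delete(blocks, now, retention=timedelta(days=15)):
    """Return IDs of blocks lying entirely outside the retention window.

    blocks: list of (block_id, min_time, max_time) as datetimes.
    A block is deleted only when its newest data is older than the cutoff,
    so a block straddling the cutoff is kept until it fully ages out.
    """
    cutoff = now - retention
    return [block_id for block_id, _, max_time in blocks if max_time <= cutoff]
```

This is why data older than the retention window simply disappears from a plain Prometheus server, and why long-term history needs an external store.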


What is Thanos?

Thanos is an open-source project that extends Prometheus to address its main gaps: long-term storage, scalability, and high availability.

It's not a replacement for Prometheus but an add-on layer. Thanos runs alongside existing Prometheus servers and adds a few lightweight components for global querying, long-term storage in object storage, and downsampling.

Benefits of Using Thanos

| Problem with Prometheus | How Thanos Solves It |
|---|---|
| Limited scalability | Prometheus instances are isolated by default. Thanos Querier provides a global query view by aggregating data from multiple Prometheus servers, making it easier to scale across clusters or regions. |
| No built-in high availability | A failed Prometheus instance can leave gaps or lose data. Running Prometheus replicas and letting Thanos Querier deduplicate their data keeps queries working when one replica is down, while Thanos Sidecar uploads data to object storage for durability. |
| Short-term data retention | Prometheus stores data on local disk, which limits how long metrics can be kept. Thanos offloads data to object storage such as AWS S3 or GCS, enabling retention over months or years. |
| No downsampling or deduplication | As data grows, long-range queries slow down. Thanos Compactor downsamples older data, and Thanos Querier deduplicates series collected by multiple Prometheus replicas, keeping queries fast and accurate. |
| Storage extension and API compatibility | Thanos Store Gateway reads historical data from object storage and serves it to Thanos Querier, which exposes a Prometheus-compatible query API, so dashboards such as Grafana can use it as a Prometheus data source. |

Thanos Components

Thanos Sidecar

Runs alongside each Prometheus instance. It uploads metrics data to remote object storage (like S3 or GCS) and makes the local Prometheus data accessible to the rest of the Thanos system.
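Conceptually, the upload side of the sidecar is a loop that finds blocks Prometheus has finished writing and ships any that are missing from the bucket, each stored under a prefix named after its block ID. The sketch below models one pass of that loop; it is not Thanos's code, the key layout is roughly the Prometheus on-disk block layout (`meta.json`, `index`, `chunks/000001`, ...), and the shortened block IDs are hypothetical.

```python
def object_keys(block_id, chunk_files=1):
    # Each block keeps its on-disk layout under a prefix named after its ID.
    keys = [f"{block_id}/meta.json", f"{block_id}/index"]
    keys += [f"{block_id}/chunks/{i:06d}" for i in range(1, chunk_files + 1)]
    return keys


def blocks_to_upload(local_blocks, bucket_keys):
    """Toy sketch of one sidecar upload pass (not Thanos's actual code).

    local_blocks: IDs of blocks Prometheus has finished writing to disk.
    bucket_keys: object keys already in the bucket, e.g. "<id>/meta.json".
    Returns blocks still missing from the bucket, oldest first (real block
    IDs are ULIDs, which sort by creation time).
    """
    uploaded = {key.split("/", 1)[0] for key in bucket_keys}
    return sorted(set(local_blocks) - uploaded)
```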

Thanos Querier

Provides a unified global query layer across multiple Prometheus instances and remote storage. This is where users send their queries.
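The Querier fans a query out to every store (sidecars and Store Gateways) and merges the results. With HA pairs, each replica's series differ only by a replica label (configured with Thanos Query's `--query.replica-label` flag), so dropping that label lets duplicates be merged. The toy below assumes replicas sampled at identical timestamps and keeps the first value seen; Thanos's real deduplication is smarter, following one replica and switching to another only where it finds gaps. The label name `replica` and the data are illustrative.

```python
def query_global(stores, replica_label="replica"):
    """Toy model of Thanos Querier fan-out with replica deduplication.

    stores: list of store results, each a list of (labels_dict, samples)
    where samples is a list of (timestamp, value). The replica label is
    dropped so series from HA replicas share one label set; their samples
    are then merged, keeping one value per timestamp.
    """
    merged = {}
    for store in stores:
        for labels, samples in store:
            key = tuple(sorted((k, v) for k, v in labels.items() if k != replica_label))
            points = merged.setdefault(key, {})
            for ts, value in samples:
                points.setdefault(ts, value)  # first replica wins on overlap
    return {key: sorted(points.items()) for key, points in merged.items()}
```

If replica A misses the sample at t=30 and replica B misses the one at t=0, the merged series still contains all three points.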

Thanos Compactor

Optimizes and manages stored data by compacting time blocks and downsampling older metrics. This reduces storage usage and speeds up long-range queries.
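Downsampling replaces many raw samples with one aggregate per fixed window. Thanos produces 5-minute and 1-hour resolutions and keeps several aggregates per window (count, sum, min, max, and a counter aggregate). The sketch below computes the first four for 5-minute windows; it omits the counter aggregate and is an illustration, not Thanos's implementation.

```python
def downsample(samples, resolution_s=300):
    """Collapse raw samples into fixed windows (default: 5 minutes).

    samples: list of (timestamp_seconds, value). Each window keeps count,
    sum, min and max, so averages (sum / count) and extremes remain
    answerable from far fewer stored points.
    """
    windows = {}
    for ts, value in samples:
        start = ts - ts % resolution_s  # align to the window's start time
        w = windows.setdefault(
            start, {"count": 0, "sum": 0.0, "min": value, "max": value}
        )
        w["count"] += 1
        w["sum"] += value
        w["min"] = min(w["min"], value)
        w["max"] = max(w["max"], value)
    return dict(sorted(windows.items()))
```

Ten samples taken once a minute collapse into two windows of five samples each.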

Thanos Store Gateway

Connects to remote object storage and serves historical metric data to the Querier, so data that the local Prometheus has already deleted can still be queried.
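A key job of the Store Gateway is to avoid reading blocks a query doesn't need. Every block carries a `meta.json` with its time range (`minTime` and `maxTime`, in milliseconds), so blocks outside the query window can be skipped. The sketch below shows only that selection step, treating `maxTime` as exclusive; the block IDs are hypothetical, and the real component also uses index headers and caching.

```python
def blocks_for_range(blocks, start_ms, end_ms):
    """Pick the object-storage blocks needed for a query over [start_ms, end_ms].

    blocks: list of dicts holding a subset of each block's meta.json:
    "ulid", "minTime" and "maxTime" (milliseconds, maxTime exclusive).
    Only blocks overlapping the query window are returned.
    """
    return [
        b["ulid"]
        for b in blocks
        if b["minTime"] <= end_ms and b["maxTime"] > start_ms
    ]
```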

Thanos Query Frontend

Sits in front of the Querier and improves query performance by splitting long range queries into smaller ones that run in parallel, and by caching results. Useful in high-load or multi-user environments.
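The splitting step can be pictured as cutting a long time range at fixed boundaries. The sketch below splits at multiples of one day; the 24-hour interval is an assumption chosen for illustration, and the function name is hypothetical. Aligning pieces to fixed boundaries is what makes caching work: a later dashboard refresh reuses the cached full-day pieces and only recomputes the newest one.

```python
DAY_S = 24 * 60 * 60


def split_range(start_s, end_s, interval_s=DAY_S):
    """Split a range query [start_s, end_s] into sub-ranges whose boundaries
    fall on multiples of interval_s, so each piece can run in parallel and
    be cached on its own."""
    if start_s >= end_s:
        return [(start_s, end_s)]  # nothing to split
    parts = []
    cur = start_s
    while cur < end_s:
        next_boundary = (cur // interval_s + 1) * interval_s
        stop = min(next_boundary, end_s)
        parts.append((cur, stop))
        cur = stop
    return parts
```

In this simplified version, adjacent pieces share their boundary timestamp; a real implementation also has to align the pieces to the query step so no point is evaluated twice.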


What Is Loki?

Loki is a horizontally scalable, highly available, multi-tenant log aggregation system inspired by Prometheus. While Prometheus focuses on collecting and querying metrics, Loki focuses on logs. Another key difference is the collection model: Loki is push-based, so agents such as Promtail or Grafana Alloy send logs to it, instead of Loki scraping targets the way Prometheus does.
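Agents like Promtail or Alloy ultimately send logs to Loki's push endpoint, `POST /loki/api/v1/push`. The JSON body groups log lines into streams, each identified by a label set, with every entry written as `[timestamp in Unix nanoseconds as a string, log line]`. The sketch below only builds such a body; the labels and log lines are made up, and adding one nanosecond per line is a simplification to keep entries ordered.

```python
import json
import time


def loki_push_body(labels, lines, now_ns=None):
    """Build a JSON body for Loki's push endpoint (POST /loki/api/v1/push).

    All lines go into one stream identified by `labels`; each entry is
    [unix timestamp in nanoseconds as a string, log line].
    """
    ts = now_ns if now_ns is not None else time.time_ns()
    # Offset each line by 1 ns so entries keep their order within the stream.
    values = [[str(ts + i), line] for i, line in enumerate(lines)]
    return json.dumps({"streams": [{"stream": labels, "values": values}]})
```

Sending it would mean a POST with `Content-Type: application/json` to a Loki instance, for example a hypothetical `http://loki:3100` (3100 is Loki's default HTTP port).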

Unlike traditional log systems (e.g., ELK), Loki does not index the full content of logs. Instead, it indexes only a set of labels (metadata) for each log stream. This small index makes Loki cheaper to store and easier to operate at scale; the trade-off is that searching log content means scanning the matching streams rather than looking terms up in an index.
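The two-step query this design implies can be shown with a toy store (not Loki's real data structures). A LogQL query such as `{app="api"} |= "error"` first uses the label index to pick matching streams, then scans those streams' lines for the text. The class name and data are illustrative; as in LogQL, the selector must contain at least one label.

```python
from collections import defaultdict


class ToyLogStore:
    """Toy model of Loki's storage idea: only label sets are indexed,
    while log text sits unindexed in per-stream chunks."""

    def __init__(self):
        self.index = defaultdict(set)    # (label, value) -> stream keys
        self.chunks = defaultdict(list)  # stream key -> log lines

    def push(self, labels, line):
        key = tuple(sorted(labels.items()))
        for pair in key:
            self.index[pair].add(key)
        self.chunks[key].append(line)

    def query(self, selector, contains=""):
        # Step 1: the label index narrows the search to matching streams,
        # like the stream selector {app="api"}. Selector must be non-empty.
        candidates = set.intersection(*(self.index[p] for p in selector.items()))
        # Step 2: brute-force scan of those streams' lines, like |= "error".
        return [
            line
            for key in sorted(candidates)
            for line in self.chunks[key]
            if contains in line
        ]
```

Only the `api` stream is scanned for `{app="api"}`, however many other streams exist; a broad selector such as `{env="prod"}` forces a scan of every production stream.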

Loki vs. Prometheus

Data Types and Collection Methods

| Feature | Prometheus | Loki |
|---|---|---|
| Data Type | Structured numerical metrics (time-series) | Unstructured log data (text) |
| Collection Model | Pull-based (scrapes metrics from targets) | Push-based (agents send logs to Loki) |
| Best Use Case | Real-time monitoring of performance and health | Debugging, incident investigation, forensic analysis |

Storage Approach and Efficiency

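| Feature | Prometheus | Loki |
|---|---|---|
| Indexing Strategy | Indexes every series by metric name and labels | Indexes only metadata (labels), not log content |
| Storage Method | Compressed time-series database (TSDB) on local disk | Compressed log chunks plus a small label index, stored in object storage (e.g., S3, GCS) |
| Cost Model | Efficient for numeric data; storage and memory grow with metric cardinality (number of series) | Low storage cost thanks to the small index and cheap object storage; the trade-off is more compute at query time, since log content is scanned rather than looked up |
| Retention | Set by local TSDB retention settings (e.g., `--storage.tsdb.retention.time`) | Object storage enables long-term retention at lower cost |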
