What is Prometheus – Overview of its role, components, and how it collects metrics
What is Thanos – How it adds storage, scalability, and high availability to Prometheus
What is Loki – How it handles logs and differs from Prometheus
References – Sources used in this note

What is Prometheus?

Prometheus is an open-source monitoring and alerting toolkit built around a time-series database (TSDB). It collects numeric metrics from various systems at regular intervals and stores each sample with a timestamp and a set of key-value labels. These metrics can then be queried with PromQL, Prometheus’s query language.
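To make the data model concrete, here is a minimal Python sketch, not Prometheus's own storage code. Each series is identified by a metric name plus a label set and holds (timestamp, value) samples. The metric `http_requests_total`, its labels, and the `select` helper are hypothetical illustrations; `select` imitates only PromQL's equality matcher.

```python
from dataclasses import dataclass, field


@dataclass
class Series:
    """One time series: a metric name, a fixed label set, and its samples."""
    name: str
    labels: dict[str, str]
    # (unix timestamp in seconds, float value) pairs, appended in time order
    samples: list[tuple[float, float]] = field(default_factory=list)


def select(series: list[Series], name: str, **matchers: str) -> list[Series]:
    """Return series with this name whose labels equal every matcher.

    Covers only the '=' matcher; real PromQL also supports !=, =~ and !~.
    """
    return [
        s for s in series
        if s.name == name
        and all(s.labels.get(k) == v for k, v in matchers.items())
    ]


# Hypothetical data: two series of the same metric, distinguished by labels.
db = [
    Series("http_requests_total", {"job": "api", "status": "200"}, [(1000.0, 5.0), (1015.0, 9.0)]),
    Series("http_requests_total", {"job": "web", "status": "200"}, [(1000.0, 2.0)]),
]

# Similar in spirit to the PromQL selector: http_requests_total{job="api"}
api = select(db, "http_requests_total", job="api")
print([s.labels for s in api])  # [{'job': 'api', 'status': '200'}]
```

The label set, not the metric name alone, is what makes a series unique, which is why adding a label with many distinct values multiplies the number of series stored.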

Prometheus works especially well in cloud-native environments and has strong support for Kubernetes. In many setups, it’s managed using the Prometheus Operator, which simplifies deployment and configuration.

Prometheus Components

Prometheus Server

The core of Prometheus. It scrapes metrics from configured endpoints, stores them in its local time-series database (TSDB), and evaluates recording and alerting rules.

PromQL

The query language used to select and analyze time-series data. It supports label-based filtering, aggregation across series, functions such as rate(), and arithmetic between series.
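A typical PromQL query such as `sum by (job) (rate(http_requests_total[1m]))` combines a function and an aggregation. The Python sketch below approximates what that query computes, under simplifying assumptions: `simple_rate` handles counter resets but skips the extrapolation that the real `rate()` performs, and `sum_by` and the sample data are hypothetical.

```python
from collections import defaultdict


def simple_rate(samples: list[tuple[float, float]]) -> float:
    """Per-second increase of a counter over the given samples.

    Simplified: Prometheus's rate() also extrapolates to the window edges;
    this sketch only corrects for counter resets.
    """
    if len(samples) < 2:
        return 0.0
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop means the counter restarted from zero, so count cur as the increase.
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else 0.0


def sum_by(label: str, series: list[tuple[dict[str, str], float]]) -> dict[str, float]:
    """Add up (labels, value) pairs, keeping only one label, like `sum by (label)`."""
    totals: dict[str, float] = defaultdict(float)
    for labels, value in series:
        totals[labels.get(label, "")] += value
    return dict(totals)


# Hypothetical counter samples for three series (timestamps in seconds).
raw = [
    ({"job": "api", "instance": "a"}, [(0, 100.0), (60, 160.0)]),          # +60 in 60 s
    ({"job": "api", "instance": "b"}, [(0, 50.0), (30, 80.0), (60, 20.0)]),  # +30, reset, +20
    ({"job": "web", "instance": "c"}, [(0, 10.0), (60, 40.0)]),             # +30 in 60 s
]

# Roughly: sum by (job) (rate(http_requests_total[1m]))
rates = [(labels, simple_rate(samples)) for labels, samples in raw]
print(sum_by("job", rates))
```

The `instance` label disappears in the output because `sum by (job)` keeps only the grouping label, which is how PromQL turns many per-instance series into one per job.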

Prometheus Scraping

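Prometheus collects data by pulling (scraping) metrics over HTTP, usually from a /metrics endpoint on each target, at a fixed interval. Targets and scrape intervals are defined in a YAML configuration file (prometheus.yml), either listed statically or found through service discovery.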
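The pull model can be sketched in Python. `SCRAPE_CONFIG` mirrors the shape of the `scrape_configs`, `job_name`, `scrape_interval`, and `static_configs`/`targets` keys of a real `prometheus.yml`, but the job name and target address are made up. `parse_exposition` is a simplified, hypothetical parser for the text exposition format that targets serve.

```python
import urllib.request

# Mirrors the shape of prometheus.yml; the job name and target are hypothetical.
SCRAPE_CONFIG = {
    "scrape_configs": [
        {
            "job_name": "my-app",
            "scrape_interval": "15s",
            "static_configs": [{"targets": ["localhost:8000"]}],
        }
    ]
}


def parse_exposition(text: str) -> list[tuple[str, dict[str, str], float]]:
    """Parse simple lines such as: http_requests_total{code="200"} 42

    Simplified: skips comments (# HELP / # TYPE) and does not handle escaping,
    trailing timestamps, or commas inside label values.
    """
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        metric, value = line.rsplit(" ", 1)
        labels: dict[str, str] = {}
        if "{" in metric:
            metric, label_str = metric.split("{", 1)
            for pair in label_str.rstrip("}").split(","):
                if pair:
                    key, val = pair.split("=", 1)
                    labels[key.strip()] = val.strip().strip('"')
        samples.append((metric, labels, float(value)))
    return samples


def scrape(target: str, timeout: float = 5.0) -> list[tuple[str, dict[str, str], float]]:
    """Pull one target's /metrics page over HTTP and parse it."""
    with urllib.request.urlopen(f"http://{target}/metrics", timeout=timeout) as resp:
        return parse_exposition(resp.read().decode("utf-8"))


# Parsing works offline; scrape() needs a live target.
page = '# TYPE http_requests_total counter\nhttp_requests_total{method="get",code="200"} 1027\n'
print(parse_exposition(page))
# [('http_requests_total', {'method': 'get', 'code': '200'}, 1027.0)]
```

A real Prometheus server repeats the fetch for every target on each `scrape_interval` and stores the parsed samples with the scrape timestamp.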

Alertmanager

Receives alerts fired by Prometheus alerting rules. It deduplicates, groups, and silences them, then routes notifications to external systems such as Slack, email, or PagerDuty.
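Alertmanager identifies alerts by their label sets and batches them according to a route's `group_by` setting. The Python sketch below imitates that idea with plain dictionaries; it is not Alertmanager's code, the alert data is hypothetical, and it leaves out timing (`group_wait`, `group_interval`), silences, inhibition, and routing trees.

```python
from collections import defaultdict


def group_alerts(alerts: list[dict[str, str]], group_by: list[str]) -> dict[tuple, list[dict[str, str]]]:
    """Deduplicate identical alerts, then group them by the chosen label names."""
    unique: dict[frozenset, dict[str, str]] = {}
    for alert in alerts:
        # Alerts with identical label sets are treated as the same alert.
        unique[frozenset(alert.items())] = alert
    groups: dict[tuple, list[dict[str, str]]] = defaultdict(list)
    for alert in unique.values():
        key = tuple(alert.get(label, "") for label in group_by)
        groups[key].append(alert)
    return dict(groups)


# Hypothetical alerts: the first two are duplicates (e.g. from two Prometheus replicas).
alerts = [
    {"alertname": "HighLatency", "cluster": "eu", "instance": "a"},
    {"alertname": "HighLatency", "cluster": "eu", "instance": "a"},
    {"alertname": "HighLatency", "cluster": "eu", "instance": "b"},
    {"alertname": "DiskFull", "cluster": "us", "instance": "c"},
]

for key, members in group_alerts(alerts, ["alertname", "cluster"]).items():
    # One notification per group instead of one per alert.
    print(key, len(members))
# ('HighLatency', 'eu') 2
# ('DiskFull', 'us') 1
```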

Exporters

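Software components that expose metrics from third-party systems (e.g., databases, hardware) in the Prometheus format, so Prometheus can scrape them. Examples include node_exporter for host metrics and mysqld_exporter for MySQL.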
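Most exporters are built with an official client library. To show what an exporter actually does without extra dependencies, the sketch below uses only Python's standard library to serve a hypothetical `myqueue_depth` gauge on `/metrics`. `read_queue_depth` is a made-up placeholder for querying the real third-party system, and port 9999 is an arbitrary choice.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def read_queue_depth() -> int:
    """Hypothetical stand-in for querying the system being exported."""
    return 7


def render_metrics() -> str:
    """Render current values in the Prometheus text exposition format."""
    lines = [
        "# HELP myqueue_depth Number of messages waiting in the queue.",
        "# TYPE myqueue_depth gauge",
        f'myqueue_depth{{queue="orders"}} {read_queue_depth()}',
    ]
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics().encode("utf-8")
        self.send_response(200)
        # Content type used by the text exposition format, version 0.0.4.
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 9999) -> None:
    """Block forever, answering scrapes on http://<host>:<port>/metrics."""
    HTTPServer(("", port), MetricsHandler).serve_forever()


print(render_metrics())
```

Calling `serve(9999)` and adding `localhost:9999` as a scrape target would let Prometheus collect the gauge. The exporter only translates values on request; Prometheus decides when to ask.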

Pushgateway

Used when a service can’t be scraped directly, typically a short-lived batch job. The job pushes its metrics to the Pushgateway, which holds them so that Prometheus can scrape them from there.
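Pushing is an ordinary HTTP request. This Python sketch assumes a Pushgateway on `localhost:9091` (its default port) and uses the `/metrics/job/<job>` path described in the Pushgateway documentation; the job name and metric are hypothetical. It builds the request without sending it so the example runs offline, and `push` performs the actual send.

```python
import urllib.request


def build_push_request(gateway: str, job: str, metrics_text: str) -> urllib.request.Request:
    """Build a request that pushes text-format metrics to a Pushgateway.

    The path /metrics/job/<job> groups metrics by job name. POST replaces only
    metrics with the same names in that group; PUT would replace the whole group.
    """
    return urllib.request.Request(
        url=f"http://{gateway}/metrics/job/{job}",
        data=metrics_text.encode("utf-8"),
        method="POST",
        headers={"Content-Type": "text/plain; version=0.0.4"},
    )


def push(gateway: str, job: str, metrics_text: str) -> int:
    """Send the push and return the HTTP status code."""
    with urllib.request.urlopen(build_push_request(gateway, job, metrics_text), timeout=5) as resp:
        return resp.status


# A hypothetical nightly batch job reporting when it last succeeded.
req = build_push_request(
    "localhost:9091",
    "nightly_backup",
    "# TYPE backup_last_success_timestamp_seconds gauge\n"
    "backup_last_success_timestamp_seconds 1700000000\n",
)
print(req.get_method(), req.full_url)
# POST http://localhost:9091/metrics/job/nightly_backup
```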

Prometheus Operator

A Kubernetes operator that automates deploying and managing Prometheus and Alertmanager. Scrape targets and other settings are declared as Kubernetes custom resources, such as ServiceMonitor.

Prometheus Storage

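Prometheus stores metrics in its built-in time-series database on local disk. The TSDB is optimized for fast writes and queries, but it is limited to a single node and deletes data after a retention period (15 days by default), so it is not designed for long-term retention. For long-term storage, tools like Thanos are commonly used.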
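Local retention is time-based: Prometheus writes data in blocks (two hours each before compaction) and deletes a block only once all of its data is older than the retention period set by `--storage.tsdb.retention.time` (15 days by default). The Python sketch below models just that deletion rule; the block tuples are hypothetical stand-ins for real block directories.

```python
TWO_HOURS = 2 * 60 * 60
DAY = 24 * 60 * 60
DEFAULT_RETENTION = 15 * DAY  # Prometheus's default retention time


def expired_blocks(blocks: list[tuple[str, float, float]], now: float,
                   retention: float = DEFAULT_RETENTION) -> list[str]:
    """Return IDs of blocks whose newest sample is older than the retention window.

    Each block is (block_id, min_time, max_time) in seconds. A block is removed
    only once all of its data has aged out.
    """
    cutoff = now - retention
    return [block_id for block_id, _, max_time in blocks if max_time < cutoff]


now = 1_700_000_000.0  # fixed "current time" so the example is deterministic
blocks = [
    ("old", now - 20 * DAY, now - 20 * DAY + TWO_HOURS),
    ("recent", now - TWO_HOURS, now),
]
print(expired_blocks(blocks, now))  # ['old']
```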

What is Thanos?

Thanos is an open-source project that extends Prometheus to address these limitations, mainly around long-term storage, scalability, and high availability.

It’s not a replacement for Prometheus but an add-on layer. Thanos works with existing Prometheus servers through a set of components (a sidecar next to each server plus shared query, storage, and compaction services) that add a global query view and durable object storage.

Benefits of Using Thanos

Problem with Prometheus	How Thanos Solves It
Limited Scalability	Prometheus instances are isolated by default. Thanos Querier provides a global query view by aggregating data from multiple Prometheus servers, making it easier to scale across clusters or regions.
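No Built-in High Availability	A failed Prometheus instance can lose data or leave gaps. With Thanos you can run two identical Prometheus replicas and let Thanos Querier merge and deduplicate their data, so queries keep working if one replica fails. Thanos Sidecar also uploads data to object storage, which keeps it durable beyond the local disk.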
Short-Term Data Retention	Prometheus stores data locally, which limits how long metrics can be retained. Thanos enables long-term storage by offloading old data to services like AWS S3 or GCS. This supports retention over months or years.
No Downsampling or Deduplication	As data grows, long-range queries over raw data slow down. Thanos Compactor downsamples older data, and Thanos Querier deduplicates metrics collected from multiple Prometheus replicas. This keeps queries fast and results free of duplicates.
Storage Extension & API Compatibility	Thanos Store Gateway reads historical data from object storage and serves it to Thanos Querier, which exposes a Prometheus-compatible query API. Existing dashboards such as Grafana can therefore query recent and historical data without changes.
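Deduplication works because replicas differ only in a label that identifies the replica; Thanos Querier is told which label that is through its `--query.replica-label` flag. The Python sketch below shows the basic idea with a hypothetical `replica` label. It is much simpler than Thanos's actual algorithm, which chooses between replicas more carefully when their samples disagree.

```python
from collections import defaultdict

Labels = tuple[tuple[str, str], ...]


def dedup(series: list[tuple[dict[str, str], dict[int, float]]],
          replica_label: str = "replica") -> dict[Labels, dict[int, float]]:
    """Merge series that differ only in the replica label.

    Each input is (labels, {timestamp: value}). When replicas both have a sample
    at a timestamp, the first replica seen wins; gaps in one are filled by another.
    """
    merged: dict[Labels, dict[int, float]] = defaultdict(dict)
    for labels, samples in series:
        key = tuple(sorted((k, v) for k, v in labels.items() if k != replica_label))
        for ts, value in samples.items():
            merged[key].setdefault(ts, value)
    return {key: dict(sorted(points.items())) for key, points in merged.items()}


# Hypothetical HA pair: replica "a" missed the scrape at t=45, replica "b" missed t=30.
series = [
    ({"job": "api", "replica": "a"}, {0: 1.0, 15: 2.0, 30: 3.0}),
    ({"job": "api", "replica": "b"}, {0: 1.0, 15: 2.0, 45: 4.0}),
]
print(dedup(series))
# {(('job', 'api'),): {0: 1.0, 15: 2.0, 30: 3.0, 45: 4.0}}
```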

Thanos Components

Thanos Sidecar

Runs alongside each Prometheus instance. It uploads completed TSDB blocks (which Prometheus writes every two hours) to remote object storage (like S3 or GCS) and makes the local Prometheus data accessible to the rest of the Thanos system.

Thanos Querier

Provides a unified global query layer across multiple Prometheus instances and remote storage. This is where users send their queries.
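Conceptually, the Querier fans a query out to every connected store and merges what comes back. In this Python sketch the stores are hypothetical functions standing in for the gRPC Store API that Sidecars and Store Gateways expose; the real Querier also deduplicates replicas and evaluates PromQL over the merged data.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Result = list[tuple[dict[str, str], list[tuple[int, float]]]]
Store = Callable[[str], Result]  # hypothetical: answers a query with labeled samples


def fan_out(query: str, stores: list[Store]) -> dict[tuple, list[tuple[int, float]]]:
    """Query every store concurrently, then merge samples of identical series."""
    with ThreadPoolExecutor(max_workers=max(1, len(stores))) as pool:
        results = list(pool.map(lambda store: store(query), stores))
    merged: dict[tuple, list[tuple[int, float]]] = {}
    for result in results:
        for labels, samples in result:
            key = tuple(sorted(labels.items()))
            merged.setdefault(key, []).extend(samples)
    return {key: sorted(samples) for key, samples in merged.items()}


def sidecar_eu(query: str) -> Result:  # recent data from one cluster
    return [({"cluster": "eu", "job": "api"}, [(100, 5.0)])]


def sidecar_us(query: str) -> Result:  # recent data from another cluster
    return [({"cluster": "us", "job": "api"}, [(100, 3.0)])]


def store_gateway(query: str) -> Result:  # older data from object storage
    return [({"cluster": "eu", "job": "api"}, [(10, 1.0)])]


print(fan_out("up", [sidecar_eu, sidecar_us, store_gateway]))
```

The eu series ends up with both its old sample (from object storage) and its recent one (from the sidecar), which is what lets one query span both recent and historical data.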

Thanos Compactor

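Compacts and downsamples data in object storage. Compaction merges small time blocks into larger ones, which reduces index overhead; downsampling adds lower-resolution copies of older data (5-minute and 1-hour) that make long-range queries much faster. Downsampling alone does not save space, because the raw data is kept unless retention settings remove it.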
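Downsampling replaces many raw samples with one aggregate per time window, so a long-range query reads far fewer points. Thanos keeps several aggregates per window (such as count, sum, min, and max); the Python sketch below computes those four for 5-minute windows and is an illustration, not the Compactor's code. The raw data is hypothetical.

```python
from collections import defaultdict

FIVE_MINUTES = 5 * 60


def downsample(samples: list[tuple[int, float]], window: int = FIVE_MINUTES) -> list[dict[str, float]]:
    """Collapse raw (timestamp, value) samples into one aggregate per time window."""
    buckets: dict[int, list[float]] = defaultdict(list)
    for ts, value in samples:
        buckets[ts - ts % window].append(value)  # align to the window start
    return [
        {"start": start, "count": len(vals), "sum": sum(vals), "min": min(vals), "max": max(vals)}
        for start, vals in sorted(buckets.items())
    ]


# Hypothetical raw data scraped every 60 s for 10 minutes (values 0..9).
raw = [(t, float(t // 60)) for t in range(0, 600, 60)]
for point in downsample(raw):
    print(point)
```

A coarser 1-hour level can be built from the 5-minute aggregates by adding counts and sums and taking the min of the mins and the max of the maxes; an average is always `sum / count`.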

Thanos Store Gateway

Connects to remote object storage and serves historical metric data back to the Querier. Even if local Prometheus no longer stores the data, it can still be queried.
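Because each block in object storage carries metadata about its time range (recorded in the block's `meta.json`), the Store Gateway can skip blocks that cannot contain data for a query. The Python sketch below shows only that pruning step; `BlockMeta` and the block IDs are hypothetical, and the real gateway also relies on index headers and caches to narrow down which series to fetch.

```python
from dataclasses import dataclass

DAY_MS = 24 * 60 * 60 * 1000


@dataclass(frozen=True)
class BlockMeta:
    """Hypothetical summary of a block: its ID and time range in milliseconds."""
    ulid: str
    min_time: int
    max_time: int


def blocks_for_range(blocks: list[BlockMeta], start: int, end: int) -> list[BlockMeta]:
    """Keep only blocks whose time range overlaps the query range [start, end]."""
    return [b for b in blocks if b.min_time <= end and b.max_time >= start]


blocks = [
    BlockMeta("block-a", 0, DAY_MS),
    BlockMeta("block-b", DAY_MS, 2 * DAY_MS),
    BlockMeta("block-c", 2 * DAY_MS, 3 * DAY_MS),
]
print([b.ulid for b in blocks_for_range(blocks, int(1.5 * DAY_MS), int(1.8 * DAY_MS))])
# ['block-b']
```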

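Thanos Query Frontend

Sits in front of Thanos Querier and improves query performance by splitting long range queries into smaller ones (by day, by default), running them in parallel, and caching the results. Useful in high-load or multi-user environments.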
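Splitting by time is straightforward to sketch. The Python example below cuts a range into day-aligned pieces and runs them concurrently; `run_query` is a hypothetical callback standing in for a request to Thanos Querier, and the real Query Frontend also handles step alignment and result caching.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable

DAY = 24 * 60 * 60


def split_range(start: int, end: int, interval: int = DAY) -> list[tuple[int, int]]:
    """Split [start, end] into sub-ranges whose boundaries are multiples of `interval`."""
    parts = []
    cur = start
    while cur < end:
        boundary = (cur // interval + 1) * interval  # next aligned boundary
        nxt = min(boundary, end)
        parts.append((cur, nxt))
        cur = nxt
    return parts


def run_split(query: str, start: int, end: int,
              run_query: Callable[[str, int, int], Any]) -> list[Any]:
    """Run each sub-range in parallel; results come back in time order."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda r: run_query(query, *r), split_range(start, end)))


# From noon on day 0 to 06:00 on day 2 (seconds since the epoch).
print(split_range(12 * 3600, 2 * DAY + 6 * 3600))
# [(43200, 86400), (86400, 172800), (172800, 194400)]
```

Day alignment also helps caching: a sub-range covering a full past day produces the same result every time, so it can be reused across queries.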

What is Loki?

Loki is a horizontally scalable, highly available, multi-tenant log aggregation system inspired by Prometheus. While Prometheus collects and queries metrics, Loki does the same for logs. Another key difference is how data is collected: Loki uses a push-based model, in which agents such as Promtail or Grafana Alloy push logs to it, rather than scraping targets as Prometheus does.
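Under the hood, an agent pushes batches of log lines to Loki's HTTP push endpoint, `/loki/api/v1/push`. The Python sketch below builds a JSON push request by hand; it assumes Loki on `localhost:3100` (its default HTTP port), and the labels and log lines are hypothetical. Real agents batch entries, retry on failure, and typically use a compressed protobuf encoding instead of JSON.

```python
import json
import time
import urllib.request


def build_loki_push(base_url: str, labels: dict[str, str], lines: list[str]) -> urllib.request.Request:
    """Build a JSON request for Loki's push endpoint (/loki/api/v1/push)."""
    now_ns = time.time_ns()
    payload = {
        "streams": [
            {
                "stream": labels,  # the indexed labels that identify this log stream
                # Each entry is [unix timestamp in nanoseconds as a string, log line];
                # adding i keeps the timestamps distinct and ordered.
                "values": [[str(now_ns + i), line] for i, line in enumerate(lines)],
            }
        ]
    }
    return urllib.request.Request(
        url=f"{base_url}/loki/api/v1/push",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_loki_push("http://localhost:3100", {"app": "checkout", "env": "dev"},
                      ["payment failed", "retrying payment"])
print(req.full_url)  # http://localhost:3100/loki/api/v1/push
# urllib.request.urlopen(req) would send it to a running Loki instance.
```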

Unlike traditional log systems (e.g., ELK), Loki does not index the full content of logs. Instead, it indexes only a set of labels (metadata) for each log stream and stores the log lines themselves in compressed chunks. This keeps the index small and storage cheap, at the cost of scanning chunk contents when a query filters on log text.
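The trade-off between label indexing and content scanning fits in a toy model. In the Python sketch below, which is an illustration rather than Loki's implementation, only label sets are used to find streams, and a text filter (roughly LogQL `{app="api"} |= "error"`) is applied by scanning the selected lines. The class name and log data are hypothetical.

```python
from collections import defaultdict


class TinyLogStore:
    """Toy model of Loki's design: index labels only, scan log text at query time."""

    def __init__(self) -> None:
        # Label-set key -> list of log lines (stands in for compressed chunks).
        self.streams: dict[tuple, list[str]] = defaultdict(list)

    def push(self, labels: dict[str, str], line: str) -> None:
        self.streams[tuple(sorted(labels.items()))].append(line)

    def query(self, matchers: dict[str, str], contains: str) -> list[str]:
        """Pick streams by label, then grep their lines for a substring."""
        hits = []
        for key, lines in self.streams.items():
            labels = dict(key)
            if all(labels.get(k) == v for k, v in matchers.items()):
                # Only selected streams are scanned; line content was never indexed.
                hits.extend(line for line in lines if contains in line)
        return hits


store = TinyLogStore()
store.push({"app": "api", "env": "prod"}, "GET /orders 200")
store.push({"app": "api", "env": "prod"}, "error: db timeout")
store.push({"app": "web", "env": "prod"}, "error: asset missing")
print(store.query({"app": "api"}, "error"))  # ['error: db timeout']
```

Narrow label selectors matter in practice: the fewer streams a query selects, the less log text has to be scanned.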

Loki vs. Prometheus

Data Types and Collection Methods

Feature	Prometheus	Loki
Data Type	Structured numerical metrics (time-series)	Unstructured log data (text)
Collection Model	Pull-based (scrapes metrics from targets)	Push-based (agents send logs to Loki)
Best Use Case	Real-time monitoring of performance and health	Debugging, incident investigation, forensic analysis

Storage Approach and Efficiency

Feature	Prometheus	Loki
Indexing Strategy	Indexes every series by its metric name and labels	Indexes only stream labels (metadata), not log content
Storage Method	Compressed time-series database (TSDB) on local disk	Compressed chunks stored in object storage (e.g., S3, GCS)
Cost Model	Efficient for numeric data; cost grows with series cardinality	Low storage cost thanks to the small index and object storage; query cost grows with the volume of log data scanned
Retention	Set by retention flags (e.g., --storage.tsdb.retention.time, 15 days by default); longer retention needs a tool like Thanos	Object storage enables long-term retention at lower cost