An observability pipeline is the end-to-end flow that captures, processes, and visualizes logs and metrics to gain insights into system health, performance, and user activity.
There are several ways to build the pipeline using different tools and technologies. We will try to understand the components of the pipeline with the help of a simple example.
An observability pipeline typically handles the following operations on telemetry data,
Telemetry Data: Collected information about a system’s performance, behavior, and health that is transmitted for monitoring and analysis.
The observability pipeline can include many components and operations as shown in the diagram below:
Pipeline components often overlap with those of data pipelines, and technology choices vary by architecture and scale.
Data Source: The origin of telemetry data - including application logs, system metrics, traces, and infrastructure logs. Examples: Applications (Java, Python etc.), Infrastructure (Kubernetes, Linux Servers etc.)
Collection: The lightweight mechanism that gathers telemetry from data sources via agents, SDKs, or APIs. Examples: Prometheus (Metric Collection), Filebeat
Processing: This can be a data pipeline to clean, filter, enrich, and transform the data into a processable format. Examples: Kafka, Datadog Processor
Storage: Telemetry data is persistently stored in suitable data stores. Examples: Elasticsearch, Grafana Loki, S3, Prometheus (Metric Store)
Query: Enables fast retrieval of telemetry data via a domain-specific query language (see the sketch after this list). Examples: PromQL (Prometheus), Lucene/KQL (Elasticsearch), LogQL (Loki)
Visualisation: Presents telemetry data through charts, graphs, heatmaps, geo-visualizations etc. to identify patterns, problems, trends, and anomalies. Examples: Grafana, Kibana, PowerBI
Alert: Defines thresholds and rules that trigger notifications when systems behave abnormally or violate those rules. Examples: Grafana, Elasticsearch security
Automation: Triggers functions, workflows, or events for actions such as scaling, remediation, or rollback. Examples: Lambda, Ansible
Security: Controls and policies to secure telemetry data at every stage. Includes encryption, access controls, masking PII, and auditing. Examples: TLS, Okta, Secret Manager, IAM
Governance & Compliance: Data management policies for usage, retention, and operational transparency. Ensures observability pipelines align with organizational and legal obligations. Examples: Open Policy Agent (OPA), IAM, Elastic Security
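To make the Query stage a bit more concrete, here is a minimal sketch (assuming an Elasticsearch instance on http://localhost:9200 holding a filebeat-* index, as in the exercise below, and the Python requests library) that fetches log events for a single user via the _search API:

import requests

# A simple match query against the Filebeat indices using Elasticsearch's _search API
query = {"query": {"match": {"username": "bob"}}, "size": 5}
resp = requests.post("http://localhost:9200/filebeat-*/_search", json=query, timeout=5)
for hit in resp.json()["hits"]["hits"]:
    print(hit["_source"])

The same idea applies to the other query languages listed above: a short PromQL or LogQL expression evaluated by Prometheus or Loki, respectively.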
Let's build a very simple pipeline where we ingest user location details and want to view the users on a world map. We will use the following tools: Python (for a small log generator), Filebeat, Elasticsearch, and Kibana, all wired together with Docker Compose.
Make sure you have Docker installed on your system; we will use it for a quick setup. If you are not familiar with it, you can visit this page to get a basic understanding.
Our goal is to generate some random logs, have Filebeat ingest them into Elasticsearch, and use Kibana to view them on the world map. This small exercise will help us understand the pipeline better.
We need to write a small program (log-generator.py) to generate the random logs that Filebeat will pick up.
import json
import time
import random

# Pool of usernames and the file the logs are appended to
users = ["mark", "bob", "charlie", "david", "james", "carl", "jennie", "jane", "alex"]
log_file = "logs.json"

# Emit one JSON log line per second with a random user and location
while True:
    user = random.choice(users)
    log = {
        "username": user,
        "location": {
            "lat": round(random.uniform(-90, 90), 6),
            "lon": round(random.uniform(-180, 180), 6)
        }
    }
    with open(log_file, "a") as f:
        f.write(json.dumps(log) + "\n")
    time.sleep(1)
This script will keep appending logs to the logs.json file in the following format,
...
{"username": "david", "location": {"lat": 46.638099, "lon": -141.664286}}
{"username": "jennie", "location": {"lat": 2.067681, "lon": 6.107782}}
{"username": "charlie", "location": {"lat": -68.763366, "lon": -138.960341}}
{"username": "bob", "location": {"lat": -62.107125, "lon": -119.426837}}
...
Now, let's create docker-compose.yml,
version: "3.9"

services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.15.0
    container_name: elasticsearch
    environment:
      - discovery.type=single-node
      - xpack.security.enabled=false
      - ES_JAVA_OPTS=-Xms1g -Xmx1g
    ports:
      - "9200:9200"
    volumes:
      - es_data:/usr/share/elasticsearch/data

  kibana:
    image: docker.elastic.co/kibana/kibana:8.15.0
    container_name: kibana
    environment:
      - ELASTICSEARCH_HOSTS=http://elasticsearch:9200
    ports:
      - "5601:5601"
    depends_on:
      - elasticsearch

  filebeat:
    image: docker.elastic.co/beats/filebeat:8.15.0
    container_name: filebeat
    user: root
    volumes:
      - ./filebeat.yml:/usr/share/filebeat/filebeat.yml
      - ./logs.json:/usr/share/filebeat/logs/logs.json
    depends_on:
      - elasticsearch

volumes:
  es_data:
We have added Elasticsearch, Kibana, and Filebeat. Filebeat needs additional configuration, which we supply through filebeat.yml.
filebeat.inputs:
  - type: log
    enabled: true
    paths:
      - /usr/share/filebeat/logs/logs.json
    # Decode each line as JSON and lift the fields to the top level of the event
    json.keys_under_root: true
    json.add_error_key: true
    json.overwrite_keys: true

output.elasticsearch:
  hosts: ["http://elasticsearch:9200"]
Once we have these files we can run the following commands. First, start the log generator,

python3 log-generator.py

Then, in a separate terminal, bring up the containers with docker-compose,

docker-compose up -d
This will spin up the containers; the first run may take anywhere from a few seconds to a few minutes. Once all the containers are running we can check the following URLs: Elasticsearch at http://localhost:9200 and Kibana at http://localhost:5601.
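If you prefer to check from the command line instead of a browser, here is a small sketch (assuming the default ports from the docker-compose file above and the Python requests library) that pings both services:

import requests

# Elasticsearch answers on its root endpoint with basic cluster info;
# Kibana exposes a status API once it has finished starting up.
for name, url in [
    ("elasticsearch", "http://localhost:9200"),
    ("kibana", "http://localhost:5601/api/status"),
]:
    try:
        resp = requests.get(url, timeout=5)
        print(f"{name}: HTTP {resp.status_code}")
    except requests.ConnectionError:
        print(f"{name}: not reachable yet")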
In Kibana we will need to create a data view, which specifies which Elasticsearch indices to use. This way we will be able to see our logs based on the time filter configured at the top right.
In the sidebar, go to Management > Stack Management > Data Views and create a new data view with the index pattern filebeat-*.
(The index pattern matches three sources in my case because I created a few indices manually using Dev Tools, but in your case it should match only one - the Filebeat data stream.)
Once we save this data view we can see the logs coming in and the view getting updated (adjust the time filter or press refresh).
This confirms that the logs are being ingested properly.
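If you want to double-check ingestion outside Kibana, a quick sketch (again assuming the default setup above and the requests library) is to count the documents in the Filebeat indices:

import requests

# The _count API returns the number of documents matching the index pattern;
# the count should keep growing while the log generator is running.
resp = requests.get("http://localhost:9200/filebeat-*/_count", timeout=5)
print(resp.json())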
Now we want to view these locations on the world map. In Kibana, open the Maps application and add a layer that uses our filebeat-* data view as its source.
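One thing to watch out for: the Maps layer needs the location field to be mapped as geo_point, and with purely dynamic mapping the lat/lon values may be indexed as plain numbers. The sketch below (the index name user-locations is hypothetical, and the exact fix depends on your setup) shows how to inspect the current mapping and, if needed, create an explicitly mapped index; the same calls can be issued from Kibana Dev Tools, as mentioned earlier.

import requests

ES = "http://localhost:9200"

# Inspect how the location field is currently mapped in the Filebeat indices
mapping = requests.get(f"{ES}/filebeat-*/_mapping/field/location", timeout=5)
print(mapping.json())

# If it is not a geo_point, one option is a separate, explicitly mapped index
# (hypothetical name) that Filebeat could be pointed at via output.elasticsearch.index
requests.put(
    f"{ES}/user-locations",
    json={"mappings": {"properties": {"location": {"type": "geo_point"}}}},
    timeout=5,
)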
There are many other features in Kibana you can explore and experiment with to deepen your understanding.
The simple setup that we tried can be extended to more complex observability scenarios, enabling real-time insights into applications, infrastructure, and user activity across the globe.
The starting point of any observability pipeline is the data source itself, along with a clear understanding of what you expect from that data—whether it’s querying, visualization, alerting, or automation.