Self-Hosted Logging Pipeline

January 24, 2025

For the last few months, I’ve been diving deep into my homelab, building infrastructure and automation. One of my main priorities has been automating logging and monitoring for all my hosts. This led to an automated monitoring pipeline built on Ansible, Grafana, Loki, Prometheus, and Jenkins. This post walks through my approach and shares some code that might be useful if you choose to replicate it.

The Stack

Here’s a quick overview of the tools I used:

  1. Ansible: Automates the deployment and configuration of agents.
  2. Prometheus: Scrapes metrics for performance monitoring.
  3. Promtail: Forwards logs from the host to Loki.
  4. Node Exporter: Exports metrics on each host to be scraped by Prometheus.
  5. Loki: Centralizes and indexes logs from all hosts.
  6. Grafana: Visualization layer for logs and metrics.

Organizing the Ansible Inventory

First, ensure your Ansible inventory has reasonable grouping. This allows you to organize Prometheus jobs into logical collections and will help keep your logs organized and searchable.

proxy:
  hosts:
    caddy1:
      ansible_host: 10.0.0.2
    caddy2:
      ansible_host: 10.0.0.3
gaming_servers:
  hosts:
    minecraft:
      ansible_host: 10.0.0.4

Deploying Promtail and Node Exporter Logging Agents

To install the Promtail (Loki) and Node Exporter (Prometheus) agents, I used two Ansible collections: the Grafana Ansible Collection (grafana.grafana) and the Prometheus Collection (prometheus.prometheus).
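
If these collections aren’t already on your control node, a minimal requirements file along these lines will pull them in (the filename is just a convention):

# requirements.yml -- install with: ansible-galaxy collection install -r requirements.yml
collections:
  - name: prometheus.prometheus
  - name: grafana.grafana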

The Node Exporter agent runs on port :9100 and passively exposes metrics about the host system, such as CPU usage, memory consumption, and disk I/O. Prometheus will be configured to actively scrape these endpoints, pulling the data into our monitoring stack for analysis and visualization.

The Loki agent, also known as Promtail, is responsible for forwarding logs to a Loki endpoint. It sends log data via HTTP/S and supports features like dynamic log label assignment based on file paths or content, making it easier to categorize and analyze logs.

These agents can be installed using two simple Ansible roles, which significantly simplify the setup process:

- name: Install logging agents on all hosts
  hosts: all
  become: true
  roles:
    - role: prometheus.prometheus.node_exporter
    - role: grafana.grafana.promtail

In a later section, I’ll demonstrate a method to generalize these configurations. This will help avoid overly complex or sprawling Ansible playbooks, making the setup easier to maintain and extend.

Generalizing the Prometheus Configuration

Installing the agents is straightforward, but Prometheus needs to know where to find the Node Exporter endpoints. These scrape targets are defined in /etc/prometheus/prometheus.yml. Manually configuring this file can be time-consuming and prone to errors, especially in environments with many agents.

By using templates and the Ansible inventory file, you can automate the process of writing and updating the prometheus.yml configuration. This ensures consistency, reduces the chance of mistakes, and makes it easier to scale or modify your setup as your environment grows.

Below is an example template for the Prometheus configuration:

scrape_configs:
{% for group, hosts in groups.items() %}
{% if group not in ['all', 'ungrouped'] %}
  - job_name: '{{ group }}'
    static_configs:
{% for host in hosts %}
      - targets: ['{{ hostvars[host].ansible_host }}:9100']
        labels:
          instance: '{{ host }}'
{% endfor %}
{% endif %}
{% endfor %}

This template iterates over the Ansible groups and hosts to build static scrape configurations for Prometheus. While some cases may require custom exporters or more advanced configurations, this approach covers about 95% of the deployments in my homelab.
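
For the sample inventory shown earlier, the rendered file would look roughly like this:

scrape_configs:
  - job_name: 'proxy'
    static_configs:
      - targets: ['10.0.0.2:9100']
        labels:
          instance: 'caddy1'
      - targets: ['10.0.0.3:9100']
        labels:
          instance: 'caddy2'
  - job_name: 'gaming_servers'
    static_configs:
      - targets: ['10.0.0.4:9100']
        labels:
          instance: 'minecraft'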

To render this configuration file, use the following Ansible task:

- name: Render Prometheus configuration file
  ansible.builtin.template:
    src: "{{ playbook_dir }}/files/prometheus.yml.j2"
    dest: /etc/prometheus/prometheus.yml
    owner: prometheus
    group: prometheus
    mode: '0644'
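
Prometheus only picks up configuration changes on a restart (or an explicit reload), so it’s worth pairing the template task with a handler. Here’s a minimal sketch, assuming Prometheus runs as a systemd service named prometheus: add a notify: Restart prometheus line to the task above and define a matching handler in the same play.

handlers:
  - name: Restart prometheus
    ansible.builtin.systemd:
      name: prometheus   # assumes the Prometheus server runs as this systemd unit
      state: restarted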

Generalizing the Promtail Configuration

To generalize the Promtail configuration, I used two methods. The first is leveraging Ansible group_vars, which allows you to set the promtail_scrape_configs and promtail_clients role variables for each group defined in our inventory.yaml. For details about these variables, refer to the Loki docs.

Currently, I am using group_vars/all.yml, which applies to all hosts and groups. However, if you had a specific use case, such as a group named proxy, you could create a new file, group_vars/proxy.yml, to define variables specific to that group. Variables in group_vars/proxy.yml would override any conflicting settings defined in group_vars/all.yml, providing flexibility for group-specific configurations.
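
As an illustration, a hypothetical group_vars/proxy.yml that points the proxy hosts at a different Loki endpoint could look like this (the URL is made up):

# group_vars/proxy.yml -- applies only to hosts in the proxy group,
# overriding the promtail_clients value from group_vars/all.yml
promtail_clients:
  - url: "https://loki-internal.example.com/loki/api/v1/push"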

The second method is to generalize how Promtail selects which logs to forward. In this setup, Promtail forwards logs from systemd services and from files under /var/log/*.log. If you stand up a new service in the future, you will have visibility into it as long as it runs under a systemd unit or writes to /var/log. This approach provides a broad and consistent logging configuration across your hosts, while still allowing for group-specific overrides when needed.

# group_vars/all.yml

promtail_runtime_mode: "acl"
promtail_clients:
  - url: "https://loki.example.com/loki/api/v1/push"

# Scrape logs from /var/log/*.log
promtail_scrape_configs:
  - job_name: system
    static_configs:
      - targets:
          - localhost
        labels:
          job: varlogs
          instance: "{{ ansible_fqdn }}"
          __path__: /var/log/*.log

  # Scrape logs from systemd journal
  - job_name: journal
    journal:
      max_age: 12h
      labels:
        job: systemd-journal
        instance: "{{ ansible_fqdn }}"
        ipv4: "{{ ansible_default_ipv4.address }}"
        ansible_groups: "{{ group_names | join(',') }}"
    relabel_configs:
      - source_labels: ['__journal__systemd_unit']
        target_label: 'unit'

The configuration uses labels to enrich the logs with useful metadata, including the IP address, instance name, group, and the specific systemd unit file name. By adding these labels, logs become much more structured and easier to search, especially when dealing with multiple instances.
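
For example, once these labels are in place, pulling the journal logs for a single service across every host in a group is just a LogQL label selector in Grafana, something like {job="systemd-journal", ansible_groups=~".*proxy.*", unit="caddy.service"} (the unit name here is illustrative).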

[Architecture diagram: arch.png]

Automating with CI/CD

Finally, to tie it all together, I used Jenkins to run the pipeline and watch version control for changes. You can choose whatever CI/CD solution works for you, but here is a sample of my pipeline.

[Jenkins pipeline screenshot: jenkins-pipeline.png]

The Final Result

With this setup, my homelab is not only fully functional but also highly observable. The logging pipeline ensures that any new host added to the inventory is automatically picked up and onboarded. This eliminates the need to manually configure each new device, saving time and reducing the potential for error.

For anyone managing a homelab or small-scale infrastructure, I think this stack is a great choice. It’s powerful, flexible, and, best of all, incredibly rewarding to see working seamlessly in action.