How to Install and Use Fluentd for Log Aggregation on RHEL 7

Log aggregation is a critical discipline in any production environment: raw log files scattered across dozens of servers are nearly impossible to correlate during an incident. Fluentd, distributed as the td-agent package by Treasure Data, is an open-source data collector written in Ruby (with C extensions for performance) that unifies log collection, filtering, buffering, and routing from many sources to many destinations. On RHEL 7, td-agent integrates cleanly with systemctl and the standard yum package manager, making it straightforward to install and maintain. This tutorial covers installing td-agent from the official Treasure Data yum repository, writing a complete td-agent.conf configuration, reading from file tails and syslog, forwarding to a local file, Elasticsearch, and S3, using filter plugins for field manipulation, tuning buffer settings, testing the pipeline interactively, and understanding when to prefer the lighter-weight Fluent Bit alternative.

Prerequisites

RHEL 7 server with sudo or root access
Active internet connection to reach the Treasure Data yum repository
At least 512 MB of free RAM (Fluentd runs a Ruby runtime)
Optional: a running Elasticsearch cluster or S3 bucket if forwarding to those destinations
Basic understanding of log formats and of configuration file syntax (td-agent.conf uses Fluentd's own XML-like directive format, not YAML)

Step 1: Add the Treasure Data yum Repository and Install td-agent

Treasure Data provides an official RPM repository for RHEL/CentOS. The recommended approach is to use the provided install script, which sets up the repo and installs the package in one step:

curl -L https://toolbelt.treasuredata.com/sh/install-redhat-td-agent4.sh | sh

If policy forbids piping a remote script into a shell, add the repository manually (in an air-gapped environment, point baseurl at an internal mirror instead):

sudo tee /etc/yum.repos.d/td-agent.repo <<'EOF'
[td-agent]
name=TD Agent
baseurl=https://packages.treasuredata.com/4/redhat/7/x86_64
gpgcheck=1
gpgkey=https://packages.treasuredata.com/GPG-KEY-td-agent
enabled=1
EOF

sudo yum install -y td-agent

Enable and start the service:

sudo systemctl enable td-agent
sudo systemctl start td-agent
sudo systemctl status td-agent

By default, td-agent listens on TCP port 24224 (forward protocol) and writes its own logs to /var/log/td-agent/td-agent.log.

Step 2: Understand the Configuration File Structure

The main configuration file is /etc/td-agent/td-agent.conf. It is built around four directive types:

<source> — defines where Fluentd reads data from (tail, syslog, forward, HTTP, etc.)
<filter> — transforms or enriches events matching a tag pattern
<match> — sends events matching a tag pattern to an output plugin
<label> — groups directives into a named routing scope to avoid tag pollution

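Tags are dot-separated strings (e.g., app.nginx.access). Filter and match directives select events by tag pattern: * matches exactly one tag part, while ** matches zero or more parts, so app.** matches app, app.nginx, and app.nginx.access.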
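
To make the wildcard rules concrete, here is a small shell sketch that approximates how a pattern is compared against a tag. It is an illustration only, not Fluentd's implementation: the helper names tag_to_regex and tag_matches are made up for this example, and it handles only *, **, and literal tag parts (not {a,b} alternation or multiple patterns in one directive).

```shell
#!/bin/sh
# Illustrative only: approximates Fluentd tag matching with a regex.
# tag_to_regex and tag_matches are hypothetical helpers, not Fluentd tools.

# Convert a pattern such as "app.**" into an extended regular expression.
# Placeholders (@DOTALL@, @ALL@) keep the single-* rule from touching
# the regex text generated for "**".
tag_to_regex() {
  printf '%s\n' "$1" | sed \
    -e 's/\./\\./g' \
    -e 's/\\\.\*\*/@DOTALL@/g' \
    -e 's/\*\*/@ALL@/g' \
    -e 's/\*/[^.][^.]*/g' \
    -e 's/@DOTALL@/(\\..*)?/g' \
    -e 's/@ALL@/.*/g'
}

# Succeed (exit 0) when TAG ($2) matches PATTERN ($1).
tag_matches() {
  printf '%s\n' "$2" | grep -Eq "^$(tag_to_regex "$1")\$"
}

# Show which patterns select which tags.
for tag in app app.nginx app.nginx.access system.syslog; do
  for pattern in 'app.*' 'app.**' '**'; do
    if tag_matches "$pattern" "$tag"; then
      echo "$pattern matches $tag"
    fi
  done
done
```

Running it shows that app.* selects only app.nginx, while app.** also selects app and app.nginx.access, which is why app.** is the usual choice for "everything under this prefix".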

Step 3: Configure a File Tail Source

The tail input plugin reads lines appended to a file, similar to tail -f. The command below replaces /etc/td-agent/td-agent.conf with a complete pipeline: a tail source and a syslog source, two filters, and two outputs:

sudo tee /etc/td-agent/td-agent.conf <<'EOF'
# ─── Sources ─────────────────────────────────────────────────────────────────

# Tail the Nginx access log
<source>
  @type tail
  path /var/log/nginx/access.log
  pos_file /var/lib/td-agent/nginx-access.log.pos
  tag app.nginx.access
  read_from_head true
  <parse>
    @type nginx
  </parse>
</source>

# Collect syslog messages via UDP
<source>
  @type syslog
  port 5140
  bind 0.0.0.0
  tag system.syslog
  <parse>
    @type syslog
  </parse>
</source>

# ─── Filters ─────────────────────────────────────────────────────────────────

# Add a hostname field to every nginx event
<filter app.nginx.**>
  @type record_transformer
  <record>
    hostname "#{Socket.gethostname}"
    environment production
  </record>
</filter>

# Drop health-check requests to reduce noise
<filter app.nginx.access>
  @type grep
  <exclude>
    key path
    pattern /^/healthz/
  </exclude>
</filter>

# ─── Outputs ─────────────────────────────────────────────────────────────────

# Write nginx logs to a local file (JSON, rotated daily)
<match app.nginx.**>
  @type file
  path /var/log/td-agent/nginx/access
  append true
  <format>
    @type json
  </format>
  <buffer time>
    timekey 1d
    timekey_use_utc true
    timekey_wait 10m
  </buffer>
</match>

# Forward syslog events to Elasticsearch
<match system.**>
  @type elasticsearch
  host elasticsearch.example.com
  port 9200
  index_name fluentd-syslog
  type_name _doc
  <buffer>
    @type file
    path /var/lib/td-agent/buffer/elasticsearch
    flush_mode interval
    flush_interval 10s
    retry_max_times 5
    retry_wait 1s
    chunk_limit_size 8MB
    total_limit_size 512MB
  </buffer>
</match>
EOF
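
Before restarting the service it is worth checking that the file parses. The sketch below wraps Fluentd's --dry-run option in a small function. The name validate_conf is a hypothetical helper, and the check assumes the td-agent wrapper passes --dry-run and -c through to fluentd.

```shell
#!/bin/sh
# validate_conf is a hypothetical helper; it assumes `td-agent --dry-run -c FILE`
# is available (the td-agent wrapper forwards these options to fluentd).
validate_conf() {
  conf=$1
  if [ ! -r "$conf" ]; then
    echo "cannot read $conf" >&2
    return 2
  fi
  if ! command -v td-agent >/dev/null 2>&1; then
    echo "td-agent is not on PATH; skipping dry run" >&2
    return 3
  fi
  # --dry-run checks the configuration and plugin setup without starting the pipeline
  td-agent --dry-run -c "$conf"
}

# Run as root (for example via sudo) so the plugins can reach their paths.
validate_conf "${1:-/etc/td-agent/td-agent.conf}" || echo "validation did not pass" >&2
```

A non-zero exit status means the configuration should not be loaded yet; the error text printed by td-agent normally names the offending directive.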

Step 4: Install the Elasticsearch Output Plugin

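td-agent ships with many bundled plugins, and td-agent 4 packages normally include the Elasticsearch output already. Check first, and install the gem only if it is missing:

# Check whether the plugin is already present
sudo td-agent-gem list | grep elasticsearch
# Install only if the command above prints nothing
sudo td-agent-gem install fluent-plugin-elasticsearch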

The S3 output is usually bundled as well. If td-agent-gem list does not show fluent-plugin-s3, install it:

sudo td-agent-gem install fluent-plugin-s3

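Then add an S3 match block to your configuration. Fluentd hands each event to the first <match> whose pattern fits its tag, so position matters: as written, app.** overlaps the app.nginx.** file output from Step 3, and whichever block appears first in the file consumes the nginx events. To write to both destinations, wrap the outputs in a copy output with one <store> per destination.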

<match app.**>
  @type s3
  aws_key_id YOUR_AWS_ACCESS_KEY
  aws_sec_key YOUR_AWS_SECRET_KEY
  s3_bucket your-log-bucket
  s3_region us-east-1
  path logs/nginx/%Y/%m/%d/
  s3_object_key_format %{path}%{time_slice}_%{index}.%{file_extension}
  store_as gzip
  <format>
    @type json
  </format>
  <buffer time>
    timekey 1h
    timekey_use_utc true
    timekey_wait 5m
  </buffer>
</match>
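
As a sketch of the copy approach, the script below writes an example fragment that sends nginx events to both the local file and S3. The output file name nginx-fanout.conf is hypothetical, the bucket, region, and keys are the same placeholders used above, and the fragment is meant to be merged into td-agent.conf in place of the two separate <match> blocks.

```shell
#!/bin/sh
# Writes an example fragment; nginx-fanout.conf is a hypothetical file name.
# Merge it into /etc/td-agent/td-agent.conf by hand, replacing the separate
# file and S3 <match> blocks. Bucket, region, and keys are placeholders.
out="${TMPDIR:-/tmp}/nginx-fanout.conf"

cat > "$out" <<'EOF'
<match app.nginx.**>
  @type copy

  # Destination 1: local JSON file, rotated daily
  <store>
    @type file
    path /var/log/td-agent/nginx/access
    append true
    <format>
      @type json
    </format>
    <buffer time>
      timekey 1d
      timekey_use_utc true
      timekey_wait 10m
    </buffer>
  </store>

  # Destination 2: gzip-compressed objects in S3, cut hourly
  <store>
    @type s3
    aws_key_id YOUR_AWS_ACCESS_KEY
    aws_sec_key YOUR_AWS_SECRET_KEY
    s3_bucket your-log-bucket
    s3_region us-east-1
    path logs/nginx/%Y/%m/%d/
    store_as gzip
    <format>
      @type json
    </format>
    <buffer time>
      timekey 1h
      timekey_use_utc true
      timekey_wait 5m
    </buffer>
  </store>
</match>
EOF

echo "wrote $(wc -l < "$out") lines to $out"
```

Each <store> keeps its own format and buffer section, so the file output can still rotate daily while the S3 output uploads hourly.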

Step 5: Understanding Buffer Settings

Buffers are Fluentd’s reliability mechanism: events are first written to a buffer (memory or file), then flushed to the output. Key parameters:

@type file: persists the buffer to disk so queued chunks survive td-agent restarts
flush_interval: how often to attempt flushing (e.g., 10s, 1m)
chunk_limit_size: maximum size of a single chunk; a chunk that reaches this size is queued for flushing
total_limit_size: maximum total size of the buffer; once it is reached, overflow_action decides what happens to new events
retry_max_times: number of retry attempts before a failing chunk is discarded
overflow_action: behavior when the buffer is full, one of throw_exception (the default), block, or drop_oldest_chunk

<buffer>
  @type file
  path /var/lib/td-agent/buffer/myoutput
  flush_mode interval
  flush_interval 30s
  flush_thread_count 2
  chunk_limit_size 16MB
  total_limit_size 1GB
  retry_type exponential_backoff
  retry_wait 1s
  retry_max_interval 30s
  retry_max_times 10
  overflow_action block
</buffer>
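
To see what these retry settings mean in practice, the following sketch prints the nominal wait before each retry. It assumes the default exponential backoff base of 2 and ignores the random jitter Fluentd adds to each interval by default, so real timings will differ slightly. The function name retry_schedule is made up for this example.

```shell
#!/bin/sh
# Prints the nominal wait before each retry for the buffer settings above.
# Assumes the default backoff base of 2 and ignores Fluentd's random jitter.
retry_wait=1           # retry_wait 1s
retry_max_interval=30  # retry_max_interval 30s
retry_max_times=10     # retry_max_times 10

retry_schedule() {
  interval=$retry_wait
  total=0
  n=1
  while [ "$n" -le "$retry_max_times" ]; do
    # Cap each interval at retry_max_interval
    if [ "$interval" -gt "$retry_max_interval" ]; then
      capped=$retry_max_interval
    else
      capped=$interval
    fi
    total=$((total + capped))
    echo "retry $n: wait ${capped}s (cumulative ${total}s)"
    interval=$((interval * 2))
    n=$((n + 1))
  done
}

retry_schedule
```

With these values the waits are 1, 2, 4, 8, and 16 seconds, then 30 seconds for each of the remaining five retries, so a chunk that keeps failing is given up on after roughly three minutes. Raise retry_max_times or retry_max_interval if the destination may be unavailable for longer than that.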

Step 6: Testing with echo and the HTTP Input Plugin

For quick interactive testing, enable Fluentd’s built-in HTTP input plugin by adding this source to the configuration:

<source>
  @type http
  port 8888
  bind 0.0.0.0
</source>

Restart td-agent and send a test event:

sudo systemctl restart td-agent

# Send a test JSON event with tag "test.http"
curl -s -X POST -d 'json={"message":"hello from curl","level":"info"}' \
  http://localhost:8888/test.http

# Check td-agent's own log. With no <match> covering test.http, expect a
# "no patterns matched" warning for the tag, which still confirms receipt.
tail -f /var/log/td-agent/td-agent.log
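
Because the Step 3 configuration has no <match> for test.* or debug.* tags, test events are received but not written anywhere. A minimal sketch of a temporary stdout output follows. The fragment file name debug-stdout.conf is hypothetical, and the fragment is meant to be merged into td-agent.conf by hand and removed again after testing.

```shell
#!/bin/sh
# Writes a throwaway fragment; debug-stdout.conf is a hypothetical file name.
# Merge it into /etc/td-agent/td-agent.conf, restart td-agent, and remove it
# again once testing is done.
frag="${TMPDIR:-/tmp}/debug-stdout.conf"

cat > "$frag" <<'EOF'
# Temporary: print every test.* and debug.* event to td-agent's own log
<match test.** debug.**>
  @type stdout
</match>
EOF

echo "wrote $frag"
```

With this block in place, each test event appears as a line in /var/log/td-agent/td-agent.log showing its timestamp, tag, and JSON record.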

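You can also pipe a JSON event from the shell with the fluent-cat utility, which speaks the forward protocol. This requires a <source> with @type forward listening on port 24224; the stock td-agent.conf has one, but the configuration written in Step 3 does not, so add it first:

# One-shot test using the fluent-cat utility bundled with td-agent
# (the path below is where td-agent 4 normally installs it; adjust if yours differs)
echo '{"message":"test event","host":"rhel7-node"}' |
  /opt/td-agent/bin/fluent-cat --host 127.0.0.1 --port 24224 debug.test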

Step 7: Fluentd vs Fluent Bit

Fluent Bit is a lightweight sibling project written entirely in C. It is designed for edge nodes, containers, and resource-constrained environments where the Ruby runtime overhead of Fluentd (td-agent) would be prohibitive. The typical architecture in a Kubernetes or microservices environment runs Fluent Bit as a DaemonSet on every node to collect and forward logs, with a central Fluentd aggregator handling enrichment, buffering, and fan-out to multiple destinations. On RHEL 7 bare-metal servers where resources are plentiful and rich plugin support (e.g., complex filtering, dozens of output targets) is needed, td-agent is the better choice. For IoT or embedded scenarios, Fluent Bit's footprint is far smaller: the projects' own comparison puts it at roughly a megabyte of memory, versus tens of megabytes for Fluentd.

Step 8: Open Firewall Ports

# Forward protocol (for remote Fluent Bit forwarders)
sudo firewall-cmd --permanent --add-port=24224/tcp
sudo firewall-cmd --permanent --add-port=24224/udp
# Syslog UDP input
sudo firewall-cmd --permanent --add-port=5140/udp
# HTTP input
sudo firewall-cmd --permanent --add-port=8888/tcp
sudo firewall-cmd --reload

After restarting td-agent with your final configuration, verify that events flow end-to-end by tailing the output file or querying your Elasticsearch index. Fluentd’s structured approach to log routing, which treats every log line as a typed, tagged event rather than a raw string, pays dividends as your infrastructure grows. By centralizing logs into Elasticsearch you gain full-text search and Kibana dashboards; by archiving to S3 you have cost-effective long-term retention. Once comfortable with the basics, explore the parser filter (built into Fluentd, superseding the older fluent-plugin-parser gem) for custom regex-based parsing, and fluent-plugin-rewrite-tag-filter for dynamic tag manipulation based on record content.
