How to Install and Use Fluentd for Log Aggregation on RHEL 7
Log aggregation is a critical discipline in any production environment: raw log files scattered across dozens of servers are nearly impossible to correlate during an incident. Fluentd, distributed as the td-agent package by Treasure Data, is an open-source data collector written in Ruby (with C extensions for performance) that unifies log collection, filtering, buffering, and routing from many sources to many destinations. On RHEL 7, td-agent integrates cleanly with systemctl and the standard yum package manager, making it straightforward to install and maintain. This tutorial covers installing td-agent from the official Treasure Data yum repository, writing a complete td-agent.conf configuration, reading from file tails and syslog, forwarding to a local file, Elasticsearch, and S3, using filter plugins for field manipulation, tuning buffer settings, testing the pipeline interactively, and understanding when to prefer the lighter-weight Fluent Bit alternative.
Prerequisites
- RHEL 7 server with sudo or root access
- Active internet connection to reach the Treasure Data yum repository
- At least 512 MB of free RAM (Fluentd runs a Ruby runtime)
- Optional: a running Elasticsearch cluster or S3 bucket if forwarding to those destinations
- Basic familiarity with common log formats (e.g., Nginx access logs, syslog) and text-based configuration files
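The 512 MB figure above is easy to check before installing. The following is a minimal sketch that reads /proc/meminfo. The helper name has_free_ram_mb is hypothetical. It uses MemAvailable when the kernel reports it and falls back to MemFree otherwise:

```bash
#!/usr/bin/env bash
# Pre-flight memory check before installing td-agent.
# has_free_ram_mb is an illustrative helper, not part of td-agent.
# Returns 0 if at least $1 MB is available according to /proc/meminfo.
has_free_ram_mb() {
  local need_mb=$1 avail_kb
  # Prefer MemAvailable; fall back to MemFree if the kernel does not report it.
  avail_kb=$(awk '/^MemAvailable:/ {print $2; found = 1} END {exit !found}' /proc/meminfo) ||
    avail_kb=$(awk '/^MemFree:/ {print $2}' /proc/meminfo)
  (( avail_kb / 1024 >= need_mb ))
}

if has_free_ram_mb 512; then
  echo "OK: at least 512 MB of memory available"
else
  echo "WARNING: less than 512 MB available; td-agent may be memory-starved" >&2
fi
```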
Step 1: Add the Treasure Data yum Repository and Install td-agent
Treasure Data provides an official RPM repository for RHEL/CentOS. The recommended approach is to use the provided install script, which sets up the repo and installs the package in one step:
curl -L https://toolbelt.treasuredata.com/sh/install-redhat-td-agent4.sh | sh
If you prefer to add the repository manually for air-gapped or policy-controlled environments:
sudo tee /etc/yum.repos.d/td-agent.repo <<'EOF'
[td-agent]
name=TD Agent
baseurl=https://packages.treasuredata.com/4/redhat/7/x86_64
gpgcheck=1
gpgkey=https://packages.treasuredata.com/GPG-KEY-td-agent
enabled=1
EOF
sudo yum install -y td-agent
Enable and start the service:
sudo systemctl enable td-agent
sudo systemctl start td-agent
sudo systemctl status td-agent
With the default configuration shipped in the package, td-agent listens on TCP port 24224 (forward protocol) and writes its own logs to /var/log/td-agent/td-agent.log.
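You can confirm that the service actually came up by polling the forward port. The sketch below assumes bash, because it relies on bash's /dev/tcp redirection. The helper name wait_for_port is hypothetical:

```bash
#!/usr/bin/env bash
# wait_for_port is an illustrative helper: it retries a TCP connect to
# host:port using bash's built-in /dev/tcp redirection (a bash feature,
# not a real file) and succeeds as soon as something is listening.
wait_for_port() {
  local host=$1 port=$2 tries=${3:-10} i
  for (( i = 1; i <= tries; i++ )); do
    # Open and immediately close fd 3 in a subshell; exit status 0 = connected.
    if (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; then
      return 0
    fi
    # Pause between attempts, but not after the last one.
    (( i < tries )) && sleep 1
  done
  return 1
}
```

For example, run wait_for_port 127.0.0.1 24224 right after starting the service. If it returns non-zero, check /var/log/td-agent/td-agent.log for startup errors. As a one-shot alternative, sudo ss -ltnp | grep 24224 gives the same answer.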
Step 2: Understand the Configuration File Structure
The main configuration file is /etc/td-agent/td-agent.conf. It is built around four directive types:
- <source> — defines where Fluentd reads data from (tail, syslog, forward, HTTP, etc.)
- <filter> — transforms or enriches events matching a tag pattern
- <match> — sends events matching a tag pattern to an output plugin
- <label> — groups directives into a named routing scope to avoid tag pollution
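Every event carries a tag, which is a dot-separated string such as app.nginx.access. <filter> and <match> directives select events with tag patterns. In a pattern, * matches exactly one tag part and ** matches zero or more parts, so app.** matches app.nginx.access but app.* does not. <match> directives are evaluated from top to bottom, and an event is delivered to the first one whose pattern fits its tag.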
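The following simplified bash model compares a pattern with a tag part by part, which makes the wildcard rules concrete. It is an illustration only, not Fluentd's own matcher. The function names are hypothetical, and only literal parts, * and ** are handled; other pattern features such as {a,b} alternation are left out. The model answers whether a pattern fits a tag. Which <match> wins is still decided by its position in the file.

```bash
#!/usr/bin/env bash
# Simplified model of Fluentd tag matching, for illustration only.
# Assumption: patterns use only literal parts, '*' (exactly one part) and
# '**' (zero or more parts); other features such as {a,b} are not modelled.
_tm_match() {  # $1 = index into pattern parts, $2 = index into tag parts
  local pi=$1 ti=$2 part k
  if (( pi == ${#_TM_P[@]} )); then
    (( ti == ${#_TM_T[@]} ))   # pattern used up: match only if the tag is too
    return
  fi
  part=${_TM_P[pi]}
  if [[ $part == '**' ]]; then
    # Let '**' swallow 0, 1, 2, ... tag parts and try the rest of the pattern.
    for (( k = ti; k <= ${#_TM_T[@]}; k++ )); do
      _tm_match $(( pi + 1 )) "$k" && return 0
    done
    return 1
  fi
  (( ti < ${#_TM_T[@]} )) || return 1
  # A literal part must be equal; '*' accepts any single part.
  [[ $part == '*' || $part == "${_TM_T[ti]}" ]] || return 1
  _tm_match $(( pi + 1 )) $(( ti + 1 ))
}

# tag_matches PATTERN TAG -> exit status 0 if PATTERN matches TAG.
tag_matches() {
  IFS=. read -r -a _TM_P <<< "$1"
  IFS=. read -r -a _TM_T <<< "$2"
  _tm_match 0 0
}

for pattern in 'app.**' 'app.nginx.**' 'app.*' 'system.**'; do
  if tag_matches "$pattern" app.nginx.access; then
    echo "$pattern  matches  app.nginx.access"
  else
    echo "$pattern  does not match  app.nginx.access"
  fi
done
```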
Step 3: Configure a File Tail Source
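The tail input plugin reads lines appended to a file, similar to tail -f, and records its read position in a pos_file so that it can resume after a restart. Replace the contents of /etc/td-agent/td-agent.conf with the configuration below. It also defines the syslog input, filters, and outputs used in later steps: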
sudo tee /etc/td-agent/td-agent.conf <<'EOF'
# ─── Sources ─────────────────────────────────────────────────────────────────
# Accept events over the forward protocol on 24224 (fluent-cat, Fluent Bit,
# other Fluentd nodes); replacing this file removes the package's default one
<source>
  @type forward
  port 24224
  bind 0.0.0.0
</source>
# Tail the Nginx access log
<source>
  @type tail
  path /var/log/nginx/access.log
  pos_file /var/lib/td-agent/nginx-access.log.pos
  tag app.nginx.access
  read_from_head true
  <parse>
    @type nginx
  </parse>
</source>
# Collect syslog messages via UDP; events are tagged
# system.syslog.<facility>.<priority>. Point rsyslog here with a line
# such as: *.* @127.0.0.1:5140
<source>
  @type syslog
  port 5140
  bind 0.0.0.0
  tag system.syslog
  <parse>
    @type syslog
  </parse>
</source>
# ─── Filters ─────────────────────────────────────────────────────────────────
# Add hostname and environment fields to every nginx event
<filter app.nginx.**>
  @type record_transformer
  <record>
    hostname "#{Socket.gethostname}"
    environment production
  </record>
</filter>
# Drop health-check requests (request path starting with /healthz) to reduce noise
<filter app.nginx.access>
  @type grep
  <exclude>
    key path
    pattern /^\/healthz/
  </exclude>
</filter>
# ─── Outputs ─────────────────────────────────────────────────────────────────
# Write nginx logs to a local file (JSON, one file per day)
<match app.nginx.**>
  @type file
  path /var/log/td-agent/nginx/access
  append true
  <format>
    @type json
  </format>
  <buffer time>
    timekey 1d
    timekey_use_utc true
    timekey_wait 10m
  </buffer>
</match>
# Forward syslog events to Elasticsearch
<match system.**>
  @type elasticsearch
  host elasticsearch.example.com
  port 9200
  index_name fluentd-syslog
  type_name _doc
  <buffer>
    @type file
    path /var/lib/td-agent/buffer/elasticsearch
    flush_mode interval
    flush_interval 10s
    retry_max_times 5
    retry_wait 1s
    chunk_limit_size 8MB
    total_limit_size 512MB
  </buffer>
</match>
EOF
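Before you restart the service on a new configuration, it is worth letting td-agent parse the file first. The sketch below assumes that the td-agent wrapper is on PATH and accepts Fluentd's --dry-run and -c options. The helper name apply_td_agent_config is hypothetical:

```bash
#!/usr/bin/env bash
# apply_td_agent_config is an illustrative helper: it validates a
# configuration file with td-agent's --dry-run mode and restarts the
# service only if validation succeeds.
apply_td_agent_config() {
  local conf=${1:-/etc/td-agent/td-agent.conf}
  if ! command -v td-agent >/dev/null 2>&1; then
    echo "td-agent not found in PATH" >&2
    return 2
  fi
  # --dry-run parses the config and loads the plugins it names, then exits.
  if ! sudo td-agent --dry-run -c "$conf"; then
    echo "config check failed for $conf; td-agent NOT restarted" >&2
    return 1
  fi
  sudo systemctl restart td-agent
}
```

Call it as apply_td_agent_config /etc/td-agent/td-agent.conf. If the dry run complains about an unknown elasticsearch output, the plugin covered in Step 4 is missing.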
Step 4: Install the Elasticsearch Output Plugin
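The Elasticsearch output is provided by the fluent-plugin-elasticsearch gem. Recent td-agent 4 packages already bundle it, and the td-agent-gem list check below confirms whether it is present. If your build lacks it, install it with td-agent's own gem command before you restart td-agent with the Step 3 configuration. td-agent refuses to start while an output type such as elasticsearch is unknown: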
sudo td-agent-gem install fluent-plugin-elasticsearch
# Verify installation
sudo td-agent-gem list | grep elasticsearch
For S3 output, install the S3 plugin:
sudo td-agent-gem install fluent-plugin-s3
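Then add an S3 match block to your configuration. Keep in mind that Fluentd delivers each event to the first <match> whose pattern fits its tag. If you place <match app.**> after the existing <match app.nginx.**> block, it receives other app.** events but never nginx ones. To send the same events to both the local file and S3, combine the two outputs under a single <match> that uses the built-in copy output, with one <store> section per destination.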
<match app.**>
  @type s3
  # Placeholder credentials. If aws_key_id/aws_sec_key are omitted, the plugin
  # falls back to the AWS SDK's default credential chain (e.g. an EC2 IAM role)
  aws_key_id YOUR_AWS_ACCESS_KEY
  aws_sec_key YOUR_AWS_SECRET_KEY
  s3_bucket your-log-bucket
  s3_region us-east-1
  path logs/nginx/%Y/%m/%d/
  s3_object_key_format %{path}%{time_slice}_%{index}.%{file_extension}
  store_as gzip
  <format>
    @type json
  </format>
  <buffer time>
    timekey 1h
    timekey_use_utc true
    timekey_wait 5m
  </buffer>
</match>
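A <match> consumes the events it matches, so one way to deliver nginx events to both the local file and S3 is the built-in copy output with one <store> per destination. The sketch below reuses the placeholder bucket and credentials from the S3 block. It writes the block to ./nginx-fanout.conf, a hypothetical scratch file, so you can review it and then paste it into td-agent.conf in place of the existing <match app.nginx.**> block:

```bash
#!/usr/bin/env bash
# Write a fan-out <match> block to a scratch file for review. It is meant
# to REPLACE the <match app.nginx.**> block in /etc/td-agent/td-agent.conf;
# two blocks with the same pattern would leave the second one unreachable.
OUT=${OUT:-./nginx-fanout.conf}
cat > "$OUT" <<'EOF'
<match app.nginx.**>
  @type copy
  # Destination 1: daily JSON files on local disk
  <store>
    @type file
    path /var/log/td-agent/nginx/access
    append true
    <format>
      @type json
    </format>
    <buffer time>
      timekey 1d
      timekey_use_utc true
      timekey_wait 10m
    </buffer>
  </store>
  # Destination 2: hourly gzip objects in S3 (placeholder credentials)
  <store>
    @type s3
    aws_key_id YOUR_AWS_ACCESS_KEY
    aws_sec_key YOUR_AWS_SECRET_KEY
    s3_bucket your-log-bucket
    s3_region us-east-1
    path logs/nginx/%Y/%m/%d/
    store_as gzip
    <format>
      @type json
    </format>
    <buffer time>
      timekey 1h
      timekey_use_utc true
      timekey_wait 5m
    </buffer>
  </store>
</match>
EOF
echo "wrote $OUT"
```

Each <store> keeps its own buffer, so a slow or failing S3 upload does not hold back the local file output.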
Step 5: Understanding Buffer Settings
Buffers are Fluentd’s reliability mechanism: events are first written to a buffer (memory or file), then flushed to the output. Key parameters:
- @type file: persists the buffer to disk, so queued chunks survive td-agent restarts
- flush_interval: how often to attempt flushing when flush_mode is interval (e.g., 10s, 1m)
- chunk_limit_size: maximum size of a single chunk; a full chunk is queued for flushing and a new one is started
- total_limit_size: maximum total buffer size; what happens when it is reached is controlled by overflow_action
- retry_max_times: number of retries before a chunk is given up (discarded, or handed to a <secondary> output if one is configured)
- overflow_action: what to do when the buffer is full: block, drop_oldest_chunk, or throw_exception (the default)
<buffer>
  @type file
  path /var/lib/td-agent/buffer/myoutput
  flush_mode interval
  flush_interval 30s
  flush_thread_count 2
  chunk_limit_size 16MB
  total_limit_size 1GB
  retry_type exponential_backoff
  retry_wait 1s
  retry_max_interval 30s
  retry_max_times 10
  overflow_action block
</buffer>
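With retry_type exponential_backoff, the wait before each retry roughly doubles from retry_wait until it reaches retry_max_interval. The following back-of-the-envelope sketch applies this to the settings above. It assumes the default retry_exponential_backoff_base of 2 and ignores the jitter that Fluentd adds by default (retry_randomize). The helper name retry_schedule is hypothetical:

```bash
#!/usr/bin/env bash
# Approximate wait before each retry for retry_type exponential_backoff.
# Assumptions: the default retry_exponential_backoff_base of 2, and no
# jitter (Fluentd randomizes each interval by default via retry_randomize).
# Usage: retry_schedule WAIT_S MAX_INTERVAL_S MAX_TIMES
retry_schedule() {
  local wait=$1 max_interval=$2 max_times=$3
  local n interval total=0
  for (( n = 0; n < max_times; n++ )); do
    interval=$(( wait * (1 << n) ))          # wait, 2*wait, 4*wait, ...
    if (( interval > max_interval )); then
      interval=$max_interval                 # capped by retry_max_interval
    fi
    total=$(( total + interval ))
    echo "retry $(( n + 1 )): wait ${interval}s (cumulative ${total}s)"
  done
}

# Values from the buffer block above:
# retry_wait 1s, retry_max_interval 30s, retry_max_times 10
retry_schedule 1 30 10
```

The last line reads retry 10: wait 30s (cumulative 181s). A chunk that keeps failing is therefore given up after roughly three minutes of waiting, long before the default retry_timeout of 72h would apply.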
Step 6: Testing with echo and the HTTP Input Plugin
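For quick interactive testing, enable Fluentd's built-in HTTP input plugin. Also add a stdout output, so that test events are written to td-agent's own log instead of being reported as unmatched:
<source>
  @type http
  port 8888
  bind 0.0.0.0
</source>
<match test.** debug.**>
  @type stdout
</match>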
Restart td-agent and send a test event:
sudo systemctl restart td-agent
# Send a test JSON event with tag "test.http" (the URL path becomes the tag)
curl -s -X POST -d 'json={"message":"hello from curl","level":"info"}' \
  http://localhost:8888/test.http
# Check td-agent's own log to confirm receipt
tail -f /var/log/td-agent/td-agent.log
You can also pipe JSON straight into the forward input on port 24224 with the fluent-cat utility, which td-agent 4 installs under /opt/td-agent/bin:
# One-shot test: send a JSON record to the forward input with tag "debug.test"
echo '{"message":"test event","host":"rhel7-node"}' | \
  /opt/td-agent/bin/fluent-cat --host 127.0.0.1 --port 24224 debug.test
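For a somewhat heavier smoke test than a single echo, you can generate a batch of numbered JSON records and pipe them through fluent-cat in one go. This sketch assumes td-agent 4's /opt/td-agent/bin/fluent-cat path and that fluent-cat accepts one JSON object per line on stdin. gen_events and the debug.load tag are hypothetical names:

```bash
#!/usr/bin/env bash
# gen_events is an illustrative helper that prints COUNT JSON records,
# one per line, each with a sequence number and the local hostname.
gen_events() {
  local count=$1 i
  for (( i = 1; i <= count; i++ )); do
    printf '{"message":"load test %d","seq":%d,"host":"%s"}\n' "$i" "$i" "$HOSTNAME"
  done
}

FLUENT_CAT=/opt/td-agent/bin/fluent-cat   # td-agent 4 location (assumption)
if [[ -x $FLUENT_CAT ]]; then
  gen_events 100 | "$FLUENT_CAT" --host 127.0.0.1 --port 24224 debug.load
else
  echo "fluent-cat not found at $FLUENT_CAT" >&2
fi
```

Comparing the seq values that reach your output against the count you sent is a quick way to spot dropped events.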
Step 7: Fluentd vs Fluent Bit
Fluent Bit is a lightweight sibling project written entirely in C. It is designed for edge nodes, containers, and resource-constrained environments where the overhead of the Ruby runtime behind Fluentd (td-agent) would be prohibitive. A typical Kubernetes or microservices architecture runs Fluent Bit as a DaemonSet on every node to collect and forward logs. A central Fluentd aggregator then handles enrichment, buffering, and fan-out to multiple destinations.
On RHEL 7 bare-metal servers, resources are usually plentiful and rich plugin support (complex filtering, dozens of output targets) matters, so td-agent is the better choice. For IoT or embedded scenarios, Fluent Bit's much smaller memory and CPU footprint usually decides the question.
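The edge half of that architecture can be sketched as a Fluent Bit configuration that tails the nginx access log and ships it to a central td-agent over the forward protocol. The sketch makes several assumptions. aggregator.example.com is a placeholder, and the receiving td-agent must have an @type forward source on port 24224. The file is written to ./fluent-bit.conf for review, because the installed location depends on how Fluent Bit was packaged. Without a Parser entry, Fluent Bit's tail input ships each raw line in a log field, so the aggregator would receive unparsed nginx lines.

```bash
#!/usr/bin/env bash
# Sketch of an edge-node Fluent Bit config (classic INI-style format):
# tail the nginx access log and forward every record to a central td-agent.
# aggregator.example.com is a placeholder for your td-agent host.
OUT=${OUT:-./fluent-bit.conf}
cat > "$OUT" <<'EOF'
[SERVICE]
    Flush        5
    Log_Level    info

[INPUT]
    Name         tail
    Path         /var/log/nginx/access.log
    Tag          app.nginx.access

[OUTPUT]
    Name         forward
    Match        *
    Host         aggregator.example.com
    Port         24224
EOF
echo "wrote $OUT"
```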
Step 8: Open Firewall Ports
# Forward protocol: TCP for data, UDP for heartbeats (remote Fluent Bit or Fluentd forwarders)
sudo firewall-cmd --permanent --add-port=24224/tcp
sudo firewall-cmd --permanent --add-port=24224/udp
# Syslog UDP input
sudo firewall-cmd --permanent --add-port=5140/udp
# HTTP input (open only if remote clients need it; the Step 6 test uses localhost)
sudo firewall-cmd --permanent --add-port=8888/tcp
sudo firewall-cmd --reload
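To double-check the result, compare the ports that firewalld reports with the ones td-agent needs. missing_ports is a hypothetical helper that takes the output of firewall-cmd --list-ports followed by the required port/protocol pairs:

```bash
#!/usr/bin/env bash
# missing_ports is an illustrative helper: $1 is the output of
# `firewall-cmd --list-ports`, the remaining arguments are required
# port/protocol pairs. It prints each required entry that is absent
# and returns 1 if any were missing.
missing_ports() {
  local listed=" ${1//$'\n'/ } " port rc=0   # pad so every entry is space-delimited
  shift
  for port in "$@"; do
    if [[ $listed != *" $port "* ]]; then
      echo "$port"
      rc=1
    fi
  done
  return "$rc"
}
```

For example, missing_ports "$(sudo firewall-cmd --list-ports)" 24224/tcp 24224/udp 5140/udp 8888/tcp prints any entry that is still closed in the default zone and returns non-zero.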
After restarting td-agent with your final configuration, verify that events flow end to end by tailing the output file or querying your Elasticsearch index. Fluentd's structured approach to log routing treats every log line as a typed, tagged event rather than a raw string, and this pays dividends as your infrastructure grows. Centralizing logs in Elasticsearch gives you full-text search and Kibana dashboards; archiving to S3 gives you cost-effective long-term retention. Once you are comfortable with the basics, explore Fluentd's built-in parser filter (@type parser) for custom regex-based parsing, and fluent-plugin-rewrite-tag-filter for dynamic tag manipulation based on record content.