How to Use rsync for Efficient File Synchronisation on RHEL 7
The rsync utility is the de facto standard for file synchronisation on Linux. Unlike a simple cp or scp, rsync transfers only the differences between source and destination, making it extraordinarily efficient for large datasets and incremental backups. On Red Hat Enterprise Linux 7, rsync comes pre-installed and integrates well with SSH, cron and the system’s firewall infrastructure. This tutorial covers rsync from the basics through to daemon mode, bandwidth limiting, and automated backup scheduling — everything you need to use rsync confidently in production.
Prerequisites
- RHEL 7 system with root or sudo access.
- rsync installed: yum install -y rsync (usually present by default).
- SSH access configured for remote operations.
- Basic familiarity with the command line and cron.
Step 1: Understanding Basic rsync Syntax
The fundamental rsync command follows the same source/destination pattern as cp, but with a critical subtlety around trailing slashes on directory paths.
# Basic syntax
rsync [OPTIONS] SOURCE DESTINATION
# Copy a single file
rsync /home/john/report.pdf /backup/report.pdf
# Copy a directory (trailing slash on source matters!)
# -a (archive mode, covered in Step 2) makes rsync recurse; without -r or -a it skips directories
# With trailing slash: copies CONTENTS of source into destination
rsync -a /home/john/docs/ /backup/docs/
# Without trailing slash: copies the directory ITSELF into destination
# Result: /backup/ will contain a "docs" subdirectory
rsync -a /home/john/docs /backup/
The trailing slash rule is one of the most common sources of rsync confusion. A good mental model: the trailing slash means “the contents of this directory”, while no trailing slash means “this directory itself”.
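The rule is easy to check in a scratch directory. The sketch below is illustrative only: it builds a throwaway tree under a temporary directory (the names demo, with_slash and no_slash are invented for this example) and runs both forms so the resulting layouts can be compared.

```shell
# Build a throwaway source tree (all paths are hypothetical scratch paths)
demo=$(mktemp -d)
mkdir -p "$demo/src/docs" "$demo/with_slash" "$demo/no_slash"
echo "hello" > "$demo/src/docs/notes.txt"

# Trailing slash: the CONTENTS of docs/ land directly in the destination
rsync -a "$demo/src/docs/" "$demo/with_slash/"

# No trailing slash: the docs directory ITSELF is created inside the destination
rsync -a "$demo/src/docs" "$demo/no_slash/"

# Compare the two layouts
find "$demo/with_slash" "$demo/no_slash" -type f
```

The first destination should hold notes.txt at its top level, while the second should hold it one level down, inside a docs subdirectory.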
Step 2: Essential rsync Flags (-avz)
The combination -avz is the most commonly used flag set for everyday rsync operations:
- -a (archive): Enables archive mode, which combines recursion (-r), preserve symlinks (-l), preserve permissions (-p), preserve timestamps (-t), preserve group (-g), preserve owner (-o), and preserve device and special files (-D). This single flag replaces many individual flags. It does not preserve hard links (-H), ACLs (-A) or extended attributes (-X); add those flags if you need them.
- -v (verbose): Print the name of each file being transferred. Use -vv for even more detail.
- -z (compress): Compress file data during transfer. Reduces bandwidth usage, especially on slow networks. Not beneficial on fast local networks as it adds CPU overhead.
# Standard archive sync with verbose output and compression
rsync -avz /home/john/projects/ /backup/projects/
# Sample output:
# sending incremental file list
# ./
# app.py
# config.yaml
# static/
# static/logo.png
# sent 45,231 bytes received 92 bytes 90,646.00 bytes/sec
# total size is 1,024,512 speedup is 22.62
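Because rsync skips files whose size and modification time already match at the destination, repeating the same command is cheap. The following is a small sketch of that behaviour, using scratch directories and the --out-format option to print only the names of items that were actually updated. The variable names are invented for the example.

```shell
# Scratch source and destination (hypothetical paths under a temp directory)
work=$(mktemp -d)
mkdir -p "$work/src" "$work/dst"
echo "print('hi')"  > "$work/src/app.py"
echo "debug: false" > "$work/src/config.yaml"

# First run: both files are new at the destination, so both are transferred
first_run=$(rsync -a --out-format='%n' "$work/src/" "$work/dst/")

# Second run: nothing has changed, so rsync has nothing to update
second_run=$(rsync -a --out-format='%n' "$work/src/" "$work/dst/")

echo "first run updated:  $(echo "$first_run" | grep -c .) item(s)"
echo "second run updated: $(echo "$second_run" | grep -c .) item(s)"
```

The second run should report no updated items, which is what makes frequent scheduled syncs practical.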
Step 3: Deleting Files at the Destination with --delete
By default rsync only adds and updates files at the destination — it never removes files that have been deleted from the source. To create a true mirror, add --delete.
# Mirror source to destination exactly, including deletions
rsync -avz --delete /home/john/projects/ /backup/projects/
# --delete-before: delete at destination before the transfer starts
# --delete-during: delete as each directory is processed (the default with rsync 3.x on both ends)
# --delete-delay: find deletions during the transfer, apply them after it completes
# --delete-after: delete at destination after transferring new files
# SAFETY: Use --dry-run first to see what would be deleted
rsync -avz --delete --dry-run /home/john/projects/ /backup/projects/
# Files prefixed with "deleting" in the output would be removed
Always test with --dry-run (or its alias -n) before running a destructive sync for the first time. This is especially important when --delete is involved.
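A minimal sketch of that workflow, run against throwaway directories so nothing real is at risk. The directory and file names are made up for illustration.

```shell
# Scratch mirror: stale.txt exists only at the destination
mirror=$(mktemp -d)
mkdir -p "$mirror/src" "$mirror/dst"
echo "keep" > "$mirror/src/keep.txt"
echo "old"  > "$mirror/dst/stale.txt"

# Dry run: rsync reports what it would delete but changes nothing
dry_output=$(rsync -av --delete --dry-run "$mirror/src/" "$mirror/dst/")
echo "$dry_output" | grep '^deleting'
after_dry=$(ls "$mirror/dst")      # stale.txt is still there

# Real run: stale.txt is removed and keep.txt is copied
rsync -a --delete "$mirror/src/" "$mirror/dst/"
after_real=$(ls "$mirror/dst")

echo "after dry run:  $after_dry"
echo "after real run: $after_real"
```

Reviewing the "deleting" lines from the dry run before the real run is the cheapest safeguard against a mistyped source or destination path.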
Step 4: Excluding Files and Directories
The --exclude and --exclude-from options allow you to skip files matching specific patterns.
# Exclude files matching a pattern
rsync -avz --exclude="*.log" /var/www/html/ /backup/webroot/
# Exclude multiple patterns
rsync -avz \
  --exclude="*.log" \
  --exclude="*.tmp" \
  --exclude=".git/" \
  --exclude="node_modules/" \
  /home/john/projects/ /backup/projects/
# Exclude using a file containing patterns (one per line)
cat > /etc/rsync-excludes.txt << 'EOF'
*.log
*.tmp
*.swp
.git/
node_modules/
__pycache__/
*.pyc
EOF
rsync -avz --exclude-from="/etc/rsync-excludes.txt" /home/john/projects/ /backup/projects/
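Exclude patterns are worth testing before relying on them. The sketch below builds a small scratch project (all names are hypothetical), previews the transfer with a dry run, then confirms that the excluded items never reach the destination.

```shell
# Scratch project with some files that should be skipped
proj=$(mktemp -d)
mkdir -p "$proj/src/node_modules/lib" "$proj/src/app" "$proj/dst"
echo "code"  > "$proj/src/app/main.py"
echo "noise" > "$proj/src/app/debug.log"
echo "dep"   > "$proj/src/node_modules/lib/index.js"

# Pattern file, one pattern per line (same format as /etc/rsync-excludes.txt)
printf '%s\n' '*.log' 'node_modules/' > "$proj/excludes.txt"

# Dry run (-n): list what WOULD be copied without touching the destination
rsync -an --out-format='%n' --exclude-from="$proj/excludes.txt" "$proj/src/" "$proj/dst/"

# Real run, then list what actually arrived
rsync -a --exclude-from="$proj/excludes.txt" "$proj/src/" "$proj/dst/"
find "$proj/dst" -type f
```

Only app/main.py should appear under the destination; debug.log and the node_modules tree are filtered out by the pattern file.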
Step 5: Remote Synchronisation Over SSH
rsync can synchronise files between two machines using SSH as the transport layer. No special daemon setup is required: you only need SSH access and rsync installed on both machines.
# user@remote-host in the examples below is a placeholder for your own login and server
# Push: copy from local to remote server
rsync -avz /home/john/projects/ user@remote-host:/backup/projects/
# Pull: copy from remote server to local machine
rsync -avz user@remote-host:/var/www/html/ /local/backup/webroot/
# Use a non-standard SSH port
rsync -avz -e "ssh -p 2222" /home/john/docs/ user@remote-host:/backup/docs/
# Use SSH with a specific identity file (key-based auth)
rsync -avz -e "ssh -i /root/.ssh/backup_key" /data/ user@remote-host:/data/
# Full mirror with delete, compression, a non-standard SSH port and an identity file
rsync -avz --delete \
  -e "ssh -p 2222 -i /root/.ssh/backup_key" \
  /var/www/html/ \
  user@remote-host:/backup/webroot/
For automated backups over SSH, use key-based authentication and restrict the backup key in ~/.ssh/authorized_keys using the command= option to limit what the key can do.
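One way to do this, sketched below under stated assumptions, is to force the key through rrsync, a helper script distributed with rsync that confines a connection to a single directory. The install path /usr/local/bin/rrsync, the allowed directory and the public key string are all placeholders chosen for illustration; check where rrsync lives on your system before using it.

```shell
# Hypothetical values: adjust all three to your environment
rrsync_path="/usr/local/bin/rrsync"   # assumed location of the rrsync helper
allowed_dir="/backup/webroot"         # the only directory tree this key may touch
pubkey="ssh-ed25519 AAAAC3...placeholder backup@webserver"   # contents of backup_key.pub

# Compose the authorized_keys entry: forced command plus options that
# switch off forwarding and terminal allocation for this key
entry="command=\"$rrsync_path $allowed_dir\",no-agent-forwarding,no-port-forwarding,no-pty,no-X11-forwarding $pubkey"

# Print the line; append it to ~/.ssh/authorized_keys on the backup server
printf '%s\n' "$entry"
```

With an entry like this, the destination path given on the client is interpreted relative to the allowed directory, and the key cannot be used to open an interactive shell.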
Step 6: Showing Progress During Transfer
For large transfers, seeing progress is important to know the sync is still running and to estimate completion time.
# Show per-file progress (percentage, bytes, transfer rate)
rsync -avz --progress /data/large-dataset/ /backup/large-dataset/
# --info=progress2 shows overall progress for the whole transfer (more useful for many small files; needs rsync 3.1.0 or later)
rsync -avz --info=progress2 /data/large-dataset/ /backup/large-dataset/
# Sample output:
# 125,892,608 42% 98.54MB/s 0:00:02 (xfr#127, to-chk=341/523)
# Add --stats to per-file progress for a summary of file counts and bytes at the end
rsync -avz --progress --stats /data/ /backup/data/
Step 7: Limiting Bandwidth Usage
When running rsync over a shared network link, limiting bandwidth prevents it from saturating the connection.
# Limit transfer rate to 10 MiB/s (a bare number is in units of 1024 bytes per second)
# user@remote-host is a placeholder for your own login and server
rsync -avz --bwlimit=10240 /data/ user@remote-host:/backup/data/
# Limit to 5 MiB/s
rsync -avz --bwlimit=5120 /var/www/ user@remote-host:/backup/www/
# With rsync 3.1.0 or later you can also use a K or M suffix
rsync -avz --bwlimit=10M /data/ /backup/data/
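Link speeds are usually quoted in megabits per second, while a bare --bwlimit value counts units of 1024 bytes per second, so a small conversion helps. The sketch below assumes a hypothetical 100 Mbit/s link of which rsync may use half; both figures are examples, not recommendations.

```shell
link_mbit=100       # hypothetical link speed in megabits per second
share_percent=50    # hypothetical share of the link rsync may use

# Mbit/s -> bytes/s (x 1,000,000 / 8), take the share, then convert to
# units of 1024 bytes per second as expected by --bwlimit
bwlimit_kib=$(( link_mbit * 1000 * 1000 / 8 * share_percent / 100 / 1024 ))

echo "rsync -avz --bwlimit=$bwlimit_kib /data/ /backup/data/"
# prints: rsync -avz --bwlimit=6103 /data/ /backup/data/
```

Integer arithmetic rounds down, which errs on the side of using slightly less bandwidth than the target.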
Step 8: Automating Backups with cron
Combining rsync with cron provides a powerful, lightweight backup solution without requiring dedicated backup software.
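# Create a backup script
# LOG, SSH_KEY and REMOTE are placeholder values: adjust them to your environment
cat > /usr/local/bin/rsync-backup.sh << 'EOF'
#!/bin/bash
LOG="/var/log/rsync-backup.log"
SSH_KEY="/root/.ssh/backup_key"
REMOTE="user@remote-host"
DATE=$(date '+%Y-%m-%d %H:%M:%S')
echo "[$DATE] Starting backup" >> "$LOG"
# StrictHostKeyChecking=no skips host key verification; safer is to accept
# the server's key once in known_hosts and then drop this option
rsync -avz --delete \
  --exclude="*.tmp" \
  --exclude="*.log" \
  -e "ssh -i $SSH_KEY -o StrictHostKeyChecking=no" \
  /var/www/html/ \
  "$REMOTE":/backup/webroot/ >> "$LOG" 2>&1
if [ $? -eq 0 ]; then
  echo "[$DATE] Backup completed successfully" >> "$LOG"
else
  echo "[$DATE] Backup FAILED" >> "$LOG"
fi
EOF
chmod +x /usr/local/bin/rsync-backup.sh
# Schedule it to run daily at 2:00 AM via cron
crontab -e
# Add this line:
# 0 2 * * * /usr/local/bin/rsync-backup.sh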
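The script's success check treats every non-zero exit status as a failure. On a live web root, rsync may exit with status 24 when files disappear between the file list being built and the transfer, which is usually harmless. The function below is a possible refinement, not part of the original script; the name classify_rsync_exit and the log levels are invented for this sketch.

```shell
# Map an rsync exit status to a log level (sketch; extend with other codes as needed)
classify_rsync_exit() {
    case "$1" in
        0)  echo "OK" ;;
        24) echo "WARNING" ;;   # some source files vanished during the transfer
        *)  echo "FAILED" ;;    # e.g. 23 = partial transfer due to error, 30 = timeout
    esac
}

# Classify a few representative exit codes
for code in 0 24 23 30; do
    echo "exit $code -> $(classify_rsync_exit "$code")"
done
```

In the backup script, this would mean saving the status with status=$? immediately after the rsync command and logging the result of classify_rsync_exit "$status" instead of a plain success or failure message.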
Step 9: rsync Daemon Mode
rsync can run as a daemon (rsyncd) to serve files without requiring SSH. This is useful in controlled environments where you want to avoid SSH overhead and manage access per module. Daemon traffic is not encrypted, so use this mode only on trusted networks.
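# Install rsync (if not already installed)
yum install -y rsync
# Create the rsyncd configuration file
# The module path and the user name "backupuser" are placeholders: adjust them to your environment
cat > /etc/rsyncd.conf << 'EOF'
[webbackup]
    path = /var/www/html
    read only = yes
    auth users = backupuser
    secrets file = /etc/rsyncd.secrets
EOF
# Create the secrets file (format: username:password) and restrict its permissions
echo "backupuser:CHANGE_ME" > /etc/rsyncd.secrets
chmod 600 /etc/rsyncd.secrets
# Enable and start the rsyncd service
systemctl enable rsyncd
systemctl start rsyncd
# Open the firewall port (rsync uses port 873)
firewall-cmd --permanent --add-service=rsyncd
firewall-cmd --reload
# Connect to the rsync daemon from a client (user and host are placeholders)
rsync -avz backupuser@remote-host::webbackup /local/backup/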
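When a client connects to an authenticated module interactively, rsync prompts for the password. For unattended jobs, the client can read it from a file via --password-file instead. The sketch below is illustrative: the user name backupuser, the host remote-host and the password are placeholders, and a temporary directory stands in for wherever you would really keep the file.

```shell
# Scratch location for the demo; in practice keep the file somewhere private,
# such as the home directory of the account that runs the job
pwdir=$(mktemp -d)
pwfile="$pwdir/rsync-webbackup.pass"

# The file holds only the password (no user name) and must match the
# entry for that user in the daemon's /etc/rsyncd.secrets
(umask 077; printf '%s\n' 'CHANGE_ME' > "$pwfile")
chmod 600 "$pwfile"    # rsync rejects a password file that other users can read

# Print the command a cron job could run without prompting
echo rsync -avz --password-file="$pwfile" backupuser@remote-host::webbackup /local/backup/
```

The same permission rule applies on both sides: the client's password file and the daemon's secrets file should be readable only by their owner.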
Conclusion
rsync is a remarkably versatile tool that covers a wide spectrum of use cases, from quick local file copies through to automated encrypted remote backups with bandwidth throttling. On RHEL 7, the combination of rsync with SSH key authentication, cron scheduling, and thoughtful use of --exclude and --delete gives you a production-grade backup and synchronisation solution with no additional software required. Always test with --dry-run before committing to destructive operations, monitor your backup logs regularly, and consider rotating or timestamping backup directories for point-in-time recovery capability.