A Simple & Powerful Log Aggregation Pipeline with Logstash and Filebeat

Imagine you’re a DevOps engineer. You and your dream team have just released quite a big e-commerce system for your company. After celebrating with the team, you get a request from a QA engineer to check whether the order with ID xxx-yyy succeeded.

Full of youthful enthusiasm, you head straight back to your desk, open a terminal on your laptop and SSH into the database server. There’s no database record for the order ID QA gave you, so you then SSH (through a tunnel) into the application (app) servers, where the application logic runs, to check the application log files.

Unfortunately, there are many app servers to check (say, 10), so you have to connect to each of them and inspect the logs one by one, from the 1st server to the 10th. It’s a boring task, and it gets really frustrating if the relevant log lines turn out to be on the 10th server.

Now, I bet you don’t want to go through a similarly unpleasant experience the next time QA asks for help. Keep reading and I’ll show you how!

Requirements

As the title suggests, we’re going to use Logstash and Filebeat, two tools from elastic.co.

First and foremost, before rolling up our sleeves, let’s craft some requirements for the target solution:

  • It must be fast & lightweight, because we don’t want to pay extra for server hardware upgrades;
  • Aggregated log files stored on the destination server must be grouped by day and by source server hostname to make DevOps’ life easier;
  • The sub-folder tree of the aggregated log folder should be shallow, so it is easy to browse (e.g. /home/centralized-log/2020-03-19/app1/var-log/messages is better than /home/centralized-log/2020-03-19/app1/var/log/messages);
  • Aggregated log records should be kept exactly as they are on the source servers, i.e. we don’t need to parse log lines on the destination server. Why? To keep the overall cost as low as possible, we don’t install Elasticsearch, so there is no need to turn unstructured log lines into structured ones. Besides, plain old log files are easier for humans to read and won’t take up more disk space than structured ones;
  • The solution must provide a way to remove old logs automatically.

Simple enough? Let’s get to work!

Deployment Process

Topology

Deployment diagram

Suppose that all our servers use CentOS 7 as the host OS, and:

  • Destination server’s internal IP: 192.168.1.1
  • VM 1's internal IP: 192.168.1.2
  • VM 2's internal IP: 192.168.1.3

Destination server setup

  • Install JDK 1.8.0_151 at /home/jdk
  • Install Logstash 7.6.0 under /home (i.e. /home/logstash-7.6.0)
  • Create the config/app.conf file inside the Logstash folder; the Logstash process will listen on port 5044:
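input {
  # Receive events shipped by Filebeat
  beats {
    port => "5044"
  }
}
filter {
  # Copy the shipping host name and the source file path into working fields
  mutate {
    add_field => {
      "agent_host" => "%{[host][name]}"
      "agent_src_path" => "%{[log][file][path]}"
    }
  }
  # "/var/log/messages" -> ["", "var", "log", "messages"]
  mutate {
    split => {
      "agent_src_path" => "/"
    }
  }
  # Remember the file name (last element) and its index
  ruby {
    code => "
      last_idx = event.get('agent_src_path').length-1;
      event.set('agent_src_path_last_index', last_idx);
      event.set('agent_log_file_name', event.get('agent_src_path')[last_idx]);
    "
  }
  # Drop the file name and the leading empty element -> ["var", "log"]
  # (removing the last index before index 0 keeps the saved index valid)
  mutate {
    remove_field => [ "[agent_src_path][%{agent_src_path_last_index}]", "[agent_src_path][0]" ]
  }
  # ["var", "log"] -> "var-log"
  mutate {
    join => { "agent_src_path" => "-" }
    remove_field => ["agent_src_path_last_index"]
  }
}
output {
  # Echo events to the console (discarded, since startup.sh redirects output to /dev/null)
  stdout {
    #codec => line
  }
  # One file per day / host / flattened source path, containing only the raw log lines
  file {
    path => "/home/centralized-log/%{+YYYY-MM-dd}/%{agent_host}/%{agent_src_path}/%{agent_log_file_name}"
    codec => line { format => "%{message}"}
  }
}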
  • Create the startup.sh script:
#!/bin/bash
JAVA_HOME=/home/jdk
export JAVA_HOME
cd /home/logstash-7.6.0
# Run in the background and reload app.conf automatically when it changes
./bin/logstash -f config/app.conf --config.reload.automatic >/dev/null 2>&1 &
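  • And the shutdown.sh script:
#!/bin/bash
pid=`pgrep -f logstash`
# Plain kill sends SIGTERM, letting Logstash shut down gracefully and flush
# in-flight events; kill -9 (SIGKILL) would terminate it immediately
kill $pid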
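To sanity-check which file a given log line will end up in, the filter chain above can be mirrored in a few lines of bash. This is a minimal sketch for testing the layout by hand, not part of the pipeline itself; `aggregated_path` is a hypothetical helper name, and it assumes the same /home/centralized-log root used in app.conf.

```shell
#!/bin/bash
# Hypothetical helper: rebuild the output path the Logstash filters produce.
# Arguments: shipping host name, source file path, day (YYYY-MM-dd).
aggregated_path() {
  local host="$1" src="$2" day="$3"
  local dir="${src%/*}"    # drop the file name: /var/log/messages -> /var/log
  local file="${src##*/}"  # keep only the file name: messages
  dir="${dir#/}"           # drop the leading slash (the empty element [0] after split)
  dir="${dir//\//-}"       # join the remaining components with "-": var/log -> var-log
  printf '/home/centralized-log/%s/%s/%s/%s\n' "$day" "$host" "$dir" "$file"
}

aggregated_path app1 /var/log/messages 2020-03-19
# -> /home/centralized-log/2020-03-19/app1/var-log/messages
```

One thing worth keeping in mind: Logstash renders `%{+YYYY-MM-dd}` from the event’s @timestamp in UTC, so lines written close to local midnight may land in the neighbouring day’s folder.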

Source servers setup

  • Install Filebeat 7.6.0 (linux-x86_64 build) under /home (i.e. /home/filebeat-7.6.0-linux-x86_64)
  • Create the app.yml file in the Filebeat folder:
filebeat.inputs:
- type: log
  paths:
    # general system logs
    - /var/log/messages*
    - /var/log/audit/audit.log*
    # your application log path here
    # ...
    # ...

logging.metrics.enabled: false

# Ship events to the Logstash beats input on the destination server
output.logstash:
  hosts: ["192.168.1.1:5044"]
  • Create the startup.sh script:
#!/bin/bash
cd /home/filebeat-7.6.0-linux-x86_64
# -c: config file to load; -d "publish": enable debug output for the "publish" selector
./filebeat -c app.yml -d "publish" >/dev/null 2>&1 &
  • And the shutdown.sh script:
#!/bin/bash
pid=`pgrep -f filebeat`
# Plain kill sends SIGTERM so Filebeat can stop cleanly instead of being killed outright
kill $pid
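Before starting Filebeat, it can help to confirm that each source server can actually reach the Logstash beats input on 192.168.1.1:5044 (on CentOS 7, firewalld on the destination server may need that port opened). A minimal sketch, assuming bash with its built-in /dev/tcp redirection and GNU coreutils’ `timeout`; `port_open` is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: succeed (exit 0) if a TCP connection to host:port
# opens within the timeout (default 3 seconds), fail otherwise.
# Uses bash's /dev/tcp redirection, so no nc or telnet is needed.
port_open() {
  local host="$1" port="$2" secs="${3:-3}"
  timeout "$secs" bash -c ": > /dev/tcp/${host}/${port}" 2>/dev/null
}
```

For example, run `port_open 192.168.1.1 5044 && echo reachable || echo unreachable` on VM 1 and VM 2.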

Results

Just run all the startup.sh scripts and check the results yourself. For reference, here’s what I got:

# tree /home/centralized-log/ -d
/home/centralized-log/
├── 2020-03-17
│ ├── app1
│ │ ├── home-RestApi-cython-dist-logs
│ │ └── var-log
│ ├── app2
│ │ ├── home-RestApi-cython-dist-logs
│ │ └── var-log
├── 2020-03-18
│ ├── app1
│ │ ├── home-RestApi-cython-dist-logs
│ │ └── var-log
│ ├── app2
│ │ ├── home-RestApi-cython-dist-logs
│ │ └── var-log
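With everything in one place, the QA request from the introduction becomes a single search instead of ten SSH sessions. A minimal sketch, assuming the folder layout shown above; `find_in_logs` is a hypothetical helper name and xxx-yyy is the placeholder order ID from the story.

```shell
#!/bin/bash
# Hypothetical helper: search every aggregated log under one day's folder
# and print "file:line:content" for each line containing the given string.
find_in_logs() {
  local root="$1" day="$2" needle="$3"
  # -r: recurse into host/path folders, -n: show line numbers,
  # -F: treat the needle as a fixed string rather than a regex
  grep -rnF -- "$needle" "${root}/${day}"
}
```

For example, `find_in_logs /home/centralized-log 2020-03-18 xxx-yyy` scans both app1 and app2 for that day; like grep, it exits non-zero when nothing matches.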

Last Thoughts

You may be wondering how to remove old logs automatically (the final requirement). It’s a trivial task, so I’ll leave it as your homework :)
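If you want a starting point, here is one possible approach, not necessarily the one the author had in mind. It assumes the per-day folders are named YYYY-MM-dd, as produced by the file output above; because that format sorts in date order, folder names can be compared with a cutoff date as plain strings. It relies on GNU `date -d`, which CentOS 7 provides; `purge_old_logs` is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: delete per-day folders older than N days.
# Arguments: aggregated log root, number of days to keep.
purge_old_logs() {
  local root="$1" keep_days="$2"
  local cutoff
  cutoff=$(date -d "${keep_days} days ago" +%F) || return 1  # e.g. 2020-03-12
  local dir name
  for dir in "$root"/[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]; do
    [ -d "$dir" ] || continue  # the glob matched nothing
    name="${dir##*/}"
    # YYYY-MM-dd names compare correctly as strings
    if [[ "$name" < "$cutoff" ]]; then
      echo "Removing $dir"
      rm -rf -- "$dir"
    fi
  done
}
```

For example, a daily cron entry such as `0 1 * * * /home/purge-logs.sh`, where the hypothetical purge-logs.sh defines this function and calls `purge_old_logs /home/centralized-log 30`, would keep roughly the last 30 days of logs.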

Now, doesn’t your DevOps life feel a bit easier? :) In the next post, we’ll explore how Ansible can make it even better.

Feel free to follow me for further posts. Thanks for reading!

Happy DevOpsing \(^_^)/

Backend Leader @ Pingcom, Runner