Log Management using logstash, kibana, graylog2 on CentOS, RHEL, Fedora Part-1

Overview

So my rsyslog setup with LogAnalyzer didn't last ten full days. The logs were being collected from servers running all kinds of applications, and it worked for a while before everything started crashing on me. The setup was fine with a few servers, but as I kept adding more, LogAnalyzer slowed down: filling a MySQL database with thousands of entries while querying that same data simultaneously is a slow process. LogAnalyzer also ships with only basic search, and I wanted something more; if you have used Splunk you probably know what I mean. This post is about upgrading my existing setup with some cool products such as logstash and kibana.


Install some dependencies


# yum install gcc make automake autoconf curl-devel openssl-devel zlib-devel httpd-devel apr-devel apr-util-devel sqlite-devel ruby-rdoc ruby-devel gcc-c++ java git

The first change is to replace MySQL with elasticsearch for storing the indexed logs. elasticsearch is another NoSQL implementation that stores all documents in their original form. It achieves fast search responses because, instead of scanning the text directly, it searches an index, just as you jump to the index at the back of a book to find something quickly.
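To see that index-based search in action once elasticsearch is up (installation follows below), you can index a document over the HTTP API and search it back. The index name demo and the field message here are just made-up examples:


# curl -XPUT 'http://localhost:9200/demo/logline/1' -d '{"message": "user login failed"}'
# curl 'http://localhost:9200/demo/_search?q=message:failed&pretty'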

Since there is no rpm package available for elasticsearch, let's build one from the tarball.

To start the build process, first install the RPM build tools:


# yum install rpm-build rpmdevtools

A SPEC file for elasticsearch has already been written and made available by Tavis Aitken (big thanks to him); it can be downloaded from his GitHub page:


# wget https://github.com/tavisto/elasticsearch-rpms/tarball/master
# tar zxvf master
# rpmdev-setuptree
# cp -r tavisto-elasticsearch-rpms-4b006bb/SPECS/* rpmbuild/SPECS/
# cp -r tavisto-elasticsearch-rpms-4b006bb/SOURCES/* rpmbuild/SOURCES/
# spectool -g rpmbuild/SPECS/elasticsearch.spec
# mv elasticsearch-0.20.4.tar.gz rpmbuild/SOURCES/elasticsearch-0.20.4.tar.gz

Start the build process and install the package


# rpmbuild -bb rpmbuild/SPECS/elasticsearch.spec
# rpm -ivh rpmbuild/RPMS/x86_64/elasticsearch-0.20.4-3.el6.x86_64.rpm
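Before moving on, it does no harm to verify the package installed cleanly and to see what it shipped:


# rpm -qi elasticsearch
# rpm -ql elasticsearch | head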

Next, set up elasticsearch by editing its basic configuration: the cluster name, node name, data paths, the IP to listen on, and so on.
Open the elasticsearch.yml file in an editor of your choice:


# vi /etc/elasticsearch/elasticsearch.yml

Make some basic changes to it.


cluster.name: elasticsearch
node.name: "Keenan Kimble"
path.conf: /etc/elasticsearch
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: _eth0:ipv4_

Start the service and make sure it starts across server reboots.


# service elasticsearch start
# chkconfig elasticsearch on
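By default elasticsearch answers HTTP on port 9200, so a quick curl will confirm the node is up and the cluster is healthy:


# curl 'http://localhost:9200/_cluster/health?pretty'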

Now there are some cool web frontends available for elasticsearch; the one I am using is known as elasticsearch-paramedic.

Install elasticsearch paramedic


# /usr/share/java/elasticsearch/bin/plugin -install karmi/elasticsearch-paramedic
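Site plugins such as paramedic are served by elasticsearch itself, so once installed it should (as far as I know) be reachable in a browser at:

http://ipaddress-or-domainname:9200/_plugin/paramedic/index.html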

[Screenshot: the elasticsearch-paramedic dashboard]

Next we need to install grok. Grok is used to parse the log files, breaking them into fields to make better sense of them. If you have played with logs, you probably know that parsing them requires writing regexes. The beauty of grok is that you do not have to write the regex yourself; you don't even need to know what one looks like, because everything comes ready-made, thanks to Jordan Sissel, the man behind grok and logstash.


# yum install libevent-devel tokyocabinet-devel pcre-devel gperf gcc make automake gcc-c++
# git clone https://github.com/jordansissel/grok.git /root/grok
# cd /root/grok
# make
# make install
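make install normally places the grok binary and its libgrok shared library under /usr/local; a quick check that the build landed, plus an ldconfig run in case the linker cache does not yet know about the new library:


# which grok
# ldconfig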

Since we are not sending our logs to MySQL anymore, I am going to write them to a file and let logstash work on that. Hopefully you have rsyslog up and running by now; if not, click here. The only change to be made is on the server side, in rsyslog.conf. (You can also set up logstash to read logs directly from port 514 instead of writing them to a file first; that variant is covered further down.)


# vi /etc/rsyslog.conf

Comment out the MySQL lines and append a line that sends everything to a file called remotesys.log:


#$ModLoad ommysql
#*.* :ommysql:127.0.0.1,rsyslogdb,rsyslog,Password
*.* /var/log/remote/remotesys.log
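rsyslog will not necessarily create that directory on its own, so make sure it exists, then restart the daemon for the change to take effect:


# mkdir -p /var/log/remote
# service rsyslog restart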

Create a new configuration file to be used by logstash


# mkdir -p /etc/logstash
# vi /etc/logstash/logstash.conf

Append the following to it.


input {
  file {
    type => "remote-syslog"
    # Wildcards work here :)
    path => [ "/var/log/remote/remotesys.log" ]
  }
}

filter {
  grok {
    type => "remote-syslog"
    pattern => [ "%{SYSLOGBASE}" ]
  }
  grep {
    type => "remote-syslog"
    match => [ "@message", "apache-access:" ]
    add_tag => "apache-access"
    drop => false
  }
  grok {
    type => "remote-syslog"
    tags => ["apache-access"]
    pattern => [ "%{COMBINEDAPACHELOG}" ]
  }
}

output {
  elasticsearch {
  }
}

The configuration consists of three parts. The input section lists the log files to be processed. The filter section comes next; this is where grok comes in handy, with its predefined patterns doing the actual parsing of the log lines. Note that the grep filter with drop => false discards nothing: it only tags Apache access-log entries, so the second grok can apply %{COMBINEDAPACHELOG} to just those events. The output section sends every filtered document to elasticsearch for storage.
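Once logstash is running (it is started at the end of this post), an easy smoke test of the whole pipeline is to emit a message with logger, which rsyslog appends to remotesys.log, and then search elasticsearch for it; the message text here is arbitrary:


# logger "logstash pipeline smoketest"
# curl 'http://localhost:9200/_search?q=@message:smoketest&pretty'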

If you would rather have logstash pick up logs directly on port 514, tweak the configuration as follows. First stop the rsyslog daemon so the port is free (and note that binding ports below 1024 requires logstash to run as root):


# /etc/init.d/rsyslog stop
# vi /etc/logstash/logstash.conf

Replace the earlier configuration with this one:


input {
  tcp {
    port => 514
    type => "rsyslog"
  }
  udp {
    port => 514
    type => "rsyslog"
  }
}

filter {
  grok {
    type => "rsyslog"
    pattern => [ "<%{POSINT:syslog_pri}>%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{PROG:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}" ]
    add_field => [ "received_at", "%{@timestamp}" ]
    add_field => [ "received_from", "%{@source_host}" ]
  }
  syslog_pri {
    type => "rsyslog"
  }
  date {
    # syslog pads single-digit days with a space, hence the double space in the first format
    type => "rsyslog"
    syslog_timestamp => [ "MMM  d HH:mm:ss", "MMM dd HH:mm:ss" ]
  }
  mutate {
    type => "rsyslog"
    exclude_tags => "_grokparsefailure"
    replace => [ "@source_host", "%{syslog_hostname}" ]
    replace => [ "@message", "%{syslog_message}" ]
  }
  mutate {
    type => "rsyslog"
    remove => [ "syslog_hostname", "syslog_message", "syslog_timestamp" ]
  }
}

output {
  elasticsearch { }
}

Finally, download and start logstash:


# wget -O /opt/logstash-1.1.9-monolithic.jar https://logstash.objects.dreamhost.com/release/logstash-1.1.9-monolithic.jar

# java -jar /opt/logstash-1.1.9-monolithic.jar agent -f /etc/logstash/logstash.conf -- web --backend elasticsearch://localhost/?local &

Open a web browser and point it to:

http://ipaddress-or-domainname:9292
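If you went with the TCP/UDP variant of the configuration, you can also push a hand-crafted syslog line straight at port 514 to confirm events flow through; this assumes nc (netcat) is installed, and the hostname, program and message below are made up:


# echo "<13>$(date '+%b %d %H:%M:%S') testhost testprog: hello logstash" | nc -w1 -u localhost 514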

So, do you like what you see? Probably not the best of interfaces; the second part of this series will change that. Click here for part 2.