A better Logstash filter to analyze SystemOut.log, and some more

Last week I wrote a post about Using Docker and ELK to Analyze WebSphere Application Server SystemOut.log, but I wasn’t happy with my date filter and with how the WebSphere response code was analyzed. The main problem was that the WAS response code is not always at the beginning of a log message, and it does not always end with “:”.

I replaced the previous filter (formerly four match lines) with the following code:

grok {
    # was_shortname needs to be a regex, because numbers and $ can be part of the name
    match => ["message", "\[%{DATA:wastimestamp} %{WORD:tz}\] %{BASE16NUM:was_threadID} (?<was_shortname>\b[A-Za-z0-9\$]{2,}\b) %{SPACE}%{WORD:was_loglevel}%{SPACE} %{GREEDYDATA:message}"]
    overwrite => [ "message" ]
    #tag_on_failure => [ ]
}
grok {
    # Extract the WebSphere response code: 9-10 capitals/digits followed by ":", "," or whitespace
    match => ["message", "(?<was_responsecode>[A-Z0-9]{9,10})[:,\s]"]
    tag_on_failure => [ ]
}

As you can see, I replaced the different patterns with one regular expression that finds the response code anywhere in the message. tag_on_failure => [ ] keeps the second grok from adding the _grokparsefailure tag when no response code was logged.
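To see what the two grok stages extract, here is a rough translation into Python's standard `re` module. It is a sketch under assumptions, not how grok works internally: the grok macros DATA, WORD, BASE16NUM, SPACE and GREEDYDATA are approximated by hand, the sample lines are hypothetical, and Python spells named groups `(?P<name>...)` where grok uses `(?<name>...)`.

```python
import re

# Approximation of the first grok pattern: DATA -> .*?, WORD -> \w+,
# BASE16NUM -> [0-9A-Fa-f]+ (simplified: no sign or 0x prefix),
# SPACE -> \s*, GREEDYDATA -> .*
HEADER = re.compile(
    r"\[(?P<wastimestamp>.*?) (?P<tz>\w+)\] "
    r"(?P<was_threadID>[0-9A-Fa-f]+) "
    r"(?P<was_shortname>\b[A-Za-z0-9$]{2,}\b) \s*(?P<was_loglevel>\w+)\s* "
    r"(?P<message>.*)"
)

# Second grok pattern: 9-10 capitals/digits followed by ':', ',' or whitespace.
RESPONSE_CODE = re.compile(r"(?P<was_responsecode>[A-Z0-9]{9,10})[:,\s]")


def parse_systemout_line(line):
    """Return the extracted fields as a dict, or None if the header does not match."""
    header = HEADER.search(line)  # grok searches; it does not anchor the pattern
    if header is None:
        return None  # Logstash would tag such an event with _grokparsefailure
    fields = header.groupdict()
    code = RESPONSE_CODE.search(fields["message"])
    # Like tag_on_failure => [ ]: a missing code is not an error, the field is just absent.
    if code is not None:
        fields["was_responsecode"] = code.group("was_responsecode")
    return fields


if __name__ == "__main__":
    # Hypothetical lines: code inside the message, code at the start, no code at all.
    samples = [
        "[5/29/16 22:54:38:123 CEST] 0000012a IndexBuilder  E "
        "com.ibm.connections.search.index.process.incremental.IndexBuilder "
        "buildService CLFRW0034E: Error reading or writing to the index directory.",
        "[5/29/16 22:55:01:004 CEST] 00000001 WsServerImpl  A   "
        "WSVR0001I: Server server1 open for e-business",
        "[5/29/16 22:56:10:500 CEST] 0000004f SystemOut     O Hello from the application",
    ]
    for sample in samples:
        fields = parse_systemout_line(sample)
        print(fields["was_loglevel"], fields.get("was_responsecode"))
```

One consequence of the character class: any run of 9-10 digits or capitals followed by a separator matches, so a long numeric ID in a message would also be reported as a response code.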

Now I can use was_responsecode to graph the different response codes over time, so I can see when errors start to appear more often.
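Behind such a graph is a simple aggregation: keep the error events that carry a response code and count them per time interval and code. The Python sketch below illustrates that computation on hypothetical, already-parsed events; the `ts` key is an assumption standing in for the parsed event time (which Logstash would store in `@timestamp`), and this is not how Kibana or Elasticsearch implement it.

```python
from collections import Counter
from datetime import datetime


def count_codes_per_hour(events):
    """Count events per (hour, was_responsecode), keeping only log level E."""
    counts = Counter()
    for event in events:
        # Same selection as the search was_loglevel:E AND was_responsecode:*
        if event.get("was_loglevel") != "E" or "was_responsecode" not in event:
            continue
        # Truncate the hypothetical "ts" datetime to the start of its hour.
        bucket = event["ts"].replace(minute=0, second=0, microsecond=0)
        counts[(bucket, event["was_responsecode"])] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical events as they might look after the grok filters.
    events = [
        {"ts": datetime(2016, 5, 29, 20, 5), "was_loglevel": "E", "was_responsecode": "CLFRW0034E"},
        {"ts": datetime(2016, 5, 29, 20, 40), "was_loglevel": "E", "was_responsecode": "CLFRW0034E"},
        {"ts": datetime(2016, 5, 29, 20, 41), "was_loglevel": "I"},
    ]
    for (hour, code), n in sorted(count_codes_per_hour(events).items()):
        print(hour, code, n)
```

A sudden jump in one (hour, code) bucket is the kind of peak that stands out in a line chart.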

Example

I created a new search for was_loglevel:E AND was_responsecode:* (all log messages with log level E that contain a response code) and built a line chart based on this search:

[Screenshot 2016-05-29_22-54-38: line chart of response codes over time]

There is a strong peak for one of the response codes:

[Screenshot 2016-05-29_22-56-05: the peak and its response code]

Now we know that CLFRW0034E is the response code behind this peak. Let’s check which log message comes with this code:

[Screenshot 2016-05-29_22-57-36: log message with response code CLFRW0034E]

com.ibm.connections.search.index.process.incremental.IndexBuilder 
buildService CLFRW0034E: Error reading or writing to the index directory.  
Please check permissions and capacity.

OK, that’s quite interesting: a full disk, or a problem with the NAS, NFS or something like that. I know this issue is already solved, because no more errors appear after this peak, but it would be great if Kibana notified me when certain error counts increase (and that’s possible).

Adding the timezone

To get the time in my local timezone or in UTC, even when the log was generated in another timezone, I added the following lines:

# add timezone information
translate {
    field       => 'tz'
    destination => 'tz_num'
    dictionary  => {
        'CET'  => '+0100'
        'CEST' => '+0200'
        'EDT'  => '-0400'
    }
}
mutate {
    replace => { 'timestamp' => '%{wastimestamp} %{tz_num}' }
}

I needed to install the translate plugin for Logstash (and you need to extend the list of timezones manually):

/opt/logstash/bin/logstash-plugin install logstash-filter-translate
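The idea behind translate plus mutate is that a numeric offset is unambiguous, while abbreviations are not (CST, for example, is used for several different zones). Because the abbreviation already encodes daylight saving time (CET vs. CEST), a fixed offset per abbreviation is enough. Here is a minimal Python sketch of the same conversion; it assumes the server writes timestamps like `5/29/16 22:54:38:123` (the SystemOut.log date format depends on the server locale), and `to_utc` and `TZ_OFFSETS` are hypothetical names.

```python
from datetime import datetime, timezone

# Same table as the translate dictionary; like there, every abbreviation
# that appears in your logs has to be added by hand.
TZ_OFFSETS = {
    "CET": "+0100",
    "CEST": "+0200",
    "EDT": "-0400",
}

# Assumption: month/day/two-digit year, milliseconds after the last colon.
WAS_FORMAT = "%m/%d/%y %H:%M:%S:%f %z"


def to_utc(wastimestamp, tz):
    """Append the numeric offset (like the mutate/replace step) and convert to UTC."""
    tz_num = TZ_OFFSETS[tz]  # raises KeyError for abbreviations missing from the table
    local = datetime.strptime(f"{wastimestamp} {tz_num}", WAS_FORMAT)
    return local.astimezone(timezone.utc)


if __name__ == "__main__":
    # prints 2016-05-29T20:54:38.123000+00:00
    print(to_utc("5/29/16 22:54:38:123", "CEST").isoformat())
```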

Adding plugins to the Docker containers

Since this weekend, I like Docker more and more. It’s really easy to test different filters (I’m working on a filter for the IBM Domino console.log and on some additional Filebeat stuff) and to restart with a clean environment.

The official ELK images do not include all the plugins I wanted to use, so I created my own Docker images for Elasticsearch and Logstash. Only small changes were needed in docker-compose.yml.

Dockerfile Elasticsearch

FROM elasticsearch:latest
RUN bin/plugin install lmenezes/elasticsearch-kopf

Building the Elasticsearch image:

docker build -t "stoeps:elasticsearch" .

Dockerfile Logstash

FROM logstash:latest

# Install the Beats input (for Filebeat) and the translate filter
RUN /opt/logstash/bin/logstash-plugin install logstash-input-beats \
 && /opt/logstash/bin/logstash-plugin install logstash-filter-translate

Building the Logstash image:

docker build -t "stoeps:logstash" .

docker-compose.yml

elasticsearch:
  image: stoeps:elasticsearch
  ...
logstash:
  image: stoeps:logstash
  ...
  ports:
    - "5000:5000"
    - "5044:5044"
  ...

As you can see, I changed the image names to the images I built before, and I added an extra port (5044) for Filebeat.