In the last few months, I have been helping to identify and resolve production issues (both performance and product related). I had to analyze vast amount of logs, identify performance degradation and deviation, and issues related to Java heap and Garbage Collection (GC), as well as different issues affecting the health of WebSphere Application Server (WAS). In order to do the above-mentioned tasks efficiently, I have employed different tools (both open source and commercial). Even-though these tools are readily available, and usually good what they do, they may not be as effective as we like for our particular circumstances and we end-up writing our own custom tool or script to complement in certain areas. Same story here, I ended up writing a custom tool (let me call it a Log Parser) for log parsing, analyzing, making correlation, and reporting. I'm sharing my custom Log Parser here, hoping that it may be useful for other people as well. It is written in AWK and Shell script. It processes the following logs:
Let me shed some light on the internal functioning of the Parser visually. See the diagram 1.0 below
|
Diagram 1.0 |
As depicted in the diagram, the Parser is made of a set of script files (collection of different Parsers) and wrapper script - together acting like a suite. Each parser can be executed independently or invoked by the wrapper script. The Parser is driven by the logic in the script and is controlled by the input parameters and their values (control parameters, threshold parameters, correlation parameters, and transaction baseline values). It consumes the logs and writes different reports as an output.
Most interesting part is the input here. The Feedback loop/mechanism as shown in the diagram is to let the analyst know that he/she should continually refine the threshold and other applicable input parameter values based on output analysis.This feedback mechanism makes the Parser - a kind of expert engine. So, it is very important to regularly update your threshold values, filter keywords, and maintain an well established performance baseline. Parser helps you to maintain feedback mechanism, because it collects vital statistics and updates historical data files, doing so, it is not only collecting important data, but also quantifying the system events. Quantification helps to compare, generate alerts and make decisions. For example, you quantify in average how many particular errors you get per day or per hour or per server, or per transaction and based on that you define your threshold value. Let's say, based on a month long of observation, average number of daily transaction errors from server A, fluctuates from 10 to 30 in normal situation. So, your high mark for normal situation is 30. Based on this data, you can define your threshold value 35 for that particular error for that server.
What are the key benefits of using this Parser:
- Make troubleshooting faster and effective with built-in intelligence from lesson learned and baseline data. Parser identifies critical errors, their frequency, location, key performance numbers, current state of the environment like how many users, sessions, transactions, (if any) anomalies in the system, which help to narrow down the issue(s).
- Automatically collect key statistical data (performance, error or usage) and build a data mart. Parser collects all vital statistics like performance numbers, performance range, hourly user/session statistics, heap snapshots etc. and updates historical data files. These data can be used to generate history report and also in decision making process.
- Auto generate key summary reports for internal consumption and create delimited data files, which can be imported to spread sheet like Excel to prepare management reports. Basically, it can provide visibility to your entire application infrastructure.
- Create correlation. Parser creates correlation so that it becomes easier to identify and map transaction path (Web server to the Application server).
- Generate warning for possible future incidents/events. Parser can provide early warning of possible future events. Here is an example of generated warning: "
2.18383e+06 : average of Perm Generation After Full GC exceeds threshold 2097152 (K). There is a possibility of OutOfMemory in near future because of Not sufficient PermGen Space for AppSrv04
"
Getting started is very simple. No big-bang installation, or configuration is required. If you are running in Unix like environment, you just download the script, and launch the Parser from the directory where it is downloaded. If you are on Windows, you need Cygwin or Bash Shell that comes with MINGW to execute it.
1. thresholdValues.csv
As name implies, this file contains pre-defined name and threshold value pair for certain condition or events. Parser lookups these pre-defined condition, and when it detects one in a log file, it compares with threshold value and triggers/writes notification into output file (00_Alert.txt) if logged event exceeds the threshold value. Threshold can be performance based like 'notify if maximum response time exceeds 9 seconds' or event based like 'notify if maximum fatal count for a JVM exceeds 5'
Format:
Each line in thresholdValues.csv has multiple columns separated by pipe '|' and represent threshold definition for one complete event condition. See below:
Where:
See a sample thresholdValues.csv in github:
https://github.com/pppoudel/log-parser/blob/master/thresholdValues.csv
2. perfBaseLine.csv
This file contains pre-defined transactions (request URIs) and their baseline response time (in seconds). I suppose, you can get content for this file from your performance test result.
Format:
Each line in perfBaseLine.csv has two columns separated by pipe '|' which represent performance value for a given transaction (request/response). See below:
Where:
See a sample perfBaseLine.csv in github:
https://github.com/pppoudel/log-parser/blob/master/perfBaseLine.csv
3. WASCustomFilter.txt
Currently this input file is only consumed by websphereLogParser. It defines some custom error/keywords. It is to tell parser that you're interested and like to know if certain keywords or string in general are logged (because of certain condition) in a log file, which may be non-standard and specific to your environment/application.
Format:
It uses Regular expression to define custom error/keywords. Each new error definition goes to new line. See below:
See a sample WASCustomFilter.txt in github:
https://github.com/pppoudel/log-parser/blob/master/WASCustomFilter.txt
4. WAS_CloneIDs.csv
This file contains information that defines relationship (mapping) between HTTP session clone ID and WAS name. Clone ID constitutes part of HTTP session and can be logged into Web Server access_log.
With the relationship in hand, we can generate helpful analytical data that helps to identify transaction/request path end to end. Easiest way to find out clone ID for each WAS is to look your plugin-cfg.xml file.
Format:
Each line in WAS_CloneIDs.csv four columns separated by pipe '|'. See below:
Where:
See a sample WAS_CloneIDs.csv in github:
https://github.com/pppoudel/log-parser/blob/master/WAS_CloneIDs.csv
Output:
Each Parser update Alert file, and history reports (only if report type is 'daily') as well as generate summary report and other report files. For the complete list, see '#--------- Report/Output files -------#' and '#--------- History Report/Output files -------#' sections in each script file.
For further detail of each individual parser, visit the following blog posts: