SysGenius-e-Log: access

Showing posts with label access_log. Show all posts

Log Parsing, Analysis, Correlation, and Reporting Engine

In the last few months, I have been helping to identify and resolve production issues (both performance and product related). I had to analyze vast amount of logs, identify performance degradation and deviation, and issues related to Java heap and Garbage Collection (GC), as well as different issues affecting the health of WebSphere Application Server (WAS). In order to do the above-mentioned tasks efficiently, I have employed different tools (both open source and commercial). Even-though these tools are readily available, and usually good what they do, they may not be as effective as we like for our particular circumstances and we end-up writing our own custom tool or script to complement in certain areas. Same story here, I ended up writing a custom tool (let me call it a Log Parser) for log parsing, analyzing, making correlation, and reporting. I'm sharing my custom Log Parser here, hoping that it may be useful for other people as well. It is written in AWK and Shell script. It processes the following logs:

SystemOut.log generated by IBM WebSphere Application Server (WAS)
access_log and error_log logs generated by Apache or IBM HTTP Server(IHS)
native_stdout.log or verbose GC logs generated by Java Virtual Machine (JVM).

Let me shed some light on the internal functioning of the Parser visually. See the diagram 1.0 below

Diagram 1.0

As depicted in the diagram, the Parser is made of a set of script files (collection of different Parsers) and wrapper script - together acting like a suite. Each parser can be executed independently or invoked by the wrapper script. The Parser is driven by the logic in the script and is controlled by the input parameters and their values (control parameters, threshold parameters, correlation parameters, and transaction baseline values). It consumes the logs and writes different reports as an output.
Most interesting part is the input here. The Feedback loop/mechanism as shown in the diagram is to let the analyst know that he/she should continually refine the threshold and other applicable input parameter values based on output analysis.This feedback mechanism makes the Parser - a kind of expert engine. So, it is very important to regularly update your threshold values, filter keywords, and maintain an well established performance baseline. Parser helps you to maintain feedback mechanism, because it collects vital statistics and updates historical data files, doing so, it is not only collecting important data, but also quantifying the system events. Quantification helps to compare, generate alerts and make decisions. For example, you quantify in average how many particular errors you get per day or per hour or per server, or per transaction and based on that you define your threshold value. Let's say, based on a month long of observation, average number of daily transaction errors from server A, fluctuates from 10 to 30 in normal situation. So, your high mark for normal situation is 30. Based on this data, you can define your threshold value 35 for that particular error for that server.

What are the key benefits of using this Parser:

Make troubleshooting faster and effective with built-in intelligence from lesson learned and baseline data. Parser identifies critical errors, their frequency, location, key performance numbers, current state of the environment like how many users, sessions, transactions, (if any) anomalies in the system, which help to narrow down the issue(s).
Automatically collect key statistical data (performance, error or usage) and build a data mart. Parser collects all vital statistics like performance numbers, performance range, hourly user/session statistics, heap snapshots etc. and updates historical data files. These data can be used to generate history report and also in decision making process.
Auto generate key summary reports for internal consumption and create delimited data files, which can be imported to spread sheet like Excel to prepare management reports. Basically, it can provide visibility to your entire application infrastructure.
Create correlation. Parser creates correlation so that it becomes easier to identify and map transaction path (Web server to the Application server).
Generate warning for possible future incidents/events. Parser can provide early warning of possible future events. Here is an example of generated warning: "2.18383e+06 : average of Perm Generation After Full GC exceeds threshold 2097152 (K). There is a possibility of OutOfMemory in near future because of Not sufficient PermGen Space for AppSrv04"

Getting started is very simple. No big-bang installation, or configuration is required. If you are running in Unix like environment, you just download the script, and launch the Parser from the directory where it is downloaded. If you are on Windows, you need Cygwin or Bash Shell that comes with MINGW to execute it.

How to execute?

You can see all the available options, by just executing:


$> ./masterLogParser.sh



Manadatory option '--rootcontext' or '-c' missing

`-c\|--rootcontext:`	`Required. Source path from where log files are read.`
`-t\|--rpttype:`	`Optional. Values are: 'daily' or 'ondemand'. 'ondemand' is default value. It is used to control logic - like whether or not to update historical data files. Only 'daily' option creates and updates historical data files.`
`-d\|--recorddate:`	`Optional. It is the log entry date. Meaning log entries with that date will be processed. It takes the format 'YYYY-MM-DD'. Default is to use current date. However, if 'daily' is chosen as 2nd argument, and log entry date is not provided, it defaults to 'date - 1 day'.`
`-l\|--rptloc:`	`Optional. It is report directory where all generated reports are written. Default value is /tmp/`
`-o\|--procoption:`	`Optional. It represents the processing option. Values can be 'full' or 'partial'. Default value is 'partial'. This option is currently being used only for Verbose GC log parser.`

Here are few examples:


  # processing current day's logs

   $> ./masterLogParser.sh --rootcontext <log-path>

  

  # processing yesterday's logs with historical report updates

   $> ./masterLogParser.sh --rootcontext <log-path> --rpttype daily

  

  # processing any day's logs updates

   $> ./masterLogParser.sh --rootcontext <log-path> --recorddate <date in (YYYY-MM-DD) format>

See masterLogParser.sh in github: https://github.com/pppoudel/log-parser/blob/master/masterLogParser.sh

Input

1. thresholdValues.csv

As name implies, this file contains pre-defined name and threshold value pair for certain condition or events. Parser lookups these pre-defined condition, and when it detects one in a log file, it compares with threshold value and triggers/writes notification into output file (00_Alert.txt) if logged event exceeds the threshold value. Threshold can be performance based like 'notify if maximum response time exceeds 9 seconds' or event based like 'notify if maximum fatal count for a JVM exceeds 5'
Format:
Each line in thresholdValues.csv has multiple columns separated by pipe '|' and represent threshold definition for one complete event condition. See below:


event-name|value|server-identifier|event-description

e.g.

httpAvgRespTimeTh|2.5|http|Threshold for Average response time in sec.

Where:

`event-name:`	`name of the event like httpAvgRespTimeTh (http) Average Response Time threshold.`
`value:`	`threshold value for this specific event. In this case it is 2.5 seconds.`
`server-identifier:`	`Which log/server this value belongs to. In this case it is 'http' server.`
`event-description:`	`provides some details what this threshold is about.`

See a sample thresholdValues.csv in github: https://github.com/pppoudel/log-parser/blob/master/thresholdValues.csv

2. perfBaseLine.csv

This file contains pre-defined transactions (request URIs) and their baseline response time (in seconds). I suppose, you can get content for this file from your performance test result.

Format:
Each line in perfBaseLine.csv has two columns separated by pipe '|' which represent performance value for a given transaction (request/response). See below:


request-name|response-time (in seconds)

e.g.

finManagement/account_add.do|1.57756

Where:

`request-name:`	`represents request/response URI or transaction name, whatever you call it. In this case it is finManagement/account_add.do`
`response-time:`	`response time for the transaction to complete in seconds. 1.5 seconds in this case.`

See a sample perfBaseLine.csv in github: https://github.com/pppoudel/log-parser/blob/master/perfBaseLine.csv

3. WASCustomFilter.txt

Currently this input file is only consumed by websphereLogParser. It defines some custom error/keywords. It is to tell parser that you're interested and like to know if certain keywords or string in general are logged (because of certain condition) in a log file, which may be non-standard and specific to your environment/application.

Format:
It uses Regular expression to define custom error/keywords. Each new error definition goes to new line. See below:


Error.*Getting.*Folder

503.*Service.*Temporarily.*Unavailable

CORBA.*NO_RESPONSE

ORA-01013:

See a sample WASCustomFilter.txt in github: https://github.com/pppoudel/log-parser/blob/master/WASCustomFilter.txt

4. WAS_CloneIDs.csv

This file contains information that defines relationship (mapping) between HTTP session clone ID and WAS name. Clone ID constitutes part of HTTP session and can be logged into Web Server access_log. With the relationship in hand, we can generate helpful analytical data that helps to identify transaction/request path end to end. Easiest way to find out clone ID for each WAS is to look your plugin-cfg.xml file.

Format:
Each line in WAS_CloneIDs.csv four columns separated by pipe '|'. See below:


cloneID|WAS-name|hostname

e.g.

23532em3r|AppSrv01|washost082

Where:

`cloneID`	`cloneID is part of jSession. 23532em3r in above example. Refer to https://www.ibm.com/support/knowledgecenter/en/SSAW57_8.5.5/com.ibm.websphere.nd.doc/ae/txml_httpsessionclone.html`
`WAS-name`	`WebSphere Application Server (WAS) name. AppSrv01 in above example.`
`hostname`	`Hostname of machine/server where particular WAS resides. washost082 in above example.`

See a sample WAS_CloneIDs.csv in github: https://github.com/pppoudel/log-parser/blob/master/WAS_CloneIDs.csv

Output:

Each Parser update Alert file, and history reports (only if report type is 'daily') as well as generate summary report and other report files. For the complete list, see '#--------- Report/Output files -------#' and '#--------- History Report/Output files -------#' sections in each script file.

For further detail of each individual parser, visit the following blog posts:

websphereLogParser.sh for parsing, analyzing and reporting WebSphere Application Server (WAS) SystemOut.log
webAccessLogParser.sh for parsing, analyzing and reporting Apache/IBM HTTP Server (IHS) access_log
webErrorLogParser.sh for parsing, analyzing and reporting Apache/IBM HTTP Server (IHS) error_log
javaGCStatsParser.sh for parsing, analyzing and reporting Java verbose Garbage Collection (GC) log

How to Parse Apache access_Log for Troubleshooting & Reporting

Note: if you haven't already, see Log Parsing, Analysis, Correlation, and Reporting Engine post first.

Access log is a great source of information (for troubleshooting, performance analysis, user trend reporting etc.) as it records all requests processed by Apache Web server. What information to capture in access log is controlled using CustomLog and LogFormat directives. Visit Apache site (https://httpd.apache.org/docs/2.4/logs.html#accesslog) for more information about the access log.
This particular Log Parser that I'm discussing here is written to parse the access_log generated using the following log format:


   LogFormat "%h %l %u %t \"%r\" %>s %b JSESSIONID=\"%{JSESSIONID}C\" UID=\"%{UID}C\" %D %I %O \"%{User-agent}i\" %v" common

Note: if your access_log is generated using different LogFormat, you may need to tweak the script a little bit.

Finding log files: currently parser finds all access_log in the given path if:
$recDate == $currDate
or access_log.$rec0MM$rec0DD$recYY
if ($recDate < $currDate).
Where:

`recDate:`	`Optional. It is the log entry date. Meaning log entries with that date will be processed. It takes the format 'YYYY-MM-DD'. Default is to use current date. However, if 'daily' is chosen as 2nd argument, and log entry date is not provided, it defaults to 'date - 1 day'.`
`currDate:`	`Optional. It is the log entry date. Meaning log entries with that date will be processed. It takes the format 'YYYY-MM-DD'. Default is to use current date. However, if 'daily' is chosen as 2nd argument, and log entry date is not provided, it defaults to 'date - 1 day'.`
`rec0MM:`	`rec0MM is Month like 01 (01 represent month of January)`
`rec0DD:`	`rec0DD is Day like 01 (01 represents the first day of a month)`
`recYY:`	`recYY is Year like 17 (17 represent year of 2017)`

Review the actual script available in github - https://github.com/pppoudel/log-parser/blob/master/webAccessLogParser.sh for details.

Note: script is written to parse the date format like '13/Jun/2015:10:32:04 -0400' in access_log. If your access_log uses different date format, you may need to tweak the section of script which parses date.

How to execute:

You can see all the available options, by just launching:
$> ./webAccessLogParser.sh

Few examples are here:


  # processing current day's logs

   $> ./webAccessLogParser.sh --rootcontext <log-path>

  

  # processing yesterday's logs with historical report updates

   $> ./webAccessLogParser.sh --rootcontext <log-path> --rpttype daily

  

  # processing any day's logs updates

   $> ./webAccessLogParser.sh --rootcontext <log-path> --recorddate <date in (YYYY-MM-DD) format>

Output

Report/Output files:

$rptDir/00_Alert.txt
$rptDir/02_WebAccessLogSummaryRpt.txt
$rptDir/WebAccessLogRpt_all.csv
$rptDir/WebAccessLog_discardedRpt.csv
$rptDir/WebAccessLogSummaryByDomainRpt.csv
$rptDir/WebAccessLogSummaryByTransactionRpt.csv
$rptDir/WebAccessLogSummaryByUIDRpt.csv
$rptDir/WebAccessLogSummaryByRC400PlusURLRpt.csv
$rptDir/WebAccessLogSummaryByUidSessionRpt.csv
$rptDir/WebAccessLogSummaryUnknowUARpt.csv
$rptDir/WebHourlyDomainUsageByUid.csv
$rptDir/WebHourlyDomainUsageBySess.csv
$rptDir/WebDlyDomainUsage.csv

Where $rptDir is report directory. Default value is $TMP/$recDate

History Report/Output files:

# These are historical reports. Each run will append record in existing report file.

$pDir/WebPerfHistoryRpt.csv
$pDir/WebHourlyAvgRespTimeHistoryRpt.csv
$pDir/WebUniqueUsersHourlyHistoryRpt_all.csv
$pDir/WebRequestTypeHistoryRpt.csv
pDir/WebResponseCodeHistoryRpt.csv
$pDir/WebStatsByIHSHistoryRpt.csv
$pDir/WebStatsByWASHistoryRpt.csv

Where $pDir is parent of $rptDir.

See sample summary report in github - https://github.com/pppoudel/log-parser/blob/master/sample_reports/02_WebAccessLogSummaryRpt.txt
See my other posts in this series

websphereLogParser.sh for parsing, analyzing and reporting WebSphere Application Server (WAS) SystemOut.log
webErrorLogParser.sh for parsing, analyzing and reporting Apache/IBM HTTP Server (IHS) error_log
javaGCStatsParser.sh for parsing, analyzing and reporting Java verbose Garbage Collection (GC) log

Pages

Log Parsing, Analysis, Correlation, and Reporting Engine

How to execute?

Input

1. thresholdValues.csv

2. perfBaseLine.csv

3. WASCustomFilter.txt

4. WAS_CloneIDs.csv

Output:

How to Parse Apache access_Log for Troubleshooting & Reporting

How to execute:

Output