How to Parse Apache access_log for Troubleshooting & Reporting


Note: if you haven't already, read the "Log Parsing, Analysis, Correlation, and Reporting Engine" post first.

The access log is a great source of information for troubleshooting, performance analysis, and user trend reporting, because it records every request processed by the Apache web server. The CustomLog and LogFormat directives control what information is captured in it. See the Apache documentation (https://httpd.apache.org/docs/2.4/logs.html#accesslog) for more information about the access log.
The log parser discussed here is written to parse an access_log generated with the following log format:
LogFormat "%h %l %u %t \"%r\" %>s %b JSESSIONID=\"%{JSESSIONID}C\" UID=\"%{UID}C\" %D %I %O \"%{User-agent}i\" %v" common

Note: if your access_log is generated using a different LogFormat, you may need to tweak the script a little.
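To make the field layout concrete, here is a minimal sketch of how one line in the above format can be split into fields with awk. It is not taken from webAccessLogParser.sh: the function name parse_access_line and the sample log line are made up for illustration, and the sketch assumes that neither the request line nor the User-agent contains an embedded double quote. Note that %D is the time taken to serve the request in microseconds, and that %I and %O are only available when mod_logio is loaded.

```shell
#!/bin/sh
# Hypothetical helper (not part of webAccessLogParser.sh).
# Splits each line on double quotes, then splits the unquoted chunks on spaces.
# Prints: client IP, method, URL, status, UID cookie, response time (%D), vhost.
parse_access_line() {
  awk -F'"' '{
    split($1, head, " ")   # head[1] = %h, head[4] and head[5] = %t
    split($2, req, " ")    # req[1] = method, req[2] = URL, req[3] = protocol
    split($3, res, " ")    # res[1] = %>s (status), res[2] = %b (bytes)
    split($7, io, " ")     # io[1] = %D, io[2] = %I, io[3] = %O
    vhost = $9             # %v, with a leading space to strip
    sub(/^ +/, "", vhost)
    printf "%s,%s,%s,%s,%s,%s,%s\n", head[1], req[1], req[2], res[1], $6, io[1], vhost
  }'
}

# Made-up sample line that follows the LogFormat shown above.
sample='192.0.2.10 - - [13/Jun/2015:10:32:04 -0400] "GET /app/home HTTP/1.1" 200 5120 JSESSIONID="0000abc:-1" UID="jdoe" 15234 420 5400 "Mozilla/5.0 (X11; Linux x86_64)" www.example.com'

printf '%s\n' "$sample" | parse_access_line
# prints: 192.0.2.10,GET,/app/home,200,jdoe,15234,www.example.com
```

If your LogFormat has more or fewer quoted fields, the field numbers ($6, $7, $9) shift accordingly.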

Finding log files: the parser currently finds all files in the given path named:
access_log
if ($recDate == $currDate), or
access_log.$rec0MM$rec0DD$recYY
if ($recDate < $currDate).
Where:
recDate: Optional. The log entry date; only log entries with that date are processed. It takes the format 'YYYY-MM-DD'. The default is the current date. However, if 'daily' is chosen as the report type (--rpttype daily) and no log entry date is provided, it defaults to 'date - 1 day'.
currDate: The current date, that is, the day on which the script is run. It is compared against recDate to decide which file name to look for.
rec0MM: the two-digit month of recDate, e.g. 01 for January.
rec0DD: the two-digit day of recDate, e.g. 01 for the first day of the month.
recYY: the two-digit year of recDate, e.g. 17 for 2017.
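The file-name rule above can be sketched as a small shell function. This is an illustration only, not the code from the script: the function name log_file_name is hypothetical, and it assumes both dates are given as 'YYYY-MM-DD'.

```shell
#!/bin/sh
# Hypothetical helper: prints the access log file name the parser would
# look for, given a record date and the current date (both YYYY-MM-DD).
log_file_name() {
  recDate=$1
  currDate=$2
  rec=$(printf '%s' "$recDate" | sed 's/-//g')    # e.g. 20170105
  cur=$(printf '%s' "$currDate" | sed 's/-//g')
  if [ "$rec" -eq "$cur" ]; then
    echo "access_log"                             # today: the live log file
  elif [ "$rec" -lt "$cur" ]; then
    recYY=$(printf '%s' "$rec" | cut -c3-4)       # two-digit year
    rec0MM=$(printf '%s' "$rec" | cut -c5-6)      # zero-padded month
    rec0DD=$(printf '%s' "$rec" | cut -c7-8)      # zero-padded day
    echo "access_log.${rec0MM}${rec0DD}${recYY}"  # older: the rotated log file
  else
    echo "record date $recDate is after current date $currDate" >&2
    return 1
  fi
}

log_file_name 2017-01-05 2017-01-05   # prints: access_log
log_file_name 2017-01-05 2017-01-06   # prints: access_log.010517
```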

For details, review the actual script on GitHub: https://github.com/pppoudel/log-parser/blob/master/webAccessLogParser.sh

Note: the script is written to parse dates in the format '13/Jun/2015:10:32:04 -0400' in access_log. If your access_log uses a different date format, you may need to tweak the section of the script that parses the date.
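As a rough illustration of what that date handling involves, the sketch below converts an access_log timestamp into the 'YYYY-MM-DD' form used for recDate. The function name to_iso_date is hypothetical and the code is not copied from the script; it simply shows one way to do the conversion with awk, and is the kind of logic you would adapt for a different date format.

```shell
#!/bin/sh
# Hypothetical helper: converts '13/Jun/2015:10:32:04 -0400' (with or
# without the leading '[') into '2015-06-13'. Time and zone are ignored.
to_iso_date() {
  printf '%s\n' "$1" | awk -F'[/:]' '
    BEGIN {
      # Map month abbreviations to month numbers.
      split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", m, " ")
      for (i = 1; i <= 12; i++) mon[m[i]] = i
    }
    {
      sub(/^\[/, "")   # drop a leading bracket, if present
      printf "%04d-%02d-%02d\n", $3, mon[$2], $1
    }'
}

to_iso_date '13/Jun/2015:10:32:04 -0400'     # prints: 2015-06-13
to_iso_date '[01/Jan/2017:00:00:01 -0500]'   # prints: 2017-01-01
```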

How to execute:

You can see all the available options by launching the script without arguments:
$> ./webAccessLogParser.sh

Here are a few examples:
# processing current day's logs
$> ./webAccessLogParser.sh --rootcontext <log-path>

# processing yesterday's logs with historical report updates
$> ./webAccessLogParser.sh --rootcontext <log-path> --rpttype daily

# processing the logs of any given day
$> ./webAccessLogParser.sh --rootcontext <log-path> --recorddate <date in (YYYY-MM-DD) format>
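If you need reports for several past days, the --recorddate option can be driven from a loop. The sketch below is a dry run: it only prints one command per day and does not launch the parser. The function name backfill_commands and the path /var/log/httpd are placeholders of mine, and the date arithmetic assumes GNU date (the -d option).

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints one parser invocation per day,
# from the start date to the end date inclusive (both YYYY-MM-DD).
backfill_commands() {
  logPath=$1
  day=$2
  last=$3
  lastNum=$(printf '%s' "$last" | sed 's/-//g')
  while [ "$(printf '%s' "$day" | sed 's/-//g')" -le "$lastNum" ]; do
    echo "./webAccessLogParser.sh --rootcontext $logPath --recorddate $day"
    day=$(date -d "$day + 1 day" +%Y-%m-%d)   # GNU date syntax (assumption)
  done
}

# /var/log/httpd is a placeholder for your own <log-path>.
backfill_commands /var/log/httpd 2017-01-30 2017-02-01
```

Review the printed commands first, then pipe them to sh once you are happy with them.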

Output

Report/Output files:
  • $rptDir/00_Alert.txt
  • $rptDir/02_WebAccessLogSummaryRpt.txt
  • $rptDir/WebAccessLogRpt_all.csv
  • $rptDir/WebAccessLog_discardedRpt.csv
  • $rptDir/WebAccessLogSummaryByDomainRpt.csv
  • $rptDir/WebAccessLogSummaryByTransactionRpt.csv
  • $rptDir/WebAccessLogSummaryByUIDRpt.csv
  • $rptDir/WebAccessLogSummaryByRC400PlusURLRpt.csv
  • $rptDir/WebAccessLogSummaryByUidSessionRpt.csv
  • $rptDir/WebAccessLogSummaryUnknowUARpt.csv
  • $rptDir/WebHourlyDomainUsageByUid.csv
  • $rptDir/WebHourlyDomainUsageBySess.csv
  • $rptDir/WebDlyDomainUsage.csv

Where $rptDir is the report directory. Its default value is $TMP/$recDate.

History Report/Output files:
# These are historical reports. Each run appends a record to the existing report file.
  • $pDir/WebPerfHistoryRpt.csv
  • $pDir/WebHourlyAvgRespTimeHistoryRpt.csv
  • $pDir/WebUniqueUsersHourlyHistoryRpt_all.csv
  • $pDir/WebRequestTypeHistoryRpt.csv
  • $pDir/WebResponseCodeHistoryRpt.csv
  • $pDir/WebStatsByIHSHistoryRpt.csv
  • $pDir/WebStatsByWASHistoryRpt.csv
Where $pDir is the parent of $rptDir.
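The relationship between the two directories can be sketched in a few lines of shell. This only illustrates the defaults described above, it is not the script's own code, and the fallback to /tmp when $TMP is unset is my assumption.

```shell
#!/bin/sh
# Illustration of the default report locations for one record date.
recDate=2017-01-05
rptDir="${TMP:-/tmp}/$recDate"   # per-day reports; the /tmp fallback is an assumption
pDir=$(dirname "$rptDir")        # history reports accumulate here across runs

echo "daily reports:   $rptDir/02_WebAccessLogSummaryRpt.txt"
echo "history reports: $pDir/WebPerfHistoryRpt.csv"
```

Because the history files sit one level above the per-day directories, they persist as each run adds a new $recDate directory.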

See a sample summary report on GitHub: https://github.com/pppoudel/log-parser/blob/master/sample_reports/02_WebAccessLogSummaryRpt.txt
See my other posts in this series:
  1. websphereLogParser.sh for parsing, analyzing and reporting WebSphere Application Server (WAS) SystemOut.log
  2. webErrorLogParser.sh for parsing, analyzing and reporting Apache/IBM HTTP Server (IHS) error_log
  3. javaGCStatsParser.sh for parsing, analyzing and reporting Java verbose Garbage Collection (GC) log
