Possible Issues caused by CA Wily Introscope APM BRTM feature

We went live with CA APM over a weekend. The following Monday morning, as soon as the site opened for business, the help desk started getting a string of complaints. Users reported one or both of the following:
1) A spinning wheel that never went away after clicking Submit on a web page.
2) A lost session, followed by a prompt to log in to the web application again.

Initially, we were completely lost, as we did not know what was causing this. Oddly, not all users were affected. Server resource usage (CPU, memory) was normal, and so was performance for the users who did not hit the issues above.

After analyzing the web server access log, we found a pattern in the affected users' User-Agent information: all of them were using Internet Explorer (IE), specifically IE 9/Trident 5.0 and IE 10/Trident 6.0. Even though more than 60% of our users were on Firefox (various versions), none of them reported any such issue; only IE users were affected. With this information in hand, we asked QA to test with different browsers, and they were able to reproduce the issues with IE but not with Firefox.
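The User-Agent grouping described above can be sketched roughly as follows. This is only an illustration, not the script we actually ran: it assumes the access log uses the common "combined" format (User-Agent as the last quoted field), and the function names and sample lines are made up for the example.

```python
import re
from collections import Counter

# Assumption: the last double-quoted field on each line is the User-Agent,
# as in the Apache "combined" access log format.
LAST_QUOTED = re.compile(r'"([^"]*)"\s*$')
TRIDENT = re.compile(r"Trident/(\d+)\.\d+")


def browser_family(user_agent):
    """Map a User-Agent string to a coarse browser family (hypothetical helper)."""
    trident = TRIDENT.search(user_agent)
    if trident:
        # Trident 5.0 shipped with IE 9 and Trident 6.0 with IE 10,
        # so the IE version is the Trident major version plus 4.
        return "IE %d" % (int(trident.group(1)) + 4)
    if "Firefox/" in user_agent:
        return "Firefox"
    return "other"


def count_families(lines):
    """Count requests per browser family across access log lines."""
    counts = Counter()
    for line in lines:
        match = LAST_QUOTED.search(line)
        if match:
            counts[browser_family(match.group(1))] += 1
    return counts


# Made-up sample lines, for illustration only.
sample = [
    '10.0.0.1 - - [05/Jan/2015:09:00:00 +0000] "POST /abc/def.jsp HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)"',
    '10.0.0.2 - - [05/Jan/2015:09:00:01 +0000] "POST /abc/def.jsp HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; Trident/6.0)"',
    '10.0.0.3 - - [05/Jan/2015:09:00:02 +0000] "GET /abc/def.jsp HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (Windows NT 6.1; rv:35.0) Gecko/20100101 Firefox/35.0"',
]

if __name__ == "__main__":
    for family, hits in count_families(sample).most_common():
        print(family, hits)
```

Running the same count only over requests from sessions that were later kicked out is what makes the IE-only pattern stand out.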

The next step was to identify what was causing the issues. Looking into the access log again, we found that the request URL coming from the affected users just before they were kicked out carried the query parameter 'WilyCmd=cmdMetrics', for example '/abc/def.jsp?WilyCmd=cmdMetrics'.
We also noticed that the number of HTTP(S) requests was considerably higher than usual. Almost every request URL had a duplicate request with that query parameter appended. This pointed to the newly installed monitoring tool's Browser Response Time Monitor (BRTM) feature as the primary suspect: it injects JavaScript into the response, and that script runs in the user's browser and makes another request with 'WilyCmd=cmdMetrics' appended. In our case, we would normally see about 450,000 requests per day on average, but with this feature enabled we recorded about 750,000 per day, roughly 67% more.
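As a rough sketch of how the extra requests can be counted, the snippet below flags URLs that carry 'WilyCmd=cmdMetrics' and works out the traffic increase from the daily averages quoted above. The log format (request line quoted as "METHOD URL HTTP/x.y"), the function names, and the sample lines are assumptions for illustration.

```python
import re
from urllib.parse import parse_qs, urlsplit

# Assumption: each log line contains the request line in quotes,
# e.g. "GET /abc/def.jsp?WilyCmd=cmdMetrics HTTP/1.1".
REQUEST_LINE = re.compile(r'"(?:GET|POST|HEAD|PUT|DELETE|OPTIONS) (\S+) HTTP/[\d.]+"')


def is_brtm_request(url):
    """Return True if the URL carries the WilyCmd=cmdMetrics query parameter."""
    query = parse_qs(urlsplit(url).query)
    return "cmdMetrics" in query.get("WilyCmd", [])


def brtm_share(lines):
    """Return (brtm_requests, total_requests) for the given access log lines."""
    total = 0
    brtm = 0
    for line in lines:
        match = REQUEST_LINE.search(line)
        if not match:
            continue  # skip lines that do not look like a request entry
        total += 1
        if is_brtm_request(match.group(1)):
            brtm += 1
    return brtm, total


def percent_increase(before, after):
    """Relative increase from `before` to `after`, in percent."""
    return (after - before) / before * 100


# Made-up sample lines, for illustration only.
sample = [
    '10.0.0.1 - - [05/Jan/2015:09:00:00 +0000] "POST /abc/def.jsp HTTP/1.1" 200 512',
    '10.0.0.1 - - [05/Jan/2015:09:00:01 +0000] "GET /abc/def.jsp?WilyCmd=cmdMetrics HTTP/1.1" 200 0',
]

if __name__ == "__main__":
    print(brtm_share(sample))
    # Daily averages from the text: 450,000 before, 750,000 with BRTM enabled.
    print(round(percent_increase(450000, 750000), 1))
```

Parsing the query string instead of doing a plain substring search avoids counting URLs that merely contain the text 'WilyCmd' somewhere in the path.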
You can read more about the BRTM feature here:
https://wiki.ca.com/display/APMDEVOPS98/APM+for+Browser+Response+Time+Monitor

The page linked above also includes a short note about IE. It states: "... The CA BRTM JavaScript is loaded asynchronously so it does not block the loading and execution of any application JavaScript files and other components. However, Internet Explorer (6 through 9) does not wait on asynchronous loads before it generates the load event. This limitation can affect metric generation for Internet Explorer."

From our own investigation, we concluded that the BRTM feature caused the issues described above for users on certain versions of IE. As a workaround, we had to turn the feature off until the vendor fixes it. Disabling BRTM is easy: set the 'introscope.agent.brtm.enabled' property to 'false' in the 'IntroscopeAgent.profile' file and restart the application server.
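For reference, the change amounts to a single line in the agent profile. Only the property name and value come from our setup; the rest of the file is not shown, and the comment line is added here for illustration.

```
# IntroscopeAgent.profile
# Turn off Browser Response Time Monitor (BRTM); restart the application server afterwards.
introscope.agent.brtm.enabled=false
```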

Lesson learned: until the issue is fixed, test your web application with every browser you support before enabling this feature in your production environment.
