{"id":864,"date":"2010-11-15T13:28:26","date_gmt":"2010-11-15T18:28:26","guid":{"rendered":"http:\/\/appcrawler.com\/wordpress\/?p=864"},"modified":"2011-07-06T10:10:23","modified_gmt":"2011-07-06T15:10:23","slug":"resmgrcpu-quantum-preventing-logins","status":"publish","type":"post","link":"http:\/\/appcrawler.com\/wordpress\/2010\/11\/15\/resmgrcpu-quantum-preventing-logins\/","title":{"rendered":"\u201cresmgr:cpu quantum\u201d preventing logins??"},"content":{"rendered":"<p>Below is the content of an email I wrote this morning.  We could not log in to a single-instance database due to the statistics gathering job running long.  What is odd is that when we tried to troubleshoot, we couldn&#8217;t even log in using sqlplus -prelim \/ as sysdba&#8230;not good.<\/p>\n<p>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;<\/p>\n<p>At around 11:20AM on Friday, November 12th, an alert was generated via email every five minutes until noon requesting that the WCAS service be \u201cverified\u201d, as a test query against it did not return in under 60 seconds.  This query normally takes less than two seconds.  At noon that same day, the DBA team logged in to troubleshoot.  We found that we could not log in to the instance, as our login attempts would \u201chang\u201d.  We found LDAP error messages in the OS system log file.  Thinking (erroneously) that these messages might be related, we elected to bounce the instance and troubleshoot after the fact.  
After the instance was restarted, we could not find any useful information, but the issue had \u201cgone away\u201d as well.<\/p>\n<p>On Monday morning, November 15th at 9:43AM, the same alert that was generated on Friday began to be propagated to operations, the DBAs, and the application support teams.  Once again, the DBAs logged in and found they could not log in to the instance.  We were able to generate \u201csystem state dumps\u201d, which can contain useful information to troubleshoot an issue after the fact.  However, when we found no \u201csmoking gun\u201d, we elected to once again restart the instance.<\/p>\n<p>This time, however, the problem appeared again shortly after the instance was restarted.  We found we could log in, but only intermittently.  When we did get logged in, we elected to run a \u201changanalyze\u201d.  This command prints, in human-readable format, a list of which sessions are blocking others, as well as the operational source of the block (what they are doing).  When we did this, we found almost all sessions were blocked waiting for \u201cresmgr:cpu quantum\u201d.  This code path is taken when a \u201cresource manager\u201d plan is enabled in the instance, acting as a CPU traffic cop of sorts.  The particular plan was the one that is enabled when query optimizer statistics are being gathered.  Our understanding of this is that Oracle enables this plan to *protect* online users from CPU resource starvation.  
In other words, the statistics gathering job steps aside when a \u201creal\u201d user comes in.<\/p>\n<p>Obviously, either:<\/p>\n<ol>\n<li>Our understanding is incorrect<\/li>\n<li>It doesn\u2019t work as advertised<\/li>\n<\/ol>\n<p>I\u2019m thinking number 2 \ud83d\ude42<\/p>\n<p>For now, we have disabled the statistics gathering job as we dig into exactly what it does to ensure:<\/p>\n<ol>\n<li>We understand it<\/li>\n<li>It either:\n<ul>\n<li>Doesn\u2019t do it again, or<\/li>\n<li>We have a workaround<\/li>\n<\/ul>\n<\/li>\n<\/ol>\n<p>For now, the problem has been resolved pending further investigation.<\/p>\n<p>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Below is the content of an email I wrote this morning. We could not log in to a single-instance database due to the statistics gathering job running long. 
What is odd is that when we tried to troubleshoot, we couldn&#8217;t&hellip;<\/p>\n<p class=\"more-link-p\"><a class=\"more-link\" href=\"http:\/\/appcrawler.com\/wordpress\/2010\/11\/15\/resmgrcpu-quantum-preventing-logins\/\">Read more &rarr;<\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_mi_skip_tracking":false,"footnotes":""},"categories":[19,22],"tags":[],"_links":{"self":[{"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/posts\/864"}],"collection":[{"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/comments?post=864"}],"version-history":[{"count":6,"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/posts\/864\/revisions"}],"predecessor-version":[{"id":1330,"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/posts\/864\/revisions\/1330"}],"wp:attachment":[{"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/media?parent=864"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/categories?post=864"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/tags?post=864"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}