{"id":681,"date":"2010-05-04T12:11:09","date_gmt":"2010-05-04T17:11:09","guid":{"rendered":"http:\/\/appcrawler.com\/wordpress\/?p=681"},"modified":"2011-07-06T10:13:03","modified_gmt":"2011-07-06T15:13:03","slug":"clustering_factor-explained-with-a-real-life-email","status":"publish","type":"post","link":"http:\/\/appcrawler.com\/wordpress\/2010\/05\/04\/clustering_factor-explained-with-a-real-life-email\/","title":{"rendered":"CLUSTERING_FACTOR"},"content":{"rendered":"<p>I had to analyze a performance problem this morning, and it occurred to me it is a good example of troubleshooting queries impacted by the clustering_factor of an index.\u00a0 As such, below is the email exchange.<\/p>\n<p><em>I took a look, and I think I understand why the query optimizer is choosing a full scan and not using the index on institution_number.<\/em><\/p>\n<p><em>The issue is the UPDATE_TIME filter.\u00a0 If we were just asking for the column2 for institution 22, then it could resolve the query with the index on column1, column2.\u00a0 Since we are also asking for UPDATE_TIME, we will *have* to visit the table at some point.\u00a0 That is when it gets dicey.\u00a0 There are only about 6 million storage *blocks* in the entire table, as the rows are fairly \u201cskinny\u201d.\u00a0 There are about 3 million holdings for OSU.\u00a0 If Oracle uses the index, it will get the 3 million records for the column1 from the index, and use those to lookup each record to see if it meets the UPDATE_TIME criteria.<\/em><\/p>\n<p><em>The deciding factor for the query optimizer is what Oracle calls the \u201cclustering_factor\u201d.\u00a0 This is an indication as to how ordered the data in the table is when compared to the order of the data in the index.\u00a0 The bad news is ours is not ordered at all when compared to each other.\u00a0 In other words, the next entry in the index almost never refers to the same table block as the previous index entry did.<\/em><\/p>\n<p><em>What this means is the following example for OSU (column1 = 22) and the entries for it in the column1, column2 index:<\/em><\/p>\n<p><em>column1 22 \u2013 table rowid = file, 12, block 123456<br \/>\ncolumn1 22 \u2013 table rowid = file, 14, block 345126<br \/>\ncolumn1 22 \u2013 table rowid = file, 15, block 759253<br \/>\ncolumn1 22 \u2013 table rowid = file, 12, block 784359<br \/>\ncolumn1 22 \u2013 table rowid = file, 15, block 834127<\/em><\/p>\n<p><em>This requires Oracle to jump back and forth between the index and the table, reading different blocks one at a time (that is the real problem).\u00a0 Since there are 3 million holdings, being read one at a time on up to 3 million table blocks, Oracle finds it\u00a0 quicker to full scan the 6 million blocks of the table (16 at a time), as the overall throughput will be better.\u00a0 Think of it like looking up a word in an index, where the word never occurs twice on the same page, and the pages on which it does occur are in no particular sorted order.\u00a0 If you could speed read, it may be quicker to just read the book if the number of entries you find is large enough.<\/em><\/p>\n<p><em>The only solutions I can think of are:<\/em><\/p>\n<p><em>1.\u00a0\u00a0 \u00a0Parallel with a higher db_file_multiblock_read_count parameter value \u2013 I am the last guy on the block to suggest parallel query, as the \u201cneed\u201d for it usually indicates a fixable design issue.\u00a0 However in this case, it may be the least invasive, assuming you won\u2019t be running multiple versions of this SQL simultaneously.\u00a0 I retrieved the entire set in about 90 seconds with a db_file_multiblock_read_count=128 and parallel 8.\u00a0 That leads me to believe you could probably do it with parallel 4.<br \/>\n2.\u00a0\u00a0 \u00a0Add update_time to the column1, column2 index.\u00a0 This would allow the same query to be resolved in about a minute, as it could use the index exclusively.\u00a0 It may also be useful for other queries that need update_time.<br \/>\n<\/em><\/p>\n","protected":false},"excerpt":{"rendered":"<p>I had to analyze a performance problem this morning, and it occurred to me it is a good example of troubleshooting queries impacted by the clustering_factor of an index.\u00a0 As such, below is the email exchange. I took a look,&hellip;<\/p>\n<p class=\"more-link-p\"><a class=\"more-link\" href=\"http:\/\/appcrawler.com\/wordpress\/2010\/05\/04\/clustering_factor-explained-with-a-real-life-email\/\">Read more &rarr;<\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_mi_skip_tracking":false,"footnotes":""},"categories":[19,22],"tags":[],"_links":{"self":[{"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/posts\/681"}],"collection":[{"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/comments?post=681"}],"version-history":[{"count":9,"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/posts\/681\/revisions"}],"predecessor-version":[{"id":1341,"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/posts\/681\/revisions\/1341"}],"wp:attachment":[{"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/media?parent=681"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/categories?post=681"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/tags?post=681"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}