Who is the big load on my server?!

Monday, September 13, 2010
By Steve

What is below will show SQL statements for a given window (the last four days in the example below) that are tightly correlated with load average on a server. In general, a statistical correlation greater than 60 means there is an increasing degree of correlation between the two data sets. If one goes up, so does the other.

It ignores PL/SQL blocks, so it only focuses on SELECT, INSERT, UPDATE, and DELETE statements.

FWIW, an excellent write up on load average can be found at http://www.linuxjournal.com/article/9001. A hint is that it is not always user CPU utilization that drives up load average. It is often high I/O, or poor average I/O response time that is the issue.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
SET serveroutput ON size unlimited
DECLARE
  CURSOR top_sql_execs IS
    SELECT DISTINCT sql_id
      FROM (SELECT dhss.sql_id,iowait_delta
              FROM dba_hist_sqlstat dhss, dba_hist_sqltext dhst
              WHERE snap_id >= (SELECT MIN(snap_id)
                                  FROM dba_hist_snapshot
                                  WHERE begin_interval_time > SYSDATE - 4)
                AND dhss.sql_id = dhst.sql_id
                AND LOWER(sql_text) NOT LIKE 'begin%'
              ORDER BY 2 DESC)
      WHERE ROWNUM <= 10;
  l_corr NUMBER;
BEGIN
  FOR cur IN top_sql_execs LOOP
    SELECT ROUND((CORR(val1,val2) * 100),0)
      INTO l_corr
      FROM (SELECT snap_id,
                   instance_number,
                   iowait_delta val1
              FROM dba_hist_sqlstat
              WHERE sql_id = cur.sql_id) a,
           (SELECT snap_id,
                   instance_number,
                   VALUE val2
              FROM dba_hist_osstat
              WHERE stat_name = 'LOAD') b
      WHERE a.snap_id = b.snap_id
        AND a.instance_number = b.instance_number;
    DBMS_OUTPUT.put_line(cur.sql_id || ' ' || l_corr);
  END LOOP;
END;
/

Leave a Reply