Hive performance parser

Given a hiveserver2.log file, this awk scriptlet prints the timestamp, the SQL text, and the run time in seconds for each query. One known issue: the parser thread hands off to the executor thread, so the start and end log entries can’t always be tied together. At a minimum, though, the output tells you the SQL and its execution time. One limitation is that queries that never complete, for whatever reason, don’t appear in the output.

I may see if I can add a plugin to write this to a file in a more usable format.

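BEGIN {
  # Separator line printed above each completed query.
  while (++i <= 100) {
    pad = pad "*"
  }
}

# Continuation line of a multi-line statement: log lines contain
# "HiveServer2", raw SQL lines do not.
collecting && $0 !~ /HiveServer2/ {
  s[key] = s[key] " " $0
  next
}

# Any other line ends the statement being collected. It then falls through
# to the rules below, so it is not swallowed, and end of file cannot cause
# an endless loop.
{ collecting = 0 }

# Query start: remember the timestamp and the SQL, keyed by field 5.
/Driver.java:execute.*Starting command/ {
  key = $5
  r[key] = $2
  # The SQL follows the sixth ":" on the line. Rejoin the remaining pieces
  # with ":" so colons inside the statement survive.
  cnt = split($0, t, ":")
  sql = ""
  for (n = 7; n <= cnt; n++) {
    sep = (n > 7) ? ":" : ""
    sql = sql sep t[n]
  }
  s[key] = sql
  collecting = 1
  next
}

# Query end: field 13 is a name=value pair whose value is the elapsed
# time in milliseconds.
/PerfLogEnd.*PERFLOG method=Driver.run/ {
  split($13, end, "=")
  printf("%s\n %-20s %s %.2f\n", pad, r[$5], s[$5], end[2] / 1000)
}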
