While planning for holiday shopper traffic, we realized that we did not record the number of active sessions at any given time during holiday 2013.
To get a close approximation, we wrote the script below to read the access log. We consider a session active only if its most recent request was issued within the past 20 minutes (our session lifetime, defined in web.xml). The script reads the log minute by minute and keeps a short-lived Python dictionary of the sessions seen in the previous 20 minutes. Each time the minute changes, it scrubs the dictionary of sessions that are no longer active and prints the minute along with the number of sessions that remain.
""" ----------------------------------------------------------------------------------- Author: Steve Howard Date: November 14, 2013 Purpose: Provide count of sessions active in the access log. This is defined as those sessions that have had at least one request in the previous 20 minutes ----------------------------------------------------------------------------------- """ import sys if len(sys.argv) == 2: f = open(sys.argv[1]) else: f = sys.stdin d = dict() #fast to search by key, or JSESSIONID in our case tim = dict() min = 0 #read each line in the provided file... for l in f: #...then split it into an array... l2 = l.split() #...and if it isn't an akamai performance request or a load balancer request, and it is # in the new log format with JSESSIONID... #if l.find("sure_route") == -1 and l2[0] != "-" and len(l2) == 13: if l.find("sure_route") == -1 and len(l2) == 13: t=l2[2].split(":") currmin = (int(t[1]) * 60) + int(t[2]) d[l2[9]] = currmin if currmin != min: for i in d: if currmin - d[i] < 20: tim[i] = d[i] print currmin,len(tim) min = currmin d = tim.copy() tim.clear()