9/13/2008

Parsing Connect:Direct stats (part 1)

 
The stat files in the work directory (S20080912.001, for example) are not very human-readable: you have to hunt through the text to find the fields you want by name, and comparing one transmission to another is difficult.  If you go into the direct prompt and run a select stat detail=yes, the output is very readable but not very machine-readable, meaning you can't feed it into a script or spreadsheet.  Wouldn't it be great to parse the information out of the stat files and get exactly what you want?
 
I started out wanting to look at stats for a specific set of transmitted files, but I didn't know the process numbers.  A full file name might be something like procfeed.20080912_140814.input, and I wanted a list of PNUMs for all the files with similar names.

cat S200809* | grep RECI=CTRC | grep procfeed.200809 | \
awk -F"|" '{
  # find the PNUM field and show value
  for (i=1;i<=NF;i++) {
    split($i,A,"=")
    if (A[1]=="PNUM") {
      print A[2]
      break
    }
  }
}'

So I cat the stat files, grep for just the CTRC copy records, and grep again for the first part of the file name.  Then I pipe that through a simple awk script.  The -F"|" tells awk to treat the pipe character as the field separator.  The script counts through all the fields, splitting each one at the equals sign.  If the part before the equals sign is PNUM, we've found the field: print the value and break out of the for loop to go on to the next record.  The above gives output like this:
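
If you want to try the loop without real stat files, here is a minimal sketch that wraps it in a shell function and feeds it a single record. The function name get_field and the sample record are made up for illustration; real stat records carry many more fields than this.

```shell
# get_field NAME: read pipe-delimited NAME=value records on stdin and
# print the value of NAME for each record that contains it.
# (get_field is a hypothetical helper name, not part of Connect:Direct.)
get_field() {
  awk -F"|" -v want="$1" '{
    for (i = 1; i <= NF; i++) {
      split($i, A, "=")
      if (A[1] == want) {
        print A[2]
        break
      }
    }
  }'
}

# Made-up sample record, for illustration only
printf '%s\n' 'RECI=CTRC|PNUM=22505|SFIL=/data/procfeed.20080912_140814.input|CCOD=0' |
  get_field PNUM
# prints: 22505
```

Passing the field name in with -v assumes a POSIX-style awk, so the same function should work for other fields by changing the argument.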

22505
22675
22802
23216
23289

Then I went into the direct prompt and did a "select stat detail=yes pnum=(22505,22675,22802,23216,23289);" to get the details about the transmissions in a human-readable form.
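
Typing that PNUM list by hand gets old quickly. As a rough sketch, the list can be joined with commas and dropped into the command text. The function name build_select is made up here, and the snippet only prints the command for you to paste at the direct prompt; it does not run anything against Connect:Direct.

```shell
# build_select: read one PNUM per line on stdin and print the
# select stat command with the numbers joined by commas.
# (build_select is a hypothetical helper name.)
build_select() {
  printf 'select stat detail=yes pnum=(%s);\n' "$(paste -sd, -)"
}

printf '%s\n' 22505 22675 22802 23216 23289 | build_select
# prints: select stat detail=yes pnum=(22505,22675,22802,23216,23289);
```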
 
I soon found myself doing this sort of thing repeatedly, when all I wanted was a couple of pieces of info from the stats.  If I could view more than just the PNUM field, I could skip the step of going into the direct prompt to run my select stat command.

cat S200809* | grep RECI=CTRC | grep procfeed.200809 | \
awk -F"|" '{
    # populate array B with all values using field names as subscripts
    for (i=1;i<=NF;i++) {
      # split at the first equals sign only, so a value containing "=" stays whole
      P=index($i,"="); if (P>0) B[substr($i,1,P-1)]=substr($i,P+1)
    }
    # go through all the fields we need to see and show values
    NE=split("PNUM,PNOD,SNOD,SFIL,DFIL,CCOD",F,",")
    for (IX=1;IX<=NE;IX++) {
      FLD=F[IX]
      printf "%s\t",B[FLD]
      delete F[IX]
    }
    print ""
    # clear array B
    for (SUB in B) delete B[SUB]
}'

Now this gives me a nice tab-delimited list of PNUMs with the primary and secondary nodes, the source and destination file names, and the completion code.  What the above code does is grep through the stat files for the copy records with the first part of the transmitted file name in them, and feed just those lines into the awk program.  In the awk program we go through all of the fields, as separated by pipe characters, and populate an array with the info: one array element for each field, with the field name used as the subscript.  Then we go through a list of just the field names we want and display those array elements.  The output is separated with tabs and ends with a newline, and the array is cleared so that values from one record can't carry over into the next.
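
Since the list of wanted fields is the part that changes most often, one possible variation is to pass it in as an argument instead of editing the awk program each time. This is only a sketch under a few assumptions: show_fields is a made-up function name, the sample record is invented, and the -v option needs a POSIX-style awk (nawk on Solaris).

```shell
# show_fields LIST: read pipe-delimited NAME=value records on stdin and
# print the fields named in the comma-separated LIST, tab-delimited,
# one line per record. (show_fields is a hypothetical helper name.)
show_fields() {
  awk -F"|" -v list="$1" '
  BEGIN { NE = split(list, F, ",") }   # split the wanted field names once
  {
    # populate array B, splitting each field at its first "=" only
    for (i = 1; i <= NF; i++) {
      p = index($i, "=")
      if (p > 0) B[substr($i, 1, p - 1)] = substr($i, p + 1)
    }
    # show the wanted values, tab-separated, newline after the last one
    for (IX = 1; IX <= NE; IX++)
      printf "%s%s", B[F[IX]], (IX < NE ? "\t" : "\n")
    # clear array B before the next record
    for (k in B) delete B[k]
  }'
}

# Made-up sample record, for illustration only
printf '%s\n' 'RECI=CTRC|PNUM=22505|PNOD=nodea|SNOD=nodeb|CCOD=0' |
  show_fields PNUM,PNOD,SNOD,CCOD
```

With the sample record this prints 22505, nodea, nodeb and 0 on one line, separated by tabs. A field that is missing from a record comes out as an empty column, which keeps the columns lined up when the output is pasted into a spreadsheet.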

1 comment:

David Lambert said...

On Solaris, use nawk instead of awk, for compatibility with arrays.