Japanese

top report presentation (PDF) article (Ja.) white list black list watch tool applyers Q&A blog (Ja.) links history contact

Rejection log sorting script

Version up on Oct 27, 2014 (the former version is here.)

   Here is a shell script useful for finding out mail servers which are mistakenly rejected by the S25R anti-spam system. If your computer combines a web server with a mail server, you can easily watch rejection records with a web browser by means of installing this script together with a password under a directory under the cgi-bin directory. You can also run it as a command.

Function
   This script inputs a Postfix mail log and extracts records of rejection with the response code "4XX" (meaning "try again later"; "XX" is a couple of numerals) by client restriction (rejections by other reasons than S25R are also extracted), and displays them sorted so that the records of retry accesses are arranged in a sequence. That is, accesses with the same client IP address, sender address and recipient address are arranged in a sequence. If all of them are not the same, the records are separated by a blank line with each other.
   The reason codes for the rejections are displayed with the following characters:    It also displays the following data at the end:
Practical use
   Legitimate mail servers always retry transfer at appropriate intervals against rejection with the response code "4XX". The records of those rejections are displayed in a sequence with this script. Therefore, it will help you to find out accesses from legitimate mail servers which should be listed in the white list. (Note that you cannot receive blocked messages by S25R whitelisting unless the reason code is "C" or "B".)
   If accesses displayed in a sequence satisfy all of the following conditions, the client may be a legitimate mail server which should be listed in the white list.    Meanwhile, if accesses displayed in a sequence fall under any of the following conditions, the client is probably or surely illegitimate.    In case you are hard to decide whether the client is legitimate or not, you should once list it on the white list, and then unlist it if the recipient complains of spam.

Necessary configuration
   You need to configure the access mode of the mail log files so that they are readable with the authority of the HTTP daemon. In many systems, you can configure it with the commands as follows:
chgrp nobody /var/log/maillog*
chmod g+r /var/log/maillog*
   In case the user/group name of your HTTP daemon is "daemon", type "daemon" instead of "nobody" described above.

Alterations
(You can freely alter this script and publish your alteration.)

Shell script code

(Plain text version; long lines are folded)
#!/bin/sh
echo "Content-Type: text/plain"
echo
echo "Mail rejection log (4XX) - sorted;" \
    "Reason code: C=Client, B=DNSBL, S=Sender, R=Recipient, H=HELO, O=Other"
echo
#
# (1) Input mail log.
#
cat /var/log/maillog.1 /var/log/maillog | \
#
# (2) Extract records indicating "4XX".
#
egrep 'reject: RCPT from [^]]+\]: 4[0-9][0-9] ' | \
#
# (3) Extract essential items.
#
gawk '
{
  client=substr($0, match($0, /from [^]]+\]/)+5, RLENGTH-5)
  sub(/\[/, " [", client)
  sender=substr($0, match($0, /from=<[^>]*>/), RLENGTH)
  rcpt=substr($0, match($0, /to=<[^>]*>/), RLENGTH)
  helo=substr($0, match($0, /helo=<[^>]*>/), RLENGTH)
  if (match($0, /Client host rejected/))
    reason="C"
  else if (match($0, /Client host .+ blocked/))
    reason="B"
  else if (match($0, /Sender address rejected/))
    reason="S"
  else if (match($0, /Recipient address rejected/))
    reason="R"
  else if (match($0, /Helo command rejected/))
    reason="H"
  else
    reason="O"
  printf "%s %2d %s %s %s %s %s %s\n", \
      $1, $2, $3, reason, client, sender, rcpt, helo
}
' | \
#
# (4) Convert month names into month numbers.
#
gawk '
BEGIN {
  month_num["Jan"]=1
  month_num["Feb"]=2
  month_num["Mar"]=3
  month_num["Apr"]=4
  month_num["May"]=5
  month_num["Jun"]=6
  month_num["Jul"]=7
  month_num["Aug"]=8
  month_num["Sep"]=9
  month_num["Oct"]=10
  month_num["Nov"]=11
  month_num["Dec"]=12
  max_month_num=0
}
{
  $1=month_num[$1]
  if ($1>max_month_num)
    max_month_num=$1
  else if ($1<max_month_num)
    $1+=12
  printf "%3d %2d %s %s %s %s %s %s %s\n", $1, $2, $3, $4, $5, $6, $7, $8, $9
}
' | \
#
# (5) Sort according to IP address, sender address and recipient address.
#
sort -k 6,8 | \
#
# (6) Insert a blank line between records with a different triplet.
#
gawk '
BEGIN {
  prev_triplet=""
}
{
  if (prev_triplet!="") {
    if (prev_triplet!=$6 $7 $8)
      print ""
  }
  print
  prev_triplet=$6 $7 $8
}
' | \
#
# (7) Convert retry records in a sequence into one line.
#
gawk '
BEGIN {
  RS=""
}
{
  gsub(/\n/, "\036")
  print
}
' | \
#
# (8) Sort according to date and time.
#
sort -k 1,3 | \
#
# (9) Reconvert retry records in a sequence into multiple lines.
#
gawk '
{
  gsub(/\036/, "\n")
  print
  print ""
}
' | \
#
# (10) Reconvert month numbers into month names.
#
gawk '
BEGIN {
  month_name[1]="Jan"
  month_name[2]="Feb"
  month_name[3]="Mar"
  month_name[4]="Apr"
  month_name[5]="May"
  month_name[6]="Jun"
  month_name[7]="Jul"
  month_name[8]="Aug"
  month_name[9]="Sep"
  month_name[10]="Oct"
  month_name[11]="Nov"
  month_name[12]="Dec"
}
{
  if ($0!="") {
    $1=month_name[($1-1)%12+1]
    printf "%s %2d %s %s %s %s %s %s %s\n", $1, $2, $3, $4, $5, $6, $7, $8, $9
  }
  else
    print ""
}
' | \
#
# (11) Output sorted records with counting.
#
gawk '
BEGIN {
  Suppress_single_access_records=0
  RS=""
  acc_count=0
  host_and_rcpt=""
  msg_count=0
  seq_count=0
}
{
  retry_count=gsub(/\n/, "\n")
  acc_count+=1+retry_count
  if (index(host_and_rcpt, $6 $8)==0) {
    ++msg_count
    host_and_rcpt=$6 $8 host_and_rcpt
  }
  if (retry_count>0)
    ++seq_count
  if (!(retry_count==0 && Suppress_single_access_records)) {
    print
    print ""
  }
}
END {
  print "access count =", acc_count, \
      ", estimated message count =", msg_count, \
      ", retry sequence count =", seq_count
}
'

(HTML <PRE> tag verision; long lines should be horizontally scrolled)
#!/bin/sh
echo "Content-Type: text/html"
echo
echo "<html><body><pre>"
echo "Mail rejection log (4XX) - sorted;" \
    "Reason code: C=Client, B=DNSBL, S=Sender, R=Recipient, H=HELO, O=Other"
echo
#
# (1) Input mail log.
#
cat /var/log/maillog.1 /var/log/maillog | \
#
# (2) Extract records indicating "4XX".
#
egrep 'reject: RCPT from [^]]+\]: 4[0-9][0-9] ' | \
#
# (3) Extract essential items.
#
gawk '
{
  client=substr($0, match($0, /from [^]]+\]/)+5, RLENGTH-5)
  sub(/\[/, " [", client)
  sender=substr($0, match($0, /from=<[^>]*>/), RLENGTH)
  rcpt=substr($0, match($0, /to=<[^>]*>/), RLENGTH)
  helo=substr($0, match($0, /helo=<[^>]*>/), RLENGTH)
  if (match($0, /Client host rejected/))
    reason="C"
  else if (match($0, /Client host .+ blocked/))
    reason="B"
  else if (match($0, /Sender address rejected/))
    reason="S"
  else if (match($0, /Recipient address rejected/))
    reason="R"
  else if (match($0, /Helo command rejected/))
    reason="H"
  else
    reason="O"
  printf "%s %2d %s %s %s %s %s %s\n", \
      $1, $2, $3, reason, client, sender, rcpt, helo
}
' | \
#
# (4) Convert month names into month numbers.
#
gawk '
BEGIN {
  month_num["Jan"]=1
  month_num["Feb"]=2
  month_num["Mar"]=3
  month_num["Apr"]=4
  month_num["May"]=5
  month_num["Jun"]=6
  month_num["Jul"]=7
  month_num["Aug"]=8
  month_num["Sep"]=9
  month_num["Oct"]=10
  month_num["Nov"]=11
  month_num["Dec"]=12
  max_month_num=0
}
{
  $1=month_num[$1]
  if ($1>max_month_num)
    max_month_num=$1
  else if ($1<max_month_num)
    $1+=12
  printf "%3d %2d %s %s %s %s %s %s %s\n", $1, $2, $3, $4, $5, $6, $7, $8, $9
}
' | \
#
# (5) Sort according to IP address, sender address and recipient address.
#
sort -k 6,8 | \
#
# (6) Insert a blank line between records with a different triplet.
#
gawk '
BEGIN {
  prev_triplet=""
}
{
  if (prev_triplet!="") {
    if (prev_triplet!=$6 $7 $8)
      print ""
  }
  print
  prev_triplet=$6 $7 $8
}
' | \
#
# (7) Convert retry records in a sequence into one line.
#
gawk '
BEGIN {
  RS=""
}
{
  gsub(/\n/, "\036")
  print
}
' | \
#
# (8) Sort according to date and time.
#
sort -k 1,3 | \
#
# (9) Reconvert retry records in a sequence into multiple lines.
#
gawk '
{
  gsub(/\036/, "\n")
  print
  print ""
}
' | \
#
# (10) Reconvert month numbers into month names.
#
gawk '
BEGIN {
  month_name[1]="Jan"
  month_name[2]="Feb"
  month_name[3]="Mar"
  month_name[4]="Apr"
  month_name[5]="May"
  month_name[6]="Jun"
  month_name[7]="Jul"
  month_name[8]="Aug"
  month_name[9]="Sep"
  month_name[10]="Oct"
  month_name[11]="Nov"
  month_name[12]="Dec"
}
{
  if ($0!="") {
    $1=month_name[($1-1)%12+1]
    printf "%s %2d %s %s %s %s %s %s %s\n", $1, $2, $3, $4, $5, $6, $7, $8, $9
  }
  else
    print ""
}
' | \
#
# (11) Output sorted records with counting.
#
gawk '
BEGIN {
  Suppress_single_access_records=0
  RS=""
  acc_count=0
  host_and_rcpt=""
  msg_count=0
  seq_count=0
}
{
  retry_count=gsub(/\n/, "\n")
  acc_count+=1+retry_count
  if (index(host_and_rcpt, $6 $8)==0) {
    ++msg_count
    host_and_rcpt=$6 $8 host_and_rcpt
  }
  if (retry_count>0)
    ++seq_count
  if (!(retry_count==0 && Suppress_single_access_records)) {
    print
    print ""
  }
}
END {
  print "access count =", acc_count, \
      ", estimated message count =", msg_count, \
      ", retry sequence count =", seq_count
}
' | \
#
# (12) Convert "<" and ">" into entities.
#
gawk '
{
  gsub(/</, "\\&lt;")
  gsub(/>/, "\\&gt;")
  print
}
'
echo "</pre></body></html>"
top report presentation (PDF) article (Ja.) white list black list watch tool applyers Q&A blog (Ja.) links history contact