view xml/dnsbl.in @ 92:505e77188317

optimize verification step, cleanup documentation
author carl
date Wed, 21 Sep 2005 08:00:08 -0700
parents 962a1f8f1d9f
children e107ade3b1c0

<html>

<head>
<meta http-equiv="Content-Type" content="text/html; charset=windows-1252">
<title>DNSBL Sendmail milter - Version 5.5</title>
</head>
<body>

<center>Introduction</center>
<p>This milter is released under the GPL license version 2, which is included
in the LICENSE file in the distribution, and is also available at
<a href="http://www.gnu.org/licenses/gpl.html">http://www.gnu.org/licenses/gpl.html</a>.

<p>Consider the case of a mail server that is acting as secondary MX for
a collection of clients, each of which has a collection of mail domains.
Each client may use their own collection of DNSBLs on their primary mail
server.  We present here a mechanism whereby the backup mail server can
use the correct set of DNSBLs for each recipient for each message.  As a
side-effect, it gives us the ability to customize the set of DNSBLs on a
per-recipient basis, so that fred@example.com could use SPEWS and the
SBL, while all other users @example.com use only the SBL.

<p>This milter can also verify the envelope from/recipient pairs with
the primary MX server.  This allows the backup mail servers to properly
reject mail sent to invalid addresses.  Otherwise, the backup mail
servers will accept that mail, and then generate a bounce message when
the message is forwarded to the primary server (and rejected there with
no such user).
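<p>The verification check can be pictured with a minimal Python sketch.
This is not the milter's own code (the milter is written in C++).  The
function name recipient_is_rejected, the port, the timeout and the
injectable connect argument are assumptions made for illustration; only
the rule "a 5xy reply means reject" comes from this document.

```python
import smtplib


def recipient_is_rejected(verify_host, env_from, env_to, connect=smtplib.SMTP):
    """Return True if verify_host answers MAIL FROM or RCPT TO with a 5xy code.

    `connect` is a hypothetical hook so the sketch can be exercised
    without a real mail server; by default it opens a real connection.
    """
    smtp = connect(verify_host, 25, timeout=30)
    try:
        smtp.helo()
        # Pass the envelope from and envelope to values, in that order.
        for command, address in ((smtp.mail, env_from), (smtp.rcpt, env_to)):
            code, _message = command(address)
            if code // 100 == 5:
                return True      # permanent failure, e.g. no such user
        return False
    finally:
        try:
            smtp.quit()
        except (smtplib.SMTPException, OSError):
            pass                 # the far end may already have hung up
```

<p>A backup MX would call this once per recipient and answer "no such
user" itself when the result is True, instead of accepting the mail and
bouncing it later.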

<p>This milter will also decode the body of the mail (uuencode, base64,
mime, html entity, url encodings) and scan it for HTTP and HTTPS URLs and
bare hostnames.  If any of those host names have A or NS records
on the SBL (or a single configurable DNSBL), the mail will be rejected
unless previously whitelisted.  This milter also counts the number of
invalid HTML tags, and can reject mail if that count exceeds your
specified limit.

<p>The DNSBL milter reads a text configuration file (dnsbl.conf) on
startup, and whenever the config file (or any of the referenced include
files) is changed.  The entire configuration file is case insensitive.
If the configuration cannot be loaded due to a syntax error, the milter
will log the error and quit.  If the configuration cannot be reloaded
after being modified, the milter will log the error and send an email to
root from dnsbl@$hostname.  You probably want to add dnsbl@$hostname
to your /etc/mail/virtusertable since otherwise sendmail will reject
that message.

<hr> <center>DCC Issues</center>
<p>If you are also using the <a
href="http://www.rhyolite.com/anti-spam/dcc/">DCC</a> milter, there are
a few considerations.  You may need to whitelist senders from the DCC
bulk detector, or from the DNS based lists.  Those are two very
different reasons for whitelisting.  The former is done thru the DCC
whiteclnt config file; the latter is done thru the DNSBL milter config
file.

<p>You may want to blacklist some specific senders or sending domains.
This could be done thru the DCC (either on a global basis, or for a
specific single recipient).  We prefer to do such blacklisting via the
DNSBL milter config, since it can be done for a collection of recipient
mail domains.  The DCC approach has the feature that you can capture the
entire message in the DCC log files.  The DNSBL milter approach has the
feature that the mail is rejected earlier (at RCPT TO time), and the
sending machine just gets a generic "550 5.7.1 no such user" message.

<p>The DCC whiteclnt file can be included in the DNSBL milter config by
the dcc_to and dcc_from statements.  This will import the (env_to,
env_from, and substitute mail_host) entries from the DCC config into the
DNSBL config.  This allows using the DCC config as the single point for
white/blacklisting.

<p>Consider the case where you have multiple clients, each with their
own mail servers, and each running their own DCC milters.  Each client
is using the DCC facilities for envelope from/to white/blacklisting.
Presumably you can use rsync or scp to fetch copies of your clients' DCC
whiteclnt files on a regular basis.  Your mail server, acting as a
backup MX for your clients, can use the DNSBL milter, and include those
client DCC config files.  The envelope from/to white/blacklisting will
be appropriately tagged and used only for the domains controlled by each
of those clients.

<hr> <center>Definitions</center>

<p>CONTEXT - a collection of parameters that defines the filtering
context to be used for a collection of envelope recipient addresses.
The context includes such things as the list of DNSBLs to be used, and
the various content filtering parameters.

<p>DNSBL - a named DNS based blocking list is defined by a dns suffix
(e.g. sbl-xbl.spamhaus.org) and a message string that is used to
generate the "550 5.7.1" smtp error return code.  The names of these
DNSBLs will be used to define the DNSBL-LISTs.

<p>DNSBL-LIST - a named list of DNSBLs that will be used for specific
recipients or recipient domains.
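<p>To make the role of the dns suffix concrete, here is a small Python
sketch of the standard lookup scheme: the octets of the client address
are reversed and the suffix is appended, and an A record for that name
means the client is listed.  The function names are illustrative and
are not part of the milter.

```python
import ipaddress
import socket


def dnsbl_query_name(client_ip, suffix):
    """Build the DNS name queried for client_ip on the list named by suffix."""
    # IPv4Address validates the input and raises ValueError on bad input.
    octets = str(ipaddress.IPv4Address(client_ip)).split(".")
    return ".".join(reversed(octets)) + "." + suffix


def is_listed(client_ip, suffix):
    """Return True if the list publishes an A record for client_ip.

    This performs a live DNS query.  Any resolver failure is treated
    as "not listed" here, which is a simplification.
    """
    try:
        socket.gethostbyname(dnsbl_query_name(client_ip, suffix))
        return True
    except socket.gaierror:
        return False
```

<p>For example, dnsbl_query_name("192.168.4.45", "sbl-xbl.spamhaus.org")
returns "45.4.168.192.sbl-xbl.spamhaus.org".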

<hr> <center>Filtering Procedure</center>

<p>If the client has authenticated with sendmail, the mail is accepted,
the filtering contexts are not used, the dns lists are not checked, and
the body content is not scanned.  Otherwise, we follow these steps for
each recipient.

<ol>

<li>The envelope to email address is used to find an initial filtering
context.  We first look for a context that specified the full email
address in the env_to statement.  If that is not found, we look for a
context that specified the entire domain name of the envelope recipient
in the env_to statement.  If that is not found, we look for a context
that specified the user@ part of the envelope recipient in the env_to
statement.  If that is not found, we use the first top level context
defined in the config file.

<br><br><li>The initial filtering context may redirect to a child
context based on the values in the initial context's env_from statement.
We look for [1) the full envelope from email address, 2) the domain name
part of the envelope from address, 3) the user@ part of the envelope
from address] in that context's env_from statement, with values that
point to a child context.  If such an entry is found, we switch to that
child filtering context.

<br><br><li>We look up [1) the full envelope from email address, 2) the
domain name part of the envelope from address, 3) the user@ part of the
envelope from address] in the filtering context env_from statement.
That results in one of (white, black, unknown, inherit).

<br><br><li>If the answer is black, mail to this recipient is rejected
with "no such user", and the dns lists are not checked.

<br><br><li>If the answer is white, mail to this recipient is accepted
and the dns lists are not checked.

<br><br><li>If the answer is unknown, we don't reject yet, but the dns
lists will be checked, and the content may be scanned.

<br><br><li>If the answer is inherit, we repeat the envelope from search
in the parent context.

<br><br><li>The dns lists specified in the filtering context are checked
and the mail is rejected if any list has an A record for the standard
dns based lookup scheme (reversed octets of the client followed by the
dns suffix).

<br><br><li>If the mail has not been accepted or rejected yet, and the
filtering context specifies a verification host, and the envelope to
email address is covered by this filtering context, and the verification
host is not our own hostname, we open an smtp conversation with that
verification host.  The current envelope from and recipient to values
are passed to that verification host.  If we receive a 5xy response to
those commands, we reject the current recipient with "no such user".

<br><br><li>If the mail has not been accepted or rejected yet, and the
filtering context enables content filtering, and this is the first such
recipient in this smtp transaction, we set the content filtering
parameters from this context, and enable content filtering for the body
of this message.

</ol>
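<p>The context selection and envelope from steps above can be sketched in
Python as follows.  This is an illustration under assumptions, not the
milter's data model: the Context class, its default field (the answer
used when no env_from entry matches) and the function names are all
hypothetical.

```python
class Context:
    """Hypothetical filtering context with a parent and an env_from table."""

    def __init__(self, name, parent=None, env_from=None, default="unknown"):
        self.name = name
        self.parent = parent
        # Maps a lookup key to "white", "black", "unknown", "inherit",
        # or to a child Context.
        self.env_from = dict(env_from or {})
        self.default = default   # assumed answer when nothing matches


def lookup_keys(address):
    """Keys in search order: full address, domain name, user@ part."""
    address = address.lower()  # the configuration is case insensitive
    user, _, domain = address.partition("@")
    return [address, domain, user + "@"]


def find_context(env_to_map, env_to, first_top_level):
    """Step 1: pick the initial context from the envelope to address."""
    for key in lookup_keys(env_to):
        if key in env_to_map:
            return env_to_map[key]
    return first_top_level


def redirect_to_child(context, env_from):
    """Step 2: switch to a child context if env_from points at one."""
    for key in lookup_keys(env_from):
        value = context.env_from.get(key)
        if isinstance(value, Context):
            return value
    return context


def classify_sender(context, env_from):
    """Steps 3 to 7: return white, black or unknown, following inherit."""
    keys = lookup_keys(env_from)
    while context is not None:
        answer = context.default
        for key in keys:
            value = context.env_from.get(key)
            if isinstance(value, str):
                answer = value
                break
        if answer != "inherit":
            return answer
        context = context.parent   # repeat the search in the parent
    return "unknown"               # assumption: nothing left to inherit
```

<p>A result of black rejects the recipient, white accepts it, and
unknown lets the dns list checks and content scanning proceed.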

<p>If content filtering is enabled for this body, the mail text is
decoded (uuencode, base64, mime, html entity, url encodings), scanned
for HTTP and HTTPS URLs, and the first &lt;configurable&gt; number of host
names are checked for their presence on the single &lt;configurable&gt; DNSBL.
The only known list that is suitable for this purpose is the SBL.  If
any of those host names is on that DNSBL (or has nameservers that are
on that list), and is not on the &lt;configurable&gt; ignore list,
the mail is rejected.  We also scan for excessive bad html tags, and if
a &lt;configurable&gt; limit is exceeded, the mail is rejected.
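<p>A rough Python sketch of the host name part of this scan is shown
below.  It only handles html entity and url decoding and HTTP/HTTPS
URLs; uuencode, base64 and mime decoding, bare hostnames, and the bad
tag count are left out.  The regular expression, the function names and
the is_listed callback are assumptions for illustration, not the
milter's implementation.

```python
import html
import re
from urllib.parse import unquote, urlsplit

# Rough pattern for HTTP and HTTPS URLs in already decoded text.
URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def extract_hosts(body, limit):
    """Return up to `limit` distinct host names found in URLs in body."""
    decoded = unquote(html.unescape(body))
    hosts = []
    for match in URL_RE.finditer(decoded):
        if len(hosts) >= limit:
            break
        try:
            host = urlsplit(match.group(0)).hostname
        except ValueError:
            continue             # malformed URL, skip it
        if host and host not in hosts:
            hosts.append(host)
    return hosts


def should_reject(hosts, ignore, is_listed):
    """Reject if any host that is not on the ignore list is listed.

    `is_listed` is a caller supplied function that performs the
    DNSBL check for one host name.
    """
    return any(is_listed(host) for host in hosts if host not in ignore)
```

<p>The limit on the number of host names bounds the dns work done per
message, which matters for the performance issues discussed later.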

<hr> <center>Sendmail access vs. DNSBL</center>
<p>With the standard sendmail.mc dnsbl FEATURE, the dnsbl checks may be
suppressed by entries in the /etc/mail/access database.  For example,
suppose you control a /18 of address space, and have allocated some /24s
to some clients.  You have access entries like

<pre>
192.168.4   OK
192.168.17  OK
</pre>

<p>to allow those clients to smarthost thru your mail server.  Now if
one of those clients happens to get infected with a virus that turns a
machine into an open proxy, and their 192.168.4.45 lands on the SBL-XBL,
you will still wind up allowing that infected machine to smarthost thru
your mail servers.

<p>With this DNSBL milter, the sendmail access database cannot override
the dnsbl checks, so that machine won't be able to send mail to or thru
your smarthost mail server (unless the virus/proxy can use smtp-auth).

<p>Using the standard sendmail features, you would add access entries to
allow hosts on your local network to relay thru your mail server.  Those
OK entries in the sendmail access database will override all the dnsbl
checks.  With this DNSBL milter, you will need to have the local users
authenticate with smtp-auth to get the same effect.  You might find <a
href="http://www.ists.dartmouth.edu/classroom/sendmail-ssl-how-to.php">
these directions</a> helpful for setting up smtp-auth if you are on RH
Linux.

<hr> <center>Installation and configuration</center>
<p>Usage:  Note that this has ONLY been tested on Linux, specifically
RedHat Linux.  In particular, this milter makes no attempt to understand
IPv6.  Your mileage will vary.  You will need at a minimum a C++
compiler with a minimally thread safe STL implementation.  The
distribution includes a test.cpp program.  If it fails this milter won't
work.  If it passes, this milter might work.

<p>Fetch <a href="http://www.five-ten-sg.com/util/dnsbl.tar.gz">dnsbl.tar.gz</a>
and

<pre>
tar xfvz dnsbl.tar.gz
bash install.bash
</pre>

Read and understand the contents of that install.bash script before you
run it.  It may not be suitable for your system.  Modify your
sendmail.mc by removing all the "FEATURE(dnsbl" lines and adding the
following line, then rebuild the .cf file.

<pre>
INPUT_MAIL_FILTER(`dnsbl', `S=local:/var/run/dnsbl/dnsbl.sock, F=T, T=C:30s;S:5m;R:5m;E:5m')
</pre>

Read the sample <a
href="http://www.five-ten-sg.com/dnsbl.conf">/etc/dnsbl/dnsbl.conf</a>
file and modify it to fit your configuration.  You can test your
configuration files, and see a readable internal dump of them on stdout
with

<pre>
cd /etc/dnsbl
/usr/sbin/dnsbl -c
</pre>

You can check a specific envelope from/to pair with

<pre>
cd /etc/dnsbl
from="$1" # or your from address
to="$2"   # or your to address
/usr/sbin/dnsbl -e "$from"'|'"$to"
</pre>

<hr> <center>Performance issues</center>

<p>Consider a high volume high performance machine running sendmail.
Each sendmail process can do its own dns resolution.  Typically, such
dns resolver libraries are not thread safe, and so must be protected by
some sort of mutex in a threaded environment.  When we add a milter to
sendmail, we now have a collection of sendmail processes, and a
collection of milter threads.

<p>We will be doing a lot of dns lookups per mail message, and at least
some of those will take many tens of seconds.  If all this dns work is
serialized inside the milter, we have an upper limit of about 25K mail
messages per day.  That is clearly not sufficient for many sites.

<p>Since we want to do parallel dns resolution across those milter
threads, we add another collection of dns resolver processes.  Each
sendmail process is talking to a milter thread over a socket, and each
milter thread is talking to a dns resolver process over another socket.

<p>Suppose we are processing 20 messages per second, and each message
requires 20 seconds of dns work.  Then we will have 400 sendmail
processes, 400 milter threads, and 400 dns resolver processes.  Of
course that steady state is very unlikely to happen.
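<p>The arithmetic behind these figures can be checked with a few lines
of Python.  The 400 is simply arrival rate times time in the system.
The 25K per day limit for serialized dns work corresponds to an average
of roughly 3.5 seconds of dns work per message; that average is inferred
here from the 25K figure and is not stated elsewhere in this document.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def serialized_daily_limit(avg_dns_seconds):
    """Messages per day if all dns work runs one message at a time."""
    return SECONDS_PER_DAY / avg_dns_seconds


def concurrent_messages(arrivals_per_second, dns_seconds_per_message):
    """Messages in flight at steady state (arrival rate times duration)."""
    return arrivals_per_second * dns_seconds_per_message


# 20 messages per second, 20 seconds of dns work each.
in_flight = concurrent_messages(20, 20)

# Average dns time per message implied by a 25K per day ceiling.
implied_average = SECONDS_PER_DAY / 25000
```

<p>Each message in flight occupies one sendmail process, one milter
thread and one dns resolver process, which gives the three counts of
400 above.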

<hr> <center>Rejected Ideas</center>

<p>The following ideas have been considered and rejected.

<p>Add max_recipients for each mail domain to the configuration.
Recipients in excess of that limit will be rejected, and all the
recipients in that domain will be removed if there are some other
whitelisted recipients.  Current spammers *very* rarely send to more than
ten recipients in a single smtp transaction, so this won't stop
any significant amount of spam.

<p>Add poison addresses to the configuration.  If any recipient is
poison, all recipients are rejected even if they would be whitelisted,
and the data is rejected if sent.  I have a collection of spam trap
addresses that would be suitable for such use.  Based on my log files,
any mail to those spam trap addresses is rejected based on either dnsbl
lookups or the DCC.  So this won't result in blocking any additional
spam.

<p>Add an option to only allow one recipient if the return path is
empty.  Based on my log files, there is no mail that violates this
check.

<p>Reject the mail if the envelope from domain name has any MX
records pointing to 127.0.0.0/8.  I don't see any significant amount of spam
sent with such domain names.


<pre>
$Id$
</pre>
</body>
</html>