This is mailfromd.info, produced by makeinfo version 6.7 from
mailfromd.texi.

Published by the Free Software Foundation, 51 Franklin Street, Fifth
Floor, Boston, MA 02110-1301 USA

   Copyright (C) 2005-2020 Sergey Poznyakoff

   Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3 or
any later version published by the Free Software Foundation; with no
Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.  A
copy of the license is included in the section entitled "GNU Free
Documentation License".
INFO-DIR-SECTION Email
START-INFO-DIR-ENTRY
* Mailfromd: (mailfromd).          General-purpose mail-filtering software.
* mailfromd: (mailfromd) Invocation.  Mail Filtering and Real-time Modification daemon.
* calloutd: (mailfromd) calloutd.  A Stand-Alone Callout Daemon.
* mfdbtool: (mailfromd) mfdbtool.  Database Management Tool.
* mtasim: (mailfromd) mtasim.      MTA simulator.
* pmult: (mailfromd) pmult.        Pmilter multiplexer program.
END-INFO-DIR-ENTRY

    
     I dedicate this work to Lluis Llach, for opening new horizons.
    
    

File: mailfromd.info,  Node: Top,  Next: Preface,  Up: (dir)

Mailfromd
*********

This edition of the 'Mailfromd Manual', last updated 26 July 2020,
documents 'mailfromd' Version 8.8.

* Menu:

* Preface::                 Short description of this manual; brief
                            history and acknowledgments.
* Intro::                   Introduction to Mailfromd.
* Building::                Building the Package.
* Tutorial::                Mailfromd Tutorial.
* MFL::                     The Mail Filtering Language.
* Library::                 The MFL Library Functions.
* Using MFL Mode::          Using the GNU Emacs MFL Mode.
* Mailfromd Configuration:: Configuring 'mailfromd'.
* Invocation::              How to Start and Stop 'mailfromd'.
* MTA Configuration::       Using 'mailfromd' with Various MTAs
* calloutd::                A Stand-Alone Callout Daemon.
* mfdbtool::                A Database Management Tool.
* mtasim::                  An MTA simulator.
* pmult::                   Pmilter multiplexer program.
* Reporting Bugs::          How to Report a Bug.

Appendices

* Gacopyz::
* Time and Date Formats::
* s-expression::
* Upgrading::

* Copying This Manual::  The GNU Free Documentation License.
* Concept Index::        Index of Concepts.

 -- The Detailed Node Listing --

Preface

* History::                 Short 'mailfromd' history.
* Acknowledgments::         Acknowledgments.

Introduction to 'mailfromd'

* Conventions::             Typographical conventions.
* Overview::                Mailfromd at first glance.
* SAV::                     Principles of Sender Address Verification.
* Rate Limit::              Controlling Mail Sending Rate.
* SPF::                     SPF, DKIM, and others.

Sender Address Verification.

* Limitations::

Tutorial

* Start Up::
* Simplest Configurations::
* Conditional Execution::
* Functions and Modules::
* Domain Name System::
* Checking Sender Address::
* SMTP Timeouts::
* Avoiding Verification Loops::
* HELO Domain::
* rset::
* Controlling Number of Recipients::
* Sending Rate::
* Greylisting::
* Local Account Verification::
* Databases::
* Testing Filter Scripts::
* Run Mode::
* Logging and Debugging::
* Runtime errors::
* Notes::

Databases

* Database Formats::
* Basic Database Operations::
* Database Maintenance::

Run Mode

* top-block::   The Top of a Script File.
* getopt::      Parsing Command Line Arguments.

Mail Filtering Language

* Comments::                    Comments.
* Pragmas::                     Pragmatic comments.
* Data Types::
* Numbers::
* Literals::
* Here Documents::
* Sendmail Macros::
* Constants::
* Variables::
* Back references::
* Handlers::
* begin/end::
* Functions::                   Functions.
* Expressions::                 Expressions.
* Shadowing::                   Variable and Constant Shadowing.
* Statements::
* Conditionals::                Conditional Statements.
* Loops::                       Loop Statements.
* Exceptions::                  Exceptional Conditions and their Handling.
* Polling::                     Sender Verification Tests.
* Modules::                     Modules are Collections of Useful Functions.
* Preprocessor::                Input Text Is Preprocessed.
* Filter Script Example::       A Working Filter Script Explained.
* Reserved Words::              A Reference List of Reserved Words.

Pragmatic comments

* prereq::          Pragma prereq.
* stacksize::       Pragma stacksize.
* regex::           Pragma regex.
* dbprop::          Pragma dbprop.
* greylist::        Pragma greylist.
* miltermacros::    Pragma miltermacros.
* provide-callout:: Pragma provide-callout.

Constants

* Built-in constants::

Variables

* Predefined variables::

Functions

* Some Useful Functions::

Expressions

* Constant expressions::      String and Numeric Constants.
* Function calls::            A Function Call is an Expression.
* Concatenation::             String Concatenation.
* Arithmetic operations::     '+', '-', etc.
* Bitwise shifts::            '<<' and '>>'.
* Relational expressions::    '=', '<', etc.
* Special comparisons::       'matches', 'mx matches', etc.
* Boolean expressions::       'and', 'or', 'not'.
* Precedence::                How various operators nest.
* Type casting::

Statements

* Actions::                     Actions control the handling of the mail.
* Assignments::
* Pass::
* Echo::

Exceptional Conditions

* Built-in Exceptions::
* User-defined Exceptions::
* Catch and Throw::

Modules

* module structure::    Declaring Modules
* scope of visibility::
* import::              Require and Import

The MFL Library Functions

* Macro access::
* String manipulation::
* String formatting::
* Character Type::
* Email processing functions::
* Envelope modification functions::
* Header modification functions::
* Body Modification Functions::
* Message modification queue::
* Mail header functions::
* Mail body functions::
* EOM Functions::
* Current Message Functions::
* Mailbox functions::
* Message functions::
* Quarantine functions::
* SMTP Callout functions::
* Compatibility Callout functions::
* Internet address manipulation functions::
* DNS functions::
* Geolocation functions::
* Database functions::
* I/O functions::
* System functions::
* Passwd functions::
* Sieve Interface::
* Interfaces to Third-Party Programs::
* Rate limiting functions::
* Greylisting functions::
* Special test functions::
* Mail Sending Functions::
* Blacklisting Functions::
* SPF Functions::
* DKIM::
* Sockmaps::
* NLS Functions::
* Syslog Interface::
* Debugging Functions::

Message Functions

* Header functions::
* Message body functions::
* MIME functions::
* Message digest functions::

Interfaces to Third-Party Programs

* SpamAssassin::
* DSPAM::
* ClamAV::

DSPAM

* flags-dspam::       DSPAM Operation Modes and Flags.
* class-dspam::       DSPAM Class and Source Bits.
* vars-dspam::        DSPAM Global Variables.

DKIM

* Setting up a DKIM record::

Configuring 'mailfromd'

* conf-types::      Special Configuration Data Types
* conf-base::       Base Mailfromd Configuration
* conf-server::     Server Configuration
* conf-milter::     Milter Connection Configuration
* conf-debug::      Logging and Debugging configuration
* conf-timeout::    Timeout Configuration
* conf-callout::    Call-out Configuration
* conf-priv::       Privilege Configuration
* conf-database::   Database Configuration
* conf-runtime::    Runtime Constants
* conf-mailutils::  Standard Mailutils Statements

'Mailfromd' Command Line Syntax

* options::                     Command Line Options.
* Starting and Stopping::       How to Start and Shut Down the Daemon.

Command Line Options.

* Operation Modifiers::
* General Settings::
* Preprocessor Options::
* Timeout Control::
* Logging and Debugging Options::
* Informational Options::

Using 'mailfromd' with Various MTAs

* Sendmail::
* MeTA1::
* Postfix::

'calloutd'

* config-calloutd::     Calloutd Configuration.
* invocation-calloutd:: Calloutd Command-Line Options.
* protocol-calloutd::   The Callout Protocol.

Calloutd Configuration

* conf-calloutd-setup:: 'calloutd' General Setup.
* conf-calloutd-server:: The 'server' Statement.
* conf-calloutd-log:: 'calloutd' Logging.

'mfdbtool'

* Invoking mfdbtool::
* Configuring mfdbtool::

'mtasim' -- a testing tool

* interactive mode::
* expect commands::
* traces::
* daemon mode::
* command summary::
* option summary::

Pmilter multiplexer program.

* pmult configuration::
* pmult example::
* pmult invocation::

Pmult Configuration

* pmult-conf::     Multiplexer Configuration.
* pmult-macros::   Translating MeTA1 macros.
* pmult-client::   Pmult Client Configuration.
* pmult-debug::    Debugging Pmult.

Upgrading

* 870-880::  Upgrading from 8.7 to 8.8
* 850-860::  Upgrading from 8.5 to 8.6
* 820-830::  Upgrading from 8.2 to 8.3 (or 8.4)
* 700-800::  Upgrading from 7.0 to 8.0
* 600-700::  Upgrading from 6.0 to 7.0
* 5x0-600::  Upgrading from 5.x to 6.0
* 500-510::  Upgrading from 5.0 to 5.1
* 440-500::  Upgrading from 4.4 to 5.0
* 43x-440::  Upgrading from 4.3.x to 4.4
* 420-43x::  Upgrading from 4.2 to 4.3.x
* 410-420::  Upgrading from 4.1 to 4.2
* 400-410::  Upgrading from 4.0 to 4.1
* 31x-400::  Upgrading from 3.1.x to 4.0
* 30x-31x::  Upgrading from 3.0.x to 3.1
* 2x-30x::   Upgrading from 2.x to 3.0.x
* 1x-2x::    Upgrading from 1.x to 2.x



File: mailfromd.info,  Node: Preface,  Next: Intro,  Prev: Top,  Up: Top

Preface
*******

The Simple Mail Transfer Protocol (SMTP), the standard for email
transmission across the Internet, was designed in the good old days when
nobody could even imagine e-mail being abused to send tons of
unsolicited messages of dubious content.  Therefore it lacks mechanisms
that could have prevented this abuse ("spamming"), or at least made it
difficult.  Attempts to introduce such mechanisms (such as the SMTP-AUTH
extension (http://tools.ietf.org/html/rfc2554)) are being made, but they
are not in wide use yet and, probably, their introduction will not be
enough to stop e-mail abuse.  Spamming is today's grim reality, and
developers spend a lot of time and effort designing new protection
measures against it.  'Mailfromd' is one such attempt.

   The package is designed to work with any MTA supporting the 'Milter'
or 'Pmilter' protocol, such as 'Sendmail', 'MeTA1' or 'Postfix'.  It
allows you to:

   * Control whether messages come from trustworthy senders, using the
     so-called "callout" or "Sender Address Verification" (*note SAV::)
     mechanism.

   * Prevent emails coming from forged addresses by using the SPF
     mechanism (*note SPF Functions::).

   * Limit connection and/or sending rates (*note Rate Limit::).

   * Use "black-", "white-" and "greylisting" techniques.

   * Invoke external programs or other mail filters.

* Menu:

* History::                 Short 'mailfromd' history.
* Acknowledgments::         Acknowledgments.


File: mailfromd.info,  Node: History,  Next: Acknowledgments,  Up: Preface

Short history of 'mailfromd'.
=============================

The idea of the utility appeared in 2005, and its first version appeared
soon afterward.  Back then it was a simple implementation of Sender
Address Verification (*note SAV::) for 'Sendmail' (hence its name -
'mailfromd') with rudimentary tuning possibilities.

   After a short run on my mail servers, I discovered that the utility
was not flexible enough.  It took less than a month to implement a
configuration file that allowed the user to control program and data
flow during the 'envfrom' SMTP state.  The new version, 1.0, appeared in
June, 2005.

   The next major release, 1.2 (1.1 contained mostly bugfixes),
appeared two months later and introduced "mail sending rate" control
(*note Rate Limit::).

   The program evolved during the next year, and version 2.0 was
released in September, 2006.  This version was a major change in the
main idea of the program.  The configuration file became a flexible
filter script allowing the operator to control almost all SMTP states.
The program supplied in the script file was compiled into pseudo-code at
startup, and this code was subsequently evaluated each time the filter
was invoked.  This caused a considerable speed-up in comparison with the
previous versions, where the run-time evaluator traversed the parse
tree.  This version also introduced (implicitly, at the time) two
separate data types for the entities declared in the script, which also
played a role in the speed improvement (in the previous versions all
data were considered strings).  Lots of improvements were made in the
filter language (MFL, *note MFL::) itself, such as user-defined
functions, the 'switch' statement, the 'catch' statement for handling
run-time errors, etc.  The set of built-in functions was extended
considerably.  A test suite (using DejaGNU) was introduced in this
version.

   During this initial development period the limitations imposed by
the 'libmilter' implementation became obvious.  Finally, I felt they
were stopping further development, and decided that 'mailfromd' should
use its own 'Milter' implementation.  This new library, 'libgacopyz',
was the main new feature of the 3.0 release, which appeared in November,
2006.  Other major features were the '--dump-macros' option and 'macros'
to the 'rc.mailfromd' script, which were intended to facilitate the
configuration on the 'Sendmail' side.

   The development of the 3.x (more properly, 3.1.x) series
concentrated mainly on bug fixes, while the main development was done on
the next branch.

   Version 4.0 appeared on May 12, 2007.  The full list of changes in
this release is more than 500 lines long, so it is impractical to list
them here.  In particular, this version introduced lots of new features
in the MFL syntax and in the library of useful MFL functions.  The
runtime engine was also improved; in particular, the stack space became
expandable, which eliminated many run-time errors.  This version also
provided a foundation for the MFL module system.  The code generation
was re-implemented to facilitate the introduction of object files in
future versions.  Other new features in this release include SPF support
and the 'mtasim' utility -- an MTA simulator designed for testing
'mailfromd' scripts (*note mtasim::).  The test suite in this version
was made portable by rewriting it in Autotest.

   Another big leap forward was the 5.0 release, which appeared on
December 26, 2008.  It greatly enriched the set of available functions
(61 new functions were introduced, which amounts to 41% of all the
functions available in the 5.0 release) and introduced several
improvements in MFL itself.  Among others, function aliases and optional
arguments in user-defined functions were introduced in this release.
The new "run operation mode" made it possible to execute arbitrary MFL
functions from the command line.  This release also raised the Mailutils
version requirement to at least 2.0.

   Version 6.0, which was released on 12 December, 2009, introduced a
full-fledged modular system, akin to that of Python, and quite a few
improvements to the language, such as explicit type casts, the
concatenation operator, static variables, etc.

   Starting from version 7.0, the focus of further development of
'mailfromd' has shifted.  While previously it had been regarded as a
mail-filtering server, since then it has been developed as a system for
extending MTA functionality in the broad sense, mail filtering being
only one of the features it provides.

   Version 7.0 makes the MFL syntax more consistent and the language
itself more powerful.  For example, it is no longer necessary to use
prefixes before variables to dereference them.  The new 'try--catch'
construct allows for elegant handling of exceptions and errors.
User-defined exceptions provide a way for programming complex loops and
recursions with non-local exits.

   This version introduces the concept of a dedicated callout server.
This allows 'mailfromd' to defer verifications for a later time if the
remote server does not respond within a reasonably short period of time
(*note SMTP Timeouts::).

   Six years later, version 8.0 was released.  This version was a major
rewrite of the mailfromd codebase.  It introduced a separate callout
daemon that made it possible to separate the mailfromd server machine
from machines performing callout checks.  MFL was extended by a number
of built-in functions.

   Since version 8.3 (2017-11-02) 'mailfromd' uses 'adns'(1) for DNS
queries.

   Version 8.7, released in July, 2020, introduced DKIM support.

   ---------- Footnotes ----------

   (1) <https://www.gnu.org/software/adns>


File: mailfromd.info,  Node: Acknowledgments,  Prev: History,  Up: Preface

Acknowledgments
===============

Many people need to be thanked for their assistance in developing and
debugging 'mailfromd'.  After S. C. Johnson, I can say that this program
"owes much to a most stimulating collection of users, who have goaded me
beyond my inclination, and frequently beyond my ability in their endless
search for "one more feature".  Their irritating unwillingness to learn
how to do things my way has usually led to my doing things their way;
most of the time, they have been right."

   A real test for a program like 'mailfromd' can only be done in a
production environment.  A decision to try it in these conditions is by
no means an easy one; it requires courage and good faith in the
intentions and abilities of the author.  To begin with, I would like to
thank my contributors for these virtues.

   Jan Rafaj has intrepidly been using 'mailfromd' since its early
releases and invested a lot of effort in improving the program and its
documentation.  He is the author of many of the MFL library functions
shipped with the package.  Some of his ideas are still waiting in my
implementation queue, while new ones are consistently arriving.

   Peter Markeloff patiently tested every 'mailfromd' release and helped
discover and fix many bugs.

   Zeus Panchenko contributed many ideas and gave lots of helpful
comments.  He offered invaluable help in debugging and testing
'mailfromd' on the FreeBSD platform.

   Sergey Afonin proposed many improvements and new ideas.  He also
invested a lot of his time in finding bugs and testing bugfixes.

   John McEleney and Ben McKeegan contributed the token bucket filter
implementation (*note TBF::).

   Con Tassios helped to find and fix various bugs and contributed the
new implementation of the 'greylist' function (*note greylisting
types::).

   The following people (in alphabetical order) provided bug reports and
helpful comments for various versions of the program: Alan Dobkin, Brent
Spencer, Jeff Ballard, Nacho González López, Phil Miller, Simon
Christian, Thomas Lynch.


File: mailfromd.info,  Node: Intro,  Next: Building,  Prev: Preface,  Up: Top

1 Introduction to 'mailfromd'
*****************************

'Mailfromd' is a general-purpose mail filtering daemon and a suite of
accompanying utilities for 'Sendmail'(1), 'MeTA1'(2), 'Postfix'(3) or
any other MTA that supports the 'Milter' (or 'Pmilter') protocol.  It is
able to filter both incoming and outgoing messages using a filter
program written in the "mail filtering language" (MFL).  The daemon
interfaces with the MTA using the 'Milter' protocol.

   The name 'mailfromd' can be thought of as an abbreviation for '_Mail_
_F_iltering and _R_untime _M_odification' _D_aemon, with an 'o' for
itself.  Historically, it stemmed from the fact that the original
implementation was a simple filter implementing the "sender address
verification" technique.  Since then the program has changed
dramatically, and now it is actually a language translator and run-time
evaluator providing a set of built-in and library functions for
filtering electronic mail.

   The first part of this manual is an overview, describing the features
'mailfromd' offers in general.

   The second part is a tutorial, which provides an introduction for
those who have not used 'mailfromd' previously.  It moves from topic to
topic in a logical, progressive order, building on information already
explained.  It offers only the principal information needed to master
basic practical usage of 'mailfromd', while omitting many subtleties.

   The other parts are meant to be used as a reference for those who
know 'mailfromd' well enough, but need to look up some notions from time
to time.  Each chapter presents everything that needs to be said about a
specific topic.

   The manual assumes that the reader has a good knowledge of the SMTP
protocol and of the mail transport system in use ('Sendmail', 'Postfix'
or 'MeTA1').

* Menu:

* Conventions::             Typographical conventions.
* Overview::                Mailfromd at first glance.
* SAV::                     Principles of Sender Address Verification.
* Rate Limit::              Controlling Mail Sending Rate.
* SPF::                     SPF, DKIM, and others.

   ---------- Footnotes ----------

   (1) See <http://www.sendmail.org>

   (2) See <http://www.meta1.org>

   (3) See <http://www.postfix.org>


File: mailfromd.info,  Node: Conventions,  Next: Overview,  Up: Intro

1.1 Typographical conventions
=============================

This manual is written using Texinfo, the GNU documentation formatting
language.  The same set of Texinfo source files is used to produce both
the printed and online versions of the documentation.  This section
briefly documents the typographical conventions used in this manual.

   Examples you would type at the command line are preceded by the
common shell primary prompt, '$'.  The command itself is printed 'in
this font', and the output it produces 'in this font', for example:

     $ mailfromd --version
     mailfromd (mailfromd 8.8)

   In the text, command names are printed 'like this', and command line
options are displayed in 'this font'.  Some notions are emphasized _like
this_, and if a point needs to be made strongly, it is done *this way*.
The first occurrence of a new term is usually its "definition" and
appears in the same font as the previous occurrence of "definition" in
this sentence.  File names are indicated like this: '/path/to/ourfile'.

   The variable names are represented LIKE THIS, keywords and fragments
of program text are written in 'this font'.


File: mailfromd.info,  Node: Overview,  Next: SAV,  Prev: Conventions,  Up: Intro

1.2 Overview of Mailfromd
=========================

In contrast to most existing milter filters, 'mailfromd' does not
implement any default filtering policies.  Instead, it depends entirely
on a "filter script" supplied to it by the administrator.  The script,
written in a specialized and simple-to-use language called MFL (*note
MFL::), is supposed to run a set of tests and to decide whether the
message should be accepted by the MTA or not.  To perform the tests, the
script can examine the values of 'Sendmail' macros, use an extensive set
of built-in and library functions, and invoke user-defined functions.


File: mailfromd.info,  Node: SAV,  Next: Rate Limit,  Prev: Overview,  Up: Intro

1.3 Sender Address Verification.
================================

"Sender address verification", or "callout", is one of the basic mail
verification techniques implemented by 'mailfromd'.  It consists of
probing each MX server for the given address until one of them gives a
definite (positive or negative) reply.  Using this technique you can
block a sender address if it is not deliverable, thereby cutting off a
large amount of spam.  It can also be useful to block mail for
undeliverable recipients, for example on a mail relay host that does not
have a list of all the valid recipient addresses.  This prevents
undeliverable junk mail from entering the queue, so that your MTA
doesn't have to waste resources trying to send 'MAILER-DAEMON' messages
back.

   Let's illustrate how it works with an example.

   Suppose that the user '<jsmith@somedomain.net>' is trying to send
mail to one of your local users.  The remote machine connects to your
MTA and issues the 'MAIL FROM: <jsmith@somedomain.net>' command.
However, your MTA does not have to take its word for it, so it uses
'mailfromd' to verify the sender address validity.  'Mailfromd' extracts
the domain name from the address ('somedomain.net') and queries DNS
for the 'MX' records of that domain.  Suppose it receives the following
list:

10             relay1.somedomain.net
20             relay2.somedomain.net

   It then connects to the first MX server, using the SMTP protocol, as
if it were going to send a message to '<jsmith@somedomain.net>'.  This
is called sending a "probe message".  If the server accepts the
recipient address, 'mailfromd' accepts the incoming mail.  Otherwise, if
the server rejects the address, the mail is rejected as well.  If the MX
server cannot be reached, 'mailfromd' selects the next server from the
list and continues this process until it finds the answer or the list of
servers is exhausted.

   The "probe message" is like a normal mail except that no data is
ever sent.  The probe message transaction in our example might look as
follows ('S:' marks messages sent by the remote MTA, 'C:' marks those
sent by 'mailfromd'):

     C: HELO mydomain.net
     S: 250 OK, nice to meet you
     C: MAIL FROM: <>
     S: 250 <>: Sender OK
     C: RCPT TO: <jsmith@somedomain.net>
     S: 250 <jsmith@somedomain.net>: Recipient OK
     C: QUIT

   Probe messages are never delivered, deferred or bounced; they are
always discarded.
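
   The probing procedure described above can be sketched in a few lines
of Python.  This is only an illustration under stated assumptions, not
the way 'mailfromd' is implemented: the names 'callout' and 'fake_probe'
and the result strings are hypothetical, and the SMTP dialogue is
replaced by a stand-in function so that the sketch runs without network
access.

```python
from typing import Callable, Iterable, Optional, Tuple

# A probe returns True (address accepted), False (address rejected),
# or None (server unreachable, no definite reply).
Probe = Callable[[str, str], Optional[bool]]


def callout(address: str,
            mx_records: Iterable[Tuple[int, str]],
            probe: Probe) -> str:
    """Try MX hosts in order of preference until one answers definitely."""
    for _preference, host in sorted(mx_records):
        reply = probe(host, address)
        if reply is True:
            return "accept"
        if reply is False:
            return "reject"
        # No definite reply: fall through to the next MX host.
    return "unknown"


def fake_probe(host: str, address: str) -> Optional[bool]:
    """Stand-in for a real SMTP dialogue; relay1 is unreachable here."""
    if host == "relay1.somedomain.net":
        return None
    return address == "jsmith@somedomain.net"


mx = [(20, "relay2.somedomain.net"), (10, "relay1.somedomain.net")]
print(callout("jsmith@somedomain.net", mx, fake_probe))
print(callout("nobody@somedomain.net", mx, fake_probe))
```

   In this sketch the first server is tried first because its
preference value is lower; since it gives no definite reply, the second
server decides the outcome.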

   The described method of address verification is called the
"standard" method throughout this document.  'Mailfromd' also implements
a method we call "strict".  When using the strict method, 'mailfromd'
first resolves the IP address of the sender machine to a fully qualified
domain name.  Then it obtains 'MX' records for this machine, and then
proceeds with probing as described above.

   So, the difference between the two methods is in the set of 'MX'
records that are being probed: the standard method queries 'MX's based
on the sender email domain, whereas the strict method works with 'MX's
for the sender IP address.

   The strict method cuts off a much larger amount of spam, although it
does have many drawbacks.  Returning to our example above, consider
the following situation: '<jsmith@somedomain.net>' is a perfectly normal
address, but it is being used by a spammer from some other domain, say
'otherdomain.com'.  The standard method is not able to cope with such
cases, whereas the strict one is.
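
   The difference between the two methods comes down to which name is
used for the 'MX' lookup.  The following Python sketch shows this for
the situation just described.  The function name 'mx_lookup_domain' and
the host name 'mail.otherdomain.com' are hypothetical, and the reverse
DNS resolution of the client IP address is assumed to have been done
already.

```python
def mx_lookup_domain(method: str, sender: str, client_fqdn: str) -> str:
    """Return the name whose MX records would be probed."""
    if method == "standard":
        # Domain part of the envelope sender address.
        return sender.rsplit("@", 1)[1]
    if method == "strict":
        # Name obtained by resolving the client's IP address.
        return client_fqdn
    raise ValueError(f"unknown method: {method}")


sender = "jsmith@somedomain.net"
client = "mail.otherdomain.com"   # hypothetical host used by the spammer
print(mx_lookup_domain("standard", sender, client))
print(mx_lookup_domain("strict", sender, client))
```

   With the standard method the probe goes to the servers of
'somedomain.net', which accept the address; with the strict method it
goes to the servers responsible for the connecting host, which have no
reason to accept it.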

   An alert reader will ask: what happens if 'mailfromd' is not able to
get a definite answer from any of the MX servers?  Actually, it depends
entirely on how you instruct it to act in this case, but the general
practice is to return a temporary failure, which will urge the remote
party to retry sending their message later.

   After receiving a definite answer, 'mailfromd' will cache it in its
database, so that the next time your MTA receives a message from that
address (or from the sender IP/email address pair, for the strict
method), it will not waste its time trying to reach MX servers again.
The records remain in the cache database for a certain time, after which
they are discarded.
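
   The caching behavior can be pictured with the following Python
sketch.  It is a simplified model, not the actual implementation:
'mailfromd' keeps its cache in a database on disk, whereas this sketch
uses an in-memory dictionary, and the class name 'CalloutCache' and the
one-hour lifetime are assumptions made for the example.

```python
import time
from typing import Dict, Optional, Tuple


class CalloutCache:
    """Remember callout results for a limited time."""

    def __init__(self, lifetime: float) -> None:
        self.lifetime = lifetime
        self._entries: Dict[str, Tuple[str, float]] = {}

    def store(self, key: str, result: str,
              now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        self._entries[key] = (result, now + self.lifetime)

    def lookup(self, key: str, now: Optional[float] = None) -> Optional[str]:
        now = time.time() if now is None else now
        entry = self._entries.get(key)
        if entry is None:
            return None
        result, expires = entry
        if now >= expires:
            # The record has outlived its lifetime: discard it.
            del self._entries[key]
            return None
        return result


cache = CalloutCache(lifetime=3600)   # hypothetical one-hour lifetime
cache.store("jsmith@somedomain.net", "accept", now=0)
print(cache.lookup("jsmith@somedomain.net", now=10))
print(cache.lookup("jsmith@somedomain.net", now=3600))
```

   For the strict method the key would combine the sender IP address
with the email address instead of using the email address alone.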

* Menu:

* Limitations::


File: mailfromd.info,  Node: Limitations,  Up: SAV

1.3.1 Limitations of Sender Address Verification
------------------------------------------------

Before deciding whether and how to use sender address verification, you
should be aware of its limitations.

   Both standard and strict methods suffer from the following
limitations:

   * The sender verification methods will perform poorly on highly
     loaded sites.  The traffic and/or resource usage overhead may not
     be feasible for you.  However, you may experiment with various
     'mailfromd' options to find an optimal configuration.

   * Some sites may blacklist your MTA if it probes them too often.
     'Mailfromd' eliminates this drawback by using a "cache database",
     which keeps results of the recent callouts.

   * When verifying the remote address, no attempt to actually deliver
     the message is made.  If the remote MTA accepts the address,
     'mailfromd' assumes it is OK.  In reality, however, mail for a
     remote address can bounce _after_ the nearest MTA accepts the
     recipient address.

     This drawback can often be avoided by combining sender address
     verification with greylisting (*note Greylisting::).

   * If the remote server rejects the address, no attempt is made to
     discern between various reasons for rejection (client rejected,
     'HELO' rejected, 'MAIL FROM' rejected, etc.).

   * Some major sites such as 'yahoo.com' do not reject unknown
     addresses in reply to the 'RCPT TO' command, but report a delivery
     failure in response to the end of 'DATA', after the message has
     been transferred.  Of course, sender address verification does not
     work with such sites.  However, a combination of address
     verification and greylisting (*note Greylisting::) may be a good
     choice in such cases.

   In addition, strict verification breaks mail forwarding.  This is
obvious, since mail forwarding is based on delivering an unmodified
message to another location, so the sender address domain will most
probably not be the same as that of the MTA doing the forwarding.


File: mailfromd.info,  Node: Rate Limit,  Next: SPF,  Prev: SAV,  Up: Intro

1.4 Controlling Mail Sending Rate.
==================================

"Mail Sending Rate" for a given identity is defined as the number of
messages with this identity received within a predefined interval of
time.

   MFL offers a set of functions for limiting mail sending rate (*note
Rate limiting functions::), and for controlling broader rate aspects,
such as data transfer rates (*note TBF::).
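
   The definition of sending rate given above can be illustrated with a
small Python sketch that counts the messages seen for an identity
within a sliding interval.  It only restates the definition; it does
not reproduce how the MFL rate-limiting functions compute their result,
and the class name 'RateCounter' is hypothetical.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class RateCounter:
    """Count messages per identity within the last `interval` seconds."""

    def __init__(self, interval: float) -> None:
        self.interval = interval
        self._seen: Dict[str, Deque[float]] = defaultdict(deque)

    def record(self, identity: str, now: float) -> int:
        """Register one message and return the current count."""
        times = self._seen[identity]
        times.append(now)
        # Forget messages that fell out of the interval.
        while times and times[0] <= now - self.interval:
            times.popleft()
        return len(times)


counter = RateCounter(interval=60)
for t in (0, 30, 61):
    print(t, counter.record("jsmith@somedomain.net", now=t))
```

   A filter could compare the returned count with a limit of its own
choosing and return a temporary failure when the limit is exceeded.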


File: mailfromd.info,  Node: SPF,  Prev: Rate Limit,  Up: Intro

1.5 SPF, DKIM, and others
=========================

"Sender Policy Framework", or SPF for short, is an extension to the SMTP
protocol that makes it possible to identify forged identities supplied
with the 'MAIL FROM' and 'HELO' commands.  The framework is explained in
detail in RFC 4408 (<http://tools.ietf.org/html/rfc4408>) and on the SPF
Project Site (http://www.openspf.org/).

   Mailfromd provides a set of functions for using SPF to control mail
flow.  These are described in *note SPF Functions::.

   "DomainKeys Identified Mail" (DKIM) is an email authentication method
designed to detect forged sender addresses in emails.  Mailfromd
supports both DKIM signing and verification.  *Note DKIM::, for a
detailed description of these features.

   Mailfromd also provides support for several third-party
spam-abatement programs, in particular 'SpamAssassin', 'ClamAV', and
DSPAM.  These are discussed in *note Interfaces to Third-Party
Programs::.


File: mailfromd.info,  Node: Building,  Next: Tutorial,  Prev: Intro,  Up: Top

2 Building the Package
**********************

This chapter contains a detailed list of steps you need to undertake in
order to configure and build the package.

  1. Make sure you have the necessary software installed.

     To build 'mailfromd' you will need to have the following packages
     on your machine:

       A. GNU mailutils version 3.3 or newer.

          GNU mailutils is a general-purpose library for handling
          electronic mail.  It is available from <http://mailutils.org>.

       B. GNU adns library, version 1.5.1 or newer.

          GNU adns is an advanced DNS client library.  The latest
          version can be downloaded from
          <http://www.chiark.greenend.org.uk/~ian/adns/adns.tar.gz>.
          Visit <http://www.gnu.org/software/adns> for more
          information.

       C. A DBM library.  'Mailfromd' is able to link with any flavor of
          DBM supported by GNU mailutils.  As of version 8.8 it will
          refuse to build without DBM.  By default, 'configure' will try
          to find the best implementation installed on your machine
          (preference is given to Berkeley DB) and will use it.  You
          can, however, explicitly specify which implementation you want
          to use.  To do so, use the '--with-dbm' configure option.  Its
          argument specifies the "type" of database to use.  It must be
          one of the types supported by GNU mailutils.  At the time of
          this writing, these are:

          bdb
               Berkeley DB (versions 2 to 6).
          gdbm
               GNU DBM.
          kc
               Kyoto Cabinet
          tc
               Tokyo Cabinet
          ndbm
               NDBM

          To check what database types are supported by your version of
          mailutils, run the following command:

               $ mailutils dbd gdbm kc tc ndbm

          For backward compatibility, 'configure' accepts the following
          two options:

          '--with-gdbm'
               Same as '--with-dbm=gdbm'.
          '--with-berkeley-db'
               Same as '--with-dbm=bdb'.

          For 'Sendmail' users, it often makes sense to configure
          'mailfromd' to use the same database flavor as 'sendmail'.
          The following table will help you do that.  The column 'DB
          type' lists types of DBM databases supported by 'mailfromd'.
          The column 'confMAPDEF' lists the value of the 'confMAPDEF'
          Sendmail configuration macro corresponding to that database
          type.  The column 'configure option' contains the
          corresponding option to 'configure'.

          DB type            confMAPDEF         configure option
          ---------------------------------------------------------------------------
          NDBM               '-DNDBM'           '--with-dbm=ndbm'
          Berkeley DB        '-DNEWDB'          '--with-dbm=bdb'
          GDBM               N/A                '--with-dbm=gdbm'

  2. Decide what user privileges will be used to run 'mailfromd'

     After startup, the program drops root privileges.  By default, it
     switches to the privileges of user 'mail', group 'mail'.  If there
     is no such user on your system, or you wish to use another user
     account for this purpose, override it using the 'DEFAULT_USER'
     environment variable.  For example, to make 'mailfromd' run as
     user 'nobody', use

          ./configure DEFAULT_USER=nobody

     The user name can also be changed at run-time (*note --user::).

  3. Decide where to install 'mailfromd' and where its filter script and
     data files will be located.

     As usual, the default value for the installation prefix is
     '/usr/local'.  If it does not suit you, specify another location
     using the '--prefix' option, e.g.: '--prefix=/usr'.

     During the installation phase, the build system will install
     several files.  These files are:

     'PREFIX/sbin/mailfromd'
          Main daemon.  *Note mailfromd: Invocation.

     'PREFIX/etc/mailfromd.mf'
          Default main filter script file.  It is installed only if it
          is not already there.  Thus, if you are upgrading to a newer
          version of 'mailfromd', your old script file will be preserved
          with all your changes.

          *Note MFL::, for a description of the mail filtering language.

     'PREFIX/share/mailfromd/8.8/*.mf'
          MFL modules.  *Note Modules::.

     'PREFIX/info/mailfromd.info*'
          Documentation files.

     'PREFIX/bin/mtasim'
          MTA simulator program for testing 'mailfromd' scripts.  *Note
          mtasim::.

     'PREFIX/sbin/pmult'
          Pmilter multiplexer for 'MeTA1'.  *Note pmult::.  It is built
          only if 'MeTA1' version 'PreAlpha29.0' or newer is installed
          on the system.  You may disable it by using the
          '--disable-pmilter' command line option.

          When testing for 'MeTA1' presence, 'configure' assumes its
          default location.  If it is not found there, inform
          'configure' about its actual location by using the following
          option:

               --enable-pmilter=PREFIX

          where PREFIX stands for the 'MeTA1' installation prefix.

     It is advisable to use the same settings for file name prefixes as
     those you used when configuring 'mailutils'.  In particular, try to
     use the same '--sysconfdir', since it will facilitate configuring
     the whole system.

     Another important point is the location of the "local state
     directory", i.e.  a directory where 'mailfromd' keeps its data
     files (e.g.  communication socket, PID-file and database files).
     By default, its full name is 'LOCALSTATEDIR/mailfromd'.  You can
     change it by setting the 'DEFAULT_STATE_DIR' configuration
     variable.  This value can be changed at run-time using the
     'state-directory' configuration statement (*note state-directory:
     conf-base.).

  4. Select default communication socket.  This is the socket used to
     communicate with the MTA, in the usual 'Milter' port notation
     (*note milter port specification::).  If the socket name does not
     begin with a protocol or directory separator, it is assumed to be
     a UNIX socket, located in the local state directory.  The default
     value is 'mailfrom', which is equivalent to
     'unix:LOCALSTATEDIR/mailfromd/mailfrom'.

     To alter this, use the 'DEFAULT_SOCKET' environment variable, e.g.:

          ./configure DEFAULT_SOCKET=inet:999@localhost

     The communication socket can be changed at run time using the
     '--port' command line option (*note --port::) or the 'listen'
     configuration statement (*note listen: conf-server.).

  5. Select default expiration interval.  "Expiration interval" defines
     the period of time during which a record in the 'mailfromd'
     database is considered valid.  It is described in more detail in
     *note Databases::.  The default value is 86400 seconds, i.e.  24
     hours.  It is OK for most sites.  If, however, you wish to change
     it, use the 'DEFAULT_EXPIRE_INTERVAL' environment variable.

     The 'DEFAULT_EXPIRE_RATES_INTERVAL' variable sets the default
     expiration time for the mail rate database (*note Rate limiting
     functions::).

     Expiration settings can be changed at run time using the
     'database' statement in the 'mailfromd' configuration file (*note
     conf-database::).

  6. Select a 'syslog' implementation to use.

     'Mailfromd' uses 'syslog' for diagnostics output.  The default
     'syslog' implementation on most systems (most notably, on
     GNU/Linux) uses blocking 'AF_UNIX SOCK_DGRAM' sockets.  As a
     result, when an application calls 'syslog()', and 'syslogd' is not
     responding and the socket buffers get full, the application will
     hang.

     For 'mailfromd', as for any daemon, it is more important that it
     continue to run, than that it continue to log.  For this purpose,
     'mailfromd' is shipped with a non-blocking 'syslog' implementation
     by Simon Kelley.  This implementation, instead of blocking, buffers
     log lines in memory.  When the log buffer overflows, some lines are
     lost, but the daemon continues to run.  When lines are lost, this
     fact is logged with a message of the form:

             async_syslog overflow: 5 log entries lost

     To enable this implementation, configure the package with the
     '--enable-syslog-async' option, e.g.:

          ./configure --enable-syslog-async

     Additionally, you can instruct 'mailfromd' to use asynchronous
     syslog by default.  To do so, set 'DEFAULT_SYSLOG_ASYNC' to 1, as
     shown in the example below:

          ./configure --enable-syslog-async DEFAULT_SYSLOG_ASYNC=1

     You will be able to override these defaults at run-time by using
     the '--logger' command line option (*note Logging and Debugging::).

  7. Run 'configure' with all the desired options.

     For example, the following command:

          ./configure DEFAULT_SOCKET=inet:999@localhost --with-berkeley-db=3

     will configure the package to use Berkeley DB database, version 3,
     and 'inet:999@localhost' as the default communication socket.

     At the end of its run 'configure' will print a concise summary of
     its configuration settings.  It looks like this (with the long
     lines being split for readability):

          *******************************************************************
          Mailfromd configured with the following settings:

          External preprocessor..................... /usr/bin/m4 -s
          DBM version............................... Berkeley DB v. 3
          Default user.............................. mail
          State directory...........................
                     $(localstatedir)/$(PACKAGE)
          Socket.................................... mailfrom
          Expiration interval....................... 86400
          Negative DNS answer expiration interval... 3600
          Rates expire interval..................... 300
          Default syslog implementation............. blocking
          Readline (for mtasim)..................... yes
          Documentation rendition type.............. PROOF
          Enable pmilter support.................... no
          Enable GeoIP support...................... no
          *******************************************************************

     Make sure these settings satisfy your needs.  If they do not,
     reconfigure the package with the right options.

  8. Run 'make'.

  9. Run 'make install'.

  10. Make sure 'LOCALSTATEDIR/mailfromd' has the right owner and mode.

  11. Examine filter script file ('SYSCONFDIR/mailfromd.mf') and edit
     it, if necessary.

  12. If you are upgrading from an earlier release of Mailfromd, refer
     to *note Upgrading::, for detailed instructions.


File: mailfromd.info,  Node: Tutorial,  Next: MFL,  Prev: Building,  Up: Top

3 Tutorial
**********

This chapter contains a tutorial introduction, guiding you through
various 'mailfromd' configurations, starting from the simplest ones and
proceeding up to more advanced forms.  It omits most complicated
details, concentrating mainly on the common practical tasks.

   If you are familiar with 'mailfromd', you can skip this chapter and
go directly to the next one (*note MFL::), which contains a detailed
discussion of the mail filtering language and 'mailfromd' interaction
with the Mail Transfer Agent.

* Menu:

* Start Up::
* Simplest Configurations::
* Conditional Execution::
* Functions and Modules::
* Domain Name System::
* Checking Sender Address::
* SMTP Timeouts::
* Avoiding Verification Loops::
* HELO Domain::
* rset::
* Controlling Number of Recipients::
* Sending Rate::
* Greylisting::
* Local Account Verification::
* Databases::
* Testing Filter Scripts::
* Run Mode::
* Logging and Debugging::
* Runtime errors::
* Notes::


File: mailfromd.info,  Node: Start Up,  Next: Simplest Configurations,  Up: Tutorial

3.1 Start Up
============

The 'mailfromd' utility runs as a standalone "daemon" program and
listens on a predefined communication channel for requests from the
"Mail Transfer Agent" (MTA, for short).  When processing each message,
the MTA establishes communication with 'mailfromd', and goes through
several states, collecting the necessary data from the sender.  At each
state it sends the relevant information to 'mailfromd', and waits for it
to reply.  The 'mailfromd' filter receives the message data through
"Sendmail macros" and runs a "handler program" defined for the given
state.  The result of this run is a "response code", which it returns to
the MTA.  The following response codes are defined:

'continue'
     Continue message processing.

'accept'
     Accept this message for delivery.  After receiving this code the
     MTA continues processing this message without further consulting
     'mailfromd' filter.

'reject'
     Reject this message.  The message processing stops at this stage,
     and the sender receives the reject reply ('5XX' reply code).  No
     further 'mailfromd' handlers are called for this message.

'discard'
     Silently discard the message.  This means that the MTA will
     continue processing this message as if it were going to deliver
     it, but will discard it after receiving.  No further interaction
     with 'mailfromd' occurs.

'tempfail'
     Temporarily reject the message.  The message processing stops at
     this stage, and the sender receives the 'temporary failure' reply
     ('4XX' reply code).  No further 'mailfromd' handlers are called for
     this message.

   The instructions on how to process the message are supplied to
'mailfromd' in its "filter script file".  It is normally called
'/usr/local/etc/mailfromd.mf' (but can be located elsewhere, *note
Invocation::) and contains a set of "milter state handlers", or
subroutines to be executed in various SMTP states.  Each interaction
state can be supplied its own handling procedure.  A missing procedure
implies 'continue' response code.

   The filter script can define up to nine "milter state handlers",
called after the names of milter states: 'connect', 'helo', 'envfrom',
'envrcpt', 'data', 'header', 'eoh', 'body', and 'eom'.  The 'data'
handler is invoked only if the MTA uses Milter protocol version 3 or
later.  Two special handlers are available for initialization and
clean-up purposes: 'begin' is called before the processing starts, and
'end' is called after it is finished.  The diagram below shows the
control flow when processing an SMTP transaction.  Lines marked with
'C:' show SMTP commands issued by the remote machine (the "client"),
those marked with '=>' show called handlers with their arguments.  An
'[R]' appearing at the start of a line indicates that this part of the
transaction can be repeated any number of times:

     => begin()
     => connect(HOSTNAME, FAMILY, PORT, 'IP address')
     C: HELO DOMAIN
     => helo(DOMAIN)
     for each message transaction
     do
             C: MAIL FROM SENDER
             => envfrom(SENDER)

     [R]     C: RCPT TO RECIPIENT
             => envrcpt(RECIPIENT)

             C: DATA
             => data()
     [R]     C: HEADER: VALUE
             => header(HEADER, VALUE)

             C:
             => eoh()

     [R]     C: BODY-LINE
             => /* Collect lines into blocks BLK of
             =>  * at most LEN bytes and for each
             =>  * such block call:
             =>  */
             => body(BLK, LEN)

             C: .
             => eom()
     done
     => end()

Figure 3.1: Mailfromd Control Flow

   This control flow is maintained for as long as each called handler
returns 'continue' (*note Actions::).  Otherwise, if any handler returns
'accept' or 'discard', the message processing continues, but no other
handler is called.  In the case of 'accept', the MTA will accept the
message for delivery, in the case of 'discard' it will silently discard
it.

   If any of the handlers returns 'reject' or 'tempfail', the result
depends on the handler.  If this code is returned by 'envrcpt' handler,
it causes this particular recipient address to be rejected.  When
returned by any other handler, it causes the whole message to be
rejected.

   The 'reject' and 'tempfail' actions executed by 'helo' handler do not
take effect immediately.  Instead, their action is deferred until the
next SMTP command from the client, which is usually 'MAIL FROM'.


File: mailfromd.info,  Node: Simplest Configurations,  Next: Conditional Execution,  Prev: Start Up,  Up: Tutorial

3.2 Simplest Configurations
===========================

The 'mailfromd' script file contains a series of "declarations" of the
handler procedures.  Each declaration has the form:

     prog NAME
     do
       ...
     done

where 'prog', 'do' and 'done' are the "keywords", and NAME is the state
name for this handler.  The dots in the above example represent the
actual "code", or a set of commands, instructing 'mailfromd' how to
process the message.

   For example, the declaration:

     prog envfrom
     do
       accept
     done

installs a handler for 'envfrom' state, which always approves the
message for delivery, without any further interaction with 'mailfromd'.

   The word 'accept' in the above example is an "action".  An "action"
is a special language statement that instructs the run-time engine to
stop execution of the program and to return a response code to
'Sendmail'.  There are five actions, one for each response code:
'continue', 'accept', 'reject', 'discard', and 'tempfail'.  Among these,
'reject' and 'tempfail' can optionally take one to three arguments.
There are two ways of supplying the arguments.

   In the first form, called "literal" or "traditional" notation, the
arguments are supplied as additional words after the action name,
separated by whitespace.  The first argument is a three-digit RFC 2821
reply code.  It must begin with '5' for 'reject' and with '4' for
'tempfail'.  If two arguments are supplied, the second argument must be
either an "extended reply code" (RFC 1893/2034) or a textual string to
be returned along with the SMTP reply.  Finally, if all three arguments
are supplied, then the second one must be an extended reply code and the
third one must supply the textual string.  The following examples
illustrate all possible ways of using the 'reject' statement in literal
notation:

     reject
     reject 503
     reject 503 5.0.0
     reject 503 "Need HELO command"
     reject 503 5.0.0 "Need HELO command"

Please note the quotes around the textual string.
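
   The same forms apply to 'tempfail', with a reply code beginning with
'4'.  As a sketch (the particular codes and the message text below are
chosen here only for illustration):

     tempfail
     tempfail 451
     tempfail 451 4.3.0
     tempfail 451 "Try again later"
     tempfail 451 4.3.0 "Try again later"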

   Another form for these actions is called "functional" notation,
because it resembles the function syntax.  When used in this form, the
action word is followed by a parenthesized group of exactly three
arguments, separated by commas.  The meaning and ordering of the
arguments are the same as in the literal form.  Any of the three
arguments may be absent, in which case it will be replaced by the
default value.  To illustrate this, here are the statements from the
previous example, written in functional notation:

     reject(,,)
     reject(503,,)
     reject(503, 5.0.0,)
     reject(503,, "Need HELO command")
     reject(503, 5.0.0, "Need HELO command")


File: mailfromd.info,  Node: Conditional Execution,  Next: Functions and Modules,  Prev: Simplest Configurations,  Up: Tutorial

3.3 Conditional Execution
=========================

Programs consisting of a single action are rarely useful.  In most cases
you will want to do some checking and decide whether to process the
message depending on its result.  For example, if you do not want to
accept messages from the address '<badguy@some.net>', you could write
the following program:

     prog envfrom
     do
       if $f = "badguy@some.net"
         reject
       else
         accept
       fi
     done

   This example illustrates several important concepts.  First of all,
'$f' in the third line is a "Sendmail macro reference".  Sendmail macros
are referenced the same way as in 'sendmail.cf', with the only
difference that curly braces around macro names are optional, even if
the name consists of several letters.  The value of a macro reference is
always a string.

   The equality operator ('=') compares its left and right arguments and
evaluates to true if the two strings are exactly the same, or to false
otherwise.  Apart from equality, you can use the regular relational
operators: '!=', '>', '>=', '<' and '<='.  Notice that string comparison
in 'mailfromd' is always case sensitive.  To do case-insensitive
comparison, translate both operands to upper or lower case (*note
tolower::, and *note toupper::).
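
   For instance, a case-insensitive variant of the check shown above
could be sketched as follows.  This assumes that 'tolower' takes a
single string argument and returns its lower-case copy:

     prog envfrom
     do
       /* Fold the sender address to lower case before comparing. */
       if tolower($f) = "badguy@some.net"
         reject
       else
         accept
       fi
     done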

   The 'if' statement decides what actions to execute depending on the
value its condition evaluates to.  Its usual form is:

     if EXPRESSION THEN-BODY [else ELSE-BODY] fi

   The THEN-BODY is executed if the EXPRESSION evaluates to 'true' (i.e.
to any non-zero value).  The optional ELSE-BODY is executed if the
EXPRESSION yields 'false' (i.e.  zero).  Both THEN-BODY and ELSE-BODY
can contain other 'if' statements; their nesting depth is not limited.
To facilitate writing complex conditional statements, the 'elif' keyword
can be used to introduce alternative conditions, for example:

     prog envfrom
     do
       if $f = "badguy@some.net"
         reject
       elif $f = "other@domain.com"
         tempfail 470 "Please try again later"
       else
         accept
       fi
     done

   *Note switch::, for more elaborate forms of conditional branching.


File: mailfromd.info,  Node: Functions and Modules,  Next: Domain Name System,  Prev: Conditional Execution,  Up: Tutorial

3.4 Functions and Modules
=========================

Like any programming language, MFL supports a concept of "function", i.e.
a body of code that is assigned a unique name and can be invoked
elsewhere as many times as needed.

   All functions have a "definition" that introduces types and names of
the formal parameters and the result type, if the function is to return
a meaningful value (function definitions in MFL are discussed in detail
in *note User-Defined Functions: User-defined.).

   A function is invoked using a special construct, a "function call":

      NAME (ARG-LIST)

where NAME is the function name, and ARG-LIST is a comma-separated list
of expressions.  Each expression in ARG-LIST is evaluated, and its type
is compared with that of the corresponding formal argument.  If the
types differ, the expression is converted to the formal argument type.
Finally, a copy of its value is passed to the function as a
corresponding argument.  The order in which the expressions are
evaluated is not defined.  The compiler checks that the number of
elements in ARG-LIST matches the number of mandatory arguments for
function NAME.

   If the function does not deliver a result, it should only be called
as a statement.

   Functions may be recursive, even mutually recursive.
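
   As a minimal sketch, the following fragment defines a function and
calls it from a handler.  The function name 'is_blocked' and the address
it tests are hypothetical; the definition syntax is assumed to be the
one described in *note User-Defined Functions: User-defined., so check
that node for the authoritative form:

     /* Return a non-zero value if SENDER is the unwanted address. */
     func is_blocked(string sender) returns number
     do
       return sender = "badguy@some.net"
     done

     prog envfrom
     do
       if is_blocked($f)
         reject 550 5.7.1 "Sender not accepted"
       fi
     done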

   'Mailfromd' comes with a rich set of predefined functions for various
purposes.  There are two basic function classes: "built-in" functions,
that are implemented by the MFL runtime environment in 'mailfromd', and
"library" functions, that are implemented in MFL.  The built-in
functions are always available and no preparatory work is needed before
calling them.  In contrast, the library functions are defined in
"modules", special MFL source files that contain functions designed for
a particular task.  In order to access a library function, you must
first "require" a module it is defined in.  This is done using 'require'
statement.  For example, the function 'hostname' looks up in the DNS the
name corresponding to the IP address specified as its argument.  This
function is defined in module 'dns.mf', so before calling it you must
require this module:

     require dns

The 'require' statement takes a single argument: the name of the
requested module (without the '.mf' suffix).  It looks up the module on
disk and loads it if it is available.

   For more information about the module system, *note Modules::.


File: mailfromd.info,  Node: Domain Name System,  Next: Checking Sender Address,  Prev: Functions and Modules,  Up: Tutorial

3.5 Domain Name System
======================

Site administrators often do not wish to accept mail from hosts that do
not have a proper reverse delegation in the Domain Name System.  In the
previous section we introduced the library function 'hostname', that
looks up in the DNS the name corresponding to the IP address specified
as its argument.  If there is no corresponding name, the function
returns its argument unchanged.  This can be used to test if the IP was
resolved, as illustrated in the example below:

     require 'dns'

     prog envfrom
     do
       if hostname($client_addr) = $client_addr
         reject
       fi
     done

   The 'require' statement loads the module 'dns.mf', after which the
definition of 'hostname' becomes available.

   A similar function, 'resolve', which resolves a symbolic name to the
corresponding IP address, is provided in the same 'dns.mf' module.


File: mailfromd.info,  Node: Checking Sender Address,  Next: SMTP Timeouts,  Prev: Domain Name System,  Up: Tutorial

3.6 Checking Sender Address
===========================

A special language construct is provided for verification of sender
addresses ("callout"):

     on poll $f do
     when success:
       accept
     when not_found or failure:
       reject 550 5.1.0 "Sender validity not confirmed"
     when temp_failure:
       tempfail 450 4.1.0 "Try again later"
     done

   The 'on poll' construct runs standard verification (*note standard
verification::) for the email address specified as its argument (in the
example above it is the value of the Sendmail macro '$f').  The check
can result in the following conditions:

'success'
     The address exists.

'not_found'
     The address does not exist.

'failure'
     Some error of permanent nature occurred during the check.  The
     existence of the address cannot be verified.

'temp_failure'
     Some temporary failure occurred during the check.  The existence of
     the address cannot be verified at the moment.

   The 'when' branches of the 'on poll' statement introduce statements
that are executed depending on the actual return condition.  If any
condition occurs that is not handled within the 'on' block, the run-time
evaluator will signal an "exception"(1) and return temporary failure,
therefore it is advisable to always handle all four conditions.  In
fact, the condition handling shown in the above example is preferable
for most normal configurations: the mail is accepted if the sender
address is proved to exist and rejected otherwise.  If a temporary
failure occurs, the remote party is urged to retry the transaction some
time later.

   The 'poll' statement itself has a number of options that control the
type of the verification.  These are discussed in detail in *note
poll::.

   It is worth noting that one special email address is always
available on any host: the "null address" '<>' used in error reporting.
There is no point in verifying its existence:

     prog envfrom
     do
       if $f == ""
         accept
       else
         on poll $f do
         when success:
           accept
         when not_found or failure:
           reject 550 5.1.0 "Sender validity not confirmed"
         when temp_failure:
           tempfail 450 4.1.0 "Try again later"
         done
       fi
     done

   ---------- Footnotes ----------

   (1) For more information about exceptions and their handling, please
refer to *note Exceptions::.


File: mailfromd.info,  Node: SMTP Timeouts,  Next: Avoiding Verification Loops,  Prev: Checking Sender Address,  Up: Tutorial

3.7 SMTP Timeouts
=================

When using polling functions, it is important to take into account
possible delays, which can occur in SMTP transactions.  Such delays may
be due to low network bandwidth or high load on the remote server.  Some
sites impose them willingly, as a spam-fighting measure.

   Ideally the callout verification should use the timeout values
defined in RFC 2821, but this is impossible in practice, because it
would cause a "timeout escalation", which consists in propagating delays
encountered in a callout SMTP session back to the remote client whose
session initiated the callout.

   Consider, for example, the following scenario.  An MFL script
performs a callout on 'envfrom' stage.  The remote server is overloaded
and delays heavily in responding, so that the initial response arrives 3
minutes after establishing the connection, and processing the 'EHLO'
command takes another 3 minutes.  These delays are OK according to the
RFC, which imposes a 5 minute limit for each stage, but while waiting
for the remote reply our SMTP server remains in the 'envfrom' state with
the client waiting for a response to its 'MAIL' command for more than 6
minutes, which is intolerable, because of the same 5 minute limit.
Thus, the client will almost certainly break the session.

   To avoid this, 'mailfromd' uses a special instance, called "callout
server", which is responsible for running callout SMTP sessions
asynchronously.  The usual sender verification is performed using
so-called "soft" timeout values, which are set to values short enough to
not disturb the incoming session (e.g.  a timeout for 'HELO' response is
3 seconds, instead of 5 minutes).  If this verification yields a
definite answer, that answer is stored in the cache database and
returned to the calling procedure immediately.  If, however, the
verification is aborted due to a timeout, the caller procedure is
returned an 'e_temp_failure' exception, and the callout is scheduled for
processing by a callout server.  This exception normally causes the
milter session to return a temporary error to the sender, urging it to
retry the connection later.

   In the meantime, the callout server runs the sender verification
again using another set of timeouts, called "hard" timeouts, which are
normally much longer than 'soft' ones (they default to the values
required by RFC 2821).  If it gets a definitive result (e.g.  'email
found' or 'email not found'), the server stores it in the cache
database.  If the callout ends due to a timeout, a 'not_found' result is
stored in the database.

   Some time later, the remote server retries the delivery, and the
'mailfromd' script is run again.  This time, the callout function will
immediately obtain the already cached result from the database and
proceed accordingly.  If the callout server has not finished the request
by the time the sender retries the connection, the latter is again
returned a temporary error, and the process continues until the callout
is finished.

   Usually, the callout server is just another instance of 'mailfromd'
itself, which is started automatically to perform scheduled SMTP
callouts.  It is also possible to set up a separate callout server on
another machine.  This is discussed in *note calloutd::.

   For detailed information about callout timeouts and their
configuration, see *note conf-timeout::.

   For a description of how to configure 'mailfromd' to use callout
servers, see *note conf-server::.


File: mailfromd.info,  Node: Avoiding Verification Loops,  Next: HELO Domain,  Prev: SMTP Timeouts,  Up: Tutorial

3.8 Avoiding Verification Loops
===============================

An 'envfrom' program consisting only of the 'on poll' statement will
work smoothly for incoming mails, but will create infinite loops for
outgoing mails.  This is because upon sending an outgoing message
'mailfromd' will start the verification procedure, which will initiate
an SMTP transaction with the same mail server that runs it.  This
transaction will in turn trigger execution of 'on poll' statement, etc.
ad infinitum.  To avoid this, any properly written filter script should
not run the verification procedure on the email addresses in those
domains that are relayed by the server it runs on.  This can be achieved
using the 'relayed' function.  The function returns 'true' if its
argument is contained in one of the predefined "domain list" files.
These files correspond to 'Sendmail' plain text files used in 'F' class
definition forms (see 'Sendmail Installation and Operation Guide',
chapter 5.3), i.e.  they contain one domain name per line, with empty
lines and lines starting with '#' being ignored.  The domain files
consulted by the 'relayed' function are defined in the
'relayed-domain-file' configuration file statement (*note
relayed-domain-file: conf-base.):

     relayed-domain-file (/etc/mail/local-host-names,
                          /etc/mail/relay-domains);

or:

     relayed-domain-file /etc/mail/local-host-names;
     relayed-domain-file /etc/mail/relay-domains;

   The above example declares two domain list files, most commonly used
in 'Sendmail' installations to keep hostnames of the server (1) and
names of the domains relayed by this server(2).
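
   The domain list format described above (one name per line, empty
lines and '#' lines ignored) is simple enough to model in a few lines.
The following Python sketch only illustrates that format; it is not the
code 'mailfromd' uses.  The names 'load_domain_list' and 'is_relayed'
are invented for the example, and both the stripping of surrounding
whitespace and the case-insensitive comparison are assumptions:

```python
import os
import tempfile

def load_domain_list(path):
    """Read one domain per line, skipping empty lines and '#' comments."""
    domains = set()
    with open(path, encoding="ascii") as fp:
        for line in fp:
            line = line.strip()            # assumption: whitespace is ignored
            if not line or line.startswith("#"):
                continue
            domains.add(line.lower())      # assumption: case-insensitive match
    return domains

def is_relayed(domain, files):
    """Return True if DOMAIN is listed in any of the domain list FILES."""
    return any(domain.lower() in load_domain_list(f) for f in files)

def demo():
    """Build a throwaway domain list file and query it."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "relay-domains")
        with open(path, "w", encoding="ascii") as fp:
            fp.write("# relayed domains\n\nexample.org\nexample.net\n")
        return (is_relayed("Example.ORG", [path]),
                is_relayed("example.com", [path]))
```

   Calling 'demo()' checks one listed and one unlisted domain against a
temporary file, so the sketch can be tried without touching
'/etc/mail'.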

   Given all this, we can improve our filter program:

     require 'dns'

     prog envfrom
     do
       if $f == ""
         accept
       elif relayed(hostname(${client_addr}))
         accept
       else
         on poll $f do
         when success:
           accept
         when not_found or failure:
           reject 550 5.1.0 "Sender validity not confirmed"
         when temp_failure:
           tempfail 450 4.1.0 "Try again later"
         done
       fi
     done

   If you feel that your Sendmail's relayed domains are not restrictive
enough for 'mailfromd' filters (for example you are relaying mails from
some third-party servers), you can use a database of trusted mail server
addresses.  If the number of such servers is small enough, a single 'or'
statement can be used, e.g.:

       elif ${client_addr} = "10.10.10.1"
            or ${client_addr} = "192.168.11.7"
         accept
       ...

otherwise, if the servers' IP addresses fall within one or several
CIDRs, you can use the 'match_cidr' function (*note Internet address
manipulation functions::), e.g.:

       elif match_cidr (${client_addr}, "199.232.0.0/16")
         accept
       ...

or combine both methods.  Finally, you can keep a DBM database of
relayed addresses and use 'dbmap' or 'dbget' function for checking
(*note Database functions::).

       elif dbmap("%__statedir__/relay.db", ${client_addr})
         accept
       ...
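
   The first two approaches, an explicit address list and a CIDR match,
can be combined in a single test.  The sketch below models that
combined decision in Python rather than MFL, using the addresses from
the examples above.  The names 'TRUSTED_HOSTS', 'TRUSTED_NETS' and
'is_trusted' are hypothetical and exist only for this illustration:

```python
import ipaddress

# Addresses and network taken from the MFL examples above.
TRUSTED_HOSTS = {"10.10.10.1", "192.168.11.7"}
TRUSTED_NETS = [ipaddress.ip_network("199.232.0.0/16")]

def is_trusted(client_addr):
    """Return True if CLIENT_ADDR is listed explicitly or falls in a CIDR."""
    if client_addr in TRUSTED_HOSTS:
        return True
    addr = ipaddress.ip_address(client_addr)
    return any(addr in net for net in TRUSTED_NETS)
```

   The explicit list is checked first because it is the cheaper test;
the order does not change the result.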

   ---------- Footnotes ----------

   (1) class 'w', see 'Sendmail Installation and Operation Guide',
chapter 5.2.

   (2) class 'R'


File: mailfromd.info,  Node: HELO Domain,  Next: rset,  Prev: Avoiding Verification Loops,  Up: Tutorial

3.9 HELO Domain
===============

Some of the mail filtering conditions may depend on the value of the
"helo domain" name, i.e.  the argument to the SMTP 'EHLO' (or 'HELO')
command.  If you ever need such conditions, take into account the
following caveats.  Firstly, although 'Sendmail' passes the helo domain
in the '$s' macro, it does not do this consistently.  In fact, the '$s'
macro is available only to the 'helo' handler; all other handlers won't
see it, no matter what the value of the corresponding
'Milter.macros.HANDLER' statement.  So, if you wish to access its value
from any handler other than 'helo', you will have to store it in a
"variable" in the 'helo' handler and then use this variable in the
other handler.  This approach is also recommended for other MTAs.  This
brings us to the concept of variables in 'mailfromd' scripts.

   A variable is declared using the following syntax:

     TYPE NAME

where NAME is the variable name and TYPE is 'string' if the variable is
to hold a string value, or 'number' if it is to hold a numeric value.

   A variable is assigned a value using the 'set' statement:

     set NAME EXPR

where EXPR is any valid MFL expression.

   The 'set' statement can occur within handler or function declarations
as well as outside of them.

   There are two kinds of 'Mailfromd' variables: "global variables",
that are visible to all handlers and functions, and "automatic
variables", that are available only within the handler or function where
they are declared.  For our purpose we need a global variable (*Note
Variable classes: Variables, for detailed descriptions of both kinds of
variables).

   The following example illustrates an approach that makes the 'HELO'
domain name available in any handler:

     # Declare the helohost variable
     string helohost

     prog helo
     do
       # Save the host name for further use
       set helohost $s
     done

     prog envfrom
     do
       # Reject hosts claiming to be localhost
       if helohost = "localhost"
         reject 570 "Please specify real host name"
       fi
     done

   Notice that for this approach to work, your MTA must export the 's'
macro (e.g., in case of Sendmail, the 'Milter.macros.helo' statement in
the 'sendmail.cf' file must contain 's'.  *note Sendmail::).  This
requirement can be removed by using the "handler argument" of 'helo'.
Each 'mailfromd' handler is given one or several arguments.  The exact
number of arguments and their meaning are handler-specific and are
described in *note Handlers::, and *note Figure 3.1:
milter-control-flow.  The arguments are referenced by their ordinal
number, using the notation '$N'.  The 'helo' handler takes one argument,
whose value is the helo domain.  Using this information, the 'helo'
handler from the example above can be rewritten as follows:

     prog helo
     do
       # Save the host name for further use
       set helohost $1
     done


File: mailfromd.info,  Node: rset,  Next: Controlling Number of Recipients,  Prev: HELO Domain,  Up: Tutorial

3.10 SMTP RSET and Milter Abort Handling
========================================

In the previous section we used a global variable to hold certain
information and share it between handlers.  In the majority of cases,
such information is session specific, and becomes invalid if the remote
party issues the SMTP 'RSET' command.  Therefore, 'mailfromd' clears all
global variables when it receives a Milter 'abort' request, which is
normally generated by this command.

   However, you may need some variables that retain their values even
across SMTP session resets.  In 'mailfromd' terminology such variables
are called "precious".  Precious variables are declared by prefixing
their declaration with the keyword 'precious'.  Consider, for example,
this snippet of code:

     precious number rcpt_counter

     prog envrcpt
     do
       set rcpt_counter rcpt_counter + 1
     done

   Here, the variable 'rcpt_counter' is declared as precious and its
value is incremented each time the 'envrcpt' handler is called.  This
way, 'rcpt_counter' will keep the total number of SMTP 'RCPT' commands
issued during the session, no matter how many times it was restarted
using the 'RSET' command.
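
   The difference between ordinary and precious variables can be
pictured with a toy model.  The Python sketch below is an illustration
only and says nothing about how 'mailfromd' stores variables
internally; the class 'Session' and its attributes are invented for
the example, and "clearing" is modelled simply as emptying a
dictionary:

```python
class Session:
    """Toy model of per-session variable storage."""

    def __init__(self):
        self.globals = {}    # ordinary global variables
        self.precious = {}   # variables declared 'precious'

    def abort(self):
        """Model a Milter 'abort' request (e.g. caused by SMTP RSET)."""
        self.globals.clear()
        # Precious variables are deliberately left untouched.

def demo():
    s = Session()
    s.precious["rcpt_counter"] = 0
    s.globals["helohost"] = "mx.example.org"
    for _ in range(2):                   # two RCPT commands
        s.precious["rcpt_counter"] += 1
    s.abort()                            # the client issues RSET
    s.precious["rcpt_counter"] += 1      # one more RCPT after the reset
    return s.globals, s.precious["rcpt_counter"]
```

   After the simulated reset the ordinary global is gone, while the
counter still reflects all three 'RCPT' commands.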


File: mailfromd.info,  Node: Controlling Number of Recipients,  Next: Sending Rate,  Prev: rset,  Up: Tutorial

3.11 Controlling Number of Recipients
=====================================

Any MTA provides a way to limit the number of recipients per message.
For example, in 'Sendmail' you may use the 'MaxRecipientsPerMessage'
option(1).  However, such methods are not flexible, so you are often
better off using 'mailfromd' for this purpose.

   'Mailfromd' keeps the number of recipients collected so far in the
variable 'rcpt_count', which can be checked in the 'envrcpt' handler as
shown in the example below:

     prog envrcpt
     do
       if rcpt_count > 10
         reject 550 5.7.1 "Too many recipients"
       fi
     done

   This filter will accept no more than 10 recipients per message.  You
may achieve finer granularity by using additional conditions.  For
example, the following code will allow any number of recipients if the
mail is coming from a domain relayed by the server, while limiting it to
10 for incoming mail from other domains:

     prog envrcpt
     do
       if not relayed(hostname($client_addr)) and rcpt_count > 10
         reject 550 5.7.1 "Too many recipients"
       fi
     done

   There are three important features to notice in the above code.
First of all, it introduces two "boolean" operators: 'and', which
evaluates to 'true' only if both left-side and right-side expressions
are 'true', and 'not', which reverses the value of its argument.

   Secondly, the scope of an operation is determined by its
"precedence", or "binding strength".  'Not' binds more tightly than
'and', so it applies only to the expression that stands between it and
'and'.  Using parentheses to underline the operator scoping, the above
'if' condition can be rewritten as follows:

         if (not (relayed(hostname($client_addr)))) and (rcpt_count > 10)

   Finally, it is important to notice that all boolean expressions are
computed using "shortcut evaluation".  To understand what it is, let's
consider the following expression: 'X and Y'.  Its value is 'true' only
if both X and Y are 'true'.  Now suppose that we evaluate the expression
from left to right and we find that X is false.  This means that no
matter what the value of Y is, the resulting expression will be 'false',
therefore there is no need to compute Y at all.  So, the boolean
shortcut evaluation works as follows:

'X and Y'
     If 'X => false', do not evaluate Y and return 'false'.

'X or Y'
     If 'X => true', do not evaluate Y and return 'true'.

   Thus, in the expression 'not relayed(hostname($client_addr)) and
rcpt_count > 10', the value of the 'rcpt_count' variable will be
compared with '10' only if the 'relayed' function yielded 'false'.
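
   Shortcut evaluation is easy to observe by recording which operands
are actually computed.  The following Python sketch does so for a
condition of the same shape as the one above.  Python's 'and' follows
the same rule, but the functions 'relayed', 'over_limit' and
'too_many', as well as the '.example.org' relayed domain, are
stand-ins invented for this illustration:

```python
calls = []   # records which operands were actually evaluated

def relayed(host):
    calls.append("relayed")
    return host.endswith(".example.org")   # hypothetical relayed domain

def over_limit(rcpt_count, limit=10):
    calls.append("over_limit")
    return rcpt_count > limit

def too_many(host, rcpt_count):
    # The right-hand side runs only if the left-hand side is true.
    return (not relayed(host)) and over_limit(rcpt_count)

def demo():
    calls.clear()
    first = too_many("mail.example.org", 50)   # relayed: right side skipped
    seen_first = list(calls)
    calls.clear()
    second = too_many("mail.example.com", 50)  # not relayed: both sides run
    return first, seen_first, second, list(calls)
```

   For the relayed host only 'relayed' is recorded; for the other host
both operands are evaluated.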

   To further enhance our sample filter, you may wish to make the
'reject' output more informative, to let the sender know what the
recipient limit is.  To do so, you can use the "concatenation operator"
'.' (a dot):

     set max_rcpt 10
     prog envrcpt
     do
       if not relayed(hostname($client_addr)) and rcpt_count > max_rcpt
         reject 550 5.7.1 "Too many recipients, max=" . max_rcpt
       fi
     done

   When evaluating the third argument to 'reject', 'mailfromd' will
first convert 'max_rcpt' to a string and then concatenate both strings,
producing the string 'Too many recipients, max=10'.

   ---------- Footnotes ----------

   (1) 'Sendmail (tm) Installation and Operation Guide', chapter 5.6, 'O
-- Set Option'.


File: mailfromd.info,  Node: Sending Rate,  Next: Greylisting,  Prev: Controlling Number of Recipients,  Up: Tutorial

3.12 Sending Rate
=================

We have introduced the notion of mail sending rate in *note Rate
Limit::.  'Mailfromd' keeps the computed rates in the special 'rate'
database (*note Databases::).  Each record in this database consists of
a 'key', for which the rate is computed, and the rate value, in the form
of a double precision floating point number representing the average
number of messages per second sent by this 'key' within the last
sampling interval.  In the simplest case, the sender email address can
be used as a 'key'; however, we recommend using the combination
EMAIL-SENDER_IP instead, so that the actual EMAIL owner won't be blocked
by the actions of some spammer abusing their address.

   Two functions are provided to control and update sending rates.  The
'rateok' function takes three mandatory arguments:

       bool rateok(string KEY, number INTERVAL, number THRESHOLD)

   The KEY meaning is described above.  The INTERVAL is the sampling
interval, or the number of seconds to which the actual sending rate
value is converted.  Remember that it is stored internally as a floating
point number, and thus cannot be directly used in 'mailfromd' filters,
which operate only on integer numbers.  To use the rate value, it is
first converted to messages per given interval, which is an integer
number.  For example, the rate '0.138888' brought to 1-hour interval
gives '500' (messages per hour).
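
   The conversion is a plain multiplication of the stored per-second
rate by the interval length.  The Python sketch below reproduces the
figure quoted above; the function name 'rate_per_interval' is
hypothetical, and rounding to the nearest integer is an assumption,
since the text only states that the result is an integer:

```python
def rate_per_interval(rate_per_second, interval_seconds):
    """Convert a stored per-second rate to whole messages per interval.

    Rounding to nearest is an assumption made for this illustration.
    """
    return round(rate_per_second * interval_seconds)
```

   With these assumptions, 'rate_per_interval(0.138888, 3600)' yields
the 500 messages per hour mentioned above (0.138888 * 3600 = 499.9968).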

   When the 'rateok' function is called, it recomputes the rate record
for the given KEY.  If the new rate value converted to messages per
given INTERVAL is less than THRESHOLD, the function updates the database
and returns 'True'.  Otherwise it returns 'False' and does not update
the database.

   This function must be "required" prior to use, by placing the
following statement somewhere at the beginning of your script:

     require rateok

   For example, the following code limits the mail sending rate for each
'email address'-'IP' combination to 180 per hour.  If the actual rate
value exceeds this limit, the sender is returned a temporary failure
response:

     require rateok

     prog envfrom
     do
       if not rateok($f . "-" . ${client_addr}, 3600, 180)
         tempfail 450 4.7.0 "Mail sending rate exceeded.  Try again later"
       fi
     done

Notice argument concatenation, used to produce the key.

   It is often inconvenient to specify intervals in seconds, therefore a
special 'interval' function is provided.  It converts its argument,
which is a textual string representing a time interval in English, to
the corresponding number of seconds.  Using this function, the 'rateok'
call from the example above can be written as:

          rateok($f . "-" . ${client_addr}, interval("1 hour"), 180)

   The 'interval' function is described in *note interval::, and time
intervals are discussed in *note time interval specification::.

   The 'rateok' function begins computing the rate as soon as it has
collected enough data.  By default, it needs at least four mails.  Since
this may lead to a large number of false positives (i.e.  overestimated
rates) at the beginning of the sampling interval, there is a way to
specify a minimum number of samples 'rateok' must collect before
starting to actually compute rates.  This number of samples is given as
the optional fourth argument to the function.  For example, the
following call will always return 'True' for the first 10 mails, no
matter what the actual rate:

          rateok($f . "-" . ${client_addr}, interval("1 hour"), 180, 10)

   The 'tbf_rate' function gives finer control over mail rates.  This
function implements a "token bucket filter" (TBF) algorithm.

   The token bucket controls when the data can be transmitted based on
the presence of abstract entities called "tokens" in a container called
"bucket".  Each token represents some amount of data.  The algorithm
works as follows:

   * A token is added to the bucket at a constant rate of 1 token per T
     microseconds.
   * A bucket can hold at most M tokens.  If a token arrives when the
     bucket is full, that token is discarded.
   * When N items of data arrive (e.g. N mails), N tokens are removed
     from the bucket and the data are accepted.
   * If fewer than N tokens are available, no tokens are removed from
     the bucket and the data are not accepted.

   This algorithm keeps the data traffic at a constant rate of one item
per T microseconds, with bursts of up to M data items.  Such bursts can
occur when no data has arrived for M*T or more microseconds.
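
   The four rules listed above translate almost directly into code.
The Python sketch below is a model of the algorithm as described here,
not the 'mailfromd' implementation.  The class name 'TokenBucket' and
its attributes are invented for the illustration, and the assumption
that a new bucket starts full is ours:

```python
class TokenBucket:
    """Model of the token bucket rules listed above."""

    def __init__(self, interval_us, burst, now_us=0):
        self.interval_us = interval_us   # T: microseconds per token
        self.burst = burst               # M: bucket capacity
        self.tokens = burst              # assumption: a new bucket starts full
        self.stamp_us = now_us           # time of the last refill

    def accept(self, n, now_us):
        """Try to take N tokens at time NOW_US; return True on success."""
        new = (now_us - self.stamp_us) // self.interval_us
        if new > 0:
            # Tokens beyond the capacity M are discarded.
            self.tokens = min(self.burst, self.tokens + new)
            self.stamp_us += new * self.interval_us
        if self.tokens < n:
            return False                 # not enough tokens: nothing removed
        self.tokens -= n
        return True

def demo():
    b = TokenBucket(interval_us=10_000_000, burst=2)
    results = [b.accept(1, 0), b.accept(1, 1), b.accept(1, 2)]
    results.append(b.accept(1, 10_000_002))  # one token refilled after 10 s
    results.append(b.accept(1, 10_000_003))  # bucket is empty again
    return results
```

   In 'demo', a burst of two items is accepted at once, the third is
refused, and one more item gets through only after T microseconds have
elapsed.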

   'Mailfromd' keeps buckets in a database 'tbf'.  Each bucket is
identified by a unique "key".  The 'tbf_rate' function is defined as
follows:

      bool tbf_rate(string KEY, number N, number T, number M)

   The KEY identifies the bucket to operate upon.  The remaining
arguments are described above.  The 'tbf_rate' function returns 'True'
if the algorithm allows the data to be accepted and 'False' otherwise.

   Depending on how the actual arguments are selected, the 'tbf_rate'
function can be used to control various types of flow rates.  For
example, to control the mail sending rate, set N to the number of mails
and T to the control interval in microseconds:

     prog envfrom
     do
       if not tbf_rate($f . "-" . $client_addr, 1, 10000000, 20)
         tempfail 450 4.7.0 "Mail sending rate exceeded.  Try again later"
       fi
     done

   The example above permits sending at most one mail every 10 seconds.
The burst size is set to 20.

   Another use for the 'tbf_rate' function is to limit the total
delivered mail size per given interval of time.  To do so, the function
must be used in the 'eom' handler, because it is the only handler where
the entire size of the message is known.  The N argument must contain
the number of bytes in the email (or email bytes * number of
recipients).  Since each token now stands for one byte, T must be set to
the number of microseconds per byte that a given user is allowed to
send.  The M argument must be large enough to accommodate a couple of
large emails.  E.g.:

       prog eom
       do
         if not tbf_rate("$f-$client_addr",
                         message_size(current_message()),
                         1000000/10240,  # About 10 KB/sec (97 usec per byte)
                         10*1024*1024)
           tempfail 450 4.7.0 "Data sending rate exceeded.  Try again later"
         fi
       done

   *Note Rate limiting functions::, for more information about 'rateok'
and 'tbf_rate' functions.


File: mailfromd.info,  Node: Greylisting,  Next: Local Account Verification,  Prev: Sending Rate,  Up: Tutorial

3.13 Greylisting
================

Greylisting is a simple method of defending against spam, proposed by
Evan Harris.  In a few words, it consists of recording the 'sender
IP'-'sender email'-'recipient email' triplet of mail transactions.  Each
time an unknown triplet is seen, the corresponding message is rejected
with the 'tempfail' code.  If the mail is legitimate, this will make the
originating server retry the delivery later, until the destination
eventually accepts it.  If, however, the mail is spam, it will probably
never be retried, so the users will not be bothered by it.  Even if the
spammer retries the delivery, the "greylisting period" will give
spam-detection systems, such as DNSBLs, enough time to detect and
blacklist the sender, so by the time the destination host starts
accepting emails from this triplet, it will already be blocked by other
means.

   You will find a detailed description of the method in The Next Step
in the Spam Control War: Greylisting
(http://projects.puremagic.com/greylisting/whitepaper.html), the
original whitepaper by Evan Harris.

   The 'mailfromd' implementation of greylisting is based on the
'greylist' function.  The function takes two arguments: the 'key',
identifying the greylisting triplet, and the 'interval'.  The function
looks up the key in the "greylisting database".  If the key is not
found, a new entry is created for it and the function returns 'true'.
If the key is found, 'greylist' returns 'false' if it was inserted into
the database more than 'interval' seconds ago, and 'true' otherwise.  In
other words, from the point of view of the greylisting algorithm, the
function returns 'true' when the message delivery should be blocked.
Thus, the simplest implementation of the algorithm would be:

     prog envrcpt
     do
      if greylist("${client_addr}-$f-${rcpt_addr}", interval("1 hour"))
        tempfail 451 4.7.1 "You are greylisted"
      fi
     done

   However, the message returned by this example is not informative
enough.  In particular, it does not tell when the message will be
accepted.  To help you produce more informative messages, the 'greylist'
function stores the number of seconds left until the end of the
greylisting period in the global variable 'greylist_seconds_left', so
the above example could be enhanced as follows:

     prog envrcpt
     do
       set gltime interval("1 hour")
       if greylist("${client_addr}-$f-${rcpt_addr}", gltime)
         if greylist_seconds_left = gltime
           tempfail 451 4.7.1
              "You are greylisted for %gltime seconds"
         else
           tempfail 451 4.7.1
              "Still greylisted for %greylist_seconds_left seconds"
         fi
       fi
     done

   In real life you will have to avoid greylisting some messages, in
particular those coming from the '<>' address and from the IP addresses
in your relayed domain.  It can easily be done using the techniques
described in previous sections and is left as an exercise to the reader.

   'Mailfromd' provides two implementations of greylisting primitives,
which differ in the information stored in the database.  The one
described above is called "traditional".  It keeps in the database the
time when the greylisting was activated for the given key, so the
'greylist' function uses its second argument ('interval') and the
current timestamp to decide whether the key is still greylisted.
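
   The traditional scheme can be summarized by the following Python
sketch.  It models only the decision logic described in this section,
with a dictionary standing in for the greylisting database.  The name
'greylist_traditional' is hypothetical, the second return value plays
the role of 'greylist_seconds_left', and leaving the record in place
once the period is over is an assumption:

```python
def greylist_traditional(db, key, interval, now):
    """Return (blocked, seconds_left) for KEY at time NOW.

    DB maps each key to the time it was first seen.
    """
    first_seen = db.get(key)
    if first_seen is None:
        db[key] = now                 # unknown triplet: start greylisting
        return True, interval
    elapsed = now - first_seen
    if elapsed > interval:
        return False, 0               # period is over: let the mail through
    return True, interval - elapsed   # still greylisted
```

   Because only the insertion time is stored, the same key can be
judged against a different 'interval' on every call.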

   The second implementation is named after its inventor, Con Tassios.
This implementation stores in the database the time when the greylisting
period is set to expire, computed by 'greylist' when it is first called
for the given key, using the formula 'current_timestamp + interval'.
Subsequent calls to 'greylist' compare the current timestamp with the
one stored in the database and ignore their second argument.  This
implementation is enabled by one of the following pragmas:

     #pragma greylist con-tassios
or
     #pragma greylist ct

   When the Con Tassios implementation is used, yet another function
becomes available.  The function 'is_greylisted' (*note is_greylisted:
Greylisting functions.) returns 'True' if its argument is greylisted and
'False' otherwise.  It can be used to check the greylisting status
without actually updating the database:

       if is_greylisted("${client_addr}-$f-${rcpt_addr}")
         ...
       fi
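
   A Python sketch of the Con Tassios scheme follows.  As before, it is
a model of the behaviour described in this section and not the actual
implementation: a dictionary stands in for the database, the name
'greylist_ct' is hypothetical, and treating a key as released exactly
at its expiration time is an assumption:

```python
def greylist_ct(db, key, interval, now):
    """Con Tassios variant: DB maps each key to its expiration time."""
    expires = db.get(key)
    if expires is None:
        db[key] = now + interval      # current_timestamp + interval
        return True
    return now < expires              # INTERVAL is ignored from here on

def is_greylisted(db, key, now):
    """Query the greylisting status without modifying the database."""
    expires = db.get(key)
    return expires is not None and now < expires
```

   Note that 'is_greylisted' never inserts a record, which is what
makes it suitable for a pure status check.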

   One special case is "whitelisting", which is often used together with
greylisting.  To implement it, 'mailfromd' provides the function
'dbmap', which takes two mandatory arguments: 'dbmap(FILE, KEY)' (it
also allows an optional third argument, see *note dbmap::, for more
information on it).  The first argument is the name of the DBM file in
which to search for the key, the second one is the key to be searched
for.  Assuming you keep your whitelist database in the file
'/var/run/whitelist.db', a more practical example will be:

     prog envrcpt
     do
       set gltime interval("1 hour")

       if not ($f = "" or relayed(hostname(${client_addr}))
              or dbmap("/var/run/whitelist.db", ${client_addr}))
         if greylist("${client_addr}-$f-${rcpt_addr}", gltime)
           if greylist_seconds_left = gltime
             tempfail 451 4.7.1
                "You are greylisted for %gltime seconds"
           else
             tempfail 451 4.7.1
                "Still greylisted for %greylist_seconds_left seconds"
           fi
         fi
       fi
     done


File: mailfromd.info,  Node: Local Account Verification,  Next: Databases,  Prev: Greylisting,  Up: Tutorial

3.14 Local Account Verification
===============================

In your filter script you may need to verify if the given user name is
served by your mail server, in other words, to verify if it represents a
"local account".  Notice that in this context, the word "local" does not
necessarily mean that the account is local for the server running
'mailfromd', it simply means any account whose mailbox is served by the
mail servers using 'mailfromd'.

   The 'validuser' function may be used for this purpose.  It takes one
argument, the user name, and returns 'true' if this name corresponds to
a local account.  To verify this, the function relies on 'libmuauth', a
powerful authentication library shipped with GNU 'mailutils'.  More
precisely, it invokes a list of "authorization" functions.  Each
function is responsible for looking up the user name in a particular
source of information, such as system 'passwd' database, an SQL
database, etc.  The search is terminated when one of the functions finds
the name in question or the list is exhausted.  In the former case, the
account is local, in the latter it is not.  This concept is discussed in
detail in *note Authentication: (mailutils)authentication.  Here we
will give only some practical advice for implementing it in 'mailfromd'
filters.
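
   The lookup strategy described above, trying each source in turn and
stopping at the first hit, can be sketched in Python as follows.  This
illustrates the control flow only and does not use or reproduce the
'libmuauth' API; 'make_validuser' and the two sample user sets are
invented for the example:

```python
def make_validuser(lookups):
    """Build a validuser-like function from an ordered list of lookups."""
    def validuser(name):
        for lookup in lookups:
            if lookup(name):
                return True      # found: stop searching
        return False             # list exhausted: not a local account
    return validuser

# Hypothetical stand-ins for the system passwd file and an SQL table.
passwd_users = {"root", "alice"}
sql_users = {"bob"}
validuser = make_validuser([passwd_users.__contains__,
                            sql_users.__contains__])
```

   The order of the list matters only for speed: a name found in any
source counts as local.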

   The actual list of available authorization modules depends on your
'mailutils' installation.  Usually it includes, apart from traditional
UNIX 'passwd' database, the functions for verifying PAM, RADIUS and SQL
database accounts.  Each of the authorization methods is configured
using special configuration file statements.  For the description of the
Mailutils configuration files, *Note Mailutils Configuration File:
(mailutils)configuration.  You can obtain the template for 'mailfromd'
configuration by running 'mailfromd --config-help'.

   For example, the following 'mailfromd.conf' file:

     auth {
       authorization pam:system;
     }

     pam {
       service mailfromd;
     }

sets up authorization using PAM and the system 'passwd' database.  The
name of the PAM service to use is 'mailfromd'.

   The function 'validuser' is often used together with 'dbmap', as in
the example below:

     #pragma dbprop /etc/mail/aliases.db null

     if dbmap("/etc/mail/aliases.db", localpart($rcpt_addr))
        and validuser(localpart($rcpt_addr))
       ...
     fi

   For more information about 'dbmap' function, see *note dbmap::.  For
a description of 'dbprop' pragma, see *note Database functions::.


File: mailfromd.info,  Node: Databases,  Next: Testing Filter Scripts,  Prev: Local Account Verification,  Up: Tutorial

3.15 Databases
==============

Some 'mailfromd' functions use DBM databases to save their persistent
state data.  Each database has a unique "identifier", and is assigned
several pieces of information for its maintenance: the database "file
name" and the "expiration period", i.e.  the time after which a record
is considered expired.

   To obtain the list of available databases along with their
preconfigured settings, run 'mailfromd --show-defaults'.  You will see
an output similar to this:

     version:             8.8
     script file:         /etc/mailfromd.mf
     preprocessor:        /usr/bin/m4 -s
     user:                mail
     statedir:            /var/run/mailfromd
     socket:              unix:/var/run/mailfromd/mailfrom
     pidfile:             /var/run/mailfromd/mailfromd.pid
     default syslog:          blocking
     supported databases:     gdbm, bdb
     default database type:   bdb
     optional features:   GeoIP
     greylist database:      /var/run/mailfromd/greylist.db
     greylist expiration:    86400
     tbf database:        /var/run/mailfromd/tbf.db
     tbf expiration:      86400
     rate database:      /var/run/mailfromd/rates.db
     rate expiration:    86400
     cache database:      /var/run/mailfromd/mailfromd.db
     cache positive expiration: 86400
     cache negative expiration: 43200

   The text below the 'optional features' line describes the available
built-in databases.  Notice that the 'cache' database, unlike the other
databases, has two expiration periods associated with it.  This is
explained in the next subsection.

* Menu:

* Database Formats::
* Basic Database Operations::
* Database Maintenance::


File: mailfromd.info,  Node: Database Formats,  Next: Basic Database Operations,  Up: Databases

3.15.1 Database Formats
-----------------------

Version 8.8 supports the following database types (or "formats"):

'cache'
     "Cache database" keeps the information about external emails,
     obtained using sender verification functions (*note Checking Sender
     Address::).  The key entry to this database is an email address or
     EMAIL:SENDER-IP string, for addresses checked using strict
     verification.  The data it stores for each key are:

       1. Address validity.  This field can be either 'success' or
          'not_found', meaning that the address either is confirmed to
          exist or is not.

       2. The time when the entry was entered into the database.  It is
          used to check for expired entries.

     The 'cache' database has two expiration periods: a "positive
     expiration" period, that is applied to entries with the first field
     set to 'success', and a "negative expiration" period, applied to
     entries marked as 'not_found'.

'rate'
     The mail sending rate data, maintained by the 'rate' function (*note
     Rate limiting functions::).  A record consists of the following
     fields:

     timestamp
          The time when the entry was entered into the database.

     interval
          Interval during which the rate was measured (seconds).

     count
          Number of mails sent during this interval.

'tbf'
     This database is maintained by the 'tbf_rate' function (*note TBF::).
     Each record represents a single bucket and consists of the
     following fields:

     timestamp
          Timestamp of most recent token, as a 64-bit unsigned integer
          (microseconds resolution).

     expirytime
          Estimated time when this bucket expires (seconds since epoch).

     tokens
          Number of tokens in the bucket ('size_t').

'greylist'
     This database is maintained by the 'greylist' function (*note
     Greylisting::).  Each record holds only the timestamp.  Its
     semantics depends on the greylisting implementation in use (*note
     greylisting types::).  In the traditional implementation, it is the
     time when the entry was entered into the database.  In the Con
     Tassios implementation, it is the time when the greylisting period
     expires.
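
   The two expiration periods of the 'cache' database can be
illustrated with a short Python sketch.  The periods are the defaults
shown in the sample '--show-defaults' output earlier in this chapter.
The function name 'cache_entry_expired' is hypothetical, and treating
an entry as expired exactly at the end of its period is an assumption:

```python
POSITIVE_EXPIRATION = 86400   # seconds, applied to 'success' entries
NEGATIVE_EXPIRATION = 43200   # seconds, applied to 'not_found' entries

def cache_entry_expired(status, entered, now):
    """Decide whether a cache record entered at ENTERED is expired at NOW."""
    if status == "success":
        ttl = POSITIVE_EXPIRATION
    else:
        ttl = NEGATIVE_EXPIRATION
    return now - entered >= ttl
```

   With these defaults a 'not_found' entry is dropped after half a day,
while a confirmed address is kept for a full day.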


File: mailfromd.info,  Node: Basic Database Operations,  Next: Database Maintenance,  Prev: Database Formats,  Up: Databases

3.15.2 Basic Database Operations
--------------------------------

The 'mfdbtool' utility is provided for performing various operations on
the 'mailfromd' databases.

   To list the contents of a database, use the '--list' option.  When
used without any arguments, it lists the 'cache' database:

     $ mfdbtool --list
     abrakat@mail.com           success Thu Aug 24 15:28:58 2006
     baccl@EDnet.NS.CA          not_found Fri Aug 25 10:04:18 2006
     bhzxhnyl@chello.pl       not_found Fri Aug 25 10:11:57 2006
     brqp@aaanet.ru:24.1.173.165  not_found Fri Aug 25 14:16:06 2006

   You can also list data for any particular key or keys.  To do so,
give the keys as arguments to 'mfdbtool':

     $ mfdbtool --list abrakat@mail.com brqp@aaanet.ru:24.1.173.165
     abrakat@mail.com           success Thu Aug 24 15:28:58 2006
     brqp@aaanet.ru:24.1.173.165  not_found Fri Aug 25 14:16:06 2006

   To list another database, give its format identifier with the
'--format' ('-H') option.  For example, to list the 'rate' database:

     $ mfdbtool --list --format=rate
     sam@mail.net-62.12.4.3 Wed Sep  6 19:41:42 2006  139   3 0.0216 6.82e-06
     axw@rame.com-59.39.165.172 Wed Sep  6 20:26:24 2006  0  1  N/A  N/A

   The '--format' option can be used with any database management
option, described below.

   Another useful operation you can do while listing the 'rate' database
is the prediction of "estimated time of sending", i.e.  the time when a
user whose mail sending rate currently exceeds the limit will be able to
send mail again.  This is done using the '--predict' option.  The option
takes an argument specifying the mail sending rate limit, e.g.  (the
second line is split for readability):

     $ mfdbtool --predict="180 per 1 minute"
     ed@fae.net-21.10.1.2 Wed Sep 13 03:53:40 2006  0 1 N/A N/A; free to send
     service@19.netlay.com-69.44.129.19 Wed Sep 13 15:46:07 2006 7 2
        0.286   0.0224; in 46 sec. on Wed Sep 13 15:49:00 2006

Notice that there is no need to use '--list --format=rate' along with
this option, although doing so is not an error.

   To delete an entry from the database, use the '--delete' option, for
example: 'mfdbtool --delete abrakat@mail.com'.  You can give any number
of keys to delete on the command line.


File: mailfromd.info,  Node: Database Maintenance,  Prev: Basic Database Operations,  Up: Databases

3.15.3 Database Maintenance
---------------------------

There are two principal operations of database management: expiration
and compaction.  "Expiration" consists of removing expired entries from
the database.  In fact, it is rarely needed, since the expired entries
are removed in the process of normal 'mailfromd' work.  Nevertheless, a
special option is provided in case an explicit expiration is needed (for
example, before dumping the database to another format, to avoid
transferring useless information).

   The command line option '--expire' instructs 'mfdbtool' to delete
expired entries from the specified database.  As usual, the database is
specified using '--format' option.  If it is not given explicitly,
'cache' is assumed.

   When expired entries are removed, the space they occupied is marked
as free, so it can be used by subsequent inserts.  The database does not
shrink after expiration is finished.  To actually return the unused
space to the file system you should "compact" your database.

   This is done by running 'mfdbtool --compact' (and, optionally,
specifying the database to operate upon with the '--format' option).
Notice that compacting a database needs roughly as much free disk space
on the partition where the database resides as is currently used by the
database.  Database compaction runs in three phases.  First, the
database is scanned and all non-expired records are stored in memory.
Second, a temporary database is created in the state directory and all
the cached entries are flushed into it.  This database is named after
the PID of the running 'mfdbtool' process.  Finally, the temporary
database is renamed to the source database.
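
   Because of this space requirement, it can be worth checking the free
space before compacting.  The following shell function is a minimal
sketch of such a check.  The function name is an assumption made for
this example, and the database file must be named explicitly, since its
location depends on your installation:

```shell
# Succeed if the partition holding FILE has at least as much free
# space as FILE currently occupies.  Sizes are compared in KiB.
has_room_to_compact() {
    file=$1
    dir=$(dirname "$file")
    # du -k prints "SIZE<TAB>NAME"; keep the size only.
    used=$(du -k "$file" | awk '{ print $1; exit }')
    # df -P prints one line per file system; column 4 is "Available".
    avail=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ "$avail" -ge "$used" ]
}
```

   It could then guard the compaction, e.g. 'has_room_to_compact FILE &&
mfdbtool --compact', where FILE is the database file in question.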

   Both '--compact' and '--expire' can be applied to all databases by
combining them with '--all'.  This is useful, for instance, in 'crontab'
files.  For example, I have the following monthly job in my 'crontab':

     0 1 1 * * /usr/bin/mfdbtool --compact --all


File: mailfromd.info,  Node: Testing Filter Scripts,  Next: Run Mode,  Prev: Databases,  Up: Tutorial

3.16 Testing Filter Scripts
===========================

It is important to check your filter script before actually starting to
use it.  There are several ways to do so.

   To test the syntax of your filter script, use the '--lint' option.
It will cause 'mailfromd' to exit immediately after attempting to
compile the script file.  If the compilation succeeds, the program will
exit with code 0.  Otherwise, it will exit with error code 78
('configuration error').  In the latter case, 'mailfromd' will also
print a diagnostic message, describing the error along with the exact
location where the error was diagnosed, for example:

     mailfromd: /etc/mailfromd.mf:39: syntax error, unexpected reject
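
   In scripts, the exit code is usually more convenient to test than
the diagnostic text.  The sketch below wraps an arbitrary check command
and reports the two outcomes described above; the function name
'check_filter' is hypothetical:

```shell
# Run a syntax check command and report how it ended.
# Usage: check_filter mailfromd --lint
check_filter() {
    status=0
    "$@" || status=$?
    case $status in
        0)  echo "syntax OK" ;;
        78) echo "configuration error" ;;
        *)  echo "unexpected exit status $status" ;;
    esac
    # Pass the original exit code on to the caller.
    return "$status"
}
```

   A typical call would be 'check_filter mailfromd --lint'.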

   The error location is indicated by the name of the file and the
number of the line where the error occurred.  By using the
'--location-column' option you instruct 'mailfromd' to also print the
"column number".  E.g., with this option the above error message may
look like:

     mailfromd: /etc/mailfromd.mf:39.12 syntax error, unexpected reject

   Here, '39' is the line and '12' is the column number.
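
   If you need the location in machine-readable form, e.g. to jump to
it from an editor, the diagnostic can be split with 'sed'.  This is only
a sketch: the function name is made up for this example and the pattern
assumes the message layout shown above:

```shell
# Split diagnostics of the form "mailfromd: FILE:LINE.COLUMN TEXT"
# read from standard input into "FILE LINE COLUMN".
# A colon after the column number is tolerated but not required.
error_location() {
    sed -n 's/^mailfromd: \(.*\):\([0-9][0-9]*\)\.\([0-9][0-9]*\):\{0,1\} .*/\1 \2 \3/p'
}
```

   The function reads the diagnostics on its standard input and ignores
lines that do not carry a column number.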

   For complex scripts you may wish to obtain a listing of variables
used in the script.  This can be achieved using the '--xref' command
line option.

   The output it produces consists of four columns:

Variable name
Data type
     Either 'number' or 'string'.
Offset in data segment
     Measured in words.
References
     A comma-separated list of locations where the variable was
     referenced.  Each location is represented as FILE:LINE.  If several
     locations pertain to the same FILE, the file name is listed only
     once.

Here is an example of the cross-reference output:

     $ mailfromd --xref
     Cross-references:
     -----------------
     cache_used               number 5   /etc/mailfromd.mf:48
     clamav_virus_name        string 9   /etc/mailfromd.mf:240,240
     db                       string 15  /etc/mailfromd.mf:135,194,215
     dns_record_ttl           number 16  /etc/mailfromd.mf:136,172,173
     ehlo_domain              string 11
     gltime                   number 13  /etc/mailfromd.mf:37,219,220,222,223
     greylist_seconds_left    number 1   /etc/mailfromd.mf:220,226,227
     last_poll_host           string 2
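
   In the listing above, 'ehlo_domain' and 'last_poll_host' have an
empty references column.  A short 'awk' filter can pick out such
variables.  This is a sketch that assumes the four-column layout shown
above; the function name is hypothetical:

```shell
# Read a cross-reference listing on standard input and print the names
# of variables that have a type and an offset but no location list.
unreferenced_vars() {
    awk 'NF == 3 && ($2 == "number" || $2 == "string") { print $1 }'
}
```

   It reads the listing on its standard input, so it can be fed the
saved output of 'mailfromd --xref'.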

   If the script passes syntax check, the next step is often to test if
it works as you expect it to.  This is done with '--test' ('-t') command
line option.  This option runs the 'envfrom' handler (or another one,
see below) and prints the result of its execution.

   When running your script in test mode, you will need to supply the
values of 'Sendmail' macros it needs.  You do this by placing the
necessary assignments in the command line.  For example, this is how to
supply initial values for 'f' and 'client_addr' macros:

     $ mailfromd --test f=gray@gnu.org client_addr=127.0.0.1

   You may also need to alter initial values of some global variables
your script uses.  To do so, use '-v' ('--variable') command line
option.  This option takes a single argument consisting of the variable
name and its initial value, separated by an equals sign.  For example,
here is how to change the value of 'ehlo_domain' global variable:

     $ mailfromd -v ehlo_domain=mydomain.org

   The '--test' option is often useful in conjunction with options
'--debug', '--trace' and '--transcript' (*note Logging and Debugging::).
The following example shows what the author got while debugging the
filter script described in *note Filter Script Example:::

     $ mailfromd --test --debug=50 f=gray@gnu.org client_addr=127.0.0.1
     MX 20 mx20.gnu.org
     MX 10 mx10.gnu.org
     MX 10 mx10.gnu.org
     MX 20 mx20.gnu.org
     getting cache info for gray@gnu.org
     found status: success (0), time: Thu Sep 14 14:54:41 2006
     getting rate info for gray@gnu.org-127.0.0.1
     found time: 1158245710, interval: 29, count: 5, rate: 0.172414
     rate for gray@gnu.org-127.0.0.1 is 0.162162
     updating gray@gnu.org-127.0.0.1 rates
     SET REPLY 450 4.7.0 Mail sending rate exceeded.  Try again later
     State envfrom: tempfail

   To test any handler, other than 'envfrom', give its name as the
argument to '--test' option.  Since this argument is optional, it is
important that it be given immediately after the option, without any
intervening white space, for example 'mailfromd --test=helo', or
'mailfromd -thelo'.
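
   Because the handler name must be glued to the option, it is easy to
get the command line wrong when typing it by hand.  A small shell
wrapper, sketched below, builds it for you.  The function name and the
'MAILFROMD' variable are assumptions introduced for this example;
setting 'MAILFROMD=echo' merely prints the arguments that would be
passed:

```shell
# Test a single handler.
# $1: handler name; remaining arguments: MACRO=VALUE assignments.
test_handler() {
    handler=$1
    shift
    # The handler name is attached with '=' and no white space,
    # because the argument to --test is optional.
    "${MAILFROMD:-mailfromd}" --test="$handler" "$@"
}
```

   For example, 'test_handler helo client_addr=127.0.0.1' runs
'mailfromd --test=helo client_addr=127.0.0.1'.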

   This method allows you to test one handler at a time.  To test the
script as a whole, use the 'mtasim' utility.  When started, it enters
interactive mode, similar to that of 'sendmail -bs', where it expects
SMTP commands on its standard input and sends answers to the standard
output.  The '--port=auto' command line option instructs it to start
'mailfromd' and to create a unique socket for communication with it.
For the detailed description of the program and the ways to use it,
*Note mtasim::.
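
   Since 'mtasim' reads SMTP commands from its standard input, a session
can be scripted.  The shell function below is a sketch that prints a
minimal dialogue for one message; the function name, the addresses and
the message text are illustrative assumptions, and your setup may
require additional 'mtasim' options:

```shell
# Print a minimal SMTP dialogue for one message.
# $1: HELO name, $2: sender address, $3: recipient address.
smtp_dialogue() {
    printf 'HELO %s\n' "$1"
    printf 'MAIL FROM:<%s>\n' "$2"
    printf 'RCPT TO:<%s>\n' "$3"
    printf 'DATA\n'
    # One header, an empty line, the body, and the terminating dot.
    printf 'Subject: test\n\ntest message\n.\n'
    printf 'QUIT\n'
}
```

   Its output could be piped to the simulator, e.g. 'smtp_dialogue
client.example.org gray@gnu.org root@example.org | mtasim --port=auto'.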


File: mailfromd.info,  Node: Run Mode,  Next: Logging and Debugging,  Prev: Testing Filter Scripts,  Up: Tutorial

3.17 Run Mode
=============

Mailfromd provides a special option that allows you to run arbitrary MFL
scripts.  This is an experimental feature, intended for future use of
MFL as a scripting language.

   When given the '--run' command line option, 'mailfromd' loads the
script given in its command line and executes a function called 'main'.

   The function main must be declared as:

     func main(...) returns number

   Mailfromd passes all command line arguments that follow the script
name as arguments to that function.  When the function returns, its
return value is used by 'mailfromd' as exit code.

   As an example, suppose the file 'script.mf' contains the following:

     func main (...)
       returns number
     do
       loop for number i 1,
            while i <= $#,
            set i i + 1
       do
         echo "arg %i=" . $(i)
       done
     done

   This function prints all its arguments (*Note variadic functions::,
for a detailed description of functions with variable number of
arguments).  Now running:

     $ mailfromd --run script.mf 1 file dest

displays the following:

     arg 1=1
     arg 2=file
     arg 3=dest

   Note that MFL does not have a direct equivalent of shell's '$0'
argument.  If your function needs to know the name of the script that is
being executed, use the '__file__' built-in constant instead (*note
__file__: Built-in constants.).

   You may give your start function any name other than the default
'main'.  In this case, give its name as an argument to the '--run'
option.  This argument is optional, therefore it must be separated from
the option by an equals sign (with no whitespace on either side).  For
example, given the command line below, 'mailfromd' loads the file
'script.mf' and executes the function named 'start':

     $ mailfromd --run=start script.mf
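
   The two forms of the option can be hidden behind a small shell
wrapper.  This is a sketch; the function name and the 'MAILFROMD'
variable are hypothetical, the latter allowing a dry run with
'MAILFROMD=echo':

```shell
# Run an MFL script.
# $1: start function name (empty string selects the default 'main')
# $2: script file; remaining arguments are passed to the script.
run_mfl() {
    start=$1
    script=$2
    shift 2
    if [ -n "$start" ]; then
        # The function name must be attached with '=' and no white space.
        "${MAILFROMD:-mailfromd}" --run="$start" "$script" "$@"
    else
        "${MAILFROMD:-mailfromd}" --run "$script" "$@"
    fi
}
```

   With it, 'run_mfl start script.mf' is equivalent to the command line
shown above, and 'run_mfl "" script.mf 1 file dest' runs the default
'main' function.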

* Menu:

* top-block::   The Top of a Script File.
* getopt::      Parsing Command Line Arguments.


File: mailfromd.info,  Node: top-block,  Next: getopt,  Up: Run Mode

3.17.1 The Top of a Script File
-------------------------------

The '--run' option makes it possible to use 'mailfromd' scripts as
standalone programs.  The traditional way to do so was to set the
executable bit on the script file and to begin the script with the
"interpreter selector", i.e.  the characters '#!' followed by the name
of the 'mailfromd' executable, e.g.:

     #! /usr/sbin/mailfromd --run

   This would cause the shell to invoke 'mailfromd' with the command
line constructed from the '--run' option, the name of the invoked script
file itself, and any actual arguments from the invocation.  Once
invoked, 'mailfromd' would treat the initial '#!' line as a usual
single-line comment (*note Comments::).

   However, the interpretation of the '#!' by shells has various
deficiencies, which depend on the actual shell being used.  For example,
some shells pass any characters following the whitespace after the
interpreter name as a single argument, some others silently truncate the
command line after some number of characters, etc.  This often makes it
impossible to pass additional arguments to 'mailfromd'.  For example, a
script which begins with the following line would most probably fail to
be executed properly:

     #! /usr/sbin/mailfromd --no-config --run

   To compensate for these deficiencies and to allow for more complex
invocation sequences, 'mailfromd' handles initial '#!' in a special way.
If the first line of a source file begins with '#!/' or '#! /' (with a
single space between '!' and '/'), it is treated as the start of a
multi-line comment, which is closed by the two characters '!#' on a line
by themselves.

   Thus, the correct way to begin a 'mailfromd' script is:

     #! /usr/sbin/mailfromd --run
     !#

   Using this feature, you can start a 'mailfromd' script with arbitrary
shell code, provided it ends with an 'exec' statement invoking the
interpreter itself.  For example:

     #!/bin/sh
     exec /usr/sbin/mailfromd --no-config --run "$0" "$@"
     !#

     func main(...)
       returns number
     do
       /* actual mfl code goes here */
     done

   Note the use of '$0' and '$@' to pass the actual script file name and
command line arguments to 'mailfromd'.


File: mailfromd.info,  Node: getopt,  Prev: top-block,  Up: Run Mode

3.17.2 Parsing Command Line Arguments
-------------------------------------

A special function is provided to break (parse) options in command
lines, and to check for legal options.  It uses the GNU getopt routines
(*note getopt: (libc)Getopt.).

 -- Built-in Function: string getopt (number ARGC, pointer ARGV, ...)
     The 'getopt' function parses the command line arguments, as
     supplied by ARGC and ARGV.  The ARGC argument is the argument
     count, and ARGV is an opaque data structure, representing the array
     of arguments(1).  The operator 'vaptr' (*note vaptr::) is provided
     to initialize this argument.

     An argument that starts with '-' (and is not exactly '-' or '--')
     is an option element.  An argument that starts with a single '-' is
     called a "short" or "traditional" option.  The characters of this
     element, except for the initial '-', are option characters.  Each
     option character represents a separate option.  An argument that
     starts with '--' is called a "long" or "GNU" option.  The
     characters of this element, except for the initial '--', form the
     "option name".

     Options may have arguments.  The argument to a short option is
     supplied immediately after the option character, or as the next
     word in command line.  E.g., if option '-f' takes a mandatory
     argument, then it may be given either as '-farg' or as '-f arg'.
     The argument to a long option is either given immediately after it
     and separated from the option name by an equals sign (as
     '--file=arg'), or is given as the next word in the command line
     (e.g. '--file arg').

     If the option argument is optional, i.e. it may not necessarily be
     given, then only the first form is allowed (i.e. either '-farg' or
     '--file=arg').

     The '--' command line argument ends the option list.  Any arguments
     following it are not considered options, even if they begin with a
     dash.

     If 'getopt' is called repeatedly, it returns successively each of
     the option characters from each of the option elements (for short
     options) and each option name (for long options).  In this case,
     the actual arguments are supplied only to the first invocation.
     Subsequent calls must be given two nulls as arguments.  Such
     invocation instructs 'getopt' to use the values saved on the
     previous invocation.

     When the function finds another option, it returns its character or
     name updating the external variable 'optind' (see below) so that
     the next call to 'getopt' can resume the scan with the following
     option.

     When there are no more options left, or a '--' argument is
     encountered, 'getopt' returns an empty string.  Then 'optind' gives
     the index in ARGV of the first element that is not an option.

     The legitimate options and their characteristics are supplied in
     additional arguments to 'getopt'.  Each such argument is a string
     consisting of two parts, separated by a vertical bar ('|').  Any
     one of these parts is optional, but at least one of them must be
     present.  The first part specifies short option character.  If it
     is followed by a colon, this character takes mandatory argument.
     If it is followed by two colons, this character takes an optional
     argument.  If only the first part is present, the '|' separator may
     be omitted.  Examples:

     "c"
     "c|"
          Short option '-c'.

     "f:"
     "f:|"
          Short option '-f', taking a mandatory argument.

     "f::"
     "f::|"
          Short option '-f', taking an optional argument.

     If the vertical bar is present and is followed by any characters,
     these characters specify the name of a long option, synonymous to
     the short one, specified by the first part.  Any mandatory or
     optional arguments to the short option remain mandatory or optional
     for the corresponding long option.  Examples:

     "f:|file"
          Short option '-f', or long option '--file', requiring an
          argument.

     "f::|file"
          Short option '-f', or long option '--file', taking an optional
          argument.

     In any of the above cases, if this option appears in the command
     line, 'getopt' returns its short option character.

     To define a long option without a short equivalent, begin it with a
     bar, e.g.:

     "|help"

     If this option is to take an argument, this is specified using the
     mechanism described above, except that the short option character
     is replaced with a minus sign.  For example:

     "-:|output"
          Long option '--output', which takes a mandatory argument.

     "-::|output"
          Long option '--output', which takes an optional argument.

     If an option is returned that has an argument in the command line,
     'getopt' stores this argument in the variable 'optarg'.

     After each invocation, 'getopt' sets the variable 'optind' to the
     index of the next ARGV element to be parsed.  Thus, when the list
     of options is exhausted and the function has returned an empty
     string, 'optind' contains the index of the first element that is
     not an option.

     When 'getopt' encounters an option that is not described in its
     arguments, or if it detects a missing option argument, it prints an
     error message using 'mailfromd' logging facilities, stores the
     offending option in the variable 'optopt', and returns '?'.

     If printing error message is not desired (e.g. the application is
     going to take care of error messaging), it can be disabled by
     setting the variable 'opterr' to '0'.

     The third argument to 'getopt', called "controlling argument", may
     be used to control the behavior of the function.  If it is a colon,
     it disables printing the error message for unrecognized options and
     missing option arguments (as setting 'opterr' to '0' does).  In
     this case 'getopt' returns ':', instead of '?' to indicate missing
     option argument.

     If the controlling argument is a plus sign, or the environment
     variable 'POSIXLY_CORRECT' is set, then option processing stops as
     soon as a non-option argument is encountered.  By default, if
     options and non-option arguments are intermixed in ARGV, 'getopt'
     permutes them so that the options go first, followed by non-option
     arguments.

     If the controlling argument is '-', then each non-option element in
     ARGV is handled as if it were the argument of an option with
     character code 1 ('"\001"', in MFL notation).  This can be used by
     programs that are written to expect options and other ARGV-elements
     in any order and that care about the ordering of the two.

     Any other value of the controlling argument is handled as an option
     definition.
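
   The option definition strings described above follow a simple
grammar.  To make it concrete, here is a small shell function that
decodes a definition and describes it in words.  It is an illustration
only: it is not part of 'mailfromd', its name is made up, and it does
not handle the controlling argument:

```shell
# Describe an option definition such as "f:|file" or "-::|output".
describe_optspec() {
    spec=$1
    # Split the definition at the first vertical bar, if there is one.
    case $spec in
        *"|"*) short=${spec%%|*}; long=${spec#*|} ;;
        *)     short=$spec; long= ;;
    esac
    # Trailing colons on the first part give the argument requirement.
    case $short in
        *::) arg="optional argument";  short=${short%::} ;;
        *:)  arg="mandatory argument"; short=${short%:} ;;
        *)   arg="no argument" ;;
    esac
    out=
    # A minus sign stands in for a missing short option character.
    if [ -n "$short" ] && [ "$short" != "-" ]; then
        out="-$short"
    fi
    if [ -n "$long" ]; then
        out="${out:+$out, }--$long"
    fi
    echo "$out: $arg"
}
```

   For instance, 'describe_optspec "f:|file"' prints '-f, --file:
mandatory argument', and 'describe_optspec "-::|output"' prints
'--output: optional argument'.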

   A special language construct is provided to supply the second
argument (ARGV) to 'getopt' and similar functions:

     vaptr(PARAM)

where PARAM is a positional parameter, from which to start the array of
ARGV.  For example:

     func main(...)
       returns number
     do
       set rc getopt($#, vaptr($1), "|help")
       ...

   Here, 'vaptr($1)' constructs the ARGV array from all the arguments,
supplied to the function 'main'.

   To illustrate the use of 'getopt' function, let's suppose you write a
script that takes the following options:

'-f FILE'
'--file=FILE'

'--output[=DIR]'

'--help'

   Then, the corresponding 'getopt' invocation will be:

     func main(...)
       returns number
     do
       loop for string rc getopt($#, vaptr($1),
                                 "f:|file", "-::|output", "h|help"),
            while rc != "",
            set rc getopt(0, 0)
       do
         switch rc
         do
           case "f":
             set file optarg
           case "output":
             set output 1
             set output_dir optarg
           case "h":
             help()
           default:
             return 1
         done
         ...

   ---------- Footnotes ----------

   (1) When MFL has array data type, the second argument will change to
array of strings.


File: mailfromd.info,  Node: Logging and Debugging,  Next: Runtime errors,  Prev: Run Mode,  Up: Tutorial

3.18 Logging and Debugging
==========================

Depending on its operation mode, 'mailfromd' tries to guess whether it
is appropriate to print its diagnostics and informational messages on
standard error or to send them to syslog.  Standard error is assumed if
the program is run with one of the following command line options:

   * '--test' (*note Testing Filter Scripts::)
   * '--run' (*note Run Mode::)
   * '--lint' (*note Testing Filter Scripts::)
   * '--dump-code' (*note Logging and Debugging Options::)
   * '--dump-grammar-trace' (*note Logging and Debugging Options::)
   * '--dump-lex-trace' (*note Logging and Debugging Options::)
   * '--dump-macros' (*note Logging and Debugging Options::)
   * '--dump-tree' (*note Logging and Debugging Options::)
   * '--xref' (or '--dump-xref') (*note Testing Filter Scripts::)

   If none of these are used, 'mailfromd' switches to syslog as soon as
it finishes its startup.  There are two ways to communicate with the
'syslogd' daemon: using the 'syslog' function from the system 'libc'
library, which is a "blocking" implementation in most cases, or via the
internal, "asynchronous", syslog implementation.  Whether the latter is
compiled in and which of the implementations is used by default is
determined while compiling the package, as described in *note Using
non-blocking syslog: syslog-async.

   The '--logger' command line option allows you to manually select the
diagnostic channel:

'--logger=stderr'
     Log everything to the standard error.

'--logger=syslog'
     Log to syslog.

'--logger=syslog:async'
     Log to syslog using the asynchronous syslog implementation.

   Another way to select the diagnostic channel is by using the 'logger'
statement in the configuration file.  The statement takes the same
argument as its command line counterpart.

   The rest of details regarding diagnostic output are controlled by the
'logging' configuration statement.

   The default syslog facility is 'mail'; it can be changed using the
'--log-facility' command line option or the 'facility' statement.  The
argument in both cases is a valid facility name, i.e. one of: 'user',
'daemon', 'auth', 'authpriv', 'mail', and 'local0' through 'local7'.
The argument can be given in upper, lower or mixed case, and it can be
prefixed with 'log_'.
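
   These naming rules can be expressed as a short shell function.  It
is an illustrative sketch based on the list of facilities above, not
code taken from 'mailfromd'; the function name is hypothetical:

```shell
# Print the canonical (lower-case, unprefixed) form of a facility
# name, or fail if the name is not one of those listed above.
normalize_facility() {
    name=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    # Drop the optional 'log_' prefix.
    name=${name#log_}
    case $name in
        user|daemon|auth|authpriv|mail|local[0-7])
            echo "$name" ;;
        *)
            echo "invalid facility: $1" >&2
            return 1 ;;
    esac
}
```

   Thus 'LOG_Local7', 'local7' and 'LOCAL7' all normalize to 'local7'.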

   Another syslog-related parameter that can be configured is the "tag",
which identifies 'mailfromd' messages.  The default tag is the program
name.  It is changed by the '--log-tag' ('-L') command line option and
the 'tag' logging statement.

   The following example configures both the syslog facility and tag:

     logging {
       facility local7;
       tag "mfd";
     }

   As any other UNIX utility, 'mailfromd' is very quiet unless it has
something important to communicate, such as, e.g. an error condition.  A
set of command line options is provided for controlling the verbosity of
its output.

   The '--trace' option enables tracing Sendmail actions executed during
message verifications.  When this option is given, any 'accept',
'discard', 'continue', etc. triggered during execution of your filter
program will leave their traces in the log file.  Here is an example of
what it looks like (syslog time stamp, tag and PID removed for
readability):

     k8DHxvO9030656: /etc/mailfromd.mf:45: reject 550 5.1.1 Sender validity
     not confirmed

This shows that while verifying the message with ID 'k8DHxvO9030656' the
'reject' action was executed by filter script '/etc/mailfromd.mf' at
line 45.
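
   Trace lines of this form are easy to post-process.  The following
sketch splits a line into message ID, file, line number and action,
assuming the syslog prefix has already been removed as in the example
above; the function name is hypothetical:

```shell
# Read trace lines of the form "ID: FILE:LINE: ACTION ..." on standard
# input and print "ID FILE LINE ACTION" for each of them.
trace_fields() {
    sed -n 's/^\([^:]*\): \(.*\):\([0-9][0-9]*\): \([a-z][a-z]*\).*/\1 \2 \3 \4/p'
}
```

   Lines that do not match the pattern are silently skipped.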

   The use of message ID in the log deserves a special notice.  The
program will always identify its log messages with the 'Message-Id',
when it is available.  Your responsibility as an administrator is to
make sure it is available by configuring your MTA to export the macro
'i' to 'mailfromd'.  The rule of thumb is: make 'i' available to the
very first handler 'mailfromd' executes.  It is not necessary to export
it to the rest of the handlers, since 'mailfromd' will cache it.  For
example, if your filter script contains 'envfrom' and 'envrcpt'
handlers, export 'i' for 'envfrom'.  The exact instructions on how to
ensure it depend on the MTA you use.  For 'Sendmail', refer to *note
Sendmail::.  For MeTA1, see *note MeTA1::, and *note pmult-macros::.
For 'Postfix', see *note Postfix::.

   To push log verbosity further, use the 'debug' configuration
statement (*note conf-debug::) or its command line equivalent, '--debug'
('-d', *note --debug::).  Its argument is a "debugging level", whose
syntax is described in <http://mailutils.org/wiki/Debug_level>.

   The debugging output is controlled by a set of levels, each of which
can be set independently of others.  Each debug level consists of a
category name, which identifies the part of package for which additional
debugging is desired, and a level number, which indicates how verbose
its output should be.

   Valid debug levels are:

error
     Displays error conditions which are normally not reported, but
     passed to the caller layers for handling.

trace0 through trace9
     Ten levels of verbosity, 'trace0' producing less output, 'trace9'
     producing the maximum amount of output.

prot
     Displays network protocol interaction, where applicable.

   The overall debugging level is specified as a list of individual
levels, delimited with semicolons.  Each individual level can be
specified as one of:

!CATEGORY
     Disables all levels for the specified category.

CATEGORY
     Enables all levels for the specified category.

CATEGORY.LEVEL
     For this category, enables all levels from 'error' to LEVEL,
     inclusive.

CATEGORY.=LEVEL
     Enables only the given LEVEL in this CATEGORY.

CATEGORY.!LEVEL
     Disables all levels from 'error' to LEVEL, inclusive, in this
     CATEGORY.

CATEGORY.!=LEVEL
     Disables only the given LEVEL in this CATEGORY.

CATEGORY.LEVELA-LEVELB
     Enables all levels in the range from LEVELA to LEVELB, inclusive.

CATEGORY.!LEVELA-LEVELB
     Disables all levels in the range from LEVELA to LEVELB, inclusive.

   Additionally, a comma-separated list of level specifications is
allowed after the dot.  For example, the following specification:

     acl.prot,!=trace9,!trace2

enables in category 'acl' all levels except 'error', 'trace0', 'trace1',
'trace2', and 'trace9'.
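
   To see how the individual specifications combine, the rules listed
above can be replayed mechanically.  The following shell function is a
sketch that applies them, left to right, to the part of a specification
that follows the dot, and prints the levels left enabled.  It follows
the rules as worded in this section (so '!LEVEL' also switches off
'error') and is not the actual parser; the function name is
hypothetical:

```shell
# Print the levels enabled by a comma-separated level list, e.g.
# "prot,!=trace9,!trace2" (the text after "CATEGORY.").
debug_levels() {
    printf '%s\n' "$1" | awk '
    BEGIN {
        n = split("error trace0 trace1 trace2 trace3 trace4 trace5 " \
                  "trace6 trace7 trace8 trace9 prot", name, " ")
        for (i = 1; i <= n; i++) idx[name[i]] = i
    }
    # Switch levels first..last on (val = 1) or off (val = 0).
    function mark(first, last, val,   i) {
        for (i = first; i <= last; i++) on[i] = val
    }
    {
        m = split($0, item, ",")
        for (k = 1; k <= m; k++) {
            s = item[k]; val = 1
            if (substr(s, 1, 1) == "!") { val = 0; s = substr(s, 2) }
            if (substr(s, 1, 1) == "=") {           # exactly one level
                s = substr(s, 2); mark(idx[s], idx[s], val)
            } else if ((p = index(s, "-")) > 0) {   # LEVELA-LEVELB
                mark(idx[substr(s, 1, p - 1)], idx[substr(s, p + 1)], val)
            } else {                                # error through LEVEL
                mark(1, idx[s], val)
            }
        }
        out = ""
        for (i = 1; i <= n; i++)
            if (on[i]) out = out (out == "" ? "" : " ") name[i]
        print out
    }'
}
```

   Given 'prot,!=trace9,!trace2', it prints 'trace3 trace4 trace5 trace6
trace7 trace8 prot'.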

   Implementation and applicability of each level of debugging differs
between various categories.  Categories built-in to mailutils are
described in <http://mailutils.org/wiki/Debug_level>.  Mailfromd
introduces the following additional categories:

db
     trace0
          Detailed debugging info about expiration and compaction.
     trace5
          List records being removed.

dns
     trace8
          Verbose information about attempted DNS queries and their
          results.
     trace9
          Enables 'libadns' internal debugging.

srvman
     trace0
          Additional information about normal conditions, such as
          subprocess exiting successfully or a remote party being
          allowed access by ACL.
     trace1
          Detailed transcript of server manager actions: startup,
          shutdown, subprocess cleanups, etc.
     trace3
          Additional info about fd sets.
     trace4
          Individual subserver status information.
     trace5
          Subprocess registration.

pmult
     trace1
          Verbosely list incoming connections, functions being executed
          and erroneous conditions: missing headers in SMFIR_CHGHEADER,
          undefined macros, etc.
     trace2
          List milter requests being processed.
     trace7
          List SMTP body content in SMFIR_REPLBODY requests.
     error
          Verbosely list mild errors encountered: bad recipient
          addresses, etc.

callout
     trace0
          Verification session transcript.
     trace1
          MX servers checks.
     trace5
          List emails being checked.
     trace9
          Additional info.

main
     trace5
          Info about hostnames in relayed domain list

engine
     Debugging of the virtual engine.
     trace5
          Message modification lists.
     trace6
          Debug message modification operations and Sendmail macros
          registered.
     trace7
          List SMTP stages ('xxfi_*' calls).
     trace9
          Cleanup calls.

pp
     Preprocessor.

     trace1
          Show command line of the preprocessor being run.

prog
     trace8
          Stack operations
     trace9
          Debug exception state save/restore operations.

spf
     error
          Mild errors.
     trace0
          List calls to 'spf_eval_record', 'spf_test_record',
          'spf_check_host_internal', etc.
     trace1
          General debug info.
     trace6
          Explicitly list A records obtained when processing the 'a' SPF
          mechanism.

   Categories starting with 'bi_' debug built-in modules:

bi_db
     Database functions.
     trace5
          List database look-ups.
     trace6
          Trace operations on the greylisting database.

bi_sa
     SpamAssassin and ClamAV API.
     trace1
          Report the findings of the 'clamav' function.
     trace9
          Trace payload in interactions with 'spamd'.

bi_io
     I/O functions.
     trace1
          Debug the following functions: 'open', 'spawn', 'write'.
     trace2
          Report stderr redirection.
     trace3
          Report external commands being run.

bi_mbox
     Mailbox functions.
     trace1
          Report opened mailboxes.

bi_other
     Other built-ins.
     trace1
          Report results of checks for existence of usernames.

   For example, the following invocation enables levels up to 'trace2'
in category 'engine', all levels in category 'savsrv' and levels up to
'trace0' in category 'srvman':

     $ mailfromd --debug='engine.trace2;savsrv;srvman.trace0'

   You need to have sufficient knowledge about 'mailfromd' internal
structure to use this form of the '--debug' option.

   To control the execution of the sender verification functions (*note
SMTP Callout functions::), you may use the '--transcript' ('-X') command
line option, which enables transcripts of SMTP sessions in the logs.
Here is an example of the output produced by running 'mailfromd
--transcript':

     k8DHxlCa001774: RECV: 220 spf-jail1.us4.outblaze.com ESMTP Postfix
     k8DHxlCa001774: SEND: HELO mail.gnu.org.ua
     k8DHxlCa001774: RECV: 250 spf-jail1.us4.outblaze.com
     k8DHxlCa001774: SEND: MAIL FROM: <>
     k8DHxlCa001774: RECV: 250 Ok
     k8DHxlCa001774: SEND: RCPT TO: <t1Kmx17Q@malaysia.net>
     k8DHxlCa001774: RECV: 550 <>: No thank you rejected: Account
      Unavailable: Possible Forgery
     k8DHxlCa001774: poll exited with status: not_found; sent
      "RCPT TO: <t1Kmx17Q@malaysia.net>", got "550 <>: No thank you
      rejected: Account Unavailable: Possible Forgery"
     k8DHxlCa001774: SEND: QUIT


File: mailfromd.info,  Node: Runtime errors,  Next: Notes,  Prev: Logging and Debugging,  Up: Tutorial

3.19 Runtime Errors
===================

A "runtime error" is a special condition encountered during execution of
the filter program that makes further execution of the program
impossible.  There are two kinds of runtime errors: fatal errors and
uncaught exceptions.  Whenever a runtime error occurs, 'mailfromd'
writes the following message to the log file:

     RUNTIME ERROR near FILE:LINE: TEXT

where FILE:LINE indicates approximate source file location where the
error occurred and TEXT gives the textual description of the error.
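
   The fixed prefix makes such messages easy to find in the logs.  The
sketch below extracts the file, line and description from each matching
line on its standard input; the function name is hypothetical and the
pattern assumes the layout shown above:

```shell
# Read log lines on standard input and, for each runtime error report,
# print "FILE LINE TEXT".  Anything before the marker (time stamp,
# tag, PID) is discarded.
runtime_errors() {
    sed -n 's/.*RUNTIME ERROR near \(.*\):\([0-9][0-9]*\): \(.*\)/\1 \2 \3/p'
}
```

   Lines that do not contain a runtime error report produce no output.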

Fatal runtime errors
--------------------

Fatal runtime errors are caused by a condition that is impossible to fix
at run time.  For version 8.8 these are:

Not enough memory
     There is not enough memory for the execution of the program.  Try
     to make more memory available for 'mailfromd' or to reduce its
     memory requirements by rewriting your filter script.

Out of stack space; increase #pragma stacksize
Heap overrun; increase #pragma stacksize
memory chunk too big to fit into heap
     These errors are reported when there is not enough space left on
     the stack to perform the requested operation, and the attempt to
     resize the stack has failed.  Usually 'mailfromd' expands the stack
     when the need arises (*note automatic stack resizing::).  This
     runtime error indicates that there was no more memory available for
     stack expansion.  Try to make more memory available for 'mailfromd'
     or to reduce its memory requirements by rewriting your filter
     script.

Stack underflow
     Program attempted to pop a value off the stack but the stack was
     already empty.  This indicates an internal error in the MFL
     compiler or 'mailfromd' runtime engine.  If you ever encounter this
     error, please report it to <bug-mailfromd@gnu.org.ua>.  Include the
     log fragment (about 10-15 lines before and after this log message)
     and your filter script.  *Note Reporting Bugs::, for more
     information about bug reporting.

pc out of range
     The "program counter" is out of allowed range.  This is a severe
     error, indicating an internal inconsistency in 'mailfromd' runtime
     engine.  If you encounter it, please report it to
     <bug-mailfromd@gnu.org.ua>.  Include the log fragment (about 10-15
     lines before and after this log message) and your filter script.
     *Note Reporting Bugs::, for more information about how to report a
     bug.

Programmatic runtime errors
---------------------------

These indicate a programmatic error in your filter script, which the MFL
compiler was unable to discover at compilation stage:

Invalid exception number: N
     The 'throw' statement used a nonexistent exception number N.  Fix
     the statement and restart 'mailfromd'.  *Note throw::, for
     information about the 'throw' statement, and *note Exceptions::,
     for the list of available exception codes.

No previous regular expression
     You have used a back-reference (*note Back references::), where
     there is no previous regular expression to refer to.  Fix this line
     in your code and restart the program.

Invalid back-reference number
     You have used a back-reference (*note Back references::), with a
     number greater than the number of available groups in the previous
     regular expression.  For example:

            if $f matches "(.*)@gnu.org"
              # Wrong: there is only one group in the regexp above!
              set x \2
            ...

     Fix your code and restart the daemon.

Uncaught exceptions
-------------------

Another kind of runtime error is an "uncaught exception", i.e. an
exceptional condition for which no handler was installed (*Note
Exceptions::, for information on exceptions and on how to handle them).
These errors mean that the programmer (i.e.  you) made no provision for
some specific condition.  For example, consider the following code:

     prog envfrom
     do
       if $f mx matches "yahoo.com"
         foo()
       fi
     done

It is syntactically correct, but it overlooks the fact that 'mx matches'
may generate the 'e_temp_failure' exception if the underlying DNS query
has timed out (*note Special comparisons::).  If this happens,
'mailfromd' has no instructions on what to do next and reports an error.
This can easily be fixed using a 'catch' statement, e.g.:

     prog envfrom
     do
       # Catch DNS errors
       catch e_temp_failure or e_failure
       do
         tempfail 451 4.1.1 "MX verification failed"
       done

       if $f mx matches "yahoo.com"
         foo()
       fi
     done

   Another common case is an undefined Sendmail macro.  In this case the
'e_macroundef' exception is generated:

     RUNTIME ERROR near foo.c:34: Macro not defined: {client_adr}

This can be caused either by misspelling the macro name (as in the
example message above) or by failing to export the required name in
Sendmail milter configuration (*note exporting macros::).  This error
should be fixed either in your source code or in the 'sendmail.cf' file,
but if you wish to provide special handling for it, you can use the
following catch statement:

     catch e_macroundef
     do
       ...
     done

   Sometimes the location indicated with the runtime error message is
not enough to trace the origin of the error.  For example, an error can
be generated explicitly with a 'throw' statement (*note throw::):

     RUNTIME ERROR near match_cidr.mf:30: invalid CIDR (text)

   If you look in module 'match_cidr.mf', you will see the following
code (line numbers added for reference):

     23 func match_cidr(string ipstr, string cidr) returns number
     24 do
     25   number netmask
     26
     27   if cidr matches '^(([0-9]{1,3}\.){3}[0-9]{1,3})/([0-9][0-9]?)'
     28     return inet_aton(ipstr) & len_to_netmask(\3) = inet_aton(\1)
     29   else
     30     throw invcidr "invalid CIDR (%cidr)"
     31   fi
     32   return 0
     33 done

   Now, it is obvious that the value of the 'cidr' argument to
'match_cidr' was wrong, but how do you find the caller that passed the
wrong value to it?  The special command line option '--stack-trace' is
provided for this.  This option enables dumping "stack traces" when a
fatal error occurs.  The traces contain information about function
calls.  Continuing our example, using the '--stack-trace' option you
will see the following diagnostics:

     RUNTIME ERROR near match_cidr.mf:30: invalid CIDR (127%)
     mailfromd: Stack trace:
     mailfromd: 0077: match_cidr.mf:30: match_cidr
     mailfromd: 0096: test.mf:13: bar
     mailfromd: 0110: mailfromd.mf:18: foo
     mailfromd: Stack trace finishes
     mailfromd: Execution of the configuration program was not finished

   Each trace line describes one stack frame.  The lines appear in the
order of most recently called to least recently called.  Each frame
consists of:

  1. Value of the program counter at the time of its execution;
  2. Source code location, if available;
  3. Name of the function called.

   Thus, the example above can be read as: "the error occurred in the
function 'match_cidr', in file 'match_cidr.mf' at line 30.  This
function was called from the function 'bar', in file 'test.mf' at line
13.  In its turn, 'bar' was called by the function 'foo', in file
'mailfromd.mf' at line 18".

   Examining caller functions will help you localize the source of the
error and fix it.

   You can also request a stack trace any place in your code, by calling
the 'stack_trace' function.  This can be useful for debugging, or in
your 'catch' statements.
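
   As a minimal sketch, the handler from the example above can log a
stack trace from its 'catch' block before returning a temporary failure.
The function 'foo' is the same placeholder used earlier in this section:

     prog envfrom
     do
       catch e_temp_failure or e_failure
       do
         # Log the current call stack, then defer the message
         stack_trace()
         tempfail 451 4.1.1 "MX verification failed"
       done

       if $f mx matches "yahoo.com"
         foo()
       fi
     done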


File: mailfromd.info,  Node: Notes,  Prev: Runtime errors,  Up: Tutorial

3.20 Notes and Cautions
=======================

This section discusses some potential pitfalls in MFL.

   It is important to exercise special caution when writing format
strings for the 'sprintf' (*note String formatting::) and 'strftime'
(*note strftime::) functions.  They use '%' as the character introducing
conversion specifiers, while the same character is used to expand an MFL
variable within a string.  To prevent this misinterpretation, always
enclose format specifications in _single quotes_ (*note
singe-vs-double::).  To illustrate this, let's consider the following
example:

     echo sprintf ("Mail from %s", $f)

   If a variable 's' is not declared, this line will produce the
'Variable s is not defined' error message, which will allow you to
identify and fix the bug.  The situation is considerably worse if 's' is
declared.  In that case you will see no warning message, as the
statement is perfectly valid, but at run time the variable 's' will be
interpreted within the format string, and its value will replace '%s'.
To prevent this from happening, single quotes must be used:

     echo sprintf ('Mail from %s', $f)

   This does not limit the functionality, since there is no need to fall
back to variable interpretation in format strings.
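
   The same rule applies to 'strftime'.  The line below is a minimal
sketch which assumes that the library function 'time' is used to obtain
the current timestamp.  With double quotes, '%Y', '%m' and the other
specifiers would be taken for variable references:

     echo strftime ('%Y-%m-%d %H:%M:%S', time())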

   Yet another dangerous feature of the language is the way to refer to
variable and constant names within literal strings.  The same notation
is used to expand a variable or a constant (*Note Variables::, and
*note Constants::).  Now, let's consider the following code:

     const x 2
     string x "X"

     prog envfrom
     do
       echo "X is %x"
     done

   Does '%x' in 'echo' refer to the variable or to the constant?  The
correct answer is 'to the variable'.  When executed, this code will
print 'X is X'.

   As of version 8.8, 'mailfromd' will always print a diagnostic message
whenever it stumbles upon a variable having the same name as a
previously defined constant or vice versa.  The resolution of such name
clashes is described in detail in *Note variable--constant shadowing::.

   Future versions of the program may provide a non-ambiguous way of
referring to variables and constants from literal strings.


File: mailfromd.info,  Node: MFL,  Next: Library,  Prev: Tutorial,  Up: Top

4 Mail Filtering Language
*************************

The "mail filtering language", or MFL, is a special language designed
for writing filter scripts.  It has a simple syntax, similar to that of
Bourne shell.  In contrast to most existing programming languages,
MFL does not have any special terminating or separating characters
(like, e.g.  newlines and semicolons in shell)(1).  All syntactical
entities are separated by any amount of white-space characters (i.e.
spaces, tabulations or newlines).
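
   To illustrate, here is a short handler written in the usual
multi-line style (the test itself is arbitrary):

     prog envfrom
     do
       if $f = ""
         accept
       fi
     done

Since only white space separates the tokens, the same handler could
equally be written on a single line:

     prog envfrom do if $f = "" accept fi done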

   The following sections describe MFL syntax in detail.

* Menu:

* Comments::                    Comments.
* Pragmas::                     Pragmatic comments.
* Data Types::
* Numbers::
* Literals::
* Here Documents::
* Sendmail Macros::
* Constants::
* Variables::
* Back references::
* Handlers::
* begin/end::
* Functions::                   Functions.
* Expressions::                 Expressions.
* Shadowing::                   Variable and Constant Shadowing.
* Statements::
* Conditionals::                Conditional Statements.
* Loops::                       Loop Statements.
* Exceptions::                  Exceptional Conditions and their Handling.
* Polling::                     Sender Verification Tests.
* Modules::                     Modules are Collections of Useful Functions.
* Preprocessor::                Input Text Is Preprocessed.
* Filter Script Example::       A Working Filter Script Explained.
* Reserved Words::              A Reference List of Reserved Words.

   ---------- Footnotes ----------

   (1) There are two noteworthy exceptions: 'require' and 'from ...
import' statements, which must be terminated with a period.  *Note
import::.


File: mailfromd.info,  Node: Comments,  Next: Pragmas,  Up: MFL

4.1 Comments
============

Two types of comments are allowed: C-style, enclosed between '/*' and
'*/', and shell-style, starting with the '#' character and extending up
to the end of line:

     /* This is
        a comment. */
     # And this too.

   There are, however, several special cases, where the characters
following '#' are not ignored.

   If the first line begins with '#!/' or '#! /', this is treated as a
start of a multi-line comment, which is closed by the characters '!#' on
a line by themselves.  This feature allows for writing sophisticated
scripts.  *Note top-block::, for a detailed description.

   If '#' is followed by the word 'include' (with optional whitespace
between them), this statement requires inclusion of the specified file,
as in C.  There are two forms of the '#include' statement:

  1. '#include <FILE>'
  2. '#include "FILE"'

   The quotes around FILE in the second form are optional.

   Both forms are equivalent if FILE is an absolute file name.
Otherwise, the first form will look for FILE in the "include search
path".  The second one will look for it in the current working directory
first, and, if not found there, in the include search path.

   The default include search path is:

  1. 'PREFIX/share/mailfromd/8.8/include'
  2. 'PREFIX/share/mailfromd/include'
  3. '/usr/share/mailfromd/include'
  4. '/usr/local/share/mailfromd/include'

     Where PREFIX is the installation prefix.

   New directories can be prepended to it using the '-I' ('--include')
command line option, or the 'include-path' configuration statement
(*note include-path: conf-base.).

   For example, invoking

     $ mailfromd -I/var/mailfromd -I/com/mailfromd

creates the following include search path

  1. '/var/mailfromd'
  2. '/com/mailfromd'
  3. 'PREFIX/share/mailfromd/8.8/include'
  4. 'PREFIX/share/mailfromd/include'
  5. '/usr/share/mailfromd/include'
  6. '/usr/local/share/mailfromd/include'

   Along with '#include', there is also a special form '#include_once',
that has the same syntax:

     #include_once <FILE>
     #include_once "FILE"

   This form works exactly as '#include', except that, if the FILE has
already been included, it will not be included again.  As the name
suggests, it will be included only once.

   This form should be used to prevent re-inclusion of code, which can
cause problems due to function redefinitions, variable reassignments,
etc.

   A line in the form

     #line NUMBER "IDENTIFIER"

causes the MFL compiler to believe, for purposes of error diagnostics,
that the line number of the next source line is given by NUMBER and the
current input file is named by IDENTIFIER.  If the identifier is absent,
the remembered file name does not change.
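
   For example, a hypothetical tool that generates MFL code from a
template named 'rules.tmpl' could emit the following line, so that
diagnostics for the code that follows it are reported against line 10 of
the template:

     #line 10 "rules.tmpl"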


File: mailfromd.info,  Node: Pragmas,  Next: Data Types,  Prev: Comments,  Up: MFL

4.2 Pragmatic comments
======================

If '#' is immediately followed by the word 'pragma' (with optional
whitespace between them), such a construct introduces a "pragmatic
comment", i.e.  an instruction that controls some configuration setting.

   The available pragma types are described in the following
subsections.

* Menu:

* prereq::          Pragma prereq.
* stacksize::       Pragma stacksize.
* regex::           Pragma regex.
* dbprop::          Pragma dbprop.
* greylist::        Pragma greylist.
* miltermacros::    Pragma miltermacros.
* provide-callout:: Pragma provide-callout.


File: mailfromd.info,  Node: prereq,  Next: stacksize,  Up: Pragmas

4.2.1 Pragma prereq
-------------------

The '#pragma prereq' statement ensures that the correct 'mailfromd'
version is used to compile the source file it appears in.  It takes a
version number as its argument and produces a compilation error if the
actual 'mailfromd' version number is earlier than that.  For example,
the following statement:

     #pragma prereq 7.0.94

results in error if compiled with 'mailfromd' version 7.0.93 or prior.


File: mailfromd.info,  Node: stacksize,  Next: regex,  Prev: prereq,  Up: Pragmas

4.2.2 Pragma stacksize
----------------------

The 'stacksize' pragma sets the initial size of the run-time stack and
may also define the policy of its growing, in case it becomes full.  The
default stack size is 4096 words.  You may need to increase this number
if your configuration program uses recursive functions or does an
excessive amount of string manipulations.

 -- pragma: stacksize size [incr [max]]
     Sets stack size to SIZE units.  Optional INCR and MAX define stack
     growth policy (see below).  The default "units" are words.  The
     following example sets the stack size to 7168 words:

          #pragma stacksize 7168

     The SIZE may end with a "unit size" suffix:

     Suffix                 Meaning
     -------------------------------------------------------------------
     k                      Kiloword, i.e.  1024 words
     m                      Megaword, i.e.  1048576 words
     g                      Gigaword, i.e.  1073741824 words
     t                      Teraword, i.e.  1099511627776 words (ouch!)

     Table 4.1: Unit Size Suffix

     Size suffixes are case-insensitive, so the following two pragmas
     are equivalent and set the stack size to '7*1048576 = 7340032'
     words:

          #pragma stacksize 7m
          #pragma stacksize 7M

     When the MFL engine notices that there is no more stack space
     available, it attempts to expand the stack.  If this attempt
     succeeds, the operation continues.  Otherwise, a runtime error is
     reported and the execution of the filter stops.

     The optional INCR argument to '#pragma stacksize' defines the
     growth policy for the stack.  Two growth policies are implemented:
     the "fixed increment policy", which expands the stack in fixed-size
     "expansion chunks", and the "exponential growth policy", which
     doubles the stack size until it is able to accommodate the needed
     number of words.  The fixed increment policy is the default.  The
     default chunk size is 4096 words.

     If INCR is the word 'twice', the exponential growth policy is
     selected.  Otherwise INCR must be a positive number optionally
     suffixed with a size suffix (see above).  This indicates the
     expansion chunk size for the fixed increment policy.

     The following example sets the initial stack size to 10240 words,
     and the expansion chunk size to 2048 words:

          #pragma stacksize 10K 2K

     The pragma below enables exponential stack growth policy:

          #pragma stacksize 10240 twice

     In this case, when the run-time evaluator hits the stack size
     limit, it expands the stack to twice the size it had before.  So,
     in the example above, the stack will be sequentially expanded to
     the following sizes: 20480, 40960, 81920, 163840, etc.

     The optional MAX argument defines the maximum size of the stack.
     If stack grows beyond this limit, the execution of the script will
     be aborted.
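
     As an illustration with arbitrary numbers, the following pragma
     uses all three arguments.  It sets the initial stack size to 8192
     words and the expansion chunk size to 4096 words, and limits the
     stack to 65536 words.  The limit is written as a plain number
     here, since the text above describes size suffixes only for SIZE
     and INCR:

          #pragma stacksize 8k 4k 65536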

   If you are concerned about the execution time of your script, you may
wish to avoid stack reallocations.  To help you find out the optimal
stack size, each time the stack is expanded, 'mailfromd' issues a
warning in its log file, which looks like this:

     warning: stack segment expanded, new size=8192

   You can use these messages to adjust your stack size configuration
settings.


File: mailfromd.info,  Node: regex,  Next: dbprop,  Prev: stacksize,  Up: Pragmas

4.2.3 Pragma regex
------------------

The '#pragma regex' statement controls compilation of regular
expressions.  You can use any number of such pragma directives in your
'mailfromd.mf'.  The scope of '#pragma regex' extends to the next
occurrence of this directive or to the end of the script file, whichever
occurs first.

 -- pragma: regex [push|pop] flags
     The optional PUSH|POP parameter is one of the words 'push' or 'pop'
     and is discussed in detail below.  The FLAGS parameter is a
     whitespace-separated list of "regex flags".  Each regex-flag is a
     word specifying some regex feature.  It can be preceded by '+' to
     enable this feature (this is the default), by '-' to disable it or
     by '=' to reset regex flags to its value.  Valid regex-flags are:

     'extended'
          Use POSIX Extended Regular Expression syntax when interpreting
          regex.  If not set, POSIX Basic Regular Expression syntax is
          used.

     'icase'
          Do not differentiate case.  Subsequent regex searches will be
          case insensitive.

     'newline'
          "Match-any-character" operators don't match a newline.

          A non-matching list ('[^...]') not containing a newline does
          not match a newline.

          "Match-beginning-of-line" operator ('^') matches the empty
          string immediately after a newline.

          "Match-end-of-line" operator ('$') matches the empty string
          immediately before a newline.

     For example, the following pragma enables POSIX extended, case
     insensitive matching (a good thing to start your 'mailfromd.mf'
     with):

          #pragma regex +extended +icase
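
     Flags can also be switched off later in the script.  The sketch
     below assumes that flags not named in a pragma keep their current
     state, so only case-insensitive matching is disabled by the second
     directive:

          #pragma regex +extended +icase
          # Regular expressions compiled here use extended syntax
          # and ignore case.
          #pragma regex -icase
          # From here on, matching is case-sensitive again.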

   Optional modifiers 'push' and 'pop' can be used to maintain a stack
of regex flags.  The statement

     #pragma regex push [FLAGS]

saves current regex flags on stack and then optionally modifies them as
requested by FLAGS.

   The statement

     #pragma regex pop [FLAGS]

does the opposite: restores the current regex flags from the top of
stack and applies FLAGS to it.

   This statement is useful in module and include files to avoid
disturbing user regex settings.  E.g.:

     #pragma regex push +extended +icase
      .
      .
      .
     #pragma regex pop


File: mailfromd.info,  Node: dbprop,  Next: greylist,  Prev: regex,  Up: Pragmas

4.2.4 Pragma dbprop
-------------------

 -- pragma: dbprop pattern prop ...
     This pragma configures properties for a DBM database.  *Note
     Database functions::, for its detailed description.


File: mailfromd.info,  Node: greylist,  Next: miltermacros,  Prev: dbprop,  Up: Pragmas

4.2.5 Pragma greylist
---------------------

 -- pragma: greylist type
     Selects the greylisting implementation to use.  Allowed values for
     TYPE are:

     traditional
     gray
          Use the traditional greylisting implementation.  This is the
          default.

     con-tassios
     ct
          Use Con Tassios greylisting implementation.

     *Note greylisting types::, for a detailed description of these
     greylisting implementations.

   Notice that this pragma can be used only once.  A second use of this
pragma would constitute an error, because you cannot use both
greylisting implementations in the same program.
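
   For example, a filter script that is meant to use the Con Tassios
implementation would contain a single line like this:

     #pragma greylist con-tassios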


File: mailfromd.info,  Node: miltermacros,  Next: provide-callout,  Prev: greylist,  Up: Pragmas

4.2.6 Pragma miltermacros
-------------------------

 -- pragma: miltermacros handler macro ...
     Declare that the Milter stage HANDLER uses the MTA macros listed as
     the remaining arguments.  The HANDLER must be a valid handler name
     (*note Handlers::).

   The 'mailfromd' parser collects the names of the macros referred to
by a '$NAME' construct within a handler (*note Sendmail Macros::) and
declares them automatically for the corresponding handlers.  It is,
however, unable to track macros used in functions called from the
handler, as well as those referred to via the 'getmacro' and
'macro_defined' functions.  Such macros should be declared using
'#pragma miltermacros'.
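
   A minimal sketch of such a declaration follows.  The function name
'queue_id' is hypothetical, and the example assumes that macro names are
given to the pragma without the leading '$'.  The macro 'i' is read via
'getmacro' inside a function, so the parser cannot discover that the
'eom' handler needs it:

     #pragma miltermacros eom i

     func queue_id() returns string
     do
       return getmacro("i")
     done

     prog eom
     do
       echo "queue ID: " . queue_id()
     done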

   During initial negotiation with the MTA, 'mailfromd' will ask it to
export the macro names declared automatically or by using '#pragma
miltermacros'.  The MTA is free to honor or to ignore this request.  In
particular, Sendmail versions prior to 8.14.0 and Postfix versions prior
to 2.5 do not support this feature.  If you use one of these, you will
need to export the needed macros explicitly in the MTA configuration.
For more details, refer to the section in *note MTA Configuration::
corresponding to your MTA type.


File: mailfromd.info,  Node: provide-callout,  Prev: miltermacros,  Up: Pragmas

4.2.7 Pragma provide-callout
----------------------------

The '#pragma provide-callout' statement is used in the 'callout' module
to inform 'mailfromd' that the module has been loaded.

   Do not use this pragma.


File: mailfromd.info,  Node: Data Types,  Next: Numbers,  Prev: Pragmas,  Up: MFL

4.3 Data Types
==============

The 'mailfromd' filter script language operates on entities of two
types: numeric and string.

   The "numeric" type is represented internally as a signed long
integer.  Its size depends on the machine architecture.  For example, on
32-bit Intel-based machines it is 32 bits long.

   A "string" is a string of characters of arbitrary length.  Strings
can contain any characters except ASCII NUL.

   There is also a "generic pointer", which is designed to facilitate
certain operations.  It appears only in the 'body' handler.  *Note body
handler::, for more information about it.


File: mailfromd.info,  Node: Numbers,  Next: Literals,  Prev: Data Types,  Up: MFL

4.4 Numbers
===========

A "decimal number" is any sequence of decimal digits, not beginning with
'0'.

   An "octal number" is '0' followed by any number of octal digits ('0'
through '7'), for example: '0340'.

   A "hex number" is '0x' or '0X' followed by any number of hex digits
('0' through '9' and 'a' through 'f' or 'A' through 'F'), for example:
'0x3ef1'.
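
   For example, the three declarations below initialize each variable
with the same value, 255, written in decimal, octal and hex notation
(the variable names are illustrative):

     number n_dec 255
     number n_oct 0377
     number n_hex 0xff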


File: mailfromd.info,  Node: Literals,  Next: Here Documents,  Prev: Numbers,  Up: MFL

4.5 Literals
============

A literal is any sequence of characters enclosed in single or double
quotes.

   After 'tempfail' and 'reject' actions two special kinds of literals
are recognized: three-digit numeric values represent RFC 2821 reply
codes, and literals consisting of three digit groups separated by dots
represent an extended reply code as per RFC 1893/2034.  For example:

     510   # A reply code
     5.7.1 # An extended reply code
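
   Used together in an action, these literals might look as follows (a
sketch; the codes and the message text are illustrative):

     reject 550 5.7.1 "Access denied"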

Double-quoted strings
---------------------

String literals enclosed in double quotation marks ("double-quoted
strings") are subject to "backslash interpretation", "macro expansion",
"variable interpretation" and "back reference interpretation".

   "Backslash interpretation" is performed at compilation time.  It
consists in replacing the following "escape sequences" with the
corresponding single characters:

Sequence               Replaced with
\a                     Audible bell character (ASCII 7)
\b                     Backspace character (ASCII 8)
\f                     Form-feed character (ASCII 12)
\n                     Newline character (ASCII 10)
\r                     Carriage return character (ASCII
                       13)
\t                     Horizontal tabulation character
                       (ASCII 9)
\v                     Vertical tabulation character
                       (ASCII 11)

Table 4.2: Backslash escapes

   In addition, the sequence '\NEWLINE' has the same effect as '\n', for
example:

     "a string with\
      embedded newline"
     "a string with\n embedded newline"

   Any escape sequence of the form '\xHH', where H denotes any hex
digit, is replaced with the character whose ASCII value is HH.  For
example:

     "\x61nother" => "another"

   Similarly, an escape sequence of the form '\0OOO', where O is an
octal digit, is replaced with the character whose ASCII value is OOO.
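
   For example, assuming that exactly three octal digits follow the
'\0' prefix, and given that octal 141 is the ASCII code of 'a':

     "\0141nother" => "another"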

   Macro expansion and variable interpretation occur at run-time.
During these phases all Sendmail macros (*note Sendmail Macros::),
'mailfromd' variables (*note Variables::), and constants (*note
Constants::) referenced in the string are replaced by their actual
values.  For example, if the Sendmail macro 'f' has the value
'postmaster@gnu.org.ua' and the variable 'last_ip' has the value
'127.0.0.1', then the string(1)

     "$f last connected from %last_ip;"

will be expanded to

     "postmaster@gnu.org.ua last connected from 127.0.0.1;"

   A "back reference" is a sequence '\D', where D is a decimal number.
It refers to the Dth parenthesized subexpression in the last 'matches'
statement(2).  Any back reference occurring within a double-quoted
string is replaced by the value of the corresponding subexpression.
*Note Special comparisons::, for a detailed description of this process.
Back reference interpretation is performed at run time.
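
   The following sketch shows back references used inside a
double-quoted string.  It assumes extended regular expression syntax,
enabled here with '#pragma regex'; the message text is illustrative:

     #pragma regex +extended

     prog envfrom
     do
       if $f matches '(.*)@(.*)'
         # \1 is the local part, \2 is the domain part
         echo "Sender local part is \1, domain is \2"
       fi
     done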

Single-quoted strings
---------------------

Any characters enclosed in single quotation marks are read unmodified.

   The following examples contain pairs of equivalent strings:

     "a string"
     'a string'

     "\\(.*\\):"
     '\(.*\):'

   Notice the last example.  Single quotes are particularly useful in
writing regular expressions (*note Special comparisons::).

   ---------- Footnotes ----------

   (1) Implementation note: actually, the references are not interpreted
within the string, instead, each such string is split at compilation
time into a series of concatenated atoms.  Thus, our sample string will
actually be compiled as:

     $f . " last connected from " . last_ip . ";"

   *Note Concatenation::, for a description of this construct.  You can
easily see how various strings are interpreted by using '--dump-tree'
option (*note --dump-tree::).  In this case, it will produce:

       CONCAT:
         CONCAT:
           CONCAT:
             SYMBOL: f
             CONSTANT: " last connected from "
           VARIABLE last_ip (13)
         CONSTANT: ";"

   (2) The subexpressions are numbered by the positions of their opening
parentheses, left to right.


File: mailfromd.info,  Node: Here Documents,  Next: Sendmail Macros,  Prev: Literals,  Up: MFL

4.6 Here Documents
==================

A "here-document" is a special form of string literal that allows you to
specify multiline strings without having to use backslash escapes.  The
format of here-documents is:

     <<[FLAGS]WORD
     ...
     WORD

   The '<<WORD' construct instructs the parser to read all the following
lines up to the line containing only WORD, with possible trailing
blanks.  The lines thus read are concatenated together into a single
string.  For example:

     set str <<EOT
     A multiline
     string
     EOT

   The body of a here-document is interpreted the same way as
double-quoted strings (*note Double-quoted strings::).  For example, if
Sendmail macro 'f' has the value 'jsmith@some.com' and the variable
'count' is set to '10', then the following string:

     set s <<EOT
     <$f> has tried to send %count mails.
     Please see docs for more info.
     EOT

will be expanded to:

     <jsmith@some.com> has tried to send 10 mails.
     Please see docs for more info.

   If the WORD is quoted, either by enclosing it in single quote
characters or by prepending it with a backslash, all interpretations and
expansions within the document body are suppressed.  For example:

     set s <<'EOT'
     The following line is read verbatim:
     <$f> has tried to send %count mails.
     Please see docs for more info.
     EOT

   Optional FLAGS in the here-document construct control the way leading
white space is handled.  If FLAGS is '-' (a dash), then all leading tab
characters are stripped from input lines and the line containing WORD.
Furthermore, if '-' is followed by a single space, all leading
whitespace is stripped from them.  This allows here-documents within
configuration scripts to be indented in a natural fashion.  Example:

     <<- TEXT
         <$f> has tried to send %count mails.
         Please see docs for more info.
     TEXT

   Here-documents are particularly useful with 'reject' actions (*note
reject::).
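
   As a sketch, a 'reject' action with a multiline reply text might be
written as follows.  The reply codes and the wording of the message are
illustrative:

     reject 550 5.7.1 <<EOT
     Mail from <$f> is not accepted here.
     Please contact the postmaster for details.
     EOT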


File: mailfromd.info,  Node: Sendmail Macros,  Next: Constants,  Prev: Here Documents,  Up: MFL

4.7 Sendmail Macros
===================

Sendmail macros are referenced exactly the same way they are in the
'sendmail.cf' configuration file, i.e. '$NAME', where NAME represents
the macro name.  Notice that the notation is the same for both
single-character and multi-character macro names.  For consistency with
the 'Sendmail' configuration the '${NAME}' notation is also accepted.

   Another way to reference Sendmail macros is by using the 'getmacro'
function (*note Macro access::).

   Sendmail macros evaluate to string values.

   Notice that to reference a macro, you must properly export it in
your MTA configuration.  An attempt to reference a macro that is not
exported will raise an 'e_macroundef' exception at run time (*note
uncaught exceptions::).
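
   One way to avoid the exception is to test for the macro before
referencing it.  The handler below is a minimal sketch; the macro name
'auth_authen' is only an example, and the test uses the 'macro_defined'
library function:

     prog envfrom
     do
       # Reference the macro only if the MTA has exported it
       if macro_defined("auth_authen")
         echo "authenticated as " . $auth_authen
       fi
     done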


File: mailfromd.info,  Node: Constants,  Next: Variables,  Prev: Sendmail Macros,  Up: MFL

4.8 Constants
=============

A "constant" is a symbolic name for an MFL value.  Constants are defined
using the 'const' statement:

     [QUALIFIER] const NAME EXPR

where NAME is an identifier, and EXPR is any valid MFL expression
evaluating immediately to a constant literal or numeric value.  Optional
QUALIFIER defines the scope of visibility for that constant (*note scope
of visibility::): either 'public' or 'static'.

   Once defined, any appearance of NAME in the program text is replaced
by its value.  For example:

     const x 10/5
     const text "X is "

defines the numeric constant 'x' with the value '2', and the literal
constant 'text' with the value 'X is '.

   A special construct is provided to define a series of numeric
constants (an "enumeration"):

     [QUALIFIER] const
     do
       NAME0 [EXPR0]
       NAME1 [EXPR1]
       ...
       NAMEN [EXPRN]
     done

Each EXPRN, if present, must evaluate to a constant numeric expression.
The resulting value will be assigned to constant NAMEN.  If EXPRN is not
supplied, the constant will be defined to the value of the previous
constant plus one.  If EXPR0 is not supplied, 0 is assumed.

   For example, consider the following statement

     const
     do
       A
       B
       C 10
       D
     done

This defines 'A' to 0, 'B' to 1, 'C' to 10 and 'D' to 11.

   As a matter of fact, EXPRN may also evaluate to a constant string
expression, provided that all expressions in the enumeration 'const'
statement are provided.  That is, the following is correct:

     const
     do
       A "one"
       B "two"
       C "three"
       D "four"
     done

whereas the following is not:

     const
     do
       A "one"
       B
       C "three"
       D "four"
     done

   Trying to compile the latter example will produce:

     mailfromd: FILENAME:5.3: initializer element is not numeric

which means that 'mailfromd' was trying to create constant 'B' with the
value of 'A' incremented by one, but was unable to do so, because the
value in question was not numeric.

   Constants can be used in normal MFL expressions as well as in
literals.  To expand a constant within a literal string, prepend a
percent sign to its name, e.g.:

     echo "New %text %x" => "New X is 2"

   This way of expanding constants creates an ambiguity if there happens
to be a variable of the same name as the constant.  *Note
variable--constant clashes::, for more information on this case and ways
to handle it.
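
   As a minimal sketch of both uses, the fragment below refers to a
constant in an expression and expands it within a literal.  The constant
name 'max_rcpt', its value and the reply text are hypothetical, chosen
for illustration only; 'rcpt_count' is a predefined variable (*note
Predefined variables::):

     # Hypothetical limit, for illustration only.
     const max_rcpt 5

     prog envrcpt
     do
       if rcpt_count > max_rcpt
         reject 550 5.5.3 "No more than %max_rcpt recipients accepted"
       fi
     done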

* Menu:

* Built-in constants::


File: mailfromd.info,  Node: Built-in constants,  Up: Constants

4.8.1 Built-in constants
------------------------

Several constants are built into the MFL compiler.  To discern them from
user-defined ones, their names start and end with two underscores
('__').

   The following constants are defined in 'mailfromd' version 8.8:

 -- Built-in constant: string __file__
     Expands to the name of the current source file.

 -- Built-in constant: string __function__
     Expands to the name of the current lexical context, i.e.  the
     function or handler name.

 -- Built-in constant: string __git__
     This built-in constant is defined for alpha versions only.  Its
     value is the Git tag of the recent commit corresponding to that
     version of the package.  If the release contains some uncommitted
     changes, the value of the '__git__' constant ends with the suffix
     '-dirty'.

 -- Built-in constant: number __line__
     Expands to the current line number in the input source file.

 -- Built-in constant: number __major__
     Expands to the major version number.

     The following example uses '__major__' constant to determine if
     some version-dependent feature can be used:

          if __major__ > 2
            # Use some version-specific feature
          fi

 -- Built-in constant: number __minor__
     Expands to the minor version number.

 -- Built-in constant: string __module__
     Expands to the name of the current module (*note Modules::).

 -- Built-in constant: string __package__
     Expands to the package name ('mailfromd').

 -- Built-in constant: number __patch__
     For alpha versions and maintenance releases expands to the version
     patch level.  For stable versions, expands to '0'.

 -- Built-in constant: string __defpreproc__
     Expands to the default external preprocessor command line, if the
     preprocessor is used, or to an empty string if it is not, e.g.:

          __defpreproc__ => "/usr/bin/m4 -s"

     *Note Preprocessor::, for information on preprocessor and its
     features.

 -- Built-in constant: string __preproc__
     Expands to the current external preprocessor command line, if the
     preprocessor is used, or to an empty string if it is not.  Notice
     that it equals '__defpreproc__', unless the preprocessor was
     redefined using the '--preprocessor' command line option (*note
     -preprocessor: Preprocessor.).

 -- Built-in constant: string __version__
     Expands to the textual representation of the program version (e.g.
     '3.0.90').

 -- Built-in constant: string __defstatedir__
     Expands to the default state directory (*note statedir::).

 -- Built-in constant: string __statedir__
     Expands to the current value of the program state directory (*note
     statedir::).  Notice that it is the same as '__defstatedir__'
     unless the state directory was redefined at run time.

   Built-in constants can be used as variables, which allows you to
expand them within strings or here-documents.  The following example
illustrates a common practice used for debugging configuration scripts:

     func foo(number x)
     do
       echo "%__file__:%__line__: foo called with arg %x"
       ...
     done

   If the function 'foo' were called in line 28 of the script file
'/etc/mailfromd.mf', like this: 'foo(10)', you would see the following
string in your logs:

     /etc/mailfromd.mf:28: foo called with arg 10


File: mailfromd.info,  Node: Variables,  Next: Back references,  Prev: Constants,  Up: MFL

4.9 Variables
=============

Variables represent regions of memory used to hold variable data.  These
memory regions are identified by "variable names".  A variable name must
begin with a letter or underscore and must consist of letters, digits
and underscores.

   Each variable is associated with its "scope of visibility", which
defines the part of source code where it can be used (*note scope of
visibility::).  Depending on the scope, we discern three main classes of
variables: public, static and automatic (or local).

   "Public variables" have indefinite lexical scope, so they may be
referred to anywhere in the program.  "Static variables" are visible
only within their module (*note Modules::).  "Automatic" or "local
variables" are visible only within the given function or handler.

   Public and static variables are sometimes collectively called
"global".

   These variable classes occupy separate "namespaces", so that an
automatic variable can have the same name as an existing public or
static one.  In this case this variable is said to "shadow" its global
counterpart.  All references to such a name will refer to the automatic
variable until the end of its scope is reached, where the global one
becomes visible again.
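
   The following sketch illustrates shadowing.  The names 'state' and
'show_state' are hypothetical.  Following the rule above, the reference
within the function should resolve to the automatic variable, while the
one within the handler should resolve to the global one:

     string state "global"

     func show_state()
     do
       # This automatic variable shadows the global 'state'.
       string state "local"
       echo "in function: %state"
     done

     prog envfrom
     do
       show_state()
       echo "in handler: %state"
     done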

   Likewise, a static variable may have the same name as a static
variable defined in another module.  However, it may not have the same
name as a public variable.

   A variable is "declared" using the following syntax:

     [QUALIFIERS] TYPE NAME

where NAME is the variable name, TYPE is the type of the data it is
supposed to hold.  It is 'string' for string variables and 'number' for
numeric ones.

   For example, this is a declaration of a string variable 'var':

     string var

   Optional QUALIFIERS are allowed only in global declarations, i.e.  in
the variable declarations that appear outside of functions.  They
specify the scope of the variable.  The 'public' qualifier declares the
variable as public and the 'static' qualifier declares it as static.
The default scope is 'public', unless specified otherwise in the module
declaration (*note module structure::).

   Additionally, QUALIFIERS may contain the word 'precious', which
instructs the compiler to mark this variable as "precious" (*note
precious variables: rset.).  The value of a precious variable is not
affected by the SMTP 'RSET' command.  If both a scope qualifier and
'precious' are used, they may appear in any order, e.g.:

     static precious string rcpt_list

or

     precious static string rcpt_list

   The declaration can be followed by any valid MFL expression, which
supplies the "initial value" for the variable, for example:

     string var "test"

   If a variable declaration occurs within a function (*note
User-defined: Functions.) or handler (*note Handlers::), it declares an
automatic variable, local to this function or handler.  Otherwise, it
declares a global variable.

   A variable is assigned a value using the 'set' statement:

     set NAME EXPR

where NAME is the variable name and EXPR is a 'mailfromd' expression
(*note Expressions::).  The effect of this statement is that the EXPR is
evaluated and the value it yields is assigned to the variable NAME.

   If the 'set' statement is located outside a function or handler
definition, the EXPR must be a constant expression, i.e.  the compiler
should be able to evaluate it immediately.  See optimizer.

   It is not an error to assign a value to a variable that is not
declared.  In this case the assignment first declares a global or
automatic variable having the type of EXPR and then assigns a value to
it.  An automatic variable is created if the assignment occurs within a
function or handler; a global variable is declared if it occurs at the
topmost lexical level.  This is called "implicit variable declaration".
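
   For instance, in the following sketch neither variable is declared
explicitly.  The names 'greeting' and 'arglen' are hypothetical; the
example assumes the 'length' library function, which returns the length
of its string argument:

     # Top level: implicitly declares a global string variable.
     set greeting "HELO argument length"

     prog helo
     do
       # Within a handler: implicitly declares an automatic
       # numeric variable.
       set arglen length($1)
       echo "%greeting: %arglen"
     done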

   Variables are referenced using the notation '%NAME'.  The variable
being referenced must have been declared earlier (either explicitly or
implicitly).

* Menu:

* Predefined variables::


File: mailfromd.info,  Node: Predefined variables,  Up: Variables

4.9.1 Predefined Variables
--------------------------

Several variables are predefined.  In 'mailfromd' version 8.8 these are:

 -- Predefined Variable: number cache_used
     This variable is set by 'stdpoll' and 'strictpoll' built-ins (and,
     consequently, by the 'on poll' statement).  Its value is '1' if the
     function used the cached data instead of directly polling the host,
     and '0' if the polling took place.  *Note SMTP Callout functions::.

     You can use this variable to make your reject message more
     informative for the remote party.  The common paradigm is to define
     a function, returning empty string if the result was obtained from
     polling, or some notice if cached data were used, and to use the
     function in the 'reject' text, for example:

          func cachestr() returns string
          do
            if cache_used
              return "[CACHED] "
            else
              return ""
            fi
          done

     Then, in 'prog envfrom' one can use:

          on poll $f
          do
          when not_found or failure:
            reject 550 5.1.0 cachestr() . "Sender validity not confirmed"
          done

 -- Predefined Variable: string clamav_virus_name
     Name of virus identified by 'ClamAV'.  Set by 'clamav' function
     (*note ClamAV::).

 -- Predefined Variable: number greylist_seconds_left
     Number of seconds left to the end of greylisting period.  Set by
     'greylist' and 'is_greylisted' functions (*note Special test
     functions::).

 -- Predefined Variable: string ehlo_domain
     Name of the domain used by polling functions in SMTP 'EHLO' or
     'HELO' command.  Default value is the fully qualified domain name
     of the host where 'mailfromd' is run.  *Note Polling::.

 -- Predefined Variable: string last_poll_greeting
     Callout functions (*note SMTP Callout functions::) set this
     variable before returning.  It contains the initial SMTP reply from
     the last polled host.

 -- Predefined Variable: string last_poll_helo
     Callout functions (*note SMTP Callout functions::) set this
     variable before returning.  It contains the reply to the 'HELO'
     ('EHLO') command, received from the last polled host.

 -- Predefined Variable: string last_poll_host
     Callout functions (*note SMTP Callout functions::) set this
     variable before returning.  It contains the host name or IP address
     of the last polled host.

 -- Predefined Variable: string last_poll_recv
     Callout functions (*note SMTP Callout functions::) set this
     variable before returning.  It contains the last SMTP reply
     received from the remote host.  In case of multi-line replies, only
     the first line is stored.  If nothing was received the variable
     contains the string 'nothing'.

 -- Predefined Variable: string last_poll_sent
     Callout functions (*note SMTP Callout functions::) set this
     variable before returning.  It contains the last SMTP command sent
     to the polled host.  If nothing was sent, 'last_poll_sent' contains
     the string 'nothing'.

 -- Predefined Variable: string mailfrom_address
     Email address used by polling functions in the SMTP 'MAIL FROM'
     command (*note Polling::).  Default is '<>'.  Here is an example of
     how to change it:

          set mailfrom_address "postmaster@my.domain.com"

     You can set this value to a comma-separated list of email
     addresses, in which case the probing will try each address until
     either the remote party accepts it or the list of addresses is
     exhausted, whichever happens first.

     It is not necessary to enclose emails in angle brackets, as they
     will be added automatically where appropriate.  The only exception
     is the null return address, when used in a list of addresses.  In
     this case, it should always be written as '<>'.  For example:

          set mailfrom_address "postmaster@my.domain.com, <>"

 -- Predefined Variable: number sa_code
     Spam score for the message, set by 'sa' function (*note sa::).

 -- Predefined Variable: number rcpt_count
     The variable 'rcpt_count' keeps the number of recipients given so
     far by 'RCPT TO' commands.  It is defined only in 'envrcpt'
     handlers.

 -- Predefined Variable: number sa_threshold
     Spam threshold, set by 'sa' function (*note sa::).

 -- Predefined Variable: string sa_keywords
     Spam keywords for the message, set by 'sa' function (*note sa::).

 -- Predefined Variable: number safedb_verbose
     This variable controls the verbosity of the exception-safe database
     functions.  *Note safedb_verbose::.


File: mailfromd.info,  Node: Back references,  Next: Handlers,  Prev: Variables,  Up: MFL

4.10 Back references
====================

A "back reference" is a sequence '\D', where D is a decimal number.  It
refers to the Dth parenthesized subexpression in the last 'matches'
statement(1).  Any back reference occurring within a double-quoted
string is replaced with the value of the corresponding subexpression.
For example:

     if $f matches '.*@\(.*\)\.gnu\.org\.ua'
       set host \1
     fi

   If the value of 'f' macro is 'smith@unza.gnu.org.ua', the above code
will assign the string 'unza' to the variable 'host'.

   Notice that each occurrence of 'matches' resets the table of back
references, so try to use them as early as possible.  The following
example illustrates a common error, when the back reference is used
after the reference table has been reused by another match:

     # Wrong!
     if $f matches '.*@\(.*\)\.gnu\.org\.ua'
       if $f matches 'some.*'
         set host \1
       fi
     fi

   This will produce the following run time error:

     mailfromd: RUNTIME ERROR near file.mf:3: Invalid back-reference number

because the inner match ('some.*') does not have any parenthesized
subexpressions.
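
   One way to avoid this error, sketched below under the assumption
that the variable 'host' is not used for anything else, is to save the
back reference in a variable before the next match is attempted:

     if $f matches '.*@\(.*\)\.gnu\.org\.ua'
       # Save the subexpression before the table is reset.
       set host \1
       if $f matches 'some.*'
         echo "host part is %host"
       fi
     fi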

   *Note Special comparisons::, for more information about 'matches'
operator.

   ---------- Footnotes ----------

   (1) The subexpressions are numbered by the positions of their opening
parentheses, left to right.


File: mailfromd.info,  Node: Handlers,  Next: begin/end,  Prev: Back references,  Up: MFL

4.11 Handlers
=============

"Milter stage handler" (or "handler", for short) is a subroutine
responsible for processing a particular milter state.  There are nine
handlers available.  Their order of invocation and arguments are
described in *note Figure 3.1: milter-control-flow.

   A handler is defined using the following construct:

     prog HANDLER-NAME
     do
       HANDLER-BODY
     done

where HANDLER-NAME is the name of the handler (*note handler names::),
HANDLER-BODY is the list of filter statements composing the handler
body.  Some handlers take arguments, which can be accessed within the
HANDLER-BODY using the notation $N, where N is the ordinal number of the
argument.  Here we describe the available handlers and their arguments:

 -- Handler: connect (string $1, number $2, number $3, string $4)
     Invocation:
          This handler is called once at the beginning of each SMTP
          connection.

     Arguments:
            1. 'string'; The host name of the message sender, as
               reported by MTA.  Usually it is determined by a reverse
               lookup on the host address.  If the reverse lookup fails,
               '$1' will contain the message sender's IP address
               enclosed in square brackets (e.g. '[127.0.0.1]').

            2. 'number'; Socket address family.  You need to require the
               'status' module to get symbolic definitions for the
               address families.  Supported families are:

               Constant           Value   Meaning
               ------------------------------------------------------------
               FAMILY_STDIO       0       Standard input/output (the MTA
                                          is run with '-bs' option)
               FAMILY_UNIX        1       UNIX socket
               FAMILY_INET        2       IPv4 protocol
               FAMILY_INET6       3       IPv6 protocol

               Table 4.3: Supported socket families

            3. 'number'; Port number if '$2' is 'FAMILY_INET'.

            4. 'string'; Remote IP address if '$2' is 'FAMILY_INET' or
               full file name of the socket if '$2' is 'FAMILY_UNIX'.
               If '$2' is 'FAMILY_STDIO', '$4' is an empty string.
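
     As an illustration, the following sketch logs the origin of IPv4
     connections.  It is not taken from the 'mailfromd' distribution;
     the message text is arbitrary:

          require status

          prog connect
          do
            if $2 = FAMILY_INET
              echo "connect from $1, address $4, port $3"
            fi
          done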

     The actions (*note Actions::) appearing in this handler are handled
     by Sendmail in a special way.  First of all, any textual message is
     ignored.  Secondly, the only action that immediately closes the
     connection is 'tempfail 421'.  Any other reply codes result in
     Sendmail switching to "nullserver" mode, where it accepts any
     commands, but answers with a failure to any of them, except for the
     following: 'QUIT', 'HELO', 'NOOP', which are processed as usual.

     The following table summarizes the Sendmail behavior depending on
     the action used:

     'tempfail 421 EXCODE MESSAGE'
          The caller is returned the following error message:

               421 4.7.0 HOSTNAME closing connection

          Both EXCODE and MESSAGE are ignored.

     'tempfail 4XX EXCODE MESSAGE'
          (where XX represents any digits, except '21').  Both EXCODE
          and MESSAGE are ignored.  Sendmail switches to nullserver
          mode.  Any subsequent command, excepting the ones listed
          above, is answered with

               454 4.3.0 Please try again later

     'reject 5XX EXCODE MESSAGE'
          (where XX represents any digits).  All arguments are ignored.
          Sendmail switches to nullserver mode.  Any subsequent command,
          excepting ones listed above, is answered with

               550 5.0.0 Command rejected

     Regarding reply codes, this behavior complies with RFC 2821
     (section 3.9), which states:

          An SMTP server _must not_ intentionally close the connection
          except:
          [...]
          - After detecting the need to shut down the SMTP service and
          returning a 421 response code.  This response code can be
          issued after the server receives any command or, if necessary,
          asynchronously from command receipt (on the assumption that
          the client will receive it after the next command is issued).

     However, the RFC says nothing about textual messages and extended
     error codes, therefore Sendmail's ignoring of these is, in my
     opinion, absurd.  My practice shows that it is often reasonable,
     and even necessary, to return a meaningful textual message if the
     initial connection is declined.  The opinion of 'mailfromd' users
     seems to support this view.  Bearing this in mind, 'mailfromd' is
     shipped with a patch for Sendmail, which makes it honor both
     extended return code and textual message given with the action.
     Two versions are provided: 'etc/sendmail-8.13.7.connect.diff', for
     Sendmail versions 8.13.x, and 'etc/sendmail-8.14.3.connect.diff',
     for Sendmail version 8.14.3.

 -- Handler: helo (string $1)
     Invocation:
          This handler is called whenever the SMTP client sends 'HELO'
          or 'EHLO' command.  Depending on the actual MTA configuration,
          it can be called several times or even not at all.

     Arguments:
            1. 'string'; Argument to 'HELO' ('EHLO') commands.

     Notes:
          According to RFC 2821, '$1' must be the domain name of the
          sending host, or, in case this is not available, its IP
          address enclosed in square brackets.  Be careful when making
          decisions based on this value, because in practice many hosts
          send arbitrary strings.  We recommend using the 'heloarg_test'
          function (*note heloarg_test::) if you wish to analyze this
          value.

 -- Handler: envfrom (string $1, string $2)
     Invocation:
          Called when the SMTP client sends 'MAIL FROM' command, i.e.
          once at the beginning of each message.

     Arguments:
            1. 'string'; First argument to the 'MAIL FROM' command, i.e.
               the email address of the sender.
            2. 'string'; Rest of arguments to 'MAIL FROM' separated by
               space character.  This argument can be '""'.

     Notes:
            1. '$1' is not the same as '$f' Sendmail variable, because
               the latter contains the sender email after address
               rewriting and normalization, while '$1' contains exactly
               the value given by sending party.

            2. When the array type is implemented, '$2' will contain an
               array of arguments.

 -- Handler: envrcpt (string $1, string $2)
     Invocation:
          Called once for each 'RCPT TO' command, i.e.  once for each
          recipient, immediately after 'envfrom'.
     Arguments:
            1. 'string'; First argument to the 'RCPT TO' command, i.e.
               the email address of the recipient.
            2. 'string'; Rest of arguments to 'RCPT TO' separated by
               space character.  This argument can be '""'.

     Notes:
          When the array type is implemented, '$2' will contain an array
          of arguments.

 -- Handler: data ()
     Invocation:
          Called after the MTA receives SMTP 'DATA' command.  Notice
          that this handler is not supported by Sendmail versions prior
          to 8.14.0 and Postfix versions prior to 2.5.
     Arguments:
          None

 -- Handler: header (string $1, string $2)
     Invocation:
          Called once for each header line received after SMTP 'DATA'
          command.
     Arguments:
            1. 'string'; Header field name.
            2. 'string'; Header field value.  The content of the header
               may include folded white space, i.e., multiple lines with
               following white space where lines are separated by LF
               (ASCII 10).  The trailing line terminator (CR/LF) is
               removed.

 -- Handler: eoh
     Invocation:
          This handler is called once per message, after all headers
          have been sent and processed.
     Arguments:
          None.

 -- Handler: body (pointer $1, number $2)
     Invocation:
          This handler is called zero or more times, for each piece of
          the message body obtained from the remote host.
     Arguments:
            1. 'pointer'; Piece of body text.  See 'Notes' below.
            2. 'number'; Length of data pointed to by '$1', in bytes.
     Notes:
          The first argument points to the body chunk.  Its size may be
          quite considerable and passing it as a string may be costly
          both in terms of memory and execution time.  For this reason
          it is not passed as a string, but rather as a "generic
          pointer", i.e.  an object having the same size as 'number',
          which can be used to retrieve the actual contents of the body
          chunk if the need arises.

          A special function 'body_string' is provided to convert this
          object to a regular MFL string (*note Mail body functions::).
          Using it you can collect the entire body text into a single
          global variable, as illustrated by the following example:

               string text

               prog body
               do
                 set text text . body_string($1,$2)
               done

   The text collected this way can then be used in the 'eom' handler
(see below) to parse and analyze it.

   If you wish to analyze both the headers and mail body, the following
code fragment will do that for you:

     string text

     # Collect all headers.
     prog header
     do
       set text text . $1 . ": " . $2 . "\n"
     done

     # Append terminating newline to the headers.
     prog eoh
     do
       set text "%text\n"
     done

     # Collect message body.
     prog body
     do
       set text text . body_string($1, $2)
     done

 -- Handler: eom
     Invocation:
          This handler is called once per message, when the terminating
          dot after 'DATA' command has been received.
     Arguments:
          None
     Notes:
          This handler is useful for calling "message capturing"
          functions, such as 'sa' or 'clamav'.  For more information
          about these, refer to *note Interfaces to Third-Party
          Programs::.

   For your reference, the following table shows each handler with its
arguments:

Handler        $1             $2             $3             $4
---------------------------------------------------------------------------
connect        Hostname       Socket         Port           Remote
                              Family                        address
helo           'HELO'         N/A            N/A            N/A
               domain
envfrom        Sender email   Rest of        N/A            N/A
               address        arguments
envrcpt        Recipient      Rest of        N/A            N/A
               email          arguments
               address
data           N/A            N/A            N/A            N/A
header         Header name    Header value   N/A            N/A
eoh            N/A            N/A            N/A            N/A
body           Body segment   Length of      N/A            N/A
               (pointer)      the segment
                              (numeric)
eom            N/A            N/A            N/A            N/A

Table 4.4: State Handler Arguments


File: mailfromd.info,  Node: begin/end,  Next: Functions,  Prev: Handlers,  Up: MFL

4.12 The 'begin' and 'end' special handlers
===========================================

Apart from the milter handlers described in the previous section, MFL
defines two special handlers, called 'begin' and 'end', which supply
startup and cleanup instructions for the filter program.

   The 'begin' special handler is executed once for each SMTP session,
after the connection has been established but before the first milter
handler has been called.  Similarly, the 'end' handler is executed
exactly once, after the connection has been closed.  Neither of them
takes any arguments.

   The two handlers are defined using the following syntax:

     # Begin handler
     begin
     do
       ...
     done

     # End handler
     end
     do
       ...
     done

where '...' represent any MFL statements.

   An MFL program may have multiple 'begin' and 'end' definitions.  They
can be intermixed with other definitions.  The compiler combines all
'begin' statements into a single one, in the order they appear in the
sources.  Similarly, all 'end' blocks are concatenated together.  The
resulting 'begin' is called once, at the beginning of each SMTP session,
and 'end' is called once at its termination.

   Multiple 'begin' and 'end' handlers are a useful feature for writing
modules (*note Modules::), because each module can thus have its own
initialization and cleanup blocks.  Notice, however, that in this case
the order in which subsequent 'begin' and 'end' blocks are executed is
not defined.  It is only guaranteed that all 'begin' blocks are executed
at startup and all 'end' blocks are executed at shutdown.  It is also
guaranteed that all 'begin' and 'end' blocks defined within a
compilation unit (i.e.  a single abstract source file, with all
'#include' and '#include_once' statements expanded in place) are
executed in order of their appearance in the unit.
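
   The sketch below shows a single source file with two 'begin' blocks
and one 'end' block; the logged strings are arbitrary.  Since both
'begin' blocks belong to the same compilation unit, they are expected to
run in the order of their appearance:

     begin
     do
       echo "first initialization block"
     done

     end
     do
       echo "cleanup block"
     done

     begin
     do
       echo "second initialization block"
     done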

   Due to their special nature, the startup and cleanup blocks impose
certain restrictions on the statements that can be used within them:

  1. 'return' cannot be used in 'begin' and 'end' handlers.

  2. The following Sendmail actions cannot be used in them: 'accept',
     'continue', 'discard', 'reject', 'tempfail'.  They can, however, be
     used in 'catch' statements, declared in 'begin' blocks (see example
     below).

  3. Header manipulation actions (*note header manipulation::) cannot be
     used in 'end' handler.

   The 'begin' handlers are the usual place to put global initialization
code.  For example, if you do not want to use DNS caching, you can
disable it this way:

     begin
     do
       db_set_active("dns", 0)
     done

   Additionally, you can set up global exception handling routines
there.  For example, the following 'begin' statement disables DNS cache
and, for all exceptions not handled otherwise, installs a handler that
logs the exception along with the stack trace and continues processing
the message:

     begin
     do
       db_set_active("dns", 0)
       catch *
       do
         echo "Caught exception $1: $2"
         stack_trace()
         continue
       done
     done


File: mailfromd.info,  Node: Functions,  Next: Expressions,  Prev: begin/end,  Up: MFL

4.13 Functions
==============

A "function" is a named 'mailfromd' subroutine, which takes zero or more
"parameters" and optionally returns a certain value.  Depending on the
return value, functions can be subdivided into "string functions" and
"number functions".  A function may have "mandatory" and "optional
parameters".  When invoked, the function must be supplied at least as
many "actual arguments" as the number of its mandatory parameters.

   Functions are invoked using the following syntax:

       NAME (ARGS)

where NAME is the function name and ARGS is a comma-separated list of
expressions.  For example, the following are valid function calls:

       foo(10)
       interval("1 hour")
       greylist("/var/my.db", 180)

   The number of parameters a function takes and their data types
compose the "function signature".  When actual arguments are passed to
the function, they are converted to types of the corresponding formal
parameters.

   There are two major groups of functions: "built-in" functions, that
are implemented in the 'mailfromd' binary, and "user-defined" functions,
that are written in MFL.  The invocation syntax is the same for both
groups.

   'Mailfromd' is shipped with a rich set of "library functions".  These
are described in *note Library::.  In addition to these you can define
your own functions.

   Function definitions can appear anywhere between the handler
declarations in a filter program, the only requirement being that the
function definition occur before the place where the function is
invoked.

   The syntax of a function definition is:

     [QUALIFIER] func NAME (PARAM-DECL) returns DATA-TYPE
     do
       FUNCTION-BODY
     done

where NAME is the name of the function to define, PARAM-DECL is a
comma-separated list of parameter declarations.  The syntax of the
latter is the same as that of variable declarations (*note Variable
declarations: Variables.), i.e.:

     TYPE NAME

declares the parameter NAME having the type TYPE.  The TYPE is 'string'
or 'number'.

   Optional QUALIFIER declares the scope of visibility for that function
(*note scope of visibility::).  It is similar to that of variables,
except that functions cannot be local (i.e.  you cannot declare a
function within another function).

   The 'public' qualifier declares a function that may be referred to
from any module, whereas the 'static' qualifier declares a function that
may be called only from the current module (*note Modules::).  The
default scope is 'public', unless specified otherwise in the module
declaration (*note module structure::).

   For example, the following declares a function 'sum', that takes two
numeric arguments and returns a numeric value:

     func sum(number x, number y) returns number

   Similarly, the following is a declaration of a static function:

     static func sum(number x, number y) returns number

   Parameters are referenced in the FUNCTION-BODY by their name, the
same way as other variables.  Similarly, the value of a parameter can be
altered using the 'set' statement.
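
   For example, a complete definition of the function 'sum' declared
above could look as follows.  The second function, 'clamp', is a
hypothetical illustration of altering a parameter with 'set':

     func sum(number x, number y) returns number
     do
       return x + y
     done

     func clamp(number x, number limit) returns number
     do
       if x > limit
         set x limit
       fi
       return x
     done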

   A function can be declared to take a certain number of "optional
arguments".  In a function declaration, optional abstract arguments must
be placed after the mandatory ones, and must be separated from them with
a semicolon.  The following example is a definition of function 'foo',
which takes two mandatory and two optional arguments:

     func foo(string msg, string email; number x, string pfx)

Mandatory parameters are: 'msg' and 'email'.  Optional parameters are:
'x' and 'pfx'.  The actual number of arguments supplied to the function
is returned by a special construct '$#'.  In addition, the special
construct '@ARG' evaluates to the ordinal number of variable ARG in the
list of formal parameters (the first argument has number '0').  These
two constructs can be used to verify whether an argument is supplied to
the function.

   When an actual argument for parameter 'n' is supplied, the number of
actual arguments ('$#') is greater than the ordinal number of that
parameter in the declaration list ('@N').  Thus, the following construct
can be used to check if an optional argument ARG is actually supplied:

     func foo(string msg, string email; number x, string arg)
     do
       if $# > @arg
         ...
       fi

   The default 'mailfromd' installation provides a special macro for
this purpose: *note defined::.  Using it, the example above could be
rewritten as:

     func foo(string msg, string email; number x, string arg)
     do
       if defined(arg)
         ...
       fi

   Within a function body, optional arguments are referenced exactly the
same way as the mandatory ones.  An attempt to dereference an optional
argument for which no actual parameter was supplied results in an
undefined value, so be sure to check whether a parameter is passed
before dereferencing it.
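
   A minimal sketch of such a check, using a hypothetical function
'tagged' invented for this example.  It returns MSG prefixed with PFX
when the optional argument is supplied, and MSG alone otherwise:

     func tagged(string msg; string pfx) returns string
     do
       # @pfx is 1 (parameters are numbered from 0), so the
       # condition holds only when two arguments were passed.
       if $# > @pfx
         return pfx . ": " . msg
       fi
       return msg
     done

With this definition, 'tagged("hello")' is expected to return 'hello'
and 'tagged("hello", "note")' to return 'note: hello'.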

   A function can also take a variable number of arguments (such
functions are called "variadic").  This is indicated by the use of an
ellipsis as the last abstract parameter.  The statement below defines a
function 'foo' taking one mandatory, one optional and any number of
additional arguments:

     func foo (string a ; string b, ...)

   All actual arguments passed in a list of variable arguments are
coerced to string data type.  To refer to these arguments in the
function body, the following construct is used:

     $(EXPR)

where EXPR is any valid MFL expression, evaluating to a number N.  This
construct refers to the value of the Nth actual parameter from the
variable argument list.  Parameters are numbered from '1', so the first
variable parameter is '$(1)', and the last one is '$($# - NM - NO)',
where NM and NO are the numbers of mandatory and optional parameters to
the function.

   For example, the function below prints all its arguments:

     func pargs (string text, ...)
     do
       echo "text=%text"
       loop for number i 1,
            while i <= $# - 1,
            set i i + 1
       do
         echo "arg %i=" . $(i)
       done
     done

Note the loop limits.  The last variable argument has number '$# - 1',
because the function takes one mandatory argument.
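
   The same loop can be used to build a value from the variable
arguments.  The following sketch defines a hypothetical function
'join_args' (not part of the library) that concatenates its variable
arguments, separating them with SEP:

     func join_args (string sep, ...) returns string
     do
       string result ""
       loop for number i 1,
            while i <= $# - 1,
            set i i + 1
       do
         # Insert the separator before every argument but the first.
         if i > 1
           set result result . sep
         fi
         set result result . $(i)
       done
       return result
     done

Under these assumptions, 'join_args(",", "a", "b", "c")' should yield
'a,b,c'.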

   The FUNCTION-BODY is any list of valid 'mailfromd' statements.  In
addition to the statements discussed below (*note Statements::) it can
also contain the 'return' statement, which is used to return a value
from the function.  The syntax of the return statement is

       return VALUE

   As an example of this, consider the following code snippet that
defines the function 'sum' to return a sum of its two arguments:

     func sum(number x, number y) returns number
     do
             return x + y
     done

   The 'returns' part in the function declaration is optional.  A
declaration lacking it defines a "procedure", or "void function", i.e.
a function that is not supposed to return any value.  Such functions
cannot be used in expressions, instead they are used as statements
(*note Statements::).  The following example shows a function that emits
a customized temporary failure notice:

     func stdtf()
     do
       tempfail 451 4.3.5 "Try again later"
     done
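
   A procedure is invoked by writing the call as a statement of its
own.  The sketch below assumes that the definition of 'stdtf' shown
above appears earlier in the same file; the choice of the 'envfrom'
handler and the test on '$f' are illustrative only:

     prog envfrom
     do
       if $f = ""
         # Called as a statement, not as part of an expression.
         stdtf()
       fi
     done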

   A function may have several names.  An alternative name (or "alias")
can be assigned to a function by using the 'alias' keyword, placed
after the PARAM-DECL part, for example:

     func foo()
     alias bar
     returns string
     do
       ...
     done

   After this declaration, both 'foo()' and 'bar()' will refer to the
same function.

   The number of function aliases is unlimited.  The following fragment
declares a function having three names:

     func foo()
     alias bar
     alias baz
     returns string
     do
       ...
     done

   Although rarely needed, this feature is occasionally necessary.

   A variable declared within a function becomes a local variable to
this function.  Its lexical scope ends with the terminating 'done'
statement.

   Parameters, local variables and global variables use separate
namespaces, so a parameter name can coincide with the name of a global,
in which case a parameter is said to "shadow" the global.  All
references to its name will refer to the parameter, until the end of its
scope is reached, where the global one becomes visible again.  Consider
the following example:

     string x

     func foo(string x)
     do
       echo "foo: %x"
     done

     prog envfrom
     do
       set x "Global"
       foo("Local")
       echo x
     done

Running 'mailfromd --test' with this configuration will display:

     foo: Local
     Global

* Menu:

* Some Useful Functions::


File: mailfromd.info,  Node: Some Useful Functions,  Up: Functions

4.13.1 Some Useful Functions
----------------------------

To illustrate the concept of user-defined functions, this subsection
shows the definitions of some of the library functions shipped with
'mailfromd'(1).  These functions are contained in modules installed
along with the 'mailfromd' binary.  To use any of them in your code,
require the appropriate module as described in *note import::, e.g.  to
use the 'revip' function, do 'require 'revip''.

   Functions and their definitions:

  1. 'revip'

     The function 'revip' (*note revip::) is implemented as follows:

          func revip(string ip) returns string
          do
            return inet_ntoa(ntohl(inet_aton(ip)))
          done

     Previously it was implemented using regular expressions.  Below we
     include this variant as well, as an illustration for the use of
     regular expressions:

          #pragma regex push +extended
          func revip(string ip) returns string
          do
            if ip matches '([0-9]+)\.([0-9]+)\.([0-9]+)\.([0-9]+)'
              return "\4.\3.\2.\1"
            fi
            return ip
          done
          #pragma regex pop

  2. 'strip_domain_part'

     This function returns at most N last components of the domain name
     DOMAIN (*note strip_domain_part::).

          #pragma regex push +extended

          func strip_domain_part(string domain, number n) returns string
          do
            if n > 0 and
              domain matches '.*((\.[^.]+){' . $2 . '})'
              return substring(\1, 1, -1)
            else
              return domain
            fi
          done
          #pragma regex pop

  3. 'valid_domain'

     *Note valid_domain::, for a description of this function.  Its
     definition follows:

          require dns

          func valid_domain(string domain) returns number
          do
            return not (resolve(domain) = "0" and not hasmx(domain))
          done

  4. 'match_dnsbl'

     The function 'match_dnsbl' (*note match_dnsbl::) is defined as
     follows:

          require dns
          require match_cidr
          #pragma regex push +extended

          func match_dnsbl(string address, string zone, string range)
              returns number
          do
            string rbl_ip
            if range = 'ANY'
              set rbl_ip '127.0.0.0/8'
            else
              set rbl_ip range
              if not range matches '^([0-9]{1,3}\.){3}[0-9]{1,3}$'
                return 0
              fi
            fi

            if not (address matches '^([0-9]{1,3}\.){3}[0-9]{1,3}$'
                    and address != range)
              return 0
            fi

            if address matches
                  '^([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})$'
              if match_cidr (resolve ("\4.\3.\2.\1", zone), rbl_ip)
                return 1
              else
                return 0
              fi
            fi
            # never reached
          done

   ---------- Footnotes ----------

   (1) Notice that these are intended for educational purposes and do
not necessarily coincide with the actual definitions of these functions
in Mailfromd version 8.8.


File: mailfromd.info,  Node: Expressions,  Next: Shadowing,  Prev: Functions,  Up: MFL

4.14 Expressions
================

Expressions are language constructs that evaluate to a value, which can
subsequently be echoed, tested in a conditional statement, assigned to a
variable or passed to a function.

* Menu:

* Constant expressions::      String and Numeric Constants.
* Function calls::            A Function Call is an Expression.
* Concatenation::             String Concatenation.
* Arithmetic operations::     '+', '-', etc.
* Bitwise shifts::            '<<' and '>>'.
* Relational expressions::    '=', '<', etc.
* Special comparisons::       'matches', 'mx matches', etc.
* Boolean expressions::       'and', 'or', 'not'.
* Precedence::                How various operators nest.
* Type casting::


File: mailfromd.info,  Node: Constant expressions,  Next: Function calls,  Up: Expressions

4.14.1 Constant Expressions
---------------------------

Literals and numbers are "constant expressions".  They evaluate to
string and numeric types, respectively.


File: mailfromd.info,  Node: Function calls,  Next: Concatenation,  Prev: Constant expressions,  Up: Expressions

4.14.2 Function Calls
---------------------

A function call is an expression.  Its type is the return type of the
function.


File: mailfromd.info,  Node: Concatenation,  Next: Arithmetic operations,  Prev: Function calls,  Up: Expressions

4.14.3 Concatenation
--------------------

The concatenation operator is '.' (a dot).  For example, if '$f' is
'smith' and '$client_addr' is '10.10.1.1', then:

     $f . "-" . $client_addr => "smith-10.10.1.1"

   Any two adjacent literal strings are concatenated, producing a new
string, e.g.

     "GNU's" " not " "UNIX" => "GNU's not UNIX"


File: mailfromd.info,  Node: Arithmetic operations,  Next: Bitwise shifts,  Prev: Concatenation,  Up: Expressions

4.14.4 Arithmetic Operations
----------------------------

The filter script language offers the common arithmetic operators: '+',
'-', '*' and '/'.  In addition, the '%' is a "modulo" operator, i.e.  it
computes the remainder of division of its operands.

   All of them follow usual precedence rules and work as you would
expect them to.
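
   A few worked examples, written in the '=>' notation used for the
other operators:

     2 + 3 * 4   => 14
     (2 + 3) * 4 => 20
     17 % 5      => 2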


File: mailfromd.info,  Node: Bitwise shifts,  Next: Relational expressions,  Prev: Arithmetic operations,  Up: Expressions

4.14.5 Bitwise shifts
---------------------

The '<<' represents a "bitwise shift left" operation, which shifts the
binary representation of the operand on its left by the number of bits
given by the operand on its right.

   Similarly, the '>>' represents a "bitwise shift right".
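
   For example, shifting left by N bits multiplies a non-negative
number by 2 to the power N, provided that no bits are shifted out:

     1 << 4   => 16
     5 << 1   => 10
     256 >> 2 => 64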


File: mailfromd.info,  Node: Relational expressions,  Next: Special comparisons,  Prev: Bitwise shifts,  Up: Expressions

4.14.6 Relational Expressions
-----------------------------

Relational expressions are:

Expression         Result
--------------------------------------------------------------------------
X '<' Y            True if X is less than Y.
X '<=' Y           True if X is less than or equal to Y.
X '>' Y            True if X is greater than Y.
X '>=' Y           True if X is greater than or equal to Y.
X '=' Y            True if X is equal to Y.
X '!=' Y           True if X is not equal to Y.

Table 4.5: Relational Expressions

   The relational expressions apply to string as well as to numbers.
When a relational operation applies to strings, case-sensitive
comparison is used, e.g.:

     "String" = "string" => False
     "String" < "string" => True


File: mailfromd.info,  Node: Special comparisons,  Next: Boolean expressions,  Prev: Relational expressions,  Up: Expressions

4.14.7 Special Comparisons
--------------------------

In addition to the traditional relational operators, described above,
'mailfromd' provides two operators for regular expression matching:

Expression         Result
--------------------------------------------------------------------------
X 'matches' Y      True if the string X matches the regexp denoted by
                   Y.
X 'fnmatches' Y    True if the string X matches the globbing pattern
                   denoted by Y.

Table 4.6: Regular Expression Matching

   The type of the regular expression used by 'matches' operator is
controlled by '#pragma regex' (*note pragma regex::).  For example:

     $f => "gray@gnu.org.ua"
     $f matches '.*@gnu\.org\.ua' => true
     $f matches '.*@GNU\.ORG\.UA' => false
     #pragma regex +icase
     $f matches '.*@GNU\.ORG\.UA' => true

   The 'fnmatches' operator compares its left-hand operand with a
globbing pattern (see 'glob(7)') given as its right-hand side operand.
For example:

     $f => "gray@gnu.org.ua"
     $f fnmatches "*ua" => true
     $f fnmatches "*org" => false
     $f fnmatches "*org*" => true

   Both operators have a special form, for "'MX' pattern matching".  The
expression:

       X mx matches Y

is evaluated as follows: first, the expression X is analyzed and, if it
is an email address, its domain part is selected.  If it is not, its
value is used verbatim.  Then the list of 'MX's for this domain is
looked up.  Each of the 'MX' names is then compared with the regular
expression Y.  If any of the names matches, the expression returns true.
Otherwise, its result is false.

   Similarly, the expression:

       X mx fnmatches Y

returns true only if any of the 'MX's for (domain or email) X match the
globbing pattern Y.
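
   As a sketch of the 'MX' forms, the following handler rejects mail
whose sender domain is served by a mail exchanger under the
hypothetical domain 'example.net' (the domain, the reply codes and the
message text are chosen for illustration only):

     prog envfrom
     do
       # True if any MX of the sender's domain matches the pattern.
       if $f mx fnmatches "*.example.net"
         reject 550 5.7.1 "Mail from this relay is not accepted"
       fi
     done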

   Both 'mx matches' and 'mx fnmatches' can signal the following
exceptions: 'e_temp_failure', 'e_failure'.

   The value of any parenthesized subexpression occurring within the
right-hand side argument to 'matches' or 'mx matches' can be referenced
using the notation '\D', where D is the ordinal number of the
subexpression (subexpressions are numbered from left to right, starting
at 1).  This notation is allowed in the program text as well as within
double-quoted strings and here-documents, for example:

     if $f matches '.*@\(.*\)\.gnu\.org\.ua'
       set message "Your host name is \1;"
     fi

   Remember that the grouping symbols are '\(' and '\)' for basic
regular expressions, and '(' and ')' for extended regular expressions.
Also make sure you properly escape all special characters (backslashes
in particular) in double-quoted strings, or use single-quoted strings to
avoid having to do so (*note singe-vs-double::, for a comparison of the
two forms).


File: mailfromd.info,  Node: Boolean expressions,  Next: Precedence,  Prev: Special comparisons,  Up: Expressions

4.14.8 Boolean Expressions
--------------------------

A "boolean expression" is a combination of relational or matching
expressions using the boolean operators 'and', 'or' and 'not', and,
optionally, parentheses to control nesting:

Expression         Result
--------------------------------------------------------------------------
X 'and' Y          True only if both X and Y are true.
X 'or' Y           True if any of X or Y is true.
'not' X            True if X is false.

Table 4.7: Boolean Operators

   Binary boolean expressions are computed using "shortcut evaluation":

'X and Y'
     If 'X => false', the result is 'false' and Y is not evaluated.

'X or Y'
     If 'X => true', the result is 'true' and Y is not evaluated.
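
   Shortcut evaluation makes it possible to guard an expensive or
failure-prone test with a cheap one.  In the following sketch (the
domain is hypothetical), no 'MX' lookup is attempted when the sender
address is empty, because the left operand of 'and' is already false:

     if $f != "" and $f mx fnmatches "*.example.net"
       reject
     fi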


File: mailfromd.info,  Node: Precedence,  Next: Type casting,  Prev: Boolean expressions,  Up: Expressions

4.14.9 Operator Precedence
--------------------------

Operator "precedence" is an abstract value associated with each language
operator, that determines the order in which operators are executed when
they appear together within a single expression.  Operators with higher
precedence are executed first.  For example, '*' has a higher precedence
than '+', therefore the expression 'a + b * c' is evaluated in the
following order: first 'b' is multiplied by 'c', then 'a' is added to
the product.

   When operators of equal precedence are used together they are
evaluated from left to right (i.e., they are "left-associative"), except
for comparison operators, which are non-associative (these are
explicitly marked as such in the table below).  This means that you
cannot write:

     if 5 <= x <= 10

Instead, you should write:

     if 5 <= x and x <= 10

   The precedences of the 'mailfromd' operators were selected so as to
match those used in most programming languages.(1)

   The following table lists all operators in order of decreasing
precedence:

'(...)'
     Grouping

'$ %'
     'Sendmail' macros and 'mailfromd' variables

'* /'
     Multiplication, division

'+ -'
     Addition, subtraction

'<< >>'
     Bitwise shift left and right

'< <= >= >'
     Relational operators (non-associative)

'= != matches fnmatches'
     Equality and special comparison (non-associative)

'&'
     Logical (bitwise) AND

'^'
     Logical (bitwise) XOR

'|'
     Logical (bitwise) OR

'not'
     Boolean negation

'and'
     Logical 'and'

'or'
     Logical 'or'

'.'
     String concatenation

   ---------- Footnotes ----------

   (1) The only exception is 'not', whose precedence in MFL is much
lower than usual (in most programming languages it has the same
precedence as unary '-').  This allows conditional expressions to be
written in a more understandable manner.  Consider the following
condition:

     if not x < 2 and y = 3

   It is understood as "if 'x' is not less than 2 and 'y' equals 3",
whereas with the usual precedence for 'not' it would have meant "if
negated 'x' is less than 2 and 'y' equals 3".


File: mailfromd.info,  Node: Type casting,  Prev: Precedence,  Up: Expressions

4.14.10 Type Casting
--------------------

When the two operands of a binary expression have different types, the
'mailfromd' evaluator coerces them to a common type.  This is
known as "implicit type casting".  The rules for implicit type casting
are:

  1. Both arguments to an arithmetical operation are cast to numeric
     type.

  2. Both arguments to the concatenation operation are cast to string.

  3. Both arguments to the 'matches' or 'fnmatches' operator are cast to
     string.

  4. The argument of the unary negation (arithmetical or boolean) is
     cast to numeric.

  5. Otherwise the right-hand side argument is cast to the type of the
     left-hand side argument.

   The construct for explicit type cast is:

     TYPE(EXPR)

where TYPE is the name of the type to coerce EXPR to.  For example:

     string(2 + 4*8) => "34"
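
   The implicit rules can be illustrated in the same notation.  The
results below follow from rules 1, 2 and 5 above and are given as a
sketch rather than as verified output:

     "4" + 2         => 6          (rule 1)
     "id-" . 10      => "id-10"    (rule 2)
     "10" = 10       => True       (rule 5)
     number("7") * 6 => 42         (explicit cast)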


File: mailfromd.info,  Node: Shadowing,  Next: Statements,  Prev: Expressions,  Up: MFL

4.15 Variable and Constant Shadowing
====================================

When any two named entities happen to have the same name we say that a
"name clash" occurs.  The handling of name clashes depends on types of
the entities involved in it.

function - any
--------------

The name of a constant or variable can coincide with that of a function.
This does not produce any warnings or errors, because functions,
variables and constants use different namespaces.  For example, the
following code is correct:

     const a 4

     func a()
     do
       echo a
     done

   When executed, it prints '4'.

function - function, handler - function, and function - handler
---------------------------------------------------------------

Redefinition of a function or using a predefined handler name (*note
Handlers::) as a function name results in a fatal error.  For example,
compiling this code:

     func a()
     do
       echo "1"
     done

     func a()
     do
       echo "2"
     done

causes the following error message:

     mailfromd: sample.mf:9: syntax error, unexpected
     FUNCTION_PROC, expecting IDENTIFIER

handler - variable
------------------

A variable name can coincide with a handler name.  For example, the
following code is perfectly OK:

     string envfrom "M"
     prog envfrom
     do
             echo envfrom
     done

handler - handler
-----------------

If two handlers with the same name are defined, the definition that
appears further in the source text replaces the previous one.  A warning
message is issued, indicating locations of both definitions, e.g.:

     mailfromd: sample.mf:116: Warning: Redefinition of handler
     `envfrom'
     mailfromd: sample.mf:34: Warning: This is the location of the
     previous definition

variable - variable
-------------------

Defining a variable having the same name as an already defined one
results in a warning message being displayed.  The compilation succeeds.
The second variable "shadows" the first, that is any subsequent
references to the variable name will refer to the second variable.  For
example:

     string x "Text"
     number x 1

     prog envfrom
     do
       echo x
     done

   Compiling this code results in the following diagnostics:

     mailfromd: sample.mf:4: Redeclaring `x' as different data type
     mailfromd: sample.mf:2: This is the location of the previous
     definition

   Executing it prints '1', i.e.  the value of the last definition of
'x'.

   The scope of the shadowing depends on storage classes of the two
variables.  If both of them have external storage class (i.e.  are
global ones), the shadowing remains in effect until the end of input.
In other words, the previous definition of the variable is effectively
forgotten.

   If the previous definition is a global, and the shadowing definition
is an automatic variable or a function parameter, the scope of this
shadowing ends with the scope of the second variable, after which the
previous definition (global) becomes visible again.  Consider the
following code:

     set x "initial"

     func foo(string x) returns string
     do
       return x
     done

     prog envfrom
     do
       echo foo("param")
       echo x
     done

   Its compilation produces the following warning:

     mailfromd: sample.mf:3: Warning: Parameter `x' is shadowing a global

   When executed, it produces the following output:

     param
     initial
     State envfrom: continue

variable - constant
-------------------

If a constant is defined which has the same name as a previously defined
variable (the constant "shadows" the variable), the compiler prints the
following diagnostic message:

     FILE:LINE: Warning: Constant name `NAME' clashes with a variable name
     FILE:LINE: Warning: This is the location of the previous definition

   A similar diagnostic is issued if a variable is defined whose name
coincides with a previously defined constant (the variable shadows the
constant).

   In any case, any subsequent notation %NAME refers to the last defined
symbol, be it variable or constant.

   Notice that shadowing occurs only when using the %NAME notation.
Referring to the constant by its name without '%' avoids the shadowing
effects.

   If a variable shadows a constant, the scope of the shadowing depends
on the storage class of the variable.  For automatic variables and
function parameters, it ends with the final 'done' closing the function.
For global variables, it lasts up to the end of input.

   For example, consider the following code:

     const a 4

     func foo(string a)
     do
       echo a
     done

     prog envfrom
     do
       foo(10)
       echo a
     done

   When run, it produces the following output:

     $ mailfromd --test sample.mf
     mailfromd: sample.mf:3: Warning: Variable name `a' clashes with a
     constant name
     mailfromd: sample.mf:1: Warning: This is the location of the previous
     definition
     10
     4
     State envfrom: continue

constant - constant
-------------------

Redefining a constant produces a warning message.  The latter definition
shadows the former.  Shadowing remains in effect until the end of input.


File: mailfromd.info,  Node: Statements,  Next: Conditionals,  Prev: Shadowing,  Up: MFL

4.16 Statements
===============

Statements are language constructs that, unlike expressions, do not
return any value.  Statements execute some actions, such as assigning a
value to a variable, or serve to control the execution flow in the
program.

* Menu:

* Actions::                     Actions control the handling of the mail.
* Assignments::
* Pass::
* Echo::


File: mailfromd.info,  Node: Actions,  Next: Assignments,  Up: Statements

4.16.1 Action Statements
------------------------

An "action" statement instructs 'mailfromd' to perform a certain action
over the message being processed.  There are two kinds of actions:
reply actions and header manipulation actions.

Reply Actions
.............

Reply actions tell 'Sendmail' to return given response code to the
remote party.  There are five such actions:

'accept'
     Return an 'accept' reply.  The remote party will continue
     transmitting its message.

'reject CODE EXCODE MESSAGE-EXPR'
'reject (CODE-EXPR, EXCODE-EXPR, MESSAGE-EXPR)'
     Return a 'reject' reply.  The remote party will have to cancel
     transmitting its message.  The three arguments are optional, their
     usage is described below.

'tempfail CODE EXCODE MESSAGE-EXPR'
'tempfail (CODE-EXPR, EXCODE-EXPR, MESSAGE-EXPR)'
     Return a 'temporary failure' reply.  The remote party can retry
     sending its message later.  The three arguments are optional, their
     usage is described below.

'discard'
     Instructs 'Sendmail' to accept the message and silently discard it
     without delivering it to any recipient.

'continue'
     Stops the current handler and instructs 'Sendmail' to continue
     processing of the message.

   Two actions, 'reject' and 'tempfail', can take up to three optional
parameters.  There are two forms of supplying these parameters.

   In the first form, called "literal" or "traditional" notation, the
arguments are supplied as additional words after the action name, and
are separated by whitespace.  The first argument is a three-digit RFC
2821 reply code.  It must begin with '5' for 'reject' and with '4' for
'tempfail'.  If two arguments are supplied, the second argument must be
either an "extended reply code" (RFC 1893/2034) or a textual string to
be returned along with the SMTP reply.  Finally, if all three arguments
are supplied, then the second one must be an extended reply code and the
third one must give the textual string.  The following examples
illustrate the possible ways of using the 'reject' statement:

     reject
     reject 503
     reject 503 5.0.0
     reject 503 "Need HELO command"
     reject 503 5.0.0 "Need HELO command"

   The term "textual string", as used above, means either a literal
string or an MFL expression that evaluates to string.  However, both
code and extended code must always be literal.

   The second form of supplying arguments is called "functional"
notation, because it resembles the function syntax.  When used in this
form, the action word is followed by a parenthesized group of exactly
three arguments, separated by commas.  Each argument is an MFL
expression.  The meaning and ordering of the arguments is the same as in
literal form.  Any or all of these three arguments may be absent, in
which case it will be replaced by the default value.  To illustrate
this, here are the statements from the previous example, written in
functional notation:

     reject(,,)
     reject(503,,)
     reject(503, 5.0.0,)
     reject(503, , "Need HELO command")
     reject(503, 5.0.0, "Need HELO command")

   Notice that there is an important difference between the two
notations.  The functional notation allows both reply codes to be
computed at run time, e.g.:

       reject(500 + dig2*10 + dig3, "5.%edig2.%edig3",)
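
   In the same way, the message text can be built at run time.  The
following sketch uses illustrative codes and wording and supplies all
three arguments:

       reject(550, "5.1.0", "Sender " . $f . " is not accepted")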

Header Actions
..............

Header manipulation actions provide basic means to add, delete or modify
the message RFC 2822 headers.

'add NAME STRING'
     Add the header NAME with the value STRING.  E.g.:

          add "X-Seen-By" "Mailfromd 8.8"

     (notice argument quoting)

'replace NAME STRING'
     The same as 'add', but if the header NAME already exists, it will
     be removed first, for example:

          replace "X-Last-Processor" "Mailfromd 8.8"

'delete NAME'
     Delete the header named NAME:

          delete "X-Envelope-Date"

   These actions impose some restrictions.  First of all, their first
argument must be a literal string (not a variable or expression).
Secondly, there is no way to select a particular header instance to
delete or replace, which may be necessary to properly handle multiple
headers (e.g. 'Received').  For more elaborate ways of header
modifications, see *note Header modification functions::.
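
   A short sketch combining the three actions.  The header names are
taken from the examples above; placing the statements in the 'eom'
handler is an assumption made for this illustration:

     prog eom
     do
       add "X-Seen-By" "Mailfromd 8.8"
       # Unlike 'add', this removes an existing header first.
       replace "X-Last-Processor" "Mailfromd 8.8"
       delete "X-Envelope-Date"
     done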


File: mailfromd.info,  Node: Assignments,  Next: Pass,  Prev: Actions,  Up: Statements

4.16.2 Variable Assignments
---------------------------

An "assignment" is a special statement that assigns a value to a
variable.  It has the following syntax:

     set NAME VALUE

where NAME is the variable name and VALUE is the value to be assigned to
it.

   Assignment statements can appear in any part of a filter program.  If
an assignment occurs outside of function or handler definition, the
VALUE must be a literal value (*note Literals::).  If it occurs within a
function or handler definition, VALUE can be any valid 'mailfromd'
expression (*note Expressions::).  In this case, the expression will be
evaluated and its value will be assigned to the variable.  For example:

     set delay 150

     prog envfrom
     do
       set delay delay * 2
       ...
     done


File: mailfromd.info,  Node: Pass,  Next: Echo,  Prev: Assignments,  Up: Statements

4.16.3 The 'pass' statement
---------------------------

The 'pass' statement has no effect.  It is used in places where no
statement is needed, but the language syntax requires one:

     on poll $f do
     when success:
       pass
     when not_found or failure:
       reject 550
     done


File: mailfromd.info,  Node: Echo,  Prev: Pass,  Up: Statements

4.16.4 The 'echo' statement
---------------------------

The 'echo' statement concatenates all its arguments into a single string
and sends it to the 'syslog' using the priority 'info'.  It is useful
for debugging your script, in conjunction with built-in constants (*note
Built-in constants::), for example:

     func foo(number x)
     do
       echo "%__file__:%__line__: foo called with arg %x"
       ...
     done


File: mailfromd.info,  Node: Conditionals,  Next: Loops,  Prev: Statements,  Up: MFL

4.17 Conditional Statements
===========================

"Conditional expressions", or conditionals for short, test some
conditions and alter the control flow depending on the result.  There
are two kinds of conditional statements: "if-else" branches and "switch"
statements.

   The syntax of an "if-else" branching construct is:

       if CONDITION THEN-BODY [else ELSE-BODY] fi

Here, CONDITION is an expression that governs control flow within the
statement.  Both THEN-BODY and ELSE-BODY are lists of 'mailfromd'
statements.  If CONDITION is true, THEN-BODY is executed; if it is
false, ELSE-BODY is executed.  The 'else' part of the statement is
optional.  The condition is considered false if it evaluates to zero,
otherwise it is considered true.  For example:

     if $f = ""
       accept
     else
       reject
     fi

This will accept the message if the value of the 'Sendmail' macro '$f'
is an empty string, and reject it otherwise.  Both THEN-BODY and
ELSE-BODY can be compound statements including other 'if' statements.
Nesting level of conditional statements is not limited.

   To facilitate writing complex conditional statements, the 'elif'
keyword can be used to introduce alternative conditions, for example:

     if $f = ""
       accept
     elif $f = "root"
       echo "Mail from root!"
     else
       reject
     fi

   Another type of branching instruction is the 'switch' statement:

     switch CONDITION
     do
     case X1 [or X2 ...]:
       STMT1
     case Y1 [or Y2 ...]:
       STMT2
       .
       .
       .
     [default:
       STMT]
     done

Here, X1, X2, Y1, Y2 are literal expressions; STMT1, STMT2 and STMT are
arbitrary 'mailfromd' statements (possibly compound); CONDITION is the
controlling expression.  The vertical row of dots stands for any
further 'case' branches.

   This statement is executed as follows: the CONDITION expression is
evaluated and if its value equals X1 or X2 (or any other X from the
first 'case'), then STMT1 is executed.  Otherwise, if CONDITION
evaluates to Y1 or Y2 (or any other Y from the second 'case'), then
STMT2 is executed.  Other 'case' branches are tried in turn.  If none of
them matches, STMT (called the "default branch") is executed.

   There can be as many 'case' branches as you wish.  The 'default'
branch is optional.  There can be at most one 'default' branch.

   An example of a 'switch' statement follows:

     switch x
     do
     case 1 or 3:
       add "X-Branch" "1"
       accept
     case 2 or 4 or 6:
       add "X-Branch" "2"
     default:
       reject
     done

   If the value of the 'mailfromd' variable 'x' is 1 or 3, this code
will add an 'X-Branch: 1' header to the message and accept it
immediately.  If 'x' equals 2, 4 or 6, it will add an 'X-Branch: 2'
header to the message and continue processing it.  Otherwise, it will
reject the message.

   The controlling condition of a 'switch' statement may evaluate to
numeric or string type.  The type of the condition governs the type of
comparisons used in 'case' branches: for numeric types, numeric equality
will be used, whereas for string types, string equality is used.
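
   As an illustration of string comparison, the following sketch
switches on the sender domain.  The domain names are placeholders, and
the use of 'tolower' to normalize the value is an assumption of this
example rather than a requirement of 'switch':

     prog envfrom
     do
       switch tolower(domainpart($f))
       do
       case "example.org" or "example.net":
         accept
       default:
         continue
       done
     done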


File: mailfromd.info,  Node: Loops,  Next: Exceptions,  Prev: Conditionals,  Up: MFL

4.18 Loop Statements
====================

The loop statement allows for repeated execution of a block of code,
controlled by some conditional expression.  It has the following form:

     loop [LABEL]
          [for STMT1] [,while EXPR1] [,STMT2]
     do
       STMT3
     done [while EXPR2]

where STMT1, STMT2, and STMT3 are statement lists, EXPR1 and EXPR2 are
expressions.

   The control flow is as follows:

  1. If STMT1 is specified, execute it.

  2. Evaluate EXPR1.  If it is zero, go to 6.  Otherwise, continue.

  3. Execute STMT3.

  4. If STMT2 is supplied, execute it.

  5. If EXPR2 is given, evaluate it.  If it is zero, go to 6.
     Otherwise, go to 2.

  6. End.

   Thus, STMT3 is executed until either EXPR1 or EXPR2 yields a zero
value.

   The "loop body" - STMT3 - can contain special statements:

'break [LABEL]'
     Terminates the loop immediately.  Control passes to '6' (End) in
     the formal definition above.  If LABEL is supplied, the statement
     terminates the loop statement marked with that label.  This makes
     it possible to break out of nested loops.

     It is similar to the 'break' statement in C or shell.

'next [LABEL]'
     Initiates the next iteration of the loop.  Control passes to '4'
     in the formal definition above.  If LABEL is supplied, the
     statement starts the next iteration of the loop statement marked
     with that label.  This makes it possible to request the next
     iteration of an upper-level loop from a nested loop statement.
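
   For instance, the following fragment, meant to appear within a
function or handler, is a minimal sketch of labelled loops.  The label
'outer' and the variables 'i' and 'j' are arbitrary names chosen for
illustration:

     loop outer for number i 0, while i < 3, set i i + 1
     do
       loop for number j 0, while j < 3, set j j + 1
       do
         if j == 2
           /* Abandon the inner loop and proceed to the next
              value of i. */
           next outer
         fi
         if i == 2
           /* Leave both loops at once. */
           break outer
         fi
         echo "i=%i, j=%j"
       done
     done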

   The 'loop' statement can be used to create iterative statements of
arbitrary complexity.  Let's illustrate it in comparison with C.

   The statement:

     loop
     do
       STMT-LIST
     done

creates an infinite loop.  The only way to exit from such a loop is to
call 'break' (or 'return', if used within a function), somewhere in
STMT-LIST.

   The following statement is equivalent to 'while (EXPR) STMT-LIST' in
C:

     loop while EXPR
     do
       STMT-LIST
     done

   The C construct 'for (STMT1; EXPR2; STMT2) STMT3' is written in MFL
as follows:

     loop for STMT1, while EXPR2, STMT2
     do
       STMT3
     done

   For example, to repeat STMT3 10 times:

     loop for set i 0, while i < 10, set i i + 1
     do
       STMT3
     done

   Finally, the C 'do' loop is implemented as follows:

     loop
     do
       STMT-LIST
     done while EXPR

   As a real-life example of a loop statement, let's consider the
implementation of function 'ptr_validate', which takes a single argument
IPSTR, and checks its validity using the following algorithm:

   Perform a DNS reverse-mapping for IPSTR, looking up the corresponding
'PTR' record in 'in-addr.arpa'.  For each record returned, look up its
IP addresses (A records).  If IPSTR is among the returned IP addresses,
return 1 ('true'), otherwise return 0 ('false').

   The implementation of this function in MFL is:

     #pragma regex push +extended

     func ptr_validate(string ipstr) returns number
     do
       loop for string names dns_getname(ipstr) . " "
                number i index(names, " "),
            while i != -1,
            set names substr(names, i + 1)
            set i index(names, " ")
       do
         loop for string addrs dns_getaddr(substr(names, 0, i)) . " "
                  number j index(addrs, " "),
              while j != -1,
              set addrs substr(addrs, j + 1)
              set j index(addrs, " ")
         do
           if ipstr == substr(addrs, 0, j)
             return 1
           fi
         done
       done
       return 0
     done


File: mailfromd.info,  Node: Exceptions,  Next: Polling,  Prev: Loops,  Up: MFL

4.19 Exceptional Conditions
===========================

When the running program encounters a condition it is not able to
handle, it signals an "exception".  To illustrate the concept, let's
consider the execution of the following code fragment:

       if primitive_hasmx(domainpart($f))
         accept
       fi

The function 'primitive_hasmx' (*note primitive_hasmx::) tests whether
the domain name given as its argument has any 'MX' records.  It should
return a boolean value.  However, when querying the Domain Name System,
it may fail to get a definite result.  For example, the DNS server can
be down or temporarily unavailable.  In other words, 'primitive_hasmx' can
be in a situation when, instead of returning 'yes' or 'no', it has to
return 'don't know'.  It has no way of doing so, therefore it signals an
"exception".

   Each exception is identified by "exception type", an integer number
associated with it.

* Menu:

* Built-in Exceptions::
* User-defined Exceptions::
* Catch and Throw::


File: mailfromd.info,  Node: Built-in Exceptions,  Next: User-defined Exceptions,  Up: Exceptions

4.19.1 Built-in Exceptions
--------------------------

The first 20 exception numbers are reserved for "built-in exceptions".
These are declared in module 'status.mf'.  The following table
summarizes all built-in exception types implemented by 'mailfromd'
version 8.8.  Exceptions are listed in lexicographic order.

'e_badmmq'
     The called function cannot finish its task because an incompatible
     message modification function was called at some point before it.
     For details, *note MMQ and dkim_sign::.

'e_dbfailure'
     General database failure.  For example, the database cannot be
     opened.  This exception can be signaled by any function that
     queries any DBM database.

'e_divzero'
     Division by zero.

'e_exists'
     This exception is emitted by 'dbinsert' built-in if the requested
     key is already present in the database (*note dbinsert: Database
     functions.).

'e_eof'
     Function reached end of file while reading.  *Note I/O functions::,
     for a description of functions that can signal this exception.

'e_failure'
'failure'
     A general failure has occurred.  In particular, this exception is
     signaled by DNS lookup functions when any permanent failure occurs.
     This exception can be signaled by any DNS-related function
     ('hasmx', 'poll', etc.)  or operation ('mx matches').

'e_format'
     Invalid input format.  This exception is signaled if input data to
     a function are improperly formatted.  In version 8.8 it is signaled
     by 'message_burst' function if its input message is not formatted
     according to RFC 934.  *Note Message digest functions::.

'e_invcidr'
     Invalid CIDR notation.  This is signaled by 'match_cidr' function
     when its second argument is not a valid CIDR.

'e_invip'
     Invalid IP address.  This is signaled by 'match_cidr' function when
     its first argument is not a valid IP address.

'e_invtime'
     Invalid time interval specification.  It is signaled by 'interval'
     function if its argument is not a valid time interval (*note time
     interval specification::).

'e_io'
     An error occurred during the input-output operation.  *Note I/O
     functions::, for a description of functions that can signal this
     exception.

'e_macroundef'
     A Sendmail macro is undefined.

'e_noresolve'
     The argument of a DNS-related function cannot be resolved to host
     name or IP address.  Currently only 'ismx' (*note ismx::) raises
     this exception.

'e_range'
     The supplied argument is outside the allowed range.  This is
     signalled, for example, by 'substring' function (*note
     substring::).

'e_regcomp'
     Regular expression cannot be compiled.  This can happen when a
     regular expression (a right-hand argument of a 'matches' operator)
     is built at the runtime and the produced string is an invalid
     regex.

'e_ston_conv'
     String-to-number conversion failed.  This can be signaled when a
     string is used in numeric context which cannot be converted to the
     numeric data type.  For example:

           set x "10a"
           if x / 2
             ...

     The 'if' condition will signal 'e_ston_conv', since '10a' cannot be
     converted to a number.

'e_temp_failure'
'temp_failure'
     A temporary failure has occurred.  This can be signaled by
     DNS-related functions or operations.

'e_url'
     The supplied URL is invalid.  *Note Interfaces to Third-Party
     Programs::.

   In addition to these, two symbols are defined that are not exception
types in the strict sense of the word, but are provided to make writing
filter scripts more convenient.  These are 'success', meaning successful
return from a function, and 'not_found', meaning that the required
entity (e.g.  domain name or email address) was not found.  *Note Figure
4.1: figure-poll-wrapper, for an illustration on how these can be used.
For consistency with other exception codes, these can be spelled as
'e_success' and 'e_not_found'.


File: mailfromd.info,  Node: User-defined Exceptions,  Next: Catch and Throw,  Prev: Built-in Exceptions,  Up: Exceptions

4.19.2 User-defined Exceptions
------------------------------

You can define your own exception types using the 'dclex' statement:

     dclex TYPE

   In this statement, TYPE must be a valid MFL identifier, not used for
another constant (*note Constants::).  The 'dclex' statement defines a
new exception identified by the constant TYPE and allocates a new
exception number for it.

   The TYPE can subsequently be used in 'throw' and 'catch' statements,
for example:

     dclex myrange

     func fact(number val)
       returns number
     do
       if val < 0
         throw myrange "fact argument is out of range"
       fi
       ...
     done


File: mailfromd.info,  Node: Catch and Throw,  Prev: User-defined Exceptions,  Up: Exceptions

4.19.3 Exception Handling
-------------------------

Normally, when an exception is signalled, program execution is
terminated and a 'tempfail' status is returned to the MTA.  Additional
information regarding the exception is then output to the logging
channel (*note Logging and Debugging::).  However, the user can
intercept any exception by installing custom exception-handling
routines.

   An exception-handling routine is introduced by a "try-catch"
statement, which has the following syntax:

     try
     do
       STMTLIST
     done
     catch EXCEPTION-LIST
     do
       HANDLER-BODY
     done

where STMTLIST and HANDLER-BODY are sequences of MFL statements and
EXCEPTION-LIST is the list of exception types, separated by the word
'or'.  A special EXCEPTION-LIST '*' is allowed and means all exceptions.

   This construct works as follows.  First, the statements from STMTLIST
are executed.  If the execution finishes successfully, control is passed
to the first statement after the 'catch' block.  Otherwise, if an
exception is signalled and this exception is listed in EXCEPTION-LIST,
the execution is passed to the HANDLER-BODY.  If the exception is not
listed in EXCEPTION-LIST, it is handled as usual.

   The following example shows a 'try--catch' construct used for
handling eventual exceptions, signalled by 'primitive_hasmx'.

     try
     do
       if primitive_hasmx(domainpart($f))
         accept
       else
         reject
       fi
     done
     catch e_failure or e_temp_failure
     do
       echo "primitive_hasmx failed"
       continue
     done

   The 'try--catch' statement can appear anywhere inside a function or a
handler, but it cannot appear outside of them.  It can also be nested
within another 'try--catch', in either of its parts.  Upon exit from a
function or milter handler, all exceptions are restored to the state
they had when it was entered.

   A 'catch' block can also be used alone, without preceding 'try' part.
Such a construct is called a "standalone catch".  It is mostly useful
for setting global exception handlers in a 'begin' statement (*note
begin/end::).  When used within a usual function or handler, the
exception handlers set by a standalone catch remain in force until
either another standalone catch appears further in the same function or
handler, or an end of the function is encountered, whichever occurs
first.

   A standalone catch defined within a function must return from it by
executing 'return' statement.  If it does not do that explicitly, the
default value of 1 is returned.  A standalone catch defined within a
milter handler must end execution with any of the following actions:
'accept', 'continue', 'discard', 'reject', 'tempfail'.  By default,
'continue' is used.
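
   As a sketch, a standalone catch within a milter handler might look
as follows.  The handler traps only temporary failures and ends with
an explicit 'tempfail' action; the reply codes and texts are
illustrative:

     prog envfrom
     do
       catch e_temp_failure
       do
         echo "Caught exception $1: $2"
         tempfail 450 4.1.0 "Try again later"
       done

       if primitive_hasmx(domainpart($f))
         continue
       else
         reject 550 5.1.0 "Sender validity not confirmed"
       fi
     done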

   It is not recommended to mix 'try--catch' constructs and standalone
catches.  If a standalone catch appears within a 'try--catch' statement,
its scope of visibility is undefined.

   Upon entry to a HANDLER-BODY, two implicit positional arguments are
defined, which can be referenced in HANDLER-BODY as '$1' and '$2'.  The
first argument gives the numeric code of the exception that has
occurred.  The second argument is a textual string containing a
human-readable description of the exception.

   The following is an improved version of the previous example, which
uses these parameters to supply more information about the failure:

     try
     do
       if primitive_hasmx(domainpart($f))
         accept
       else
         reject
       fi
     done
     catch e_failure or e_temp_failure
     do
       echo "Caught exception $1: $2"
       continue
     done

   The following example defines the function 'hasmx' that returns true
if the domain part of its argument has any 'MX' records, and false if it
does not or if an exception occurs (1).

     func hasmx (string s)
       returns number
     do
       try
       do
         return primitive_hasmx(domainpart(s))
       done
       catch *
       do
         return 0
       done
     done

   The same function can be written using a standalone 'catch':

     func hasmx (string s)
       returns number
     do
       catch *
       do
         return 0
       done
       return primitive_hasmx(domainpart(s))
     done

   All variables remain visible within 'catch' body, with the exception
of positional arguments of the enclosing handler.  To access positional
arguments of a handler from the 'catch' body, assign them to local
variables prior to the 'try--catch' construct, e.g.:

     prog header
     do
       string hname $1
       string hvalue $2
       try
       do
         ...
       done
       catch *
       do
         echo "Exception $1 while processing header %hname: %hvalue"
         echo $2
         tempfail
       done
     done

   You can also generate (or "raise") exceptions explicitly in the code,
using 'throw' statement:

     throw EXCODE DESCR

   The arguments correspond exactly to the positional parameters of the
'catch' statement: EXCODE gives the numeric code of the exception, DESCR
gives its textual description.  This statement can be used in complex
scripts to create non-local exits from deeply nested statements.

   Notice that the EXCODE argument must be an immediate value: an
exception identifier (either a built-in one or one declared previously
using a 'dclex' statement).
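
   Putting these pieces together, the sketch below declares an
exception, raises it from a procedure and handles it in a handler.  The
names 'toolong' and 'check_local', as well as the limit of 64
characters, are assumptions made for the sake of the example:

     dclex toolong

     func check_local(string s)
     do
       if length(s) > 64
         throw toolong "local part is too long"
       fi
     done

     prog envfrom
     do
       try
       do
         check_local(localpart($f))
       done
       catch toolong
       do
         echo "Caught exception $1: $2"
         reject 550 5.1.0 "Sender validity not confirmed"
       done
     done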

   ---------- Footnotes ----------

   (1) This function is part of the 'mailfromd' library, *Note hasmx::.


File: mailfromd.info,  Node: Polling,  Next: Modules,  Prev: Exceptions,  Up: MFL

4.20 Sender Verification Tests
==============================

The filter script language provides a wide variety of functions for
sender address verification or "polling", for short.  These functions,
which were described in *note SMTP Callout functions::, can be used to
implement any sender verification method.  The additional data that can
be needed is normally supplied by two global variables: 'ehlo_domain',
keeping the default domain for the 'EHLO' command, and
'mailfrom_address', which stores the sender address for probe messages
(*note Predefined variables::).

   For example, the simplest way to implement standard polling would be:

     prog envfrom
     do
       if stdpoll($1, ehlo_domain, mailfrom_address) == 0
         accept
       else
         reject 550 5.1.0 "Sender validity not confirmed"
       fi
     done

   However, this does not take into account exceptions that 'stdpoll'
can signal.  To handle them, use a 'try-catch' statement, for example:

     require status

     prog envfrom
     do
       try
       do
         if stdpoll($1, ehlo_domain, mailfrom_address) == 0
           accept
         else
           reject 550 5.1.0 "Sender validity not confirmed"
         fi
       done
       catch e_failure or e_temp_failure
       do
         switch $1
         do
         case failure:
           reject 550 5.1.0 "Sender validity not confirmed"
         case temp_failure:
           tempfail 450 4.1.0 "Try again later"
         done
       done
     done

   If polls are used often, one can define a wrapper function, and use
it instead.  The following example illustrates this approach:

     func poll_wrapper(string email) returns number
     do
       catch e_failure or e_temp_failure
       do
         return $1
       done
       return stdpoll(email, ehlo_domain, mailfrom_address)
     done

     prog envfrom
     do
       switch poll_wrapper($f)
       do
       case success:
         accept
       case not_found or failure:
         reject 550 5.1.0 "Sender validity not confirmed"
       case temp_failure:
         tempfail 450 4.1.0 "Try again later"
       done
     done

Figure 4.1: Building Poll Wrappers

   Notice the way 'envfrom' handles 'success' and 'not_found', which are
not exceptions in the strict sense of the word.

   The above paradigm is so common that 'mailfromd' provides a special
language construct to simplify it: the 'on' statement.  Instead of
manually writing the wrapper function and using it as a 'switch'
condition, you can rewrite the above example as:

     prog envfrom
     do
       on stdpoll($1, ehlo_domain, mailfrom_address)
       do
       when success:
         accept
       when not_found or failure:
         reject 550 5.1.0 "Sender validity not confirmed"
       when temp_failure:
         tempfail 450 4.1.0 "Try again later"
       done
     done

Figure 4.2: Standard poll example

As you can see, the statement is quite similar to 'switch'.  The major
syntactic difference is the use of the keyword 'when' to introduce
conditional branches.

   General syntax of the 'on' statement is:

     on CONDITION
     do
       when X1 [or X2 ...]:
         STMT1
       when Y1 [or Y2 ...]:
         STMT2
         .
         .
         .
     done

The CONDITION is either a function call or a special 'poll' statement
(see below).  The values used in 'when' branches are normally symbolic
exception names (*note exception names::).

   When the compiler processes the 'on' statement it does the following:

  1. Builds a unique wrapper function, similar to that described in
     *note Figure 4.1: figure-poll-wrapper.  The name of the function is
     constructed from the CONDITION function name and an unsigned
     number, called "exception mask", that is unique for each
     combination of exceptions used in 'when' branches.  To avoid name
     clashes with user-defined functions, the wrapper name begins and
     ends with '$', which normally is not allowed in identifiers.

  2. Translates the 'on' body to the corresponding 'switch' statement.

   A special form of the CONDITION is 'poll' keyword, whose syntax is:

     poll [for] EMAIL
          [host HOST]
          [from DOMAIN]
          [as EMAIL]

   The order of particular keywords in the 'poll' statement is
arbitrary, for example 'as EMAIL' can appear before EMAIL as well as
after it.

   The simplest form, 'poll EMAIL', performs the standard sender
verification of email address EMAIL.  It is translated to the following
function call:

       stdpoll(EMAIL, ehlo_domain, mailfrom_address)

   The construct 'poll EMAIL host HOST' runs the strict sender
verification of address EMAIL on the given host.  It is translated to
the following call:

       strictpoll(HOST, EMAIL, ehlo_domain, mailfrom_address)

   Other keywords of the 'poll' statement modify these two basic forms.
The 'as' keyword introduces the email address to be used in the SMTP
'MAIL FROM' command, instead of 'mailfrom_address'.  The 'from' keyword
sets the domain name to be used in 'EHLO' command.  So, for example the
following construct:

       poll EMAIL host HOST from DOMAIN as ADDR

is translated to

       strictpoll(HOST, EMAIL, DOMAIN, ADDR)
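
   As a sketch, the full form can be used as the condition of an 'on'
statement.  The host name, the 'EHLO' domain and the probe sender
address below are placeholders:

     prog envfrom
     do
       on poll $f
               host "mx.example.org"
               from "example.net"
               as "postmaster@example.net"
       do
       when success:
         accept
       when not_found or failure:
         reject 550 5.1.0 "Sender validity not confirmed"
       when temp_failure:
         tempfail 450 4.1.0 "Try again later"
       done
     done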

   To summarize the above, the code described in *note Figure 4.2:
figure-stdpoll. can be written as:

     prog envfrom
     do
       on poll $f do
       when success:
         accept
       when not_found or failure:
         reject 550 5.1.0 "Sender validity not confirmed"
       when temp_failure:
         tempfail 450 4.1.0 "Try again later"
       done
     done


File: mailfromd.info,  Node: Modules,  Next: Preprocessor,  Prev: Polling,  Up: MFL

4.21 Modules
============

A "module" is a logically isolated part of code that implements a
separate concern or feature and contains a collection of conceptually
united functions and/or data.  Each module occupies a separate
compilation unit (i.e.  file).  The functionality provided by a module
is incorporated into another module or the main program by "requiring"
this module or by "importing" the desired components from it.

* Menu:

* module structure::    Declaring Modules
* scope of visibility::
* import::              Require and Import


File: mailfromd.info,  Node: module structure,  Next: scope of visibility,  Up: Modules

4.21.1 Declaring Modules
------------------------

A module file must begin with a "module declaration":

     module MODNAME [INTERFACE-TYPE].

   Note the final dot.

   The MODNAME parameter declares the name of the module.  It is
recommended that it be the same as the file name without the '.mf'
extension.  The module name must be a valid MFL literal.  It also must
not coincide with any defined MFL symbol, therefore we recommend always
quoting it (see the example below).

   The optional parameter INTERFACE-TYPE defines the "default scope of
visibility" for the symbols declared in this module.  If it is 'public',
then all symbols declared in this module are made public (importable) by
default, unless explicitly declared otherwise (*note scope of
visibility::).  If it is 'static', then all symbols, not explicitly
marked as public, become static.  If the INTERFACE-TYPE is not given,
'public' is assumed.

   The actual MFL code follows the 'module' line.

   The module definition is terminated by the "logical end" of its
compilation unit, i.e.  either by the end of file, or by the keyword
'bye', whichever occurs first.

   The special keyword 'bye' may be used to prematurely end the current
compilation unit before the physical end of the containing file.  Any
material between 'bye' and the end of file is ignored by the compiler.

   Let's illustrate these concepts by writing a module 'revip':

     module 'revip' public.

     func revip(string ip)
       returns string
     do
       return inet_ntoa(ntohl(inet_aton(ip)))
     done

     bye

     This text is ignored.  You may put any additional
     documentation here.


File: mailfromd.info,  Node: scope of visibility,  Next: import,  Prev: module structure,  Up: Modules

4.21.2 Scope of Visibility
--------------------------

"Scope of Visibility" of a symbol defines from where this symbol may be
referred to.  Symbols in MFL may have either of the following two
scopes:

"Public"
     Public symbols are visible from the current module, as well as from
     any external modules, including the main script file, provided that
     they are properly imported (*note import::).

"Static"
     Static symbols are visible only from the current module.  There is
     no way to refer to them from outside.

   The default scope of visibility for all symbols declared within a
module is defined in the module declaration (*note module structure::).
It may be overridden for any individual symbol by prefixing its
declaration with an appropriate "qualifier": either 'public' or
'static'.
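
   For example, in the following sketch of a module the default scope
is 'static', so the helper 'decorate' stays private, while 'greet' is
exported explicitly.  The module and function names are invented for
illustration:

     module 'greet' static.

     /* Static by default: not visible outside this module. */
     func decorate(string s) returns string
     do
       return "<" . s . ">"
     done

     /* Explicitly public: can be required or imported. */
     public func greet(string name) returns string
     do
       return "Hello, " . decorate(name)
     done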


File: mailfromd.info,  Node: import,  Prev: scope of visibility,  Up: Modules

4.21.3 Require and Import
-------------------------

Functions or variables declared in another module must be "imported"
prior to their actual use.  MFL provides two ways of doing so: by
"requiring" the entire module or by importing selected symbols from it.

 -- Module Import: require modname
     The 'require' statement instructs the compiler to locate the module
     MODNAME and to load all public interfaces from it.

   The compiler looks for the file 'MODNAME.mf' in the current search
path (*note include search path::).  If no such file is found, a
compilation error is reported.

   For example, the following statement:

     require revip

imports all interfaces from the module 'revip.mf'.
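
   Once the module is required, its public functions can be called
like any other.  As a usage sketch, the fragment below uses the
function 'revip' from the module shown in *note module structure:: to
build a name suitable for a DNS-based blocklist query.  The function
name 'rbl_name' and its ZONE argument are hypothetical:

     require revip

     func rbl_name(string ip, string zone) returns string
     do
       return revip(ip) . "." . zone
     done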

   Another, more sophisticated way to import from a module is to use the
'from ... import' construct:

     from MODULE import SYMBOLS.

   Note the final dot.  The 'from' and 'module' statements are the only
two constructs in MFL that require the delimiter.

   The MODULE has the same semantics as in the 'require' construct.  The
SYMBOLS is a comma-separated list of symbol names to import from MODULE.
A symbol name may be given in several forms:

  1. Literal

     Literals specify exact symbol names to import.  For example, the
     following statement imports from module 'A.mf' symbols 'foo' and
     'bar':

          from A import foo,bar.

  2. Regular expression

     Regular expressions must be surrounded by slashes.  A regular
     expression instructs the compiler to import all symbols whose names
     match that expression.  For example, the following statement
     imports from 'A.mf' all symbols whose names begin with 'foo' and
     contain at least one digit after it:

          from A import '/^foo.*[0-9]/'.

     The type of regular expressions used in the 'from' statement is
     controlled by '#pragma regex' (*note regex::).

  3. Regular expression with transformation

     A regular expression may be followed by an "s-expression", i.e.  a
     'sed'-like expression of the form:

          s/REGEXP/REPLACE/[FLAGS]

     where REGEXP is a "regular expression", REPLACE is a replacement
     for each part of the input that matches REGEXP.  S-expressions and
     their parts are discussed in detail in *note s-expression::.

     The effect of such a construct is to import all symbols that match
     the regular expression and apply the s-expression to their names.

     For example:

          from A import '/^foo.*[0-9]/s/.*/my_&/'.

     This statement imports all symbols whose names begin with 'foo' and
     contain at least one digit after it, and renames them, by prefixing
     their names with the string 'my_'.  Thus, if 'A.mf' declared a
     function 'foo_1', it becomes visible under the name of 'my_foo_1'.


File: mailfromd.info,  Node: Preprocessor,  Next: Filter Script Example,  Prev: Modules,  Up: MFL

4.22 MFL Preprocessor
=====================

Before compiling the script file, 'mailfromd' preprocesses it.  The
built-in preprocessor handles only file inclusion (*note include::),
while the rest of traditional facilities, such as macro expansion, are
supported via 'm4', which is used as an external preprocessor.

   The detailed description of 'm4' facilities lies far beyond the scope
of this document.  You will find a complete user manual in *note GNU M4
manual: (m4)Top.  For the rest of this section we assume the reader is
sufficiently acquainted with the 'm4' macro processor.

   The external preprocessor is invoked with the '-s' flag, instructing it
to include line synchronization information in its output, which is
subsequently used by MFL compiler for purposes of error reporting.  The
initial set of macro definitions is supplied in file 'pp-setup', located
in the library search path(1), which is fed to the preprocessor input
before the script file itself.  The default 'pp-setup' file renames all
'm4' built-in macro names so they all start with the prefix 'm4_'(2).
It changes comment characters to '/*', '*/' pair, and leaves the default
quoting characters, grave ('`') and acute (''') accents without change.
Finally, 'pp-setup' defines the following macros:

 -- M4 Macro: boolean defined (IDENTIFIER)
     The IDENTIFIER must be the name of an optional abstract argument to
     the function.  This macro must be used only within a function
     definition.  It expands to the MFL expression that yields 'true' if
     the actual parameter is supplied for IDENTIFIER.  For example:

          func rcut(string text; number num)
            returns string
          do
            if (defined(num))
              return substr(text, length(text) - num)
            else
              return text
            fi
          done

     This function will return the last NUM characters of TEXT if NUM is
     supplied, and entire TEXT otherwise, e.g.:

          rcut("text string") => "text string"
          rcut("text string", 3) => "ing"

     Invoking the 'defined' macro with the name of a mandatory argument
     yields 'true'.

 -- M4 Macro: printf (FORMAT, ...)
     Provides a 'printf' statement, that formats its optional parameters
     in accordance with FORMAT and sends the resulting string to the
     current log output (*note Logging and Debugging::).  *Note String
     formatting::, for a description of FORMAT.

     Example usage:

          printf('Function %s returned %d', funcname, retcode)

 -- M4 Macro: string _ (MSGID)
     A convenience macro.  Expands to a call to 'gettext' (*note NLS
     Functions::).

 -- M4 Macro: string_list_iterate (LIST, DELIM, VAR, CODE)
     This macro is intended to compensate for the lack of an array data
     type in MFL.  It splits the string LIST into segments delimited by
     the string DELIM.  For each segment, the MFL code CODE is executed.
     The code can use the variable VAR to refer to the segment string.

     For example, the following fragment prints names of all existing
     directories listed in the 'PATH' environment variable:

          string path getenv("PATH")
          string seg

          string_list_iterate(path, ":", seg, `
               if access(seg, F_OK)
                 echo "%seg exists"
               fi')

     Care should be taken to properly quote its arguments.  In the code
     below the string 'str' is treated as a comma-separated list of
     values.  To avoid interpreting the comma as argument delimiter the
     second argument must be quoted:

          string_list_iterate(str, `","', seg, `
               echo "next segment: " . seg')

 -- M4 Macro: N_ (MSGID)
     A convenience macro, that expands to MSGID verbatim.  It is
     intended to mark the literal strings that should appear in the
     '.po' file, where actual call to 'gettext' (*note NLS Functions::)
     cannot be used.  For example:

          /* Mark the variable for translation: cannot use gettext here */
          string message N_("Mail accepted")

          prog envfrom
          do
            ...
            /* Translate and log the message */
            echo gettext(message)

   You can obtain the preprocessed output, without starting actual
compilation, using '-E' command line option:

     $ mailfromd -E file.mf

   The output is in the form of preprocessed source code, which is sent
to the standard output.  This can be useful, among other things, for
debugging your own macro definitions.

   Macro definitions and deletions can be made on the command line, by
using the '-D' and '-U' options.  They have the following format:

'-D NAME[=VALUE]'
'--define=NAME[=VALUE]'
     Define a symbol NAME to have a value VALUE.  If VALUE is not
     supplied, the value is taken to be the empty string.  The VALUE can
     be any string, and the macro can be defined to take arguments, just
     as if it was defined from within the input using the 'm4_define'
     statement.

     For example, the following invocation defines symbol 'COMPAT' to
     have a value '43':

          $ mailfromd -DCOMPAT=43

'-U NAME'
'--undefine=NAME'
     A counterpart of the '-D' option is the option '-U' ('--undefine').
     It undefines a preprocessor symbol whose name is given as its
     argument.  The following example undefines the symbol 'COMPAT':

          $ mailfromd -UCOMPAT

   The following two options are supplied mainly for debugging purposes:

'--no-preprocessor'
     Disables the external preprocessor.

'--preprocessor=COMMAND'
     Use COMMAND as external preprocessor.  Be especially careful with
     this option, because 'mailfromd' cannot verify whether COMMAND is
     actually some kind of a preprocessor or not.

   ---------- Footnotes ----------

   (1) It is usually located in
'/usr/local/share/mailfromd/8.8/include/pp-setup'.

   (2) This is similar to the GNU m4 '--prefix-builtins' option.  This
approach was chosen to allow for using non-GNU 'm4' implementations as
well.


File: mailfromd.info,  Node: Filter Script Example,  Next: Reserved Words,  Prev: Preprocessor,  Up: MFL

4.23 Example of a Filter Script File
====================================

In this section we will discuss a working example of the filter script
file.  For ease of illustration, it is divided into several sections.
Each section is prefaced with a comment explaining its function.

   This filter assumes that the 'mailfromd.conf' file contains the
following:

     relayed-domain-file (/etc/mail/sendmail.cw,
                          /etc/mail/relay-domains);
     io-timeout 33;
     database cache {
       negative-expire-interval 1 day;
       positive-expire-interval 2 weeks;
     };

   Of course, the exact parameter settings may vary; what is important
is that they be declared.  *Note Mailfromd Configuration::, for a
description of 'mailfromd' configuration file syntax.

   Now, let's return to the script.  Its first part defines the
configuration settings for this host:

     #pragma regex +extended +icase

     set mailfrom_address "<>"
     set ehlo_domain "gnu.org.ua"

   The second part loads the necessary source modules:

     require 'status'
     require 'dns'
     require 'rateok'

   Next we define the 'envfrom' handler.  In the first two rules, it
accepts all mail coming from the null address and from the machines
for which we relay.  The third rule rejects clients whose IP address
does not resolve to a host name:

     prog envfrom
     do
       if $f = ""
         accept
       elif relayed hostname($client_addr)
         accept
       elif hostname($client_addr) = $client_addr
         reject 550 5.7.7 "IP address does not resolve"

   The next rule rejects all messages coming from hosts with dynamic IP
addresses.  The regular expression used to catch such hosts is not 100%
fail-proof, but it tries to cover most existing host naming patterns:

        elif hostname($client_addr) matches
              ".*(adsl|sdsl|hdsl|ldsl|xdsl|dialin|dialup|\
     ppp|dhcp|dynamic|[-.]cpe[-.]).*"
          reject 550 5.7.1 "Use your SMTP relay"

   Messages coming from the machines whose host names contain something
similar to an IP are subject to strict checking:

        elif hostname($client_addr) matches
        ".*[0-9]{1,3}[-.][0-9]{1,3}[-.][0-9]{1,3}[-.][0-9]{1,3}.*"
          on poll host $client_addr for $f do
          when success:
            pass
          when not_found or failure:
            reject 550 5.1.0 "Sender validity not confirmed"
          when temp_failure:
            tempfail
          done

   If the sender domain is relayed by any of the 'yahoo.com' or
'namaeserver.com' 'MX's, no checks are performed.  We will greylist this
message in the 'envrcpt' handler:

        elif $f mx fnmatches "*.yahoo.com"
             or $f mx fnmatches "*.namaeserver.com"
          pass

   Finally, if the message does not meet any of the above conditions, it
is verified by the standard procedure:

        else
          on poll $f do
          when success:
            pass
          when not_found or failure:
            reject 550 5.1.0 "Sender validity not confirmed"
          when temp_failure:
            tempfail
          done
        fi

   At the end of the handler we check that the sender-client pair does
not exceed the allowed mail sending rate:

        if not rateok("$f-$client_addr", interval("1 hour 30 minutes"), 100)
          tempfail 450 4.7.0 "Mail sending rate exceeded.  Try again later"
        fi
     done

   The next part defines the 'envrcpt' handler.  Its primary purpose is
to greylist messages from some domains that could not be checked
otherwise:

     prog envrcpt
     do
       set gltime 300
       if $f mx fnmatches "*.yahoo.com"
          or $f mx fnmatches "*.namaeserver.com"
          and not dbmap("/var/run/whitelist.db", $client_addr)
         if greylist("$client_addr-$f-$rcpt_addr", gltime)
           if greylist_seconds_left = gltime
             tempfail 450 4.7.0
                    "You are greylisted for %gltime seconds"
           else
             tempfail 450 4.7.0
                    "Still greylisted for " .
                    %greylist_seconds_left . " seconds"
           fi
         fi
       fi
     done


File: mailfromd.info,  Node: Reserved Words,  Prev: Filter Script Example,  Up: MFL

4.24 Reserved Words
===================

For your reference, here is an alphabetical list of all reserved words:

   * __defpreproc__
   * __defstatedir__
   * __file__
   * __function__
   * __line__
   * __major__
   * __minor__
   * __module__
   * __package__
   * __patch__
   * __preproc__
   * __statedir__
   * __version__
   * accept
   * add
   * and
   * alias
   * begin
   * break
   * bye
   * case
   * catch
   * const
   * continue
   * default
   * delete
   * discard
   * do
   * done
   * echo
   * end
   * elif
   * else
   * fi
   * fnmatches
   * for
   * from
   * func
   * if
   * import
   * loop
   * matches
   * module
   * next
   * not
   * number
   * on
   * or
   * pass
   * precious
   * prog
   * public
   * reject
   * replace
   * return
   * returns
   * require
   * set
   * static
   * string
   * switch
   * tempfail
   * throw
   * try
   * vaptr
   * when
   * while

   Several keywords are context-dependent: 'mx' is a keyword if it
appears before 'matches' or 'fnmatches'.  The following strings are
keywords in 'on' context:

   * as
   * host
   * poll

   The following keywords are preprocessor macros:

   * defined
   * _ (an underscore)
   * N_

   Any keyword beginning with an 'm4_' prefix is a reserved preprocessor
symbol.


File: mailfromd.info,  Node: Library,  Next: Using MFL Mode,  Prev: MFL,  Up: Top

5 The MFL Library Functions
***************************

This chapter describes library functions available in Mailfromd version
8.8.  For the simplicity of explanation, we use the word 'boolean' to
indicate variables of numeric type that are used as boolean values.  For
such variables, the term 'False' stands for the numeric 0, and 'True'
for any non-zero value.

* Menu:

* Macro access::
* String manipulation::
* String formatting::
* Character Type::
* Email processing functions::
* Envelope modification functions::
* Header modification functions::
* Body Modification Functions::
* Message modification queue::
* Mail header functions::
* Mail body functions::
* EOM Functions::
* Current Message Functions::
* Mailbox functions::
* Message functions::
* Quarantine functions::
* SMTP Callout functions::
* Compatibility Callout functions::
* Internet address manipulation functions::
* DNS functions::
* Geolocation functions::
* Database functions::
* I/O functions::
* System functions::
* Passwd functions::
* Sieve Interface::
* Interfaces to Third-Party Programs::
* Rate limiting functions::
* Greylisting functions::
* Special test functions::
* Mail Sending Functions::
* Blacklisting Functions::
* SPF Functions::
* DKIM::
* Sockmaps::
* NLS Functions::
* Syslog Interface::
* Debugging Functions::


File: mailfromd.info,  Node: Macro access,  Next: String manipulation,  Up: Library

5.1 Sendmail Macro Access Functions
===================================

 -- Built-in Function: string getmacro (string MACRO)
     Returns the value of Sendmail macro MACRO.  If MACRO is not
     defined, raises the 'e_macroundef' exception.

     Calling 'getmacro(NAME)' is completely equivalent to referencing
     '${NAME}', except that it allows you to construct macro names
     programmatically, e.g.:

            if getmacro("auth_%var") = "foo"
              ...
            fi

 -- Built-in Function: boolean macro_defined (string NAME)
     Return true if Sendmail macro NAME is defined.

   Notice that if your MTA supports macro name negotiation(1), you will
have to export macro names used by these two functions using the
'#pragma miltermacros' construct.  Consider this example:

     func authcheck(string name)
     do
       string macname "auth_%name"
       if macro_defined(macname)
         if getmacro(macname)
           ...
         fi
       fi
     done

     #pragma miltermacros envfrom auth_authen

     prog envfrom
     do
       authcheck("authen")
     done

   In this case, the parser cannot deduce that the 'envfrom' handler
will attempt to reference the 'auth_authen' macro, therefore the
'#pragma miltermacros' is used to help it.

   ---------- Footnotes ----------

   (1) That is, if it supports Milter protocol version 6 or later.
Sendmail 8.14.0 and Postfix 2.6 and newer do.  MeTA1 (via 'pmult') does
as well.
*Note MTA Configuration::, for more details.


File: mailfromd.info,  Node: String manipulation,  Next: String formatting,  Prev: Macro access,  Up: Library

5.2 String Manipulation Functions
=================================

 -- Built-in Function: string escape (string STR, [string CHARS])
     Returns a copy of STR with the characters from CHARS escaped, i.e.
     prefixed with a backslash.  If CHARS is not specified, '\"' is
     assumed.

          escape('"a\tstr"ing') => '\"a\\tstr\"ing'
          escape('new "value"', '\" ') => 'new\ \"value\"'
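
     The effect of 'escape' can be modelled outside of MFL.  The sketch
     below is a Python approximation written for this illustration, not
     the mailfromd implementation; it assumes that a backslash is put
     before every character found in CHARS and that the default set is
     a backslash and a double quote.  Raw Python strings stand in for
     single-quoted MFL strings.

```python
def escape(s, chars='\\"'):
    """Model of MFL escape(): prefix each character found in chars
    with a backslash (default set: backslash and double quote)."""
    return "".join("\\" + ch if ch in chars else ch for ch in s)


# The two examples from the text, written as Python string literals.
first = escape(r'"a\tstr"ing')
second = escape('new "value"', '\\" ')
print(first)
print(second)
```

     Both calls reproduce the results shown above.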

 -- Built-in Function: string unescape (string STR)
     Performs the reverse to 'escape', i.e.  removes any prefix
     backslash characters.

          unescape('a \"quoted\" string') => 'a "quoted" string'

 -- Built-in Function: string unescape (string STR, [string CHARS])

 -- Built-in Function: string domainpart (string STR)
     Returns the domain part of STR, if it is a valid email address,
     otherwise returns STR itself.

          domainpart("gray") => "gray"
          domainpart("gray@gnu.org.ua") => "gnu.org.ua"

 -- Built-in Function: number index (string S, string T)
 -- Built-in Function: number index (string S, string T, number START)
     Returns the index of the first occurrence of the string T in the
     string S, or -1 if T is not present.

          index("string of rings", "ring") => 2

     Optional argument START, if supplied, indicates the position in
     string where to start searching.

          index("string of rings", "ring", 3) => 10

     To find the last occurrence of a substring, use the function
     'rindex' (*note rindex::).
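
     As a rough model of these semantics (zero-based offsets, '-1' when
     the substring is absent), the following Python sketch reproduces
     both examples.  It is an illustration only; how MFL treats a START
     value outside the string is not modelled here.

```python
def index(s, t, start=0):
    """Model of MFL index(): zero-based offset of the first occurrence
    of t in s at or after start, or -1 if there is none."""
    return s.find(t, start)


print(index("string of rings", "ring"))      # 2
print(index("string of rings", "ring", 3))   # 10
print(index("string of rings", "bell"))      # -1
```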

 -- Built-in Function: number interval (string STR)
     Converts STR, which should be a valid time interval specification
     (*note time interval specification::), to seconds.

 -- Built-in Function: number length (string STR)
     Returns the length of the string STR in bytes.

          length("string") => 6

 -- Built-in Function: string dequote (string STR)
     Removes '<' and '>' surrounding STR.  If STR is not enclosed by
     angle brackets or these are unbalanced, the argument is returned
     unchanged:

          dequote("<root@gnu.org.ua>") => "root@gnu.org.ua"
          dequote("root@gnu.org.ua") => "root@gnu.org.ua"
          dequote("there>") => "there>"
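
     The rule can be summarized by a short Python model.  This is a
     sketch for illustration, assuming that "enclosed" means the string
     starts with '<' and ends with '>'; the real function may apply
     further checks.

```python
def dequote(s):
    """Model of MFL dequote(): strip one pair of enclosing angle
    brackets, or return the argument unchanged."""
    if len(s) >= 2 and s.startswith("<") and s.endswith(">"):
        return s[1:-1]
    return s


for sample in ("<root@gnu.org.ua>", "root@gnu.org.ua", "there>"):
    print(repr(sample), "=>", repr(dequote(sample)))
```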

 -- Built-in Function: string localpart (string STR)
     Returns the local part of STR if it is a valid email address,
     otherwise returns STR unchanged.

          localpart("gray") => "gray"
          localpart("gray@gnu.org.ua") => "gray"

 -- Built-in Function: string replstr (string S, number N)
     Replicate a string, i.e.  return a string, consisting of S repeated
     N times:

          replstr("12", 3) => "121212"

 -- Built-in Function: string revstr (string S)
     Returns the string composed of the characters from S in reversed
     order:

          revstr("foobar") => "raboof"

 -- Built-in Function: number rindex (string S, string T)
 -- Built-in Function: number rindex (string S, string T, number START)

     Returns the index of the last occurrence of the string T in the
     string S, or -1 if T is not present.

          rindex("string of rings", "ring") => 10

     Optional argument START, if supplied, indicates the position in
     string where to start searching.  E.g.:

          rindex("string of rings", "ring", 10) => 2

     See also *note 'index' built-in function: index-built-in.

 -- Built-in Function: string substr (string STR, number START)
 -- Built-in Function: string substr (string STR, number START, number
          LENGTH)

     Returns a substring of STR that starts at offset START and is at
     most LENGTH characters long.  If LENGTH is omitted, the rest of STR
     is used.

     If LENGTH is greater than the actual length of the string, the
     'e_range' exception is signalled.

          substr("mailfrom", 4) => "from"
          substr("mailfrom", 4, 2) => "fr"

 -- Built-in Function: string substring (string STR, number START,
          number END)
     Returns a substring of STR between offsets START and END,
     inclusive.  Negative END means offset from the end of the string.
     In other words, to obtain a substring from START to the end of the
     string, use 'substring(STR, START, -1)':

          substring("mailfrom", 0, 3) => "mail"
          substring("mailfrom", 2, 5) => "ilfr"
          substring("mailfrom", 4, -1) => "from"
          substring("mailfrom", 4, length("mailfrom") - 1) => "from"
          substring("mailfrom", 4, -2) => "fro"

     This function signals the 'e_range' exception if either START or
     END is outside the string length.
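
     Because both offsets are inclusive and END may be negative, it can
     help to see the arithmetic spelled out.  The Python model below is
     a sketch under two assumptions: a negative END is converted by
     adding the string length, and 'IndexError' stands in for the
     'e_range' exception.

```python
def substring(s, start, end):
    """Model of MFL substring(): characters from offset start through
    offset end, both inclusive; a negative end counts from the end."""
    if end < 0:
        end += len(s)            # -1 becomes the offset of the last character
    if not (0 <= start < len(s)) or not (0 <= end < len(s)):
        raise IndexError("e_range")
    return s[start:end + 1]      # +1 because Python slices exclude the end


print(substring("mailfrom", 0, 3))    # mail
print(substring("mailfrom", 4, -1))   # from
print(substring("mailfrom", 4, -2))   # fro
```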

 -- Built-in Function: string tolower (string STR)

     Returns a copy of the string STR, with all the upper-case
     characters translated to their corresponding lower-case
     counterparts.  Non-alphabetic characters are left unchanged.

          tolower("MAIL") => "mail"

 -- Built-in Function: string toupper (string STR)
     Returns a copy of the string STR, with all the lower-case
     characters translated to their corresponding upper-case
     counterparts.  Non-alphabetic characters are left unchanged.

          toupper("mail") => "MAIL"

 -- Built-in Function: string ltrim (string STR[, string CSET])
     Returns a copy of the input string STR with any leading characters
     present in CSET removed.  If the latter is not given, white space
     is removed (spaces, tabs, newlines, carriage returns, and line
     feeds).

          ltrim("  a string") => "a string"
          ltrim("089", "0") => "89"

     Note the last example.  It shows how 'ltrim' can be used to convert
     decimal numbers in string representation that begins with '0'.
     Normally such strings will be treated as representing octal
     numbers.  If they are indeed decimal, use 'ltrim' to strip off the
     leading zeros, e.g.:

          set dayofyear ltrim(strftime('%j', time()), "0")
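
     The difference between the two readings of a zero-prefixed string
     can be shown with a small Python model.  It is an illustration
     only: the default character set below is an assumption based on
     the description above, and Python's 'int' is used to mimic decimal
     and octal interpretation.

```python
WHITESPACE = " \t\n\r"   # assumed default set: space, tab, newline, carriage return


def ltrim(s, cset=WHITESPACE):
    """Model of MFL ltrim(): drop leading characters that occur in cset."""
    return s.lstrip(cset)


day = ltrim("089", "0")      # "89"
as_decimal = int(day)        # read as a decimal number
as_octal = int("017", 8)     # what a leading zero would otherwise imply
print(day, as_decimal, as_octal)
```

     In this model a string consisting only of zeros is trimmed to an
     empty string, so such input may need separate handling.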

 -- Built-in Function: string rtrim (string STR[, string CSET])
     Returns a copy of the input string STR with any trailing characters
     present in CSET removed.  If the latter is not given, white space
     is removed (spaces, tabs, newlines, carriage returns, and line
     feeds).

 -- Built-in Function: number vercmp (string A, string B)
     Compares two strings as 'mailfromd' version numbers.  The result is
     negative if B precedes A, zero if they refer to the same version,
     and positive if B follows A:

          vercmp("5.0", "5.1") => 1
          vercmp("4.4", "4.3") => -1
          vercmp("4.3.1", "4.3") => -1
          vercmp("8.0", "8.0") => 0
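
     The sign convention is easy to get backwards, so here is a Python
     model of it.  This is a sketch, assuming that versions consist of
     dot-separated decimal numbers only; the real function may accept
     other forms and may treat trailing zero components differently.

```python
def vercmp(a, b):
    """Model of the documented convention: positive if b follows a,
    negative if b precedes a, zero if both name the same version."""
    va = [int(part) for part in a.split(".")]
    vb = [int(part) for part in b.split(".")]
    return (vb > va) - (vb < va)   # lists compare component by component


for pair in (("5.0", "5.1"), ("4.4", "4.3"), ("4.3.1", "4.3"), ("8.0", "8.0")):
    print(pair, "=>", vercmp(*pair))
```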

 -- Library Function: string sa_format_score (number CODE, number PREC)
     Format CODE as a floating-point number with PREC decimal digits:

          sa_format_score(5000, 3) => "5.000"

     This function is convenient for formatting SpamAssassin scores for
     use in message headers and textual reports.  It is defined in
     module 'sa.mf'.

     *Note SpamAssassin: sa, for examples of its use.

 -- Library Function: string sa_format_report_header (string TEXT)
     Format a SpamAssassin report text in order to include it in an RFC
     822 header.  This function selects the score listing from TEXT, and
     prefixes each line with '* '.  Its result looks like:

          *  0.2 NO_REAL_NAME           From: does not include a real name
          *  0.1 HTML_MESSAGE           BODY: HTML included in message

     *Note SpamAssassin: sa, for examples of its use.

 -- Library Function: string strip_domain_part (string DOMAIN, number N)

     Returns at most the last N components of the domain name DOMAIN.
     If N is 0, the function returns DOMAIN.

     This function is defined in the module 'strip_domain_part.mf'
     (*note Modules::).

     Examples:

          require strip_domain_part
          strip_domain_part("puszcza.gnu.org.ua", 2) => "org.ua"
          strip_domain_part("puszcza.gnu.org.ua", 0) => "puszcza.gnu.org.ua"
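
     A Python model of the same behaviour is given below.  It is a
     sketch for illustration, not the code from 'strip_domain_part.mf',
     and it does not model negative values of N.

```python
def strip_domain_part(domain, n):
    """Model of strip_domain_part(): keep at most the last n
    dot-separated components; n == 0 returns the name unchanged."""
    if n == 0:
        return domain
    return ".".join(domain.split(".")[-n:])


print(strip_domain_part("puszcza.gnu.org.ua", 2))    # org.ua
print(strip_domain_part("puszcza.gnu.org.ua", 0))    # puszcza.gnu.org.ua
print(strip_domain_part("puszcza.gnu.org.ua", 10))   # puszcza.gnu.org.ua
```

     The last call shows the "at most" part: asking for more components
     than the name has returns the whole name.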

 -- Library Function: boolean is_ip (string STR)

     Returns 'true' if STR is a valid IPv4 address.  This function is
     defined in the module 'is_ip.mf' (*note Modules::).

     For example:

          require is_ip

          is_ip("1.2.3.4") => 1
          is_ip("1.2.3.x") => 0
          is_ip("blah") => 0
          is_ip("255.255.255.255") => 1
          is_ip("0.0.0.0") => 1

 -- Library Function: string revip (string IP)

     Reverses octets in IP, which must be a valid string representation
     of an IPv4 address.

     Example:

          revip("127.0.0.1") => "1.0.0.127"
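
     Reversed octets are what reverse DNS names are built from.  The
     Python sketch below models the octet reversal and, as an assumed
     use case, builds an 'in-addr.arpa' name from the result; it does
     not validate its argument.

```python
def revip(ip):
    """Model of revip(): reverse the dot-separated octets of an IPv4
    address given as a string."""
    return ".".join(reversed(ip.split(".")))


print(revip("127.0.0.1"))                      # 1.0.0.127
print(revip("192.0.2.1") + ".in-addr.arpa")    # 1.2.0.192.in-addr.arpa
```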

 -- Library Function: string verp_extract_user (string EMAIL, string
          DOMAIN)

     If EMAIL is a valid VERP-style email address for DOMAIN, this
     function returns the user name corresponding to that email.
     Otherwise, it returns an empty string.

          verp_extract_user("gray=gnu.org.ua@tuhs.org", 'gnu\..*')
            => "gray"


File: mailfromd.info,  Node: String formatting,  Next: Character Type,  Prev: String manipulation,  Up: Library

5.3 String formatting
=====================

 -- Built-in Function: string sprintf (string FORMAT, ...)
     The function 'sprintf' formats its arguments according to FORMAT
     (see below) and returns the resulting string.  It takes a varying
     number of parameters, the only mandatory one being FORMAT.

Format string
-------------

The format string is a simplified version of the format argument to C
'printf'-family functions.

   The format string is composed of zero or more "directives": ordinary
characters (not '%'), which are copied unchanged to the output stream;
and "conversion specifications", each of which results in fetching zero
or more subsequent arguments.  Each conversion specification is
introduced by the character '%', and ends with a conversion specifier.
In between there may be (in this order) zero or more "flags", an
optional "minimum field width", and an optional "precision".

   Notice that in practice this means that you should use single quotes
with the FORMAT arguments, to protect conversion specifications from
being recognized as variable references (*note singe-vs-double::).

   No type conversion is done on arguments, so it is important that the
supplied arguments match their corresponding conversion specifiers.  By
default, the arguments are used in the order given, where each '*' and
each conversion specifier asks for the next argument.  If too few
arguments are given, 'sprintf' raises the 'e_range' exception.  One can
also specify explicitly which argument is taken, at each place where an
argument is required, by writing '%M$', instead of '%' and '*M$' instead
of '*', where the decimal integer M denotes the position in the argument
list of the desired argument, indexed starting from 1.  Thus,

         sprintf('%*d', width, num);
and
         sprintf('%2$*1$d', width, num);
are equivalent.  The second style allows repeated references to the same
argument.
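
   The equivalence of the two styles can be tried out in Python, whose
'%' operator accepts the '*' width but not the '%M$' notation.  In the
sketch below the explicit-position form is therefore emulated with
'str.format'; this is an analogy in another language, not MFL code, and
the values of 'width' and 'num' are arbitrary.

```python
width, num = 6, 42

# Sequential style: '*' takes the field width from the next argument.
sequential = '%*d' % (width, num)

# Explicit-position style, emulated with str.format: {1} is the value and
# {0} the width.  Positions count from 0 here, whereas %M$ counts from 1.
positional = '{1:{0}d}'.format(width, num)

print(repr(sequential))
print(repr(positional))
```

   Both expressions yield the number right-justified in a field of six
characters.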

Flag characters
---------------

The character '%' is followed by zero or more of the following "flags":

'#'
     The value should be converted to an "alternate form".  For 'o'
     conversions, the first character of the output string is made zero
     (by prefixing a '0' if it was not zero already).  For 'x' and 'X'
     conversions, a non-zero result has the string '0x' (or '0X' for 'X'
     conversions) prepended to it.  Other conversions are not affected
     by this flag.

'0'
     The value should be zero padded.  For 'd', 'i', 'o', 'u', 'x', and
     'X' conversions, the converted value is padded on the left with
     zeros rather than blanks.  If the '0' and '-' flags both appear,
     the '0' flag is ignored.  If a precision is given, the '0' flag is
     ignored.  Other conversions are not affected by this flag.

'-'
     The converted value is to be left adjusted on the field boundary.
     (The default is right justification.)  The converted value is
     padded on the right with blanks, rather than on the left with
     blanks or zeros.  A '-' overrides a '0' if both are given.

' ' (a space)
     A blank should be left before a positive number (or empty string)
     produced by a signed conversion.

'+'
     A sign ('+' or '-') should always be placed before a number
     produced by a signed conversion.  By default a sign is used only
     for negative numbers.  A '+' overrides a space if both are used.
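
   These flags follow the C conventions, which Python's '%' operator
shares for the cases below, so their effect can be previewed there.
This is an illustration in another language under that assumption, not
output produced by 'mailfromd'.  The '#' flag with the 'o' conversion is
left out, because Python prints a '0o' prefix rather than the single '0'
described above.

```python
samples = {
    "%#x":  255,   # alternate form: result prefixed with 0x
    "%05d": 42,    # zero padding up to the field width
    "%-5d": 42,    # left adjustment, blanks added on the right
    "% d":  42,    # a blank before a positive number
    "%+d":  42,    # a sign is always printed
}

results = {fmt: fmt % value for fmt, value in samples.items()}
for fmt, out in results.items():
    print(f"{fmt!r:8} -> {out!r}")
```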

Field width
-----------

An optional decimal digit string (with nonzero first digit) specifying a
minimum field width.  If the converted value has fewer characters than
the field width, it will be padded with spaces on the left (or right, if
the left-adjustment flag has been given).  Instead of a decimal digit
string one may write '*' or '*M$' (for some decimal integer M) to
specify that the field width is given in the next argument, or in the
M-th argument, respectively, which must be of numeric type.  A negative
field width is taken as a '-' flag followed by a positive field width.
In no case does a non-existent or small field width cause truncation of
a field; if the result of a conversion is wider than the field width,
the field is expanded to contain the conversion result.

Precision
---------

An optional precision, in the form of a period ('.') followed by an
optional decimal digit string.  Instead of a decimal digit string one
may write '*' or '*M$' (for some decimal integer M) to specify that the
precision is given in the next argument, or in the M-th argument,
respectively, which must be of numeric type.  If the precision is given
as just '.', or the precision is negative, the precision is taken to be
zero.  This gives the minimum number of digits to appear for 'd', 'i',
'o', 'u', 'x', and 'X' conversions, or the maximum number of characters
to be printed from a string for the 's' conversion.
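
   The interplay of field width and precision is the same as in C, and
Python's '%' operator follows the same rules for the cases shown here.
The sketch below uses it to illustrate padding, the absence of
truncation by the width, and truncation by the precision.  It is an
analogy in another language, not MFL code.

```python
examples = [
    ('%8s',   ('mail',)),              # padded on the left to 8 characters
    ('%2s',   ('mailfrom',)),          # width never truncates
    ('%.4s',  ('mailfrom',)),          # precision limits a string to 4 characters
    ('%.3d',  (7,)),                   # precision gives the minimum number of digits
    ('%*.*s', (6, 2, 'mailfrom')),     # width and precision taken from arguments
]

outputs = [fmt % args for fmt, args in examples]
for (fmt, _), out in zip(examples, outputs):
    print(f"{fmt!r:8} -> {out!r}")
```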

Conversion specifier
--------------------

A character that specifies the type of conversion to be applied.  The
conversion specifiers and their meanings are:

d
i
     The numeric argument is converted to signed decimal notation.  The
     precision, if any, gives the minimum number of digits that must
     appear; if the converted value requires fewer digits, it is padded
     on the left with zeros.  The default precision is '1'.  When '0' is
     printed with an explicit precision '0', the output is empty.

o
u
x
X
     The numeric argument is converted to unsigned octal ('o'), unsigned
     decimal ('u'), or unsigned hexadecimal ('x' and 'X') notation.  The
     letters 'abcdef' are used for 'x' conversions; the letters 'ABCDEF'
     are used for 'X' conversions.  The precision, if any, gives the
     minimum number of digits that must appear; if the converted value
     requires fewer digits, it is padded on the left with zeros.  The
     default precision is '1'.  When '0' is printed with an explicit
     precision 0, the output is empty.

s
     The string argument is written to the output.  If a precision is
     specified, no more than the specified number of characters are
     written.

%
     A '%' is written.  No argument is converted.  The complete
     conversion specification is '%%'.


File: mailfromd.info,  Node: Character Type,  Next: Email processing functions,  Prev: String formatting,  Up: Library

5.4 Character Type
==================

These functions check whether all characters of STR fall into a certain
character class according to the 'C' ('POSIX') locale(1).  'True' (1) is
returned if they do, 'false' (0) is returned otherwise.  In the latter
case, the global variable 'ctype_mismatch' is set to the index of the
first character that is outside of the character class (characters are
indexed from 0).
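
   The following Python sketch models this contract: a result of 1 or 0,
plus the zero-based index of the first offending character.  It is an
illustration only.  The helper name 'check_class' is hypothetical, the
index is returned instead of being stored in a global variable, and the
treatment of the empty string is an assumption.

```python
import string


def check_class(s, allowed):
    """Return (result, mismatch): result is 1 if every character of s is
    in allowed, else 0; mismatch is the index of the first character
    outside the class, or None when there is none."""
    for i, ch in enumerate(s):
        if ch not in allowed:
            return 0, i
    return 1, None


ALNUM = string.ascii_letters + string.digits   # class tested by isalnum
ALPHA = string.ascii_letters                   # class tested by isalpha

print(check_class("a123", ALNUM))
print(check_class("a.123", ALNUM))
print(check_class("a123", ALPHA))
```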

 -- Built-in Function: boolean isalnum (string STR)
     Checks for alphanumeric characters:

            isalnum("a123") => 1
            isalnum("a.123") => 0 (ctype_mismatch = 1)

 -- Built-in Function: boolean isalpha (string STR)
     Checks for alphabetic characters:

            isalpha("abc") => 1
            isalpha("a123") => 0

 -- Built-in Function: boolean isascii (string STR)
     Checks whether all characters in STR are 7-bit ones, that fit into
     the ASCII character set.

            isascii("abc") => 1
            isascii("ab\0200") => 0

 -- Built-in Function: boolean isblank (string STR)
     Checks if STR contains only blank characters; that is, spaces or
     tabs.

 -- Built-in Function: boolean iscntrl (string STR)
     Checks for control characters.

 -- Built-in Function: boolean isdigit (string STR)
     Checks for digits (0 through 9).

 -- Built-in Function: boolean isgraph (string STR)
     Checks for any printable characters except spaces.

 -- Built-in Function: boolean islower (string STR)
     Checks for lower-case characters.

 -- Built-in Function: boolean isprint (string STR)
     Checks for printable characters including space.

 -- Built-in Function: boolean ispunct (string STR)
     Checks for any printable characters which are not spaces or
     alphanumeric characters.

 -- Built-in Function: boolean isspace (string STR)
     Checks for white-space characters, i.e.: space, form-feed ('\f'),
     newline ('\n'), carriage return ('\r'), horizontal tab ('\t'), and
     vertical tab ('\v').

 -- Built-in Function: boolean isupper (string STR)
     Checks for uppercase letters.

 -- Built-in Function: boolean isxdigit (string STR)
     Checks for hexadecimal digits, i.e.  one of '0', '1', '2', '3',
     '4', '5', '6', '7', '8', '9', 'a', 'b', 'c', 'd', 'e', 'f', 'A',
     'B', 'C', 'D', 'E', 'F'.

   ---------- Footnotes ----------

   (1) Support for other locales is planned for future versions.


File: mailfromd.info,  Node: Email processing functions,  Next: Envelope modification functions,  Prev: Character Type,  Up: Library

5.5 Email processing functions
==============================

 -- Built-in Function: number email_map (string EMAIL)
     Parses EMAIL and returns a bitmap, consisting of zero or more of
     the following flags:

     'EMAIL_MULTIPLE'
          EMAIL has more than one email address.

     'EMAIL_COMMENTS'
          EMAIL has comment parts.

     'EMAIL_PERSONAL'
          EMAIL has personal part.

     'EMAIL_LOCAL'
          EMAIL has local part.

     'EMAIL_DOMAIN'
          EMAIL has domain part.

     'EMAIL_ROUTE'
          EMAIL has route part.

     These constants are declared in the 'email.mf' module.  The
     function 'email_map' returns 0 if its argument is not a valid email
     address.

 -- Library Function: boolean email_valid (string EMAIL)
     Returns 'True' (1) if EMAIL is a valid email address, consisting of
     local and domain parts only.  E.g.:

          email_valid("gray@gnu.org") => 1
          email_valid("gray") => 0
          email_valid('"Sergey Poznyakoff" <gray@gnu.org>') => 0

     This function is defined in 'email.mf' (*note Modules::).


File: mailfromd.info,  Node: Envelope modification functions,  Next: Header modification functions,  Prev: Email processing functions,  Up: Library

5.6 Envelope Modification Functions
===================================

Envelope modification functions set the sender and add or delete
recipient addresses in the message envelope.  This allows MFL scripts
to redirect messages to other addresses.

 -- Built-in Function: void set_from (string EMAIL [, string ARGS])
     Sets envelope sender address to EMAIL, which must be a valid email
     address.  Optional ARGS supply arguments to ESMTP 'MAIL FROM'
     command.

 -- Built-in Function: void rcpt_add (string ADDRESS)
     Add the e-mail ADDRESS to the envelope.

 -- Built-in Function: void rcpt_delete (string ADDRESS)
     Remove ADDRESS from the envelope.

   The following example code uses these functions to implement a simple
alias-like capability:

     prog envrcpt
     do
        /* 'alias' is a reserved word, so the variable is named newaddr */
        string newaddr dbget(aliasdb, $1, "NULL", 1)
        if newaddr != "NULL"
          rcpt_delete($1)
          rcpt_add(newaddr)
        fi
     done


File: mailfromd.info,  Node: Header modification functions,  Next: Body Modification Functions,  Prev: Envelope modification functions,  Up: Library

5.7 Header Modification Functions
=================================

There are two ways to modify message headers in an MFL script.  The
first is to use header actions, described in *note Actions::, and the
second is to use message modification functions.  Compared with the
actions, the functions offer a number of advantages.  For example, using
functions you can construct the name of the header to operate upon (e.g.
by concatenating several arguments), something which is impossible when
using actions.  Moreover, apart from the three basic operations (add,
modify and remove) supported by header actions, header functions allow
you to insert a new header at a particular place.

 -- Built-in Function: void header_add (string NAME, string VALUE)
     Adds a header 'NAME: VALUE' to the message.

     In contrast to the 'add' action, this function lets you construct
     the header name using arbitrary MFL expressions.
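
     As a sketch of such a constructed name (the header name
     'X-Filter-Sender' is hypothetical), the handler below builds the
     name by concatenation and stores the envelope sender in it:

          prog envfrom
          do
            # The name is the concatenation of two illustrative parts.
            header_add("X-Filter-" . "Sender", $1)
          done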

 -- Built-in Function: void header_add (string NAME, string VALUE,
          number IDX)
     This syntax is preserved for backward compatibility.  It is
     equivalent to 'header_insert', which see.

 -- Built-in Function: void header_insert (string NAME, string VALUE,
          number IDX)
     This function inserts a header 'NAME: VALUE' at the IDXth position
     in the internal list of headers maintained by the MTA.  That list
     contains headers added to the message either by the filter or by
     the MTA itself, but not the headers included in the message itself.
     Some of the headers in this list are conditional, e.g. the ones
     added by the 'H?COND?' directive in 'sendmail.cf'.  The MTA
     evaluates them after all header modifications have been done and
     removes those headers whose conditions yield false.  This means
     that the position at which the header added by 'header_insert'
     appears in the final message may differ from IDX.
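
     A minimal sketch, assuming that index 0 denotes the top of the
     list, as it does for the Milter call 'smfi_insheader' (the header
     name and value are illustrative):

          prog eom
          do
            # Ask for the header to be placed first in the MTA's list.
            header_insert("X-Filter", "checked", 0)
          done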

 -- Built-in Function: void header_delete (string NAME [, number INDEX])
     Delete header NAME from the message.  If INDEX is given, delete
     the INDEXth instance of the header NAME.

     Notice the differences between this function and the 'delete'
     action:

       1. It allows you to construct the header name, whereas 'delete'
          requires it to be a literal string.

       2. The optional INDEX argument allows you to select a particular
          header instance to delete.
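
     Both differences can be sketched as follows.  The header names are
     hypothetical, and instances are assumed to be counted from 1:

          prog eom
          do
            # Header name built from an expression.
            header_delete("X-" . "Spam-Flag")
            # Delete only the second 'X-Loop' header.
            header_delete("X-Loop", 2)
          done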

 -- Built-in Function: void header_replace (string NAME, string VALUE [,
          number INDEX])
     Replace the value of the header NAME with VALUE.  If INDEX is
     given, replace INDEXth instance of header NAME.

     Notice the differences between this function and the 'replace'
     action:

       1. It allows you to construct the header name, whereas 'replace'
          requires it to be a literal string.

       2. The optional INDEX argument allows you to select a particular
          header instance to replace.
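
     For instance, the following sketch replaces the value of the second
     instance of a hypothetical 'X-Scanned-By' header, assuming that
     instances are counted from 1:

          prog eom
          do
            header_replace("X-Scanned-By", "mailfromd", 2)
          done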

 -- Library Function: void header_rename (string NAME, string NEWNAME [,
          number IDX])

     Defined in the module 'header_rename.mf'.
     Available only in the 'eom' handler.

     Renames the IDXth instance of header NAME to NEWNAME.  If IDX is
     not given, assumes 1.

     If the specified header or the IDXth instance of it is not present
     in the current message, the function silently returns.  All other
     errors cause a run-time exception.

     The position of the renamed header in the header list is not
     preserved.

     The example below renames 'Subject' header to 'X-Old-Subject':

          require 'header_rename'

          prog eom
          do
            header_rename("Subject", "X-Old-Subject")
          done

 -- Library Function: void header_prefix_all (string NAME [, string
          PREFIX])

     Defined in the module 'header_rename.mf'.
     Available only in the 'eom' handler.

     Renames all headers named NAME by prefixing them with PREFIX.  If
     PREFIX is not supplied, removes all such headers.

     All renamed headers will be placed in a contiguous block in the
     header list.  The absolute position in the header list will change.
     Relative ordering of renamed headers will be preserved.
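
     As a sketch, the following handler prefixes every 'Received-SPF'
     header with 'X-Old-' (both the header name and the prefix are
     chosen only for illustration):

          require 'header_rename'

          prog eom
          do
            header_prefix_all("Received-SPF", "X-Old-")
          done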

 -- Library Function: void header_prefix_pattern (string PATTERN [,
          string PREFIX])

     Defined in the module 'header_rename.mf'.
     Available only in the 'eom' handler.

     Renames all headers with names matching PATTERN (in the sense of
     'fnmatch', *note fnmatches: Special comparisons.) by prefixing them
     with PREFIX.

     All renamed headers will be placed in a contiguous block in the
     header list.  The absolute position in the header list will change.
     Relative ordering of renamed headers will be preserved.

     If called with one argument, removes all headers matching PATTERN.

     For example, to prefix all headers beginning with 'X-Spamd-' with
     an additional 'X-':

          require 'header_rename'

          prog eom
          do
            header_prefix_pattern("X-Spamd-*", "X-")
          done


File: mailfromd.info,  Node: Body Modification Functions,  Next: Message modification queue,  Prev: Header modification functions,  Up: Library

5.8 Body Modification Functions
===============================

Body modification is an experimental feature of MFL.  Version 8.8
provides two functions for that purpose.

 -- Built-in Function: void replbody (string TEXT)
     Replace the body of the message with TEXT.  Notice that TEXT must
     not contain RFC 822 headers.  See the previous section if you want
     to manipulate message headers.

     Example:

          replbody("Body of this message has been removed by the mail filter.")

     No restrictions are imposed on the format of TEXT.

 -- Built-in Function: void replbody_fd (number FD)
     Replaces the body of the message with the content of the stream FD.
     Use this function if the body is very big, or if it is returned by
     an external program.

     Notice that this function starts reading from the current position
     in FD.  Use 'rewind' if you wish to read from the beginning of the
     stream.

     The example below shows how to preprocess the body of the message
     using an external program, '/usr/bin/mailproc', which is supposed
     to read the body from its standard input and write the processed
     text to its standard output:

          number fd   # Temporary file descriptor

          prog data
          do
            # Open the temporary file
            set fd tempfile()
          done

          prog body
          do
            # Write the body to it.
            write_body(fd, $1, $2)
          done

          prog eom
          do
            # Use the resulting stream as the stdin to the mailproc
            # command and read the new body from its standard output.
            rewind(fd)
            replbody_fd(spawn("</usr/bin/mailproc", fd))
          done


File: mailfromd.info,  Node: Message modification queue,  Next: Mail header functions,  Prev: Body Modification Functions,  Up: Library

5.9 Message Modification Queue
==============================

Message modification functions described in the previous sections do
not take effect immediately, at the moment they are called.  Instead,
they store the requested changes in the internal "message modification
queue".  These changes are applied at the end of processing, before the
'eom' stage finishes (*note Figure 3.1: milter-control-flow.).

   One important consequence of this mode of operation is that calling
any MTA action (*note Actions::) causes all prior modifications to the
message to be ignored.  That is because after receiving the action
command, the MTA will not call the filter for that message any more.  In
particular, the 'eom' handler will not be called, and the message
modification queue will not be flushed.  While this is logical for such
actions as 'reject' or 'tempfail', it may be quite confusing for
'accept'.  Consider, for example, the following code:

     prog envfrom
     do
       if $1 == ""
         header_add("X-Filter", "foo")
         accept
       fi
     done

   Obviously, the intention was to add an 'X-Filter' header and accept
the message if it was sent from the null address.  What happens in
reality, however, is a bit different: the message is accepted, but no
header is added to it.  If you need to accept the message and retain any
modifications you have made to it, you need to use an auxiliary
variable, e.g.:

     number accepted 0
     prog envfrom
     do
       if $1 == ""
         header_add("X-Filter", "foo")
         set accepted 1
       fi
     done

   Then, test this variable for non-zero value at the beginning of each
subsequent handler, e.g.:

     prog data
     do
       if accepted
         continue
       fi
       ...
     done

   To help you trace such problematic usages of 'accept', 'mailfromd'
emits the following warning:

     RUNTIME WARNING near /etc/mailfromd.mf:36: `accept' causes previous
     message modification commands to be ignored; call mmq_purge() prior
     to `accept', to suppress this warning

   If it is OK to lose all modifications, call 'mmq_purge', as suggested
in this message.

 -- Built-in Function: void mmq_purge ()
     Remove all modification requests from the queue.  This function
     undoes the effect of any of the following functions, if they had
     been called previously: 'rcpt_add', 'rcpt_delete', 'header_add',
     'header_insert', 'header_delete', 'header_replace', 'replbody',
     'quarantine'.
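
     A minimal sketch of the usage suggested by the warning above: the
     handler discards any queued modifications explicitly and then
     accepts messages sent from the null address.

          prog envfrom
          do
            if $1 == ""
              # Losing queued changes is intended here, so purge the
              # queue before 'accept'.
              mmq_purge()
              accept
            fi
          done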


File: mailfromd.info,  Node: Mail header functions,  Next: Mail body functions,  Prev: Message modification queue,  Up: Library

5.10 Mail Header Functions
==========================

 -- Built-in Function: string message_header_encode (string TEXT,
          [string ENC, string CHARSET])
     Encode TEXT in accordance with RFC 2047.  Optional arguments:

     ENC
          Encoding to use.  Valid values are 'quoted-printable', or 'Q'
          (the default) and 'base64', or 'B'.

     CHARSET
          Character set.  By default 'UTF-8'.

     If the function is unable to encode the string, it raises the
     exception 'e_failure'.

     For example:

          set hdr "Keld Jørn Simonsen <keld@dkuug.dk>"
          message_header_encode(hdr, "Q", "ISO-8859-1")
            => "=?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?= <keld@dkuug.dk>"

 -- Built-in Function: string message_header_decode (string TEXT,
          [string CHARSET])
     TEXT must be a header value encoded in accordance with RFC 2047.
     The function returns the decoded string.  If the decoding fails, it
     raises the 'e_failure' exception.  The optional argument CHARSET
     specifies the character set to use (the default is 'UTF-8').

          set hdr "=?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?= <keld@dkuug.dk>"
          message_header_decode(hdr)
           => "Keld Jørn Simonsen <keld@dkuug.dk>"

 -- Built-in Function: string unfold (string TEXT)
     If TEXT is a "folded" multi-line RFC 2822 header value, unfold it.
     If TEXT is a single-line string, return its unchanged copy.

     For example, suppose that the message being processed contained the
     following header:

          List-Id: Send bug reports to
            <some-address@some.net>

     Then, applying 'unfold' to its value(1) will produce:

          Send bug reports to <some-address@some.net>

   ---------- Footnotes ----------

   (1) For example:

     prog header
     do
       echo unfold($2)
     done


File: mailfromd.info,  Node: Mail body functions,  Next: EOM Functions,  Prev: Mail header functions,  Up: Library

5.11 Mail Body Functions
========================

 -- Built-in Function: string body_string (pointer TEXT, number COUNT)
     Converts the first COUNT bytes from the memory location pointed to
     by TEXT into a regular string.

     This function is intended to convert the '$1' argument passed to a
     'body' handler to a regular MFL string.  For more information about
     its use, see *note body handler::.
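
     A short sketch of the conversion, assuming a purely illustrative
     regular expression and rejection policy:

          prog body
          do
            # Convert the current body chunk to an MFL string.
            string chunk body_string($1, $2)
            if chunk matches 'forbidden phrase'
              reject
            fi
          done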

 -- Built-in Function: bool body_has_nulls (pointer TEXT, number COUNT)
     Returns 'True' if the first COUNT bytes of the string pointed to
     by TEXT contain ASCII NUL characters.

     Example:

          prog body
          do
            if body_has_nulls($1, $2)
              reject
            fi
          done