This is mailfromd.info, produced by makeinfo version 6.7 from mailfromd.texi. Published by the Free Software Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA Copyright (C) 2005-2020 Sergey Poznyakoff Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled "GNU Free Documentation License". INFO-DIR-SECTION Email START-INFO-DIR-ENTRY * Mailfromd: (mailfromd). General-purpose mail-filtering software. * mailfromd: (mailfromd) Invocation. Mail Filtering and Real-time Modification daemon. * calloutd: (mailfromd) calloutd. A Stand-Alone Callout Daemon. * mfdbtool: (mailfromd) mfdbtool. Database Management Tool. * mtasim: (mailfromd) mtasim. MTA simulator. * pmult: (mailfromd) pmult. Pmilter multiplexer program. END-INFO-DIR-ENTRY Dedico aquest treball a Lluis Llach, per obrir els nous horitzons.  File: mailfromd.info, Node: Top, Next: Preface, Up: (dir) Mailfromd ********* This edition of the 'Mailfromd Manual', last updated 26 July 2020, documents 'mailfromd' Version 8.8. * Menu: * Preface:: Short description of this manual; brief history and acknowledgments. * Intro:: Introduction to Mailfromd. * Building:: Building the Package. * Tutorial:: Mailfromd Tutorial. * MFL:: The Mail Filtering Language. * Library:: The MFL Library Functions. * Using MFL Mode:: Using the GNU Emacs MFL Mode. * Mailfromd Configuration:: Configuring 'mailfromd'. * Invocation:: How to Start and Stop 'mailfromd'. * MTA Configuration:: Using 'mailfromd' with Various MTAs * calloutd:: A Stand-Alone Callout Daemon. * mfdbtool:: A Database Management Tool. * mtasim:: An MTA simulator. * pmult:: Pmilter multiplexer program. * Reporting Bugs:: How to Report a Bug. 
Appendices * Gacopyz:: * Time and Date Formats:: * s-expression:: * Upgrading:: * Copying This Manual:: The GNU Free Documentation License. * Concept Index:: Index of Concepts. -- The Detailed Node Listing -- Preface * History:: Short 'mailfromd' history. * Acknowledgments:: Acknowledgments. Introduction to 'mailfromd' * Conventions:: Typographical conventions. * Overview:: Mailfromd at a first glance * SAV:: Principles of Sender Address Verification. * Rate Limit:: Controlling Mail Sending Rate. * SPF:: SPF, DKIM, and others. Sender Address Verification. * Limitations:: Tutorial * Start Up:: * Simplest Configurations:: * Conditional Execution:: * Functions and Modules:: * Domain Name System:: * Checking Sender Address:: * SMTP Timeouts:: * Avoiding Verification Loops:: * HELO Domain:: * rset:: * Controlling Number of Recipients:: * Sending Rate:: * Greylisting:: * Local Account Verification:: * Databases:: * Testing Filter Scripts:: * Run Mode:: * Logging and Debugging:: * Runtime errors:: * Notes:: Databases * Database Formats:: * Basic Database Operations:: * Database Maintenance:: Run Mode * top-block:: The Top of a Script File. * getopt:: Parsing Command Line Arguments. Mail Filtering Language * Comments:: Comments. * Pragmas:: Pragmatic comments. * Data Types:: * Numbers:: * Literals:: * Here Documents:: * Sendmail Macros:: * Constants:: * Variables:: * Back references:: * Handlers:: * begin/end:: * Functions:: Functions. * Expressions:: Expressions. * Shadowing:: Variable and Constant Shadowing. * Statements:: * Conditionals:: Conditional Statements. * Loops:: Loop Statements. * Exceptions:: Exceptional Conditions and their Handling. * Polling:: Sender Verification Tests. * Modules:: Modules are Collections of Useful Functions. * Preprocessor:: Input Text Is Preprocessed. * Filter Script Example:: A Working Filter Script Explained. * Reserved Words:: A Reference List of Reserved Words. Pragmatic comments * prereq:: Pragma prereq. 
* stacksize:: Pragma stacksize. * regex:: Pragma regex. * dbprop:: Pragma dbprop. * greylist:: Pragma greylist. * miltermacros:: Pragma miltermacros. * provide-callout:: Pragma provide-callout. Constants * Built-in constants:: Variables * Predefined variables:: Functions * Some Useful Functions:: Expressions * Constant expressions:: String and Numeric Constants. * Function calls:: A Function Call is an Expression. * Concatenation:: String Concatenation. * Arithmetic operations:: '+', '-', etc. * Bitwise shifts:: '<<' and '>>'. * Relational expressions:: '=', '<', etc. * Special comparisons:: 'matches', 'mx matches', etc. * Boolean expressions:: 'and', 'or', 'not'. * Precedence:: How various operators nest. * Type casting:: Statements * Actions:: Actions control the handling of the mail. * Assignments:: * Pass:: * Echo:: Exceptional Conditions * Built-in Exceptions:: * User-defined Exceptions:: * Catch and Throw:: Modules * module structure:: Declaring Modules * scope of visibility:: * import:: Require and Import The MFL Library Functions * Macro access:: * String manipulation:: * String formatting:: * Character Type:: * Email processing functions:: * Envelope modification functions:: * Header modification functions:: * Body Modification Functions:: * Message modification queue:: * Mail header functions:: * Mail body functions:: * EOM Functions:: * Current Message Functions:: * Mailbox functions:: * Message functions:: * Quarantine functions:: * SMTP Callout functions:: * Compatibility Callout functions:: * Internet address manipulation functions:: * DNS functions:: * Geolocation functions:: * Database functions:: * I/O functions:: * System functions:: * Passwd functions:: * Sieve Interface:: * Interfaces to Third-Party Programs:: * Rate limiting functions:: * Greylisting functions:: * Special test functions:: * Mail Sending Functions:: * Blacklisting Functions:: * SPF Functions:: * DKIM:: * Sockmaps:: * NLS Functions:: * Syslog Interface:: * Debugging Functions:: 
Message Functions * Header functions:: * Message body functions:: * MIME functions:: * Message digest functions:: Interfaces to Third-Party Programs * SpamAssassin:: * DSPAM:: * ClamAV:: DSPAM * flags-dspam:: DSPAM Operation Modes and Flags. * class-dspam:: DSPAM Class and Source Bits. * vars-dspam:: DSPAM Global Variables. DKIM * Setting up a DKIM record:: Configuring 'mailfromd' * conf-types:: Special Configuration Data Types * conf-base:: Base Mailfromd Configuration * conf-server:: Server Configuration * conf-milter:: Milter Connection Configuration * conf-debug:: Logging and Debugging configuration * conf-timeout:: Timeout Configuration * conf-callout:: Call-out Configuration * conf-priv:: Privilege Configuration * conf-database:: Database Configuration * conf-runtime:: Runtime Constants * conf-mailutils:: Standard Mailutils Statements 'Mailfromd' Command Line Syntax * options:: Command Line Options. * Starting and Stopping:: How to Start and Shut Down the Daemon. Command Line Options. * Operation Modifiers:: * General Settings:: * Preprocessor Options:: * Timeout Control:: * Logging and Debugging Options:: * Informational Options:: Using 'mailfromd' with Various MTAs * Sendmail:: * MeTA1:: * Postfix:: 'calloutd' * config-calloutd:: Calloutd Configuration. * invocation-calloutd:: Calloutd Command-Line Options. * protocol-calloutd:: The Callout Protocol. Calloutd Configuration * conf-calloutd-setup:: 'calloutd' General Setup. * conf-calloutd-server:: The 'server' Statement. * conf-calloutd-log:: 'calloutd' Logging. 'mfdbtool' * Invoking mfdbtool:: * Configuring mfdbtool:: 'mtasim' -- a testing tool * interactive mode:: * expect commands:: * traces:: * daemon mode:: * command summary:: * option summary:: Pmilter multiplexer program. * pmult configuration:: * pmult example:: * pmult invocation:: Pmult Configuration * pmult-conf:: Multiplexer Configuration. * pmult-macros:: Translating MeTA1 macros. * pmult-client:: Pmult Client Configuration. 
* pmult-debug::         Debugging Pmult.

Upgrading

* 870-880::             Upgrading from 8.7 to 8.8
* 850-860::             Upgrading from 8.5 to 8.6
* 820-830::             Upgrading from 8.2 to 8.3 (or 8.4)
* 700-800::             Upgrading from 7.0 to 8.0
* 600-700::             Upgrading from 6.0 to 7.0
* 5x0-600::             Upgrading from 5.x to 6.0
* 500-510::             Upgrading from 5.0 to 5.1
* 440-500::             Upgrading from 4.4 to 5.0
* 43x-440::             Upgrading from 4.3.x to 4.4
* 420-43x::             Upgrading from 4.2 to 4.3.x
* 410-420::             Upgrading from 4.1 to 4.2
* 400-410::             Upgrading from 4.0 to 4.1
* 31x-400::             Upgrading from 3.1.x to 4.0
* 30x-31x::             Upgrading from 3.0.x to 3.1
* 2x-30x::              Upgrading from 2.x to 3.0.x
* 1x-2x::               Upgrading from 1.x to 2.x

File: mailfromd.info, Node: Preface, Next: Intro, Prev: Top, Up: Top

Preface
*******

   Simple Mail Transfer Protocol (SMTP), the standard for email transmission across the Internet, was designed in the good old days when nobody could even imagine e-mail being abused to send tons of unsolicited messages of dubious content. Therefore it lacks mechanisms that could have prevented this abuse ("spamming"), or at least made it difficult. Attempts to introduce such mechanisms, such as the SMTP-AUTH extension (http://tools.ietf.org/html/rfc2554), are being made, but they are not in wide use yet and, probably, their introduction will not be enough to stop e-mail abuse.

   Spamming is today's grim reality, and developers spend a lot of time and effort designing new protection measures against it. 'Mailfromd' is one such attempt.

   The package is designed to work with any MTA supporting the 'Milter' or 'Pmilter' protocol, such as 'Sendmail', 'MeTA1' or 'Postfix'. It allows you to:

   * Control whether messages come from trustworthy senders, using the so-called "callout" or "Sender Address Verification" (*note SAV::) mechanism.

   * Prevent emails coming from forged addresses by use of the SPF mechanism (*note SPF Functions::).

   * Limit connection and/or sending rates (*note Rate Limit::).
   * Use "black-", "white-" and "greylisting" techniques.

   * Invoke external programs or other mail filters.

* Menu:

* History::             Short 'mailfromd' history.
* Acknowledgments::     Acknowledgments.

File: mailfromd.info, Node: History, Next: Acknowledgments, Up: Preface

Short history of 'mailfromd'.
=============================

   The idea of the utility appeared in 2005, and its first version appeared soon afterward. Back then it was a simple implementation of Sender Address Verification (*note SAV::) for 'Sendmail' (hence its name - 'mailfromd') with rudimentary tuning possibilities.

   After a short run on my mail servers, I discovered that the utility was not flexible enough. It took less than a month to implement a configuration file that allowed the user to control program and data flow during the 'envfrom' SMTP state. The new version, 1.0, appeared in June, 2005.

   The next major release, 1.2 (1.1 contained mostly bugfixes), appeared two months later, and introduced "mail sending rate" control (*note Rate Limit::).

   The program evolved during the next year, and version 2.0 was released in September, 2006. This version was a major change in the main idea of the program. The configuration file became a flexible filter script allowing the operator to control almost all SMTP states. The program supplied in the script file was compiled into pseudo-code at startup, and this code was subsequently evaluated each time the filter was invoked. This caused a considerable speed-up in comparison with the previous versions, where the run-time evaluator traversed the parse tree. This version also introduced (implicitly, at the time) two separate data types for the entities declared in the script, which also played a role in the speed improvement (in the previous versions all data were considered strings).
Many improvements were made in the filter language (MFL, *note MFL::) itself, such as user-defined functions, the 'switch' statement, the 'catch' statement for handling run-time errors, etc. The set of built-in functions was extended considerably. A testsuite (using DejaGNU) was introduced in this version.

   During this initial development period the limitations imposed by the 'libmilter' implementation became obvious. Finally, I felt they were stopping further development, and decided that 'mailfromd' should use its own 'Milter' implementation. This new library, 'libgacopyz', was the main new feature of the 3.0 release, which came out in November, 2006. Another major feature was the '--dump-macros' option and 'macros' to the 'rc.mailfromd' script, which were intended to facilitate the configuration on the 'Sendmail' side.

   The development of the 3.x (more properly, 3.1.x) series concentrated mainly on bug-fixes, while the main development was done on the next branch.

   Version 4.0 appeared on May 12, 2007. A full list of changes in this release is more than 500 lines long, so it is impractical to list them here. In particular, this version introduced many new features in MFL syntax and in the library of useful MFL functions. The runtime engine was also improved; in particular, stack space became expandable, which eliminated many run-time errors. This version also provided a foundation for the MFL module system. The code generation was re-implemented to facilitate the introduction of object files in future versions. Other new features in this release include SPF support and the 'mtasim' utility -- an MTA simulator designed for testing 'mailfromd' scripts (*note mtasim::). The test suite in this version was made portable by rewriting it in Autotest.

   Another big leap forward was the 5.0 release, which appeared on December 26, 2008.
It greatly enriched the set of available functions (61 new functions were introduced, which amounts to 41% of all the functions available in the 5.0 release) and introduced several improvements in MFL itself. Among others, function aliases and optional arguments in user-defined functions were introduced in this release. The new "run operation mode" made it possible to execute arbitrary MFL functions from the command line. This release also raised the Mailutils version requirement to at least 2.0.

   Version 6.0, which was released on 12 December, 2009, introduced a full-fledged modular system, akin to that of Python, and quite a few improvements to the language, such as explicit type casts, the concatenation operator, static variables, etc.

   Starting from version 7.0, the focus of further development of 'mailfromd' has shifted. While previously it had been regarded as a mail-filtering server, since then it has been developed as a system for extending MTA functionality in the broad sense, mail filtering being only one of the features it provides.

   Version 7.0 makes the MFL syntax more consistent and the language itself more powerful. For example, it is no longer necessary to use prefixes before variables to dereference them. The new 'try--catch' construct allows for elegant handling of exceptions and errors. User-defined exceptions provide a way of programming complex loops and recursions with non-local exits.

   This version introduces the concept of a dedicated callout server. This allows 'mailfromd' to defer verifications until a later time if the remote server does not respond within a reasonably short period of time (*note SMTP Timeouts::).

   Six years later, version 8.0 was released. This version was a major rewrite of the mailfromd codebase. It introduced a separate callout daemon that made it possible to separate the mailfromd server machine from the machines performing callout checks. The MFL language was extended by a number of built-in functions.
   Since version 8.3 (2017-11-02) 'mailfromd' uses 'adns'(1) for DNS queries.

   Version 8.7, released in July, 2020, introduced DKIM support.

   ---------- Footnotes ----------

   (1)

File: mailfromd.info, Node: Acknowledgments, Prev: History, Up: Preface

Acknowledgments
===============

   Many people need to be thanked for their assistance in developing and debugging 'mailfromd'. After S. C. Johnson, I can say that this program "owes much to a most stimulating collection of users, who have goaded me beyond my inclination, and frequently beyond my ability in their endless search for "one more feature". Their irritating unwillingness to learn how to do things my way has usually led to my doing things their way; most of the time, they have been right."

   A real test of a program like 'mailfromd' can only be done under the conditions of a production environment. A decision to try it in these conditions is by no means an easy one: it requires courage and good faith in the intentions and abilities of the author. To begin with, I would like to thank my contributors for these virtues.

   Jan Rafaj has intrepidly been using 'mailfromd' since its early releases and has invested a lot of effort in improving the program and its documentation. He is the author of many of the MFL library functions shipped with the package. Some of his ideas are still waiting in my implementation queue, while new ones are consistently arriving.

   Peter Markeloff patiently tested every 'mailfromd' release and helped discover and fix many bugs.

   Zeus Panchenko contributed many ideas and gave lots of helpful comments. He offered invaluable help in debugging and testing 'mailfromd' on the FreeBSD platform.

   Sergey Afonin proposed many improvements and new ideas. He also invested a lot of his time in finding bugs and testing bugfixes.

   John McEleney and Ben McKeegan contributed the token bucket filter implementation (*note TBF::).
   Con Tassios helped to find and fix various bugs and contributed the new implementation of the 'greylist' function (*note greylisting types::).

   The following people (in alphabetical order) provided bug reports and helpful comments for various versions of the program: Alan Dobkin, Brent Spencer, Jeff Ballard, Nacho González López, Phil Miller, Simon Christian, Thomas Lynch.

File: mailfromd.info, Node: Intro, Next: Building, Prev: Preface, Up: Top

1 Introduction to 'mailfromd'
*****************************

   'Mailfromd' is a general-purpose mail filtering daemon and a suite of accompanying utilities for 'Sendmail'(1), 'MeTA1'(2), 'Postfix'(3) or any other MTA that supports the 'Milter' (or 'Pmilter') protocol. It is able to filter both incoming and outgoing messages using a filter program written in the "mail filtering language" (MFL). The daemon interfaces with the MTA using the 'Milter' protocol.

   The name 'mailfromd' can be thought of as an abbreviation for '_Mail_ _F_iltering and _R_untime _M_odification' _D_aemon, with an 'o' for itself. Historically, it stemmed from the fact that the original implementation was a simple filter implementing the "sender address verification" technique. Since then the program has changed dramatically, and now it is actually a language translator and run-time evaluator providing a set of built-in and library functions for filtering electronic mail.

   The first part of this manual is an overview, describing the features 'mailfromd' offers in general.

   The second part is a tutorial, which provides an introduction for those who have not used 'mailfromd' previously. It moves from topic to topic in a logical, progressive order, building on information already explained. It offers only the principal information needed to master basic practical usage of 'mailfromd', while omitting many subtleties.

   The other parts are meant to be used as a reference for those who know 'mailfromd' well enough, but need to look up some notions from time to time.
Each chapter presents everything that needs to be said about a specific topic.

   The manual assumes that the reader has a good knowledge of the SMTP protocol and of the mail transport system they use ('Sendmail', 'Postfix' or 'MeTA1').

* Menu:

* Conventions::         Typographical conventions.
* Overview::            Mailfromd at a first glance
* SAV::                 Principles of Sender Address Verification.
* Rate Limit::          Controlling Mail Sending Rate.
* SPF::                 SPF, DKIM, and others.

   ---------- Footnotes ----------

   (1) See

   (2) See

   (3) See

File: mailfromd.info, Node: Conventions, Next: Overview, Up: Intro

1.1 Typographical conventions
=============================

   This manual is written using Texinfo, the GNU documentation formatting language. The same set of Texinfo source files is used to produce both the printed and online versions of the documentation. This section briefly documents the typographical conventions used in this manual.

   Examples you would type at the command line are preceded by the common shell primary prompt, '$'. The command itself is printed 'in this font', and the output it produces 'in this font', for example:

     $ mailfromd --version
     mailfromd (mailfromd 8.8)

   In the text, command names are printed 'like this', and command line options are displayed in 'this font'. Some notions are emphasized _like this_, and if a point needs to be made strongly, it is done *this way*. The first occurrence of a new term is usually its "definition" and appears in the same font as the previous occurrence of "definition" in this sentence. File names are indicated like this: '/path/to/ourfile'.

   Variable names are represented LIKE THIS; keywords and fragments of program text are written in 'this font'.

File: mailfromd.info, Node: Overview, Next: SAV, Prev: Conventions, Up: Intro

1.2 Overview of Mailfromd
=========================

   In contrast to most existing milter filters, 'mailfromd' does not implement any default filtering policies.
Instead, it depends entirely on a "filter script", supplied to it by the administrator. The script, written in a specialized and simple-to-use language called MFL (*note MFL::), is supposed to run a set of tests and to decide whether the message should be accepted by the MTA or not. To perform the tests, the script can examine the values of 'Sendmail' macros, use an extensive set of built-in and library functions, and invoke user-defined functions.

File: mailfromd.info, Node: SAV, Next: Rate Limit, Prev: Overview, Up: Intro

1.3 Sender Address Verification.
================================

   "Sender address verification", or "callout", is one of the basic mail verification techniques implemented by 'mailfromd'. It consists of probing each MX server for the given address until one of them gives a definite (positive or negative) reply. Using this technique you can block a sender address if it is not deliverable, thereby cutting off a large amount of spam. It can also be useful to block mail for undeliverable recipients, for example on a mail relay host that does not have a list of all the valid recipient addresses. This prevents undeliverable junk mail from entering the queue, so that your MTA doesn't have to waste resources trying to send 'MAILER-DAEMON' messages back.

   Let's illustrate how it works with an example:

   Suppose that a user with an address in the domain 'somedomain.net' is trying to send mail to one of your local users. The remote machine connects to your MTA and issues a 'MAIL FROM' command giving that address. However, your MTA does not have to take its word for it, so it uses 'mailfromd' to verify the validity of the sender address. 'Mailfromd' extracts the domain name from the address ('somedomain.net') and queries DNS about 'MX' records for that domain. Suppose it receives the following list:

     10 relay1.somedomain.net
     20 relay2.somedomain.net

   It then connects to the first MX server, using the SMTP protocol, as if it were going to send a message to the sender's address. This is called sending a "probe message".
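The lookup steps described so far can be sketched in shell as follows. This is only an illustration: the sender address is hypothetical, the MX list is written out literally (and deliberately out of order) rather than fetched from DNS, and 'mailfromd' performs the equivalent steps internally.

```shell
# Hypothetical sender address; only the domain comes from the example above.
address='user@somedomain.net'

# Keep everything after the last '@' to obtain the domain part.
domain=${address##*@}

# MX records as "preference host" pairs, deliberately listed out of order.
mx_list='20 relay2.somedomain.net
10 relay1.somedomain.net'

# The record with the lowest preference value is the one probed first.
first_mx=$(printf '%s\n' "$mx_list" | sort -n | head -n 1 | cut -d' ' -f2)

echo "domain:   $domain"
echo "first MX: $first_mx"
```

With the list above, the sketch selects 'relay1.somedomain.net', the server with preference 10.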
If the server accepts the recipient address, 'mailfromd' accepts the incoming mail. Otherwise, if the server rejects the address, the mail is rejected as well. If the MX server cannot be reached, 'mailfromd' selects the next server from the list and continues this process until it finds the answer or the list of servers is exhausted.

   The "probe message" is like a normal mail except that no data are ever sent. The probe message transaction in our example might look as follows ('S:' meaning messages sent by the remote MTA, 'C:' meaning those sent by 'mailfromd'; SENDER-ADDRESS stands for the address being verified):

     C: HELO mydomain.net
     S: 250 OK, nice to meet you
     C: MAIL FROM: <>
     S: 250 <>: Sender OK
     C: RCPT TO: <SENDER-ADDRESS>
     S: 250 <SENDER-ADDRESS>: Recipient OK
     C: QUIT

   Probe messages are never delivered, deferred or bounced; they are always discarded.

   The method of address verification described above is called the "standard" method throughout this document. 'Mailfromd' also implements a method we call "strict". When using the strict method, 'mailfromd' first resolves the IP address of the sender machine to a fully qualified domain name. Then it obtains 'MX' records for this machine, and proceeds with probing as described above.

   So, the difference between the two methods is in the set of 'MX' records that are being probed: the standard method queries 'MX's based on the sender email domain, while the strict method works with 'MX's for the sender IP address.

   The strict method cuts off a much larger amount of spam, although it does have many drawbacks. Returning to our example above, consider the following situation: the sender address is perfectly normal, but it is being used by a spammer from some other domain, say 'otherdomain.com'. The standard method is not able to cope with such cases, whereas the strict one is.

   An alert reader will ask: what happens if 'mailfromd' is not able to get a definite answer from any of the MX servers?
Actually, it depends entirely on how you instruct it to act in this case, but the general practice is to return a temporary failure, which will urge the remote party to retry sending their message later.

   After receiving a definite answer, 'mailfromd' will cache it in its database, so that the next time your MTA receives a message from that address (or from the sender IP/email address pair, for the strict method), it will not waste time trying to reach the MX servers again. The records remain in the cache database for a certain time, after which they are discarded.

* Menu:

* Limitations::

File: mailfromd.info, Node: Limitations, Up: SAV

1.3.1 Limitations of Sender Address Verification
------------------------------------------------

   Before deciding whether and how to use sender address verification, you should be aware of its limitations.

   Both standard and strict methods suffer from the following limitations:

   * The sender verification methods will perform poorly on highly loaded sites. The traffic and/or resource usage overhead may not be acceptable for you. However, you may experiment with various 'mailfromd' options to find an optimal configuration.

   * Some sites may blacklist your MTA if it probes them too often. 'Mailfromd' mitigates this drawback by using a "cache database", which keeps the results of recent callouts.

   * When verifying the remote address, no attempt to actually deliver the message is made. If the remote MTA accepts the address, 'mailfromd' assumes it is OK. However, in reality, mail for a remote address can bounce _after_ the nearest MTA accepts the recipient address.

     This drawback can often be avoided by combining sender address verification with greylisting (*note Greylisting::).

   * If the remote server rejects the address, no attempt is made to discern between various reasons for rejection (client rejected, 'HELO' rejected, 'MAIL FROM' rejected, etc.)
   * Some major sites such as 'yahoo.com' do not reject unknown addresses in reply to the 'RCPT TO' command, but report a delivery failure in response to the end of 'DATA' after a message is transferred. Of course, sender address verification does not work with such sites. However, a combination of address verification and greylisting (*note Greylisting::) may be a good choice in such cases.

   In addition, strict verification breaks delivery of forwarded mail. This is obvious, since mail forwarding is based on delivering an unmodified message to another location, so the sender address domain will most probably not be the same as that of the MTA doing the forwarding.

File: mailfromd.info, Node: Rate Limit, Next: SPF, Prev: SAV, Up: Intro

1.4 Controlling Mail Sending Rate.
==================================

   "Mail Sending Rate" for a given identity is defined as the number of messages with this identity received within a predefined interval of time.

   MFL offers a set of functions for limiting mail sending rate (*note Rate limiting functions::), and for controlling broader rate aspects, such as data transfer rates (*note TBF::).

File: mailfromd.info, Node: SPF, Prev: Rate Limit, Up: Intro

1.5 SPF, DKIM, and others
=========================

   "Sender Policy Framework", or SPF for short, is an extension to the SMTP protocol that makes it possible to identify forged identities supplied with the 'MAIL FROM' and 'HELO' commands. The framework is explained in detail in RFC 4408 and on the SPF Project Site (http://www.openspf.org/). Mailfromd provides a set of functions for using SPF to control mail flow. These are described in *note SPF Functions::.

   "DomainKeys Identified Mail" (DKIM) is an email authentication method designed to detect forged sender addresses in emails. Mailfromd supports both DKIM signing and verification. *Note DKIM::, for a detailed description of these features.
   Mailfromd also provides support for several third-party spam-abatement programs, in particular 'SpamAssassin', 'ClamAV', and DSPAM. These are discussed in *note Interfaces to Third-Party Programs::.

File: mailfromd.info, Node: Building, Next: Tutorial, Prev: Intro, Up: Top

2 Building the Package
**********************

   This chapter contains a detailed list of the steps you need to take in order to configure and build the package.

  1. Make sure you have the necessary software installed.

     To build 'mailfromd' you will need to have the following packages on your machine:

       A. GNU mailutils version 3.3 or newer.

          GNU mailutils is a general-purpose library for handling electronic mail. It is available from .

       B. GNU adns library, version 1.5.1 or newer.

          GNU adns is an advanced DNS client library. The recent version can be downloaded from . Visit , for more information.

       C. A DBM library. 'Mailfromd' is able to link with any flavor of DBM supported by GNU mailutils. As of version 8.8 it will refuse to build without DBM.

          By default, 'configure' will try to find the best implementation installed on your machine (preference is given to Berkeley DB) and will use it. You can, however, explicitly specify which implementation you want to use. To do so, use the '--with-dbm' configure option. Its argument specifies the "type" of database to use. It must be one of the types supported by GNU mailutils. At the time of this writing, these are:

          bdb
               Berkeley DB (versions 2 to 6).
          gdbm
               GNU DBM.
          kc
               Kyoto Cabinet
          tc
               Tokyo Cabinet
          ndbm
               NDBM

          To check what database types are supported by your version of mailutils, run the following command:

               $ mailutils dbd
               gdbm kc tc ndbm

          For backward compatibility, 'configure' accepts the following two options:

          '--with-gdbm'
               Same as '--with-dbm=gdbm'.
          '--with-berkeley-db'
               Same as '--with-dbm=bdb'.

          For 'Sendmail' users, it often makes sense to configure 'mailfromd' to use the same database flavor as 'sendmail'. The following table will help you do that.
          The column 'DB type' lists the types of DBM databases supported by 'mailfromd'. The column 'confMAPDEF' lists the value of the 'confMAPDEF' Sendmail configuration macro corresponding to that database type. The column 'configure option' contains the corresponding option to configure.

          DB type        confMAPDEF     configure option
          ------------------------------------------------
          NDBM           '-DNDBM'       '--with-dbm=ndbm'
          Berkeley DB    '-DNEWDB'      '--with-dbm=bdb'
          GDBM           N/A            '--with-dbm=gdbm'

  2. Decide what user privileges will be used to run 'mailfromd'.

     After startup, the program drops root privileges. By default, it switches to the privileges of user 'mail', group 'mail'. If there is no such user on your system, or you wish to use another user account for this purpose, override it using the DEFAULT_USER environment variable. For example, for 'mailfromd' to run as user 'nobody', use

          ./configure DEFAULT_USER=nobody

     The user name can also be changed at run-time (*note --user::).

  3. Decide where to install 'mailfromd' and where its filter script and data files will be located.

     As usual, the default value for the installation prefix is '/usr/local'. If it does not suit you, specify another location using the '--prefix' option, e.g.: '--prefix=/usr'.

     During the installation phase, the build system will install several files. These files are:

     'PREFIX/sbin/mailfromd'
          Main daemon. *Note mailfromd: Invocation.

     'PREFIX/etc/mailfromd.mf'
          Default main filter script file. It is installed only if it is not already there. Thus, if you are upgrading to a newer version of 'mailfromd', your old script file will be preserved with all your changes.

          *Note MFL::, for a description of the mail filtering language.

     'PREFIX/share/mailfromd/8.8/*.mf'
          MFL modules. *Note Modules::.

     'PREFIX/info/mailfromd.info*'
          Documentation files.

     'PREFIX/bin/mtasim'
          MTA simulator program for testing 'mailfromd' scripts. *Note mtasim::.

     'PREFIX/sbin/pmult'
          Pmilter multiplexer for 'MeTA1'. *Note pmult::.
          It is built only if 'MeTA1' version 'PreAlpha29.0' or newer is installed on the system. You may disable it by using the '--disable-pmilter' command line option.

          When testing for 'MeTA1' presence, 'configure' assumes its default location. If it is not found there, inform 'configure' about its actual location by using the following option:

               --enable-pmilter=PREFIX

          where PREFIX stands for the 'MeTA1' installation prefix.

     It is advisable to use the same settings for file name prefixes as those you used when configuring 'mailutils'. In particular, try to use the same '--sysconfdir', since it will facilitate configuring the whole system.

     Another important point is the location of the "local state directory", i.e. a directory where 'mailfromd' keeps its data files (e.g. communication socket, PID-file and database files). By default, its full name is 'LOCALSTATEDIR/mailfromd'. You can change it by setting the 'DEFAULT_STATE_DIR' configuration variable.

     This value can be changed at run-time using the 'state-directory' configuration statement (*note state-directory: conf-base.).

  4. Select default communication socket.

     This is the socket used to communicate with the MTA, in the usual 'Milter' port notation (*note milter port specification::). If the socket name does not begin with a protocol or directory separator, it is assumed to be a UNIX socket, located in the local state directory. The default value is 'mailfrom', which is equivalent to 'unix:LOCALSTATEDIR/mailfromd/mailfrom'.

     To alter this, use the 'DEFAULT_SOCKET' environment variable, e.g.:

          ./configure DEFAULT_SOCKET=inet:999@localhost

     The communication socket can be changed at run time using the '--port' command line option (*note --port::) or the 'listen' configuration statement (*note listen: conf-server.).

  5. Select default expiration interval.

     "Expiration interval" defines the period of time during which a record in the 'mailfromd' database is considered valid. It is described in more detail in *note Databases::.
     The default value is 86400 seconds, i.e. 24 hours. It is OK for most sites. If, however, you wish to change it, use the 'DEFAULT_EXPIRE_INTERVAL' environment variable.

     The 'DEFAULT_EXPIRE_RATES_INTERVAL' variable sets the default expiration time for the mail rate database (*note Rate limiting functions::).

     Expiration settings can be changed at run time using the 'database' statement in the 'mailfromd' configuration file (*note conf-database::).

  6. Select a 'syslog' implementation to use.

     'Mailfromd' uses 'syslog' for diagnostics output. The default 'syslog' implementation on most systems (most notably, on GNU/Linux) uses blocking 'AF_UNIX SOCK_DGRAM' sockets. As a result, when an application calls 'syslog()', and 'syslogd' is not responding and the socket buffers get full, the application will hang.

     For 'mailfromd', as for any daemon, it is more important that it continue to run, than that it continue to log. For this purpose, 'mailfromd' is shipped with a non-blocking 'syslog' implementation by Simon Kelley. This implementation, instead of blocking, buffers log lines in memory. When the log buffer overflows, some lines are lost, but the daemon continues to run. When lines are lost, this fact is logged with a message of the form:

          async_syslog overflow: 5 log entries lost

     To enable this implementation, configure the package with the '--enable-syslog-async' option, e.g.:

          ./configure --enable-syslog-async

     Additionally, you can instruct 'mailfromd' to use asynchronous syslog by default. To do so, set 'DEFAULT_SYSLOG_ASYNC' to 1, as shown in the example below:

          ./configure --enable-syslog-async DEFAULT_SYSLOG_ASYNC=1

     You will be able to override these defaults at run-time by using the '--logger' command line option (*note Logging and Debugging::).

  7. Run 'configure' with all the desired options.
     For example, the following command:

          ./configure DEFAULT_SOCKET=inet:999@localhost --with-berkeley-db=3

     will configure the package to use Berkeley DB database, version 3, and 'inet:999@localhost' as the default communication socket.

     At the end of its run 'configure' will print a concise summary of its configuration settings. It looks like this (with the long lines being split for readability):

          *******************************************************************
          Mailfromd configured with the following settings:

          External preprocessor..................... /usr/bin/m4 -s
          DBM version............................... Berkeley DB v. 3
          Default user.............................. mail
          State directory........................... $(localstatedir)/$(PACKAGE)
          Socket.................................... mailfrom
          Expiration interval....................... 86400
          Negative DNS answer expiration interval... 3600
          Rates expire interval..................... 300
          Default syslog implementation............. blocking
          Readline (for mtasim)..................... yes
          Documentation rendition type.............. PROOF
          Enable pmilter support.................... no
          Enable GeoIP support...................... no
          *******************************************************************

     Make sure these settings satisfy your needs. If they do not, reconfigure the package with the right options.

  8. Run 'make'.

  9. Run 'make install'.

 10. Make sure 'LOCALSTATEDIR/mailfromd' has the right owner and mode.

 11. Examine the filter script file ('SYSCONFDIR/mailfromd.mf') and edit it, if necessary.

 12. If you are upgrading from an earlier release of Mailfromd, refer to *note Upgrading::, for detailed instructions.

File: mailfromd.info, Node: Tutorial, Next: MFL, Prev: Building, Up: Top

3 Tutorial
**********

This chapter contains a tutorial introduction, guiding you through various 'mailfromd' configurations, starting from the simplest ones and proceeding up to more advanced forms.
It omits the most complicated details, concentrating mainly on the common practical tasks.

If you are familiar with 'mailfromd', you can skip this chapter and go directly to the next one (*note MFL::), which contains a detailed discussion of the mail filtering language and 'mailfromd' interaction with the Mail Transport Agent.

* Menu:

* Start Up::
* Simplest Configurations::
* Conditional Execution::
* Functions and Modules::
* Domain Name System::
* Checking Sender Address::
* SMTP Timeouts::
* Avoiding Verification Loops::
* HELO Domain::
* rset::
* Controlling Number of Recipients::
* Sending Rate::
* Greylisting::
* Local Account Verification::
* Databases::
* Testing Filter Scripts::
* Run Mode::
* Logging and Debugging::
* Runtime errors::
* Notes::

File: mailfromd.info, Node: Start Up, Next: Simplest Configurations, Up: Tutorial

3.1 Start Up
============

The 'mailfromd' utility runs as a standalone "daemon" program and listens on a predefined communication channel for requests from the "Mail Transfer Agent" (MTA, for short). When processing each message, the MTA establishes communication with 'mailfromd', and goes through several states, collecting the necessary data from the sender. At each state it sends the relevant information to 'mailfromd', and waits for it to reply. The 'mailfromd' filter receives the message data through "Sendmail macros" and runs a "handler program" defined for the given state. The result of this run is a "response code", that it returns to the MTA. The following response codes are defined:

'continue'
     Continue message processing.

'accept'
     Accept this message for delivery. After receiving this code the MTA continues processing this message without further consulting the 'mailfromd' filter.

'reject'
     Reject this message. The message processing stops at this stage, and the sender receives the reject reply ('5XX' reply code). No further 'mailfromd' handlers are called for this message.

'discard'
     Silently discard the message.
     This means that the MTA will continue processing this message as if it were going to deliver it, but will discard it after receiving it. No further interaction with 'mailfromd' occurs.

'tempfail'
     Temporarily reject the message. The message processing stops at this stage, and the sender receives the 'temporary failure' reply ('4XX' reply code). No further 'mailfromd' handlers are called for this message.

The instructions on how to process the message are supplied to 'mailfromd' in its "filter script file". It is normally called '/usr/local/etc/mailfromd.mf' (but can be located elsewhere, *note Invocation::) and contains a set of "milter state handlers", or subroutines to be executed in various SMTP states. Each interaction state can be supplied its own handling procedure. A missing procedure implies the 'continue' response code.

The filter script can define up to nine "milter state handlers", called after the names of milter states: 'connect', 'helo', 'envfrom', 'envrcpt', 'data', 'header', 'eoh', 'body', and 'eom'. The 'data' handler is invoked only if the MTA uses Milter protocol version 3 or later. Two special handlers are available for initialization and clean-up purposes: 'begin' is called before the processing starts, and 'end' is called after it is finished. The diagram below shows the control flow when processing an SMTP transaction. Lines marked with 'C:' show SMTP commands issued by the remote machine (the "client"), those marked with '=>' show called handlers with their arguments.
An '[R]' appearing at the start of a line indicates that this part of the transaction can be repeated any number of times:

     => begin()
     => connect(HOSTNAME, FAMILY, PORT, 'IP address')
     C: HELO DOMAIN
     => helo(DOMAIN)
     for each message transaction
     do
         C: MAIL FROM SENDER
         => envfrom(SENDER)
     [R] C: RCPT TO RECIPIENT
         => envrcpt(RECIPIENT)
         C: DATA
         => data()
     [R] C: HEADER: VALUE
         => header(HEADER, VALUE)
         C:
         => eoh()
     [R] C: BODY-LINE
         => /* Collect lines into blocks BLK of
         =>  * at most LEN bytes and for each
         =>  * such block call:
         =>  */
         => body(BLK, LEN)
         C: .
         => eom()
     done
     => end()

Figure 3.1: Mailfromd Control Flow

This control flow is maintained for as long as each called handler returns 'continue' (*note Actions::). Otherwise, if any handler returns 'accept' or 'discard', the message processing continues, but no other handler is called. In the case of 'accept', the MTA will accept the message for delivery, in the case of 'discard' it will silently discard it.

If any of the handlers returns 'reject' or 'tempfail', the result depends on the handler. If this code is returned by the 'envrcpt' handler, it causes this particular recipient address to be rejected. When returned by any other handler, it causes the whole message to be rejected.

The 'reject' and 'tempfail' actions executed by the 'helo' handler do not take effect immediately. Instead, their action is deferred until the next SMTP command from the client, which is usually 'MAIL FROM'.

File: mailfromd.info, Node: Simplest Configurations, Next: Conditional Execution, Prev: Start Up, Up: Tutorial

3.2 Simplest Configurations
===========================

The 'mailfromd' script file contains a series of "declarations" of the handler procedures. Each declaration has the form:

     prog NAME
     do
       ...
     done

where 'prog', 'do' and 'done' are the "keywords", and NAME is the state name for this handler. The dots in the above example represent the actual "code", or a set of commands, instructing 'mailfromd' how to process the message.
For example, the declaration:

     prog envfrom
     do
       accept
     done

installs a handler for the 'envfrom' state, which always approves the message for delivery, without any further interaction with 'mailfromd'.

The word 'accept' in the above example is an "action". An "action" is a special language statement that instructs the run-time engine to stop execution of the program and to return a response code to 'Sendmail'. There are five actions, one for each response code: 'continue', 'accept', 'reject', 'discard', and 'tempfail'. Among these, 'reject' and 'tempfail' can optionally take one to three arguments. There are two ways of supplying the arguments.

In the first form, called "literal" or "traditional" notation, the arguments are supplied as additional words after the action name, separated by whitespace. The first argument is a three-digit RFC 2821 reply code. It must begin with '5' for 'reject' and with '4' for 'tempfail'. If two arguments are supplied, the second argument must be either an "extended reply code" (RFC 1893/2034) or a textual string to be returned along with the SMTP reply. Finally, if all three arguments are supplied, then the second one must be an extended reply code and the third one must supply the textual string. The following examples illustrate all possible ways of using the 'reject' statement in literal notation:

     reject
     reject 503
     reject 503 5.0.0
     reject 503 "Need HELO command"
     reject 503 5.0.0 "Need HELO command"

Please note the quotes around the textual string.

Another form for these actions is called "functional" notation, because it resembles the function syntax. When used in this form, the action word is followed by a parenthesized group of exactly three arguments, separated by commas. The meaning and ordering of the arguments are the same as in literal form. Any of the three arguments may be absent, in which case it will be replaced by the default value.
To illustrate this, here are the statements from the previous example, written in functional notation:

     reject(,,)
     reject(503,,)
     reject(503, 5.0.0)
     reject(503,, "Need HELO command")
     reject(503, 5.0.0, "Need HELO command")

File: mailfromd.info, Node: Conditional Execution, Next: Functions and Modules, Prev: Simplest Configurations, Up: Tutorial

3.3 Conditional Execution
=========================

Programs consisting of a single action are rarely useful. In most cases you will want to do some checking and decide whether to process the message depending on its result. For example, if you do not want to accept messages from the address 'badguy@some.net', you could write the following program:

     prog envfrom
     do
       if $f = "badguy@some.net"
         reject
       else
         accept
       fi
     done

This example illustrates several important concepts. First of all, '$f' in the third line is a "Sendmail macro reference". Sendmail macros are referenced the same way as in 'sendmail.cf', with the only difference that curly braces around macro names are optional, even if the name consists of several letters. The value of a macro reference is always a string.

The equality operator ('=') compares its left and right arguments and evaluates to true if the two strings are exactly the same, or to false otherwise. Apart from equality, you can use the regular relational operators: '!=', '>', '>=', '<' and '<='. Notice that string comparison in 'mailfromd' is always case sensitive. To do case-insensitive comparison, translate both operands to upper or lower case (*Note tolower::, and *note toupper::).

The 'if' statement decides what actions to execute depending on the value its condition evaluates to. Its usual form is:

     if EXPRESSION THEN-BODY [else ELSE-BODY] fi

The THEN-BODY is executed if the EXPRESSION evaluates to 'true' (i.e. to any non-zero value). The optional ELSE-BODY is executed if the EXPRESSION yields 'false' (i.e. zero). Both THEN-BODY and ELSE-BODY can contain other 'if' statements; their nesting depth is not limited.
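
The case-insensitive comparison mentioned above can be sketched as follows. This is only an illustration, assuming that the built-in 'tolower' function (*note tolower::) takes a single string argument and returns its lower-case copy; the address is the same sample address used in this section:

     prog envfrom
     do
       # Lower-case the sender before comparing it with a
       # lower-case literal, so that "BadGuy@Some.Net" matches too.
       if tolower($f) = "badguy@some.net"
         reject
       else
         accept
       fi
     done

Converting only the macro value is sufficient here because the literal on the right-hand side is already written in lower case.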
To facilitate writing complex conditional statements, the 'elif' keyword can be used to introduce alternative conditions, for example:

     prog envfrom
     do
       if $f = "badguy@some.net"
         reject
       elif $f = "other@domain.com"
         tempfail 470 "Please try again later"
       else
         accept
       fi
     done

*Note switch::, for more elaborate forms of conditional branching.

File: mailfromd.info, Node: Functions and Modules, Next: Domain Name System, Prev: Conditional Execution, Up: Tutorial

3.4 Functions and Modules
=========================

Like any programming language, MFL supports the concept of a "function", i.e. a body of code that is assigned a unique name and can be invoked elsewhere as many times as needed.

All functions have a "definition" that introduces types and names of the formal parameters and the result type, if the function is to return a meaningful value (function definitions in MFL are discussed in detail in *note User-Defined Functions: User-defined.).

A function is invoked using a special construct, a "function call":

     NAME (ARG-LIST)

where NAME is the function name, and ARG-LIST is a comma-separated list of expressions. Each expression in ARG-LIST is evaluated, and its type is compared with that of the corresponding formal argument. If the types differ, the expression is converted to the formal argument type. Finally, a copy of its value is passed to the function as a corresponding argument. The order in which the expressions are evaluated is not defined. The compiler checks that the number of elements in ARG-LIST matches the number of mandatory arguments for function NAME.

If the function does not deliver a result, it should only be called as a statement.

Functions may be recursive, even mutually recursive.

'Mailfromd' comes with a rich set of predefined functions for various purposes. There are two basic function classes: "built-in" functions, that are implemented by the MFL runtime environment in 'mailfromd', and "library" functions, that are implemented in MFL.
The built-in functions are always available and no preparatory work is needed before calling them. In contrast, the library functions are defined in "modules", special MFL source files that contain functions designed for a particular task. In order to access a library function, you must first "require" the module it is defined in. This is done using the 'require' statement. For example, the function 'hostname' looks up in the DNS the name corresponding to the IP address specified as its argument. This function is defined in module 'dns.mf', so before calling it you must require this module:

     require dns

The 'require' statement takes a single argument: the name of the requested module (without the '.mf' suffix). It looks up the module on disk and loads it if it is available.

For more information about the module system *Note Modules::.

File: mailfromd.info, Node: Domain Name System, Next: Checking Sender Address, Prev: Functions and Modules, Up: Tutorial

3.5 Domain Name System
======================

Site administrators often do not wish to accept mail from hosts that do not have a proper reverse delegation in the Domain Name System. In the previous section we introduced the library function 'hostname', that looks up in the DNS the name corresponding to the IP address specified as its argument. If there is no corresponding name, the function returns its argument unchanged. This can be used to test if the IP was resolved, as illustrated in the example below:

     require 'dns'

     prog envfrom
     do
       if hostname($client_addr) = $client_addr
         reject
       fi
     done

The 'require' statement loads the module 'dns.mf', after which the definition of 'hostname' becomes available.

A similar function, 'resolve', which resolves the symbolic name to the corresponding IP address, is provided in the same 'dns.mf' module.

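As a variation on the example in this section, the following sketch combines the 'hostname' test with the literal notation of 'reject' described in *note Simplest Configurations::, so that the remote party is told why the message was refused. The reply code '550', the extended code '5.7.1' and the message text are illustrative choices only, not values prescribed by 'mailfromd':

     require 'dns'

     prog envfrom
     do
       # hostname returns its argument unchanged if the address
       # has no corresponding name in the DNS.
       if hostname($client_addr) = $client_addr
         reject 550 5.7.1 "Your IP address does not resolve"
       fi
     done

Since the handler contains no 'else' branch, a client whose address does resolve falls through to the end of the handler, and message processing continues with the next state.
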
File: mailfromd.info, Node: Checking Sender Address, Next: SMTP Timeouts, Prev: Domain Name System, Up: Tutorial

3.6 Checking Sender Address
===========================

A special language construct is provided for verification of sender addresses ("callout"):

     on poll $f
     do
     when success:
       accept
     when not_found or failure:
       reject 550 5.1.0 "Sender validity not confirmed"
     when temp_failure:
       tempfail 450 4.1.0 "Try again later"
     done

The 'on poll' construct runs standard verification (*note standard verification::) for the email address specified as its argument (in the example above it is the value of the Sendmail macro '$f'). The check can result in the following conditions:

'success'
     The address exists.

'not_found'
     The address does not exist.

'failure'
     Some error of permanent nature occurred during the check. The existence of the address cannot be verified.

'temp_failure'
     Some temporary failure occurred during the check. The existence of the address cannot be verified at the moment.

The 'when' branches of the 'on poll' statement introduce statements, that are executed depending on the actual return condition. If any condition occurs that is not handled within the 'on' block, the run-time evaluator will signal an "exception"(1) and return temporary failure, therefore it is advisable to always handle all four conditions. In fact, the condition handling shown in the above example is preferable for most normal configurations: the mail is accepted if the sender address is proved to exist and rejected otherwise. If a temporary failure occurs, the remote party is urged to retry the transaction some time later.

The 'poll' statement itself has a number of options that control the type of the verification. These are discussed in detail in *note poll::.

It is worth noticing that there is one special email address which is always available on any host, it is the "null address" '<>' used in error reporting.
It is of no use verifying its existence:

     prog envfrom
     do
       if $f == ""
         accept
       else
         on poll $f
         do
         when success:
           accept
         when not_found or failure:
           reject 550 5.1.0 "Sender validity not confirmed"
         when temp_failure:
           tempfail 450 4.1.0 "Try again later"
         done
       fi
     done

---------- Footnotes ----------

(1) For more information about exceptions and their handling, please refer to *note Exceptions::.

File: mailfromd.info, Node: SMTP Timeouts, Next: Avoiding Verification Loops, Prev: Checking Sender Address, Up: Tutorial

3.7 SMTP Timeouts
=================

When using polling functions, it is important to take into account possible delays, which can occur in SMTP transactions. Such delays may be due to low network bandwidth or high load on the remote server. Some sites impose them willingly, as a spam-fighting measure.

Ideally the callout verification should use the timeout values defined in RFC 2821, but this is impossible in practice, because it would cause a "timeout escalation", which consists in propagating delays encountered in a callout SMTP session back to the remote client whose session initiated the callout.

Consider, for example, the following scenario. An MFL script performs a callout on the 'envfrom' stage. The remote server is overloaded and delays heavily in responding, so that the initial response arrives 3 minutes after establishing the connection, and processing the 'EHLO' command takes another 3 minutes. These delays are OK according to the RFC, which imposes a 5 minute limit for each stage, but while waiting for the remote reply our SMTP server remains in the 'envfrom' state with the client waiting for a response to its 'MAIL' command for more than 6 minutes, which is intolerable, because of the same 5 minute limit. Thus, the client will almost certainly break the session.

To avoid this, 'mailfromd' uses a special instance, called "callout server", which is responsible for running callout SMTP sessions asynchronously.
The usual sender verification is performed using so-called "soft" timeout values, which are set to values short enough to not disturb the incoming session (e.g. a timeout for 'HELO' response is 3 seconds, instead of 5 minutes). If this verification yields a definite answer, that answer is stored in the cache database and returned to the calling procedure immediately. If, however, the verification is aborted due to a timeout, the caller procedure is returned an 'e_temp_failure' exception, and the callout is scheduled for processing by a callout server. This exception normally causes the milter session to return a temporary error to the sender, urging it to retry the connection later.

In the meantime, the callout server runs the sender verification again using another set of timeouts, called "hard" timeouts, which are normally much longer than 'soft' ones (they default to the values required by RFC 2821). If it gets a definitive result (e.g. 'email found' or 'email not found'), the server stores it in the cache database. If the callout ends due to a timeout, a 'not_found' result is stored in the database.

Some time later, the remote server retries the delivery, and the 'mailfromd' script is run again. This time, the callout function will immediately obtain the already cached result from the database and proceed accordingly. If the callout server has not finished the request by the time the sender retries the connection, the latter is again returned a temporary error, and the process continues until the callout is finished.

Usually, the callout server is just another instance of 'mailfromd' itself, which is started automatically to perform scheduled SMTP callouts. It is also possible to set up a separate callout server on another machine. This is discussed in *note calloutd::.

For detailed information about callout timeouts and their configuration, see *note conf-timeout::.
For a description of how to configure 'mailfromd' to use callout servers, see *note conf-server::.

File: mailfromd.info, Node: Avoiding Verification Loops, Next: HELO Domain, Prev: SMTP Timeouts, Up: Tutorial

3.8 Avoiding Verification Loops
===============================

An 'envfrom' program consisting only of the 'on poll' statement will work smoothly for incoming mails, but will create infinite loops for outgoing mails. This is because upon sending an outgoing message 'mailfromd' will start the verification procedure, which will initiate an SMTP transaction with the same mail server that runs it. This transaction will in turn trigger execution of the 'on poll' statement, etc. ad infinitum. To avoid this, any properly written filter script should not run the verification procedure on the email addresses in those domains that are relayed by the server it runs on. This can be achieved using the 'relayed' function. The function returns 'true' if its argument is contained in one of the predefined "domain list" files. These files correspond to 'Sendmail' plain text files used in 'F' class definition forms (see 'Sendmail Installation and Operation Guide', chapter 5.3), i.e. they contain one domain name per line, with empty lines and lines starting with '#' being ignored. The domain files consulted by the 'relayed' function are defined in the 'relayed-domain-file' configuration file statement (*note relayed-domain-file: conf-base.):

     relayed-domain-file (/etc/mail/local-host-names,
                          /etc/mail/relay-domains);

or:

     relayed-domain-file /etc/mail/local-host-names;
     relayed-domain-file /etc/mail/relay-domains;

The above example declares two domain list files, most commonly used in 'Sendmail' installations to keep hostnames of the server(1) and names of the domains relayed by this server(2).
Given all this, we can improve our filter program:

     require 'dns'

     prog envfrom
     do
       if $f == ""
         accept
       elif relayed(hostname(${client_addr}))
         accept
       else
         on poll $f
         do
         when success:
           accept
         when not_found or failure:
           reject 550 5.1.0 "Sender validity not confirmed"
         when temp_failure:
           tempfail 450 4.1.0 "Try again later"
         done
       fi
     done

If you feel that your Sendmail's relayed domains are not restrictive enough for 'mailfromd' filters (for example you are relaying mails from some third-party servers), you can use a database of trusted mail server addresses. If the number of such servers is small enough, a single 'or' statement can be used, e.g.:

     elif ${client_addr} = "10.10.10.1"
          or ${client_addr} = "192.168.11.7"
       accept
     ...

otherwise, if the servers' IP addresses fall within one or several CIDRs, you can use the 'match_cidr' function (*note Internet address manipulation functions::), e.g.:

     elif match_cidr (${client_addr}, "199.232.0.0/16")
       accept
     ...

or combine both methods. Finally, you can keep a DBM database of relayed addresses and use 'dbmap' or 'dbget' function for checking (*note Database functions::).

     elif dbmap("%__statedir__/relay.db", ${client_addr})
       accept
     ...

---------- Footnotes ----------

(1) class 'w', see 'Sendmail Installation and Operation Guide', chapter 5.2.

(2) class 'R'

File: mailfromd.info, Node: HELO Domain, Next: rset, Prev: Avoiding Verification Loops, Up: Tutorial

3.9 HELO Domain
===============

Some of the mail filtering conditions may depend on the value of "helo domain" name, i.e. the argument to the SMTP 'EHLO' (or 'HELO') command. If you ever need such conditions, take into account the following caveats. Firstly, although 'Sendmail' passes the helo domain in '$s' macro, it does not do this consistently. In fact, the '$s' macro is available only to the 'helo' handler, all other handlers won't see it, no matter what the value of the corresponding 'Milter.macros.HANDLER' statement.
So, if you wish to access its value from any handler other than 'helo', you will have to store it in a "variable" in the 'helo' handler and then use this variable value in the other handler. This approach is also recommended for other MTAs.

This brings us to the concept of variables in 'mailfromd' scripts.

A variable is declared using the following syntax:

     TYPE NAME

where NAME is the variable name and TYPE is 'string', if the variable is to hold a string value, and 'number', if it is supposed to have a numeric value.

A variable is assigned a value using the 'set' statement:

     set NAME EXPR

where EXPR is any valid MFL expression.

The 'set' statement can occur within handler or function declarations as well as outside of them.

There are two kinds of 'Mailfromd' variables: "global variables", that are visible to all handlers and functions, and "automatic variables", that are available only within the handler or function where they are declared. For our purpose we need a global variable (*Note Variable classes: Variables, for detailed descriptions of both kinds of variables).

The following example illustrates an approach that allows you to use the 'HELO' domain name in any handler:

     # Declare the helohost variable
     string helohost

     prog helo
     do
       # Save the host name for further use
       set helohost $s
     done

     prog envfrom
     do
       # Reject hosts claiming to be localhost
       if helohost = "localhost"
         reject 570 "Please specify real host name"
       fi
     done

Notice that for this approach to work, your MTA must export the 's' macro (e.g., in case of Sendmail, the 'Milter.macros.helo' statement in the 'sendmail.cf' file must contain 's'. *note Sendmail::). This requirement can be removed by using the "handler argument" of 'helo'. Each 'mailfromd' handler is given one or several arguments. The exact number of arguments and their meaning are handler-specific and are described in *note Handlers::, and *note Figure 3.1: milter-control-flow.
The arguments are referenced by their ordinal number, using the notation '$N'. The 'helo' handler takes one argument, whose value is the helo domain. Using this information, the 'helo' handler from the example above can be rewritten as follows:

     prog helo
     do
       # Save the host name for further use
       set helohost $1
     done

File: mailfromd.info, Node: rset, Next: Controlling Number of Recipients, Prev: HELO Domain, Up: Tutorial

3.10 SMTP RSET and Milter Abort Handling
========================================

In the previous section we used a global variable to hold certain information and share it between handlers. In the majority of cases, such information is session specific, and becomes invalid if the remote party issues the SMTP 'RSET' command. Therefore, 'mailfromd' clears all global variables when it receives a Milter 'abort' request, which is normally generated by this command.

However, you may need some variables that retain their values even across SMTP session resets. In 'mailfromd' terminology such variables are called "precious". Precious variables are declared by prefixing their declaration with the keyword 'precious'. Consider, for example, this snippet of code:

     precious number rcpt_counter

     prog envrcpt
     do
       set rcpt_counter rcpt_counter + 1
     done

Here, the variable 'rcpt_counter' is declared as precious and its value is incremented each time the 'envrcpt' handler is called. This way, 'rcpt_counter' will keep the total number of SMTP 'RCPT' commands issued during the session, no matter how many times it was restarted using the 'RSET' command.

File: mailfromd.info, Node: Controlling Number of Recipients, Next: Sending Rate, Prev: rset, Up: Tutorial

3.11 Controlling Number of Recipients
=====================================

Any MTA provides a way to limit the number of recipients per message. For example, in 'Sendmail' you may use the 'MaxRecipientsPerMessage' option(1).
However, such methods are not flexible, so you are often better off using 'mailfromd' for this purpose.

'Mailfromd' keeps the number of recipients collected so far in the variable 'rcpt_count', which can be checked in the 'envrcpt' handler as shown in the example below:

     prog envrcpt
     do
       if rcpt_count > 10
         reject 550 5.7.1 "Too many recipients"
       fi
     done

This filter will accept no more than 10 recipients per message. You may achieve finer granularity by using additional conditions. For example, the following code will allow any number of recipients if the mail is coming from a domain relayed by the server, while limiting it to 10 for incoming mail from other domains:

     prog envrcpt
     do
       if not relayed(hostname($client_addr)) and rcpt_count > 10
         reject 550 5.7.1 "Too many recipients"
       fi
     done

There are three important features to notice in the above code. First of all, it introduces two "boolean" operators: 'and', which evaluates to 'true' only if both left-side and right-side expressions are 'true', and 'not', which reverses the value of its argument.

Secondly, the scope of an operation is determined by its "precedence", or "binding strength". 'Not' binds more tightly than 'and', so its scope is limited to the expression between it and 'and'. Using parentheses to underline the operator scoping, the above 'if' condition can be rewritten as follows:

     if (not (relayed(hostname($client_addr)))) and (rcpt_count > 10)

Finally, it is important to notice that all boolean expressions are computed using "shortcut evaluation". To understand what it is, let's consider the following expression: 'X and Y'. Its value is 'true' only if both X and Y are 'true'. Now suppose that we evaluate the expression from left to right and we find that X is false. This means that no matter what the value of Y is, the resulting expression will be 'false', therefore there is no need to compute Y at all.
So, the boolean shortcut evaluation works as follows:

'X and Y'
     If 'X => false', do not evaluate Y and return 'false'.
'X or Y'
     If 'X => true', do not evaluate Y and return 'true'.

Thus, in the expression 'not relayed(hostname($client_addr)) and rcpt_count > 10', the value of the 'rcpt_count' variable will be compared with '10' only if the 'relayed' function yielded 'false'.

To further enhance our sample filter, you may wish to make the 'reject' output more informative, to let the sender know what the recipient limit is. To do so, you can use the "concatenation operator" '.' (a dot):

     set max_rcpt 10

     prog envrcpt
     do
       if not relayed(hostname($client_addr)) and rcpt_count > 10
         reject 550 5.7.1 "Too many recipients, max=" . max_rcpt
       fi
     done

When evaluating the third argument to 'reject', 'mailfromd' will first convert 'max_rcpt' to a string and then concatenate both strings together, producing the string 'Too many recipients, max=10'.

---------- Footnotes ----------

(1) 'Sendmail (tm) Installation and Operation Guide', chapter 5.6, 'O -- Set Option'.


File: mailfromd.info, Node: Sending Rate, Next: Greylisting, Prev: Controlling Number of Recipients, Up: Tutorial

3.12 Sending Rate
=================

We have introduced the notion of mail sending rate in *note Rate Limit::. 'Mailfromd' keeps the computed rates in the special 'rate' database (*note Databases::). Each record in this database consists of a 'key', for which the rate is computed, and the rate value, in the form of a double precision floating point number, representing the average number of messages per second sent by this 'key' within the last sampling interval. In the simplest case, the sender email address can be used as a 'key'; however, we recommend using the combination EMAIL-SENDER_IP instead, so that the actual EMAIL owner won't be blocked by the actions of some spammer abusing their address.

Two functions are provided to control and update sending rates.
The 'rateok' function takes three mandatory arguments: bool rateok(string KEY, number INTERVAL, number THRESHOLD) The KEY meaning is described above. The INTERVAL is the sampling interval, or the number of seconds to which the actual sending rate value is converted. Remember that it is stored internally as a floating point number, and thus cannot be directly used in 'mailfromd' filters, which operate only on integer numbers. To use the rate value, it is first converted to messages per given interval, which is an integer number. For example, the rate '0.138888' brought to 1-hour interval gives '500' (messages per hour). When the 'rateok' function is called, it recomputes rate record for the given KEY. If the new rate value converted to messages per given INTERVAL is less than THRESHOLD, the function updates the database and returns 'True'. Otherwise it returns 'False' and does not update the database. This function must be "required" prior to use, by placing the following statement somewhere at the beginning of your script: require rateok For example, the following code limits the mail sending rate for each 'email address'-'IP' combination to 180 per hour. If the actual rate value exceeds this limit, the sender is returned a temporary failure response: require rateok prog envfrom do if not rateok($f . "-" . ${client_addr}, 3600, 180) tempfail 450 4.7.0 "Mail sending rate exceeded. Try again later" fi done Notice argument concatenation, used to produce the key. It is often inconvenient to specify intervals in seconds, therefore a special 'interval' function is provided. It converts its argument, which is a textual string representing time interval in English, to the corresponding number of seconds. Using this function, the function invocation would be: rateok($f . "-" . ${client_addr}, interval("1 hour"), 180) The 'interval' function is described in *note interval::, and time intervals are discussed in *note time interval specification::. 
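The conversion of a stored per-second rate to an integer number of messages per INTERVAL, described earlier in this section, is plain arithmetic. The following Python sketch reproduces the '0.138888' example; it is only a model for illustration, not the 'mailfromd' implementation. The function names are hypothetical, and rounding to the nearest integer is an assumption.

```python
def rate_per_interval(rate_per_second: float, interval_seconds: int) -> int:
    """Scale a per-second rate to whole messages per interval.

    Assumption: the result is rounded to the nearest integer.
    """
    return round(rate_per_second * interval_seconds)


def below_threshold(rate_per_second: float, interval_seconds: int,
                    threshold: int) -> bool:
    """Model of the comparison rateok is described as making:
    true when the converted rate is less than the threshold."""
    return rate_per_interval(rate_per_second, interval_seconds) < threshold


# The example from the text: 0.138888 messages per second over one hour.
messages_per_hour = rate_per_interval(0.138888, 3600)
```

With a threshold of 180 per hour, as in the 'rateok' example above, a stored rate of '0.138888' would exceed the limit, since 500 is not less than 180.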
The 'rateok' function begins computing the rate as soon as it has collected enough data. By default, it needs at least four mails. Since this may lead to a large number of false positives (i.e. overestimated rates) at the beginning of the sampling interval, there is a way to specify the minimum number of samples 'rateok' must collect before it starts to actually compute rates. This number of samples is given as the optional fourth argument to the function. For example, the following call will always return 'True' for the first 10 mails, no matter what the actual rate:

     rateok($f . "-" . ${client_addr}, interval("1 hour"), 180, 10)

The 'tbf_rate' function gives more control over the mail rates. This function implements a "token bucket filter" (TBF) algorithm.

The token bucket controls when the data can be transmitted based on the presence of abstract entities called "tokens" in a container called "bucket". Each token represents some amount of data. The algorithm works as follows:

   * A token is added to the bucket at a constant rate of 1 token per T microseconds.

   * A bucket can hold at most M tokens. If a token arrives when the bucket is full, that token is discarded.

   * When N items of data arrive (e.g. N mails), N tokens are removed from the bucket and the data are accepted.

   * If fewer than N tokens are available, no tokens are removed from the bucket and the data are not accepted.

This algorithm keeps the data traffic at a constant rate of one item per T microseconds, with bursts of up to M data items. Such bursts occur when no data has arrived for M*T or more microseconds.

'Mailfromd' keeps buckets in a database 'tbf'. Each bucket is identified by a unique "key". The 'tbf_rate' function is defined as follows:

     bool tbf_rate(string KEY, number N, number T, number M)

The KEY identifies the bucket to operate upon. The rest of the arguments are described above. The 'tbf_rate' function returns 'True' if the algorithm allows the data to be accepted and 'False' otherwise.
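The four rules of the token bucket algorithm can be modelled in a few lines of Python. This is a minimal in-memory sketch for illustration only, not the 'mailfromd' implementation: the class and method names are hypothetical, timestamps are passed in explicitly, and the bucket is assumed to start full.

```python
class TokenBucket:
    """In-memory model of a token bucket filter (names are hypothetical)."""

    def __init__(self, interval_us: int, capacity: int, now_us: int = 0):
        self.interval_us = interval_us  # T: microseconds per added token
        self.capacity = capacity        # M: maximum tokens in the bucket
        self.tokens = capacity          # assumption: the bucket starts full
        self.last_us = now_us           # time the last token was added

    def _refill(self, now_us: int) -> None:
        # Add one token per elapsed interval, discarding any overflow.
        new_tokens = (now_us - self.last_us) // self.interval_us
        if new_tokens > 0:
            self.tokens = min(self.capacity, self.tokens + new_tokens)
            self.last_us += new_tokens * self.interval_us

    def accept(self, n: int, now_us: int) -> bool:
        """Return True and remove n tokens if at least n are available;
        otherwise leave the bucket untouched and return False."""
        self._refill(now_us)
        if n <= self.tokens:
            self.tokens -= n
            return True
        return False


# One mail per 10 seconds, with bursts of up to 20 mails.
bucket = TokenBucket(interval_us=10_000_000, capacity=20)
burst = [bucket.accept(1, now_us=0) for _ in range(20)]
over_limit = bucket.accept(1, now_us=0)
after_wait = bucket.accept(1, now_us=10_000_000)
```

In this model the first 20 mails are accepted as a burst, the 21st is refused, and one more mail is accepted after T microseconds have passed.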
Depending on how the actual arguments are selected, the 'tbf_rate' function can be used to control various types of flow rates. For example, to control the mail sending rate, assign the arguments as follows: N to the number of mails and T to the control interval in microseconds:

     prog envfrom
     do
       if not tbf_rate($f . "-" . $client_addr, 1, 10000000, 20)
         tempfail 450 4.7.0 "Mail sending rate exceeded. Try again later"
       fi
     done

The example above permits sending at most one mail every 10 seconds. The burst size is set to 20.

Another use for the 'tbf_rate' function is to limit the total delivered mail size per given interval of time. To do so, the function must be used in the 'prog eom' handler, because it is the only handler where the entire size of the message is known. The N argument must contain the number of bytes in the email (or email bytes * number of recipients), and the T must be set to the number of bytes per microsecond a given user is allowed to send. The M argument must be large enough to accommodate a couple of large emails. E.g.:

     prog eom
     do
       if not tbf_rate("$f-$client_addr",
                       message_size(current_message()),
                       10240*1000000, # At most 10 kb/sec
                       10*1024*1024)
         tempfail 450 4.7.0 "Data sending rate exceeded. Try again later"
       fi
     done

*Note Rate limiting functions::, for more information about the 'rateok' and 'tbf_rate' functions.


File: mailfromd.info, Node: Greylisting, Next: Local Account Verification, Prev: Sending Rate, Up: Tutorial

3.13 Greylisting
================

Greylisting is a simple method of defending against spam proposed by Evan Harris. In a few words, it consists in recording the 'sender IP'-'sender email'-'recipient email' triplet of mail transactions. Each time an unknown triplet is seen, the corresponding message is rejected with the 'tempfail' code. If the mail is legitimate, this will make the originating server retry the delivery later, until the destination eventually accepts it.
If, however, the mail is spam, it will probably never be retried, so the users will not be bothered by it. Even if the spammer retries the delivery, the "greylisting period" will give spam-detection systems, such as DNSBLs, enough time to detect and blacklist it, so by the time the destination host starts accepting emails from this triplet, it will already be blocked by other means.

You will find the detailed description of the method in The Next Step in the Spam Control War: Greylisting (http://projects.puremagic.com/greylisting/whitepaper.html), the original whitepaper by Evan Harris.

The 'mailfromd' implementation of greylisting is based on the 'greylist' function. The function takes two arguments: the 'key', identifying the greylisting triplet, and the 'interval'. The function looks up the key in the "greylisting database". If such a key is not found, a new entry is created for it and the function returns 'true'. If the key is found, 'greylist' returns 'false' if it was inserted into the database more than 'interval' seconds ago, and 'true' otherwise. In other words, from the point of view of the greylisting algorithm, the function returns 'true' when the message delivery should be blocked. Thus, the simplest implementation of the algorithm would be:

     prog envrcpt
     do
       if greylist("${client_addr}-$f-${rcpt_addr}", interval("1 hour"))
         tempfail 451 4.7.1 "You are greylisted"
       fi
     done

However, the message returned by this example is not informative enough. In particular, it does not tell when the message will be accepted.
To help you produce more informative messages, the 'greylist' function stores the number of seconds left to the end of the greylisting period in the global variable 'greylist_seconds_left', so the above example could be enhanced as follows:

     prog envrcpt
     do
       set gltime interval("1 hour")
       if greylist("${client_addr}-$f-${rcpt_addr}", gltime)
         if greylist_seconds_left = gltime
           tempfail 451 4.7.1
               "You are greylisted for %gltime seconds"
         else
           tempfail 451 4.7.1
               "Still greylisted for %greylist_seconds_left seconds"
         fi
       fi
     done

In real life you will have to avoid greylisting some messages, in particular those coming from the '<>' address and from the IP addresses in your relayed domain. It can easily be done using the techniques described in previous sections and is left as an exercise to the reader.

'Mailfromd' provides two implementations of greylisting primitives, which differ in the information stored in the database. The one described above is called "traditional". It keeps in the database the time when the greylisting was activated for the given key, so the 'greylist' function uses its second argument ('interval') and the current timestamp to decide whether the key is still greylisted.

The second implementation is named after its inventor, "Con Tassios". This implementation stores in the database the time when the greylisting period is set to expire, computed by 'greylist' when it is first called for the given key, using the formula 'current_timestamp + interval'. Subsequent calls to 'greylist' compare the current timestamp with the one stored in the database and ignore their second argument. This implementation is enabled by one of the following pragmas:

     #pragma greylist con-tassios

or

     #pragma greylist ct

When the Con Tassios implementation is used, yet another function becomes available. The function 'is_greylisted' (*note is_greylisted: Greylisting functions.) returns 'True' if its argument is greylisted and 'False' otherwise.
It can be used to check for the greylisting status without actually updating the database: if is_greylisted("${client_addr}-$f-${rcpt_addr}") ... fi One special case is "whitelisting", which is often used together with greylisting. To implement it, 'mailfromd' provides the function 'dbmap', which takes two mandatory arguments: 'dbmap(FILE, KEY)' (it also allows an optional third argument, see *note dbmap::, for more information on it). The first argument is the name of the DBM file where to search for the key, the second one is the key to be searched. Assuming you keep your whitelist database in file '/var/run/whitelist.db', a more practical example will be: prog envrcpt do set gltime interval("1 hour") if not ($f = "" or relayed(hostname(${client_addr})) or dbmap("/var/run/whitelist.db", ${client_addr})) if greylist("${client_addr}-$f-${rcpt_addr}", gltime) if greylist_seconds_left = gltime tempfail 451 4.7.1 "You are greylisted for %gltime seconds" else tempfail 451 4.7.1 "Still greylisted for %greylist_seconds_left seconds" fi fi fi done  File: mailfromd.info, Node: Local Account Verification, Next: Databases, Prev: Greylisting, Up: Tutorial 3.14 Local Account Verification =============================== In your filter script you may need to verify if the given user name is served by your mail server, in other words, to verify if it represents a "local account". Notice that in this context, the word "local" does not necessarily mean that the account is local for the server running 'mailfromd', it simply means any account whose mailbox is served by the mail servers using 'mailfromd'. The 'validuser' function may be used for this purpose. It takes one argument, the user name, and returns 'true' if this name corresponds to a local account. To verify this, the function relies on 'libmuauth', a powerful authentication library shipped with GNU 'mailutils'. More precisely, it invokes a list of "authorization" functions. 
Each function is responsible for looking up the user name in a particular source of information, such as the system 'passwd' database, an SQL database, etc. The search is terminated when one of the functions finds the name in question or the list is exhausted. In the former case, the account is local, in the latter it is not. This concept is discussed in detail in *note Authentication: (mailutils)authentication. Here we will give only some practical advice for implementing it in 'mailfromd' filters.

The actual list of available authorization modules depends on your 'mailutils' installation. Usually it includes, apart from the traditional UNIX 'passwd' database, the functions for verifying PAM, RADIUS and SQL database accounts. Each of the authorization methods is configured using special configuration file statements. For the description of the Mailutils configuration files, *Note Mailutils Configuration File: (mailutils)configuration. You can obtain the template for 'mailfromd' configuration by running 'mailfromd --config-help'.

For example, the following 'mailfromd.conf' file:

     auth {
       authorization pam:system;
     }
     pam {
       service mailfromd;
     }

sets up the authorization using PAM and the system 'passwd' database. The name of the PAM service to use is 'mailfromd'.

The function 'validuser' is often used together with 'dbmap', as in the example below:

     #pragma dbprop /etc/mail/aliases.db null

     if dbmap("/etc/mail/aliases.db", localpart($rcpt_addr))
        and validuser(localpart($rcpt_addr))
       ...
     fi

For more information about the 'dbmap' function, see *note dbmap::. For a description of the 'dbprop' pragma, see *note Database functions::.


File: mailfromd.info, Node: Databases, Next: Testing Filter Scripts, Prev: Local Account Verification, Up: Tutorial

3.15 Databases
==============

Some 'mailfromd' functions use DBM databases to save their persistent state data.
Each database has a unique "identifier", and is assigned several pieces of information for its maintenance: the database "file name" and the "expiration period", i.e. the time after which a record is considered expired.

To obtain the list of available databases along with their preconfigured settings, run 'mailfromd --show-defaults'. You will see an output similar to this:

     version:                    8.8
     script file:                /etc/mailfromd.mf
     preprocessor:               /usr/bin/m4 -s
     user:                       mail
     statedir:                   /var/run/mailfromd
     socket:                     unix:/var/run/mailfromd/mailfrom
     pidfile:                    /var/run/mailfromd/mailfromd.pid
     default syslog:             blocking
     supported databases:        gdbm, bdb
     default database type:      bdb
     optional features:          GeoIP
     greylist database:          /var/run/mailfromd/greylist.db
     greylist expiration:        86400
     tbf database:               /var/run/mailfromd/tbf.db
     tbf expiration:             86400
     rate database:              /var/run/mailfromd/rates.db
     rate expiration:            86400
     cache database:             /var/run/mailfromd/mailfromd.db
     cache positive expiration:  86400
     cache negative expiration:  43200

The text below the 'optional features' line describes the available built-in databases. Notice that the 'cache' database, in contrast to the rest of the databases, has two expiration periods associated with it. This is explained in the next subsection.

* Menu:

* Database Formats::
* Basic Database Operations::
* Database Maintenance::


File: mailfromd.info, Node: Database Formats, Next: Basic Database Operations, Up: Databases

3.15.1 Database Formats
-----------------------

Version 8.8 uses the following database types (or "formats"):

'cache'
     "Cache database" keeps the information about external emails, obtained using sender verification functions (*note Checking Sender Address::). The key entry to this database is an email address or an EMAIL:SENDER-IP string, for addresses checked using strict verification. The data it stores for each key are:

       1. Address validity. This field can be either 'success' or 'not_found', meaning the address is confirmed to exist or it is not.

       2.
The time when the entry was entered into the database. It is used to check for expired entries. The 'cache' database has two expiration periods: a "positive expiration" period, that is applied to entries with the first field set to 'success', and a "negative expiration" period, applied to entries marked as 'not_found'. 'rate' The mail sending rate data, maintained by 'rate' function (*note Rate limiting functions::). A record consists of the following fields: timestamp The time when the entry was entered into the database. interval Interval during which the rate was measured (seconds). count Number of mails sent during this interval. 'tbf' This database is maintained by 'tbf_rate' function (*note TBF::). Each record represents a single bucket and consists of the following keys: timestamp Timestamp of most recent token, as a 64-bit unsigned integer (microseconds resolution). expirytime Estimated time when this bucket expires (seconds since epoch). tokens Number of tokens in the bucket ('size_t'). 'greylist' This database is maintained by 'greylist' function (*note Greylisting::). Each record holds only the timestamp. Its semantics depends on the greylisting implementation in use (*note greylisting types::). In traditional implementation, it is the time when the entry was entered into the database. In Con Tassios implementation, it is the time when the greylisting period expires.  File: mailfromd.info, Node: Basic Database Operations, Next: Database Maintenance, Prev: Database Formats, Up: Databases 3.15.2 Basic Database Operations -------------------------------- The 'mfdbtool' utility is provided for performing various operations on the 'mailfromd' database. To list the contents of a database, use '--list' option. 
When used without any arguments it will list the 'cache' database: $ mfdbtool --list abrakat@mail.com success Thu Aug 24 15:28:58 2006 baccl@EDnet.NS.CA not_found Fri Aug 25 10:04:18 2006 bhzxhnyl@chello.pl not_found Fri Aug 25 10:11:57 2006 brqp@aaanet.ru:24.1.173.165 not_found Fri Aug 25 14:16:06 2006 You can also list data for any particular key or keys. To do so, give the keys as arguments to 'mfdbtool': $ mfdbtool --list abrakat@mail.com brqp@aaanet.ru:24.1.173.165 abrakat@mail.com success Thu Aug 24 15:28:58 2006 brqp@aaanet.ru:24.1.173.165 not_found Fri Aug 25 14:16:06 2006 To list another database, give its format identifier with the '--format' ('-H') option. For example, to list the 'rate' database: $ mfdbtool --list --format=rate sam@mail.net-62.12.4.3 Wed Sep 6 19:41:42 2006 139 3 0.0216 6.82e-06 axw@rame.com-59.39.165.172 Wed Sep 6 20:26:24 2006 0 1 N/A N/A The '--format' option can be used with any database management option, described below. Another useful operation you can do while listing 'rate' database is the prediction of "estimated time of sending", i.e. the time when the user will be able to send mail if currently his mail sending rate has exceeded the limit. This is done using '--predict' option. The option takes an argument, specifying the mail sending rate limit, e.g. (the second line is split for readability): $ mfdbtool --predict="180 per 1 minute" ed@fae.net-21.10.1.2 Wed Sep 13 03:53:40 2006 0 1 N/A N/A; free to send service@19.netlay.com-69.44.129.19 Wed Sep 13 15:46:07 2006 7 2 0.286 0.0224; in 46 sec. on Wed Sep 13 15:49:00 2006 Notice, that there is no need to use '--list --format=rate' along with this option, although doing so is not an error. To delete an entry from the database, use '--delete' option, for example: 'mfdbtool --delete abrakat@mail.com'. You can give any number of keys to delete in the command line.  
File: mailfromd.info, Node: Database Maintenance, Prev: Basic Database Operations, Up: Databases 3.15.3 Database Maintenance --------------------------- There are two principal operations of database management: expiration and compaction. "Expiration" consists in removing expired entries from the database. In fact, it is rarely needed, since the expired entries are removed in the process of normal 'mailfromd' work. Nevertheless, a special option is provided in case an explicit expiration is needed (for example, before dumping the database to another format, to avoid transferring useless information). The command line option '--expire' instructs 'mfdbtool' to delete expired entries from the specified database. As usual, the database is specified using '--format' option. If it is not given explicitly, 'cache' is assumed. While removing expired entries the space they occupied is marked as free, so it can be used by subsequent inserts. The database does not shrink after expiration is finished. To actually return the unused space to the file system you should "compact" your database. This is done by running 'mfdbtool --compact' (and, optionally, specifying the database to operate upon with '--format' option). Notice, that compacting a database needs roughly as much disk space on the partition where the database resides as is currently used by the database. Database compaction runs in three phases. First, the database is scanned and all non-expired records are stored in the memory. Secondly, a temporary database is created in the state directory and all the cached entries are flushed into it. This database is named after the PID of the running 'mfdbtool' process. Finally, the temporary database is renamed to the source database. Both '--compact' and '--expire' can be applied to all databases by combining them with '--all'. It is useful, for example, in 'crontab' files. 
For example, I have the following monthly job in my 'crontab':

     0 1 1 * * /usr/bin/mfdbtool --compact --all


File: mailfromd.info, Node: Testing Filter Scripts, Next: Run Mode, Prev: Databases, Up: Tutorial

3.16 Testing Filter Scripts
===========================

It is important to check your filter script before actually starting to use it. There are several ways to do so.

To test the syntax of your filter script, use the '--lint' option. It will cause 'mailfromd' to exit immediately after attempting to compile the script file. If the compilation succeeds, the program will exit with code 0. Otherwise, it will exit with error code 78 ('configuration error'). In the latter case, 'mailfromd' will also print a diagnostic message, describing the error along with the exact location where the error was diagnosed, for example:

     mailfromd: /etc/mailfromd.mf:39: syntax error, unexpected reject

The error location is indicated by the name of the file and the number of the line where the error occurred. By using the '--location-column' option you instruct 'mailfromd' to also print the "column number". E.g. with this option the above error message may look like:

     mailfromd: /etc/mailfromd.mf:39.12 syntax error, unexpected reject

Here, '39' is the line and '12' is the column number.

For complex scripts you may wish to obtain a listing of variables used in the script. This can be achieved using the '--xref' command line option. The output it produces consists of four columns:

Variable name

Data type
     Either 'number' or 'string'.

Offset in data segment
     Measured in words.

References
     A comma-separated list of locations where the variable was referenced. Each location is represented as FILE:LINE. If several locations pertain to the same FILE, the file name is listed only once.
Here is an example of the cross-reference output:

     $ mailfromd --xref
     Cross-references:
     -----------------
     cache_used              number   5  /etc/mailfromd.mf:48
     clamav_virus_name       string   9  /etc/mailfromd.mf:240,240
     db                      string  15  /etc/mailfromd.mf:135,194,215
     dns_record_ttl          number  16  /etc/mailfromd.mf:136,172,173
     ehlo_domain             string  11
     gltime                  number  13  /etc/mailfromd.mf:37,219,220,222,223
     greylist_seconds_left   number   1  /etc/mailfromd.mf:220,226,227
     last_poll_host          string   2

If the script passes the syntax check, the next step is often to test if it works as you expect it to. This is done with the '--test' ('-t') command line option. This option runs the 'envfrom' handler (or another one, see below) and prints the result of its execution.

When running your script in test mode, you will need to supply the values of the 'Sendmail' macros it needs. You do this by placing the necessary assignments in the command line. For example, this is how to supply initial values for the 'f' and 'client_addr' macros:

     $ mailfromd --test f=gray@gnu.org client_addr=127.0.0.1

You may also need to alter the initial values of some global variables your script uses. To do so, use the '-v' ('--variable') command line option. This option takes a single argument consisting of the variable name and its initial value, separated by an equals sign. For example, here is how to change the value of the 'ehlo_domain' global variable:

     $ mailfromd -v ehlo_domain=mydomain.org

The '--test' option is often useful in conjunction with the options '--debug', '--trace' and '--transcript' (*note Logging and Debugging::).
The following example shows what the author got while debugging the filter script described in *note Filter Script Example:::

     $ mailfromd --test --debug=50 f=gray@gnu.org client_addr=127.0.0.1
     MX 20 mx20.gnu.org
     MX 10 mx10.gnu.org
     MX 10 mx10.gnu.org
     MX 20 mx20.gnu.org
     getting cache info for gray@gnu.org
     found status: success (0), time: Thu Sep 14 14:54:41 2006
     getting rate info for gray@gnu.org-127.0.0.1
     found time: 1158245710, interval: 29, count: 5, rate: 0.172414
     rate for gray@gnu.org-127.0.0.1 is 0.162162
     updating gray@gnu.org-127.0.0.1 rates
     SET REPLY 450 4.7.0 Mail sending rate exceeded. Try again later
     State envfrom: tempfail

To test any handler other than 'envfrom', give its name as the argument to the '--test' option. Since this argument is optional, it is important that it be given immediately after the option, without any intervening white space, for example 'mailfromd --test=helo', or 'mailfromd -thelo'.

This method tests one handler at a time. To test the script as a whole, use the 'mtasim' utility. When started it enters interactive mode, similar to that of 'sendmail -bs', where it expects SMTP commands on its standard input and sends answers to the standard output. The '--port=auto' command line option instructs it to start 'mailfromd' and to create a unique socket for communication with it. For the detailed description of the program and the ways to use it, *Note mtasim::.


File: mailfromd.info, Node: Run Mode, Next: Logging and Debugging, Prev: Testing Filter Scripts, Up: Tutorial

3.17 Run Mode
=============

Mailfromd provides a special option for running arbitrary MFL scripts. This is an experimental feature, intended for future use of MFL as a scripting language.

When given the '--run' command line option, 'mailfromd' loads the script given in its command line and executes a function called 'main'.

The function 'main' must be declared as:

     func main(...) returns number

Mailfromd passes all command line arguments that follow the script name as arguments to that function. When the function returns, its return value is used by 'mailfromd' as the exit code.

As an example, suppose the file 'script.mf' contains the following:

     func main (...)
       returns number
     do
       loop for number i 1,
            while i <= $#,
            set i i + 1
       do
         echo "arg %i=" . $(i)
       done
     done

This function prints all its arguments (*Note variadic functions::, for a detailed description of functions with a variable number of arguments). Now running:

     $ mailfromd --run script.mf 1 file dest

displays the following:

     arg 1=1
     arg 2=file
     arg 3=dest

Note that MFL does not have a direct equivalent of the shell's '$0' argument. If your function needs to know the name of the script that is being executed, use the '__file__' built-in constant instead (*note __file__: Built-in constants.).

You may give your start function any name other than the default 'main'. In this case, give its name as an argument to the '--run' option. This argument is optional, therefore it must be separated from the option by an equals sign (with no whitespace on either side). For example, given the command line below, 'mailfromd' loads the file 'script.mf' and executes the function named 'start':

     $ mailfromd --run=start script.mf

* Menu:

* top-block::   The Top of a Script File.
* getopt::      Parsing Command Line Arguments.


File: mailfromd.info, Node: top-block, Next: getopt, Up: Run Mode

3.17.1 The Top of a Script File
-------------------------------

The '--run' option makes it possible to use 'mailfromd' scripts as standalone programs. The traditional way to do so was to set the executable bit on the script file and to begin the script with the "interpreter selector", i.e. the characters '#!' followed by the name of the 'mailfromd' executable, e.g.:

     #! /usr/sbin/mailfromd --run

This would cause the shell to invoke 'mailfromd' with the command line constructed from the '--run' option, the name of the invoked script file itself, and any actual arguments from the invocation. Once invoked, 'mailfromd' would treat the initial '#!' line as a usual single-line comment (*note Comments::).

However, the interpretation of the '#!' by shells has various deficiencies, which depend on the actual shell being used. For example, some shells pass any characters following the whitespace after the interpreter name as a single argument, some others silently truncate the command line after some number of characters, etc. This often makes it impossible to pass additional arguments to 'mailfromd'. For example, a script which begins with the following line would most probably fail to be executed properly:

     #! /usr/sbin/mailfromd --no-config --run

To compensate for these deficiencies and to allow for more complex invocation sequences, 'mailfromd' handles the initial '#!' in a special way. If the first line of a source file begins with '#!/' or '#! /' (with a single space between '!' and '/'), it is treated as the start of a multi-line comment, which is closed by the two characters '!#' on a line by themselves. Thus, the correct way to begin a 'mailfromd' script is:

     #! /usr/sbin/mailfromd --run
     !#

Using this feature, you can start the 'mailfromd' script with arbitrary shell code, provided it ends with an 'exec' statement invoking the interpreter itself. For example:

     #!/bin/sh
     exec /usr/sbin/mailfromd --no-config --run $0 $@
     !#

     func main(...)
       returns number
     do
       /* actual mfl code goes here */
     done

Note the use of '$0' and '$@' to pass the actual script file name and command line arguments to 'mailfromd'.


File: mailfromd.info, Node: getopt, Prev: top-block, Up: Run Mode

3.17.2 Parsing Command Line Arguments
-------------------------------------

A special function is provided to break (parse) options in command lines, and to check for legal options.
It uses the GNU getopt routines (*note getopt: (libc)Getopt.).

 -- Built-in Function: string getopt (number ARGC, pointer ARGV, ...)
     The 'getopt' function parses the command line arguments, as supplied by ARGC and ARGV. The ARGC argument is the argument count, and ARGV is an opaque data structure, representing the array of arguments(1). The operator 'vaptr' (*note vaptr::) is provided to initialize this argument.

     An argument that starts with '-' (and is not exactly '-' or '--') is an option element. An argument that starts with a single '-' is called a "short" or "traditional" option. The characters of this element, except for the initial '-', are option characters. Each option character represents a separate option. An argument that starts with '--' is called a "long" or "GNU" option. The characters of this element, except for the initial '--', form the "option name".

     Options may have arguments. The argument to a short option is supplied immediately after the option character, or as the next word in the command line. E.g., if option '-f' takes a mandatory argument, then it may be given either as '-farg' or as '-f arg'. The argument to a long option is either given immediately after it and separated from the option name by an equals sign (as '--file=arg'), or is given as the next word in the command line (e.g. '--file arg').

     If the option argument is optional, i.e. it may not necessarily be given, then only the first form is allowed (i.e. either '-farg' or '--file=arg').

     The '--' command line argument ends the option list. Any arguments following it are not considered options, even if they begin with a dash.

     If 'getopt' is called repeatedly, it returns successively each of the option characters from each of the option elements (for short options) and each option name (for long options). In this case, the actual arguments are supplied only to the first invocation. Subsequent calls must be given two nulls as arguments.
Such invocation instructs 'getopt' to use the values saved on the previous invocation. When the function finds another option, it returns its character or name updating the external variable 'optind' (see below) so that the next call to 'getopt' can resume the scan with the following option. When there are no more options left, or a '--' argument is encountered, 'getopt' returns an empty string. Then 'optind' gives the index in ARGV of the first element that is not an option. The legitimate options and their characteristics are supplied in additional arguments to 'getopt'. Each such argument is a string consisting of two parts, separated by a vertical bar ('|'). Any one of these parts is optional, but at least one of them must be present. The first part specifies short option character. If it is followed by a colon, this character takes mandatory argument. If it is followed by two colons, this character takes an optional argument. If only the first part is present, the '|' separator may be omitted. Examples: "c" "c|" Short option '-c'. "f:" "f:|" Short option '-f', taking a mandatory argument. "f::" "f::|" Short option '-f', taking an optional argument. If the vertical bar is present and is followed by any characters, these characters specify the name of a long option, synonymous to the short one, specified by the first part. Any mandatory or optional arguments to the short option remain mandatory or optional for the corresponding long option. Examples: "f:|file" Short option '-f', or long option '--file', requiring an argument. "f::|file" Short option '-f', or long option '--file', taking an optional argument. In any of the above cases, if this option appears in the command line, 'getopt' returns its short option character. 
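The structure of an option definition string can be made concrete with a small model written outside MFL. The following Python sketch is only an illustration of the syntax rules given in this section, not the parser 'mailfromd' actually uses; the function name 'parse_optspec' and the labels "none", "required" and "optional" are invented for the example. It also accepts the minus-sign placeholder that stands in for the short option character of a long-only option taking an argument.

```python
def parse_optspec(spec):
    """Split a getopt-style option definition into its parts.

    Returns a tuple (short, long_name, arg_mode), where arg_mode is one of
    "none", "required" or "optional". This is a hypothetical helper that only
    models the syntax rules from the manual text.
    """
    # The vertical bar separates the short part from the long option name.
    short_part, _, long_name = spec.partition("|")

    # Trailing colons on the short part describe the option argument.
    if short_part.endswith("::"):
        arg_mode, short_part = "optional", short_part[:-2]
    elif short_part.endswith(":"):
        arg_mode, short_part = "required", short_part[:-1]
    else:
        arg_mode = "none"

    # An empty short part or a minus sign means "no short option".
    short = short_part if short_part not in ("", "-") else None
    long_name = long_name or None

    # At least one of the two parts must be present.
    if short is None and long_name is None:
        raise ValueError("empty option definition: %r" % spec)
    return short, long_name, arg_mode


for spec in ("c", "f:|file", "f::|file", "|help", "-::|output"):
    print(spec, "->", parse_optspec(spec))
```

Read this way, each definition string carries three pieces of information: the short option character, the long option name, and whether an argument is forbidden, required or optional.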
     To define a long option without a short equivalent, begin it with a bar, e.g.:

          "|help"

     If this option is to take an argument, this is specified using the mechanism described above, except that the short option character is replaced with a minus sign. For example:

     "-:|output"
          Long option '--output', which takes a mandatory argument.

     "-::|output"
          Long option '--output', which takes an optional argument.

     If an option is returned that has an argument in the command line, 'getopt' stores this argument in the variable 'optarg'.

     After each invocation, 'getopt' sets the variable 'optind' to the index of the next ARGV element to be parsed. Thus, when the list of options is exhausted and the function returns an empty string, 'optind' contains the index of the first element that is not an option.

     When 'getopt' encounters an option that is not described in its arguments, or if it detects a missing option argument, it prints an error message using 'mailfromd' logging facilities, stores the offending option in the variable 'optopt', and returns '?'.

     If printing the error message is not desired (e.g. the application is going to take care of error messaging), it can be disabled by setting the variable 'opterr' to '0'.

     The third argument to 'getopt', called the "controlling argument", may be used to control the behavior of the function. If it is a colon, it disables printing the error message for unrecognized options and missing option arguments (as setting 'opterr' to '0' does). In this case 'getopt' returns ':' instead of '?' to indicate a missing option argument.

     If the controlling argument is a plus sign, or the environment variable 'POSIXLY_CORRECT' is set, then option processing stops as soon as a non-option argument is encountered. By default, if options and non-option arguments are intermixed in ARGV, 'getopt' permutes them so that the options go first, followed by non-option arguments.
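The difference between the default permuting behavior and the plus-sign behavior is a property of the GNU getopt conventions and can be observed outside MFL. The sketch below uses the 'getopt' module from the Python standard library, which follows the same conventions; it is not MFL code and does not call the MFL 'getopt' function. The option letters 'a' and 'b' and the argument list are made up for the illustration.

```python
import getopt
import os

# Make sure the environment does not force POSIX behaviour on the first call.
os.environ.pop("POSIXLY_CORRECT", None)

argv = ["-a", "file", "-b"]

# Default GNU behaviour: options and non-option arguments may be intermixed.
# All options are collected, and the non-option arguments are returned after them.
opts, rest = getopt.gnu_getopt(argv, "ab")
print("permuting:", opts, rest)

# A leading '+' in the option string stops option processing at the first
# non-option argument, so '-b' is left among the remaining arguments.
opts_posix, rest_posix = getopt.gnu_getopt(argv, "+ab")
print("stop at first non-option:", opts_posix, rest_posix)
```

In the first call both options are recognized and only 'file' remains; in the second call parsing stops at 'file', which is the behavior the plus sign selects.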
     If the controlling argument is '-', then each non-option element in ARGV is handled as if it were the argument of an option with character code 1 ('"\001"', in MFL notation). This can be used by programs that are written to expect options and other ARGV-elements in any order and that care about the ordering of the two.

     Any other value of the controlling argument is handled as an option definition.

A special language construct is provided to supply the second argument (ARGV) to 'getopt' and similar functions:

     vaptr(PARAM)

where PARAM is a positional parameter, from which to start the array of ARGV. For example:

     func main(...)
       returns number
     do
       set rc getopt($#, vaptr($1), "|help")
       ...

Here, 'vaptr($1)' constructs the ARGV array from all the arguments supplied to the function 'main'.

To illustrate the use of the 'getopt' function, let's suppose you write a script that takes the following options:

'-f FILE'
'--file=FILE'
'--output[=DIR]'
'--help'

Then, the corresponding 'getopt' invocation will be:

     func main(...)
       returns number
     do
       loop for string rc getopt($#, vaptr($1),
                                 "f:|file", "-::|output", "h|help"),
            while rc != "",
            set rc getopt(0, 0)
       do
         switch rc
         do
         case "f":
           set file optarg
         case "output":
           set output 1
           set output_dir optarg
         case "h":
           help()
         default:
           return 1
         done
         ...

---------- Footnotes ----------

(1) When MFL has array data type, the second argument will change to array of strings.

File: mailfromd.info, Node: Logging and Debugging, Next: Runtime errors, Prev: Run Mode, Up: Tutorial

3.18 Logging and Debugging
==========================

Depending on its operation mode, 'mailfromd' tries to guess whether it is appropriate to print its diagnostics and informational messages on standard error or to send them to syslog.
Standard error is assumed if the program is run with one of the following command line options:

   * '--test' (*note Testing Filter Scripts::)
   * '--run' (*note Run Mode::)
   * '--lint' (*note Testing Filter Scripts::)
   * '--dump-code' (*note Logging and Debugging Options::)
   * '--dump-grammar-trace' (*note Logging and Debugging Options::)
   * '--dump-lex-trace' (*note Logging and Debugging Options::)
   * '--dump-macros' (*note Logging and Debugging Options::)
   * '--dump-tree' (*note Logging and Debugging Options::)
   * '--xref' or '--dump-xref' (*note Testing Filter Scripts::)

If none of these are used, 'mailfromd' switches to syslog as soon as it finishes its startup. There are two ways to communicate with the 'syslogd' daemon: using the 'syslog' function from the system 'libc' library, which is a "blocking" implementation in most cases, or via the internal, "asynchronous", syslog implementation. Whether the latter is compiled in and which of the implementations is used by default is determined while compiling the package, as described in *note Using non-blocking syslog: syslog-async.

The '--logger' command line option allows you to manually select the diagnostic channel:

'--logger=stderr'
     Log everything to the standard error.

'--logger=syslog'
     Log to syslog.

'--logger=syslog:async'
     Log to syslog using the asynchronous syslog implementation.

Another way to select the diagnostic channel is by using the 'logger' statement in the configuration file. The statement takes the same argument as its command line counterpart.

The remaining details of diagnostic output are controlled by the 'logging' configuration statement.

The default syslog facility is 'mail'; it can be changed using the '--log-facility' command line option or the 'facility' statement. The argument in both cases is a valid facility name, i.e. one of: 'user', 'daemon', 'auth', 'authpriv', 'mail', and 'local0' through 'local7'.
The argument can be given in upper, lower or mixed case, and it can be prefixed with 'log_'.

Another syslog-related parameter that can be configured is the "tag", which identifies 'mailfromd' messages. The default tag is the program name. It is changed by the '--log-tag' ('-L') command line option and the 'tag' logging statement.

The following example configures both the syslog facility and tag:

     logging {
       facility local7;
       tag "mfd";
     }

Like any other UNIX utility, 'mailfromd' is very quiet unless it has something important to communicate, such as an error condition. A set of command line options is provided for controlling the verbosity of its output.

The '--trace' option enables tracing of Sendmail actions executed during message verifications. When this option is given, any 'accept', 'discard', 'continue', etc. triggered during execution of your filter program will leave its trace in the log file. Here is an example of what it looks like (syslog time stamp, tag and PID removed for readability):

     k8DHxvO9030656: /etc/mailfromd.mf:45: reject 550 5.1.1 Sender validity not confirmed

This shows that while verifying the message with ID 'k8DHxvO9030656' the 'reject' action was executed by filter script '/etc/mailfromd.mf' at line 45.

The use of message ID in the log deserves a special notice. The program will always identify its log messages with the 'Message-Id', when it is available. Your responsibility as an administrator is to make sure it is available by configuring your MTA to export the macro 'i' to 'mailfromd'. The rule of thumb is: make 'i' available to the very first handler 'mailfromd' executes. It is not necessary to export it to the rest of the handlers, since 'mailfromd' will cache it. For example, if your filter script contains 'envfrom' and 'envrcpt' handlers, export 'i' for 'envfrom'. The exact instructions on how to ensure it depend on the MTA you use. For 'Sendmail', refer to *note Sendmail::. For MeTA1, see *note MeTA1::, and *note pmult-macros::.
For 'Postfix', see *note Postfix::. To push log verbosity further, use the 'debug' configuration statement (*note conf-debug::) or its command line equivalent, '--debug' ('-d', *note --debug::). Its argument is a "debugging level", whose syntax is described in . The debugging output is controlled by a set of levels, each of which can be set independently of others. Each debug level consists of a category name, which identifies the part of package for which additional debugging is desired, and a level number, which indicates how verbose should its output be. Valid debug levels are: error Displays error conditions which are normally not reported, but passed to the caller layers for handling. trace0 through trace9 Ten levels of verbosity, 'trace0' producing less output, 'trace9' producing the maximum amount of output. prot Displays network protocol interaction, where applicable. The overall debugging level is specified as a list of individual levels, delimited with semicolons. Each individual level can be specified as one of: !CATEGORY Disables all levels for the specified category. CATEGORY Enables all levels for the specified category. CATEGORY.LEVEL For this category, enables all levels from 'error' to LEVEL, inclusive. CATEGORY.=LEVEL Enables only the given LEVEL in this CATEGORY. CATEGORY.!LEVEL Disables all levels from 'error' to LEVEL, inclusive, in this CATEGORY. CATEGORY.!=LEVEL Disables only the given LEVEL in this CATEGORY. CATEGORY.LEVELA-LEVELB Enables all levels in the range from LEVELA to LEVELB, inclusive. CATEGORY.!LEVELA-LEVELB Disables all levels in the range from LEVELA to LEVELB, inclusive. Additionally, a comma-separated list of level specifications is allowed after the dot. For example, the following specification: acl.prot,!=trace9,!trace2 enables in category acl all levels, except trace9, trace0, trace1, and trace2. Implementation and applicability of each level of debugging differs between various categories. 
Categories built-in to mailutils are described in . Mailfromd introduces the following additional categories: db trace0 Detailed debugging info about expiration and compaction. trace5 List records being removed. dns trace8 Verbose information about attempted DNS queries and their results. trace9 Enables 'libadns' internal debugging. srvman trace0 Additional information about normal conditions, such as subprocess exiting successfully or a remote party being allowed access by ACL. trace1 Detailed transcript of server manager actions: startup, shutdown, subprocess cleanups, etc. trace3 Additional info about fd sets. trace4 Individual subserver status information. trace5 Subprocess registration. pmult trace1 Verbosely list incoming connections, functions being executed and erroneous conditions: missing headers in SMFIR_CHGHEADER, undefined macros, etc. trace2 List milter requests being processed. trace7 List SMTP body content in SMFIR_REPLBODY requests. error Verbosely list mild errors encountered: bad recipient addresses, etc. callout trace0 Verification session transcript. trace1 MX servers checks. trace5 List emails being checked. trace9 Additional info. main trace5 Info about hostnames in relayed domain list engine Debugging of the virtual engine. trace5 Message modification lists. trace6 Debug message modification operations and Sendmail macros registered. trace7 List SMTP stages ('xxfi_*' calls). trace9 Cleanup calls. pp Preprocessor. trace1 Show command line of the preprocessor being run. prog trace8 Stack operations trace9 Debug exception state save/restore operations. spf error Mild errors. trace0 List calls to 'spf_eval_record', 'spf_test_record', 'spf_check_host_internal', etc. trace1 General debug info. trace6 Explicitly list A records obtained when processing the 'a' SPF mechanism. Categories starting with 'bi_' debug built-in modules: bi_db Database functions. trace5 List database look-ups. trace6 Trace operations on the greylisting database. 
bi_sa SpamAssassin and ClamAV API. trace1 Report the findings of the 'clamav' function. trace9 Trace payload in interactions with 'spamd'. bi_io I/O functions. trace1 Debug the following functions: 'open', 'spawn', 'write'. trace2 Report stderr redirection. trace3 Report external commands being run. bi_mbox Mailbox functions. trace1 Report opened mailboxes. bi_other Other built-ins. trace1 Report results of checks for existence of usernames. For example, the following invocation enables levels up to 'trace2' in category 'engine', all levels in category 'savsrv' and levels up to 'trace0' in category 'srvman': $ mailfromd --debug='engine.trace2;savsrv;srvman.trace0' You need to have sufficient knowledge about 'mailfromd' internal structure to use this form of the '--debug' option. To control the execution of the sender verification functions (*note SMTP Callout functions::), you may use '--transcript' ('-X') command line option which enables transcripts of SMTP sessions in the logs. Here is an example of the output produced running 'mailfromd --transcript': k8DHxlCa001774: RECV: 220 spf-jail1.us4.outblaze.com ESMTP Postfix k8DHxlCa001774: SEND: HELO mail.gnu.org.ua k8DHxlCa001774: RECV: 250 spf-jail1.us4.outblaze.com k8DHxlCa001774: SEND: MAIL FROM: <> k8DHxlCa001774: RECV: 250 Ok k8DHxlCa001774: SEND: RCPT TO: k8DHxlCa001774: RECV: 550 <>: No thank you rejected: Account Unavailable: Possible Forgery k8DHxlCa001774: poll exited with status: not_found; sent "RCPT TO: ", got "550 <>: No thank you rejected: Account Unavailable: Possible Forgery" k8DHxlCa001774: SEND: QUIT  File: mailfromd.info, Node: Runtime errors, Next: Notes, Prev: Logging and Debugging, Up: Tutorial 3.19 Runtime Errors =================== A "runtime error" is a special condition encountered during execution of the filter program, that makes further execution of the program impossible. There are two kinds of runtime errors: fatal errors, and uncaught exceptions. 
Whenever a runtime error occurs, 'mailfromd' writes into the log file the following message:

     RUNTIME ERROR near FILE:LINE: TEXT

where FILE:LINE indicates the approximate source file location where the error occurred and TEXT gives the textual description of the error.

Fatal runtime errors
--------------------

Fatal runtime errors are caused by a condition that is impossible to fix at run time. For version 8.8 these are:

Not enough memory
     There is not enough memory for the execution of the program. Try to make more memory available for 'mailfromd' or to reduce its memory requirements by rewriting your filter script.

Out of stack space; increase #pragma stacksize
Heap overrun; increase #pragma stacksize
memory chunk too big to fit into heap
     These errors are reported when there is not enough space left on stack to perform the requested operation, and the attempt to resize the stack has failed. Usually 'mailfromd' expands the stack when the need arises (*note automatic stack resizing::). This runtime error indicates that there was no more memory available for stack expansion. Try to make more memory available for 'mailfromd' or to reduce its memory requirements by rewriting your filter script.

Stack underflow
     The program attempted to pop a value off the stack but the stack was already empty. This indicates an internal error in the MFL compiler or 'mailfromd' runtime engine. If you ever encounter this error, please report it to . Include the log fragment (about 10-15 lines before and after this log message) and your filter script. *Note Reporting Bugs::, for more information about bug reporting.

pc out of range
     The "program counter" is out of allowed range. This is a severe error, indicating an internal inconsistency in 'mailfromd' runtime engine. If you encounter it, please report it to . Include the log fragment (about 10-15 lines before and after this log message) and your filter script. *Note Reporting Bugs::, for more information about how to report a bug.
Programmatic runtime errors
---------------------------

These indicate a programmatic error in your filter script, which the MFL compiler was unable to discover at compilation stage:

Invalid exception number: N
     The 'throw' statement used a nonexistent exception number N. Fix the statement and restart 'mailfromd'. *Note throw::, for information about the 'throw' statement, and see *note Exceptions::, for the list of available exception codes.

No previous regular expression
     You have used a back-reference (*note Back references::) where there is no previous regular expression to refer to. Fix this line in your code and restart the program.

Invalid back-reference number
     You have used a back-reference (*note Back references::) with a number greater than the number of available groups in the previous regular expression. For example:

          if $f matches "(.*)@gnu.org"
            # Wrong: there is only one group in the regexp above!
            set x \2
            ...

     Fix your code and restart the daemon.

Uncaught exceptions
-------------------

Another kind of runtime error is the "uncaught exception", i.e. an exceptional condition for which no handler was installed (*Note Exceptions::, for information on exceptions and on how to handle them). These errors mean that the programmer (i.e. you) made no provision for some specific condition. For example, consider the following code:

     prog envfrom
     do
       if $f mx matches "yahoo.com"
         foo()
       fi
     done

It is syntactically correct, but it overlooks the fact that 'mx matches' may generate the 'e_temp_failure' exception if the underlying DNS query has timed out (*note Special comparisons::). If this happens, 'mailfromd' has no instructions on what to do next and reports an error. This can easily be fixed using a 'catch' statement, e.g.:

     prog envfrom
     do
       # Catch DNS errors
       catch e_temp_failure or e_failure
       do
         tempfail 451 4.1.1 "MX verification failed"
       done

       if $f mx matches "yahoo.com"
         foo()
       fi
     done

Another common case is an undefined Sendmail macro.
In this case the 'e_macroundef' exception is generated: RUNTIME ERROR near foo.c:34: Macro not defined: {client_adr} These can be caused either by misspelling the macro name (as in the example message above) or by failing to export the required name in Sendmail milter configuration (*note exporting macros::). This error should be fixed either in your source code or in 'sendmail.cf' file, but if you wish to provide a special handling for it, you can use the following catch statement: catch e_macroundef do ... done Sometimes the location indicated with the runtime error message is not enough to trace the origin of the error. For example, an error can be generated explicitly with 'throw' statement (*note throw::): RUNTIME ERROR near match_cidr.mf:30: invalid CIDR (text) If you look in module 'match_cidr.mf', you will see the following code (line numbers added for reference): 23 func match_cidr(string ipstr, string cidr) returns number 24 do 25 number netmask 26 27 if cidr matches '^(([0-9]{1,3}\.){3}[0-9]{1,3})/([0-9][0-9]?)' 28 return inet_aton(ipstr) & len_to_netmask(\3) = inet_aton(\1) 29 else 30 throw invcidr "invalid CIDR (%cidr)" 31 fi 32 return 0 33 done Now, it is obvious that the value of 'cidr' argument to 'match_cidr' was wrong, but how to find the caller that passed the wrong value to it? The special command line option '--stack-trace' is provided for this. This option enables dumping "stack traces" when a fatal error occurs. The traces contain information about function calls. Continuing our example, using the '--stack-trace' option you will see the following diagnostics: RUNTIME ERROR near match_cidr.mf:30: invalid CIDR (127%) mailfromd: Stack trace: mailfromd: 0077: match_cidr.mf:30: match_cidr mailfromd: 0096: test.mf:13: bar mailfromd: 0110: mailfromd.mf:18: foo mailfromd: Stack trace finishes mailfromd: Execution of the configuration program was not finished Each trace line describes one stack frame. 
The lines appear in the order of most recently called to least recently called. Each frame consists of:

  1. Value of the program counter at the time of its execution;
  2. Source code location, if available;
  3. Name of the function called.

Thus, the example above can be read as: "the function 'match_cidr' was called by the function 'bar' in file 'test.mf' at line 13. In its turn, 'bar' was called by the function 'foo', in file 'mailfromd.mf' at line 18".

Examining caller functions will help you localize the source of the error and fix it.

You can also request a stack trace at any place in your code by calling the 'stack_trace' function. This can be useful for debugging, or in your 'catch' statements.

File: mailfromd.info, Node: Notes, Prev: Runtime errors, Up: Tutorial

3.20 Notes and Cautions
=======================

This section discusses some potential pitfalls in MFL.

It is important to exercise special caution when writing format strings for the 'sprintf' (*note String formatting::) and 'strftime' (*note strftime::) functions. They use '%' as the character introducing conversion specifiers, while the same character is used to expand an MFL variable within a string. To prevent this misinterpretation, always enclose format specifications in _single quotes_ (*note singe-vs-double::). To illustrate this, let's consider the following example:

     echo sprintf ("Mail from %s", $f)

If a variable 's' is not declared, this line will produce the 'Variable s is not defined' error message, which will allow you to identify and fix the bug. The situation is considerably worse if 's' is declared. In that case you will see no warning message, as the statement is perfectly valid, but at run time the variable 's' will be interpolated within the format string, and its value will replace '%s'.
To prevent this from happening, single quotes must be used:

     echo sprintf ('Mail from %s', $f)

This does not limit the functionality, since there is no need to fall back to variable interpretation in format strings.

Yet another dangerous feature of the language is the way to refer to variable and constant names within literal strings. To expand a variable or a constant the same notation is used (*Note Variables::, and *note Constants::). Now, let's consider the following code:

     const x 2
     string x "X"

     prog envfrom
     do
       echo "X is %x"
     done

Does '%x' in 'echo' refer to the variable or to the constant? The correct answer is 'to the variable'. When executed, this code will print 'X is X'.

As of version 8.8, 'mailfromd' will always print a diagnostic message whenever it stumbles upon a variable having the same name as a previously defined constant or vice versa. The resolution of such name clashes is described in detail in *Note variable--constant shadowing::.

Future versions of the program may provide a non-ambiguous way of referring to variables and constants from literal strings.

File: mailfromd.info, Node: MFL, Next: Library, Prev: Tutorial, Up: Top

4 Mail Filtering Language
*************************

The "mail filtering language", or MFL, is a special language designed for writing filter scripts. It has a simple syntax, similar to that of Bourne shell. In contrast to most existing programming languages, MFL does not have any special terminating or separating characters (like, e.g. newlines and semicolons in shell)(1). All syntactical entities are separated by any amount of white-space characters (i.e. spaces, tabulations or newlines).

The following sections describe MFL syntax in detail.

* Menu:

* Comments:: Comments.
* Pragmas:: Pragmatic comments.
* Data Types::
* Numbers::
* Literals::
* Here Documents::
* Sendmail Macros::
* Constants::
* Variables::
* Back references::
* Handlers::
* begin/end::
* Functions:: Functions.
* Expressions:: Expressions.
* Shadowing:: Variable and Constant Shadowing.
* Statements::
* Conditionals:: Conditional Statements.
* Loops:: Loop Statements.
* Exceptions:: Exceptional Conditions and their Handling.
* Polling:: Sender Verification Tests.
* Modules:: Modules are Collections of Useful Functions.
* Preprocessor:: Input Text Is Preprocessed.
* Filter Script Example:: A Working Filter Script Explained.
* Reserved Words:: A Reference List of Reserved Words.

---------- Footnotes ----------

(1) There are two noteworthy exceptions: 'require' and 'from ... import' statements, which must be terminated with a period. *Note import::.

File: mailfromd.info, Node: Comments, Next: Pragmas, Up: MFL

4.1 Comments
============

Two types of comments are allowed: C-style, enclosed between '/*' and '*/', and shell-style, starting with the '#' character and extending up to the end of line:

     /* This is
        a comment. */
     # And this too.

There are, however, several special cases, where the characters following '#' are not ignored.

If the first line begins with '#!/' or '#! /', this is treated as the start of a multi-line comment, which is closed by the characters '!#' on a line by themselves. This feature allows for writing sophisticated scripts. *Note top-block::, for a detailed description.

If '#' is followed by the word 'include' (with optional whitespace between them), this statement requires inclusion of the specified file, as in C. There are two forms of the '#include' statement:

  1. '#include <FILE>'
  2. '#include "FILE"'

The quotes around FILE in the second form are optional.

Both forms are equivalent if FILE is an absolute file name. Otherwise, the first form will look for FILE in the "include search path". The second one will look for it in the current working directory first, and, if not found there, in the include search path.

The default include search path is:

  1. 'PREFIX/share/mailfromd/8.8/include'
  2. 'PREFIX/share/mailfromd/include'
  3. '/usr/share/mailfromd/include'
  4.
'/usr/local/share/mailfromd/include'

where PREFIX is the installation prefix.

New directories can be prepended to it using the '-I' ('--include') command line option, or the 'include-path' configuration statement (*note include-path: conf-base.).

For example, invoking

     $ mailfromd -I/var/mailfromd -I/com/mailfromd

creates the following include search path:

  1. '/var/mailfromd'
  2. '/com/mailfromd'
  3. 'PREFIX/share/mailfromd/8.8/include'
  4. 'PREFIX/share/mailfromd/include'
  5. '/usr/share/mailfromd/include'
  6. '/usr/local/share/mailfromd/include'

Along with '#include', there is also a special form '#include_once', that has the same syntax:

     #include_once <FILE>
     #include_once "FILE"

This form works exactly as '#include', except that, if the FILE has already been included, it will not be included again. As the name suggests, it will be included only once. This form should be used to prevent re-inclusion of code, which can cause problems due to function redefinitions, variable reassignments etc.

A line in the form

     #line NUMBER "IDENTIFIER"

causes the MFL compiler to believe, for purposes of error diagnostics, that the line number of the next source line is given by NUMBER and the current input file is named by IDENTIFIER. If the identifier is absent, the remembered file name does not change.

File: mailfromd.info, Node: Pragmas, Next: Data Types, Prev: Comments, Up: MFL

4.2 Pragmatic comments
======================

If '#' is immediately followed by the word 'pragma' (with optional whitespace between them), such a construct introduces a "pragmatic comment", i.e. an instruction that controls some configuration setting.

The available pragma types are described in the following subsections.

* Menu:

* prereq:: Pragma prereq.
* stacksize:: Pragma stacksize.
* regex:: Pragma regex.
* dbprop:: Pragma dbprop.
* greylist:: Pragma greylist.
* miltermacros:: Pragma miltermacros.
* provide-callout:: Pragma provide-callout.
File: mailfromd.info, Node: prereq, Next: stacksize, Up: Pragmas

4.2.1 Pragma prereq
-------------------

The '#pragma prereq' statement ensures that the correct 'mailfromd' version is used to compile the source file it appears in. It takes a version number as its argument and produces a compilation error if the actual 'mailfromd' version number is earlier than that. For example, the following statement:

     #pragma prereq 7.0.94

results in an error if compiled with 'mailfromd' version 7.0.93 or prior.

File: mailfromd.info, Node: stacksize, Next: regex, Prev: prereq, Up: Pragmas

4.2.2 Pragma stacksize
----------------------

The 'stacksize' pragma sets the initial size of the run-time stack and may also define the policy of its growing, in case it becomes full. The default stack size is 4096 words. You may need to increase this number if your configuration program uses recursive functions or does an excessive amount of string manipulations.

 -- pragma: stacksize size [incr [max]]
     Sets stack size to SIZE units. Optional INCR and MAX define stack growth policy (see below). The default "units" are words. The following example sets the stack size to 7168 words:

          #pragma stacksize 7168

     The SIZE may end with a "unit size" suffix:

     Suffix   Meaning
     -------------------------------------------------------------------
     k        Kilowords, i.e. 1024 words
     m        Megawords, i.e. 1048576 words
     g        Gigawords
     t        Terawords (ouch!)

     Table 4.1: Unit Size Suffix

     Unit size suffixes are case-insensitive, so the following two pragmas are equivalent and set the stack size to '7*1048576 = 7340032' words:

          #pragma stacksize 7m
          #pragma stacksize 7M

     When the MFL engine notices that there is no more stack space available, it attempts to expand the stack. If this attempt succeeds, the operation continues. Otherwise, a runtime error is reported and the execution of the filter stops.

     The optional INCR argument to '#pragma stacksize' defines growth policy for the stack.
Two growth policies are implemented: "fixed increment policy", which
expands the stack by a fixed-size "expansion chunk" at a time, and
"exponential growth policy", which doubles the stack size until it is
able to accommodate the needed number of words.  The fixed increment
policy is the default.  The default chunk size is 4096 words.

If INCR is the word 'twice', the exponential growth policy is selected.
Otherwise INCR must be a positive number optionally suffixed with a
size suffix (see above).  This indicates the expansion chunk size for
the fixed increment policy.

The following example sets the initial stack size to 10240 words and
the expansion chunk size to 2048 words:

     #pragma stacksize 10K 2K

The pragma below enables the exponential stack growth policy:

     #pragma stacksize 10240 twice

In this case, when the run-time evaluator hits the stack size limit, it
expands the stack to twice the size it had before.  So, in the example
above, the stack will be sequentially expanded to the following sizes:
20480, 40960, 81920, 163840, etc.

The optional MAX argument defines the maximum size of the stack.  If
the stack grows beyond this limit, the execution of the script will be
aborted.

If you are concerned about the execution time of your script, you may
wish to avoid stack reallocations.  To help you find out the optimal
stack size, each time the stack is expanded, 'mailfromd' issues a
warning in its log file, which looks like this:

     warning: stack segment expanded, new size=8192

You can use these messages to adjust your stack size configuration
settings.

File: mailfromd.info, Node: regex, Next: dbprop, Prev: stacksize, Up: Pragmas

4.2.3 Pragma regex
------------------

The '#pragma regex' controls compilation of regular expressions.  You
can use any number of such pragma directives in your 'mailfromd.mf'.
The scope of '#pragma regex' extends to the next occurrence of this
directive or to the end of the script file, whichever occurs first.

 -- pragma: regex [push|pop] flags
     The optional PUSH|POP parameter is one of the words 'push' or
     'pop' and is discussed in detail below.  The FLAGS parameter is a
     whitespace-separated list of "regex flags".  Each regex-flag is a
     word specifying some regex feature.  It can be preceded by '+' to
     enable this feature (this is the default), by '-' to disable it,
     or by '=' to reset the regex flags to this value.  Valid
     regex-flags are:

     'extended'
          Use POSIX Extended Regular Expression syntax when
          interpreting regex.  If not set, POSIX Basic Regular
          Expression syntax is used.

     'icase'
          Do not differentiate case.  Subsequent regex searches will be
          case insensitive.

     'newline'
          "Match-any-character" operators don't match a newline.

          A non-matching list ('[^...]') not containing a newline does
          not match a newline.

          "Match-beginning-of-line" operator ('^') matches the empty
          string immediately after a newline.

          "Match-end-of-line" operator ('$') matches the empty string
          immediately before a newline.

For example, the following pragma enables POSIX extended, case
insensitive matching (a good thing to start your 'mailfromd.mf' with):

     #pragma regex +extended +icase

Optional modifiers 'push' and 'pop' can be used to maintain a stack of
regex flags.  The statement

     #pragma regex push [FLAGS]

saves the current regex flags on the stack and then optionally modifies
them as requested by FLAGS.

The statement

     #pragma regex pop [FLAGS]

does the opposite: it restores the current regex flags from the top of
the stack and applies FLAGS to them.

This statement is useful in module and include files to avoid
disturbing user regex settings.  E.g.:

     #pragma regex push +extended +icase
     .
     .
     .
     #pragma regex pop

File: mailfromd.info, Node: dbprop, Next: greylist, Prev: regex, Up: Pragmas

4.2.4 Pragma dbprop
-------------------

 -- pragma: dbprop pattern prop ...
     This pragma configures properties for a DBM database.  *Note
     Database functions::, for its detailed description.

File: mailfromd.info, Node: greylist, Next: miltermacros, Prev: dbprop, Up: Pragmas

4.2.5 Pragma greylist
---------------------

 -- pragma: greylist type
     Selects the greylisting implementation to use.  Allowed values for
     TYPE are:

     traditional
     gray
          Use the traditional greylisting implementation.  This is the
          default.

     con-tassios
     ct
          Use Con Tassios greylisting implementation.

     *Note greylisting types::, for a detailed description of these
     greylisting implementations.

     Notice that this pragma can be used only once.  A second use of
     this pragma would constitute an error, because you cannot use both
     greylisting implementations in the same program.

File: mailfromd.info, Node: miltermacros, Next: provide-callout, Prev: greylist, Up: Pragmas

4.2.6 Pragma miltermacros
-------------------------

 -- pragma: miltermacros handler macro ...
     Declare that the Milter stage HANDLER uses the MTA macros listed
     as the rest of arguments.  The HANDLER must be a valid handler
     name (*note Handlers::).

     The 'mailfromd' parser collects the names of the macros referred
     to by a '$NAME' construct within a handler (*note Sendmail
     Macros::) and declares them automatically for the corresponding
     handlers.  It is, however, unable to track macros used in
     functions called from a handler, as well as those referred to via
     the 'getmacro' and 'macro_defined' functions.  Such macros should
     be declared using '#pragma miltermacros'.

During initial negotiation with the MTA, 'mailfromd' will ask it to
export the macro names declared automatically or by using '#pragma
miltermacros'.  The MTA is free to honor or to ignore this request.  In
particular, Sendmail versions prior to 8.14.0 and Postfix versions
prior to 2.5 do not support this feature.  If you use one of these, you
will need to export the needed macros explicitly in the MTA
configuration.  For more details, refer to the section in *note MTA
Configuration:: corresponding to your MTA type.

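As an illustration of the case the parser cannot track, consider the
following sketch.  It assumes that the MTA exports a macro named
'auth_authen'; both that macro name and the helper function
'authenticated_user' are chosen for the example only and are not part
of 'mailfromd'.  Because the macro is referenced by name through
'getmacro' inside a function, it has to be declared for the 'envfrom'
stage explicitly:

     #pragma miltermacros envfrom auth_authen

     # Hypothetical helper.  The macro name appears only as a string,
     # so the parser cannot register it for any handler by itself.
     func authenticated_user() returns string
     do
       return getmacro("auth_authen")
     done

     prog envfrom
     do
       echo "authenticated as: " . authenticated_user()
     done

If the MTA ignores the request and the macro is not exported by other
means, referencing it is expected to raise the 'e_macroundef' exception
described in *note Sendmail Macros::.
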
File: mailfromd.info, Node: provide-callout, Prev: miltermacros, Up: Pragmas

4.2.7 Pragma provide-callout
----------------------------

The '#pragma provide-callout' statement is used in the 'callout' module
to inform 'mailfromd' that the module has been loaded.

Do not use this pragma.

File: mailfromd.info, Node: Data Types, Next: Numbers, Prev: Pragmas, Up: MFL

4.3 Data Types
==============

The 'mailfromd' filter script language operates on entities of two
types: numeric and string.

The "numeric" type is represented internally as a signed long integer.
Depending on the machine architecture, its size can vary.  For example,
on machines with Intel-based CPUs it is 32 bits long.

A "string" is a string of characters of arbitrary length.  Strings can
contain any characters except ASCII NUL.

There is also a "generic pointer", which is designed to facilitate
certain operations.  It appears only in the 'body' handler.  *Note body
handler::, for more information about it.

File: mailfromd.info, Node: Numbers, Next: Literals, Prev: Data Types, Up: MFL

4.4 Numbers
===========

A "decimal number" is any sequence of decimal digits, not beginning
with '0'.

An "octal number" is '0' followed by any number of octal digits ('0'
through '7'), for example: '0340'.

A "hex number" is '0x' or '0X' followed by any number of hex digits
('0' through '9' and 'a' through 'f' or 'A' through 'F'), for example:
'0x3ef1'.

File: mailfromd.info, Node: Literals, Next: Here Documents, Prev: Numbers, Up: MFL

4.5 Literals
============

A literal is any sequence of characters enclosed in single or double
quotes.

After 'tempfail' and 'reject' actions two special kinds of literals are
recognized: three-digit numeric values represent RFC 2821 reply codes,
and literals consisting of three digit groups separated by dots
represent an extended reply code as per RFC 1893/2034.
For example:

     510   # A reply code
     5.7.1 # An extended reply code

Double-quoted strings
---------------------

String literals enclosed in double quotation marks ("double-quoted
strings") are subject to "backslash interpretation", "macro expansion",
"variable interpretation" and "back reference interpretation".

"Backslash interpretation" is performed at compilation time.  It
consists of replacing the following "escape sequences" with the
corresponding single characters:

     Sequence    Replaced with
     ----------------------------------------------------------
     \a          Audible bell character (ASCII 7)
     \b          Backspace character (ASCII 8)
     \f          Form-feed character (ASCII 12)
     \n          Newline character (ASCII 10)
     \r          Carriage return character (ASCII 13)
     \t          Horizontal tabulation character (ASCII 9)
     \v          Vertical tabulation character (ASCII 11)

     Table 4.2: Backslash escapes

In addition, the sequence '\NEWLINE' has the same effect as '\n', for
example:

     "a string with\
      embedded newline"
     "a string with\n embedded newline"

Any escape sequence of the form '\xHH', where H denotes any hex digit,
is replaced with the character whose ASCII value is HH.  For example:

     "\x61nother" => "another"

Similarly, an escape sequence of the form '\0OOO', where O is an octal
digit, is replaced with the character whose ASCII value is OOO.

Macro expansion and variable interpretation occur at run-time.  During
these phases all Sendmail macros (*note Sendmail Macros::), 'mailfromd'
variables (*note Variables::), and constants (*note Constants::)
referenced in the string are replaced by their actual values.  For
example, if the Sendmail macro 'f' has the value
'postmaster@gnu.org.ua' and the variable 'last_ip' has the value
'127.0.0.1', then the string(1)

     "$f last connected from %last_ip;"

will be expanded to

     "postmaster@gnu.org.ua last connected from 127.0.0.1;"

A "back reference" is a sequence '\D', where D is a decimal number.  It
refers to the Dth parenthesized subexpression in the last 'matches'
statement(2).
Any back reference occurring within a double-quoted string is replaced
by the value of the corresponding subexpression.  *Note Special
comparisons::, for a detailed description of this process.  Back
reference interpretation is performed at run time.

Single-quoted strings
---------------------

Any characters enclosed in single quotation marks are read unmodified.

The following examples contain pairs of equivalent strings:

     "a string"
     'a string'

     "\\(.*\\):"
     '\(.*\):'

Notice the last example.  Single quotes are particularly useful in
writing regular expressions (*note Special comparisons::).

---------- Footnotes ----------

(1) Implementation note: actually, the references are not interpreted
within the string; instead, each such string is split at compilation
time into a series of concatenated atoms.  Thus, our sample string will
actually be compiled as:

     $f . " last connected from " . last_ip . ";"

*Note Concatenation::, for a description of this construct.

You can easily see how various strings are interpreted by using the
'--dump-tree' option (*note --dump-tree::).  In this case, it will
produce:

     CONCAT:
       CONCAT:
         CONCAT:
           SYMBOL: f
           CONSTANT: " last connected from "
         VARIABLE last_ip (13)
       CONSTANT: ";"

(2) The subexpressions are numbered by the positions of their opening
parentheses, left to right.

File: mailfromd.info, Node: Here Documents, Next: Sendmail Macros, Prev: Literals, Up: MFL

4.6 Here Documents
==================

A "here-document" is a special form of a string literal, which makes it
possible to specify multiline strings without having to use backslash
escapes.  The format of here-documents is:

     <<[FLAGS]WORD
     ...
     WORD

The '<<' construct introduces a string that extends over the following
lines, up to a line containing only WORD.  Unless WORD is quoted (see
below), the body is subject to the same expansions as a double-quoted
string.  For example, if the variable 'count' has the value '10', then

     set s <<EOT
     <$f> has tried to send %count mails.
     Please see docs for more info.
     EOT

will be expanded to the following, where ADDRESS stands for the value
of the Sendmail macro 'f':

     <ADDRESS> has tried to send 10 mails.
     Please see docs for more info.

If the WORD is quoted, either by enclosing it in single quote
characters or by prepending it with a backslash, all interpretations
and expansions within the document body are suppressed.
For example:

     set s <<'EOT'
     The following line is read verbatim:
     <$f> has tried to send %count mails.
     Please see docs for more info.
     EOT

Optional FLAGS in the here-document construct control the way leading
white space is handled.  If FLAGS is '-' (a dash), then all leading tab
characters are stripped from input lines and the line containing WORD.
Furthermore, if '-' is followed by a single space, all leading
whitespace is stripped from them.  This allows here-documents within
configuration scripts to be indented in a natural fashion.  Examples:

     <<- TEXT
         <$f> has tried to send %count mails.
         Please see docs for more info.
         TEXT

Here-documents are particularly useful with 'reject' actions (*note
reject::).

File: mailfromd.info, Node: Sendmail Macros, Next: Constants, Prev: Here Documents, Up: MFL

4.7 Sendmail Macros
===================

Sendmail macros are referenced exactly the same way they are in the
'sendmail.cf' configuration file, i.e. '$NAME', where NAME represents
the macro name.  Notice that the notation is the same for both
single-character and multi-character macro names.  For consistency with
the 'Sendmail' configuration, the '${NAME}' notation is also accepted.

Another way to reference Sendmail macros is by using the function
'getmacro' (*note Macro access::).

Sendmail macros evaluate to string values.

Notice that to reference a macro, you must properly export it in your
MTA configuration.  An attempt to reference a macro that is not
exported raises an 'e_macroundef' exception at run time (*note uncaught
exceptions::).

File: mailfromd.info, Node: Constants, Next: Variables, Prev: Sendmail Macros, Up: MFL

4.8 Constants
=============

A "constant" is a symbolic name for an MFL value.  Constants are
defined using the 'const' statement:

     [QUALIFIER] const NAME EXPR

where NAME is an identifier, and EXPR is any valid MFL expression
evaluating immediately to a constant literal or numeric value.
Optional QUALIFIER defines the scope of visibility for that constant
(*note scope of visibility::): either 'public' or 'static'.

Once defined, any appearance of NAME in the program text is replaced by
its value.  For example:

     const x 10/5
     const text "X is "

defines the numeric constant 'x' with the value '2', and the literal
constant 'text' with the value 'X is '.

A special construct is provided to define a series of numeric constants
(an "enumeration"):

     [QUALIFIER] const
     do
       NAME0 [EXPR0]
       NAME1 [EXPR1]
       ...
       NAMEN [EXPRN]
     done

Each EXPRN, if present, must evaluate to a constant numeric expression.
The resulting value will be assigned to constant NAMEN.  If EXPRN is
not supplied, the constant will be defined to the value of the previous
constant plus one.  If EXPR0 is not supplied, 0 is assumed.

For example, consider the following statement

     const
     do
       A
       B
       C 10
       D
     done

This defines 'A' to 0, 'B' to 1, 'C' to 10 and 'D' to 11.

As a matter of fact, EXPRN may also evaluate to a constant string
expression, provided that all expressions in the enumeration 'const'
statement are provided.  That is, the following is correct:

     const
     do
       A "one"
       B "two"
       C "three"
       D "four"
     done

whereas the following is not:

     const
     do
       A "one"
       B
       C "three"
       D "four"
     done

Trying to compile the latter example will produce:

     mailfromd: FILENAME:5.3: initializer element is not numeric

which means that 'mailfromd' was trying to create constant 'B' with the
value of 'A' incremented by one, but was unable to do so, because the
value in question was not numeric.

Constants can be used in normal MFL expressions as well as in literals.
To expand a constant within a literal string, prepend a percent sign to
its name, e.g.:

     echo "New %text %x"
     => "New X is 2"

This way of expanding constants creates an ambiguity if there happens
to be a variable of the same name as the constant.  *Note
variable--constant clashes::, for more information on this case and
ways to handle it.

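The sketch below combines the constructs described in this section: an
enumeration with implicitly numbered members, a string constant with a
scope qualifier, and the use of constants both in an expression and
inside a string literal.  All constant names are invented for the
illustration, and the comments state the values that follow from the
rules given above:

     const
     do
       SCORE_LOW          # no expression given, so 0 is assumed
       SCORE_MEDIUM       # previous constant plus one: 1
       SCORE_HIGH 10
       SCORE_MAX          # previous constant plus one: 11
     done

     static const label "score limit"

     prog envfrom
     do
       if SCORE_MAX > SCORE_HIGH
         echo "%label is %SCORE_MAX"
       fi
     done

Under these assumptions the 'echo' statement should log the text 'score
limit is 11'.  Keeping such constant names distinct from variable names
avoids the ambiguity mentioned above.
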
* Menu:

* Built-in constants::

File: mailfromd.info, Node: Built-in constants, Up: Constants

4.8.1 Built-in constants
------------------------

Several constants are built into the MFL compiler.  To discern them
from user-defined ones, their names start and end with two underscores
('__').

The following constants are defined in 'mailfromd' version 8.8:

 -- Built-in constant: string __file__
     Expands to the name of the current source file.

 -- Built-in constant: string __function__
     Expands to the name of the current lexical context, i.e. the
     function or handler name.

 -- Built-in constant: string __git__
     This built-in constant is defined for alpha versions only.  Its
     value is the Git tag of the recent commit corresponding to that
     version of the package.  If the release contains some uncommitted
     changes, the value of the '__git__' constant ends with the suffix
     '-dirty'.

 -- Built-in constant: number __line__
     Expands to the current line number in the input source file.

 -- Built-in constant: number __major__
     Expands to the major version number.

     The following example uses the '__major__' constant to determine
     if some version-dependent feature can be used:

          if __major__ > 2
            # Use some version-specific feature
          fi

 -- Built-in constant: number __minor__
     Expands to the minor version number.

 -- Built-in constant: string __module__
     Expands to the name of the current module (*note Modules::).

 -- Built-in constant: string __package__
     Expands to the package name ('mailfromd').

 -- Built-in constant: number __patch__
     For alpha versions and maintenance releases expands to the version
     patch level.  For stable versions, expands to '0'.

 -- Built-in constant: string __defpreproc__
     Expands to the default external preprocessor command line, if the
     preprocessor is used, or to an empty string if it is not, e.g.:

          __defpreproc__ => "/usr/bin/m4 -s"

     *Note Preprocessor::, for information on preprocessor and its
     features.

 -- Built-in constant: string __preproc__
     Expands to the current external preprocessor command line, if the
     preprocessor is used, or to an empty string if it is not.  Notice
     that it equals '__defpreproc__', unless the preprocessor was
     redefined using the '--preprocessor' command line option (*note
     -preprocessor: Preprocessor.).

 -- Built-in constant: string __version__
     Expands to the textual representation of the program version (e.g.
     '3.0.90').

 -- Built-in constant: string __defstatedir__
     Expands to the default state directory (*note statedir::).

 -- Built-in constant: string __statedir__
     Expands to the current value of the program state directory (*note
     statedir::).  Notice that it is the same as '__defstatedir__'
     unless the state directory was redefined at run time.

Built-in constants can be used as variables, which makes it possible to
expand them within strings or here-documents.  The following example
illustrates the common practice used for debugging configuration
scripts:

     func foo(number x)
     do
       echo "%__file__:%__line__: foo called with arg %x"
       ...
     done

If the function 'foo' were called in line 28 of the script file
'/etc/mailfromd.mf', like this: 'foo(10)', you would see the following
string in your logs:

     /etc/mailfromd.mf:28: foo called with arg 10

File: mailfromd.info, Node: Variables, Next: Back references, Prev: Constants, Up: MFL

4.9 Variables
=============

Variables represent regions of memory used to hold variable data.
These memory regions are identified by "variable names".  A variable
name must begin with a letter or underscore and must consist of
letters, digits and underscores.

Each variable is associated with its "scope of visibility", which
defines the part of source code where it can be used (*note scope of
visibility::).  Depending on the scope, we discern three main classes
of variables: public, static and automatic (or local).

"Public variables" have indefinite lexical scope, so they may be
referred to anywhere in the program.
"Static" are variables visible only within their module (*note Modules::). "Automatic" or "local variables" are visible only within the given function or handler. Public and static variables are sometimes collectively called "global". These variable classes occupy separate "namespaces", so that an automatic variable can have the same name as an existing public or static one. In this case this variable is said to "shadow" its global counterpart. All references to such a name will refer to the automatic variable until the end of its scope is reached, where the global one becomes visible again. Likewise, a static variable may have the same name as a static variable defined in another module. However, it may not have the same name as a public variable. A variable is "declared" using the following syntax: [QUALIFIERS] TYPE NAME where NAME is the variable name, TYPE is the type of the data it is supposed to hold. It is 'string' for string variables and 'number' for numeric ones. For example, this is a declaration of a string variable 'var': string var Optional QUALIFIERS are allowed only in global declarations, i.e. in the variable declarations that appear outside of functions. They specify the scope of the variable. The 'public' qualifier declares the variable as public and the 'static' qualifier declares it as static. The default scope is 'public', unless specified otherwise in the module declaration (*note module structure::). Additionally, QUALIFIERS may contain the word 'precious', which instructs the compiler to mark this variable as "precious". (*note precious variables: rset.). The value of the precious variable is not affected by the SMTP 'RSET' command. 
If both a scope qualifier and 'precious' are used, they may appear in
any order, e.g.:

     static precious string rcpt_list

or

     precious static string rcpt_list

The declaration can be followed by any valid MFL expression, which
supplies the "initial value" for the variable, for example:

     string var "test"

If a variable declaration occurs within a function (*note
User-defined: Functions.) or handler (*note Handlers::), it declares an
automatic variable, local to this function or handler.  Otherwise, it
declares a global variable.

A variable is assigned a value using the 'set' statement:

     set NAME EXPR

where NAME is the variable name and EXPR is a 'mailfromd' expression
(*note Expressions::).  The effect of this statement is that the EXPR
is evaluated and the value it yields is assigned to the variable NAME.

If the 'set' statement is located outside a function or handler
definition, the EXPR must be a constant expression, i.e. the compiler
should be able to evaluate it immediately.  See optimizer.

It is not an error to assign a value to a variable that is not
declared.  In this case the assignment first declares a global or
automatic variable having the type of EXPR and then assigns a value to
it.  An automatic variable is created if the assignment occurs within a
function or handler; a global variable is declared if it occurs at the
topmost lexical level.  This is called "implicit variable declaration".

Variables are referenced using the notation '%NAME'.  The variable
being referenced must have been declared earlier (either explicitly or
implicitly).

* Menu:

* Predefined variables::

File: mailfromd.info, Node: Predefined variables, Up: Variables

4.9.1 Predefined Variables
--------------------------

Several variables are predefined.  In 'mailfromd' version 8.8 these
are:

 -- Predefined Variable: number cache_used
     This variable is set by the 'stdpoll' and 'strictpoll' built-ins
     (and, consequently, by the 'on poll' statement).
     Its value is '1' if the function used the cached data instead of
     directly polling the host, and '0' if the polling took place.
     *Note SMTP Callout functions::.

     You can use this variable to make your reject message more
     informative for the remote party.  The common paradigm is to
     define a function, returning an empty string if the result was
     obtained from polling, or some notice if cached data were used,
     and to use the function in the 'reject' text, for example:

          func cachestr() returns string
          do
            if cache_used
              return "[CACHED] "
            else
              return ""
            fi
          done

     Then, in 'prog envfrom' one can use:

          on poll $f
          do
          when not_found or failure:
            reject 550 5.1.0 cachestr() . "Sender validity not confirmed"
          done

 -- Predefined Variable: string clamav_virus_name
     Name of virus identified by 'ClamAV'.  Set by the 'clamav'
     function (*note ClamAV::).

 -- Predefined Variable: number greylist_seconds_left
     Number of seconds left to the end of the greylisting period.  Set
     by the 'greylist' and 'is_greylisted' functions (*note Special
     test functions::).

 -- Predefined Variable: string ehlo_domain
     Name of the domain used by polling functions in the SMTP 'EHLO' or
     'HELO' command.  Default value is the fully qualified domain name
     of the host where 'mailfromd' is run.  *Note Polling::.

 -- Predefined Variable: string last_poll_greeting
     Callout functions (*note SMTP Callout functions::) set this
     variable before returning.  It contains the initial SMTP reply
     from the last polled host.

 -- Predefined Variable: string last_poll_helo
     Callout functions (*note SMTP Callout functions::) set this
     variable before returning.  It contains the reply to the 'HELO'
     ('EHLO') command, received from the last polled host.

 -- Predefined Variable: string last_poll_host
     Callout functions (*note SMTP Callout functions::) set this
     variable before returning.  It contains the host name or IP
     address of the last polled host.

 -- Predefined Variable: string last_poll_recv
     Callout functions (*note SMTP Callout functions::) set this
     variable before returning.  It contains the last SMTP reply
     received from the remote host.  In case of multi-line replies,
     only the first line is stored.  If nothing was received, the
     variable contains the string 'nothing'.

 -- Predefined Variable: string last_poll_sent
     Callout functions (*note SMTP Callout functions::) set this
     variable before returning.  It contains the last SMTP command sent
     to the polled host.  If nothing was sent, 'last_poll_sent'
     contains the string 'nothing'.

 -- Predefined Variable: string mailfrom_address
     Email address used by polling functions in the SMTP 'MAIL FROM'
     command (*note Polling::).  Default is '<>'.  Here is an example
     of how to change it:

          set mailfrom_address "postmaster@my.domain.com"

     You can set this value to a comma-separated list of email
     addresses, in which case the probing will try each address until
     either the remote party accepts it or the list of addresses is
     exhausted, whichever happens first.

     It is not necessary to enclose emails in angle brackets, as they
     will be added automatically where appropriate.  The only exception
     is the null return address, when used in a list of addresses.  In
     this case, it should always be written as '<>'.  For example:

          set mailfrom_address "postmaster@my.domain.com, <>"

 -- Predefined Variable: number sa_code
     Spam score for the message, set by the 'sa' function (*note sa::).

 -- Predefined Variable: number rcpt_count
     The variable 'rcpt_count' keeps the number of recipients given so
     far by 'RCPT TO' commands.  It is defined only in 'envrcpt'
     handlers.

 -- Predefined Variable: number sa_threshold
     Spam threshold, set by the 'sa' function (*note sa::).

 -- Predefined Variable: string sa_keywords
     Spam keywords for the message, set by the 'sa' function (*note
     sa::).

 -- Predefined Variable: number safedb_verbose
     This variable controls the verbosity of the exception-safe
     database functions.
     *Note safedb_verbose::.

File: mailfromd.info, Node: Back references, Next: Handlers, Prev: Variables, Up: MFL

4.10 Back references
====================

A "back reference" is a sequence '\D', where D is a decimal number.  It
refers to the Dth parenthesized subexpression in the last 'matches'
statement(1).  Any back reference occurring within a double-quoted
string is replaced with the value of the corresponding subexpression.
For example:

     if $f matches '.*@\(.*\)\.gnu\.org\.ua'
       set host \1
     fi

If the value of the 'f' macro is 'smith@unza.gnu.org.ua', the above
code will assign the string 'unza' to the variable 'host'.

Notice that each occurrence of 'matches' will reset the table of back
references, so try to use them as early as possible.  The following
example illustrates a common error, when the back reference is used
after the reference table has been reused by another matching:

     # Wrong!
     if $f matches '.*@\(.*\)\.gnu\.org\.ua'
       if $f matches 'some.*'
         set host \1
       fi
     fi

This will produce the following run time error:

     mailfromd: RUNTIME ERROR near file.mf:3: Invalid back-reference number

because the inner match ('some.*') does not have any parenthesized
subexpressions.

*Note Special comparisons::, for more information about the 'matches'
operator.

---------- Footnotes ----------

(1) The subexpressions are numbered by the positions of their opening
parentheses, left to right.

File: mailfromd.info, Node: Handlers, Next: begin/end, Prev: Back references, Up: MFL

4.11 Handlers
=============

A "Milter stage handler" (or "handler", for short) is a subroutine
responsible for processing a particular milter state.  There are nine
handlers available.  Their order of invocation and arguments are
described in *note Figure 3.1: milter-control-flow.

A handler is defined using the following construct:

     prog HANDLER-NAME
     do
       HANDLER-BODY
     done

where HANDLER-NAME is the name of the handler (*note handler names::),
and HANDLER-BODY is the list of filter statements composing the handler
body.

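A minimal sketch of this construct is shown below.  It assumes that the
MTA exports the Sendmail macro 'f' to the 'envfrom' stage; the handler
body consists of a single statement that logs the sender address:

     prog envfrom
     do
       # HANDLER-BODY: one statement, using string concatenation.
       echo "sender: " . $f
     done
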
Some handlers take arguments, which can be accessed within the
HANDLER-BODY using the notation $N, where N is the ordinal number of
the argument.  Here we describe the available handlers and their
arguments:

 -- Handler: connect (string $1, number $2, number $3, string $4)

     Invocation:
          This handler is called once at the beginning of each SMTP
          connection.

     Arguments:
            1. 'string'; The host name of the message sender, as
               reported by MTA.  Usually it is determined by a reverse
               lookup on the host address.  If the reverse lookup
               fails, '$1' will contain the message sender's IP address
               enclosed in square brackets (e.g. '[127.0.0.1]').

            2. 'number'; Socket address family.  You need to require
               the 'status' module to get symbolic definitions for the
               address families.  Supported families are:

               Constant        Value   Meaning
               --------------------------------------------------------
               FAMILY_STDIO    0       Standard input/output (the MTA
                                       is run with '-bs' option)
               FAMILY_UNIX     1       UNIX socket
               FAMILY_INET     2       IPv4 protocol
               FAMILY_INET6    3       IPv6 protocol

               Table 4.3: Supported socket families

            3. 'number'; Port number if '$2' is 'FAMILY_INET'.

            4. 'string'; Remote IP address if '$2' is 'FAMILY_INET' or
               full file name of the socket if '$2' is 'FAMILY_UNIX'.
               If '$2' is 'FAMILY_STDIO', '$4' is an empty string.

     The actions (*note Actions::) appearing in this handler are
     handled by Sendmail in a special way.  First of all, any textual
     message is ignored.  Secondly, the only action that immediately
     closes the connection is 'tempfail 421'.  Any other reply codes
     result in Sendmail switching to "nullserver" mode, where it
     accepts any commands, but answers with a failure to any of them,
     except for the following: 'QUIT', 'HELO', 'NOOP', which are
     processed as usual.

     The following table summarizes the Sendmail behavior depending on
     the action used:

     'tempfail 421 EXCODE MESSAGE'
          The caller is returned the following error message:

               421 4.7.0 HOSTNAME closing connection

          Both EXCODE and MESSAGE are ignored.

     'tempfail 4XX EXCODE MESSAGE'
          (where XX represents any digits, except '21') Both EXCODE and
          MESSAGE are ignored.  Sendmail switches to nullserver mode.
          Any subsequent command, except the ones listed above, is
          answered with

               454 4.3.0 Please try again later

     'reject 5XX EXCODE MESSAGE'
          (where XX represents any digits).  All arguments are ignored.
          Sendmail switches to nullserver mode.  Any subsequent
          command, except the ones listed above, is answered with

               550 5.0.0 Command rejected

     Regarding reply codes, this behavior complies with RFC 2821
     (section 3.9), which states:

          An SMTP server _must not_ intentionally close the connection
          except:
          [...]
          - After detecting the need to shut down the SMTP service and
          returning a 421 response code.  This response code can be
          issued after the server receives any command or, if
          necessary, asynchronously from command receipt (on the
          assumption that the client will receive it after the next
          command is issued).

     However, the RFC says nothing about textual messages and extended
     error codes, therefore Sendmail's ignoring of these is, in my
     opinion, absurd.  My practice shows that it is often reasonable,
     and even necessary, to return a meaningful textual message if the
     initial connection is declined.  The opinion of 'mailfromd' users
     seems to support this view.  Bearing this in mind, 'mailfromd' is
     shipped with a patch for Sendmail, which makes it honor both the
     extended return code and the textual message given with the
     action.  Two versions are provided:
     'etc/sendmail-8.13.7.connect.diff', for Sendmail versions 8.13.x,
     and 'etc/sendmail-8.14.3.connect.diff', for Sendmail version
     8.14.3.

 -- Handler: helo (string $1)

     Invocation:
          This handler is called whenever the SMTP client sends the
          'HELO' or 'EHLO' command.  Depending on the actual MTA
          configuration, it can be called several times or even not at
          all.

     Arguments:
            1. 'string'; Argument to the 'HELO' ('EHLO') command.

     Notes:
          According to RFC 2821, '$1' must be the domain name of the
          sending host or, if that is not available, its IP address
          enclosed in square brackets.  Be careful when making decisions
          based on this value, because in practice many hosts send
          arbitrary strings.  We recommend using the 'heloarg_test'
          function (*note heloarg_test::) if you wish to analyze this
          value.

 -- Handler: envfrom (string $1, string $2)

     Invocation:
          Called when the SMTP client sends the 'MAIL FROM' command,
          i.e. once at the beginning of each message.

     Arguments:
            1. 'string'; First argument to the 'MAIL FROM' command, i.e.
               the email address of the sender.

            2. 'string'; Rest of the arguments to 'MAIL FROM', separated
               by space characters.  This argument can be '""'.

     Notes:
            1. '$1' is not the same as the '$f' Sendmail variable,
               because the latter contains the sender email after
               address rewriting and normalization, while '$1' contains
               exactly the value given by the sending party.

            2. When the array type is implemented, '$2' will contain an
               array of arguments.

 -- Handler: envrcpt (string $1, string $2)

     Invocation:
          Called once for each 'RCPT TO' command, i.e. once for each
          recipient, immediately after 'envfrom'.

     Arguments:
            1. 'string'; First argument to the 'RCPT TO' command, i.e.
               the email address of the recipient.

            2. 'string'; Rest of the arguments to 'RCPT TO', separated
               by space characters.  This argument can be '""'.

     Notes:
          When the array type is implemented, '$2' will contain an array
          of arguments.

 -- Handler: data ()

     Invocation:
          Called after the MTA receives the SMTP 'DATA' command.  Notice
          that this handler is not supported by Sendmail versions prior
          to 8.14.0 and Postfix versions prior to 2.5.

     Arguments:
          None

 -- Handler: header (string $1, string $2)

     Invocation:
          Called once for each header line received after the SMTP
          'DATA' command.

     Arguments:
            1. 'string'; Header field name.

            2. 'string'; Header field value.
               The content of the header may include folded white space,
               i.e., multiple lines with following white space where
               lines are separated by LF (ASCII 10).  The trailing line
               terminator (CR/LF) is removed.

 -- Handler: eoh

     Invocation:
          This handler is called once per message, after all headers
          have been sent and processed.

     Arguments:
          None.

 -- Handler: body (pointer $1, number $2)

     Invocation:
          This handler is called zero or more times, once for each piece
          of the message body obtained from the remote host.

     Arguments:
            1. 'pointer'; Piece of body text.  See 'Notes' below.

            2. 'number'; Length of data pointed to by '$1', in bytes.

     Notes:
          The first argument points to the body chunk.  Its size may be
          quite considerable and passing it as a string may be costly
          both in terms of memory and execution time.  For this reason
          it is not passed as a string, but rather as a "generic
          pointer", i.e. an object having the same size as 'number',
          which can be used to retrieve the actual contents of the body
          chunk if the need arises.

          A special function 'body_string' is provided to convert this
          object to a regular MFL string (*note Mail body functions::).
          Using it you can collect the entire body text into a single
          global variable, as illustrated by the following example:

               string text

               prog body
               do
                 set text text . body_string($1,$2)
               done

          The text collected this way can then be parsed and analyzed in
          the 'eom' handler (see below).

          If you wish to analyze both the headers and the mail body, the
          following code fragment will do that for you:

               string text

               # Collect all headers.
               prog header
               do
                 set text text . $1 . ": " . $2 . "\n"
               done

               # Append terminating newline to the headers.
               prog eoh
               do
                 set text "%text\n"
               done

               # Collect message body.
               prog body
               do
                 set text text . body_string($1, $2)
               done

 -- Handler: eom

     Invocation:
          This handler is called once per message, when the terminating
          dot after the 'DATA' command has been received.
     Arguments:
          None

     Notes:
          This handler is useful for calling "message capturing"
          functions, such as 'sa' or 'clamav'.  For more information
          about these, refer to *note Interfaces to Third-Party Programs::.

   For your reference, the following table shows each handler with its
arguments:

Handler     $1              $2              $3       $4
---------------------------------------------------------------------------
connect     Hostname        Socket family   Port     Remote address
helo        'HELO' domain   N/A             N/A      N/A
envfrom     Sender email    Rest of         N/A      N/A
            address         arguments
envrcpt     Recipient       Rest of         N/A      N/A
            email address   arguments
header      Header name     Header value    N/A      N/A
eoh         N/A             N/A             N/A      N/A
body        Body segment    Length of the   N/A      N/A
            (pointer)       segment
                            (numeric)
eom         N/A             N/A             N/A      N/A

Table 4.4: State Handler Arguments


File: mailfromd.info,  Node: begin/end,  Next: Functions,  Prev: Handlers,  Up: MFL

4.12 The 'begin' and 'end' special handlers
===========================================

Apart from the milter handlers described in the previous section, MFL
defines two special handlers, called 'begin' and 'end', which supply
startup and cleanup instructions for the filter program.

   The 'begin' special handler is executed once for each SMTP session,
after the connection has been established but before the first milter
handler has been called.  Similarly, the 'end' handler is executed
exactly once, after the connection has been closed.  Neither of them
takes any arguments.

   The two handlers are defined using the following syntax:

     # Begin handler
     begin
     do
       ...
     done

     # End handler
     end
     do
       ...
     done

where '...' represent any MFL statements.

   An MFL program may have multiple 'begin' and 'end' definitions.  They
can be intermixed with other definitions.  The compiler combines all
'begin' statements into a single one, in the order they appear in the
sources.  Similarly, all 'end' blocks are concatenated together.  The
resulting 'begin' is called once, at the beginning of each SMTP session,
and 'end' is called once at its termination.
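
   As a minimal sketch of how several startup and cleanup definitions
combine, consider a single source file that declares two 'begin' blocks
and one 'end' block, intermixed with a handler.  The message texts are
purely illustrative; the fragment uses only the 'begin', 'end', 'prog'
and 'echo' constructs described in this manual.

```mfl
# First startup block.
begin
do
  echo "begin: first block"
done

prog envfrom
do
  echo "envfrom handler called"
done

# Second startup block; the compiler appends it to the first one.
begin
do
  echo "begin: second block"
done

# Cleanup block, run after the connection has been closed.
end
do
  echo "end: cleanup"
done
```

   Since both 'begin' blocks live in the same source file, they are
combined in order of appearance, so the first block's message should be
logged before the second one at the start of each SMTP session.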
Multiple 'begin' and 'end' handlers are a useful feature for writing modules (*note Modules::), because each module can thus have its own initialization and cleanup blocks. Notice, however, that in this case the order in which subsequent 'begin' and 'end' blocks are executed is not defined. It is only warranted that all 'begin' blocks are executed at startup and all 'end' blocks are executed at shutdown. It is also warranted that all 'begin' and 'end' blocks defined within a compilation unit (i.e. a single abstract source file, with all '#include' and '#include_once' statements expanded in place) are executed in order of their appearance in the unit. Due to their special nature, the startup and cleanup blocks impose certain restrictions on the statements that can be used within them: 1. 'return' cannot be used in 'begin' and 'end' handlers. 2. The following Sendmail actions cannot be used in them: 'accept', 'continue', 'discard', 'reject', 'tempfail'. They can, however, be used in 'catch' statements, declared in 'begin' blocks (see example below). 3. Header manipulation actions (*note header manipulation::) cannot be used in 'end' handler. The 'begin' handlers are the usual place to put global initialization code to. For example, if you do not want to use DNS caching, you can do it this way: begin do db_set_active("dns", 0) done Additionally, you can set up global exception handling routines there. For example, the following 'begin' statement disables DNS cache and, for all exceptions not handled otherwise, installs a handler that logs the exception along with the stack trace and continues processing the message: begin do db_set_active("dns", 0) catch * do echo "Caught exception $1: $2" stack_trace() continue done done  File: mailfromd.info, Node: Functions, Next: Expressions, Prev: begin/end, Up: MFL 4.13 Functions ============== A "function" is a named 'mailfromd' subroutine, which takes zero or more "parameters" and optionally returns a certain value. 
Depending on the return value, functions can be subdivided into "string functions" and "number functions". A function may have "mandatory" and "optional parameters". When invoked, the function must be supplied exactly as many "actual arguments" as the number of its mandatory parameters. Functions are invoked using the following syntax: NAME (ARGS) where NAME is the function name and ARGS is a comma-separated list of expressions. For example, the following are valid function calls: foo(10) interval("1 hour") greylist("/var/my.db", 180) The number of parameters a function takes and their data types compose the "function signature". When actual arguments are passed to the function, they are converted to types of the corresponding formal parameters. There are two major groups of functions: "built-in" functions, that are implemented in the 'mailfromd' binary, and "user-defined" functions, that are written in MFL. The invocation syntax is the same for both groups. 'Mailfromd' is shipped with a rich set of "library functions". These are described in *note Library::. In addition to these you can define your own functions. Function definitions can appear anywhere between the handler declarations in a filter program, the only requirement being that the function definition occur before the place where the function is invoked. The syntax of a function definition is: [QUALIFIER] func NAME (PARAM-DECL) returns DATA-TYPE do FUNCTION-BODY done where NAME is the name of the function to define, PARAM-DECL is a comma-separated list of parameter declarations. The syntax of the latter is the same as that of variable declarations (*note Variable declarations: Variables.), i.e.: TYPE NAME declares the parameter NAME having the type TYPE. The TYPE is 'string' or 'number'. Optional QUALIFIER declares the scope of visibility for that function (*note scope of visibility::). It is similar to that of variables, except that functions cannot be local (i.e. 
you cannot declare function within another function). The 'public' qualifier declares a function that may be referred to from any module, whereas the 'static' qualifier declares a function that may be called only from the current module (*note Modules::). The default scope is 'public', unless specified otherwise in the module declaration (*note module structure::). For example, the following declares a function 'sum', that takes two numeric arguments and returns a numeric value: func sum(number x, number y) returns number Similarly, the following is a declaration of a static function: static func sum(number x, number y) returns number Parameters are referenced in the FUNCTION-BODY by their name, the same way as other variables. Similarly, the value of a parameter can be altered using 'set' statement. A function can be declared to take a certain number of "optional arguments". In a function declaration, optional abstract arguments must be placed after the mandatory ones, and must be separated from them with a semicolon. The following example is a definition of function 'foo', which takes two mandatory and two optional arguments: func foo(string msg, string email; number x, string pfx) Mandatory parameters are: 'msg' and 'email'. Optional parameters are: 'x' and 'pfx'. The actual number of arguments supplied to the function is returned by a special construct '$#'. In addition, the special construct '@ARG' evaluates to the ordinal number of variable ARG in the list of formal parameters (the first argument has number '0'). These two constructs can be used to verify whether an argument is supplied to the function. When an actual argument for parameter 'n' is supplied, the number of actual arguments ('$#') is greater than the ordinal number of that parameter in the declaration list ('@N'). Thus, the following construct can be used to check if an optional argument ARG is actually supplied: func foo(string msg, string email; number x, string arg) do if $# > @arg ... 
fi The default 'mailfromd' installation provides a special macro for this purpose: *note defined::. Using it, the example above could be rewritten as: func foo(string msg, string email; number x, string arg) do if defined(arg) ... fi Within a function body, optional arguments are referenced exactly the same way as the mandatory ones. Attempt to dereference an optional argument for which no actual parameter was supplied, results in an undefined value, so be sure to check whether a parameter is passed before dereferencing it. A function can also take variable number of arguments (such functions are called "variadic"). This is indicated by the use of ellipsis as the last abstract parameter. The statement below defines a function 'foo' taking one mandatory, one optional and any number of additional arguments: func foo (string a ; string b, ...) All actual arguments passed in a list of variable arguments are coerced to string data type. To refer to these arguments in the function body, the following construct is used: $(EXPR) where EXPR is any valid MFL expression, evaluating to a number N. This construct refers to the value of Nth actual parameter from the variable argument list. Parameters are numbered from '1', so the first variable parameter is '$(1)', and the last one is '$($# - NM - NO)', where NM and NO are numbers of mandatory and optional parameters to the function. For example, the function below prints all its arguments: func pargs (string text, ...) do echo "text=%text" loop for number i 1, while i <= $# - 1, set i i + 1 do echo "arg %i=" . $(i) done done Note the loop limits. The last variable argument has number '$# - 1', because the function takes one mandatory argument. The FUNCTION-BODY is any list of valid 'mailfromd' statements. In addition to the statements discussed below (*note Statements::) it can also contain the 'return' statement, which is used to return a value from the function. 
The syntax of the return statement is return VALUE As an example of this, consider the following code snippet that defines the function 'sum' to return a sum of its two arguments: func sum(number x, number y) returns number do return x + y done The 'returns' part in the function declaration is optional. A declaration lacking it defines a "procedure", or "void function", i.e. a function that is not supposed to return any value. Such functions cannot be used in expressions, instead they are used as statements (*note Statements::). The following example shows a function that emits a customized temporary failure notice: func stdtf() do tempfail 451 4.3.5 "Try again later" done A function may have several names. An alternative name (or "alias") can be assigned to a function by using 'alias' keyword, placed after PARAM-DECL part, for example: func foo() alias bar returns string do ... done After this declaration, both 'foo()' and 'bar()' will refer to the same function. The number of function aliases is unlimited. The following fragment declares a function having three names: func foo() alias bar alias baz returns string do ... done Although this feature is rarely needed, there are sometimes cases when it may be necessary. A variable declared within a function becomes a local variable to this function. Its lexical scope ends with the terminating 'done' statement. Parameters, local variables and global variables are using separate namespaces, so a parameter name can coincide with the name of a global, in which case a parameter is said to "shadow" the global. All references to its name will refer to the parameter, until the end of its scope is reached, where the global one becomes visible again. 
Consider the following example: number x func foo(string x) do echo "foo: %x" done prog envfrom do set x "Global" foo("Local") echo x done Running 'mailfromd --test' with this configuration will display: foo: Local Global * Menu: * Some Useful Functions::  File: mailfromd.info, Node: Some Useful Functions, Up: Functions 4.13.1 Some Useful Functions ---------------------------- To illustrate the concept of user-defined functions, this subsection shows the definitions of some of the library functions shipped with 'mailfromd'(1). These functions are contained in modules installed along with the 'mailfromd' binary. To use any of them in your code, require the appropriate module as described in *note import::, e.g. to use the 'revip' function, do 'require 'revip''. Functions and their definitions: 1. 'revip' The function 'revip' (*note revip::) is implemented as follows: func revip(string ip) returns string do return inet_ntoa(ntohl(inet_aton(ip))) done Previously it was implemented using regular expressions. Below we include this variant as well, as an illustration for the use of regular expressions: #pragma regex push +extended func revip(string ip) returns string do if ip matches '([0-9]+)\.([0-9]+)\.([0-9]+)\.([0-9]+)' return "\4.\3.\2.\1" fi return ip done #pragma regex pop 2. 'strip_domain_part' This function returns at most N last components of the domain name DOMAIN (*note strip_domain_part::). #pragma regex push +extended func strip_domain_part(string domain, number n) returns string do if n > 0 and domain matches '.*((\.[^.]+){' . $2 . '})' return substring(\1, 1, -1) else return domain fi done #pragma regex pop 3. 'valid_domain' *Note valid_domain::, for a description of this function. Its definition follows: require dns func valid_domain(string domain) returns number do return not (resolve(domain) = "0" and not hasmx(domain)) done 4. 
'match_dnsbl' The function 'match_dnsbl' (*note match_dnsbl::) is defined as follows: require dns require match_cidr #pragma regex push +extended func match_dnsbl(string address, string zone, string range) returns number do string rbl_ip if range = 'ANY' set rbl_ip '127.0.0.0/8' else set rbl_ip range if not range matches '^([0-9]{1,3}\.){3}[0-9]{1,3}$' return 0 fi fi if not (address matches '^([0-9]{1,3}\.){3}[0-9]{1,3}$' and address != range) return 0 fi if address matches '^([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})$' if match_cidr (resolve ("\4.\3.\2.\1", zone), rbl_ip) return 1 else return 0 fi fi # never reached done ---------- Footnotes ---------- (1) Notice that these are intended for educational purposes and do not necessarily coincide with the actual definitions of these functions in Mailfromd version 8.8.  File: mailfromd.info, Node: Expressions, Next: Shadowing, Prev: Functions, Up: MFL 4.14 Expressions ================ Expressions are language constructs, that evaluate to a value, that can subsequently be echoed, tested in a conditional statement, assigned to a variable or passed to a function. * Menu: * Constant expressions:: String and Numeric Constants. * Function calls:: A Function Call is an Expression. * Concatenation:: String Concatenation. * Arithmetic operations:: '+', '-', etc. * Bitwise shifts:: '<<' and '>>'. * Relational expressions:: '=', '<', etc. * Special comparisons:: 'matches', 'mx matches', etc. * Boolean expressions:: 'and', 'or', 'not'. * Precedence:: How various operators nest. * Type casting::  File: mailfromd.info, Node: Constant expressions, Next: Function calls, Up: Expressions 4.14.1 Constant Expressions --------------------------- Literals and numbers are "constant expressions". They evaluate to string and numeric types.  File: mailfromd.info, Node: Function calls, Next: Concatenation, Prev: Constant expressions, Up: Expressions 4.14.2 Function Calls --------------------- A function call is an expression. 
Its type is the return type of the function.  File: mailfromd.info, Node: Concatenation, Next: Arithmetic operations, Prev: Function calls, Up: Expressions 4.14.3 Concatenation -------------------- Concatenation operator is '.' (a dot). For example, if '$f' is 'smith', and '$client_addr' is '10.10.1.1', then: $f . "-" . $client_addr => "smith-10.10.1.1" Any two adjacent literal strings are concatenated, producing a new string, e.g. "GNU's" " not " "UNIX" => "GNU's not UNIX"  File: mailfromd.info, Node: Arithmetic operations, Next: Bitwise shifts, Prev: Concatenation, Up: Expressions 4.14.4 Arithmetic Operations ---------------------------- The filter script language offers the common arithmetic operators: '+', '-', '*' and '/'. In addition, the '%' is a "modulo" operator, i.e. it computes the remainder of division of its operands. All of them follow usual precedence rules and work as you would expect them to.  File: mailfromd.info, Node: Bitwise shifts, Next: Relational expressions, Prev: Arithmetic operations, Up: Expressions 4.14.5 Bitwise shifts --------------------- The '<<' represents a "bitwise shift left" operation, which shifts the binary representation of the operand on its left by the number of bits given by the operand on its right. Similarly, the '>>' represents a "bitwise shift right".  File: mailfromd.info, Node: Relational expressions, Next: Special comparisons, Prev: Bitwise shifts, Up: Expressions 4.14.6 Relational Expressions ----------------------------- Relational expressions are: Expression Result -------------------------------------------------------------------------- X '<' Y True if X is less than Y. X '<=' Y True if X is less than or equal to Y. X '>' Y True if X is greater than Y. X '>=' Y True if X is greater than or equal to Y. X '=' Y True if X is equal to Y. X '!=' Y True if X is not equal to Y. Table 4.5: Relational Expressions The relational expressions apply to string as well as to numbers. 
When a relational operation applies to strings, case-sensitive comparison is used, e.g.: "String" = "string" => False "String" < "string" => True  File: mailfromd.info, Node: Special comparisons, Next: Boolean expressions, Prev: Relational expressions, Up: Expressions 4.14.7 Special Comparisons -------------------------- In addition to the traditional relational operators, described above, 'mailfromd' provides two operators for regular expression matching: Expression Result -------------------------------------------------------------------------- X 'matches' Y True if the string X matches the regexp denoted by Y. X 'fnmatches' Y True if the string X matches the globbing pattern denoted by Y. Table 4.6: Regular Expression Matching The type of the regular expression used by 'matches' operator is controlled by '#pragma regex' (*note pragma regex::). For example: $f => "gray@gnu.org.ua" $f matches '.*@gnu\.org\.ua' => true $f matches '.*@GNU\.ORG\.UA' => false #pragma regex +icase $f matches '.*@GNU\.ORG\.UA' => true The 'fnmatches' operator compares its left-hand operand with a globbing pattern (see 'glob(7)') given as its right-hand side operand. For example: $f => "gray@gnu.org.ua" $f fnmatches "*ua" => true $f fnmatches "*org" => false $f fnmatches "*org*" => true Both operators have a special form, for "'MX' pattern matching". The expression: X mx matches Y is evaluated as follows: first, the expression X is analyzed and, if it is an email address, its domain part is selected. If it is not, its value is used verbatim. Then the list of 'MX's for this domain is looked up. Each of 'MX' names is then compared with the regular expression Y. If any of the names matches, the expression returns true. Otherwise, its result is false. Similarly, the expression: X mx fnmatches Y returns true only if any of the 'MX's for (domain or email) X match the globbing pattern Y. Both 'mx matches' and 'mx fnmatches' can signal the following exceptions: 'e_temp_failure', 'e_failure'. 
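
   The 'MX' matching forms can be sketched as follows.  This is an
illustration only: the pattern '*.example.net', the reply codes and the
message texts are assumptions, and the fragment presumes that the MTA
exports the '$f' macro to the filter.  Because 'mx fnmatches' may signal
'e_temp_failure' or 'e_failure', the sketch installs a 'catch' statement
that logs the problem and lets the message through.

```mfl
prog envfrom
do
  # Hypothetical policy: if the MX lookup fails, log and carry on.
  catch e_temp_failure or e_failure
  do
    echo "MX lookup failed: $2"
    continue
  done

  # Refuse senders whose domain is served by any MX in example.net.
  if $f mx fnmatches "*.example.net"
    reject 550 5.7.1 "Mail relayed via example.net is not accepted"
  fi
done
```

   Whether a failed lookup should result in 'continue' or in a
'tempfail' reply is a policy decision; 'continue' is used here merely to
keep the sketch short.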
The value of any parenthesized subexpression occurring within the right-hand side argument to 'matches' or 'mx matches' can be referenced using the notation '\D', where D is the ordinal number of the subexpression (subexpressions are numbered from left to right, starting at 1). This notation is allowed in the program text as well as within double-quoted strings and here-documents, for example: if $f matches '.*@\(.*\)\.gnu\.org\.ua' set message "Your host name is \1;" fi Remember that the grouping symbols are '\(' and '\)' for basic regular expressions, and '(' and ')' for extended regular expressions. Also make sure you properly escape all special characters (backslashes in particular) in double-quoted strings, or use single-quoted strings to avoid having to do so (*note singe-vs-double::, for a comparison of the two forms).  File: mailfromd.info, Node: Boolean expressions, Next: Precedence, Prev: Special comparisons, Up: Expressions 4.14.8 Boolean Expressions -------------------------- A "boolean expression" is a combination of relational or matching expressions using the boolean operators 'and', 'or' and 'not', and, eventually, parentheses to control nesting: Expression Result -------------------------------------------------------------------------- X 'and' Y True only if both X and Y are true. X 'or' Y True if any of X or Y is true. 'not' X True if X is false. table 4.1: Boolean Operators Binary boolean expressions are computed using "shortcut evaluation": 'X and Y' If 'X => false', the result is 'false' and Y is not evaluated. 'X or Y' If 'X => true', the result is 'true' and Y is not evaluated.  File: mailfromd.info, Node: Precedence, Next: Type casting, Prev: Boolean expressions, Up: Expressions 4.14.9 Operator Precedence -------------------------- Operator "precedence" is an abstract value associated with each language operator, that determines the order in which operators are executed when they appear together within a single expression. 
Operators with higher precedence are executed first.  For example, '*'
has a higher precedence than '+', therefore the expression 'a + b * c'
is evaluated in the following order: first 'b' is multiplied by 'c',
then 'a' is added to the product.

   When operators of equal precedence are used together they are
evaluated from left to right (i.e., they are "left-associative"), except
for comparison operators, which are non-associative (these are
explicitly marked as such in the table below).  This means that you
cannot write:

     if 5 <= x <= 10

   Instead, you should write:

     if 5 <= x and x <= 10

   The precedences of the 'mailfromd' operators were selected so as to
match those used in most programming languages.(1)

   The following table lists all operators in order of decreasing
precedence:

'(...)'
     Grouping
'$ %'
     'Sendmail' macros and 'mailfromd' variables
'* /'
     Multiplication, division
'+ -'
     Addition, subtraction
'<< >>'
     Bitwise shift left and right
'< <= >= >'
     Relational operators (non-associative)
'= != matches fnmatches'
     Equality and special comparison (non-associative)
'&'
     Logical (bitwise) AND
'^'
     Logical (bitwise) XOR
'|'
     Logical (bitwise) OR
'not'
     Boolean negation
'and'
     Logical 'and'
'or'
     Logical 'or'
'.'
     String concatenation

   ---------- Footnotes ----------

   (1) The only exception is 'not', whose precedence in MFL is much
lower than usual (in most programming languages it has the same
precedence as unary '-').  This makes it possible to write conditional
expressions in a more understandable manner.  Consider the following
condition:

     if not x < 2 and y = 3

   It is understood as "if 'x' is not less than 2 and 'y' equals 3",
whereas with the usual precedence for 'not' it would have meant "if
negated 'x' is less than 2 and 'y' equals 3".


File: mailfromd.info,  Node: Type casting,  Prev: Precedence,  Up: Expressions

4.14.10 Type Casting
--------------------

When the two operands of a binary expression have different types, the
'mailfromd' evaluator coerces them to a common type.
This is known as "implicit type casting". The rules for implicit type casting are: 1. Both arguments to an arithmetical operation are cast to numeric type. 2. Both arguments to the concatenation operation are cast to string. 3. Both arguments to 'match' or 'fnmatch' function are cast to string. 4. The argument of the unary negation (arithmetical or boolean) is cast to numeric. 5. Otherwise the right-hand side argument is cast to the type of the left-hand side argument. The construct for explicit type cast is: TYPE(EXPR) where TYPE is the name of the type to coerce EXPR to. For example: string(2 + 4*8) => "34"  File: mailfromd.info, Node: Shadowing, Next: Statements, Prev: Expressions, Up: MFL 4.15 Variable and Constant Shadowing ==================================== When any two named entities happen to have the same name we say that a "name clash" occurs. The handling of name clashes depends on types of the entities involved in it. function - any -------------- A name of a constant or variable can coincide with that of a function, it does not produce any warnings or errors because functions, variables and constants use different namespaces. For example, the following code is correct: const a 4 func a() do echo a done When executed, it prints '4'. function - function, handler - function, and function - handler --------------------------------------------------------------- Redefinition of a function or using a predefined handler name (*note Handlers::) as a function name results in a fatal error. For example, compiling this code: func a() do echo "1" done func a() do echo "2" done causes the following error message: mailfromd: sample.mf:9: syntax error, unexpected FUNCTION_PROC, expecting IDENTIFIER handler - variable ------------------ A variable name can coincide with a handler name. 
For example, the following code is perfectly OK: string envfrom "M" prog envfrom do echo envfrom done handler - handler ----------------- If two handlers with the same name are defined, the definition that appears further in the source text replaces the previous one. A warning message is issued, indicating locations of both definitions, e.g.: mailfromd: sample.mf:116: Warning: Redefinition of handler `envfrom' mailfromd: sample.mf:34: Warning: This is the location of the previous definition variable - variable ------------------- Defining a variable having the same name as an already defined one results in a warning message being displayed. The compilation succeeds. The second variable "shadows" the first, that is any subsequent references to the variable name will refer to the second variable. For example: string x "Text" number x 1 prog envfrom do echo x done Compiling this code results in the following diagnostics: mailfromd: sample.mf:4: Redeclaring `x' as different data type mailfromd: sample.mf:2: This is the location of the previous definition Executing it prints '1', i.e. the value of the last definition of 'x'. The scope of the shadowing depends on storage classes of the two variables. If both of them have external storage class (i.e. are global ones), the shadowing remains in effect until the end of input. In other words, the previous definition of the variable is effectively forgotten. If the previous definition is a global, and the shadowing definition is an automatic variable or a function parameter, the scope of this shadowing ends with the scope of the second variable, after which the previous definition (global) becomes visible again. 
Consider the following code:

     set x "initial"

     func foo(string x) returns string
     do
       return x
     done

     prog envfrom
     do
       echo foo("param")
       echo x
     done

   Its compilation produces the following warning:

     mailfromd: sample.mf:3: Warning: Parameter `x' is shadowing a global

   When executed, it produces the following output:

     param
     initial
     State envfrom: continue

variable - constant
-------------------

If a constant is defined which has the same name as a previously defined
variable (the constant "shadows" the variable), the compiler prints the
following diagnostic message:

     FILE:LINE: Warning: Constant name `NAME' clashes with a variable name
     FILE:LINE: Warning: This is the location of the previous definition

   A similar diagnostic is issued if a variable is defined whose name
coincides with a previously defined constant (the variable shadows the
constant).

   In either case, any subsequent notation %NAME refers to the last
defined symbol, be it variable or constant.

   Notice that shadowing occurs only when the %NAME notation is used.
Referring to the constant by its name without '%' avoids shadowing
effects.

   If a variable shadows a constant, the scope of the shadowing depends
on the storage class of the variable.  For automatic variables and
function parameters, it ends with the final 'done' closing the function.
For global variables, it lasts up to the end of input.

   For example, consider the following code:

     const a 4

     func foo(string a)
     do
       echo a
     done

     prog envfrom
     do
       foo(10)
       echo a
     done

   When run, it produces the following output:

     $ mailfromd --test sample.mf
     mailfromd: sample.mf:3: Warning: Variable name `a' clashes with a constant name
     mailfromd: sample.mf:1: Warning: This is the location of the previous definition
     10
     4
     State envfrom: continue

constant - constant
-------------------

Redefining a constant produces a warning message.  The latter definition
shadows the former.  Shadowing remains in effect until the end of input.
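
   The constant - constant case can be sketched in the same manner as
the examples above.  The constant name 'max_rcpt' and its values are
hypothetical and serve only to illustrate the rule.

```mfl
const max_rcpt 10
# Redefinition of the same constant: a warning is expected here.
const max_rcpt 20

prog envfrom
do
  # By the rule above, this refers to the latter definition.
  echo max_rcpt
done
```

   Following the rule stated above, every reference to 'max_rcpt' after
the second definition resolves to the latter value, and the former one
is not visible again before the end of input.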
File: mailfromd.info, Node: Statements, Next: Conditionals, Prev: Shadowing, Up: MFL 4.16 Statements =============== Statements are language constructs that, unlike expressions, do not return any value. Statements execute some actions, such as assigning a value to a variable, or serve to control the execution flow in the program. * Menu: * Actions:: Actions control the handling of the mail. * Assignments:: * Pass:: * Echo::  File: mailfromd.info, Node: Actions, Next: Assignments, Up: Statements 4.16.1 Action Statements ------------------------ An "action" statement instructs 'mailfromd' to perform a certain action over the message being processed. There are two kinds of actions: reply actions and header manipulation actions. Reply Actions ............. Reply actions tell 'Sendmail' to return a given response code to the remote party. There are five such actions: 'accept' Return an 'accept' reply. The remote party will continue transmitting its message. 'reject CODE EXCODE MESSAGE-EXPR' 'reject (CODE-EXPR, EXCODE-EXPR, MESSAGE-EXPR)' Return a 'reject' reply. The remote party will have to cancel transmitting its message. The three arguments are optional; their usage is described below. 'tempfail CODE EXCODE MESSAGE-EXPR' 'tempfail (CODE-EXPR, EXCODE-EXPR, MESSAGE-EXPR)' Return a 'temporary failure' reply. The remote party can retry sending its message later. The three arguments are optional; their usage is described below. 'discard' Instructs 'Sendmail' to accept the message and silently discard it without delivering it to any recipient. 'continue' Stops the current handler and instructs 'Sendmail' to continue processing of the message. Two actions, 'reject' and 'tempfail', can take up to three optional parameters. There are two forms of supplying these parameters. In the first form, called "literal" or "traditional" notation, the arguments are supplied as additional words after the action name, and are separated by whitespace.
The first argument is a three-digit RFC 2821 reply code. It must begin with '5' for 'reject' and with '4' for 'tempfail'. If two arguments are supplied, the second argument must be either an "extended reply code" (RFC 1893/2034) or a textual string to be returned along with the SMTP reply. Finally, if all three arguments are supplied, then the second one must be an extended reply code and the third one must give the textual string. The following examples illustrate the possible ways of using the 'reject' statement: reject reject 503 reject 503 5.0.0 reject 503 "Need HELO command" reject 503 5.0.0 "Need HELO command" The term "textual string", as used above, means either a literal string or an MFL expression that evaluates to a string. However, both the code and the extended code must always be literal. The second form of supplying arguments is called "functional" notation, because it resembles the function syntax. When used in this form, the action word is followed by a parenthesized group of exactly three arguments, separated by commas. Each argument is an MFL expression. The meaning and ordering of the arguments are the same as in the literal form. Any or all of these three arguments may be absent, in which case each absent argument is replaced by its default value. To illustrate this, here are the statements from the previous example, written in functional notation: reject(,,) reject(503,,) reject(503, 5.0.0) reject(503, , "Need HELO command") reject(503, 5.0.0, "Need HELO command") Notice that there is an important difference between the two notations. The functional notation allows both reply codes to be computed at run time, e.g.: reject(500 + dig2*10 + dig3, "5.%edig2.%edig2") Header Actions .............. Header manipulation actions provide basic means to add, delete or modify the message RFC 2822 headers. 'add NAME STRING' Add the header NAME with the value STRING.
E.g.: add "X-Seen-By" "Mailfromd 8.8" (notice argument quoting) 'replace NAME STRING' The same as 'add', but if the header NAME already exists, it will be removed first, for example: replace "X-Last-Processor" "Mailfromd 8.8" 'delete NAME' Delete the header named NAME: delete "X-Envelope-Date" These actions impose some restrictions. First of all, their first argument must be a literal string (not a variable or expression). Secondly, there is no way to select a particular header instance to delete or replace, which may be necessary to properly handle multiple headers (e.g. 'Received'). For more elaborate ways of header modifications, see *note Header modification functions::.  File: mailfromd.info, Node: Assignments, Next: Pass, Prev: Actions, Up: Statements 4.16.2 Variable Assignments --------------------------- An "assignment" is a special statement that assigns a value to the variable. It has the following syntax: set NAME VALUE where NAME is the variable name and VALUE is the value to be assigned to it. Assignment statements can appear in any part of a filter program. If an assignment occurs outside of function or handler definition, the VALUE must be a literal value (*note Literals::). If it occurs within a function or handler definition, VALUE can be any valid 'mailfromd' expression (*note Expressions::). In this case, the expression will be evaluated and its value will be assigned to the variable. For example: set delay 150 prog envfrom do set delay delay * 2 ... done  File: mailfromd.info, Node: Pass, Next: Echo, Prev: Assignments, Up: Statements 4.16.3 The 'pass' statement --------------------------- The 'pass' statement has no effect. 
It is used in places where no statement is needed, but the language syntax requires one: on poll $f do when success: pass when not_found or failure: reject 550 done  File: mailfromd.info, Node: Echo, Prev: Pass, Up: Statements 4.16.4 The 'echo' statement --------------------------- The 'echo' statement concatenates all its arguments into a single string and sends it to the 'syslog' using the priority 'info'. It is useful for debugging your script, in conjunction with built-in constants (*note Built-in constants::), for example: func foo(number x) do echo "%__file__:%__line__: foo called with arg %x" ... done  File: mailfromd.info, Node: Conditionals, Next: Loops, Prev: Statements, Up: MFL 4.17 Conditional Statements =========================== "Conditional expressions", or conditionals for short, test some conditions and alter the control flow depending on the result. There are two kinds of conditional statements: "if-else" branches and "switch" statements. The syntax of an "if-else" branching construct is: if CONDITION THEN-BODY [else ELSE-BODY] fi Here, CONDITION is an expression that governs control flow within the statement. Both THEN-BODY and ELSE-BODY are lists of 'mailfromd' statements. If CONDITION is true, THEN-BODY is executed, if it is false, ELSE-BODY is executed. The 'else' part of the statement is optional. The condition is considered false if it evaluates to zero, otherwise it is considered true. For example: if $f = "" accept else reject fi This will accept the message if the value of the 'Sendmail' macro '$f' is an empty string, and reject it otherwise. Both THEN-BODY and ELSE-BODY can be compound statements including other 'if' statements. Nesting level of conditional statements is not limited. To facilitate writing complex conditional statements, the 'elif' keyword can be used to introduce alternative conditions, for example: if $f = "" accept elif $f = "root" echo "Mail from root!" 
else reject fi Another type of branching instruction is the 'switch' statement: switch CONDITION do case X1 [or X2 ...]: STMT1 case Y1 [or Y2 ...]: STMT2 . . . [default: STMT] done Here, X1, X2, Y1, Y2 are literal expressions; STMT1, STMT2 and STMT are arbitrary 'mailfromd' statements (possibly compound); CONDITION is the controlling expression. The vertical row of dots stands for any further 'case' branches. This statement is executed as follows: the CONDITION expression is evaluated and if its value equals X1 or X2 (or any other X from the first 'case'), then STMT1 is executed. Otherwise, if CONDITION evaluates to Y1 or Y2 (or any other Y from the second 'case'), then STMT2 is executed. Other 'case' branches are tried in turn. If none of them matches, STMT (called the "default branch") is executed. There can be as many 'case' branches as you wish. The 'default' branch is optional. There can be at most one 'default' branch. An example of a 'switch' statement follows: switch x do case 1 or 3: add "X-Branch" "1" accept case 2 or 4 or 6: add "X-Branch" "2" default: reject done If the value of 'mailfromd' variable 'x' is 1 or 3, this code will add an 'X-Branch: 1' header to the message and accept it immediately. If 'x' equals 2 or 4 or 6, this code will add an 'X-Branch: 2' header to the message and will continue processing it. Otherwise, it will reject the message. The controlling condition of a 'switch' statement may evaluate to numeric or string type. The type of the condition governs the type of comparisons used in 'case' branches: for numeric types, numeric equality will be used, whereas for string types, string equality is used.  File: mailfromd.info, Node: Loops, Next: Exceptions, Prev: Conditionals, Up: MFL 4.18 Loop Statements ==================== The loop statement allows for repeated execution of a block of code, controlled by some conditional expression.
It has the following form: loop [LABEL] [for STMT1] [,while EXPR1] [,STMT2] do STMT3 done [while EXPR2] where STMT1, STMT2, and STMT3 are statement lists, EXPR1 and EXPR2 are expressions. The control flow is as follows: 1. If STMT1 is specified, execute it. 2. If EXPR1 is given, evaluate it. If it is zero, go to 6. Otherwise, continue. 3. Execute STMT3. 4. If STMT2 is supplied, execute it. 5. If EXPR2 is given, evaluate it. If it is zero, go to 6. Otherwise, go to 2. 6. End. Thus, STMT3 is executed until either EXPR1 or EXPR2 yields a zero value. The "loop body" - STMT3 - can contain special statements: 'break [LABEL]' Terminates the loop immediately. Control passes to '6' (End) in the formal definition above. If LABEL is supplied, the statement terminates the loop statement marked with that label. This makes it possible to break out of nested loops. It is similar to the 'break' statement in C or shell. 'next [LABEL]' Initiates the next iteration of the loop. Control passes to '4' in the formal definition above. If LABEL is supplied, the statement starts the next iteration of the loop statement marked with that label. This makes it possible to request the next iteration of an upper-level loop from a nested loop statement. The 'loop' statement can be used to create iterative statements of arbitrary complexity. Let's illustrate it in comparison with C. The statement: loop do STMT-LIST done creates an infinite loop. The only way to exit from such a loop is to call 'break' (or 'return', if used within a function), somewhere in STMT-LIST.
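The effect of labels on 'break' and 'next' can be sketched as follows. This is a minimal illustration built only from the syntax given above; the label name 'outer' and the variables 'i' and 'j' are arbitrary choices made for the example, not part of the language.

```
prog envfrom
do
  number i
  number j
  /* The outer loop carries the label "outer". */
  loop outer for set i 0, while i < 3, set i i + 1
  do
    loop for set j 0, while j < 3, set j j + 1
    do
      if j > i
        /* Skip the rest of the inner loop and go on with the
           next iteration of the labelled loop: control passes
           to step 4 of the outer loop. */
        next outer
      fi
      if i + j == 4
        /* Leave both loops at once. */
        break outer
      fi
      echo "i=%i j=%j"
    done
  done
done
```

Without the label, 'next' and 'break' would affect only the inner loop, i.e. the one in whose body they appear.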
The following statement is equivalent to 'while (EXPR) STMT-LIST' in C: loop while EXPR do STMT-LIST done The C construct 'for (EXPR1; EXPR2; EXPR3)' is written in MFL as follows, with STMT1 and STMT2 playing the roles of EXPR1 and EXPR3: loop for STMT1, while EXPR2, STMT2 do STMT3 done For example, to repeat STMT3 10 times: loop for set i 0, while i < 10, set i i + 1 do STMT3 done Finally, the C 'do' loop is implemented as follows: loop do STMT-LIST done while EXPR As a real-life example of a loop statement, let's consider the implementation of function 'ptr_validate', which takes a single argument IPSTR, and checks its validity using the following algorithm: Perform a DNS reverse-mapping for IPSTR, looking up the corresponding 'PTR' record in 'in-addr.arpa'. For each record returned, look up its IP addresses (A records). If IPSTR is among the returned IP addresses, return 1 ('true'), otherwise return 0 ('false'). The implementation of this function in MFL is: #pragma regex push +extended func ptr_validate(string ipstr) returns number do loop for string names dns_getname(ipstr) . " " number i index(names, " "), while i != -1, set names substr(names, i + 1) set i index(names, " ") do loop for string addrs dns_getaddr(substr(names, 0, i)) . " " number j index(addrs, " "), while j != -1, set addrs substr(addrs, j + 1) set j index(addrs, " ") do if ipstr == substr(addrs, 0, j) return 1 fi done done return 0 done  File: mailfromd.info, Node: Exceptions, Next: Polling, Prev: Loops, Up: MFL 4.19 Exceptional Conditions =========================== When the running program encounters a condition it is not able to handle, it signals an "exception". To illustrate the concept, let's consider the execution of the following code fragment: if primitive_hasmx(domainpart($f)) accept fi The function 'primitive_hasmx' (*note primitive_hasmx::) tests whether the domain name given as its argument has any 'MX' records. It should return a boolean value. However, when querying the Domain Name System, it may fail to get a definite result.
For example, the DNS server can be down or temporarily unavailable. In other words, 'primitive_hasmx' can be in a situation when, instead of returning 'yes' or 'no', it has to return 'don't know'. It has no way of doing so, therefore it signals an "exception". Each exception is identified by its "exception type", an integer number associated with it. * Menu: * Built-in Exceptions:: * User-defined Exceptions:: * Catch and Throw::  File: mailfromd.info, Node: Built-in Exceptions, Next: User-defined Exceptions, Up: Exceptions 4.19.1 Built-in Exceptions -------------------------- The first 20 exception numbers are reserved for "built-in exceptions". These are declared in module 'status.mf'. The following table summarizes all built-in exception types implemented by 'mailfromd' version 8.8. Exceptions are listed in lexicographic order. 'e_badmmq' The called function cannot finish its task because an incompatible message modification function was called at some point before it. For details, *note MMQ and dkim_sign::. 'e_dbfailure' General database failure. For example, the database cannot be opened. This exception can be signaled by any function that queries any DBM database. 'e_divzero' Division by zero. 'e_exists' This exception is emitted by the 'dbinsert' built-in if the requested key is already present in the database (*note dbinsert: Database functions.). 'e_eof' Function reached end of file while reading. *Note I/O functions::, for a description of functions that can signal this exception. 'e_failure' 'failure' A general failure has occurred. In particular, this exception is signaled by DNS lookup functions when any permanent failure occurs. This exception can be signaled by any DNS-related function ('hasmx', 'poll', etc.) or operation ('mx matches'). 'e_format' Invalid input format. This exception is signaled if input data to a function are improperly formatted.
In version 8.8 it is signaled by the 'message_burst' function if its input message is not formatted according to RFC 934. *Note Message digest functions::. 'e_invcidr' Invalid CIDR notation. This is signaled by the 'match_cidr' function when its second argument is not a valid CIDR. 'e_invip' Invalid IP address. This is signaled by the 'match_cidr' function when its first argument is not a valid IP address. 'e_invtime' Invalid time interval specification. It is signaled by the 'interval' function if its argument is not a valid time interval (*note time interval specification::). 'e_io' An error occurred during the input-output operation. *Note I/O functions::, for a description of functions that can signal this exception. 'e_macroundef' A Sendmail macro is undefined. 'e_noresolve' The argument of a DNS-related function cannot be resolved to host name or IP address. Currently only 'ismx' (*note ismx::) raises this exception. 'e_range' The supplied argument is outside the allowed range. This is signalled, for example, by the 'substring' function (*note substring::). 'e_regcomp' Regular expression cannot be compiled. This can happen when a regular expression (a right-hand argument of a 'matches' operator) is built at run time and the produced string is an invalid regex. 'e_ston_conv' String-to-number conversion failed. This can be signaled when a string that cannot be converted to the numeric data type is used in numeric context. For example: set x "10a" if x / 2 ... The 'if' condition will signal 'e_ston_conv', since '10a' cannot be converted to a number. 'e_temp_failure' 'temp_failure' A temporary failure has occurred. This can be signaled by DNS-related functions or operations. 'e_url' The supplied URL is invalid. *Note Interfaces to Third-Party Programs::. In addition to these, two symbols are defined that are not exception types in the strict sense of the word, but are provided to make writing filter scripts more convenient.
These are 'success', meaning successful return from a function, and 'not_found', meaning that the required entity (e.g. domain name or email address) was not found. *Note Figure 4.1: figure-poll-wrapper, for an illustration of how these can be used. For consistency with other exception codes, these can be spelled as 'e_success' and 'e_not_found'.  File: mailfromd.info, Node: User-defined Exceptions, Next: Catch and Throw, Prev: Built-in Exceptions, Up: Exceptions 4.19.2 User-defined Exceptions ------------------------------ You can define your own exception types using the 'dclex' statement: dclex TYPE In this statement, TYPE must be a valid MFL identifier, not used for another constant (*note Constants::). The 'dclex' statement defines a new exception identified by the constant TYPE and allocates a new exception number for it. The TYPE can subsequently be used in 'throw' and 'catch' statements, for example: dclex myrange func fact(number val) returns number do if val < 0 throw myrange "fact argument is out of range" fi ... done  File: mailfromd.info, Node: Catch and Throw, Prev: User-defined Exceptions, Up: Exceptions 4.19.3 Exception Handling ------------------------- Normally when an exception is signalled, the program execution is terminated and a 'tempfail' status is returned to the MTA. Additional information regarding the exception is then output to the logging channel (*note Logging and Debugging::). However, the user can intercept any exception by installing their own exception-handling routines. An exception-handling routine is introduced by a "try-catch" statement, which has the following syntax: try do STMTLIST done catch EXCEPTION-LIST do HANDLER-BODY done where STMTLIST and HANDLER-BODY are sequences of MFL statements and EXCEPTION-LIST is the list of exception types, separated by the word 'or'. A special EXCEPTION-LIST '*' is allowed and means all exceptions. This construct works as follows. First, the statements from STMTLIST are executed.
If the execution finishes successfully, control is passed to the first statement after the 'catch' block. Otherwise, if an exception is signalled and this exception is listed in EXCEPTION-LIST, the execution is passed to the HANDLER-BODY. If the exception is not listed in EXCEPTION-LIST, it is handled as usual. The following example shows a 'try--catch' construct used for handling eventual exceptions, signalled by 'primitive_hasmx'. try do if primitive_hasmx(domainpart($f)) accept else reject fi done catch e_failure or e_temp_failure do echo "primitive_hasmx failed" continue done The 'try--catch' statement can appear anywhere inside a function or a handler, but it cannot appear outside of them. It can also be nested within another 'try--catch', in either of its parts. Upon exit from a function or milter handler, all exceptions are restored to the state they had when it has been entered. A 'catch' block can also be used alone, without preceding 'try' part. Such a construct is called a "standalone catch". It is mostly useful for setting global exception handlers in a 'begin' statement (*note begin/end::). When used within a usual function or handler, the exception handlers set by a standalone catch remain in force until either another standalone catch appears further in the same function or handler, or an end of the function is encountered, whichever occurs first. A standalone catch defined within a function must return from it by executing 'return' statement. If it does not do that explicitly, the default value of 1 is returned. A standalone catch defined within a milter handler must end execution with any of the following actions: 'accept', 'continue', 'discard', 'reject', 'tempfail'. By default, 'continue' is used. It is not recommended to mix 'try--catch' constructs and standalone catches. If a standalone catch appears within a 'try--catch' statement, its scope of visibility is undefined. 
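As an illustration, a standalone catch inside a milter handler might look like the sketch below. It uses only constructs shown elsewhere in this chapter; the choice of 'e_temp_failure' and of the reply text is an assumption made for the example, and 'require status' is included on the assumption that the exception names are taken from that module.

```
require status

prog envfrom
do
  /* Standalone catch: stays in force until the end of this handler. */
  catch e_temp_failure
  do
    /* A catch body in a handler must end with an action. */
    tempfail 450 4.1.0 "Try again later"
  done

  if primitive_hasmx(domainpart($f))
    accept
  fi
done
```

Here any temporary failure signalled after the 'catch' block, for instance by 'primitive_hasmx', results in a 'tempfail' reply, while other exceptions are handled as usual.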
Upon entry to a HANDLER-BODY, two implicit positional arguments are defined, which can be referenced in HANDLER-BODY as '$1' and '$2'. The first argument gives the numeric code of the exception that has occurred. The second argument is a textual string containing a human-readable description of the exception. The following is an improved version of the previous example, which uses these parameters to supply more information about the failure: try do if primitive_hasmx(domainpart($f)) accept else reject fi done catch e_failure or e_temp_failure do echo "Caught exception $1: $2" continue done The following example defines the function 'hasmx' that returns true if the domain part of its argument has any 'MX' records, and false if it does not or if an exception occurs (1). func hasmx (string s) returns number do try do return primitive_hasmx(domainpart(s)) done catch * do return 0 done done The same function can be written using a standalone 'catch': func hasmx (string s) returns number do catch * do return 0 done return primitive_hasmx(domainpart(s)) done All variables remain visible within the 'catch' body, with the exception of positional arguments of the enclosing handler. To access positional arguments of a handler from the 'catch' body, assign them to local variables prior to the 'try--catch' construct, e.g.: prog header do string hname $1 string hvalue $2 try do ... done catch * do echo "Exception $1 while processing header %hname: %hvalue" echo $2 tempfail done done You can also generate (or "raise") exceptions explicitly in the code, using the 'throw' statement: throw EXCODE DESCR The arguments correspond exactly to the positional parameters of the 'catch' statement: EXCODE gives the numeric code of the exception, DESCR gives its textual description. This statement can be used in complex scripts to create non-local exits from deeply nested statements.
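The interplay of 'dclex', 'throw' and 'catch' can be sketched as follows. The exception name 'e_toolong', the function 'check_len' and the limit of 64 characters are hypothetical and serve only to illustrate the mechanism.

```
/* Hypothetical user-defined exception. */
dclex e_toolong

/* Hypothetical helper: returns the length of S, or raises
   e_toolong if S is longer than 64 characters. */
func check_len(string s) returns number
do
  if length(s) > 64
    throw e_toolong "address is too long"
  fi
  return length(s)
done

prog envfrom
do
  try
  do
    number n check_len($f)
    echo "sender length is %n"
  done
  catch e_toolong
  do
    /* $1 is the exception code, $2 is the text given to throw. */
    echo "Caught exception $1: $2"
    continue
  done
done
```

The 'throw' inside 'check_len' leaves the function immediately, and control is passed to the 'catch' body in the calling handler.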
Notice that the EXCODE argument must be an immediate value: an exception identifier (either a built-in one or one declared previously using a 'dclex' statement). ---------- Footnotes ---------- (1) This function is part of the 'mailfromd' library, *Note hasmx::.  File: mailfromd.info, Node: Polling, Next: Modules, Prev: Exceptions, Up: MFL 4.20 Sender Verification Tests ============================== The filter script language provides a wide variety of functions for sender address verification or "polling", for short. These functions, which were described in *note SMTP Callout functions::, can be used to implement any sender verification method. The additional data that can be needed is normally supplied by two global variables: 'ehlo_domain', keeping the default domain for the 'EHLO' command, and 'mailfrom_address', which stores the sender address for probe messages (*note Predefined variables::). For example, the simplest way to implement standard polling would be: prog envfrom do if stdpoll($1, ehlo_domain, mailfrom_address) == 0 accept else reject 550 5.1.0 "Sender validity not confirmed" fi done However, this does not take into account exceptions that 'stdpoll' can signal. To handle them, one will have to use 'catch', for example: require status prog envfrom do try do if stdpoll($1, ehlo_domain, mailfrom_address) == 0 accept else reject 550 5.1.0 "Sender validity not confirmed" fi done catch e_failure or e_temp_failure do switch $1 do case failure: reject 550 5.1.0 "Sender validity not confirmed" case temp_failure: tempfail 450 4.1.0 "Try again later" done done done If polls are used often, one can define a wrapper function, and use it instead.
The following example illustrates this approach: func poll_wrapper(string email) returns number do catch e_failure or e_temp_failure do return $1 done return stdpoll(email, ehlo_domain, mailfrom_address) done prog envfrom do switch poll_wrapper($f) do case success: accept case not_found or failure: reject 550 5.1.0 "Sender validity not confirmed" case temp_failure: tempfail 450 4.1.0 "Try again later" done done Figure 4.1: Building Poll Wrappers Notice the way 'envfrom' handles 'success' and 'not_found', which are not exceptions in the strict sense of the word. The above paradigm is so common that 'mailfromd' provides a special language construct to simplify it: the 'on' statement. Instead of manually writing the wrapper function and using it as a 'switch' condition, you can rewrite the above example as: prog envfrom do on stdpoll($1, ehlo_domain, mailfrom_address) do when success: accept when not_found or failure: reject 550 5.1.0 "Sender validity not confirmed" when temp_failure: tempfail 450 4.1.0 "Try again later" done done Figure 4.2: Standard poll example As you can see, the statement is pretty similar to 'switch'. The major syntactic difference is the use of the keyword 'when' to introduce conditional branches. General syntax of the 'on' statement is: on CONDITION do when X1 [or X2 ...]: STMT1 when Y1 [or Y2 ...]: STMT2 . . . done The CONDITION is either a function call or a special 'poll' statement (see below). The values used in 'when' branches are normally symbolic exception names (*note exception names::). When the compiler processes the 'on' statement it does the following: 1.
Builds a unique wrapper function, similar to that described in *note Figure 4.1: figure-poll-wrapper. The name of the function is constructed from the CONDITION function name and an unsigned number, called "exception mask", that is unique for each combination of exceptions used in 'when' branches. To avoid name clashes with the user-defined functions, the wrapper name begins and ends with '$', which normally is not allowed in identifiers. 2. Translates the 'on' body to the corresponding 'switch' statement. A special form of the CONDITION is the 'poll' keyword, whose syntax is: poll [for] EMAIL [host HOST] [from DOMAIN] [as EMAIL] The order of particular keywords in the 'poll' statement is arbitrary, for example 'as EMAIL' can appear before EMAIL as well as after it. The simplest form, 'poll EMAIL', performs the standard sender verification of email address EMAIL. It is translated to the following function call: stdpoll(EMAIL, ehlo_domain, mailfrom_address) The construct 'poll EMAIL host HOST' runs the strict sender verification of address EMAIL on the given host. It is translated to the following call: strictpoll(HOST, EMAIL, ehlo_domain, mailfrom_address) Other keywords of the 'poll' statement modify these two basic forms. The 'as' keyword introduces the email address to be used in the SMTP 'MAIL FROM' command, instead of 'mailfrom_address'. The 'from' keyword sets the domain name to be used in the 'EHLO' command. So, for example, the following construct: poll EMAIL host HOST from DOMAIN as ADDR is translated to strictpoll(HOST, EMAIL, DOMAIN, ADDR) To summarize the above, the code described in *note Figure 4.2: figure-stdpoll.
can be written as: prog envfrom do on poll $f do when success: accept when not_found or failure: reject 550 5.1.0 "Sender validity not confirmed" when temp_failure: tempfail 450 4.1.0 "Try again later" done done  File: mailfromd.info, Node: Modules, Next: Preprocessor, Prev: Polling, Up: MFL 4.21 Modules ============ A "module" is a logically isolated part of code that implements a separate concern or feature and contains a collection of conceptually united functions and/or data. Each module occupies a separate compilation unit (i.e. file). The functionality provided by a module is incorporated into another module or the main program by "requiring" this module or by "importing" the desired components from it. * Menu: * module structure:: Declaring Modules * scope of visibility:: * import:: Require and Import  File: mailfromd.info, Node: module structure, Next: scope of visibility, Up: Modules 4.21.1 Declaring Modules ------------------------ A module file must begin with a "module declaration": module MODNAME [INTERFACE-TYPE]. Note the final dot. The MODNAME parameter declares the name of the module. It is recommended that it be the same as the file name without the '.mf' extension. The module name must be a valid MFL literal. It also must not coincide with any defined MFL symbol, therefore we recommend to always quote it (see example below). The optional parameter INTERFACE-TYPE defines the "default scope of visibility" for the symbols declared in this module. If it is 'public', then all symbols declared in this module are made public (importable) by default, unless explicitly declared otherwise (*note scope of visibility::). If it is 'static', then all symbols, not explicitly marked as public, become static. If the INTERFACE-TYPE is not given, 'public' is assumed. The actual MFL code follows the 'module' line. The module definition is terminated by the "logical end" of its compilation unit, i.e. 
either by the end of file, or by the keyword 'bye', whichever occurs first. Special keyword 'bye' may be used to prematurely end the current compilation unit before the physical end of the containing file. Any material between 'bye' and the end of file is ignored by the compiler. Let's illustrate these concepts by writing a module 'revip': module 'revip' public. func revip(string ip) returns string do return inet_ntoa(ntohl(inet_aton(ip))) done bye This text is ignored. You may put any additional documentation here.  File: mailfromd.info, Node: scope of visibility, Next: import, Prev: module structure, Up: Modules 4.21.2 Scope of Visibility -------------------------- "Scope of Visibility" of a symbol defines from where this symbol may be referred to. Symbols in MFL may have either of the following two scopes: "Public" Public symbols are visible from the current module, as well as from any external modules, including the main script file, provided that they are properly imported (*note import::). "Static" Static symbols are visible only from the current module. There is no way to refer to them from outside. The default scope of visibility for all symbols declared within a module is defined in the module declaration (*note module structure::). It may be overridden for any individual symbol by prefixing its declaration with an appropriate "qualifier": either 'public' or 'static'.  File: mailfromd.info, Node: import, Prev: scope of visibility, Up: Modules 4.21.3 Require and Import ------------------------- Functions or variables declared in another module must be "imported" prior to their actual use. MFL provides two ways of doing so: by "requiring" the entire module or by importing selected symbols from it. -- Module Import: require modname The 'require' statement instructs the compiler to locate the module MODNAME and to load all public interfaces from it. The compiler looks for the file 'MODNAME.mf' in the current search path (*note include search path::). 
If no such file is found, a compilation error is reported. For example, the following statement: require revip imports all interfaces from the module 'revip.mf'. Another, more sophisticated way to import from a module is to use the 'from ... import' construct: from MODULE import SYMBOLS. Note the final dot. The 'from' and 'module' statements are the only two constructs in MFL that require the delimiter. The MODULE has the same semantics as in the 'require' construct. The SYMBOLS is a comma-separated list of symbol names to import from MODULE. A symbol name may be given in several forms: 1. Literal Literals specify exact symbol names to import. For example, the following statement imports from module 'A.mf' symbols 'foo' and 'bar': from A import foo,bar. 2. Regular expression Regular expressions must be surrounded by slashes. A regular expression instructs the compiler to import all symbols whose names match that expression. For example, the following statement imports from 'A.mf' all symbols whose names begin with 'foo' and contain at least one digit after it: from A import '/^foo.*[0-9]/'. The type of regular expressions used in the 'from' statement is controlled by '#pragma regex' (*note regex::). 3. Regular expression with transformation Regular expression may be followed by a "s-expression", i.e. a 'sed'-like expression of the form: s/REGEXP/REPLACE/[FLAGS] where REGEXP is a "regular expression", REPLACE is a replacement for each part of the input that matches REGEXP. S-expressions and their parts are discussed in detail in *note s-expression::. The effect of such construct is to import all symbols that match the regular expression and apply the s-expression to their names. For example: from A import '/^foo.*[0-9]/s/.*/my_&/'. This statement imports all symbols whose names begin with 'foo' and contain at least one digit after it, and renames them, by prefixing their names with the string 'my_'. 
Thus, if 'A.mf' declared a function 'foo_1', it becomes visible under the name of 'my_foo_1'.  File: mailfromd.info, Node: Preprocessor, Next: Filter Script Example, Prev: Modules, Up: MFL 4.22 MFL Preprocessor ===================== Before compiling the script file, 'mailfromd' preprocesses it. The built-in preprocessor handles only file inclusion (*note include::), while the rest of traditional facilities, such as macro expansion, are supported via 'm4', which is used as an external preprocessor. The detailed description of 'm4' facilities lies far beyond the scope of this document. You will find a complete user manual in *note GNU M4 manual: (m4)Top. For the rest of this section we assume the reader is sufficiently acquainted with 'm4' macro processor. The external preprocessor is invoked with '-s' flag, instructing it to include line synchronization information in its output, which is subsequently used by MFL compiler for purposes of error reporting. The initial set of macro definitions is supplied in file 'pp-setup', located in the library search path(1), which is fed to the preprocessor input before the script file itself. The default 'pp-setup' file renames all 'm4' built-in macro names so they all start with the prefix 'm4_'(2). It changes comment characters to '/*', '*/' pair, and leaves the default quoting characters, grave ('`') and acute (''') accents without change. Finally, 'pp-setup' defines the following macros: -- M4 Macro: boolean defined (IDENTIFIER) The IDENTIFIER must be the name of an optional abstract argument to the function. This macro must be used only within a function definition. It expands to the MFL expression that yields 'true' if the actual parameter is supplied for IDENTIFIER. 
For example: func rcut(string text; number num) returns string do if (defined(num)) return substr(text, length(text) - num) else return text fi done This function will return last NUM characters of TEXT if NUM is supplied, and entire TEXT otherwise, e.g.: rcut("text string") => "text string" rcut("text string", 3) => "ing" Invoking the 'defined' macro with the name of a mandatory argument yields 'true' -- M4 Macro: printf (FORMAT, ...) Provides a 'printf' statement, that formats its optional parameters in accordance with FORMAT and sends the resulting string to the current log output (*note Logging and Debugging::). *Note String formatting::, for a description of FORMAT. Example usage: printf('Function %s returned %d', funcname, retcode) -- M4 Macro: string _ (MSGID) A convenience macro. Expands to a call to 'gettext' (*note NLS Functions::). -- M4 Macro: string_list_iterate (LIST, DELIM, VAR, CODE) This macro intends to compensate for the lack of array data type in MFL. It splits the string LIST into segments delimited by string DELIM. For each segment, the MFL code CODE is executed. The code can use the variable VAR to refer to the segment string. For example, the following fragment prints names of all existing directories listed in the 'PATH' environment variable: string path getenv("PATH") string seg string_list_iterate(path, ":", seg, ` if access(seg, F_OK) echo "%seg exists" fi') Care should be taken to properly quote its arguments. In the code below the string 'str' is treated as a comma-separated list of values. To avoid interpreting the comma as argument delimiter the second argument must be quoted: string_list_iterate(str, `","', seg, ` echo "next segment: " . seg') -- M4 Macro: N_ (MSGID) A convenience macro, that expands to MSGID verbatim. It is intended to mark the literal strings that should appear in the '.po' file, where actual call to 'gettext' (*note NLS Functions::) cannot be used. 
For example:

          /* Mark the variable for translation: cannot use gettext here */
          string message N_("Mail accepted")

          prog envfrom
          do
            ...
            /* Translate and log the message */
            echo gettext(message)

   You can obtain the preprocessed output, without starting the actual
compilation, by using the '-E' command line option:

     $ mailfromd -E file.mf

   The output is in the form of preprocessed source code, which is sent
to the standard output.  This can be useful, among other things, for
debugging your own macro definitions.

   Macro definitions and deletions can be made on the command line, by
using the '-D' and '-U' options.  They have the following format:

'-D NAME[=VALUE]'
'--define=NAME[=VALUE]'
     Define a symbol NAME to have a value VALUE.  If VALUE is not
     supplied, the value is taken to be the empty string.  The VALUE can
     be any string, and the macro can be defined to take arguments, just
     as if it was defined from within the input using the 'm4_define'
     statement.

     For example, the following invocation defines symbol 'COMPAT' to
     have a value '43':

          $ mailfromd -DCOMPAT=43

'-U NAME'
'--undefine=NAME'
     A counterpart of the '-D' option is the option '-U' ('--undefine').
     It undefines a preprocessor symbol whose name is given as its
     argument.  The following example undefines the symbol 'COMPAT':

          $ mailfromd -UCOMPAT

   The following two options are supplied mainly for debugging purposes:

'--no-preprocessor'
     Disables the external preprocessor.

'--preprocessor=COMMAND'
     Use COMMAND as external preprocessor.  Be especially careful with
     this option, because 'mailfromd' cannot verify whether COMMAND is
     actually some kind of a preprocessor or not.

   ---------- Footnotes ----------

   (1) It is usually located in
'/usr/local/share/mailfromd/8.8/include/pp-setup'.

   (2) This is similar to the GNU m4 '--prefix-builtins' option.  This
approach was chosen to allow for using non-GNU 'm4' implementations as
well.
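   A symbol defined with '-D' can be tested from within the script.
The fragment below is a minimal sketch, assuming the default 'pp-setup'
is in effect, so that the 'm4' builtin 'ifdef' is available under the
name 'm4_ifdef'.  The log messages are purely illustrative.

```mfl
prog envfrom
do
  /* The first branch is compiled only if COMPAT was defined,
     e.g. with -DCOMPAT=43 on the command line. */
  m4_ifdef(`COMPAT', `
  echo "compatibility code enabled"
  ', `
  echo "compatibility code disabled"
  ')
done
```

   Running 'mailfromd -DCOMPAT=43 -E file.mf' on such a file should
show which of the two branches survives preprocessing.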
File: mailfromd.info, Node: Filter Script Example, Next: Reserved Words, Prev: Preprocessor, Up: MFL 4.23 Example of a Filter Script File ==================================== In this section we will discuss a working example of the filter script file. For the ease of illustration, it is divided in several sections. Each section is prefaced with a comment explaining its function. This filter assumes that the 'mailfromd.conf' file contains the following: relayed-domain-file (/etc/mail/sendmail.cw, /etc/mail/relay-domains); io-timeout 33; database cache { negative-expire-interval 1 day; positive-expire-interval 2 weeks; }; Of course, the exact parameter settings may vary, what is important is that they be declared. *Note Mailfromd Configuration::, for a description of 'mailfromd' configuration file syntax. Now, let's return to the script. Its first part defines the configuration settings for this host: #pragma regex +extended +icase set mailfrom_address "<>" set ehlo_domain "gnu.org.ua" The second part loads the necessary source modules: require 'status' require 'dns' require 'rateok' Next we define 'envfrom' handler. In the first two rules, it accepts all mails coming from the null address and from the machines which we relay: prog envfrom do if $f = "" accept elif relayed hostname($client_addr) accept elif hostname($client_addr) = $client_addr reject 550 5.7.7 "IP address does not resolve" Next rule rejects all messages coming from hosts with dynamic IP addresses. 
The regular expression used to catch such hosts is not 100% fail-proof,
but it tries to cover most existing host naming patterns:

       elif hostname($client_addr) matches
               ".*(adsl|sdsl|hdsl|ldsl|xdsl|dialin|dialup|\
     ppp|dhcp|dynamic|[-.]cpe[-.]).*"
         reject 550 5.7.1 "Use your SMTP relay"

   Messages coming from the machines whose host names contain something
similar to an IP are subject to strict checking:

       elif hostname($client_addr) matches
             ".*[0-9]{1,3}[-.][0-9]{1,3}[-.][0-9]{1,3}[-.][0-9]{1,3}.*"
         on poll host $client_addr for $f do
         when success:
           pass
         when not_found or failure:
           reject 550 5.1.0 "Sender validity not confirmed"
         when temp_failure:
           tempfail
         done

   If the sender domain is relayed by any of the 'yahoo.com' or
'nameserver.com' 'MX's, no checks are performed.  We will greylist this
message in the 'envrcpt' handler:

       elif $f mx fnmatches "*.yahoo.com"
            or $f mx fnmatches "*.nameserver.com"
         pass

   Finally, if the message does not meet any of the above conditions, it
is verified by the standard procedure:

       else
         on poll $f do
         when success:
           pass
         when not_found or failure:
           reject 550 5.1.0 "Sender validity not confirmed"
         when temp_failure:
           tempfail
         done
       fi

   At the end of the handler we check that the sender-client pair does
not exceed the allowed mail sending rate:

       if not rateok("$f-$client_addr", interval("1 hour 30 minutes"), 100)
         tempfail 450 4.7.0 "Mail sending rate exceeded.  Try again later"
       fi
     done

   The next part defines the 'envrcpt' handler.  Its primary purpose is
to greylist messages from some domains that could not be checked
otherwise:

     prog envrcpt
     do
       set gltime 300
       if $f mx fnmatches "*.yahoo.com"
          or $f mx fnmatches "*.nameserver.com"
          and not dbmap("/var/run/whitelist.db", $client_addr)
         if greylist("$client_addr-$f-$rcpt_addr", gltime)
           if greylist_seconds_left = gltime
             tempfail 450 4.7.0
                   "You are greylisted for %gltime seconds"
           else
             tempfail 450 4.7.0
                   "Still greylisted for " . %greylist_seconds_left .
" seconds" fi fi fi done  File: mailfromd.info, Node: Reserved Words, Prev: Filter Script Example, Up: MFL 4.24 Reserved Words =================== For your reference, here is an alphabetical list of all reserved words: * __defpreproc__ * __defstatedir__ * __file__ * __function__ * __line__ * __major__ * __minor__ * __module__ * __package__ * __patch__ * __preproc__ * __statedir__ * __version__ * accept * add * and * alias * begin * break * bye * case * catch * const * continue * default * delete * discard * do * done * echo * end * elif * else * fi * fnmatches * for * from * func * if * import * loop * matches * module * next * not * number * on * or * pass * precious * prog * public * reject * replace * return * returns * require * set * static * string * switch * tempfail * throw * try * vaptr * when * while Several keywords are context-dependent: 'mx' is a keyword if it appears before 'matches' or 'fnmatches'. Following strings are keywords in 'on' context: * as * host * poll The following keywords are preprocessor macros: * defined * _ (an underscore) * N_ Any keyword beginning with a 'm4_' prefix is a reserved preprocessor symbol.  File: mailfromd.info, Node: Library, Next: Using MFL Mode, Prev: MFL, Up: Top 5 The MFL Library Functions *************************** This chapter describes library functions available in Mailfromd version 8.8. For the simplicity of explanation, we use the word 'boolean' to indicate variables of numeric type that are used as boolean values. For such variables, the term 'False' stands for the numeric 0, and 'True' for any non-zero value. 
* Menu: * Macro access:: * String manipulation:: * String formatting:: * Character Type:: * Email processing functions:: * Envelope modification functions:: * Header modification functions:: * Body Modification Functions:: * Message modification queue:: * Mail header functions:: * Mail body functions:: * EOM Functions:: * Current Message Functions:: * Mailbox functions:: * Message functions:: * Quarantine functions:: * SMTP Callout functions:: * Compatibility Callout functions:: * Internet address manipulation functions:: * DNS functions:: * Geolocation functions:: * Database functions:: * I/O functions:: * System functions:: * Passwd functions:: * Sieve Interface:: * Interfaces to Third-Party Programs:: * Rate limiting functions:: * Greylisting functions:: * Special test functions:: * Mail Sending Functions:: * Blacklisting Functions:: * SPF Functions:: * DKIM:: * Sockmaps:: * NLS Functions:: * Syslog Interface:: * Debugging Functions::  File: mailfromd.info, Node: Macro access, Next: String manipulation, Up: Library 5.1 Sendmail Macro Access Functions =================================== -- Built-in Function: string getmacro (string MACRO) Returns the value of Sendmail macro MACRO. If MACRO is not defined, raises the 'e_macroundef' exception. Calling 'getmacro(NAME)' is completely equivalent to referencing '${NAME}', except that it allows to construct macro names programmatically, e.g.: if getmacro("auth_%var") = "foo" ... fi -- Built-in Function: boolean macro_defined (string NAME) Return true if Sendmail macro NAME is defined. Notice, that if your MTA supports macro name negotiation(1), you will have to export macro names used by these two functions using '#pragma miltermacros' construct. Consider this example: func authcheck(string name) do string macname "auth_%name" if macro_defined(macname) if getmacro(macname) ... 
fi fi done #pragma miltermacros envfrom auth_authen prog envfrom do authcheck("authen") done In this case, the parser cannot deduce that the 'envfrom' handler will attempt to reference the 'auth_authen' macro, therefore the '#pragma miltermacros' is used to help it. ---------- Footnotes ---------- (1) That is, if it supports Milter protocol 6 and upper. Sendmail 8.14.0 and Postfix 2.6 and newer do. MeTA1 (via 'pmult') does as well. *Note MTA Configuration::, for more details.  File: mailfromd.info, Node: String manipulation, Next: String formatting, Prev: Macro access, Up: Library 5.2 String Manipulation Functions ================================= -- Built-in Function: string escape (string STR, [string CHARS]) Returns a copy of STR with the characters from CHARS escaped, i.e. prefixed with a backslash. If CHARS is not specified, '\"' is assumed. escape('"a\tstr"ing') => '\"a\\tstr\"ing' escape('new "value"', '\" ') => 'new\ \"value\"' -- Built-in Function: string unescape (string STR) Performs the reverse to 'escape', i.e. removes any prefix backslash characters. unescape('a \"quoted\" string') => 'a "quoted" string' -- Built-in Function: string unescape (string STR, [string CHARS]) -- Built-in Function: string domainpart (string STR) Returns the domain part of STR, if it is a valid email address, otherwise returns STR itself. domainpart("gray") => "gray" domainpart("gray@gnu.org.ua") => "gnu.org.ua" -- Built-in Function: number index (string S, string T) -- Built-in Function: number index (string S, string T, number START) Returns the index of the first occurrence of the string T in the string S, or -1 if T is not present. index("string of rings", "ring") => 2 Optional argument START, if supplied, indicates the position in string where to start searching. index("string of rings", "ring", 3) => 10 To find the last occurrence of a substring, use the function RINDEX (*note rindex::). 
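     The following is a minimal sketch of how 'index' might be combined
     with 'substr' to cut a string at a delimiter.  The function name
     'before_at' is hypothetical and is not part of the MFL library.

```mfl
/* Hypothetical helper: return the part of STR that precedes the
   first "@", or STR itself if it contains no "@". */
func before_at(string str) returns string
do
  number pos index(str, "@")
  if pos = -1
    return str
  fi
  /* Take POS characters starting at offset 0. */
  return substr(str, 0, pos)
done
```

     With these definitions, 'before_at("gray@gnu.org.ua")' yields
     "gray", since 'index' returns 4 for that argument.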
 -- Built-in Function: number interval (string STR)
     Converts STR, which should be a valid time interval specification
     (*note time interval specification::), to seconds.

 -- Built-in Function: number length (string STR)
     Returns the length of the string STR in bytes.

          length("string") => 6

 -- Built-in Function: string dequote (string STR)
     Removes '<' and '>' surrounding STR.  If STR is not enclosed by
     angle brackets or these are unbalanced, the argument is returned
     unchanged:

          dequote("<root@gnu.org.ua>") => "root@gnu.org.ua"
          dequote("root@gnu.org.ua") => "root@gnu.org.ua"
          dequote("there>") => "there>"

 -- Built-in Function: string localpart (string STR)
     Returns the local part of STR if it is a valid email address,
     otherwise returns STR unchanged.

          localpart("gray") => "gray"
          localpart("gray@gnu.org.ua") => "gray"

 -- Built-in Function: string replstr (string S, number N)
     Replicates a string, i.e. returns a string consisting of S repeated
     N times:

          replstr("12", 3) => "121212"

 -- Built-in Function: string revstr (string S)
     Returns the string composed of the characters from S in reversed
     order:

          revstr("foobar") => "raboof"

 -- Built-in Function: number rindex (string S, string T)
 -- Built-in Function: number rindex (string S, string T, number START)
     Returns the index of the last occurrence of the string T in the
     string S, or -1 if T is not present.

          rindex("string of rings", "ring") => 10

     Optional argument START, if supplied, indicates the position in
     string where to start searching.  E.g.:

          rindex("string of rings", "ring", 10) => 2

     See also *note 'index' built-in function: index-built-in.

 -- Built-in Function: string substr (string STR, number START)
 -- Built-in Function: string substr (string STR, number START, number
          LENGTH)
     Returns a substring of STR, at most LENGTH characters long,
     starting at START.  If LENGTH is omitted, the rest of STR is used.

     If LENGTH is greater than the actual length of the string, the
     'e_range' exception is signalled.

          substr("mailfrom", 4) => "from"
          substr("mailfrom", 4, 2) => "fr"

 -- Built-in Function: string substring (string STR, number START,
          number END)
     Returns a substring of STR between offsets START and END,
     inclusive.  Negative END means offset from the end of the string.
     In other words, to obtain a substring from START to the end of the
     string, use 'substring(STR, START, -1)':

          substring("mailfrom", 0, 3) => "mail"
          substring("mailfrom", 2, 5) => "ilfr"
          substring("mailfrom", 4, -1) => "from"
          substring("mailfrom", 4, length("mailfrom") - 1) => "from"
          substring("mailfrom", 4, -2) => "fro"

     This function signals the 'e_range' exception if either START or
     END is outside the string.

 -- Built-in Function: string tolower (string STR)
     Returns a copy of the string STR, with all the upper-case
     characters translated to their corresponding lower-case
     counterparts.  Non-alphabetic characters are left unchanged.

          tolower("MAIL") => "mail"

 -- Built-in Function: string toupper (string STR)
     Returns a copy of the string STR, with all the lower-case
     characters translated to their corresponding upper-case
     counterparts.  Non-alphabetic characters are left unchanged.

          toupper("mail") => "MAIL"

 -- Built-in Function: string ltrim (string STR[, string CSET])
     Returns a copy of the input string STR with any leading characters
     present in CSET removed.  If the latter is not given, white space
     is removed (spaces, tabs, newlines, carriage returns, and line
     feeds).

          ltrim("  a string") => "a string"
          ltrim("089", "0") => "89"

     Note the last example.  It shows how 'ltrim' can be used to convert
     a decimal number whose string representation begins with '0'.
     Normally such strings are treated as representing octal numbers.
     If they are in fact decimal, use 'ltrim' to strip off the leading
     zeros, e.g.:

          set dayofyear ltrim(strftime('%j', time()), "0")

 -- Built-in Function: string rtrim (string STR[, string CSET])
     Returns a copy of the input string STR with any trailing characters
     present in CSET removed.
If the latter is not given, white space is removed (spaces, tabs, newlines, carriage returns, and line feeds). -- Built-in Function: number vercmp (string A, string B) Compares two strings as 'mailfromd' version numbers. The result is negative if B precedes A, zero if they refer to the same version, and positive if B follows A: vercmp("5.0", "5.1") => 1 vercmp("4.4", "4.3") => -1 vercmp("4.3.1", "4.3") => -1 vercmp("8.0", "8.0") => 0 -- Library Function: string sa_format_score (number CODE, number PREC) Format CODE as a floating-point number with PREC decimal digits: sa_format_score(5000, 3) => "5.000" This function is convenient for formatting SpamAssassin scores for use in message headers and textual reports. It is defined in module 'sa.mf'. *Note SpamAssassin: sa, for examples of its use. -- Library Function: string sa_format_report_header (string TEXT) Format a SpamAssassin report text in order to include it in a RFC 822 header. This function selects the score listing from TEXT, and prefixes each line with '* '. Its result looks like: * 0.2 NO_REAL_NAME From: does not include a real name * 0.1 HTML_MESSAGE BODY: HTML included in message *Note SpamAssassin: sa, for examples of its use. -- Library Function: string strip_domain_part (string DOMAIN, number N) Returns at most N last components of the domain name DOMAIN. If N is 0 the function returns DOMAIN. This function is defined in the module 'strip_domain_part.mf' (*note Modules::). Examples: require strip_domain_part strip_domain_part("puszcza.gnu.org.ua", 2) => "org.ua" strip_domain_part("puszcza.gnu.org.ua", 0) => "puszcza.gnu.org.ua" -- Library Function: boolean is_ip (string STR) Returns 'true' if STR is a valid IPv4 address. This function is defined in the module 'is_ip.mf' (*note Modules::). 
For example: require is_ip is_ip("1.2.3.4") => 1 is_ip("1.2.3.x") => 0 is_ip("blah") => 0 is_ip("255.255.255.255") => 1 is_ip("0.0.0.0") => 1 -- Library Function: string revip (string IP) Reverses octets in IP, which must be a valid string representation of an IPv4 address. Example: 'revip("127.0.0.1") => "1.0.0.127"' -- Library Function: string verp_extract_user (string EMAIL, string DOMAIN) If EMAIL is a valid VERP-style email address for DOMAIN, this function returns the user name, corresponding to that email. Otherwise, it returns empty string. verp_extract_user("gray=gnu.org.ua@tuhs.org", 'gnu\..*') => "gray"  File: mailfromd.info, Node: String formatting, Next: Character Type, Prev: String manipulation, Up: Library 5.3 String formatting ===================== -- Built-in Function: string sprintf (string FORMAT, ...) The function 'sprintf' formats its argument according to FORMAT (see below) and returns the resulting string. It takes varying number of parameters, the only mandatory one being FORMAT. Format string ------------- The format string is a simplified version of the format argument to C 'printf'-family functions. The format string is composed of zero or more "directives": ordinary characters (not '%'), which are copied unchanged to the output stream; and "conversion specifications", each of which results in fetching zero or more subsequent arguments. Each conversion specification is introduced by the character '%', and ends with a conversion specifier. In between there may be (in this order) zero or more "flags", an optional "minimum field width", and an optional "precision". Notice, that in practice that means that you should use single quotes with the FORMAT arguments, to protect conversion specifications from being recognized as variable references (*note singe-vs-double::). No type conversion is done on arguments, so it is important that the supplied arguments match their corresponding conversion specifiers. 
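   To illustrate the quoting advice above, here is a minimal sketch of
a 'sprintf' call whose format is written in single quotes.  The message
text is illustrative; '$f' and '$client_addr' are the Sendmail macros
used elsewhere in this manual, both of string type, so they match the
two '%s' specifiers.

```mfl
prog envfrom
do
  /* Single quotes keep %s from being interpreted as a variable
     reference before sprintf sees the format string. */
  echo sprintf('sender %s connected from %s', $f, $client_addr)
done
```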
By default, the arguments are used in the order given, where each '*'
and each conversion specifier asks for the next argument.  If too few
arguments are given, 'sprintf' raises the 'e_range' exception.  One can
also specify explicitly which argument is taken, at each place where an
argument is required, by writing '%M$' instead of '%' and '*M$' instead
of '*', where the decimal integer M denotes the position in the argument
list of the desired argument, indexed starting from 1.  Thus,

         sprintf('%*d', width, num);

and

         sprintf('%2$*1$d', width, num);

are equivalent.  The second style allows repeated references to the same
argument.

Flag characters
---------------

The character '%' is followed by zero or more of the following "flags":

'#'
     The value should be converted to an "alternate form".  For 'o'
     conversions, the first character of the output string is made zero
     (by prefixing a '0' if it was not zero already).  For 'x' and 'X'
     conversions, a non-zero result has the string '0x' (or '0X' for 'X'
     conversions) prepended to it.  Other conversions are not affected
     by this flag.

'0'
     The value should be zero padded.  For 'd', 'i', 'o', 'u', 'x', and
     'X' conversions, the converted value is padded on the left with
     zeros rather than blanks.  If the '0' and '-' flags both appear,
     the '0' flag is ignored.  If a precision is given, the '0' flag is
     ignored.  Other conversions are not affected by this flag.

'-'
     The converted value is to be left adjusted on the field boundary.
     (The default is right justification.)  The converted value is
     padded on the right with blanks, rather than on the left with
     blanks or zeros.  A '-' overrides a '0' if both are given.

' ' (a space)
     A blank should be left before a positive number (or empty string)
     produced by a signed conversion.

'+'
     A sign ('+' or '-') should always be placed before a number
     produced by a signed conversion.  By default a sign is used only
     for negative numbers.  A '+' overrides a space if both are used.
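   The calls below sketch what the flag descriptions above imply.  The
results are derived from those descriptions (which follow the C
'printf' conventions) and have not been checked against a running
'mailfromd'.  The '5' in two of the formats is a minimum field width.

```mfl
sprintf('%#x', 255)   => "0xff"
sprintf('%#o', 8)     => "010"
sprintf('%05d', 42)   => "00042"
sprintf('%-5d|', 42)  => "42   |"
sprintf('% d', 42)    => " 42"
sprintf('%+d', 42)    => "+42"
```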
Field width ----------- An optional decimal digit string (with nonzero first digit) specifying a minimum field width. If the converted value has fewer characters than the field width, it will be padded with spaces on the left (or right, if the left-adjustment flag has been given). Instead of a decimal digit string one may write '*' or '*M$' (for some decimal integer M) to specify that the field width is given in the next argument, or in the M-th argument, respectively, which must be of numeric type. A negative field width is taken as a '-' flag followed by a positive field width. In no case does a non-existent or small field width cause truncation of a field; if the result of a conversion is wider than the field width, the field is expanded to contain the conversion result. Precision --------- An optional precision, in the form of a period ('.') followed by an optional decimal digit string. Instead of a decimal digit string one may write '*' or '*M$' (for some decimal integer M) to specify that the precision is given in the next argument, or in the M-th argument, respectively, which must be of numeric type. If the precision is given as just '.', or the precision is negative, the precision is taken to be zero. This gives the minimum number of digits to appear for 'd', 'i', 'o', 'u', 'x', and 'X' conversions, or the maximum number of characters to be printed from a string for the 's' conversion. Conversion specifier -------------------- A character that specifies the type of conversion to be applied. The conversion specifiers and their meanings are: d i The numeric argument is converted to signed decimal notation. The precision, if any, gives the minimum number of digits that must appear; if the converted value requires fewer digits, it is padded on the left with zeros. The default precision is '1'. When '0' is printed with an explicit precision '0', the output is empty. 
o
u
x
X
     The numeric argument is converted to unsigned octal ('o'), unsigned
     decimal ('u'), or unsigned hexadecimal ('x' and 'X') notation.  The
     letters 'abcdef' are used for 'x' conversions; the letters 'ABCDEF'
     are used for 'X' conversions.  The precision, if any, gives the
     minimum number of digits that must appear; if the converted value
     requires fewer digits, it is padded on the left with zeros.  The
     default precision is '1'.  When '0' is printed with an explicit
     precision '0', the output is empty.

s
     The string argument is written to the output.  If a precision is
     specified, no more than the specified number of characters are
     written.

%
     A '%' is written.  No argument is converted.  The complete
     conversion specification is '%%'.

File: mailfromd.info, Node: Character Type, Next: Email processing functions, Prev: String formatting, Up: Library

5.4 Character Type
==================

These functions check whether all characters of STR fall into a certain
character class according to the 'C' ('POSIX') locale(1).  'True' (1) is
returned if they do, 'false' (0) is returned otherwise.  In the latter
case, the global variable 'ctype_mismatch' is set to the index of the
first character that is outside of the character class (characters are
indexed from 0).

 -- Built-in Function: boolean isalnum (string STR)
     Checks for alphanumeric characters:

            isalnum("a123") => 1
            isalnum("a.123") => 0 (ctype_mismatch = 1)

 -- Built-in Function: boolean isalpha (string STR)
     Checks for alphabetic characters:

            isalpha("abc") => 1
            isalpha("a123") => 0

 -- Built-in Function: boolean isascii (string STR)
     Checks whether all characters in STR are 7-bit ones, that fit into
     the ASCII character set.

            isascii("abc") => 1
            isascii("ab\0200") => 0

 -- Built-in Function: boolean isblank (string STR)
     Checks if STR contains only blank characters; that is, spaces or
     tabs.

 -- Built-in Function: boolean iscntrl (string STR)
     Checks for control characters.
 -- Built-in Function: boolean isdigit (string STR)
     Checks for digits (0 through 9).

 -- Built-in Function: boolean isgraph (string STR)
     Checks for any printable characters except spaces.

 -- Built-in Function: boolean islower (string STR)
     Checks for lower-case characters.

 -- Built-in Function: boolean isprint (string STR)
     Checks for printable characters including space.

 -- Built-in Function: boolean ispunct (string STR)
     Checks for any printable characters which are not spaces or
     alphanumeric characters.

 -- Built-in Function: boolean isspace (string STR)
     Checks for white-space characters, i.e.: space, form-feed ('\f'),
     newline ('\n'), carriage return ('\r'), horizontal tab ('\t'), and
     vertical tab ('\v').

 -- Built-in Function: boolean isupper (string STR)
     Checks for upper-case letters.

 -- Built-in Function: boolean isxdigit (string STR)
     Checks for hexadecimal digits, i.e. one of '0', '1', '2', '3', '4',
     '5', '6', '7', '8', '9', 'a', 'b', 'c', 'd', 'e', 'f', 'A', 'B',
     'C', 'D', 'E', 'F'.

   ---------- Footnotes ----------

   (1) Support for other locales is planned for future versions.

File: mailfromd.info, Node: Email processing functions, Next: Envelope modification functions, Prev: Character Type, Up: Library

5.5 Email processing functions.
===============================

 -- Built-in Function: number email_map (string EMAIL)
     Parses EMAIL and returns a bitmap, consisting of zero or more of
     the following flags:

     'EMAIL_MULTIPLE'
          EMAIL has more than one email address.

     'EMAIL_COMMENTS'
          EMAIL has comment parts.

     'EMAIL_PERSONAL'
          EMAIL has personal part.

     'EMAIL_LOCAL'
          EMAIL has local part.

     'EMAIL_DOMAIN'
          EMAIL has domain part.

     'EMAIL_ROUTE'
          EMAIL has route part.

     These constants are declared in the 'email.mf' module.  The
     function 'email_map' returns 0 if its argument is not a valid email
     address.

 -- Library Function: boolean email_valid (string EMAIL)
     Returns 'True' (1) if EMAIL is a valid email address, consisting of
     local and domain parts only.
E.g.:

          email_valid("gray@gnu.org") => 1
          email_valid("gray") => 0
          email_valid('"Sergey Poznyakoff ') => 0

     This function is defined in 'email.mf' (*note Modules::).

File: mailfromd.info, Node: Envelope modification functions, Next: Header modification functions, Prev: Email processing functions, Up: Library

5.6 Envelope Modification Functions
===================================

Envelope modification functions set the sender and add or delete
recipient addresses from the message envelope.  This allows MFL scripts
to redirect messages to other addresses.

 -- Built-in Function: void set_from (string EMAIL [, string ARGS])
     Sets the envelope sender address to EMAIL, which must be a valid
     email address.  Optional ARGS supply arguments to the ESMTP
     'MAIL FROM' command.

 -- Built-in Function: void rcpt_add (string ADDRESS)
     Add the e-mail ADDRESS to the envelope.

 -- Built-in Function: void rcpt_delete (string ADDRESS)
     Remove ADDRESS from the envelope.

   The following example code uses these functions to implement a simple
alias-like capability (the variable is named 'aliasaddr' because 'alias'
is a reserved word):

     prog envrcpt
     do
        string aliasaddr dbget(aliasdb, $1, "NULL", 1)
        if aliasaddr != "NULL"
           rcpt_delete($1)
           rcpt_add(aliasaddr)
        fi
     done

File: mailfromd.info, Node: Header modification functions, Next: Body Modification Functions, Prev: Envelope modification functions, Up: Library

5.7 Header Modification Functions
=================================

There are two ways to modify message headers in an MFL script.  The
first is to use header actions, described in *note Actions::, and the
second is to use message modification functions.  Compared with the
actions, the functions offer a series of advantages.  For example, using
functions you can construct the name of the header to operate upon (e.g.
by concatenating several arguments), something which is impossible when
using actions.  Moreover, apart from the three basic operations (add,
modify and remove) supported by header actions, header functions allow
you to insert a new header at a particular place.
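   As a minimal sketch of the first advantage, the fragment below
builds a header name at run time and passes it to 'header_add'.  The
function name 'add_check_header' and the 'X-Check-' prefix are
hypothetical, chosen only for illustration.

```mfl
/* Hypothetical helper: record the result of a check in a header
   whose name is constructed from the check name, e.g. "X-Check-SPF". */
func add_check_header(string check, string result)
do
  header_add("X-Check-" . check, result)
done

prog eom
do
  add_check_header("SPF", "pass")
done
```

   The same cannot be written with the 'add' action, which takes the
header name as a literal.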
-- Built-in Function: void header_add (string NAME, string VALUE) Adds a header 'NAME: VALUE' to the message. In contrast to the 'add' action, this function allows to construct the header name using arbitrary MFL expressions. -- Built-in Function: void header_add (string NAME, string VALUE, number IDX) This syntax is preserved for backward compatibility. It is equivalent to 'header_insert', which see. -- Built-in Function: void header_insert (string NAME, string VALUE, number IDX) This function inserts a header 'NAME: 'value'' at IDXth header position in the internal list of headers maintained by the MTA. That list contains headers added to the message either by the filter or by the MTA itself, but not the headers included in the message itself. Some of the headers in this list are conditional, e.g. the ones added by the 'H?COND?' directive in 'sendmail.cf'. MTA evaluates them after all header modifications have been done and removes those of headers for which they yield false. This means that the position at which the header added by 'header_insert' will appear in the final message will differ from IDX. -- Built-in Function: void header_delete (string NAME [, number INDEX]) Delete header NAME from the envelope. If INDEX is given, delete INDEXth instance of the header NAME. Notice the differences between this function and the 'delete' action: 1. It allows to construct the header name, whereas 'delete' requires it to be a literal string. 2. Optional INDEX argument allows to select a particular header instance to delete. -- Built-in Function: void header_replace (string NAME, string VALUE [, number INDEX]) Replace the value of the header NAME with VALUE. If INDEX is given, replace INDEXth instance of header NAME. Notice the differences between this function and the 'replace' action: 1. It allows to construct the header name, whereas 'replace' requires it to be a literal string. 2. Optional INDEX argument allows to select a particular header instance to replace. 
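     The fragment below is a minimal sketch of the optional INDEX
     argument in 'header_replace' and 'header_delete'.  It assumes that
     instances are counted from 1, as stated for 'header_rename'; the
     header 'X-Scan-Result' and the replacement text are hypothetical.

```mfl
prog eom
do
  /* Replace the value of the first Subject header only. */
  header_replace("Subject", "[filtered]", 1)

  /* Delete the second instance of X-Scan-Result (a hypothetical
     header), leaving the first instance intact. */
  header_delete("X-Scan-Result", 2)
done
```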
 -- Library Function: void header_rename (string NAME, string NEWNAME [,
          number IDX])
     Defined in the module 'header_rename.mf'.  Available only in the
     'eom' handler.

     Renames the IDXth instance of header NAME to NEWNAME.  If IDX is
     not given, assumes 1.

     If the specified header or the IDXth instance of it is not present
     in the current message, the function silently returns.  All other
     errors cause a run-time exception.

     The position of the renamed header in the header list is not
     preserved.

     The example below renames the 'Subject' header to 'X-Old-Subject':

          require 'header_rename'

          prog eom
          do
            header_rename("Subject", "X-Old-Subject")
          done

 -- Library Function: void header_prefix_all (string NAME [, string
          PREFIX])
     Defined in the module 'header_rename.mf'.  Available only in the
     'eom' handler.

     Renames all headers named NAME by prefixing them with PREFIX.  If
     PREFIX is not supplied, removes all such headers.

     All renamed headers will be placed in a contiguous block in the
     header list.  The absolute position in the header list will
     change.  Relative ordering of renamed headers will be preserved.

 -- Library Function: void header_prefix_pattern (string PATTERN [,
          string PREFIX])
     Defined in the module 'header_rename.mf'.  Available only in the
     'eom' handler.

     Renames all headers with names matching PATTERN (in the sense of
     'fnmatch', *note fnmatches: Special comparisons.) by prefixing them
     with PREFIX.

     All renamed headers will be placed in a contiguous block in the
     header list.  The absolute position in the header list will
     change.  Relative ordering of renamed headers will be preserved.

     If called with one argument, removes all headers matching PATTERN.

     For example, to prefix all headers beginning with 'X-Spamd-' with
     an additional 'X-':

          require 'header_rename'

          prog eom
          do
            header_prefix_pattern("X-Spamd-*", "X-")
          done


File: mailfromd.info,  Node: Body Modification Functions,  Next: Message modification queue,  Prev: Header modification functions,  Up: Library

5.8 Body Modification Functions
===============================

Body modification is an experimental feature of MFL.  Version 8.8
provides two functions for that purpose.

 -- Built-in Function: void replbody (string TEXT)
     Replace the body of the message with TEXT.  Notice that TEXT must
     not contain RFC 822 headers.  See the previous section if you want
     to manipulate message headers.

     Example:

          replbody("Body of this message has been removed by the mail filter.")

     No restrictions are imposed on the format of TEXT.

 -- Built-in Function: void replbody_fd (number FD)
     Replaces the body of the message with the content of the stream FD.
     Use this function if the body is very big, or if it is returned by
     an external program.

     Notice that this function starts reading from the current position
     in FD.  Use 'rewind' if you wish to read from the beginning of the
     stream.

     The example below shows how to preprocess the body of the message
     using the external program '/usr/bin/mailproc', which is supposed
     to read the body from its standard input and write the processed
     text to its standard output:

          number fd # Temporary file descriptor

          prog data
          do
            # Open the temporary file
            set fd tempfile()
          done

          prog body
          do
            # Write the body to it.
            write_body(fd, $1, $2)
          done

          prog eom
          do
            # Use the resulting stream as the stdin to the mailproc
            # command and read the new body from its standard output.
            rewind(fd)
            replbody_fd(spawn(""

          message_header_encode(string, "ISO-8859-1") =>
            "=?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?= "

 -- Built-in Function: string message_header_decode (string TEXT [,
          string CHARSET])
     TEXT must be a header value encoded in accordance with RFC 2047.
     The function returns the decoded string.
     If the decoding fails, it raises the 'e_failure' exception.  The
     optional argument CHARSET specifies the character set to use
     (default: 'UTF-8').

          set string "=?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?= "
          message_header_decode(string) => "Keld Jørn Simonsen "

 -- Built-in Function: string unfold (string TEXT)
     If TEXT is a "folded" multi-line RFC 2822 header value, unfold it.
     If TEXT is a single-line string, return its unchanged copy.

     For example, suppose that the message being processed contained the
     following header:

          List-Id: Sent bugreports to

     Then, applying 'unfold' to its value(1) will produce:

          Sent bugreports to

   ---------- Footnotes ----------

   (1) For example:

     prog header
     do
       echo unfold($2)
     done


File: mailfromd.info,  Node: Mail body functions,  Next: EOM Functions,  Prev: Mail header functions,  Up: Library

5.11 Mail Body Functions
========================

 -- Built-in Function: string body_string (pointer TEXT, number COUNT)
     Converts the first COUNT bytes from the memory location pointed to
     by TEXT into a regular string.

     This function is intended to convert the '$1' argument passed to a
     'body' handler to a regular MFL string.  For more information about
     its use, see *note body handler::.

 -- Built-in Function: bool body_has_nulls (pointer TEXT, number COUNT)
     Returns 'True' if the first COUNT bytes of the string pointed to by
     TEXT contain ASCII NUL characters.

     Example:

          prog body
          do
            if body_has_nulls($1, $2)
              reject
            fi
          done
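
   The two functions above can be used together.  The following is only
a sketch under stated assumptions: the variable 'chunk' and the marker
text 'EXAMPLE-MARKER' are hypothetical, and the sketch assumes that
'$1' and '$2' carry the body chunk and its length, as in the example
above:

     prog body
     do
       string chunk

       # Refuse binary content before treating the chunk as a string.
       if body_has_nulls($1, $2)
         reject
       fi

       # Convert the chunk to a regular MFL string and test it.
       set chunk body_string($1, $2)
       if chunk fnmatches "*EXAMPLE-MARKER*"
         reject
       fi
     done

   If the MTA delivers the body in several chunks, a marker that
straddles two chunks would not be detected by this sketch.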