02 March 2006

Thoughts on MTA throughput Part 1

A short while ago I was on a conference call with a certain MTA vendor who shall remain nameless. (Yes, that means I am not terribly impressed. Not even slightly.) They claimed their server was "500 times faster than Postfix" in "relay only mode". They also claimed a throughput of "two million messages per hour". On the same OS and hardware.

Needless to say, as a Postfix admin I was stunned. Not that anything could be faster; after all, I could write a simple relay-only MTA that outperforms Postfix. What stunned me was the simple math in the above statement. Combine the claim of "two million messages/hour" with "500 times faster than Postfix" and you can see the source of my astonishment. They have been making this claim for years.

For those who did not just whip out a calculator or do some simple mental math, that means they believe Postfix is only capable of about 4,000 messages per hour (2,000,000 / 500). *sniff-sniff*. Yeah, I smell it too.
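For anyone who would rather let the computer do the division, here is the arithmetic spelled out. The only inputs are the vendor's two claims; converting to messages per second is just a way of showing how small the implied number is.

```python
# Implied Postfix throughput from the vendor's two claims.
vendor_rate_per_hour = 2_000_000   # vendor's claimed messages/hour
speedup_over_postfix = 500         # vendor's claimed "500 times faster than Postfix"

implied_postfix_per_hour = vendor_rate_per_hour / speedup_over_postfix
implied_postfix_per_second = implied_postfix_per_hour / 3600  # seconds per hour

print(f"{implied_postfix_per_hour:,.0f} messages/hour")
print(f"{implied_postfix_per_second:.2f} messages/second")
```

That works out to roughly one message per second, which is the figure the rest of this post pokes at.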

First, let us consider some organizations we know use Postfix, and estimate whether the claim that Postfix can handle only 4,000 messages per hour bears any validity.

Postini. Now here is a company that handles a lot of email. Their servers are Postfix servers. I do not have actual numbers, of course, but they claim on their main page to handle a billion messages per day. At 4,000 messages per hour per server, that would take over 10,000 Postfix servers, and something tells me they are not running that many. Bear in mind, too, that Postini does spam, virus, and other types of validation. As anyone who has been in the trenches of the spam and trojan/virus battlefield will tell you, more checking means less throughput.
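The server count follows from a back-of-the-envelope calculation. This sketch assumes the billion messages are spread evenly over 24 hours, which is generous to the vendor's claim: real mail traffic is bursty, so peak-hour capacity would need even more machines.

```python
import math

postini_per_day = 1_000_000_000    # Postini's advertised messages/day
implied_postfix_per_hour = 4_000   # 2,000,000 / 500, per the vendor's claims

# Assumption: traffic is spread evenly across the day (no peaks).
postini_per_hour = postini_per_day / 24
servers_needed = math.ceil(postini_per_hour / implied_postfix_per_hour)

print(f"{postini_per_hour:,.0f} messages/hour")
print(f"{servers_needed:,} Postfix servers at 4,000 messages/hour each")
```

Even under that flattering assumption, the vendor's numbers would have Postini running north of ten thousand Postfix boxes, and that is before any filtering overhead.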

One of the main contributors to Postfix (code and book) works for one of the larger financial services companies in the US. They use it. Suffice it to say that Postfix is demonstrably fast and capable. Oh, did I mention said vendor's MTA is written in... JAVA?

I've personally witnessed Postfix servers processing over a million messages with a load of email checks involved. On much lesser hardware. With a full development environment and compiles running during the testing.

Am I surprised at the claim? Other than the sheer outrageousness of the number, no. Marketing will do anything to bag a big-name client. Am I irritated? Yes. Yes I am. I am irritated in part because I am a Postfix guy. But what irritates me more is the complete lack of documentation and details on MTA performance testing.

What exactly is meant by "2,000,000" messages per hour? Some would say the number of emails going through the server in an hour. That is a deceptively simple and naive definition. You see, there is as much difference in real performance as there is between a Chevy Suburban with a 350 HP motor and a Chevy Corvette with a 350 HP motor.

Consider the type of emails, the details if you will. What size messages? How many recipients? How much concurrency? How many connections? What was the sustained versus burst? How long were the messages on the system before being relayed out? Over what type of network? Was connection caching used? How about DNS setup? Recipient verification?

I'll start with a few of the simple but major ones above and save the rest for later.

Recipients

How many recipients per message? This detail is one of the more important ones. A single message sent to 200 recipients is a far different scenario to handle from 200 messages sent to a single recipient each. The difference is enormous.

Consider some basic anti-spam weaponry: recipient verification, subject checking, and body checking. If I send a single atomic message with 200 recipients, my server will perform 200 recipient validations, one subject check, and one body check. If I have multiple checks, it will be M*1 for the subject and M*1 for the body, where M is the number of checks applied to each.

On the other hand, if I process 200 messages, one to each recipient, I am doing 200 recipient checks (no change), 200 subject checks (M*N, where M is the number of checks and N the number of messages), and 200 body checks (M*N again). All for the same message.[1]
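To make the counting concrete, here is a small tally of filter invocations for both scenarios. It is a sketch of the M*1 versus M*N reasoning above, not a model of any particular filter; the value M = 3 is a hypothetical number of subject and body checks chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class CheckCounts:
    recipient_checks: int
    subject_checks: int
    body_checks: int

    def total(self) -> int:
        return self.recipient_checks + self.subject_checks + self.body_checks


def count_checks(messages: int, recipients_per_message: int, m: int) -> CheckCounts:
    """Count filter invocations when `messages` copies of one message arrive.

    Recipient verification runs once per recipient; the subject and the body
    each get m checks per message (the M*N from the text, with N = messages).
    """
    return CheckCounts(
        recipient_checks=messages * recipients_per_message,
        subject_checks=m * messages,
        body_checks=m * messages,
    )


M = 3  # hypothetical number of checks applied to the subject and to the body
single = count_checks(messages=1, recipients_per_message=200, m=M)
split = count_checks(messages=200, recipients_per_message=1, m=M)
print("1 message x 200 recipients:", single, "total", single.total())
print("200 messages x 1 recipient:", split, "total", split.total())
```

With M = 3, the single 200-recipient message costs 206 check invocations while the 200 separate messages cost 1,400, even though the same content reaches the same people.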

Which one will perform better? Which one will see a larger performance drop when filtering is enabled versus "relay mode"?

On top of that, there is the issue of connections made. One message to two hundred recipients is a single connection to the server. Two hundred identical messages to a single recipient each is two hundred connections. Any server admin worth her salt will tell you the difference betwixt the two is more than a little significant. In general, and without regard to any of the verification or filtering discussed above, the single message with multiple recipients will have higher "throughput".
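The same comparison can be made at the SMTP level. An SMTP transaction is one MAIL FROM, one RCPT TO per recipient, and one DATA; this sketch assumes no connection caching, so every message opens its own connection, as in the paragraph above. EHLO, QUIT, and TLS handshakes are left out for brevity, and they would only widen the gap.

```python
def smtp_session_counts(messages: int, recipients_per_message: int) -> dict[str, int]:
    """Tally SMTP work for delivering `messages` messages, assuming no
    connection caching: each message opens its own connection and sends
    one MAIL FROM, one RCPT TO per recipient, and one DATA."""
    return {
        "connections": messages,
        "MAIL FROM": messages,
        "RCPT TO": messages * recipients_per_message,
        "DATA": messages,
        "recipient deliveries": messages * recipients_per_message,
    }


one_message_200_rcpts = smtp_session_counts(messages=1, recipients_per_message=200)
many_messages_1_rcpt = smtp_session_counts(messages=200, recipients_per_message=1)

for key in one_message_200_rcpts:
    print(f"{key:>20}: {one_message_200_rcpts[key]:>4} vs {many_messages_1_rcpt[key]:>4}")
```

Both cases end in 200 recipient deliveries, yet a naive "messages per hour" counter records 1 in the first case and 200 in the second. That is exactly why a bare throughput number means little without the test details.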

More later ...



Footnotes:
1. In theory I could make a hash (an md5sum, for example) of the message body and first check whether I have already checked this message. This would save me from repeating all those checks, but it adds the overhead of generating, storing, retrieving, and looking up the hash for every message.
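A minimal sketch of that idea, assuming an in-memory dictionary as the verdict store; a real filter would need a bounded, probably shared cache. The `naive_body_check` function is a hypothetical stand-in for real spam and virus checks. MD5 is used only because it is the example above; if spammers can craft message bodies, a collision-resistant hash such as SHA-256 is the safer key.

```python
import hashlib
from typing import Callable


class BodyCheckCache:
    """Memoize body-check verdicts keyed by the MD5 of the message body.

    The expensive checks run only the first time a given body is seen;
    every message still pays for hashing plus a dictionary lookup.
    """

    def __init__(self, run_body_checks: Callable[[bytes], bool]) -> None:
        self._run_body_checks = run_body_checks  # hypothetical filter hook
        self._verdicts: dict[str, bool] = {}     # unbounded here; assumption
        self.checks_run = 0

    def is_clean(self, body: bytes) -> bool:
        digest = hashlib.md5(body).hexdigest()   # overhead paid per message
        if digest not in self._verdicts:
            self.checks_run += 1
            self._verdicts[digest] = self._run_body_checks(body)
        return self._verdicts[digest]


def naive_body_check(body: bytes) -> bool:
    """Hypothetical stand-in for real spam/virus body checks."""
    return b"free money" not in body.lower()


cache = BodyCheckCache(naive_body_check)
body = b"Quarterly report attached.\r\n"
verdicts = [cache.is_clean(body) for _ in range(200)]
print(all(verdicts), cache.checks_run)
```

Two hundred identical bodies trigger the expensive checks once and the hash two hundred times, which is the trade-off the footnote describes.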
