Hi List,

I have a little story that all of you might find interesting. It has a happy ending, but I'd also like a little feedback from the rest of you on other possible solutions.

Six months ago I built and pushed into production a mail system running on 5 FreeBSD 4.1 servers, with qmail, vpopmail, sqwebmail, courier-imap, and all the trimmings. The original design intent was to develop a system that would support roughly a million email users. Scalability was, of course, of paramount importance in such a solution.

The architecture is pretty standard for large shared environments. One machine is a file server. It's got 300GB of RAID storage hanging off a SCSI card and is connected to the other 4 machines via a gigabit ethernet controller. That should last for quite some time, I'm thinking. :-) Once I exceed that file server's capacity I can slide up to 25 more file servers into the equation for nearly limitless storage and several T3s' worth of mail bandwidth. That should be enough for a while. ;-)

Anyway, since that time, the main problem I've been having has been the implementation of POP-before-SMTP authentication for relaying. The way it's implemented, by default, is pretty simple. A user authenticates via POP, and upon successful authentication we stuff their IP address into a file, ~vpopmail/etc/open-smtp, and compile that into the tcp.smtp.cdb database, which tcpserver consults to determine if the IP is allowed to relay. Pretty simple stuff, really.

That all worked fine and dandy until somewhere around 1300 domains. I'm not sure how many users that equated to, but I'll guess around 3,000. So, I had 4 mail servers, all configured identically, all sharing the same file system for local user mail spools (via NFS), and all sharing a common ~vpopmail/etc/tcp.smtp.cdb file to determine if a user is allowed to relay. At around 1300 domains we started seeing the ~vpopmail/etc/open-smtp file getting munged.
At that time, each machine was seeing nearly one POP auth per second at peak times and, consequently, trying to update that file. As a result, the file got munged quite often during the middle of the day, users couldn't relay, and the phones in support started to ring.

Since I already had 1300 vpasswd files strewn around the file system, the idea of converting entirely to MySQL wasn't really an appealing option. The solution then was to hack up vpopmail to use the pop-auth code that stuffs the IPs into a MySQL table. So, I quickly hacked up the code, recompiled vpopmail, and shoved the new programs into production. Wahoo, the table got populated quite rapidly with hundreds of IPs and life was happy again, for a while.

Two weeks ago I left work for France to spend a while with friends, drinking wine, eating well, and skiing in the Pyrenees. While I was gone, a new problem surfaced. While the IP table is being stored in MySQL, it still gets recompiled into ~vpopmail/etc/tcp.smtp.cdb every time a POP session authenticates successfully. At this time I have some 2600 domains and over 10,000 users on the system (I wrote a perl script to figure that out by finding all the vpasswd files and adding up all the lines in the files :-)). Now that all four servers are seeing in excess of one POP auth per second, that file was getting rewritten up to four times per second. Tcpserver would try to access the tcp.smtp.cdb file, get a stale NFS file handle, and drop the connection. So, the phones started ringing because the SMTP server was intermittently dropping connections.

What to do? Well, we chose the most obvious solution: hack up tcpserver to check our MySQL table directly instead of the .cdb file. I had one of our senior programmers tackle this and the results are great. The new enhanced tcpserver, when run with the -S flag, checks for /var/qmail/control/sql and, upon finding it, follows its instructions for connecting to the SQL server.
Then, for every incoming SMTP connection, it checks the database for the IP and, if found, sets the RELAYCLIENT environment variable. It's pretty darned cool and works like a charm.

Consequences? So far, so good. I've removed the -x tcp.smtp.cdb flag from tcpserver and only have it consult the database. The -x stuff still works, except that now I have to go back and hack up my hacked vpopmail so that it stops rebuilding the tcp.smtp.cdb file. Shouldn't be a big deal. Then life should be good for a while.

So, has anyone else run into a problem of this sort? How did you solve it? I've emailed Dan to see if he might (not likely) like to include the SQL stuff in a released version of tcpserver, but the odds of even getting a response are pretty slim. So, failing that, I guess I'll release a custom version of tcpserver with SQL support.

Other ideas?

Matt
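
P.S. For anyone who wants a feel for the per-connection lookup described above, here is a minimal sketch. It is written in Python rather than the C of the real tcpserver, and it is not the actual patch. The table name (relay), the column name (ip_addr), and the function names are made up for illustration, and sqlite3 stands in for MySQL so the example runs on its own. The only parts taken from the setup above are the idea (look the connecting IP up in a table, set RELAYCLIENT if it is there) and the standard tcpserver/qmail variable names.

```python
import sqlite3


def open_relay_db(path=":memory:"):
    """Open the relay table (hypothetical schema; sqlite3 stands in for MySQL)."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS relay (ip_addr TEXT PRIMARY KEY)"
    )
    return conn


def record_pop_auth(conn, ip_addr):
    """Called after a successful POP login: remember the client's IP.

    A single-row insert replaces the rewrite of a shared file, so
    concurrent logins from several servers do not clobber each other.
    """
    conn.execute(
        "INSERT OR IGNORE INTO relay (ip_addr) VALUES (?)", (ip_addr,)
    )
    conn.commit()


def smtp_environment(conn, ip_addr):
    """Build the environment handed to the SMTP daemon for one connection.

    If the IP is in the table, RELAYCLIENT is set (to the empty string,
    which is what qmail-smtpd treats as "relaying allowed").
    """
    env = {"TCPREMOTEIP": ip_addr}
    row = conn.execute(
        "SELECT 1 FROM relay WHERE ip_addr = ?", (ip_addr,)
    ).fetchone()
    if row is not None:
        env["RELAYCLIENT"] = ""
    return env


if __name__ == "__main__":
    db = open_relay_db()
    record_pop_auth(db, "192.0.2.10")
    print(smtp_environment(db, "192.0.2.10"))    # authenticated client
    print(smtp_environment(db, "198.51.100.7"))  # unknown client
```

The point of the design is that each POP auth touches one row and each SMTP connection reads one row, so nothing ever has to rebuild a shared file over NFS.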