author | John Ankarström <john@ankarstrom.se> | 2021-05-20 03:28:57 +0200 |
---|---|---|
committer | John Ankarström <john@ankarstrom.se> | 2021-05-20 03:28:57 +0200 |
commit | eb5b7969f4d455307f5e3d82807c461ebe18fb86 (patch) | |
tree | 4661d319ebbf57878d23468a32d3d3d216e07cfd | |
parent | cdf4df2126bb0fe5c33bcb77992f3877da290c17 (diff) | |
download | mum-eb5b7969f4d455307f5e3d82807c461ebe18fb86.tar.gz |
Add Message-Length header, remove custom headers from mbox
This way, the mbox file follows the traditional mbox format.
-rw-r--r-- | doc/mum.ms | 139 |
-rwxr-xr-x | src/pop | 26 |
2 files changed, 128 insertions, 37 deletions
@@ -1,44 +1,147 @@
+.de Q
+\\$3\*Q\\$1\*U\\$2
+..
+.de EX
+.LD
+.ft C
+..
+.de EE
+.ft
+.DE
+..
 .TL
 mum \(en modern UNIX mail interface
 .AU
 John Ankarström
-.SH
+.SH \" -----------------------------------------------------------------
 Introduction
 .LP
 Mum is a text-based e-mail client for UNIX and UNIX-like operating
 systems that supports both plain-text and HTML e-mail.
-It introduces a new method for local storage of e-mail, called
-indexed mbox.
-Furthermore, it uses
-.I views
+It introduces a couple of innovations to the landscape of UNIX
+e-mail clients:
+.IP \h'2n'1.
+Reasonable support for HTML e-mail out of the box.
+.IP \h'2n'2.
+A new method for local storage of e-mail, called
+.I "indexed mbox" .
+.IP \h'2n'3.
+.I Views
 \(en simple scripts that filter messages \(en instead of folders.
-.PP
+.LP
 In this document, the fundamental concepts of mum are explained.
-.SH
+.SH \" -----------------------------------------------------------------
 The indexed mbox format
 .LP
 There are two popular methods for local storage of e-mail on UNIX
 systems: mbox and Maildir.
 Maildir is a powerful but complicated solution, while mbox is a
-simple but inefficient solution. The "indexed mbox" format introduced
-by mum builds on the mboxcl2 format, but enhances it with an
-additional file called an
+simple but inefficient solution.
+.PP
+The
+.Q "indexed mbox"
+format introduced by mum builds on the traditional mbox format, but
+enhances it with an additional file called an
 .I index ,
 which carries the same name as the mbox plus the extension
 .I .i .
 The index contains all headers from the mbox file, including the
 .I From_
 line, without the actual contents of the corresponding messages.
-Each block of headers contains an additional header called
+Further, each block of headers contains three additional headers:
+.IP \h'2n'1.
+.I UID ,
+containing the unique identifier of the message provided by the
+mail server (optional).
+.IP \h'2n'2.
 .I Offset ,
-which contains the position of the corresponding message in the
-mbox file, described as a byte offset.
-Additionally, a
-.I Content-Length
-header is included in both mbox and mbox.i.
-(Note further that the mbox and mbox.i files are append-only.)
-.PP
+containing the starting position of the corresponding message in
+the original mbox file, described as a byte offset.
+(It is important to note that the mbox and mbox.i files are
+append-only.)
+.IP \h'2n'3.
+.I Message-Length ,
+containing the length of the entire message in the mbox file,
+including both headers and body, in number of bytes.
+.LP
 Mum and its associated view scripts use the index for most operations.
 Whenever it is time to read the actual contents of a message, the
 message is retrieved from the mbox using the offset specified in
 the index.
+.SH \" -----------------------------------------------------------------
+Retrieval methods
+.LP
+Being extensible by nature, mum supports a potentially infinite
+number of methods for e-mail retrieval.
+By default, included with mum is a script called
+.I pop ,
+which downloads messages from a mail server via POP3, simultaneously
+creating an index for them.
+The Post Office Protocol or POP is the recommended e-mail retrieval
+method for mum.
+.PP
+If you want to save a copy of sent messages on the server, you can
+use IMAP instead of POP.
+The
+.I imap
+script, included with mum, synchronizes the mbox file with the mail
+server in an intelligent way:
+.IP \h'2n'a)
+For any new messages in the remote INBOX folder, it downloads and
+appends them to the mbox file.
+.IP \h'2n'b)
+For any messages in the mbox file that are sent by your own e-mail
+address, it uploads them to the remote Sent folder.
+.LP
+The default
+.I imap
+script does not support any folders other than INBOX and Sent, as
+mum eschews the concept of folders for scriptable views.
+.PP
+Additionally, mum can be used with locally stored mbox files.
+The default mum distribution includes the
+.I index
+script, which builds an index from a pre-existing mbox file.
+It supports a variety of mbox formats.
+.SH \" -----------------------------------------------------------------
+Views
+.LP
+What mum calls
+.Q views
+are simple scripts that filter the messages in the mbox index
+according to some criteria.
+The IMAP protocol, along with many e-mail clients, has a concept
+of folders: incoming mail is put in the Inbox folder, outgoing mail
+in the Sent folder, junk mail in the Junk folder and so forth.
+In mum, views serve the same purpose: a script named
+.I inbox
+extracts all mail sent from e-mail addresses other than your own,
+a script named
+.I sent
+all mail sent from your own e-mail address, a script named
+.I junk
+all mail with a certain header indicating that it is junk, and so
+forth.
+.PP
+However, because views are scripts, they are much more powerful and
+dynamic.
+One might have a script called
+.I amazon
+that extracts all mail sent from Amazon, or even a script called
+.I services
+that extracts all mail sent from a range of companies and services.
+The author of this document, for example, uses plus-addressing to
+separate mail sent from different vendors.
+With that assumption in mind, a
+.I services
+script might look like the following:
+.EX
+.in +5n
+#!/usr/bin/perl -00 -n
+print if /^Delivered-To: [^@]+\\+(amazon|apple|ebay|...)\\@/m
+.EE
+.LP
+On the author's system, this script takes circa 0.07 seconds to
+filter through an mbox index with 2000 messages (or, in other words,
+slightly less than the average time it takes for the Python interpreter
+just to start).
@@ -96,25 +96,11 @@ for my $id (@ids) {
 	$from = 'MAILER-DAEMON@' . hostname if not $from;
 	my $from_ = "From $from $date";
 
-	# Add UID header
-	unshift @msg, "UID: $uids{$id}"; $j++;
-
-	# Add Content-Length header
-	my ($header_length, $body_length, $content_length);
-	$header_length += length($_)+1 for (@msg[0..$j-1]);
+	# Calculate message length
+	my ($head_length, $body_length, $message_length);
+	$head_length += length($_)+1 for (@msg[0..$j-1]);
 	$body_length += length($_)+1 for (@msg[$j..$#msg]);
-	$content_length = length($from_) + 1 + $header_length + $body_length;
-
-	# - Add length of Content-Length header to Content-Length
-	my $new = $content_length;
-	my $prev = 0;
-	until ($new == $prev) {
-		$prev = $new;
-		$new = $content_length + length "Content-Length: $content_length\n";
-	}
-	$content_length = $new;
-
-	unshift @msg, "Content-Length: $content_length"; $j++;
+	$message_length = length($from_) + 1 + $head_length + $body_length;
 
 	# Append message to mbox and index files
 	local $" = "\n";
@@ -126,7 +112,9 @@ $from_
 MBOX
 	print $index <<INDEX;
 $from_
+UID: $uids{$id}
 Offset: $offset
+Message-Length: $message_length
 @msg[0..$j-1]
 INDEX
 
@@ -134,7 +122,7 @@ INDEX
 	exit 130 if $sigint;
 
 	# Set offset for next message
-	$offset += $content_length + 1;
+	$offset += $message_length + 1;
 }
 
 print STDERR "\n";
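
The following is a minimal Perl sketch of how a reader could use the Offset and Message-Length index headers introduced by this commit to pull one message out of the mbox. It is not code from mum. The sub names (write_sample, read_index, fetch_message) and the sample addresses are hypothetical. The sketch also assumes that index blocks are separated by blank lines, as the `perl -00` view example suggests, and that each message in the mbox is followed by exactly one separator byte, as the `$offset += $message_length + 1` update suggests.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET);
use File::Temp qw(tempdir);

# Write a tiny mbox plus its index. The sample data is hypothetical and
# ASCII-only, so length() counts bytes here.
sub write_sample {
    my ($mbox_path) = @_;
    my @messages = (
        "From alice\@example.org Thu May 20 03:28:57 2021\n"
          . "Subject: first\n\nHello.\n",
        "From bob\@example.org Thu May 20 03:30:00 2021\n"
          . "Subject: second\n\nGoodbye.\n",
    );
    open my $mbox,  '>:raw', $mbox_path     or die "$mbox_path: $!";
    open my $index, '>:raw', "$mbox_path.i" or die "$mbox_path.i: $!";
    my $offset = 0;
    for my $msg (@messages) {
        my ($head) = split /\n\n/, $msg, 2;
        my ($from_, @headers) = split /\n/, $head;
        my $message_length = length $msg;
        print {$mbox} $msg, "\n";    # assumed: one blank line after each message
        print {$index} "$from_\n",
            "Offset: $offset\n",
            "Message-Length: $message_length\n";
        print {$index} "$_\n" for @headers;
        print {$index} "\n";         # assumed: blank line ends an index block
        $offset += $message_length + 1;    # same update as in the diff
    }
    close $mbox  or die "close: $!";
    close $index or die "close: $!";
    return @messages;
}

# Read the index in paragraph mode and collect offset and length per block.
sub read_index {
    my ($index_path) = @_;
    open my $index, '<:raw', $index_path or die "$index_path: $!";
    local $/ = '';                   # paragraph mode, like perl -00
    my @entries;
    while (my $block = <$index>) {
        my ($offset) = $block =~ /^Offset: (\d+)$/m;
        my ($length) = $block =~ /^Message-Length: (\d+)$/m;
        push @entries, { offset => $offset, length => $length };
    }
    close $index;
    return @entries;
}

# Seek to the recorded offset and read exactly Message-Length bytes.
sub fetch_message {
    my ($mbox_path, $entry) = @_;
    open my $mbox, '<:raw', $mbox_path or die "$mbox_path: $!";
    seek $mbox, $entry->{offset}, SEEK_SET or die "seek: $!";
    my $n = read $mbox, my $message, $entry->{length};
    die "short read" unless defined $n && $n == $entry->{length};
    close $mbox;
    return $message;
}

my $dir       = tempdir(CLEANUP => 1);
my $mbox_path = "$dir/mbox";
my @written   = write_sample($mbox_path);
my @entries   = read_index("$mbox_path.i");

# Print the second message without scanning the first one.
print fetch_message($mbox_path, $entries[1]);
```

Because both values live in the index, a reader needs only one seek and one read per message and never has to scan the mbox for From_ lines.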