Mime4j: DefaultMessageBuilder fails to parse mbox content


I've downloaded the mime4j 0.8.0 snapshot from Subversion and built it with Maven. The relevant jars I generated can be found here.

Now I'm trying to parse a toy mbox file from the mime4j tests.

I use this sample code. Briefly:

final File mbox = new File("c:\\mbox.rlug");
int count = 0;
for (CharBufferWrapper message : MboxIterator.fromFile(mbox).charset(ENCODER.charset()).build()) {
    System.out.println(messageSummary(message.asInputStream(ENCODER.charset())));
    count++;
}
System.out.println("Found " + count + " messages");

and:

private static String messageSummary(InputStream messageBytes) throws IOException, MimeException {
    MessageBuilder builder = new DefaultMessageBuilder();
    Message message = builder.parseMessage(messageBytes);
    return String.format("\nMessage %s \n" +
            "Sent by:\t%s\n" +
            "To:\t%s\n",
            message.getSubject(),
            message.getSender(),
            message.getTo());
}

The output is:

Message null Sent by: null To: null

Message null Sent by: null To: null

Message null Sent by: null To: null

Message null Sent by: null To: null

Message null Sent by: null To: null

Found 5 messages

There are indeed 5 messages, but why are all fields null?


There are 3 answers

zvisofer (BEST ANSWER)

I found the problem.

DefaultMessageBuilder fails to parse mbox files that use the Windows line separator \r\n. After replacing every \r\n with the UNIX line separator \n, the parsing works.

This is a critical issue, since the mbox files downloaded from Gmail use \r\n.
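For anyone who wants to apply this workaround in code rather than in an editor, here is a minimal sketch. It is not part of mime4j; the class and method names (CrlfNormalizer, toUnixLineEndings, normalizeFile) are made up for illustration. It reads the whole file into memory, so it assumes the mbox file is small enough for that.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not a mime4j class.
public class CrlfNormalizer {

    /** Returns a copy of the input with every CR LF pair collapsed to a single LF. */
    public static byte[] toUnixLineEndings(byte[] input) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(input.length);
        for (int i = 0; i < input.length; i++) {
            boolean crBeforeLf = input[i] == '\r'
                    && i + 1 < input.length
                    && input[i + 1] == '\n';
            if (!crBeforeLf) {
                // Lone CRs and all other bytes pass through unchanged.
                out.write(input[i]);
            }
        }
        return out.toByteArray();
    }

    /** Writes a normalized copy of source to target (whole file held in memory). */
    public static void normalizeFile(Path source, Path target) throws IOException {
        Files.write(target, toUnixLineEndings(Files.readAllBytes(source)));
    }

    public static void main(String[] args) throws IOException {
        // Usage: java CrlfNormalizer <input mbox> <output mbox>
        normalizeFile(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

The normalized copy can then be passed to MboxIterator.fromFile(...) exactly as in the question.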

GreyBeardedGeek

I downloaded your jar files, the sample code that you pointed to, and the sample mbox file that you pointed to, compiled the sample (with no changes) and ran it against the sample mbox file.

It worked as expected (the fields contained the expected data, not nulls). This was on a Mac with Java 1.6.0_65, and also with 1.8.0_11.

Output was as follows:

$ java -cp .:apache-mime4j-core-0.8.0-SNAPSHOT.jar:apache-mime4j-dom-0.8.0-SNAPSHOT.jar:apache-mime4j-mbox-iterator-0.8.0-SNAPSHOT.jar IterateOverMbox mbox.rlug.txt

Message Din windows ma pot, din LINUX NU ma pot conecta (la ZAPP) Sent by: [email protected] To: [[email protected]]

Message Re: RH 8.0 boot floppy Sent by: [email protected] To: [[email protected]]

Message Qmail mysql virtualusers +ssl + smtp auth +pop3 Sent by: [email protected] To: [[email protected]]

Message Re: Din windows ma pot, din LINUX NU ma pot conecta (la ZAPP) Sent by: [email protected] To: [[email protected]]

Message LSTP problem - solved Sent by: [email protected] To: [[email protected]]

Found 5 messages Done in: 108 milis

ToYonos

Based on @zvisofer's answer, I found the guilty piece of code in BufferedLineReaderInputStream:

@Override
public int readLine(final ByteArrayBuffer dst)
        throws MaxLineLimitException, IOException {
    if (dst == null) {
        throw new IllegalArgumentException("Buffer may not be null");
    }
    if (!readAllowed()) return -1;

    int total = 0;
    boolean found = false;
    int bytesRead = 0;
    while (!found) {
        if (!hasBufferedData()) {
            bytesRead = fillBuffer();
            if (bytesRead == -1) {
                break;
            }
        }
        int i = indexOf((byte)'\n');
        int chunk;
        if (i != -1) {
            found = true;
            chunk = i + 1 - pos();
        } else {
            chunk = length();
        }
        if (chunk > 0) {
            dst.append(buf(), pos(), chunk);
            skip(chunk);
            total += chunk;
        }
        if (this.maxLineLen > 0 && dst.length() >= this.maxLineLen) {
            throw new MaxLineLimitException("Maximum line length limit exceeded");
        }
    }
    if (total == 0 && bytesRead == -1) {
        return -1;
    } else {
        return total;
    }
}

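The best thing to do would be to report the bug, but here is a fix. It is a little dirty (it always consumes two bytes from any \r, so it assumes every \r is immediately followed by \n in the same buffer), but it works fine for me.

Create the class org.apache.james.mime4j.io.BufferedLineReaderInputStream in your project, as a copy of the library's class.

Replace the method public int readLine(final ByteArrayBuffer dst) with this one: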

@Override
public int readLine(final ByteArrayBuffer dst)
        throws MaxLineLimitException, IOException {
    if (dst == null) {
        throw new IllegalArgumentException("Buffer may not be null");
    }
    if (!readAllowed()) return -1;

    int total = 0;
    boolean found = false;
    int bytesRead = 0;
    while (!found) {
        if (!hasBufferedData()) {
            bytesRead = fillBuffer();
            if (bytesRead == -1) {
                break;
            }
        }

        int chunk;
        int i = indexOf((byte)'\r');
        if (i != -1) {
            found = true;
            chunk = i + 2 - pos();
        } else {
            i = indexOf((byte)'\n');
            if (i != -1) {
                found = true;
                chunk = i + 1 - pos();
            } else {
                chunk = length();
            }
        }
        if (chunk > 0) {
            dst.append(buf(), pos(), chunk);
            skip(chunk);
            total += chunk;
        }
        if (this.maxLineLen > 0 && dst.length() >= this.maxLineLen) {
            throw new MaxLineLimitException("Maximum line length limit exceeded");
        }
    }
    if (total == 0 && bytesRead == -1) {
        return -1;
    } else {
        return total;
    }
}

Enjoy both unix and dos files :)
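The replacement readLine above always advances two bytes past a '\r'. As a standalone illustration of the same idea without that assumption, here is a minimal sketch that works on a complete in-memory byte array rather than on mime4j's internal buffer. The class and method names (LineSplitSketch, terminatorLength, splitLines) are hypothetical, and a streaming reader would additionally have to deal with a '\r' that is the last byte currently buffered.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not a mime4j class.
public class LineSplitSketch {

    /** Length of the line terminator starting at index i, or 0 if there is none. */
    public static int terminatorLength(byte[] buf, int i, int limit) {
        if (buf[i] == '\n') {
            return 1;
        }
        if (buf[i] == '\r') {
            // Consume two bytes only when the LF is really there.
            return (i + 1 < limit && buf[i + 1] == '\n') ? 2 : 1;
        }
        return 0;
    }

    /** Splits buf[0..limit) into lines, keeping each terminator with its line. */
    public static List<String> splitLines(byte[] buf, int limit) {
        List<String> lines = new ArrayList<>();
        int start = 0;
        int i = 0;
        while (i < limit) {
            int len = terminatorLength(buf, i, limit);
            if (len == 0) {
                i++;
                continue;
            }
            i += len;
            lines.add(new String(buf, start, i - start, StandardCharsets.US_ASCII));
            start = i;
        }
        if (start < limit) {
            // Unterminated tail at the end of the data.
            lines.add(new String(buf, start, limit - start, StandardCharsets.US_ASCII));
        }
        return lines;
    }
}
```

Like the fix above, this treats a lone '\r' as a line end; unlike it, the byte after a lone '\r' stays part of the next line.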