Event-based parsing in MIME4J - how to populate a new Message from the InputStream?

I am working with MIME4J to read MIME events from an email stack dump. I want to read each message, as delimited by the START_MESSAGE and END_MESSAGE events, as a single unit, because I will eventually move the process to a distributed filesystem and need to plan for messages that straddle file-split boundaries.

For event-based parsing, mime4j requires a ContentHandler: you set the handler on the parser, and the parser calls the handler's methods as it works through the message. I have experimented with a sample handler from another SO answer that extends mime4j's bundled SimpleContentHandler, but that one really only parses headers.

I am trying to build a custom ContentHandler class that gathers the complete message as one event. I then need to hold that event in a temporary object so I can parse out the headers, their fields, and the field contents. The end goal is to adapt this behavior to MapReduce, so I have to cope with one part of an email landing in one file split and another part in a different split.
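One way to get "a complete message as one unit" before any MIME parsing happens is to cut the dump into raw per-message byte arrays first. The sketch below assumes the INBOX file is in mbox format, where each message starts with a `From ` separator line that follows a blank line; that is an assumption about the file, not something stated above. `MboxSplitter` is a hypothetical helper written for illustration, it is not part of mime4j, and it does not undo `>From ` quoting inside bodies.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of mime4j: splits an mbox-style dump into raw messages. */
final class MboxSplitter {
    private MboxSplitter() {}

    /** Reads the whole stream and returns one byte[] per message, separator lines removed. */
    static List<byte[]> split(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        // ISO-8859-1 maps every byte to exactly one char, so this round trip is lossless.
        String text = new String(buffer.toByteArray(), StandardCharsets.ISO_8859_1);

        List<byte[]> messages = new ArrayList<>();
        StringBuilder current = null;   // null until the first separator line is seen
        boolean previousBlank = true;   // start of file counts as "after a blank line"
        int pos = 0;
        while (pos < text.length()) {
            int eol = text.indexOf('\n', pos);
            int next = (eol == -1) ? text.length() : eol + 1;
            String line = text.substring(pos, next);   // keeps its line terminator
            if (previousBlank && line.startsWith("From ")) {
                // Assumed mbox separator: close the previous message, start a new one.
                if (current != null) {
                    messages.add(toBytes(current));
                }
                current = new StringBuilder();
            } else if (current != null) {
                current.append(line);
            }
            previousBlank = line.trim().isEmpty();
            pos = next;
        }
        if (current != null) {
            messages.add(toBytes(current));
        }
        return messages;
    }

    private static byte[] toBytes(StringBuilder sb) {
        return sb.toString().getBytes(StandardCharsets.ISO_8859_1);
    }
}
```

Each returned array could then be wrapped in a `ByteArrayInputStream` and handed to the parser on its own, so the handler only ever sees one message per `parse` call.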

For my custom ContentHandler, I've gotten as far as:

public class CustomContentHandler extends AbstractContentHandler {}

And for a main, I am using:

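    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStream;

    import org.apache.james.mime4j.MimeException;
    import org.apache.james.mime4j.parser.ContentHandler;
    import org.apache.james.mime4j.parser.MimeStreamParser;
    import org.apache.james.mime4j.stream.MimeConfig;

    public class Reader
    {
        public static void main(String[] args) throws IOException, MimeException
        {
            // Declare the handler by the interface type the parser expects.
            ContentHandler handler = new CustomContentHandler();
            MimeConfig config = new MimeConfig();
            MimeStreamParser parser = new MimeStreamParser(config);
            parser.setContentHandler(handler);

            InputStream stream = new FileInputStream("/home/javadev1/lib/INBOX");
            try
            {
                // parse() consumes the stream as one message, so a single call
                // is enough; probing with read() would also discard a byte.
                parser.parse(stream);
            }
            finally
            {
                stream.close();
            }
        }
    }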

So any help on how to build up the information in the handler would be really appreciated. I've tried creating a new MessageImpl and then using a builder to copy a parsed stream into it. I have also tried building a newMessage from a parse of the stream and printing the Message when the END_MESSAGE event is received, but it printed nulls.

I may be experiencing a conceptual blind spot, too. If that's the case, I am ok with it being pointed out. Thanks!

Answer by Wolfgang Fahl (best answer)

Here is a code excerpt that works for me. As soon as I find an interesting message with the state-based parser (MimeTokenStream), I switch to the DOM parser to create a Message object.

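// Imports used by this excerpt.
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;

import org.apache.james.mime4j.MimeException;
import org.apache.james.mime4j.dom.Message;
import org.apache.james.mime4j.dom.MessageServiceFactory;
import org.apache.james.mime4j.stream.EntityState;
import org.apache.james.mime4j.stream.Field;
import org.apache.james.mime4j.stream.MimeTokenStream;

/**
 * Check the message stream.
 * 
 * @param in - the input stream to check
 * @param id - the id of a message to search for
 * @param encoding - the encoding of the stream, e.g. ISO-8859-1
 * @return - the message with the given id, or null if none is found
 * @throws IOException
 * @throws MimeException
 */
public Message checkMessageStream(InputStream in, String id, Charset encoding)
        throws IOException, MimeException {
    // https://james.apache.org/mime4j/usage.html
    // StreamUtils, fixMessageString and debug are defined outside this excerpt.
    // Decode with the same charset that is used to re-encode below.
    String messageString = new String(StreamUtils.getBytes(in), encoding);
    messageString = fixMessageString(messageString);
    InputStream instream = new ByteArrayInputStream(
            messageString.getBytes(encoding));
    MimeTokenStream stream = new MimeTokenStream();
    stream.parse(instream);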
    for (EntityState state = stream.getState(); state != EntityState.T_END_OF_STREAM; state = stream
            .next()) {
        switch (state) {
        case T_BODY:
            if (debug) {
                System.out.println("Body detected, contents = "
                        + stream.getInputStream() + ", header data = "
                        + stream.getBodyDescriptor());
            }
            break;
        case T_FIELD:
            Field field = stream.getField();
            if (debug) {
                System.out.println("Header field detected: " + field);
            }
            // Header names are case-insensitive.
            if (field.getName().equalsIgnoreCase("message-id")) {
                // System.out.println("id: " + field.getBody() + "=" + id + "?");
                if (field.getBody().equals("<" + id + ">")) {
                    InputStream messageStream = new ByteArrayInputStream(
                            messageString.getBytes(encoding));
                    Message message = MessageServiceFactory.newInstance()
                            .newMessageBuilder().parseMessage(messageStream);
                    return message;
                } else {
                    return null;
                }
            }

            break;
        case T_START_MULTIPART:
            if (debug) {
                System.out.println("Multipart message detected," + " header data = "
                        + stream.getBodyDescriptor());
            }
            break;
        }
    }
    return null;
}
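
The excerpt relies on two ideas that can be tried out without mime4j: keeping the message bytes in memory so they can be read twice (once by the token stream, once by the DOM parser), and comparing the Message-ID header against a bare id. The sketch below shows both using only the JDK. `BufferedMessage` and `MessageIds` are hypothetical names made up for illustration; they are stand-ins for helpers like `StreamUtils.getBytes`, whose real implementation is not shown above. The comparison is deliberately a little more tolerant than the excerpt: it trims whitespace and accepts the value with or without angle brackets.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper (not part of mime4j): holds one message in memory for repeated reads. */
final class BufferedMessage {
    private final byte[] bytes;

    private BufferedMessage(byte[] bytes) {
        this.bytes = bytes;
    }

    /** Drains the stream into memory; the caller remains responsible for closing it. */
    static BufferedMessage readFrom(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        return new BufferedMessage(out.toByteArray());
    }

    /** Returns a fresh, independent stream positioned at the first byte on every call. */
    InputStream newStream() {
        return new ByteArrayInputStream(bytes);
    }

    int size() {
        return bytes.length;
    }
}

/** Hypothetical helper: tolerant comparison of a Message-ID header against a bare id. */
final class MessageIds {
    private MessageIds() {}

    static boolean matches(String fieldName, String fieldBody, String id) {
        if (fieldName == null || fieldBody == null || id == null) {
            return false;
        }
        // Header names are case-insensitive.
        if (!"Message-ID".equalsIgnoreCase(fieldName.trim())) {
            return false;
        }
        String value = fieldBody.trim();
        // Strip the surrounding angle brackets if both are present.
        if (value.length() >= 2 && value.startsWith("<") && value.endsWith(">")) {
            value = value.substring(1, value.length() - 1);
        }
        return value.equals(id);
    }
}
```

With helpers along these lines, the first `newStream()` would go to the token stream for the cheap header scan and a second `newStream()` to the message builder once the id matches, which avoids encoding the string to bytes twice.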