apache-mime4j: UnstructuredField cannot be cast to ContentTypeField

I have the Groovy code below, which checks the MIME type of each part of a message held in a byte[]. The org.apache.james classes it uses come from apache-mime4j-0.6.jar.

import org.apache.james.mime4j.message.Message
import org.apache.james.mime4j.message.Multipart
import org.apache.james.mime4j.message.BodyPart

def processFiledata(fileData) {
    // Parse the raw bytes as a MIME message (mime4j 0.6 API)
    Message file = new Message(new ByteArrayInputStream(fileData))
    for (BodyPart part : ((Multipart) file.getBody()).getBodyParts()) {
        if (part.getMimeType().equalsIgnoreCase("text/plain")) { // exception is thrown from this line
            // some logic
        }
    }
}

This code used to work, and I am not sure why it now throws the exception below:

    java.lang.ClassCastException: org.apache.james.mime4j.field.UnstructuredField cannot be cast to org.apache.james.mime4j.field.ContentTypeField
    at org.apache.james.mime4j.message.Entity.getMimeType(Entity.java:289)

Below is my sample MIME message, which I read through a ByteArrayInputStream and whose parts' MIME types I am trying to check:

MIME-Version: 1.0
Date: Tue, 28 Feb 2017 21:54:17 +1
Content-Type: multipart/signed; protocol="application/pkcs7-signature"; micalg="SHA256"; boundary="b2971ac914bc41038c7e8412fee3c44c"


--b2971ac914bc41038c7e8412fee3c44c
Content-Type: text/plain; charset=us-ascii

LEDES98BI V2[]
INVOICE_DATE|INVOICE_NUMBER|CLIENT_ID|LAW_FIRM_MATTER_ID|INVOICE_TOTAL|BILLING_START_DATE|BILLING_END_DATE|INVOICE_DESCRIPTION|LINE_ITEM_NUMBER|EXP/FEE/INV_ADJ_TYPE|LINE_ITEM_NUMBER_OF_UNITS|LINE_ITEM_ADJUSTMENT_AMOUNT|LINE_ITEM_TOTAL|LINE_ITEM_DATE|LINE_ITEM_TASK_CODE|LINE_ITEM_EXPENSE_CODE|LINE_ITEM_ACTIVITY_CODE|TIMEKEEPER_ID|LINE_ITEM_DESCRIPTION|LAW_FIRM_ID|LINE_ITEM_UNIT_COST|TIMEKEEPER_NAME|TIMEKEEPER_CLASSIFICATION|CLIENT_MATTER_ID|PO_NUMBER|CLIENT_TAX_ID|MATTER_NAME|INVOICE_TAX_TOTAL|INVOICE_NET_TOTAL|INVOICE_CURRENCY|TIMEKEEPER_LAST_NAME|TIMEKEEPER_FIRST_NAME|ACCOUNT_TYPE|LAW_FIRM_NAME|LAW_FIRM_ADDRESS_1|LAW_FIRM_ADDRESS_2|LAW_FIRM_CITY|LAW_FIRM_STATEorREGION|LAW_FIRM_POSTCODE|LAW_FIRM_COUNTRY|CLIENT_NAME|CLIENT_ADDRESS_1|CLIENT_ADDRESS_2|CLIENT_CITY|CLIENT_STATEorREGION|CLIENT_POSTCODE|CLIENT_COUNTRY|LINE_ITEM_TAX_RATE|LINE_ITEM_TAX_TOTAL|LINE_ITEM_TAX_TYPE|INVOICE_REPORTED_TAX_TOTAL|INVOICE_TAX_CURRENCY[]
19990225|96542|00711|0528|1684.45|19990101|19990131|For services rendered|1|F|2.00|-70|630|19990115|L510||A102|22547|Research Attorney's fees, Set off claim|24-6437381|350|Arnsley, Robert|PARTNR|423-987|77654|76-1235|Merten Merger|694.20|22240.25|GBP|Arnsley|Robert|O|||||||||||||||.16|100.80|VAT|100.80|[]
--b2971ac914bc41038c7e8412fee3c44c
Content-Transfer-Encoding: base64
Content-Type: application/pkcs7-signature; name="smime.p7s"
Content-Disposition: attachment; filename="smime.p7s"

Can someone please help me to fix it?

Answer by Hugues M.

That looks like a mime4j bug to me. In the comments I suggested upgrading to 0.7.2, whose API is very different (a 0.8 release is also in the works), so as requested, here is an example.

You no longer build a Message object directly. There are two different styles, as explained on the usage page. I picked the second one because it was clearer to me how to get the MIME type of a body part:

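import org.apache.james.mime4j.parser.AbstractContentHandler
import org.apache.james.mime4j.parser.MimeStreamParser
import org.apache.james.mime4j.stream.BodyDescriptor

def fileInputStream = new FileInputStream("/path/to/message.msg");
def parser = new MimeStreamParser();
parser.setContentHandler(new AbstractContentHandler() {
    // Called once for each leaf body part of the message
    void body(BodyDescriptor bd, InputStream is) {
        if ("text/plain".equals(bd.getMimeType())) {
            println("Body: " + is.text);
        }
    }
})
parser.parse(fileInputStream);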

If I run that on the following sample message file (courtesy of MSDN):

From: John Doe <example@example.com>
MIME-Version: 1.0
Content-Type: multipart/mixed;
        boundary="XXXXboundary text"

This is a multipart message in MIME format.

--XXXXboundary text 
Content-Type: text/plain

this is the body text

--XXXXboundary text 
Content-Type: text/plain;
Content-Disposition: attachment;
        filename="test.txt"

this is the attachment text

--XXXXboundary text--

I get:

Body: this is the body text

Body: this is the attachment text
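
Since the question starts from a byte[] rather than a file, the same parser can presumably be fed from a ByteArrayInputStream. Here is a minimal sketch, assuming mime4j 0.7.2 is fetched with @Grab; plainTextBodies is a hypothetical helper name chosen for illustration, not part of mime4j, and I have only reasoned through it on small multipart messages:

```groovy
@Grab('org.apache.james:apache-mime4j-core:0.7.2')
import org.apache.james.mime4j.parser.AbstractContentHandler
import org.apache.james.mime4j.parser.MimeStreamParser
import org.apache.james.mime4j.stream.BodyDescriptor

// Hypothetical helper (not a mime4j API): collects the content of
// every text/plain body part found in the raw message bytes.
List<String> plainTextBodies(byte[] fileData) {
    def bodies = []
    def parser = new MimeStreamParser()
    parser.setContentHandler(new AbstractContentHandler() {
        // Invoked by the parser for each leaf body part
        @Override
        void body(BodyDescriptor bd, InputStream stream) {
            if ('text/plain'.equalsIgnoreCase(bd.getMimeType())) {
                // Reads with the platform default charset, like the example above
                bodies << stream.text
            }
        }
    })
    parser.parse(new ByteArrayInputStream(fileData))
    return bodies
}
```

Returning a list instead of printing keeps the "some logic" from the question outside the parser callback.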

Hopefully the bug (if there is one) is fixed in that version. If you still hit the problem, we will need more information about the structure of your file.

Edit: with the input you provided, the above works, and gives this:

Body: LEDES98BI V2[]
INVOICE_DATE|INVOICE_NUMBER|CLIENT_ID|LAW_FIRM_MATTER_ID|INVOICE_TOTAL|BILLING_START_DATE|BILLING_END_DATE|INVOICE_DESCRIPTION|LINE_ITEM_NUMBER|EXP/FEE/INV_ADJ_TYPE|LINE_ITEM_NUMBER_OF_UNITS|LINE_ITEM_ADJUSTMENT_AMOUNT|LINE_ITEM_TOTAL|LINE_ITEM_DATE|LINE_ITEM_TASK_CODE|LINE_ITEM_EXPENSE_CODE|LINE_ITEM_ACTIVITY_CODE|TIMEKEEPER_ID|LINE_ITEM_DESCRIPTION|LAW_FIRM_ID|LINE_ITEM_UNIT_COST|TIMEKEEPER_NAME|TIMEKEEPER_CLASSIFICATION|CLIENT_MATTER_ID|PO_NUMBER|CLIENT_TAX_ID|MATTER_NAME|INVOICE_TAX_TOTAL|INVOICE_NET_TOTAL|INVOICE_CURRENCY|TIMEKEEPER_LAST_NAME|TIMEKEEPER_FIRST_NAME|ACCOUNT_TYPE|LAW_FIRM_NAME|LAW_FIRM_ADDRESS_1|LAW_FIRM_ADDRESS_2|LAW_FIRM_CITY|LAW_FIRM_STATEorREGION|LAW_FIRM_POSTCODE|LAW_FIRM_COUNTRY|CLIENT_NAME|CLIENT_ADDRESS_1|CLIENT_ADDRESS_2|CLIENT_CITY|CLIENT_STATEorREGION|CLIENT_POSTCODE|CLIENT_COUNTRY|LINE_ITEM_TAX_RATE|LINE_ITEM_TAX_TOTAL|LINE_ITEM_TAX_TYPE|INVOICE_REPORTED_TAX_TOTAL|INVOICE_TAX_CURRENCY[]
19990225|96542|00711|0528|1684.45|19990101|19990131|For services rendered|1|F|2.00|-70|630|19990115|L510||A102|22547|Research Attorney's fees, Set off claim|24-6437381|350|Arnsley, Robert|PARTNR|423-987|77654|76-1235|Merten Merger|694.20|22240.25|GBP|Arnsley|Robert|O|||||||||||||||.16|100.80|VAT|100.80|[]
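
For comparison, the other style mentioned above (building a message tree, closer to the original 0.6 code) might look roughly like this in 0.7.2. This is a sketch under assumptions: it relies on the apache-mime4j-dom artifact being fetched with @Grab, plainTextParts is a hypothetical helper name, and I have not run it against the message from the question:

```groovy
@Grab('org.apache.james:apache-mime4j-dom:0.7.2')
import org.apache.james.mime4j.dom.Entity
import org.apache.james.mime4j.dom.Message
import org.apache.james.mime4j.dom.Multipart
import org.apache.james.mime4j.dom.TextBody
import org.apache.james.mime4j.message.DefaultMessageBuilder

// Hypothetical helper (not a mime4j API): returns the text of each
// top-level text/plain part of a multipart message.
List<String> plainTextParts(byte[] fileData) {
    Message message = new DefaultMessageBuilder()
            .parseMessage(new ByteArrayInputStream(fileData))
    def texts = []
    // Only multipart messages have body parts to iterate over
    if (message.getBody() instanceof Multipart) {
        for (Entity part : ((Multipart) message.getBody()).getBodyParts()) {
            if ('text/plain'.equalsIgnoreCase(part.getMimeType())
                    && part.getBody() instanceof TextBody) {
                // TextBody exposes the decoded content through a Reader
                texts << ((TextBody) part.getBody()).getReader().text
            }
        }
    }
    return texts
}
```

Nested multiparts are not descended into here; the stream parser shown earlier visits every leaf body without extra code, which is one reason to prefer it for this check.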