Handle embedded (INLINE) resources as attachments if the cid identifier doesn't occur in the HTML body #179

Closed
jkxyx205 opened this issue Oct 29, 2018 · 14 comments

@jkxyx205

Hi Benny Bottema,

I have a problem; the details are as follows.

1. Scene description

Here is my hello.eml file, which includes an attachment and an embedded image and was exported by the client application Foxmail. A screenshot of it opened in Outlook is below:

[screenshot]

2. Parsing the EML returns no information

Email oldEmail = EmailConverter.emlToEmail("/Users/rick/jkxyx205/tmp/hello.eml");

[screenshot]

@bbottema
Owner

Can you perform the same check for EmailConverter.emlToMimeMessage() please?

@jkxyx205
Author

I can't parse anything with that either.

return new MimeMessage(session, new ByteArrayInputStream(eml.getBytes(UTF_8)));

Maybe JavaMail is unable to parse this file, but Outlook can.

@bbottema
Owner

Hmm, maybe this EML requires a specific javax.mail version? I'll give it a try as well, but I'm afraid this is a limitation of the underlying Java Mail framework.

@bbottema
Owner

bbottema commented Oct 29, 2018

I'm getting a completely empty result without errors.

For what it's worth, the EML does not validate with the following validation tools:

But mimevalidator.net thinks it's fine.

@bbottema
Owner

bbottema commented Oct 29, 2018

Sorry, I got confused. It's working fine for me. Here's my result:

InputStream resourceAsStream = EmailHelper.class.getClassLoader().getResourceAsStream("test-messages/hello.eml");
Email e =  EmailConverter.emlToEmail(resourceAsStream);

[screenshot]

Perhaps it wasn't clear from the API, but the method you used expects EML data, not a file path string.

Email oldEmail = EmailConverter.emlToEmail("/Users/rick/jkxyx205/tmp/hello.eml"); // <-- this should be file content
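One way to avoid this mistake is to read the file first and pass its content on. The sketch below uses only the JDK; the class and method names (EmlFileSketch, readEml) are hypothetical, and it assumes the EML can be decoded as UTF-8, which may not hold for every message.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of simple-java-mail.
final class EmlFileSketch {

    // Reads the whole EML file and returns its content as a String.
    // Assumption: the file is valid UTF-8 (plain ASCII EML files are).
    static String readEml(Path emlFile) throws IOException {
        return new String(Files.readAllBytes(emlFile), StandardCharsets.UTF_8);
    }
}
```

The returned String is the EML data that EmailConverter.emlToEmail(...) expects in place of the path. Alternatively, the InputStream variant shown above avoids the charset question altogether.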

@jkxyx205
Author

Yes, you are right. What a ridiculous mistake I made.

But I found another problem: the embedded image is lost, as I mentioned in #173, and I get 2 attachments instead.

[screenshot]

@bbottema
Owner

If an embedded attachment isn't actually embedded it is treated as an attachment. Is it being used in the body?

@jkxyx205
Author

jkxyx205 commented Oct 29, 2018

Outlook treats the mail as having one attachment and one embedded image.

OK, I will continue to look into it later. Thank you for your help; simple-java-mail is an excellent framework!!!

@jkxyx205
Author

Hmm, I finally found the root cause.

When an EML is exported by the Foxmail client, the header Content-Disposition: inline; filename=image.png is not specified for an embedded image. If the disposition is not provided, simple-java-mail treats the part as an attachment.

See org.simplejavamail.converter.internal.mimemessage.MimeMessageParser.java

private static void parseMimePartTree(@Nonnull final MimePart currentPart, @Nonnull final ParsedMimeMessageComponents parsedComponents) {
		for (final Header header : retrieveAllHeaders(currentPart)) {
			parseHeader(header, parsedComponents);
		}
		
		final String disposition = parseDisposition(currentPart);
		
		if (isMimeType(currentPart, "text/plain") && parsedComponents.plainContent == null && !Part.ATTACHMENT.equalsIgnoreCase(disposition)) {
			parsedComponents.plainContent = parseContent(currentPart);
		} else if (isMimeType(currentPart, "text/html") && parsedComponents.htmlContent == null && !Part.ATTACHMENT.equalsIgnoreCase(disposition)) {
			parsedComponents.htmlContent = parseContent(currentPart);
		} else if (isMimeType(currentPart, "multipart/*")) {
			final Multipart mp = parseContent(currentPart);
			for (int i = 0, count = countBodyParts(mp); i < count; i++) {
				parseMimePartTree(getBodyPartAtIndex(mp, i), parsedComponents);
			}
		} else {
			final DataSource ds = createDataSource(currentPart);
			// If the disposition is not provided, the part should be treated as attachment
			if (disposition == null || Part.ATTACHMENT.equalsIgnoreCase(disposition)) {
				parsedComponents.attachmentList.put(parseResourceName(parseContentID(currentPart), parseFileName(currentPart)), ds);
			} else if (Part.INLINE.equalsIgnoreCase(disposition)) {
				if (parseContentID(currentPart) != null) {
					parsedComponents.cidMap.put(parseContentID(currentPart), ds);
				} else {
					// contentID missing -> treat as standard attachment
					parsedComponents.attachmentList.put(parseResourceName(null, parseFileName(currentPart)), ds);
				}
			} else {
				throw new IllegalStateException("invalid attachment type");
			}
		}
	}

If I add Content-Disposition: inline; filename=image.png manually, it works fine. ˆ-ˆ
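To make the difference concrete, here is a minimal sketch of how a missing header ends up as a null disposition. It is not the library's parseDisposition; the class and method names are hypothetical, and it assumes the part's headers are already split into unfolded "Name: value" lines. The Content-Type and Content-ID values used with it are made up for illustration.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical illustration, not simple-java-mail code.
final class DispositionSketch {

    // Returns the disposition type in lower case (e.g. "inline"),
    // or null when no Content-Disposition header is present.
    static String dispositionOf(List<String> headerLines) {
        for (String line : headerLines) {
            int colon = line.indexOf(':');
            if (colon > 0 && line.substring(0, colon).trim().equalsIgnoreCase("Content-Disposition")) {
                String value = line.substring(colon + 1).trim();
                int semi = value.indexOf(';'); // parameters such as filename follow the type
                String type = (semi >= 0) ? value.substring(0, semi) : value;
                return type.trim().toLowerCase(Locale.ROOT);
            }
        }
        return null; // header absent, as in the Foxmail export
    }
}
```

With only Content-Type and Content-ID lines the method returns null, which is the case that falls into the attachment branch above; adding the Content-Disposition: inline; filename=image.png line makes it return "inline".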

[screenshot]

So my question is: when the disposition is not provided, why can't the part be treated as an embedded image, as Outlook does?

@jkxyx205
Author

BTW, Outlook, Mac Mail, and Foxmail all parse hello.eml correctly: one attachment + one embedded image.

@bbottema bbottema reopened this Oct 31, 2018
bbottema pushed a commit that referenced this issue Nov 7, 2018
bbottema pushed a commit that referenced this issue Nov 7, 2018
@bbottema bbottema changed the title from "Can't parse .eml file to email correctly" to "Handle embedded (INLINE) resources as attachments if the cid identifier doesn't occur in the HTML body" Nov 7, 2018
@bbottema
Owner

bbottema commented Nov 7, 2018

So my question is: when the disposition is not provided, why can't the part be treated as an embedded image, as Outlook does?

The spec says the following about default Content-Disposition in case of absence:

The Content-Disposition Header Field

   Content-Disposition is an optional header field. In its absence, the
   MUA may use whatever presentation method it deems suitable.

So spec-wise, we're free to do as we see fit. However, I'm unsure of the best handling here. The clients you mention don't so much parse it 'correctly' as handle it how they see fit. It's all correct.

Here's what I'm going to do: treat all resources with a missing Content-Disposition header as INLINE. Then if an INLINE resource (like an embedded image) does not occur in the HTML body (i.e. cid:myImage), treat it as an attachment instead.

This also treats embedded images with proper INLINE disposition as attachment if the image is not actually used in the HTML body.
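The rule described here can be sketched roughly as follows. This is not the code that went into 5.1.0; the class, enum and method names are hypothetical, and the cid lookup is a plain substring check, so a cid that is a prefix of another cid would also match.

```java
import java.util.Locale;

// Hypothetical sketch of the decision rule, not the actual fix.
final class InlineResourceSketch {

    enum Kind { EMBEDDED, ATTACHMENT }

    static Kind classify(String disposition, String contentId, String htmlBody) {
        // A missing Content-Disposition header is treated as INLINE.
        String effective = (disposition == null)
                ? "inline"
                : disposition.trim().toLowerCase(Locale.ROOT);
        if (effective.equals("attachment")) {
            return Kind.ATTACHMENT;
        }
        if (!effective.equals("inline")) {
            throw new IllegalStateException("invalid attachment type");
        }
        // An inline resource without a cid, or without an HTML body, cannot be embedded.
        if (contentId == null || htmlBody == null) {
            return Kind.ATTACHMENT;
        }
        // Content-ID values are usually wrapped in angle brackets: <myImage>
        String cid = contentId.replaceAll("^<|>$", "");
        // Simplification: substring match on "cid:<id>" anywhere in the HTML.
        return htmlBody.contains("cid:" + cid) ? Kind.EMBEDDED : Kind.ATTACHMENT;
    }
}
```

Under this rule a part without a disposition becomes an embedded image only when the HTML body references its cid, and a properly INLINE part that is never referenced is downgraded to an attachment.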

@bbottema bbottema added this to the 5.1.0 milestone Nov 7, 2018
@bbottema bbottema self-assigned this Nov 7, 2018
bbottema pushed a commit that referenced this issue Nov 7, 2018
@bbottema
Owner

bbottema commented Nov 7, 2018

@jkxyx205, I've released a new SNAPSHOT with the fix. Can you please verify? (You'll need to add the snapshot repo.)

@jkxyx205
Author

I tested it, and it behaved just as I expected. It's really great.

@bbottema
Owner

Released in 5.1.0
