Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Further improve X500 addresses detection, don't overwrite existing address but do retain X500 address if available #73

Closed
sanastasiadis opened this issue Mar 26, 2024 · 10 comments

Comments

@sanastasiadis
Copy link
Contributor

To detect X500 addresses, the regular expression is used: "/o=[^/]+/ou=[^/]+(?:/cn=[^/]+)?"
This regular expression accepts only one appearance of "cn=".

It is possible to have X500 addresses with multiple "cn=" parts, eg: "/o=Test/ou=Unit/cn=Recipients/cn=anassta".
In this case the above regular expression will not work.

Proposed solution is to change the regular expression to search for "one or more" "cn=" parts, by changing the final "?" into "+".
Eg: "/o=[^/]+/ou=[^/]+(?:/cn=[^/]+)+".

I tested with the proposed regular expression and it works for both cases:

  1. "/o=Test/ou=Unit/cn=Recipients/cn=anassta"
  2. "/o=Test/ou=Unit/cn=anassta"
sanastasiadis added a commit to sanastasiadis/outlook-message-parser that referenced this issue Mar 26, 2024
…tion

updated X500 addresses detection, to support multiple appearances of "cn=" part
@bbottema
Copy link
Owner

Hi, thanks for helping out!

To detect X500 addresses, the regular expression is used: "/o=[^/]+/ou=[^/]+(?:/cn=[^/]+)?"
This regular expression accepts only one appearance of "cn=".

To be specific, it means it accepts at most only one. So it's optional.

Proposed solution is to change the regular expression to search for "one or more" "cn=" parts, by changing the final "?" into "+".
Eg: "/o=[^/]+/ou=[^/]+(?:/cn=[^/]+)+".

This, however, makes it mandatory (one or more).

I guess, the correct way to handle both the case you found and to be backwards compatible, is to make it zero or more. So

/o=[^/]+/ou=[^/]+(?:/cn=[^/]+)*

What do you think?

@sanastasiadis
Copy link
Contributor Author

Yes, you are right. Using "*" was an afterthought for me as well.
I will align my PR.

sanastasiadis added a commit to sanastasiadis/outlook-message-parser that referenced this issue Mar 26, 2024
bbottema added a commit that referenced this issue Apr 4, 2024
#73 correction of regular expression for X500 addresses detection
@bbottema bbottema changed the title X500 addresses detection Further improve X500 addresses detection Apr 4, 2024
@bbottema bbottema added this to the 1.13.1 milestone Apr 4, 2024
@bbottema
Copy link
Owner

bbottema commented Apr 4, 2024

Change released in 1.13.1

@bbottema bbottema closed this as completed Apr 4, 2024
@bbottema
Copy link
Owner

bbottema commented Apr 4, 2024

I may have to revert this change, as it caused a test case in Simple Java Mail to fail.

java.lang.AssertionError:
Expecting actual:
[Recipient{name='Sven Sielenkemper', address='[email protected]', type=To},
Recipient{name='[email protected]', address='[email protected]', type=To}]
to contain exactly (and in same order):
[Recipient{name='Sven Sielenkemper', address='/o=otris/ou=Exchange Administrative Group (FYDIBOHF23SPDLT)/cn=Recipients/cn=ad5e65c5e53d40e2825d738a39904682-sielenkemper', type=To},
Recipient{name='[email protected]', address='[email protected]', type=To}]
but some elements were not found:
[Recipient{name='Sven Sielenkemper', address='/o=otris/ou=Exchange Administrative Group (FYDIBOHF23SPDLT)/cn=Recipients/cn=ad5e65c5e53d40e2825d738a39904682-sielenkemper', type=To}]
and others were not expected:
[Recipient{name='Sven Sielenkemper', address='[email protected]', type=To}]

For the .eml and .msg files in this zip: #486 TestValidSigned messages.zip (from #40)

Perhaps you can have a look at it?

@sanastasiadis
Copy link
Contributor Author

sanastasiadis commented Apr 5, 2024

It seems this error comes from the test class EmailConverterTest of simple-java-mail, and it is the test Github486_InvalidSignedOutlookMessage.

This test compares the results of parsing between the *.eml version of the file and the *.msg version of the file.
The parsed eml version of the file is represented as the "actual" here, and the parsed msg version of the file is the "expected" one.

The "actual" list of recipients comes from simple-java-mail: org.simplejavamail.converter.internal.mimemessage.MimeMessageParser.

The "expected" list of recipients comes from outlook-message-parser:

org.simplejavamail.outlookmessageparser.OutlookMessageParser.

Apparently the "expected" list of recipients is as expected (:smiley:) and here follows an explanation:

In the file "#486 TestInvalidSignedOutlookMessage.msg" It looks that the entry for

[Recipient{name='Sven Sielenkemper', address='[email protected]', type=To}]

has both properties:

0x3003: /o=otris/ou=Exchange Administrative Group (FYDIBOHF23SPDLT)/cn=Recipients/cn=ad5e65c5e53d40e2825d738a39904682-sielenkemper

and

0x39fe: [email protected]

However the code handles it correctly because:

  1. The code in OutlookRecipient.handleNameAddressProperty, seems it doesn't overwrite the address value if it is not null.
  2. The properties are encountered and parsed in the following order: first the 0x3003 and then the 0x39fe. So, the value of 0x3003 is not overwritten (see pt1).
  3. The regular expression is correct to match multiple "cn" occurrences.

As I see the error produced by the unit test in your comment, the "actual" result could have been produced with the previous version of the regular expression (with the ?). I reproduced the issue parsing the msg file locally.

Using the previous version of the regular expression (with the ?), the X500 address is not recognised and the address field is left null to be filled with the address property of 0x39fe which comes next when parsing.
Output:

[name: Sven Sielenkemper, address: [email protected]]
[name: [email protected], address: [email protected]]

On the other hand, the suggested version of the regular expression (with the *) matches correctly the given X500 address, and as the 0x3003 property comes first, it fills the address field. Then when parsing the 0x39fe, it is skipped as the address field is not null anymore.
Output as expected:

[name: Sven Sielenkemper, address: /o=otris/ou=Exchange Administrative Group (FYDIBOHF23SPDLT)/cn=Recipients/cn=ad5e65c5e53d40e2825d738a39904682-sielenkemper]
[name: [email protected], address: [email protected]]

I'm now thinking that since both properties 0x3003 and 0x39fe exist, maybe the latter should overwrite the first one, because the email address has "priority" over the X500 address?
What do you think?

bbottema added a commit that referenced this issue Apr 5, 2024
…ect, because that's not really clear to me right now)
bbottema added a commit that referenced this issue Apr 5, 2024
…ect, because that's not really clear to me right now)
@bbottema
Copy link
Owner

bbottema commented Apr 5, 2024

Thank you so much for having a good look, you explained it well.

Regarding multiple apparent recipients (I'm still getting used to X500, as it strikes me as totally alien), I'm not sure myself. For easy reproduction, I've added the email and test case in this project in the branch bugfix/#73-x500-recipient-issue.

This test turns ✔ green:

@Test
public void testGithubIssue73_AttachmentNameWithSpecialCharacters()
		throws IOException {
	OutlookMessage msg = parseMsgFile("test-messages/OutlookMessage with X500 dual address.msg");

	// just assert what currently comes back from the parser, not what the parser should do, because I'm not sure yet what the parser should do
	OutlookMessageAssert.assertThat(msg).hasOnlyRecipients(
		createRecipient("Sven Sielenkemper", "/o=otris/ou=Exchange Administrative Group (FYDIBOHF23SPDLT)/cn=Recipients/cn=ad5e65c5e53d40e2825d738a39904682-sielenkemper"),
		createRecipient("[email protected]", "[email protected]")
	);
}

I'm not sure myself what we should be expecting here.

Thoughts? Perhaps we should just compare it to what Outlook does:

Sven Sielenkemper [email protected]
sven.sielenkemper [email protected]

image

@sanastasiadis
Copy link
Contributor Author

Yes, apparently Outlook has a place for every property to accommodate them. :-)
However, in the case of outlook-message-parser there is only a single "address" field for every recipient.

The most "proper" solution maybe would be to add an extra field to the OutlookRecipient to accommodate the X500 address in case it exists? The client then would pick-up whatever serves better to him. What do you think?

@bbottema
Copy link
Owner

bbottema commented Apr 5, 2024

Indeed, that may be the best way forward. Then the primary behaviour mimics Outlook, while preserving X500 data if available.

@bbottema
Copy link
Owner

bbottema commented Apr 5, 2024

I created PR #75 for this. This seems to work well. Perhaps you can have a look as well.

bbottema added a commit that referenced this issue Apr 5, 2024
@bbottema bbottema changed the title Further improve X500 addresses detection Further improve X500 addresses detection, don't overwrite existing address but do retain X500 address if available Apr 5, 2024
@bbottema
Copy link
Owner

bbottema commented Apr 5, 2024

New fix released in 1.13.2. Thanks again!

@bbottema bbottema closed this as completed Apr 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants