Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Domains should not be able to end with a period #65

Closed
troyhunt opened this issue Nov 30, 2023 · 10 comments
Closed

Domains should not be able to end with a period #65

troyhunt opened this issue Nov 30, 2023 · 10 comments

Comments

@troyhunt
Copy link
Contributor

I just added a failing test for this (DomainEndingInPeriodIsInvalid) after finding some junk that had made its way through.

@KonajuGames
Copy link

A trailing period in a domain is valid. It represents the root hierarchy, and is usually assumed and not printed.

@troyhunt
Copy link
Contributor Author

Is that the case for either a publicly addressable domain accessed over HTTP or the domain part of an email address? Happy to be proven wrong, but I don't think there's a single website address that has a period immediately after the TLD, is there?

@KonajuGames
Copy link

It is one of those things that may have been required in the early days of the internet when subdomains were used without the TLD being specified, but has become so common to use the full domain and omit the root hierarchy that it is now the default. It may be a better approach to remove it if present before reporting.

@troyhunt
Copy link
Contributor Author

Yeah, thought that might be the case. The decision we keep running into with this project is strict adherence to RFCs versus filtering out junk. If I was to look at the occurrences of periods at the end of domains in everything from domain searches to breach parsing to people signing up to notifications, I bet 100% of them will be parsing or user input errors rather than compliance with an RFC edge case! I'm also of the view that if you could legitimately put a period at the end of say, an address, you'd be blocked in so many places you'd quickly stop doing that!

@KonajuGames
Copy link

That is a valid view, and related to the other issues where emails may start with { or /, but in real life the users impacted by a data breach will never have that. I would be in favour of stripping them per how we use them on the internet today rather than strict RFC compliance.

@Johno-ACSLive
Copy link

@troyhunt is this still an issue? I just created a dummy file with a period at the end - [email protected]. - and the extractor removed the period as it wasn't present in the output file.

@troyhunt
Copy link
Contributor Author

Yep, still an issue. In fact, this is our only failing test right now:

public async Task DomainEndingInPeriodIsInvalid()

@Johno-ACSLive
Copy link

Interesting, OK I'll investigate further.

@Johno-ACSLive
Copy link

Johno-ACSLive commented Sep 24, 2024

The extractor was already stripping the trailing period and if the other checks were passed we are left with presumably a valid email address. I would also be in favour of this behaviour in case a system exported email addresses with a trailing period (is it likely that an export would append or maybe even store an address with a trailing period? - I assume for the majority this would not be the case).

Have there been examples where the other checks allow invalid email addresses through when there is a trailing period?

I've created a PR which invalidates email addresses with a trailing period.

@troyhunt
Copy link
Contributor Author

troyhunt commented Oct 6, 2024

I think this works perfectly for now, I can't see any other places it would cause issues, certainly all the tests pass successfully. Thanks!

@troyhunt troyhunt closed this as completed Oct 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants