Some user hostnames can't be parsed #34
Comments
Seeing a lot of
|
Further discussion appears to point to ea3633f being the offending change that's suddenly causing a number of user hostnames to be unparseable. @Renegade334 thoughts on how to handle this? |
Merged @hashworks' PR to handle this for now by allowing colons and forward slashes in hostnames. Re-opening this issue because I'd like to have further discussion on a potentially better fix for this problem, preferably with @Renegade334's feedback since he contributed the original changes. |
I guess the question is whether or not to stay relatively true to the RFC for hostnames, with the addition of specific characters like Freenode cloaks, for example, are not RFC compliant in terms of characters. However, they are the "hostname" portion of the mask by virtue of being the remainder of the mask after the |
I'd say we should want the parser to work with as many networks as is feasible. For the moment, @hashworks' change makes it work on Freenode and Rizon. We may want to consider opening up the |
My personal feeling is that if the parser already has to make specific exceptions to the RFC for known use cases, then it may as well just allow any non-whitespace character. Whatever is there is clearly intended by the server to be the user's "hostname", regardless of what characters it contains. The hostname in a user mask is arguably one of the least useful components of an IRC message. I'd personally advocate taking a permissive approach, as the entire line will otherwise fail and be ignored if a non-compliant cloak "breaks" the regex. |
Fair points. Feel free to file an issue and/or PR and I'll be happy to merge it. |
For reference, the commit which caused these to break was a42375d at line 212. Previously, these cloaks would still have failed to be parsed as hostnames, but the whole |
There are 2 tests for |
@hashworks The original purpose of that test case was to ensure that invalid messages were handled correctly by the parser. However we handle this issue, the test should be updated such that the message is considered to be invalid by the parser in order to continue fulfilling the test's intent. The tests fail even with the pattern change because the proposed new pattern you've referenced requires that the first character of the hostname be alphanumeric, which Related: do we want the parser to care about the length of the hostname, so long as it is at least 1, or whether the first character of the hostname is alphanumeric or not, so long as it's separated from any subsequent message components by whitespace? |
The tests are supposed to fail, we expect Oh, I bet there are funny server operators out there who use |
Hm, I don't think I'm seeing what you're seeing, or I'm not making the same change you are.
diff --git a/src/Parser.php b/src/Parser.php
index 26cda8e..258da83 100644
--- a/src/Parser.php
+++ b/src/Parser.php
@@ -207,7 +207,7 @@ class Parser implements ParserInterface
$trailing = "(?: :?[^$null$crlf]*)";
$params = "(?P<params>$trailing?|(?:$middle{0,14}$trailing))";
$name = "[$letter$number](?:[$letter$number:\\/\\-]*[$letter$number])?";
- $host = "$name(?:\\.(?:$name)*)*";
+ $host = "[$letter$number]\S*";
$nick = "(?:[$letter$special][$letter$number$special-]*)";
$user = "(?:[^ $null$crlf@]+)";
$prefix = "(?:(?:(?P<nick>$nick)(?:!(?P<user>$user))?(?:@(?P<host>$host))?)|(?P<servername>$host))";
|
You're getting the same output that I'm getting. But I don't understand why it isn't |
Ah, I see why.
+ 'prefix' => ':nick!ident@-'
+ 'servername' => 'nick!ident@-'
Notice that Now, look at the
$prefix = "(?:(?:(?P<nick>$nick)(?:!(?P<user>$user))?(?:@(?P<host>$host))?)|(?P<servername>$host))";
The first |
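To make the fall-through concrete, here is a minimal sketch of why `nick!ident@-` ends up in `servername` once `$host` is loosened to `[$letter$number]\S*`. The `$letter`, `$number`, `$nick` and `$user` definitions are simplified assumptions for illustration, not the parser's real ones; only `$host` and `$prefix` follow the diff quoted earlier in the thread.

```php
<?php
// Simplified character classes, assumed for this sketch only.
$letter = 'a-zA-Z';
$number = '0-9';
$nick   = "(?:[$letter][$letter$number-]*)";
$user   = "(?:[^ @]+)";

// The loosened host pattern from the diff in this thread.
$host   = "[$letter$number]\\S*";
$prefix = "(?:(?:(?P<nick>$nick)(?:!(?P<user>$user))?(?:@(?P<host>$host))?)|(?P<servername>$host))";

preg_match("/^$prefix\$/", 'nick!ident@-', $m);

// The nick!user@host branch fails because "-" is not alphanumeric, so the
// engine falls through to the servername branch. There, "n" satisfies the
// first-character rule and \S* swallows the rest of the mask.
echo 'nick:       "', $m['nick'], '"', PHP_EOL;       // nick:       ""
echo 'servername: "', $m['servername'], '"', PHP_EOL; // servername: "nick!ident@-"
```

Because `servername` reuses `$host`, any loosening of `$host` also loosens what counts as a server name, which is presumably why the invalid-message test stops behaving as intended.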
The It's only the host component of the user mask that is affected by networks' cloaking policy etc. The change should therefore be made directly to the
$host = "$name(?:\\.(?:$name)*)*";
$nick = "(?:[$letter$special][$letter$number$special-]*)";
$user = "(?:[^ $null$crlf@]+)";
- $prefix = "(?:(?:(?P<nick>$nick)(?:!(?P<user>$user))?(?:@(?P<host>$host))?)|(?P<servername>$host))";
+ $prefix = "(?:(?:(?P<nick>$nick)(?:!(?P<user>$user))?(?:@(?P<host>\S+))?)|(?P<servername>$host))";
...or whatever pattern we want to allow there. |
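As a rough illustration of the difference this makes, the sketch below runs both versions of `$prefix` against a user mask whose host contains a character outside `$name`. The character-class variables, the `matchPrefix()` helper and the sample mask are assumptions made for the example; the sub-patterns themselves are copied from the diffs in this thread.

```php
<?php
// Character classes below are assumptions for this sketch; the real
// Parser.php defines its own $letter, $number, $special, $null and $crlf.
$letter  = 'a-zA-Z';
$number  = '0-9';
$special = preg_quote('[]\\`_^{|}', '/');
$null    = '\\x00';
$crlf    = '\\r\\n';

// Sub-patterns copied from the diffs in this thread.
$name = "[$letter$number](?:[$letter$number:\\/\\-]*[$letter$number])?";
$host = "$name(?:\\.(?:$name)*)*";
$nick = "(?:[$letter$special][$letter$number$special-]*)";
$user = "(?:[^ $null$crlf@]+)";

// Current prefix pattern versus the proposed one with a permissive user host.
$strict     = "(?:(?:(?P<nick>$nick)(?:!(?P<user>$user))?(?:@(?P<host>$host))?)|(?P<servername>$host))";
$permissive = "(?:(?:(?P<nick>$nick)(?:!(?P<user>$user))?(?:@(?P<host>\\S+))?)|(?P<servername>$host))";

/** Hypothetical helper: returns the named groups, or null if nothing matches. */
function matchPrefix(string $pattern, string $prefix): ?array
{
    if (!preg_match("/^$pattern\$/", $prefix, $m)) {
        return null;
    }
    // Keep only the named (string-keyed) groups.
    return array_filter($m, 'is_string', ARRAY_FILTER_USE_KEY);
}

// Hypothetical vanity host containing an underscore, which $name does not allow.
$mask = 'nick!ident@some_vanity.host';

var_dump(matchPrefix($strict, $mask));                   // NULL: the whole prefix is rejected
echo matchPrefix($permissive, $mask)['host'], PHP_EOL;   // some_vanity.host
```

Note that `servername` still uses the stricter `$host` in both variants, so only the host part of a user mask is relaxed.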
Colons should be left in, I guess, for IPv6 address notation? (RFC2812 specifies |
Yeah, we will want to allow for the Freenode case of trailing dots, even though it's not standards-compliant. And allowing IPv6 notation also makes sense, so colons should probably stay. Not sure it makes sense to include slashes in |
Just thought I'd leave a note here, since it's still open. I know that at least Rizon supports hostnames with colour control codes directly in them, ie |
Some logs from Rizon:
It's probably color codes in the hostname, like @DanielOaks said. In my opinion, it's not our job to be 100% compliant with RFC standards; that's the server's job. Our bots should just work with any network. In my experience, an IRC network that complies with all the standards is a very rare thing nowadays. So why bother, rather than just accepting everything except whitespace as a hostname? |
Honestly, when it comes to IRC mask splitting, the easiest way to handle it is to just allow most everything that's not whitespace in there (so long as you can split them properly from the nick/user parts). Mostly because of the issues with colour codes in hostnames, weird characters (and just plainly wrong IP addresses/hostnames) due to either people/admins setting special vanity hostnames or the weird cloaking characters and mechanisms different networks use. Trying to play whack-a-mole with the different things you'll run into and have to allow isn't usually worth it, imo. |
Hostnames on the Rizon network appear to allow IPv6 address-like strings (i.e. hexadecimal numbers with segments delimited by colons) postfixed by :IP, which the parser presently can't handle.
The associated BNF notation from RFC 2812:
From RFC 1123:
And from RFC-952:
We should probably modify the $host pattern to include a case for such hostnames.
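For illustration, a minimal sketch of why a hostname pattern in the style of RFC 952 and RFC 1123 rejects such strings. The pattern below is an assumption written for this example (dot-separated labels of letters, digits and hyphens), not the parser's actual `$host`, and both sample hosts are hypothetical.

```php
<?php
// RFC 952/1123-style hostname: dot-separated labels of letters, digits and
// hyphens, each starting and ending with a letter or digit.
$label   = '[a-zA-Z0-9](?:[a-zA-Z0-9-]*[a-zA-Z0-9])?';
$rfcHost = "/^$label(?:\\.$label)*\$/";

// Hypothetical hosts; the second mimics the colon-delimited form ending in ":IP".
$hosts = ['irc.example.org', 'AB12:CD34:EF56:IP'];

foreach ($hosts as $h) {
    // Colons are not valid in a label, so the second host is rejected.
    printf("%-20s %s\n", $h, preg_match($rfcHost, $h) ? 'matches' : 'rejected');
}
```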