re docs should state exactly which whitespace is matched by \s #118508
All 3: Zl, Zp and Zs
>>> import sys, unicodedata
>>> for i in range(sys.maxunicode + 1):  # + 1 so that U+10FFFF is included
...     char = chr(i)
...     if char.isspace():
...         print(i, repr(char), unicodedata.category(char))
...
9 '\t' Cc
10 '\n' Cc
11 '\x0b' Cc
12 '\x0c' Cc
13 '\r' Cc
28 '\x1c' Cc
29 '\x1d' Cc
30 '\x1e' Cc
31 '\x1f' Cc
32 ' ' Zs
133 '\x85' Cc
160 '\xa0' Zs
5760 '\u1680' Zs
8192 '\u2000' Zs
8193 '\u2001' Zs
8194 '\u2002' Zs
8195 '\u2003' Zs
8196 '\u2004' Zs
8197 '\u2005' Zs
8198 '\u2006' Zs
8199 '\u2007' Zs
8200 '\u2008' Zs
8201 '\u2009' Zs
8202 '\u200a' Zs
8232 '\u2028' Zl
8233 '\u2029' Zp
8239 '\u202f' Zs
8287 '\u205f' Zs
12288 '\u3000' Zs
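The listing above enumerates `str.isspace()`, not `\s` itself. A minimal sketch of how one might check whether the two agree on a given interpreter is shown below; the helper name `matched_code_points` is hypothetical, and the result depends on the Unicode database bundled with the Python build being run.

```python
import re
import sys

# For str patterns, \s uses Unicode matching by default; re.ASCII restricts it.
WS = re.compile(r"\s")
WS_ASCII = re.compile(r"\s", re.ASCII)


def matched_code_points(pattern):
    """Return the set of code points whose single character matches `pattern`.

    Hypothetical helper for this sketch; it brute-forces every code point.
    """
    return {
        cp for cp in range(sys.maxunicode + 1)
        if pattern.fullmatch(chr(cp))
    }


unicode_ws = matched_code_points(WS)
ascii_ws = matched_code_points(WS_ASCII)
isspace_ws = {cp for cp in range(sys.maxunicode + 1) if chr(cp).isspace()}

# Does \s agree with str.isspace() on this interpreter?
print(unicode_ws == isspace_ws)

# Code points matched only when Unicode matching is in effect.
print([hex(cp) for cp in sorted(unicode_ws - ascii_ws)])
```

With `re.ASCII`, the set should reduce to the six characters in `[ \t\n\r\f\v]`; the second print shows everything that Unicode matching adds on top of that.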
Would someone like to review my pull request?
nedbat pushed a commit that referenced this issue on Sep 2, 2024.
miss-islington pushed two commits to miss-islington/cpython that referenced this issue on Sep 4, 2024: …GH-119155) Clarify re syntax (cherry picked from commit 22fdb8c) Co-authored-by: Nice Zombies
hauntsaninja pushed two commits that referenced this issue on Sep 4, 2024.
Documentation
Currently, and since 3.0 it seems, the documentation simply states that \s "Matches Unicode whitespace characters (which includes [ \t\n\r\f\v], and also many other characters, for example the non-breaking spaces mandated by typography rules in many languages)."

But "Unicode whitespace characters" seems awfully vague. Exactly which General_Category value does that correspond to? Space_Separator (Zs)? Separator (Z)? Some Python-specific selection of "Unicode whitespace characters"? It's not entirely clear.

The 2.7 docs were better, stating that "If UNICODE is set, this will match the characters [ \t\n\r\f\v] plus whatever is classified as space in the Unicode character properties database."

I'd like to believe that Python 3.x uses the exact same definition for \s as 2.7 did, and that therefore I already have the answer to my question. I'd like to believe a lot of things. But computers don't run on belief(s).

No one should have to resort to digging through the source code for the answer to such a simple but important question.
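Until the docs spell this out, the General_Category question can be explored empirically. The sketch below tallies the categories of every character that `\s` matches and checks whether any Zs, Zl, or Zp character is left unmatched. It describes only the interpreter it runs on (the bundled Unicode database varies by Python version), and the variable names are illustrative.

```python
import re
import sys
import unicodedata
from collections import Counter

WS = re.compile(r"\s")  # str pattern, so Unicode matching applies

by_category = Counter()       # General_Category -> number of matched characters
separators_not_matched = []   # Zs/Zl/Zp code points that \s does NOT match

for cp in range(sys.maxunicode + 1):
    ch = chr(cp)
    category = unicodedata.category(ch)
    if WS.fullmatch(ch):
        by_category[category] += 1
    elif category in {"Zs", "Zl", "Zp"}:
        separators_not_matched.append(cp)

# Which General_Category values occur among the matched characters?
print(dict(by_category))

# Is every Separator (Z*) character matched?
print(separators_not_matched)
```

If the tally includes `Cc` alongside the `Z*` categories, as the `isspace()` listing in the comments suggests it will, then no single General_Category value describes `\s`, which is the gap this issue asks the docs to close.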
P.S. I did try searching the interwebs for an answer to "which whitespace is matched by \s in python". Unfortunately search engines seem entirely unwilling to help. Perhaps no one else knows or wants to know. That's a shame.
Linked PRs
#119155
#123670 (GH-119155)
#123671 (GH-119155)