-
-
Notifications
You must be signed in to change notification settings - Fork 494
Url Encoding
From RFC 3986 (2005)
2.2 Reserved Characters
URIs include components and subcomponents that are delimited by characters in the "reserved" set. These characters are called "reserved" because they may (or may not) be defined as delimiters by the generic syntax, by each scheme-specific syntax, or by the implementation-specific syntax of a URI's dereferencing algorithm. If data for a URI component would conflict with a reserved character's purpose as a delimiter, then the conflicting data must be percent-encoded before the URI is formed.
reserved = gen-delims / sub-delims gen-delims = ":" / "/" / "?" / "#" / "[" / "]" / "@" sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
2.3 Unreserved Characters
unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
2.4 When to Encode or Decode
Because the percent ("%") character serves as the indicator for percent-encoded octets, it must be percent-encoded as "%25" for that octet to be used as data within a URI
The following subsets are specified based on URI syntax component:
The following are two example URIs and their component parts:
foo://example.com:8042/over/there?name=ferret#nose
\_/ \______________/\_________/ \_________/ \__/
| | | | |
scheme authority path query fragment
| _____________________|__
/ \ / \
urn:example:animal:ferret:nose
scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
authority = [ userinfo "@" ] host [ ":" port ]
path = ... pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
query = *( pchar / "/" / "?" )
fragment = *( pchar / "/" / "?" )
While RFC 3986 defines what characters are permitted in a query string, data is encoded in the query string with form url-encoding. This encoding algorithm is defined by W3C application/x-www-form-urlencoded
encoding algorithm and RFC 1738 (1994).
For each character in the entry's name and value that cannot be expressed using the selected character encoding, replace the character by a string consisting of a U+0026 AMPERSAND character (&), a "#" (U+0023) character, one or more ASCII digits representing the Unicode code point of the character in base ten, and finally a ";" (U+003B) character.
For each byte in the entry's name and value, apply the appropriate subsubsteps from the following list:
If the byte is 0x20 (U+0020 SPACE if interpreted as ASCII) Replace the byte with a single 0x2B byte ("+" (U+002B) character if interpreted as ASCII). If the byte is in the range 0x2A, 0x2D, 0x2E, 0x30 to 0x39, 0x41 to 0x5A, 0x5F, 0x61 to 0x7A Leave the byte as is.
Otherwise Let
s
be a string consisting of a U+0025 PERCENT SIGN character (%) followed by uppercase ASCII hex digits representing the hexadecimal value of the byte in question (zero-padded if necessary).Encode the string
s
as US-ASCII, so that it is now a byte string.Replace the byte in question in the name or value being processed by the bytes in
s
, preserving their relative order.
The use of +
for space is a fairly controversial issue and there are cases where it has been weakened to +
or %20
.
Separately, OAuth 1.0 defines the required parameter encoding for preparing the signature and authorization header as defined in RFC 5849 3.6:
* Characters in the unreserved character set as defined by [RFC3986], Section 2.3
(ALPHA, DIGIT, "-", ".", "_", "~") MUST NOT be encoded.
* All other characters MUST be encoded.
* The two hexadecimal characters used to represent encoded
characters MUST be uppercase.
It includes the following note:
This method is different from the encoding scheme used by the "application/x-www-form-urlencoded" content-type (for example, it encodes space characters as "%20" and not using the "+" character).
There seems to be variation in cookie implementations, with specs ranging from the original (cookie_spec) to the more recent RFC 6265.
cookie_spec
NAME=VALUE
"This string is a sequence of characters excluding semi-colon, comma and white space. If there is a need to place such data in the name or value, some encoding method such as URL style %XX encoding is recommended, though no encoding is defined or required."
RFC 6265
cookie-pair = cookie-name "=" cookie-value
cookie-name = token
cookie-value = *cookie-octet / ( DQUOTE *cookie-octet DQUOTE )
cookie-octet = %x21 / %x23-2B / %x2D-3A / %x3C-5B / %x5D-7E
; US-ASCII characters excluding CTLs,
; whitespace DQUOTE, comma, semicolon,
; and backslash
(from RFC 2616)
token = 1*<any CHAR except CTLs or separators>
separators = "(" | ")" | "<" | ">" | "@"
| "," | ";" | ":" | "\" | <">
| "/" | "[" | "]" | "?" | "="
| "{" | "}" | SP | HT
Current usage of UrlEncode
in VBA-Web:
- OAuth1 - Encode signature and header
-
ConvertToUrlEncoded
- Encode key and value (used byWebFormat.UrlEncoded
and to encodeQuerystringParams
) - UrlSegments - Segment value is encoded during replacement
- Note: Currently cookies are not encoded for requests (should be in the future)
UrlDecode
:
-
ParseUrlEncoded
(used byWebFormat.UrlEncoded
and inFacebookAuthenticator
,TodoistAuthenticator
) - Cookies - (
PlusAsSpace:=False
)
The updated UrlEncode
and UrlDecode
will have the following "modes":
-
strict
:unreserved
-only (shared with OAuth 1.0) and is the default -
form-urlencoded
: used only forWebFormat.UrlEncoded
and uses+
for space -
querystring
: subset ofstrict
andform-urlencoded
=ALPHA / DIGIT / "-" / "." / "_"
cookie
path
For query strings, form-urlencoded
will be used for WebFormat.UrlEncoded
, otherwise querystring
will be used. Also, For UrlDecode
, querystring
converts +
to space
for backwards compatibility.
strict = ALPHA / DIGIT / "-" / "." / "_" / "~"
formurlencoded = ALPHA / DIGIT / "-" / "." / "_" / "*", (space) => "+"
query = ALPHA / DIGIT / "-" / "." / "_"
cookie = strict / "!" / "#" / "$" / "%" / "&" / "'" / "*" / "+" / "^" / "`" / "|"
path = strict / "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "=" / ":" / "@"
Public Enum UrlEncodingMode
StrictUrlEncoding
FormUrlEncoding
QueryUrlEncoding
CookieUrlEncoding
PathUrlEncoding
End Enum
Public Function UrlEncode(Text As String, ..., Optional EncodingMode As UrlEncodingMode = StrictEncoding)
' ...
End Function
Public Function UrlDecode(Encoded As String, ..., Optional EncodingMode As UrlEncodingMode = StrictEncoding)
End Function
- RFC 3986, https://tools.ietf.org/html/rfc3986
- Percent Encoding, https://en.wikipedia.org/wiki/Percent-encoding
- form-urlencoded encoding algorithm, https://www.w3.org/TR/html5/forms.html#application/x-www-form-urlencoded-encoding-algorithm
- RFC 1738, https://tools.ietf.org/html/rfc1738
- RFC 5849 - OAuth 1.0, http://tools.ietf.org/html/rfc5849#section-3.6
- cookie_spec, https://curl.haxx.se/rfc/cookie_spec.html
- RFC 6265, https://tools.ietf.org/html/rfc6265
- RFC 2616, https://tools.ietf.org/html/rfc2616#section-2.2