Definitive guide to parsing / stringifying querystrings #78

Open
buschtoens opened this issue Sep 9, 2013 · 2 comments

@buschtoens
Collaborator

There are a bazillion ways a querystring could and should look. Our implementation currently has some oddities, hence all the issues. I'll write a definitive guide on how we will handle parsing and stringifying in the future. It will be added to the Readme, so that qs' behaviour is predictable and as close as possible to current browser form serialization implementations.

I'll post it here first, and everyone is invited to discuss it. I want it to cover all possible edge cases.

But for now, I need to catch some sleep. Haha. 😉

@tj
Owner

tj commented Sep 9, 2013

haha yeah, some proper docs in the readme as far as what to expect from each method would be great

@buschtoens
Collaborator Author

Some RFCs

Specs on query strings are rare. I'll start listing all the relevant parts here.

RFC 3986#3.4: Query

The query component contains non-hierarchical data that, along with data in the path component (Section 3.3), serves to identify a resource within the scope of the URI's scheme and naming authority (if any). The query component is indicated by the first question mark ("?") character and terminated by a number sign ("#") character or by the end of the URI.

query = *( pchar / "/" / "?" )

The characters slash ("/") and question mark ("?") may represent data within the query component. Beware that some older, erroneous implementations may not handle such data correctly when it is used as the base URI for relative references (Section 5.1), apparently because they fail to distinguish query data from path data when looking for hierarchical separators. However, as query components are often used to carry identifying information in the form of "key=value" pairs and one frequently used value is a reference to another URI, it is sometimes better for usability to avoid percent-encoding those characters.
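To make the two rules above concrete, here is a minimal sketch of how the query component could be cut out of a URI and checked against the `query` production. Both `extractQuery` and `isValidQuery` are hypothetical helper names made up for illustration; they are not part of qs. The character class is my reading of `pchar` in RFC 3986 (unreserved, pct-encoded, sub-delims, ":" and "@"), so treat it as an assumption to double-check.

```javascript
// Hypothetical helper, not part of qs: returns the raw query component of a
// URI reference, or null when the URI has no "?" before the fragment.
function extractQuery(uri) {
  // "#" terminates the query, so drop the fragment first.
  const hashIndex = uri.indexOf('#');
  const withoutFragment = hashIndex === -1 ? uri : uri.slice(0, hashIndex);

  // The query starts after the FIRST "?"; later "?" characters are data.
  const queryStart = withoutFragment.indexOf('?');
  if (queryStart === -1) return null;
  return withoutFragment.slice(queryStart + 1);
}

// query = *( pchar / "/" / "?" )
// pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
const QUERY_RE = /^(?:[A-Za-z0-9\-._~!$&'()*+,;=:@\/?]|%[0-9A-Fa-f]{2})*$/;

// Hypothetical helper: true when the string matches the ABNF above.
function isValidQuery(query) {
  return QUERY_RE.test(query);
}
```

One thing that falls out of this reading: "[" and "]" are not in `pchar`, so a query like `a[]=1` would not pass `isValidQuery` unless the brackets are percent-encoded, even though that notation is common in practice.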

RFC 2396#2: URI Characters and Escape Sequences

URI consist of a restricted set of characters, primarily chosen to aid transcribability and usability both in computer systems and in non-computer communications. Characters used conventionally as delimiters around URI were excluded. The restricted set of characters consists of digits, letters, and a few graphic symbols were chosen from those common to most of the character encodings and input facilities available to Internet users.

uric = reserved | unreserved | escaped

Within a URI, characters are either used as delimiters, or to represent strings of data (octets) within the delimited portions. Octets are either represented directly by a character (using the US-ASCII character for that octet [ASCII]) or by an escape encoding. This representation is elaborated below.
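As a rough illustration of the "directly or by an escape encoding" rule, the sketch below writes each octet either as its US-ASCII character or as a `%XX` escape. `escapeOctets` is a hypothetical name, not a qs function. Two assumptions are baked in: text is turned into octets as UTF-8 (RFC 2396 does not pick a character encoding), and only the RFC 2396 "unreserved" set (alphanumerics plus `- _ . ! ~ * ' ( )`) is left unescaped.

```javascript
// RFC 2396: unreserved = alphanum | mark
// mark = "-" | "_" | "." | "!" | "~" | "*" | "'" | "(" | ")"
const UNRESERVED_2396 = /[A-Za-z0-9\-_.!~*'()]/;

// Hypothetical helper, not part of qs.
function escapeOctets(text) {
  // Assumption: the string is converted to octets using UTF-8.
  const bytes = new TextEncoder().encode(text);
  let out = '';
  for (const byte of bytes) {
    const ch = String.fromCharCode(byte);
    if (byte < 0x80 && UNRESERVED_2396.test(ch)) {
      // Octet represented directly by its US-ASCII character.
      out += ch;
    } else {
      // Octet represented by an escape: "%" followed by two hex digits.
      out += '%' + byte.toString(16).toUpperCase().padStart(2, '0');
    }
  }
  return out;
}
```

For well-formed strings this should give the same result as the built-in `encodeURIComponent`, which leaves the same set of characters unescaped. The hand-written loop is only there to show the octet-by-octet rule.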

Further reading:

RFC 2396#3.4: Query Component

The query component is a string of information to be interpreted by the resource.

query = *uric

Within a query component, the characters ";", "/", "?", ":", "@", "&", "=", "+", ",", and "$" are reserved.
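To show why the reserved set matters for us, here is a small sketch of a single key/value round trip. `stringifyPair` and `parsePair` are hypothetical helpers written for this comment; they are not the current qs implementation and not a proposal for the final behaviour. The sketch assumes `=` separates key from value and relies on the built-in `encodeURIComponent`, which escapes every character in the reserved list above. It deliberately ignores the form-encoding convention of `+` meaning a space.

```javascript
// The characters RFC 2396 marks as reserved within a query component.
const RESERVED_IN_QUERY = [';', '/', '?', ':', '@', '&', '=', '+', ',', '$'];

// Hypothetical helper: escape key and value so reserved characters inside
// them cannot be mistaken for delimiters.
function stringifyPair(key, value) {
  return encodeURIComponent(key) + '=' + encodeURIComponent(value);
}

// Hypothetical helper: split on the first "=" and undo the escaping.
function parsePair(pair) {
  const eq = pair.indexOf('=');
  const rawKey = eq === -1 ? pair : pair.slice(0, eq);
  const rawValue = eq === -1 ? '' : pair.slice(eq + 1);
  return [decodeURIComponent(rawKey), decodeURIComponent(rawValue)];
}

// A key or value may itself contain delimiters and still survive the trip.
const encoded = stringifyPair('a&b', 'c=d');
const decoded = parsePair(encoded);
```

Without the escaping step, `a&b=c=d` would be ambiguous: a parser could not tell whether `&` and the second `=` are data or delimiters.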
