Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Magic parameters for canned queries #842

Closed
simonw opened this issue Jun 13, 2020 · 18 comments
Closed

Magic parameters for canned queries #842

simonw opened this issue Jun 13, 2020 · 18 comments

Comments

@simonw
Copy link
Owner

simonw commented Jun 13, 2020

Now that writable canned queries (#698) have landed, it would be neat if they supported "magic" parameters - parameters that are automatically populated with:

  • the current actor ID / other actor properties
  • the current date and time
  • the user's IP or user-agent

And maybe other things potentially added by plugins.

@simonw simonw added this to the Datasette 0.45 milestone Jun 13, 2020
@simonw
Copy link
Owner Author

simonw commented Jun 13, 2020

Two potential designs:

  • _actor_id, _request_ip, _now_timestamp - so special reserved parameters
  • a SQL function: update blah set up = special('ip')

I fee the first would be easier to implement.

@simonw
Copy link
Owner Author

simonw commented Jun 18, 2020

It would be nice if Datasette didn't have to do any additional work to find e.g. _request_ip if that parameter turned out not to be used by the query.

Could I do this with a custom class that implements __getitem__() and then gets passed as SQLite arguments?

@simonw
Copy link
Owner Author

simonw commented Jun 18, 2020

Yes that can work - and using __missing__ (new in Python 3) is nicer because then the regular dictionary gets checked first:

import sqlite3

conn = sqlite3.connect(":memory:")


class Magic(dict):
    def __missing__(self, key):
        return key.upper()


conn.execute("select :name", Magic()).fetchall()

Outputs:

[('NAME',)]

@simonw
Copy link
Owner Author

simonw commented Jun 18, 2020

The _actor_id param makes this a bit trickier, because we can't just say "if you see an unknown parameter called X call this function" - our magic parameter logic isn't adding single parameters, it might add a whole family of them.

@simonw
Copy link
Owner Author

simonw commented Jun 18, 2020

If every magic parameter has a prefix and suffix, like _request_ip and _actor_id, then plugins could register a function for a prefix. Register a function to _actor and actor("id")will be called for _actor_id.

But does it make sense for every magic parameter to be of form _a_b? I think so.

@simonw
Copy link
Owner Author

simonw commented Jun 18, 2020

Brainstorming more potential magic parameters:

  • _actor_id
  • _actor_name
  • _request_ip
  • _request_user_agent
  • _cookie_cookiename
  • _signedcookie_cookiename - reading signed cookies would be cool, not sure how to specify namespace though, maybe always use the same one? Or have the namespace come last, _signedcookie_cookiename_mynamespace. Might not need special signed cookie support since actor is already usually from a signed cookie.
  • _timestamp_unix (not happy with these names yet)
  • _timestamp_localtime
  • _timestamp_datetime
  • _timestamp_utc

@simonw
Copy link
Owner Author

simonw commented Jun 18, 2020

But then what kind of magic parameters might plugins want to add?

Here's a crazy idea: _scrapedcontent_url - it would look for the url column on the data being inserted, scrape the content from it and insert that. This does suggest that the magic resolving function scrapedcontent() would need to optionally be sent the full row dictionary being inserted too.

@simonw
Copy link
Owner Author

simonw commented Jun 18, 2020

I'd be OK with the first version of this not including a plugin hook.

@simonw
Copy link
Owner Author

simonw commented Jun 24, 2020

Another magic parameter that would be useful would be _random. Consider simonw/datasette-auth-tokens#1 for example - I'd like to be able to provide a writable canned query which can create new authentication tokens in the database, but ideally it would automatically populate a secure random secret for each one.

Maybe _random_chars_128 to create a 128 character long random string (using os.urandom(64).hex()).

This would be the first example of a magic parameter where part of the parameter name is used to configure the resulting value. Maybe neater to separate that with a different character? Unfortunately _random_chars:128 wouldn't work because these parameters are used in a SQLite query where : has special meaning: insert into blah (secret) values (:_random_chars:128) wouldn't make sense.

Actually this is already supported by the proposed design - _random_chars_128 would become random("chars_128") so the random() function could split off the 128 itself.

@simonw
Copy link
Owner Author

simonw commented Jun 24, 2020

I'm building this documentation-first - here's the documentation so far: https://github.com/simonw/datasette/blob/6fc8bd9c473f4a25e0a076f24c7e5a9b2f353bb8/docs/sql_queries.rst#magic-parameters

@simonw
Copy link
Owner Author

simonw commented Jun 26, 2020

_request_ip is actually quite hard to implement - should it take into account things like the x-forwarded-for header?

It probably should - but that means it now needs a bunch of extra configuration to tell it which of those headers can be trusted in the current environment.

As such I think I'll leave that for a plugin.

@simonw
Copy link
Owner Author

simonw commented Jun 26, 2020

Maybe I should ship a default _scope_headers_... parameter instead, which reads from a dictionary of scope["headers"] - https://asgi-scope.now.sh/ shows what those look like.

{'client': ('148.64.98.14', 0),
 'headers': [[b'host', b'asgi-scope.now.sh'],
             [b'x-forwarded-for', b'148.64.98.14'],
             [b'x-vercel-id', b'sw72x-1593215573008-024e4e603806'],
             [b'x-forwarded-host', b'asgi-scope.now.sh'],
             [b'accept',
              b'text/html,application/xhtml+xml,application/xml;q=0.9,image/'
              b'webp,*/*;q=0.8'],
             [b'x-real-ip', b'148.64.98.14'],
             [b'x-vercel-deployment-url', b'asgi-scope-9eyeojbek.now.sh'],
             [b'upgrade-insecure-requests', b'1'],
             [b'x-vercel-trace', b'sfo1'],
             [b'x-forwarded-proto', b'https'],
             [b'accept-language', b'en-US,en;q=0.5'],
             [b'user-agent',
              b'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:77.0) Gecko'
              b'/20100101 Firefox/77.0'],
             [b'x-vercel-forwarded-for', b'148.64.98.14'],
             [b'accept-encoding', b'gzip, deflate, br'],
             [b'dnt', b'1'],
             [b'te', b'trailers']],
 'http_version': '1.1',
 'method': 'GET',
 'path': '/',
 'query_string': b'',
 'raw_path': b'/',
 'root_path': '',
 'scheme': 'https',
 'server': ('asgi-scope.now.sh', 80),
 'type': 'http'}

I'm going to have _request_X actually mean "find the first value for X in scope["headers"]" - with underscores converted to hyphens.

@simonw
Copy link
Owner Author

simonw commented Jun 27, 2020

Security thought: make sure it's not possible to accidentally open up a security hole where an attacker can send a GET request that causes the magic parameter _cookie_ds_actor to be resolved and returned as JSON data that the attacker can see.

@simonw simonw pinned this issue Jun 27, 2020
simonw added a commit that referenced this issue Jun 27, 2020
@simonw
Copy link
Owner Author

simonw commented Jun 27, 2020

Security thought: make sure it's not possible to accidentally open up a security hole where an attacker can send a GET request that causes the magic parameter _cookie_ds_actor to be resolved and returned as JSON data that the attacker can see.

This is an open security hole in 94c1315 - it's useful for testing, but I need to remove it before I land that branch.

else: # Not a write
# TODO: REMOVE THIS for security:
params2 = MagicParameters(params, request, self.ds)
results = await self.ds.execute(
database, sql, params2, truncate=True, **extra_args
)
columns = [r[0] for r in results.description]

@simonw
Copy link
Owner Author

simonw commented Jun 27, 2020

I'm going to rename _request_X to _header_X as that better reflects what it now does.

simonw added a commit that referenced this issue Jun 27, 2020
simonw added a commit that referenced this issue Jun 28, 2020
Also added test coverage of GET on magic params in addition to POST
@simonw simonw added large and removed medium labels Jun 28, 2020
@simonw simonw closed this as completed in 563f5a2 Jun 28, 2020
@simonw
Copy link
Owner Author

simonw commented Jun 28, 2020

I'm going to add some canned queries to the metadata.json used by the live demo that illustrate this feature.

@simonw
Copy link
Owner Author

simonw commented Jun 28, 2020

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant