-
-
Notifications
You must be signed in to change notification settings - Fork 690
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New pattern for views that return either JSON or HTML, available for plugins #878
Comments
The I'd like to turn this into a class that is documented and available to plugins as well. |
I think this is blocking #619 |
This has been blocking things for too long. If this becomes a documented pattern, things like adding a JSON output to https://github.com/dogsheep/dogsheep-beta becomes easier too. |
I came up with a slightly wild idea for this that would involve pytest-style dependency injection. Prototype here: https://gist.github.com/simonw/496b24fdad44f6f8b7237fe394a0ced7 Copying from my private notes:
|
Another idea I had: a view is a class that takes the |
Things this mechanism needs to be able to support:
|
This sounds really ambitious but also really awesome. I like the idea that basically any piece of a page could be selectively replaced. It sort of sounds like a python asyncio version of https://github.com/observablehq/runtime |
Here's the latest version of my weird dependency injection async class: import inspect
class AsyncMeta(type):
def __new__(cls, name, bases, attrs):
# Decorate any items that are 'async def' methods
_registry = {}
new_attrs = {"_registry": _registry}
for key, value in attrs.items():
if inspect.iscoroutinefunction(value) and not value.__name__ == "resolve":
new_attrs[key] = make_method(value)
_registry[key] = new_attrs[key]
else:
new_attrs[key] = value
# Topological sort of _registry by parameter dependencies
graph = {
key: {
p for p in inspect.signature(method).parameters.keys()
if p != "self" and not p.startswith("_")
}
for key, method in _registry.items()
}
new_attrs["_graph"] = graph
return super().__new__(cls, name, bases, new_attrs)
def make_method(method):
@wraps(method)
async def inner(self, **kwargs):
parameters = inspect.signature(method).parameters.keys()
# Any parameters not provided by kwargs are resolved from registry
to_resolve = [p for p in parameters if p not in kwargs and p != "self"]
missing = [p for p in to_resolve if p not in self._registry]
assert (
not missing
), "The following DI parameters could not be found in the registry: {}".format(
missing
)
results = {}
results.update(kwargs)
results.update(await self.resolve(to_resolve))
return await method(self, **results)
return inner
bad = [0]
class AsyncBase(metaclass=AsyncMeta):
async def resolve(self, names):
print(" resolve({})".format(names))
results = {}
# Resolve them in the correct order
ts = TopologicalSorter()
ts2 = TopologicalSorter()
print(" names = ", names)
print(" self._graph = ", self._graph)
for name in names:
if self._graph[name]:
ts.add(name, *self._graph[name])
ts2.add(name, *self._graph[name])
print(" static_order =", tuple(ts2.static_order()))
ts.prepare()
while ts.is_active():
print(" is_active, i = ", bad[0])
bad[0] += 1
if bad[0] > 20:
print(" Infinite loop?")
break
nodes = ts.get_ready()
print(" Do nodes:", nodes)
awaitables = [self._registry[name](self, **{
k: v for k, v in results.items() if k in self._graph[name]
}) for name in nodes]
print(" awaitables: ", awaitables)
awaitable_results = await asyncio.gather(*awaitables)
results.update({
p[0].__name__: p[1] for p in zip(awaitables, awaitable_results)
})
print(results)
for node in nodes:
ts.done(node)
return results Example usage: class Foo(AsyncBase):
async def graa(self, boff):
print("graa")
return 5
async def boff(self):
print("boff")
return 8
async def other(self, boff, graa):
print("other")
return 5 + boff + graa
foo = Foo()
await foo.other() Output:
|
This code is really fiddly. I just got to this version: import asyncio
from functools import wraps
import inspect
try:
import graphlib
except ImportError:
from . import vendored_graphlib as graphlib
class AsyncMeta(type):
def __new__(cls, name, bases, attrs):
# Decorate any items that are 'async def' methods
_registry = {}
new_attrs = {"_registry": _registry}
for key, value in attrs.items():
if inspect.iscoroutinefunction(value) and not value.__name__ == "resolve":
new_attrs[key] = make_method(value)
_registry[key] = new_attrs[key]
else:
new_attrs[key] = value
# Gather graph for later dependency resolution
graph = {
key: {
p
for p in inspect.signature(method).parameters.keys()
if p != "self" and not p.startswith("_")
}
for key, method in _registry.items()
}
new_attrs["_graph"] = graph
return super().__new__(cls, name, bases, new_attrs)
def make_method(method):
@wraps(method)
async def inner(self, _results=None, **kwargs):
print("inner - _results=", _results)
parameters = inspect.signature(method).parameters.keys()
# Any parameters not provided by kwargs are resolved from registry
to_resolve = [p for p in parameters if p not in kwargs and p != "self"]
missing = [p for p in to_resolve if p not in self._registry]
assert (
not missing
), "The following DI parameters could not be found in the registry: {}".format(
missing
)
results = {}
results.update(kwargs)
if to_resolve:
resolved_parameters = await self.resolve(to_resolve, _results)
results.update(resolved_parameters)
return_value = await method(self, **results)
if _results is not None:
_results[method.__name__] = return_value
return return_value
return inner
class AsyncBase(metaclass=AsyncMeta):
async def resolve(self, names, results=None):
print("\n resolve: ", names)
if results is None:
results = {}
# Resolve them in the correct order
ts = graphlib.TopologicalSorter()
for name in names:
ts.add(name, *self._graph[name])
ts.prepare()
async def resolve_nodes(nodes):
print(" resolve_nodes", nodes)
print(" (current results = {})".format(repr(results)))
awaitables = [
self._registry[name](
self,
_results=results,
**{k: v for k, v in results.items() if k in self._graph[name]},
)
for name in nodes
if name not in results
]
print(" awaitables: ", awaitables)
awaitable_results = await asyncio.gather(*awaitables)
results.update(
{p[0].__name__: p[1] for p in zip(awaitables, awaitable_results)}
)
if not ts.is_active():
# Nothing has dependencies - just resolve directly
print(" no dependencies, resolve directly")
await resolve_nodes(names)
else:
# Resolve in topological order
while ts.is_active():
nodes = ts.get_ready()
print(" ts.get_ready() returned nodes:", nodes)
await resolve_nodes(nodes)
for node in nodes:
ts.done(node)
print(" End of resolve(), returning", results)
return {key: value for key, value in results.items() if key in names} With this test: class Complex(AsyncBase):
def __init__(self):
self.log = []
async def c(self):
print("LOG: c")
self.log.append("c")
async def b(self, c):
print("LOG: b")
self.log.append("b")
async def a(self, b, c):
print("LOG: a")
self.log.append("a")
async def go(self, a):
print("LOG: go")
self.log.append("go")
return self.log
@pytest.mark.asyncio
async def test_complex():
result = await Complex().go()
# 'c' should only be called once
assert result == ["c", "b", "a", "go"] This test sometimes passes, and sometimes fails! Output for a pass:
Output for a fail:
I figured out why this is happening.
The code decides to run If If |
What should be happening here instead is it should resolve the full graph and notice that So maybe the algorithm I'm inheriting from https://docs.python.org/3/library/graphlib.html isn't the correct algorithm? |
My goal here is to calculate the most efficient way to resolve the different nodes, running them in parallel where possible. So for this class: class Complex(AsyncBase):
async def d(self):
pass
async def c(self):
pass
async def b(self, c, d):
pass
async def a(self, b, c):
pass
async def go(self, a):
pass A call to
|
But that does seem to be the plan that graph = {"go": {"a"}, "a": {"b", "c"}, "b": {"c", "d"}}
ts = TopologicalSorter(graph)
ts.prepare()
while ts.is_active():
nodes = ts.get_ready()
print(nodes)
ts.done(*nodes) Outputs:
Also: graph = {"go": {"d", "e", "f"}, "d": {"b", "c"}, "b": {"c"}}
ts = TopologicalSorter(graph)
ts.prepare()
while ts.is_active():
nodes = ts.get_ready()
print(nodes)
ts.done(*nodes) Gives:
I'm confident that |
New test: class Complex(AsyncBase):
def __init__(self):
self.log = []
async def d(self):
await asyncio.sleep(random() * 0.1)
print("LOG: d")
self.log.append("d")
async def c(self):
await asyncio.sleep(random() * 0.1)
print("LOG: c")
self.log.append("c")
async def b(self, c, d):
print("LOG: b")
self.log.append("b")
async def a(self, b, c):
print("LOG: a")
self.log.append("a")
async def go(self, a):
print("LOG: go")
self.log.append("go")
return self.log
@pytest.mark.asyncio
async def test_complex():
result = await Complex().go()
# 'c' should only be called once
assert tuple(result) in (
# c and d could happen in either order
("c", "d", "b", "a", "go"),
("d", "c", "b", "a", "go"),
) And this code passes it: import asyncio
from functools import wraps
import inspect
try:
import graphlib
except ImportError:
from . import vendored_graphlib as graphlib
class AsyncMeta(type):
def __new__(cls, name, bases, attrs):
# Decorate any items that are 'async def' methods
_registry = {}
new_attrs = {"_registry": _registry}
for key, value in attrs.items():
if inspect.iscoroutinefunction(value) and not value.__name__ == "resolve":
new_attrs[key] = make_method(value)
_registry[key] = new_attrs[key]
else:
new_attrs[key] = value
# Gather graph for later dependency resolution
graph = {
key: {
p
for p in inspect.signature(method).parameters.keys()
if p != "self" and not p.startswith("_")
}
for key, method in _registry.items()
}
new_attrs["_graph"] = graph
return super().__new__(cls, name, bases, new_attrs)
def make_method(method):
parameters = inspect.signature(method).parameters.keys()
@wraps(method)
async def inner(self, _results=None, **kwargs):
print("\n{}.{}({}) _results={}".format(self, method.__name__, kwargs, _results))
# Any parameters not provided by kwargs are resolved from registry
to_resolve = [p for p in parameters if p not in kwargs and p != "self"]
missing = [p for p in to_resolve if p not in self._registry]
assert (
not missing
), "The following DI parameters could not be found in the registry: {}".format(
missing
)
results = {}
results.update(kwargs)
if to_resolve:
resolved_parameters = await self.resolve(to_resolve, _results)
results.update(resolved_parameters)
return_value = await method(self, **results)
if _results is not None:
_results[method.__name__] = return_value
return return_value
return inner
class AsyncBase(metaclass=AsyncMeta):
async def resolve(self, names, results=None):
print("\n resolve: ", names)
if results is None:
results = {}
# Come up with an execution plan, just for these nodes
ts = graphlib.TopologicalSorter()
to_do = set(names)
done = set()
while to_do:
item = to_do.pop()
dependencies = self._graph[item]
ts.add(item, *dependencies)
done.add(item)
# Add any not-done dependencies to the queue
to_do.update({k for k in dependencies if k not in done})
ts.prepare()
plan = []
while ts.is_active():
node_group = ts.get_ready()
plan.append(node_group)
ts.done(*node_group)
print("plan:", plan)
results = {}
for node_group in plan:
awaitables = [
self._registry[name](
self,
_results=results,
**{k: v for k, v in results.items() if k in self._graph[name]},
)
for name in node_group
]
print(" results = ", results)
print(" awaitables: ", awaitables)
awaitable_results = await asyncio.gather(*awaitables)
results.update(
{p[0].__name__: p[1] for p in zip(awaitables, awaitable_results)}
)
print(" End of resolve(), returning", results)
return {key: value for key, value in results.items() if key in names} |
Wrote a TIL about what I learned using |
I'm going to continue working on this in a PR. |
I shipped that code as a new library, |
I'm going to build a brand new implementation of the I can maybe even run the tests against old |
New plan: I'm going to build a brand new implementation of It will reuse the existing HTML template but will be a completely new Python implementation, based on I'm going to start by just getting the table to show up on the page - then I'll add faceting, suggested facets, filters and so-on. Bonus: I'm going to see if I can get it to work for arbitrary SQL queries too (stretch goal). |
|
That's annoying: it looks like plugins can't use from datasette.utils.asgi import Response
from datasette import hookimpl
import html
async def table(request):
return Response.html("Hello from {}".format(
html.escape(repr(request.url_vars))
))
@hookimpl
def register_routes():
return [
(r"/(?P<db_name>[^/]+)/(?P<table_and_format>[^/]+?$)", table),
] I'll use a |
Problem: the fancy class Table(asyncinject.AsyncInjectAll):
async def view(self, request):
return Response.html("Hello from {}".format(
html.escape(repr(request.url_vars))
))
@hookimpl
def register_routes():
return [
(r"/t/(?P<db_name>[^/]+)/(?P<table_and_format>[^/]+?$)", Table().view),
] This failed with error: "Table.view() takes 1 positional argument but 2 were given" So I'm going to use |
This is working! from datasette.utils.asgi import Response
from datasette import hookimpl
import html
from asyncinject import AsyncInject, inject
class Table(AsyncInject):
@inject
async def database(self, request):
return request.url_vars["db_name"]
@inject
async def main(self, request, database):
return Response.html("Database: {}".format(
html.escape(database)
))
async def view(self, request):
return await self.main(request=request)
@hookimpl
def register_routes():
return [
(r"/t/(?P<db_name>[^/]+)/(?P<table_and_format>[^/]+?$)", Table().view),
] This project will definitely show me if I actually like the |
This exercise is proving so useful in getting my head around how the enormous and complex Here's where I've got to now - I'm systematically working through the variables that are returned for HTML and for JSON copying across code to get it to work: from datasette.database import QueryInterrupted
from datasette.utils import escape_sqlite
from datasette.utils.asgi import Response, NotFound, Forbidden
from datasette.views.base import DatasetteError
from datasette import hookimpl
from asyncinject import AsyncInject, inject
from pprint import pformat
class Table(AsyncInject):
@inject
async def database(self, request, datasette):
# TODO: all that nasty hash resolving stuff can go here
db_name = request.url_vars["db_name"]
try:
db = datasette.databases[db_name]
except KeyError:
raise NotFound(f"Database '{db_name}' does not exist")
return db
@inject
async def table_and_format(self, request, database, datasette):
table_and_format = request.url_vars["table_and_format"]
# TODO: be a lot smarter here
if "." in table_and_format:
return table_and_format.split(".", 2)
else:
return table_and_format, "html"
@inject
async def main(self, request, database, table_and_format, datasette):
# TODO: if this is actually a canned query, dispatch to it
table, format = table_and_format
is_view = bool(await database.get_view_definition(table))
table_exists = bool(await database.table_exists(table))
if not is_view and not table_exists:
raise NotFound(f"Table not found: {table}")
await check_permissions(
datasette,
request,
[
("view-table", (database.name, table)),
("view-database", database.name),
"view-instance",
],
)
private = not await datasette.permission_allowed(
None, "view-table", (database.name, table), default=True
)
pks = await database.primary_keys(table)
table_columns = await database.table_columns(table)
specified_columns = await columns_to_select(datasette, database, table, request)
select_specified_columns = ", ".join(
escape_sqlite(t) for t in specified_columns
)
select_all_columns = ", ".join(escape_sqlite(t) for t in table_columns)
use_rowid = not pks and not is_view
if use_rowid:
select_specified_columns = f"rowid, {select_specified_columns}"
select_all_columns = f"rowid, {select_all_columns}"
order_by = "rowid"
order_by_pks = "rowid"
else:
order_by_pks = ", ".join([escape_sqlite(pk) for pk in pks])
order_by = order_by_pks
if is_view:
order_by = ""
nocount = request.args.get("_nocount")
nofacet = request.args.get("_nofacet")
if request.args.get("_shape") in ("array", "object"):
nocount = True
nofacet = True
# Next, a TON of SQL to build where_params and filters and suchlike
# skipping that and jumping straight to...
where_clauses = []
where_clause = ""
if where_clauses:
where_clause = f"where {' and '.join(where_clauses)} "
from_sql = "from {table_name} {where}".format(
table_name=escape_sqlite(table),
where=("where {} ".format(" and ".join(where_clauses)))
if where_clauses
else "",
)
from_sql_params ={}
params = {}
count_sql = f"select count(*) {from_sql}"
sql_no_order_no_limit = (
"select {select_all_columns} from {table_name} {where}".format(
select_all_columns=select_all_columns,
table_name=escape_sqlite(table),
where=where_clause,
)
)
page_size = 100
offset = " offset 0"
sql = "select {select_specified_columns} from {table_name} {where}{order_by} limit {page_size}{offset}".format(
select_specified_columns=select_specified_columns,
table_name=escape_sqlite(table),
where=where_clause,
order_by=order_by,
page_size=page_size + 1,
offset=offset,
)
# Fetch rows
results = await database.execute(sql, params, truncate=True)
columns = [r[0] for r in results.description]
rows = list(results.rows)
# Fetch count
filtered_table_rows_count = None
if count_sql:
try:
count_rows = list(await database.execute(count_sql, from_sql_params))
filtered_table_rows_count = count_rows[0][0]
except QueryInterrupted:
pass
vars = {
"json": {
# THIS STUFF is from the regular JSON
"database": database.name,
"table": table,
"is_view": is_view,
# "human_description_en": human_description_en,
"rows": rows[:page_size],
"truncated": results.truncated,
"filtered_table_rows_count": filtered_table_rows_count,
# "expanded_columns": expanded_columns,
# "expandable_columns": expandable_columns,
"columns": columns,
"primary_keys": pks,
# "units": units,
"query": {"sql": sql, "params": params},
# "facet_results": facet_results,
# "suggested_facets": suggested_facets,
# "next": next_value and str(next_value) or None,
# "next_url": next_url,
"private": private,
"allow_execute_sql": await datasette.permission_allowed(
request.actor, "execute-sql", database, default=True
),
},
"html": {
# ... this is the HTML special stuff
# "table_actions": table_actions,
# "supports_search": bool(fts_table),
# "search": search or "",
"use_rowid": use_rowid,
# "filters": filters,
# "display_columns": display_columns,
# "filter_columns": filter_columns,
# "display_rows": display_rows,
# "facets_timed_out": facets_timed_out,
# "sorted_facet_results": sorted(
# facet_results.values(),
# key=lambda f: (len(f["results"]), f["name"]),
# reverse=True,
# ),
# "show_facet_counts": special_args.get("_facet_size") == "max",
# "extra_wheres_for_ui": extra_wheres_for_ui,
# "form_hidden_args": form_hidden_args,
# "is_sortable": any(c["sortable"] for c in display_columns),
# "path_with_replaced_args": path_with_replaced_args,
# "path_with_removed_args": path_with_removed_args,
# "append_querystring": append_querystring,
"request": request,
# "sort": sort,
# "sort_desc": sort_desc,
"disable_sort": is_view,
# "custom_table_templates": [
# f"_table-{to_css_class(database)}-{to_css_class(table)}.html",
# f"_table-table-{to_css_class(database)}-{to_css_class(table)}.html",
# "_table.html",
# ],
# "metadata": metadata,
# "view_definition": await db.get_view_definition(table),
# "table_definition": await db.get_table_definition(table),
},
}
# I'm just trying to get HTML to work for the moment
if format == "json":
return Response.json(dict(vars, locals=locals()), default=repr)
else:
return Response.html(repr(vars["html"]))
async def view(self, request, datasette):
return await self.main(request=request, datasette=datasette)
@hookimpl
def register_routes():
return [
(r"/t/(?P<db_name>[^/]+)/(?P<table_and_format>[^/]+?$)", Table().view),
]
async def check_permissions(datasette, request, permissions):
"""permissions is a list of (action, resource) tuples or 'action' strings"""
for permission in permissions:
if isinstance(permission, str):
action = permission
resource = None
elif isinstance(permission, (tuple, list)) and len(permission) == 2:
action, resource = permission
else:
assert (
False
), "permission should be string or tuple of two items: {}".format(
repr(permission)
)
ok = await datasette.permission_allowed(
request.actor,
action,
resource=resource,
default=None,
)
if ok is not None:
if ok:
return
else:
raise Forbidden(action)
async def columns_to_select(datasette, database, table, request):
table_columns = await database.table_columns(table)
pks = await database.primary_keys(table)
columns = list(table_columns)
if "_col" in request.args:
columns = list(pks)
_cols = request.args.getlist("_col")
bad_columns = [column for column in _cols if column not in table_columns]
if bad_columns:
raise DatasetteError(
"_col={} - invalid columns".format(", ".join(bad_columns)),
status=400,
)
# De-duplicate maintaining order:
columns.extend(dict.fromkeys(_cols))
if "_nocol" in request.args:
# Return all columns EXCEPT these
bad_columns = [
column
for column in request.args.getlist("_nocol")
if (column not in table_columns) or (column in pks)
]
if bad_columns:
raise DatasetteError(
"_nocol={} - invalid columns".format(", ".join(bad_columns)),
status=400,
)
tmp_columns = [
column for column in columns if column not in request.args.getlist("_nocol")
]
columns = tmp_columns
return columns |
OK, I managed to get a table to render! Here's the code I used - I had to copy a LOT of stuff. https://gist.github.com/simonw/281eac9c73b062c3469607ad86470eb2 I'm going to move this work into a new, separate issue. |
I'm going to see if I can come up with the simplest possible version of this pattern for the |
On revisiting https://gist.github.com/simonw/281eac9c73b062c3469607ad86470eb2 a few months later I'm having second thoughts about using But I still like the pattern as a way to resolve more complex cases like "to generate GeoJSON of the expanded view with labels, the label expansion code needs to run once at some before the GeoJSON formatting code does". So I'm going to stick with it a tiny bit longer, but maybe try to make it a lot more explicit when it's going to happen rather than having the main view methods themselves also use async DI. |
Can be part of #870 - refactoring existing views to use
register_routes()
.The text was updated successfully, but these errors were encountered: