-
-
Notifications
You must be signed in to change notification settings - Fork 690
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Complete refactor of TableView and table.html template #1518
Comments
Here's where I got to with my hacked-together initial plugin prototype - it managed to render the table page with some rows on it (and a bunch of missing functionality such as filters): https://gist.github.com/simonw/281eac9c73b062c3469607ad86470eb2 |
Ideally I'd like to execute the existing test suite against the new implementation - that would require me to solve this so I can replace the view with the plugin version though: |
I was wrong about that, you CAN over-ride default routes already. |
A (likely incomplete) list of features on the table page:
|
My goal is to break up a lot of this functionality into separate methods. These methods can be executed in parallel by So the HTML version itself needs to be re-written to use those JSON extras. |
Right now the HTML version gets to cheat - it passes through objects that are not JSON serializable, including custom functions that can then be called by Jinja. I'm interested in maybe removing this cheating - if the HTML version could only request JSON-serializable extras those could be exposed in the API as well. It would also help cleanup the kind-of-nasty pattern I use in the current |
... and while I'm doing all of this I can rewrite the templates to not use those cheating magical functions AND document the template context at the same time, refs: |
Very confused by this piece of code here: datasette/datasette/views/table.py Lines 37 to 63 in 1c13e1a
I added it in 754836e - in the new world that should probably be replaced by pure JSON. Aha - this comment explains it: #521 (comment)
The goal was to support neater custom templates like this: {% for row in display_rows %}
<h2 class="scientist">{{ row["First_Name"] }} {{ row["Last_Name"] }}</h2>
... This may be an argument for continuing to allow non-JSON-objects through to the HTML templates. Need to think about that a bit more. |
I'm going to try leaning into the |
I can definitely support this using pure-JSON - I could make two versions of the row available, one that's an array of cell objects and the other that's an object mapping column names to column raw values. |
Two new requirements inspired by work on the |
I'm also going to use the new |
Aside: is there any reason this work can't complete the long-running goal of merging the TableView and QueryView, such that most of the features available for tables become available for arbitrary queries too? I had already mentally committed to implementing facets for queries, but I just realized that filters could work too - using either a CTE or a nested query. Pagination is the one holdout here, since table pagination uses keyset pagination over a known order. But maybe arbitrary queries can only be paginated off you order them first? |
(I could experiment with merging the two tables by adding a temporary undocumented |
If I break this up into
Are there any plugin hooks that would make sense to execute in parallel? Actually there might be: I don't think |
I have a hunch that the conclusion of this experiment may end up being that the It's possible |
Rebuilding |
The tests for |
I don't think this code is necessary any more: datasette/datasette/views/table.py Lines 396 to 399 in 492f983
That dates back from when Datasette was built on top of Sanic and Sanic didn't preserve those query parameters the way I needed it to: datasette/datasette/views/table.py Lines 202 to 206 in 1f69269
|
No, removing that gave me the following test failure:
|
Idea: in JSON output include a |
Ran into a problem prototyping that hook up for handling datasette/datasette/views/table.py Lines 457 to 463 in 0663d55
Maybe change to this? class FilterArguments(NamedTuple):
where_clauses: List[str]
params: Dict[str, Union[str, int, float]]
human_descriptions: List[str]
extra_context: Dict[str, Any] That might be necessary for |
I managed to extract both |
Got a TIL out of this: https://til.simonwillison.net/pluggy/multiple-hooks-same-file |
Documentation of the new hook in the PR: https://github.com/simonw/datasette/blob/54e9b3972f277431a001e685f78e5dd6403a6d8d/docs/plugin_hooks.rst#filters_from_requestrequest-database-table-datasette |
- New `filters_from_request` plugin hook, closes #473 - Used it to extract the logic from TableView that handles `_search` and `_through` and `_where` - refs #1518 Also needed for this plugin work: simonw/datasette-leaflet-freedraw#7
These changes so far are now in the 0.60a0 alpha: https://github.com/simonw/datasette/releases/tag/0.60a0 |
I sketched out a chained SQL builder pattern that might be useful for further tidying up this code - though with the new plugin hook I'm less excited about it than I was: class TableQuery:
def __init__(self, table, columns, pks, is_view=False, prev=None):
self.table = table
self.columns = columns
self.pks = pks
self.is_view = is_view
self.prev = prev
# These can be changed for different instances in the chain:
self._where_clauses = None
self._order_by = None
self._page_size = None
self._offset = None
self._select_columns = None
self.select_all_columns = '*'
self.select_specified_columns = '*'
@property
def where_clauses(self):
wheres = []
current = self
while current:
if current._where_clauses is not None:
wheres.extend(current._where_clauses)
current = current.prev
return list(reversed(wheres))
def where(self, where):
new_cls = TableQuery(self.table, self.columns, self.pks, self.is_view, self)
new_cls._where_clauses = [where]
return new_cls
@classmethod
async def introspect(cls, db, table):
return cls(
table,
columns = await db.table_columns(table),
pks = await db.primary_keys(table),
is_view = bool(await db.get_view_definition(table))
)
@property
def sql_from(self):
return f"from {self.table}{self.sql_where}"
@property
def sql_where(self):
if not self.where_clauses:
return ""
else:
return f" where {' and '.join(self.where_clauses)}"
@property
def sql_no_order_no_limit(self):
return f"select {self.select_all_columns} from {self.table}{self.sql_where}"
@property
def sql(self):
return f"select {self.select_specified_columns} from {self.table} {self.sql_where}{self._order_by} limit {self._page_size}{self._offset}"
@property
def sql_count(self):
return f"select count(*) {self.sql_from}"
def __repr__(self):
return f"<TableQuery sql={self.sql}>" Usage: from datasette.app import Datasette
ds = Datasette(memory=True, files=["/Users/simon/Dropbox/Development/datasette/fixtures.db"])
db = ds.get_database("fixtures")
query = await TableQuery.introspect(db, "facetable")
print(query.where("foo = bar").where("baz = 1").sql_count)
# 'select count(*) from facetable where foo = bar and baz = 1' |
I think I might be able to clean up a lot of the stuff in here using the datasette/datasette/views/table.py Lines 87 to 89 in 6b1384b
The catch with that hook - https://docs.datasette.io/en/stable/plugin_hooks.html#render-cell-value-column-table-database-datasette - is that it gets called for every single cell. I don't want the overhead of looking up the foreign key relationships etc once for every value in a specific column. But maybe I could extend the hook to include a shared cache that gets used for all of the cells in a specific table? Something like this: render_cell(value, column, table, database, datasette, cache)
It's a bit of a gross hack though, and would it ever be useful for plugins outside of the default plugin in Datasette which does the foreign key stuff? If I can think of one other potential application for this No, this optimization doesn't make sense: the most complex cell enrichment logic is the stuff that does a |
I think I can move this much higher up in the method, it's a bit confusing having it half way through: datasette/datasette/views/table.py Lines 414 to 436 in 6b1384b
|
Also the whole |
New short-term goal: get facets and suggested facets to execute in parallel with the main query. Generate a trace graph that proves that is happening using |
It looks like the count has to be executed before facets can be, because the facet_class constructor needs that total count figure: datasette/datasette/views/table.py Lines 660 to 671 in 6b1384b
It's used in facet suggestion logic here: Lines 172 to 178 in ace8656
|
I wrote code to execute those in parallel using diff --git a/datasette/views/table.py b/datasette/views/table.py
index 9808fd2..ec9db64 100644
--- a/datasette/views/table.py
+++ b/datasette/views/table.py
@@ -1,3 +1,4 @@
+import asyncio
import urllib
import itertools
import json
@@ -615,44 +616,37 @@ class TableView(RowTableShared):
if request.args.get("_timelimit"):
extra_args["custom_time_limit"] = int(request.args.get("_timelimit"))
- # Execute the main query!
- results = await db.execute(sql, params, truncate=True, **extra_args)
-
- # Calculate the total count for this query
- filtered_table_rows_count = None
- if (
- not db.is_mutable
- and self.ds.inspect_data
- and count_sql == f"select count(*) from {table} "
- ):
- # We can use a previously cached table row count
- try:
- filtered_table_rows_count = self.ds.inspect_data[database]["tables"][
- table
- ]["count"]
- except KeyError:
- pass
-
- # Otherwise run a select count(*) ...
- if count_sql and filtered_table_rows_count is None and not nocount:
- try:
- count_rows = list(await db.execute(count_sql, from_sql_params))
- filtered_table_rows_count = count_rows[0][0]
- except QueryInterrupted:
- pass
-
- # Faceting
- if not self.ds.setting("allow_facet") and any(
- arg.startswith("_facet") for arg in request.args
- ):
- raise BadRequest("_facet= is not allowed")
+ async def execute_count():
+ # Calculate the total count for this query
+ filtered_table_rows_count = None
+ if (
+ not db.is_mutable
+ and self.ds.inspect_data
+ and count_sql == f"select count(*) from {table} "
+ ):
+ # We can use a previously cached table row count
+ try:
+ filtered_table_rows_count = self.ds.inspect_data[database][
+ "tables"
+ ][table]["count"]
+ except KeyError:
+ pass
+
+ if count_sql and filtered_table_rows_count is None and not nocount:
+ try:
+ count_rows = list(await db.execute(count_sql, from_sql_params))
+ filtered_table_rows_count = count_rows[0][0]
+ except QueryInterrupted:
+ pass
+
+ return filtered_table_rows_count
+
+ filtered_table_rows_count = await execute_count()
# pylint: disable=no-member
facet_classes = list(
itertools.chain.from_iterable(pm.hook.register_facet_classes())
)
- facet_results = {}
- facets_timed_out = []
facet_instances = []
for klass in facet_classes:
facet_instances.append(
@@ -668,33 +662,58 @@ class TableView(RowTableShared):
)
)
- if not nofacet:
- for facet in facet_instances:
- (
- instance_facet_results,
- instance_facets_timed_out,
- ) = await facet.facet_results()
- for facet_info in instance_facet_results:
- base_key = facet_info["name"]
- key = base_key
- i = 1
- while key in facet_results:
- i += 1
- key = f"{base_key}_{i}"
- facet_results[key] = facet_info
- facets_timed_out.extend(instance_facets_timed_out)
-
- # Calculate suggested facets
- suggested_facets = []
- if (
- self.ds.setting("suggest_facets")
- and self.ds.setting("allow_facet")
- and not _next
- and not nofacet
- and not nosuggest
- ):
- for facet in facet_instances:
- suggested_facets.extend(await facet.suggest())
+ async def execute_suggested_facets():
+ # Calculate suggested facets
+ suggested_facets = []
+ if (
+ self.ds.setting("suggest_facets")
+ and self.ds.setting("allow_facet")
+ and not _next
+ and not nofacet
+ and not nosuggest
+ ):
+ for facet in facet_instances:
+ suggested_facets.extend(await facet.suggest())
+ return suggested_facets
+
+ async def execute_facets():
+ facet_results = {}
+ facets_timed_out = []
+ if not self.ds.setting("allow_facet") and any(
+ arg.startswith("_facet") for arg in request.args
+ ):
+ raise BadRequest("_facet= is not allowed")
+
+ if not nofacet:
+ for facet in facet_instances:
+ (
+ instance_facet_results,
+ instance_facets_timed_out,
+ ) = await facet.facet_results()
+ for facet_info in instance_facet_results:
+ base_key = facet_info["name"]
+ key = base_key
+ i = 1
+ while key in facet_results:
+ i += 1
+ key = f"{base_key}_{i}"
+ facet_results[key] = facet_info
+ facets_timed_out.extend(instance_facets_timed_out)
+
+ return facet_results, facets_timed_out
+
+ # Execute the main query, facets and facet suggestions in parallel:
+ (
+ results,
+ suggested_facets,
+ (facet_results, facets_timed_out),
+ ) = await asyncio.gather(
+ db.execute(sql, params, truncate=True, **extra_args),
+ execute_suggested_facets(),
+ execute_facets(),
+ )
+
+ results = await db.execute(sql, params, truncate=True, **extra_args)
# Figure out columns and rows for the query
columns = [r[0] for r in results.description] Here's the trace for |
The reason they aren't showing up in the traces is that traces are stored just for the currently executing Lines 13 to 25 in ace8656
This is so traces for other incoming requests don't end up mixed together. But there's no current mechanism to track async tasks that are effectively "child tasks" of the current request, and hence should be tracked the same. https://stackoverflow.com/a/69349501/6083 suggests that you pass the task ID as an argument to the child tasks that are executed using |
Did this in the 1.0 alphas. |
Split from #878. The current
TableView
class is by far the most complex part of Datasette, and the most difficult to work on: https://github.com/simonw/datasette/blob/0.59.2/datasette/views/table.pyIn #878 I started exploring a new pattern for building views. In doing so it became clear that
TableView
is the first beast that I need to slay - if I can refactor that into something neat the pattern for building other views will emerge as a natural consequence.I've been trying to build this as a
register_routes()
plugin, as originally suggested in #870 - though unfortunately it looks like those plugins can't replace existing Datasette default views at the moment, see #1517. [UPDATE: I was wrong about this, plugins can over-ride default views just fine]I also know that I want to have a fully documented template context for
table.html
as a major step on the way to Datasette 1.0, see #1510.All of this adds up to the
TableView
factor being a major project that will unblock a whole flurry of other things - so I'm going to work on that in this separate issue.The text was updated successfully, but these errors were encountered: