Deduplicate dependencies for HTMLTextDocument #72

Merged
merged 2 commits into from
Nov 9, 2023
3 changes: 3 additions & 0 deletions CHANGELOG.md
@@ -9,9 +9,12 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
## [UNRELEASED]

* Changed the type annotation of `_add_ws` from `bool` to `TagAttrValue`. This makes it easier to write functions which call `Tag` functions and pass along `**kwargs`. (#67)

* Changed the type annotation of `collapse_` from `str` to `str | float | None`. This makes it easier to write calls to `css()` that pass along `**kwargs`. (#68)

* Enhanced the type definition of `TagAttrs` to include `TagAttrDict`, the type of a `Tag`'s `attrs` property. (#55)

* For `HTMLTextDocument` objects, deduplicate HTML dependencies. (#72)

## [0.4.1] 2023-10-30

8 changes: 7 additions & 1 deletion htmltools/_core.py
@@ -1157,7 +1157,13 @@ def _static_extract_serialized_html_deps(
    # HTMLdependency.get_tag_representation()
    pattern = r'<script type="application/json" data-html-dependency="">((?:.|\r|\n)*?)</script>'
    dep_strs = re.findall(pattern, html)
    # html = re.sub(pattern, "", html)
    # Deduplicate dependencies. htmltools normally would dedupe dependencies, but
    # with HTMLTextDocuments, the input HTML would usually have been generated by
    # something else (like Quarto) and may not have the dependencies deduped.
    dep_strs = list(set(dep_strs))

    # Remove the serialized HTML dependencies from the HTML string
    html = re.sub(pattern, "", html)

    deps: list[HTMLDependency] = []
    for dep_str in dep_strs:
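
The extract, dedupe, and strip steps in this hunk can be tried in isolation with only the standard library. The following is a minimal sketch, not the htmltools implementation: `extract_dep_strings` and the sample JSON payloads are hypothetical, and only the regular expression is copied from the diff. It also uses `dict.fromkeys()` instead of `set()`, on the assumption that first-seen order may be worth keeping; the PR itself uses `list(set(dep_strs))`.

```python
import re

# Same pattern as in the diff; the capture group holds the JSON payload.
PATTERN = r'<script type="application/json" data-html-dependency="">((?:.|\r|\n)*?)</script>'


def extract_dep_strings(html: str) -> tuple[str, list[str]]:
    """Hypothetical helper: return (html without dep tags, unique payloads)."""
    dep_strs = re.findall(PATTERN, html)
    # dict.fromkeys() drops exact duplicates and keeps first-seen order.
    # list(set(...)) also drops duplicates, but its order is arbitrary.
    unique = list(dict.fromkeys(dep_strs))
    # Remove every serialized dependency tag from the HTML string.
    stripped = re.sub(PATTERN, "", html)
    return stripped, unique


# Made-up input: "testdep" is embedded twice, as Quarto-generated HTML might do.
tag = '<script type="application/json" data-html-dependency="">{}</script>'
html = (
    "<div>hello world</div>"
    + tag.format('{"name": "testdep"}')
    + tag.format('{"name": "testdep2"}')
    + "<div>hello again</div>"
    + tag.format('{"name": "testdep"}')
)

stripped, deps = extract_dep_strings(html)
print(deps)
print(stripped)
```

Both approaches compare the raw payload strings, so only byte-identical serializations collapse into one entry.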
36 changes: 36 additions & 0 deletions tests/test_html_document.py
@@ -291,14 +291,50 @@ def test_json_roundtrip():
                div("hello world", testdep),
                # Also make sure it would work even with indents
                ht.HTML(testdep2.serialize_to_script_json(indent=2)),
                # Add another copy of testdep, explicitly serialized to script json.
                # Normally htmltools will dedupe dependencies when .render() is called,
                # but we do this here because when these deps are embedded in a Quarto
                # document, Quarto can add each dep independent of the others and
                # therefore have duplicates. Since we're using .render, to get
                # duplicates, we need to force the duplication.
                #
                div(
                    "hello again",
                    ht.HTML(testdep.serialize_to_script_json()),
                ),
            ]
        )

        # Get a string representation which hasn't been passed through
        # HTMLTextDocument().
        x_str = str(x)

        # Make sure that we successfully forced testdep to show up twice in the HTML,
        # before we pass it to HTMLTextDocument() and call .render().
        assert x_str.count('"name": "testdep"') == 2

        # Make sure that there are three of these HTML dependency script tags.
        assert (
            x_str.count('<script type="application/json" data-html-dependency="">') == 3
        )

        rendered = ht.HTMLTextDocument(
            x_str, deps_replace_pattern='<meta data-foo="">'
        ).render()

        # Make sure both deps are present.
        assert "testdep" in [d.name for d in rendered["dependencies"]]
        assert "testdep2" in [d.name for d in rendered["dependencies"]]

        # Make sure testdep was deduplicated by HTMLTextDocument().render().
        assert rendered["dependencies"].count(testdep) == 1
        assert len(rendered["dependencies"]) == 2

        # Make sure the HTML dependency script tags were stripped out.
        assert (
            '<script type="application/json" data-html-dependency="">'
            not in rendered["html"]
        )

    finally:
        ht.html_dependency_render_mode = old_mode
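
The test serializes one dependency with `indent=2` and the others without. String-level deduplication treats two serializations of the same dependency as different if their whitespace differs. The sketch below shows one way to compare parsed JSON instead. It is an illustration only: `dedupe_by_parsed_json` and the sample payload are hypothetical, and the PR does not establish whether htmltools needs this extra step.

```python
import json


def dedupe_by_parsed_json(dep_strs: list[str]) -> list[dict]:
    """Hypothetical helper: dedupe payloads that differ only in formatting."""
    seen: dict[str, dict] = {}
    for dep_str in dep_strs:
        obj = json.loads(dep_str)
        # Canonical key: same content gives the same string, whatever the
        # original indentation or key order was.
        key = json.dumps(obj, sort_keys=True)
        seen.setdefault(key, obj)
    return list(seen.values())


# Made-up payload, serialized compactly and with indent=2.
compact = '{"name": "testdep2", "version": "1.0"}'
indented = json.dumps(json.loads(compact), indent=2)

# As raw strings the two differ, so a set keeps both.
raw_unique = set([compact, indented])

# After parsing, they collapse to a single entry.
result = dedupe_by_parsed_json([compact, indented])
print(len(raw_unique), len(result))
```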