feature: Keep data-attributes of spans #41

Closed
pawamoy opened this issue Feb 23, 2024 · 4 comments · Fixed by #42

pawamoy commented Feb 23, 2024

Is your feature request related to a problem? Please describe.

I'd like to take advantage of mkdocs-material's instant previews, by adding data-preview to autorefs spans generated by mkdocstrings-python. Unfortunately, autorefs matches spans with a regex, and this regex is strict and only matches spans with exactly one data-autorefs-* attribute.

Describe the solution you'd like

I'd like autorefs to allow other data-* attributes in its spans, and to carry them over to the anchors when it transforms the spans.

Describe alternatives you've considered

/

Additional context

squidfunk/mkdocs-material#6704

@pawamoy pawamoy self-assigned this Feb 23, 2024

pawamoy commented Feb 23, 2024

Quick solution:

AUTO_REF_RE = re.compile(
    r"<span data-(?P<kind>autorefs-identifier|autorefs-optional|autorefs-optional-hover)="
    r'("?)(?P<identifier>[^"<>]*)\2(?P<attrs> [^>]*)?>(?P<title>.*?)</span>',
    flags=re.DOTALL,
)

>>> from mkdocs_autorefs.references import AUTO_REF_RE
>>> AUTO_REF_RE.search('<span data-autorefs-identifier="hey" data-preview data-preview="0" data-preview>hello</span>').groupdict()
{'kind': 'autorefs-identifier', 'identifier': 'hey', 'attrs': ' data-preview data-preview="0" data-preview', 'title': 'hello'}

The data-autorefs-* attribute must still appear first. The regex now also captures everything after this first attribute and before the closing >, and this captured group can then be reinjected as-is into the anchors.
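
To illustrate the reinjection step, here is a minimal sketch of a substitution callback built on the regex above. The `URL_MAP` dictionary and the `replace_span` function are hypothetical names made up for this example; they are not the actual mkdocs-autorefs code, which resolves URLs differently.

```python
import re

# Regex from the comment above.
AUTO_REF_RE = re.compile(
    r"<span data-(?P<kind>autorefs-identifier|autorefs-optional|autorefs-optional-hover)="
    r'("?)(?P<identifier>[^"<>]*)\2(?P<attrs> [^>]*)?>(?P<title>.*?)</span>',
    flags=re.DOTALL,
)

# Hypothetical identifier -> URL mapping, for illustration only.
URL_MAP = {"hey": "api/#hey"}


def replace_span(match: re.Match) -> str:
    """Turn one matched span into an anchor, keeping the extra attributes."""
    url = URL_MAP.get(match["identifier"])
    if url is None:
        # Unknown identifier: leave the span untouched in this sketch.
        return match.group(0)
    # The group is None when the span has no extra attributes.
    attrs = match["attrs"] or ""
    return f'<a href="{url}"{attrs}>{match["title"]}</a>'


html = '<span data-autorefs-identifier="hey" data-preview>hello</span>'
result = AUTO_REF_RE.sub(replace_span, html)
print(result)
```

Because the captured group already starts with a space, it can be appended directly after the href attribute without any further parsing.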

pawamoy commented Feb 23, 2024

A bigger change, which could make future changes and features more robust, would be to use a custom tag to delimit auto-references, something like <autoref ...>...</autoref>. With this, it becomes easy to match autorefs as plain strings and parse their attributes with a custom HTML parser:

AUTO_REF_RE = re.compile(r"<autoref (?P<attrs>.*?)>(?P<title>.*?)</autoref>")

from html.parser import HTMLParser
class AttrsParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.attrs = []

    def parse(self, html):
        self.attrs.clear()
        self.feed(html)
        return self.attrs

    def handle_starttag(self, tag, attrs):
        self.attrs.extend(attrs)


# for each match, build f"<a {match.group("attrs")}></a>" and pass it to the parser
AttrsParser().parse('<a data-preview data-identifier="pathlib.Path" data-other="0">some title</a>')
# [('data-preview', None), ('data-identifier', 'pathlib.Path'), ('data-other', '0')]

I don't expect much impact on performance, since we'd only parse the auto-reference attributes and nothing else.

This change would also let us keep complex HTML inside the autoref tag (see #40).
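
Putting the two pieces together, a rough end-to-end sketch of the custom-tag idea could look like the following. The `identifier` attribute name, the `URL_MAP` dictionary, and the `replace_autoref` function are assumptions made for this example only; nothing here reflects an existing mkdocs-autorefs API.

```python
import re
from html import escape
from html.parser import HTMLParser

AUTOREF_TAG_RE = re.compile(
    r"<autoref (?P<attrs>.*?)>(?P<title>.*?)</autoref>",
    flags=re.DOTALL,
)


class AttrsParser(HTMLParser):
    """Collect the attributes of every start tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.attrs = []

    def parse(self, html):
        self.attrs = []
        self.feed(html)
        return self.attrs

    def handle_starttag(self, tag, attrs):
        self.attrs.extend(attrs)


# Hypothetical identifier -> URL mapping, for illustration only.
URL_MAP = {"pathlib.Path": "https://docs.python.org/3/library/pathlib.html#pathlib.Path"}


def replace_autoref(match):
    """Rewrite one <autoref> element as an anchor, keeping the other attributes."""
    # Wrap the raw attribute string in a dummy tag so the parser can read it.
    attrs = dict(AttrsParser().parse(f"<a {match['attrs']}>"))
    identifier = attrs.pop("identifier", None)  # assumed attribute name
    url = URL_MAP.get(identifier)
    if url is None:
        return match.group(0)  # unresolved: leave the element untouched
    extra = "".join(
        f" {name}" if value is None else f' {name}="{escape(value)}"'
        for name, value in attrs.items()
    )
    return f'<a href="{escape(url)}"{extra}>{match["title"]}</a>'


source = '<autoref identifier="pathlib.Path" data-preview><code>Path</code></autoref>'
result = AUTOREF_TAG_RE.sub(replace_autoref, source)
print(result)
```

Note that the title is passed through unchanged, so nested markup such as the code element survives. An attribute value containing > would still confuse the regex in this sketch.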

oprypin commented Feb 23, 2024

The regex approach sounds good to me.
I only had the urge to tweak some (even pre-existing) parts of that regex 😅

AUTO_REF_RE = re.compile(
    r"<span data-(?P<kind>autorefs-(?:identifier|optional|optional-hover))="
    r'("?)(?P<identifier>[^"<>]+)\2(?P<attrs> [^<>]+)?>(?P<title>.*?)</span>',
    flags=re.DOTALL,
)
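
As a quick check of what these tweaks change in practice, the sketch below compares the two patterns on a few inputs. The names `OLD_RE` and `NEW_RE` are labels chosen for this example only.

```python
import re

# Pattern from the earlier "quick solution" comment.
OLD_RE = re.compile(
    r"<span data-(?P<kind>autorefs-identifier|autorefs-optional|autorefs-optional-hover)="
    r'("?)(?P<identifier>[^"<>]*)\2(?P<attrs> [^>]*)?>(?P<title>.*?)</span>',
    flags=re.DOTALL,
)

# Tweaked pattern from this comment.
NEW_RE = re.compile(
    r"<span data-(?P<kind>autorefs-(?:identifier|optional|optional-hover))="
    r'("?)(?P<identifier>[^"<>]+)\2(?P<attrs> [^<>]+)?>(?P<title>.*?)</span>',
    flags=re.DOTALL,
)

# A regular span: both patterns capture the same groups.
regular = '<span data-autorefs-optional-hover="pathlib.Path" data-preview>Path</span>'
same_groups = OLD_RE.search(regular).groupdict() == NEW_RE.search(regular).groupdict()

# An empty identifier: accepted by the old pattern, rejected by the new one.
empty = '<span data-autorefs-identifier="">x</span>'
old_accepts_empty = OLD_RE.search(empty) is not None
new_accepts_empty = NEW_RE.search(empty) is not None

# A "<" inside the extra attributes: accepted by the old pattern only.
angle = '<span data-autorefs-identifier="hey" data-x="<">x</span>'
old_accepts_angle = OLD_RE.search(angle) is not None
new_accepts_angle = NEW_RE.search(angle) is not None

print(same_groups, old_accepts_empty, new_accepts_empty, old_accepts_angle, new_accepts_angle)
```

In short, the tweaked pattern keeps the same named groups for well-formed spans while refusing empty identifiers and stray angle brackets in the trailing attributes.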

To get you the rest of the way there, you can still use this AttrsParser in a slightly creative way

pawamoy commented Feb 23, 2024

> To get you the rest of the way there, you can still use this AttrsParser in a slightly creative way

This wouldn't be needed as we already got the kind and identifier from the regex. The rest (attrs) can just be copy-pasted into the anchor 🙂
