-
-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a new references
builder
#12190
base: master
Are you sure you want to change the base?
Add a new references
builder
#12190
Conversation
(cc also @webknjaz, as I can't add you as a reviewer) |
(test failure is likely because of a side effect) |
yeh hmm works locally (when calling the singular test), but I perhaps I can't "piggy-back" on the existing |
anyway, whilst I fix that, interested to hear your thoughts |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A bit of comments (I'll be less available from now)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For new files like that, could we have explicit __all__
(empty by default if possible, since we don't really know what should be public or not).
|
||
class LocalReference(TypedDict, total=False): | ||
type: Literal['local'] | ||
document: str |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, we use document for something else I think. Would it be possible to use docname
instead? (I didn't see yet but is it a full path or not?). If so, I'd suggest path
instead of filepath
. Because document
is generally.. the document node.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair, its the full path as thats more helpful for users, so yeh could change to filepath
from __future__ import annotations | ||
|
||
import json | ||
from os import path |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you use os.path
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Actually, it's essentially to reduce the possibility of having a variable shadowing the import)
data[domainname][otype_name] = {'items': {}} | ||
if otype := domain.object_types.get(otype_name): | ||
data[domainname][otype_name]['roles'] = list(otype.roles) | ||
data.setdefault(domainname, {}).setdefault(otype_name, {})['items'].setdefault( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
data.setdefault(domainname, {}).setdefault(otype_name, {})['items'].setdefault( | |
data[domainname].setdefault(otype_name, {})['items'].setdefault( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or better: use intermediate variables here (because it's a bit hard to parse).
'url': url, | ||
} | ||
# only add dispname if it is set and not the same as name | ||
if not (dispname == name or not dispname or dispname == '-'): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as above
otype := local_domain.object_types.get(otype_name) | ||
): | ||
data[domainname][otype_name]['roles'] = list(otype.roles) | ||
data.setdefault(domainname, {}).setdefault(otype_name, {})[ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
idem
# test the content of the reference file | ||
content = (app.outdir / 'references.json').read_text('utf-8') | ||
data = json.loads(content) | ||
assert data == { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, this one might need a factory for the sake of readability.
# test the content of the reference file | ||
content = (app.outdir / 'references.json').read_text('utf-8') | ||
data = json.loads(content) | ||
assert data == { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
idem
If you are worried about that, use |
Thanks for the review @picnixz, but perhaps I could nudge you for some quick general feedback on the concept 😅 |
Read through the new But, here are some other thoughts. Reaction to the 'generate a complete local & remote references list' idea --- +0.25.It might be helpful having all targets, local or intersphinx, in one artifact? But after thinking about it, I don't think it's very important to me, personally---and, it seems to me the bigger problem is the object-type lossiness of the current v2 I think I would rather have better/more accurate information about the targets in my intersphinx-referenced docsets---which would require a new inventory format, as best I figure---than a list of all local and remote references, where the info I get on the remote references in that all-in-one artifact requires as much work to transform into a working cross-reference as the info I can get out of If I'm trying to reference something in another project, I know which project it is, and I don't mind pointing a single-docset tool at that project's docs. (And, there's a good chance I might prefer that single-docset tool if I don't have to mess with an intermediate data file as part of the process.) The 'all in one place' aspect of this may have a broad appeal, but it's less important to me, personally. Reaction to the layout of
|
I'll comment tomorrow (for this one, I need a bit of sleep) |
A core problem is the use of Currently |
Since we are talking about a new Intersphinx format, I would like you to also think about how to serialize the entries in the inventories, especially concerning #11932. After reading Jakob's argument, I also think that domains should be responsible for serializing their intersphinx part however they see fit. It could also solve multiple issues that I could not necessarily find when implementing #11932 but if each domain knows how to properly represent their references in intersphinx, it would be better. In addition, we could change the format of a specific domain (e.g., if there are bugs) without affecting the format of other domains. I suggest using the same approach as for ELF where there is a header section containing the location of each program section. Then each domain would serialize its own intersphinx inventory and intersphinx would only be responsible for merging the parts together. Then, each domain would deserialize its dedicated section and recover its references mapping. The references builder you are suggesting would be responsible to normalize each domain output in a more human-readable format. In the JSON output, you would include "human-readable" entries + an offset and buffer size to the serialized data in the objects.inv binary file. That way, you can use it to recover a single referencable entity, and using in a standalone manner as well. |
See https://github.com/orgs/sphinx-doc/discussions/12204 |
This PR add a new builder
references
, to build a singlereferences.json
, which provides a mapping for almost* all targets available to reference in the project, including:objects.inv
configured viaintersphinx_mapping
(when using thesphinx.ext.intersphinx
extension)* I say almost, because this assumes the objects returned from
domain.get_objects
account for the majority of referencable items in a project, but there are currently some notable exceptions, like themath
domain not returning any (but that is for another PR to fix)This partialy addresses #12152, to allow for a clear way for users to understand:
Crucially, the
references.json
includes the mapping ofobject type
torole names
(this can be one-to-many),since a role name is required for the reference syntax, not the object type.
I would also invisage other tools (like VS Code extension) could utilise this, to provide things like auto-completions, and "jump to target/references"
Some considerations:
I feel a builder is really the only way to do this comprehensively; having a standalone CLI (like the current
python -m sphinx.ext.intersphinx
) can only get you so far, and then you will have to start re-implementing features of a normal sphinx build (like reading configuration, etc)Perhaps in a follow up PR I could introduce a complimentary CLI, that reads the
references.json
and allows users to quickly generate references. Something likesphinx-ref find 're.Match'
returning:class:`~re.Match`
(i.e https://github.com/orgs/sphinx-doc/discussions/12152#discussioncomment-8862652)There are cases where an
object type
has no matchingrole names
, this PR is not addressing that (although I want to eventually)As I mention in Make intersphinx (a.k.a. external references) more user friendly #12152, it would be ideal for this to include, not just the document path where a local target is defined, but also the line number (if available). But this is not within the scope of this PR
Creating a singular
references.json
is probably the simplest way to do this. But, it could get rather large, for a large project, or one with lots of intersphinx mappings.Is this ok, or do we think another format would be better, like one JSON file per domain / object type, or even something like an sqlite database file?
The other non included in this PR, is any additions to the documention, I could do this here or in a follow-up PR