Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add modeling for intersphinx data #5289

Merged
merged 34 commits into from
Feb 27, 2019
Merged

Add modeling for intersphinx data #5289

merged 34 commits into from
Feb 27, 2019

Conversation

ericholscher
Copy link
Member

@ericholscher ericholscher commented Feb 13, 2019

This adds modeling for Intersphinx data. It will be automatically updated on build, so that we have it indexed. The goal here will be to add this to search results, but I broke up the PR's to make them easier to review and merge.

Closes #4767

@ericholscher ericholscher requested a review from a team February 13, 2019 19:05
@safwanrahman
Copy link
Member

I was thinking about relating the domain data with HTMLFile object. @ericholscher what do you think?

readthedocs/domaindata/models.py Outdated Show resolved Hide resolved
)
version = models.ForeignKey(Version, verbose_name=_('Version'),
related_name='domain_data')
modified_date = models.DateTimeField(_('Publication date'), auto_now=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could start using this standard #4864

version = models.ForeignKey(Version, verbose_name=_('Version'),
related_name='domain_data')
modified_date = models.DateTimeField(_('Publication date'), auto_now=True)
commit = models.CharField(_('Commit'), max_length=255)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need this? I think we already have this from the version 🤔

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, a file can be removed from a specific commit, and we need to track this to delete it.

'''

@property
def doc_type(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if this is the correct name, I wasn't able to find something that could be used instead. I mean, this is just the full object not, the type of document.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is more like a "role name" or "directive name" in the Sphinx language as I can read from their docs. Considering that this is used as .. py:function:: or py:function:'spam'.

Maybe "role_name" is a better name.

readthedocs/projects/tasks.py Outdated Show resolved Hide resolved
for name, einfo in sorted(invdata[key].items()):
url = einfo[2]
if '#' in url:
doc_name, anchor = url.split('#')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could use urlparse https://docs.python.org/3/library/urllib.parse.html, but the logic here is very simple, so not really sure.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, feels unnecessary. It's also not a full URL, so it might be confused by it.

@ericholscher
Copy link
Member Author

ericholscher commented Feb 20, 2019

I was thinking about relating the domain data with HTMLFile object. @ericholscher what do you think?

@safwanrahman Can you say a bit more about this? They aren't representing a file that we have on disk, so I don't think it's a good match. It could be useful to optionally have them point to an HTMLFile on the model, but I'm not sure what the value would be?

@ericholscher ericholscher requested a review from a team February 20, 2019 19:40
Copy link
Member

@humitos humitos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good and I think this has potential. Although, I'm not 100% sure how or where you are thinking to use this.

I left some Python style comments that I think can be considered.

As I understand this PR, we are converting the objects.inv into data on our db. If I'm correct, won't this be a huge amount of data going to our db that could potentially cause an issue?

from .models import DomainData


class DomainDataAdmin(admin.ModelAdmin):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DomainData name conflicts with our Domain model to me.

Considering that we are talking about Sphinx Domains here, I'd name all of these ones (including the Django App) as SphinxDomain. I think it will be better to get the meaning immediately when re reading about this.


project = models.ForeignKey(
Project,
related_name='domain_data',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, we will have project.domains and project.domain_data. I think it will be clearer as project.domains (current relationship) and project.sphinx_domains (or project.sphinxdomains), IMO.

def __str__(self):
return f'''
DomainData [{self.project.slug}:{self.version.slug}]
[{self.domain}:{self.type}] {self.name} -> {self.doc_name}#{self.anchor}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably be wrapped.

'''

@property
def doc_type(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is more like a "role name" or "directive name" in the Sphinx language as I can read from their docs. Considering that this is used as .. py:function:: or py:function:'spam'.

Maybe "role_name" is a better name.

return f'{self.domain}:{self.type}'

@property
def doc_url(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd name it as docs_url to follow the same pattern that we have in other places.

invdata = intersphinx.fetch_inventory(MockApp(), '', object_file)
for key in sorted(invdata or {}):
domain, _type = key.split(':')
for name, einfo in sorted(invdata[key].items()):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't it clearer to use .items() in the first loop here?

for key, value in invdata.items():
    domain, _type = key.split(':')
    for name, einfo in value:

log.warning('Sphinx MockApp: %s', msg)

invdata = intersphinx.fetch_inventory(MockApp(), '', object_file)
for key in sorted(invdata or {}):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why these two loops need to be sorted?

for key in sorted(invdata or {}):
domain, _type = key.split(':')
for name, einfo in sorted(invdata[key].items()):
url = einfo[2]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd add a small comment with previous to this line, with the expected structure of this einfo object. Like

# (key, name, url, doc type, etc)
# ('py', 'function', '/some-nice-url/python-function/', 'func', ...)

or... a link to the docs.

I found debugging this complicated later.

else:
doc_name, anchor = url, ''
display_name = einfo[3]
obj, _ = DomainData.objects.get_or_create(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this will fail the first time because commit is not passed here and it's not marked as null=True.

'name',
'display_name',
'doc_type',
'doc_url',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose this is the most important field we want for the use case you mentioned, as we will want to embed the section when hovering with the mouse or something like that, right?

@safwanrahman
Copy link
Member

I was thinking about relating the domain data with HTMLFile object. @ericholscher what do you think?

@safwanrahman Can you say a bit more about this? They aren't representing a file that we have on disk, so I don't think it's a good match. It could be useful to optionally have them point to an HTMLFile on the model, but I'm not sure what the value would be?

@ericholscher If each of the domain data could pointed to a HTMLFile object, we can index the HTMLFile and DomainData object together in Elasticsearch. So while searching, we can show results based on the page and the section.
In current situation, as there are no relation among them, we need to index them separately and its not possible to show a user about the subsection in search results.

@@ -364,7 +364,7 @@ def setUp(self):
'api_webhook_gitlab': {'status_code': 405},
'api_webhook_bitbucket': {'status_code': 405},
'api_webhook_generic': {'status_code': 403},
'domaindata-detail': {'status_code': 404},
'sphinx_domains-detail': {'status_code': 404},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it doesn't matter too much, but to keep consistency here with the other endpoints, I'd say that it should be called sphinxdomains-detail (without the _).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is autogenerated by REST framework.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. It also shouldn't be plural. Fixed.

@ericholscher ericholscher requested review from humitos, stsewd and a team and removed request for humitos February 26, 2019 18:07
@ericholscher
Copy link
Member Author

Would love to get this merged for deploy this week as well, if folks have time to review.

stsewd
stsewd previously approved these changes Feb 26, 2019
readthedocs/projects/tasks.py Outdated Show resolved Hide resolved
humitos
humitos previously approved these changes Feb 27, 2019
Copy link
Member

@humitos humitos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's ready to merge to me.

Just wanted to re-raise my comment about consistency on the DRF URLs. I'm not blocking on that, anyway but I think it worth to change it if possible.

@@ -364,7 +364,7 @@ def setUp(self):
'api_webhook_gitlab': {'status_code': 405},
'api_webhook_bitbucket': {'status_code': 405},
'api_webhook_generic': {'status_code': 403},
'domaindata-detail': {'status_code': 404},
'sphinx_domains-detail': {'status_code': 404},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ericholscher ericholscher dismissed stale reviews from humitos and stsewd via 4ea9121 February 27, 2019 12:50
@ericholscher ericholscher merged commit 1fd489c into master Feb 27, 2019
@delete-merged-branch delete-merged-branch bot deleted the intersphinx-modeling branch February 27, 2019 13:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants