Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

in Search API show fileCount for datasets #6601 #6623

Merged
merged 1 commit into from
Feb 19, 2020
Merged

Conversation

pdurbin
Copy link
Member

@pdurbin pdurbin commented Feb 6, 2020

What this PR does / why we need it:

Search API users have requested counts of files in datasets.

Which issue(s) this PR closes:

Closes #6601

Special notes for your reviewer:

Despite the noise I made in the issue about reindexing, I went for a database hit, nestled in several lines of other database hits. We already have the datasetVersion in our hands. It seemed pretty quick with my local testing, admittedly only a few files.

Suggestions on how to test this:

Look for "fileCount" in the response for datasets. It might be nice to test a dataset with many files to see if it's slow. A zero is returned, as expected, with no files.

Does this PR introduce a user interface change?:

No.

Is there a release notes update needed for this change?:

I didn't bother since it seems like a small feature. I'm reminded that some day we might want to keep an API change log. See also http://guides.dataverse.org/en/4.19/api/faq.html#is-there-a-changelog-of-api-functionality-that-has-been-added-over-time

Additional documentation:

None. I did update the JSON examples in the Search API docs to show "fileCount".

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.0003%) to 19.459% when pulling 055fa5f on 6601-file-count into c4e6065 on develop.

Copy link
Contributor

@landreev landreev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant to ask about the decision not to index the number.
Are we always guaranteed to have the dataset/datasetversion db objects instantiated? SearchServiceBean.search() has the boolean retrieveEntities; are we ever calling it with this set to FALSE? If it does get called that way, and the entity is not instantiated, the code in this PR will be safe, it looks like (it will be skipped, so no null pointer exception...)
I guess my question is, are we ok with the fact that the fileCount field is not going to be available if search() is called with the "do not instantiate" option? Is it ever called that way?
[I guess the answer is, we DO always instantiate the entity when search is called in the API context; and this issue is for adding fileCont to the API output only - and not to the search cards on the page?]

Instantiating entity objects IS expensive; I recall we made an effort in the past to keep everything we needed in the search result object indexed, and to avoid instantiating entities, or at least to avoid using unoptimized .find() methods to do so. It appears that we have abandoned that approach since - and it may be worth revisiting. (This is of course outside the scope for this issue!)

@pdurbin
Copy link
Member Author

pdurbin commented Feb 10, 2020

Are we always guaranteed to have the dataset/datasetversion db objects instantiated?

Yes, we are. Here's how we talk about this in https://github.com/IQSS/dataverse/releases/tag/v4.19

'The boolean parameter query_entities has been removed from the Search API. The former "true" behavior of "whether entities are queried via direct database calls (for developer use)" is now always true.'

This change was introduced in pull request #6441

@landreev
Copy link
Contributor

OK, I remember it now.
We do have some hacky/optimized methods for instantiating entities with custom queries, instead of relying on the expensive .find() methods (we use them to populate the information in the search cards on the dataverse page, for example). We may consider using them for the Search API as well... but it may require some research, or at least testing.
Definitely outside the scope for this.
Thanks.

Yes, we are. Here's how we talk about this in https://github.com/IQSS/dataverse/releases/tag/v4.19

'The boolean parameter query_entities has been removed from the Search API. The former "true" behavior of "whether entities are queried via direct database calls (for developer use)" is now always true.'

This change was introduced in pull request #6441

@kcondon kcondon self-assigned this Feb 10, 2020
@kcondon kcondon merged commit 638ca77 into develop Feb 19, 2020
@kcondon kcondon deleted the 6601-file-count branch February 19, 2020 16:53
@djbrooke djbrooke added this to the 4.20 milestone Feb 19, 2020
@pdurbin
Copy link
Member Author

pdurbin commented Nov 9, 2022

I meant to ask about the decision not to index the number.

Probably we should have indexed the number (of files, fileCount). We now have an open issue about this:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Dataset file count in search results from API
5 participants