Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

4230 search query suggested changes #4479

Draft
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

grossir
Copy link
Contributor

@grossir grossir commented Sep 21, 2024

Some details and questions:

  1. I didn't want to commit directly on Sergei's branch, so I intended for this to be PR over that branch, but I see Github has interpreted it as a PR on main, even when I branched from it

  2. Setting on_delete=models.CASCADE to the User FK follows this comment from the original issue:

When accounts are deleted, the history must be nuked, not just anonymized.

  1. Should we really use AbstractDateTimeModel as base class for SearchQuery? Are we ever going to modify the SearchQuery once it happens? In case we don't, there is no use for the date_modified field and it's index; and expecting this new table to have so many rows, we would save some space / speed

  2. About computing the "query_time_ms" using time.perf_counter_ns, it seems to be the most precise. Check the section in Python docs https://docs.python.org/3/library/time.html#time.perf_counter.

  3. Also, I noted that with these changes, we will only be capturing search queries from the main homepage/search for Case Law; it won't log queries for RECAP archive, citation search, parenthetical search, etc. Is it of interes to log that?

  4. I took some freedom to refactor the search.views.show_results function, it could use early returns to reduce indentation levels, which were confusing since the if/else clauses are so long.

  5. I don't know how to compute the hit_cache field, we should probably ask Alberto how to know if ES or Solr were using the cache. I see that the do_es_search functions have a cache argument, but it's not used on the show_results function. For now, I have set it to False always

legaltextai and others added 2 commits September 18, 2024 20:59
 changes made to models.py and views.py to save user queries

 Fixes: #4230
Copy link

sentry-io bot commented Sep 21, 2024

🔍 Existing Issues For Review

Your pull request is modifying functions with the following pre-existing issues:

📄 File: cl/search/views.py

Function Unhandled Issue
show_results OpinionCluster.MultipleObjectsReturned: get() returned more than one OpinionCluster -- it returned 2! ...
Event Count: 1

Did you find this useful? React with a 👍 or 👎

Comment on lines +501 to +505
"{path}?next={next}{encoded_params}".format(
path=reverse("sign-in"),
next=request.path,
encoded_params=quote(f"?{request.GET.urlencode()}"),
)

Check warning

Code scanning / CodeQL

URL redirection from remote source Medium

Untrusted URL redirection depends on a
user-provided value
.

Copilot Autofix AI 11 days ago

Copilot could not generate an autofix suggestion

Copilot could not generate an autofix suggestion for this alert. Try pushing a new commit or if the problem persists contact support.

- Add tests for SearchQuery creation
- Update user ForeignKey to on_delete=models.CASCADE, to match specs
@grossir grossir force-pushed the 4230_search_query_suggested_changes branch from f3d2abf to 4652fe6 Compare September 21, 2024 01:07
@mlissner
Copy link
Member

Should we really use AbstractDateTimeModel as base class for SearchQuery?

Yeah, you're right. Let's just do a date_created field.

time.perf_counter_ns

I'm not sure, but I think that might not do what we want. Do you know if it keeps counting while we want for network requests? I think the Python speed isn't very interesting and the important part is the Elastic speed.

Also: Does counting nanoseconds take more CPU time? If so, it's not really worth doing. I think they'll be harder to reason about too.

it won't log queries for RECAP archive, citation search, parenthetical search, etc. Is it of interes to log that?

Yes. We should log all this stuff and the API too. We might want to add a boolean for API requests though, so they're separate from the rest. Some API users do really weird (and slow) stuff compared to actual users.

hit_cache

We'll need to get that from the downstream function. Maybe it can return whether the cache was used?

Thank you, Gianfranco!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants