Accept recap uploads for docket iQuery pages #2374

albertisfu · 2022-11-21T21:51:00Z

As requested in freelawproject/recap#251

This PR adds support for a new recap upload type: IQUERY_PAGE = 12

So now we can store Docket Iquery pages and process them.

As commented in freelawproject/juriscraper#589 a Docket Iquery page can be parsed by Juriscraper DocketHistoryReport.

A new task was added process_docket_iquery_page this task adds or updates the docket mentioned in the uploaded docket Iquery page.

@mlissner I have one question, should we add/update the SOLR index for dockets added/updated via a Docket Iquery page using add_or_update_recap_docket?

I saw that for Dockets and Docket history reports uploads the Solr indexes for all its related documents are updated if there are new docket entries/recap documents, or if the docket date_last_index is older than 1 hour ago.

I also read within add_or_update_recap_docket docstring that if the case name changes all the docket entries must be updated.

Since Docket Iquery pages don't have docket entries and Iquery pages might be uploaded many times (due to they are free). I think we only need to update Solr indexes in case we detected the case name have changed after an Iquery page upload?

mlissner · 2022-11-22T15:53:23Z

I think we only need to update Solr indexes in case we detected the case name have changed after an Iquery page upload?

Yeah, I think that's a good way to do it.

mlissner · 2022-11-22T15:55:39Z

cl/recap/tasks.py

+            mark_pq_status(pq, msg, PROCESSING_STATUS.QUEUED_FOR_RETRY)
+            raise self.retry(exc=exc)
+
+    report = DocketHistoryReport(map_cl_to_pacer_id(pq.court_id))


As in the PR in Juriscraper, this surprises me.

mlissner · 2022-11-22T15:57:13Z

cl/recap/tasks.py

+@app.task(
+    bind=True, max_retries=3, interval_start=5 * 60, interval_step=5 * 60
+)
+def process_docket_iquery_page(self, pk):


This all looks fine (aside from my one other comment), but did you look around to see if we had other code that'd do this instead? Alternatively, if we have other tasks that are very similar, can/should they be redone to use this code? I just fear repetition, though it's not the end of the world in this case.

mlissner

Back at you. A couple questions.

…ecap extension

albertisfu · 2022-11-22T23:30:26Z

Thanks! now using CaseQuery to parse iquery page uploads.

We have already tasks for CaseQuery like do_case_query_by_pacer_case_id and make_docket_by_iquery but they query PACER to get the iquery page so I think they are not too similar to the new one process_case_query_page that processes the upload from a ProcessingQueue.

I think this could be merged into a task like process_recap_docket_history_report however I'm not sure since right now each upload_type has its own task, but it could be a new task like: process_case_query_page_or_docket_history_report. Does it sound good?

Also, added add_bankruptcy_data_to_docket in case the iquery page contains bankruptcy_data.
Added the logic to detect if this is a new docket or if the case_name has changed, so content_updated is assigned as True in order to add or update in Solr.

Let me know what you think.

mlissner

Looks great. One suggestion. If you think it makes sense, please accept it and merge.

For your question about using the docket history parser, that's interesting. I have to admit I never noticed that the docket history page has the same info as the case query page! I was confused why you were using the docket history parser and had to go look at a docket history page before I figured out what you were thinking!

I think for now there's no need to merge them. I kind of like that they're isolated in case there's some minor difference we don't know about.

I also think we could refactor some of our other case_query code to generate a processing queue item (and then process it), but for now it doesn't seem worth the effort.

To summarize:

One suggestion for you to consider.
Woah, Case Query and Docket History are same?
We could refactor code, but let's worry about that another time.

Thank you.

mlissner · 2022-11-23T06:29:12Z

cl/recap/tasks.py

+    content_updated = False
+    current_case_name = d.case_name
+    d.add_recap_source()
+    update_docket_metadata(d, data)
+
+    if current_case_name != d.case_name or not d.pk:
+        # This docket should be added to Solr or updated since is new or the
+        # case name has changed.
+        content_updated = True


Two things. The search engine only works if we have RECAPDocument items in a docket. Without a RECAPDocument object, we can't add it to Solr, unfortunately. All case query pages do not have enough information to make a RECAP Document for the docket, so if this is a new docket, that's great, BUT no reason to add it to solr (it lacks any RECAP Documents).

Second, we should put content_updated closer to where it's doing work.

So:

Suggested change

content_updated = False

current_case_name = d.case_name

d.add_recap_source()

update_docket_metadata(d, data)

if current_case_name != d.case_name or not d.pk:

# This docket should be added to Solr or updated since is new or the

# case name has changed.

content_updated = True

current_case_name = d.case_name

d.add_recap_source()

update_docket_metadata(d, data)

content_updated = False

if current_case_name != d.case_name:

# Update the docket in PACER since is new or the

# case name has changed.

content_updated = True

…ontains docket entries

albertisfu · 2022-11-23T17:23:44Z

Thanks for your suggestion!

I applied it, just changed it a bit since we need to only update in SOLR dockets that are not new, we need to confirm it has a pk already. Due to if it's new the current_case_name would be blank that's different from the updated case_name.

Since you commented that the docket only should be updated in SOLR if it contains RECAP Documents I added an additional condition to check if the docket contains docket entries.

content_updated = False
if current_case_name != d.case_name and d.pk:
    if d.docket_entries.exists():
        content_updated = True

Does it make sense?

About your question:

Case Query and Docket History are same?

Yeah, seems that they are pretty similar, the difference I could see is that the Docket History report contains some docket entries.

mlissner · 2022-11-27T18:22:39Z

Great. Merging, thank you.

mlissner reviewed Nov 22, 2022

View reviewed changes

albertisfu added 2 commits November 22, 2022 17:00

feat(recap): Accept recap uploads for docket iQuery pages

9832fb9

fix(recap): Use CaseQuery report to parse iquery pages uploaded via r…

5aaf1fd

…ecap extension

albertisfu force-pushed the iquery-pages-support-for-recap-uploads branch from 42d47e6 to 5aaf1fd Compare November 22, 2022 23:01

mlissner reviewed Nov 23, 2022

View reviewed changes

fix(recap): Only update case query report in SOLR if is not new and c…

c166e50

…ontains docket entries

mlissner merged commit 188c575 into main Nov 27, 2022

mlissner deleted the iquery-pages-support-for-recap-uploads branch November 27, 2022 18:22

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Accept recap uploads for docket iQuery pages #2374

Accept recap uploads for docket iQuery pages #2374

albertisfu commented Nov 21, 2022

mlissner commented Nov 22, 2022

mlissner Nov 22, 2022

mlissner Nov 22, 2022

mlissner left a comment

albertisfu commented Nov 22, 2022

mlissner left a comment

mlissner Nov 23, 2022

albertisfu commented Nov 23, 2022

mlissner commented Nov 27, 2022

Accept recap uploads for docket iQuery pages #2374

Accept recap uploads for docket iQuery pages #2374

Conversation

albertisfu commented Nov 21, 2022

mlissner commented Nov 22, 2022

mlissner Nov 22, 2022

Choose a reason for hiding this comment

mlissner Nov 22, 2022

Choose a reason for hiding this comment

mlissner left a comment

Choose a reason for hiding this comment

albertisfu commented Nov 22, 2022

mlissner left a comment

Choose a reason for hiding this comment

mlissner Nov 23, 2022

Choose a reason for hiding this comment

albertisfu commented Nov 23, 2022

mlissner commented Nov 27, 2022